From jthornton at novaresearchonline.com Fri Aug 1 01:18:44 2003
From: jthornton at novaresearchonline.com (Jennifer Thornton)
Date: Thu Jul 31 13:24:05 2003
Subject: [Bioperl-l] blastcl3
Message-ID: <9F4AFDC6-C3DF-11D7-81CB-000A95945DFE@novaresearchonline.com>

Does anyone know how to use this program? I'm having trouble
installing and running it.

From simon.andrews at bbsrc.ac.uk Fri Aug 1 03:52:08 2003
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Fri Aug 1 03:54:16 2003
Subject: [Bioperl-l] $_ assignment question
Message-ID: <2DC41140A89ED411989D00508BDCD9ED01E28AEB@bi-exsrv1.iapc.bbsrc.ac.uk>

> On Thu 16:08, Paul Edlefsen wrote:
> > My understanding has been that Bio::SeqIO->new(..) will
> > surely use the correct method, while new Bio::SeqIO might
> > use the new(..) method from the class that the code appears
> > in, which would be incorrect. Have I misunderstood the issue?
> >
> From: Josh Lauricha
> I've always been under the impression they are interchangeable.
> If they aren't, what would be an example of the wrong function
> being called?

#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

my $name = 'Bio::SeqIO';

sub new {
    return ("Oops!\n");
}

my $indirect = new $name;
print "INDIRECT: $indirect\n";

my $direct = $name -> new();
print "DIRECT: $direct\n";

From KhanIA1 at Cardiff.ac.uk Fri Aug 1 09:10:34 2003
From: KhanIA1 at Cardiff.ac.uk (Imtiaz KHAN)
Date: Fri Aug 1 04:10:44 2003
Subject: [Bioperl-l] Primer3
Message-ID: <3F2A2E89.18571.382802C@localhost>

Hello,

I am a new user of Bioperl. I was trying to write a program that will
get the left and right primers from MIT's Primer3 web site, but in
batch.

I tried to use Bio::Tools::Primer3 but it did not work. Can anyone
help me in this regard please.
Thanks Imtiaz From Matthew.Betts at ii.uib.no Fri Aug 1 04:33:51 2003 From: Matthew.Betts at ii.uib.no (Matthew Betts) Date: Fri Aug 1 04:33:45 2003 Subject: [Bioperl-l] Errors generated from example code in HOW-TO section Message-ID: Hi, I recently ran up against the same warning, but was happy to see that perl did the comparison correctly anyway eg. both of the following give the warning, but the first prints out $e and the second one doesn't. perl -we '$e = "e-169"; ($e < 1.0) and print "$e\n";' perl -we '$e = "e-169"; ($e > 1.0) and print "$e\n";' Matt > Message: 18 > Date: Fri, 01 Aug 2003 13:16:37 +1000 > From: Wes Barris > Subject: [Bioperl-l] Errors generated from example code in HOW-TO > section > To: Bioperl Mailing List > Message-ID: <3F29DB95.10901@csiro.au> > Content-Type: text/plain; charset=us-ascii; format=flowed > > The HOW-TO section http://www.bioperl.org/HOWTOs/html/Graphics-HOWTO.html > shows examples using this code fragment: > > while( my $hit = $result->next_hit ) { > next unless $hit->significance < 1.0; > my $feature = Bio::SeqFeature::Generic->new(-score => > $hit->raw_score, > -seq_id => $hit->name, > > If warnings are turned on, then the 2nd line above produces these errors: > > Argument "e-171" isn't numeric in numeric lt (<) at blasttoimg.pl line 49, > line 191. > Argument "e-163" isn't numeric in numeric lt (<) at blasttoimg.pl line 49, > line 191. > Argument "e-160" isn't numeric in numeric lt (<) at blasttoimg.pl line 49, > line 191. > Argument "e-160" isn't numeric in numeric lt (<) at blasttoimg.pl line 49, > line 191. > Argument "e-160" isn't numeric in numeric lt (<) at blasttoimg.pl line 49, > line 191. > > I would suggest that when parsing a blast file, if the evalue does not > begin with a digit, a '1' (one) should be placed at the beginning of the > reported evalue. -- Matthew Betts, mailto:matthew.betts@ii.uib.no Phone: (+47) 55 58 40 22 CBU, BCCS, UNIFOB / Universitetet i Bergen Thormohlensgt. 
55, N-5020 Bergen, Norway From simon.andrews at bbsrc.ac.uk Fri Aug 1 04:54:35 2003 From: simon.andrews at bbsrc.ac.uk (simon andrews (BI)) Date: Fri Aug 1 04:56:47 2003 Subject: [Bioperl-l] Errors generated from example code in HOW-TO sect ion Message-ID: <2DC41140A89ED411989D00508BDCD9ED01E28AEC@bi-exsrv1.iapc.bbsrc.ac.uk> > > Argument "e-171" isn't numeric in numeric lt (<) at > > blasttoimg.pl line 49, > From: Matthew Betts [mailto:Matthew.Betts@ii.uib.no] > > I recently ran up against the same warning, but was happy to > see that perl did the comparison correctly anyway Oh but it didn't! Try this, then change $num to be 1e-169. Perl guesses, and gets it wrong. #!/usr/bin/perl -w use strict; my $num = 'e-169'; $num += 0; print "Number is $num\n"; if ($num == 1e-169) { print "OK they match\n"; } else { print "Oops, no match\n"; } From Matthew.Betts at ii.uib.no Fri Aug 1 05:20:14 2003 From: Matthew.Betts at ii.uib.no (Matthew Betts) Date: Fri Aug 1 05:20:06 2003 Subject: [Bioperl-l] Errors generated from example code in HOW-TO sect ion In-Reply-To: <2DC41140A89ED411989D00508BDCD9ED01E28AEC@bi-exsrv1.iapc.bbsrc.ac.uk> Message-ID: Whoops, thanks Simon, so it's just treating 'e-169' as zero... I guess it almost is, and hopefully nothing like 'e-2' ever appears in blast output... Matt On Fri, 1 Aug 2003, simon andrews (BI) wrote: > > > > > Argument "e-171" isn't numeric in numeric lt (<) at > > > blasttoimg.pl line 49, > > > From: Matthew Betts [mailto:Matthew.Betts@ii.uib.no] > > > > I recently ran up against the same warning, but was happy to > > see that perl did the comparison correctly anyway > > Oh but it didn't! Try this, then change $num to be 1e-169. Perl guesses, and gets it wrong. 
> > #!/usr/bin/perl -w > use strict; > > my $num = 'e-169'; > > $num += 0; > > print "Number is $num\n"; > > if ($num == 1e-169) { > print "OK they match\n"; > } > > else { > print "Oops, no match\n"; > } > -- Matthew Betts, mailto:matthew.betts@ii.uib.no Phone: (+47) 55 58 40 22 CBU, BCCS, UNIFOB / Universitetet i Bergen Thormohlensgt. 55, N-5020 Bergen, Norway From shawnh at fugu-sg.org Fri Aug 1 12:28:04 2003 From: shawnh at fugu-sg.org (Shawn Hoon) Date: Fri Aug 1 05:26:35 2003 Subject: [Bioperl-l] Primer3 In-Reply-To: <3F2A2E89.18571.382802C@localhost> References: <3F2A2E89.18571.382802C@localhost> Message-ID: <20757E62-C43D-11D7-A2ED-000A95783436@fugu-sg.org> On Friday, August 1, 2003, at 9:10 AM, Imtiaz KHAN wrote: > Hello, > > I am a new user of Bioperl. I was trying to write a program that will > get the left and right primers from MIT's Primer3 web site, but in > batch. > > I tried to use Bio::Tools::Primer3 but it did not work. Can any one > help me in this regard please. > You will want to use Bio::Tools::Run::Primer3 from the bioperl-run package. It's not available in bioperl-run-1.2.2 so you will need to checkout the live version of bioperl and bioperl-run to work with Primer3 CVS info here: http://cvs.bioperl.org/ shawn > Thanks > > Imtiaz > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -shawn From ajm6q at virginia.edu Fri Aug 1 06:46:51 2003 From: ajm6q at virginia.edu (Aaron J Mackey) Date: Fri Aug 1 06:46:45 2003 Subject: [Bioperl-l] Errors generated from example code in HOW-TO section In-Reply-To: Message-ID: Ahem, not so fast: perl -wle '$e = "e-100"; print $e < "e-10" ? 1 : 0;' Argument "e-10" isn't numeric in numeric lt (<) at -e line 1. Argument "e-100" isn't numeric in numeric lt (<) at -e line 1. 0 perl -wle '$e = "e-100"; print $e == "e-10" ? 1 : 0;' Argument "e-10" isn't numeric in numeric eq (==) at -e line 1. 
Argument "e-100" isn't numeric in numeric eq (==) at -e line 1. 1 e-100 (or e-10, or e-anything) is getting turned into 0, since "e-" doesn't look like a number, the number parsing stops, and you get 0. -Aaron On Fri, 1 Aug 2003, Matthew Betts wrote: > > Hi, > > I recently ran up against the same warning, but was happy to see that perl > did the comparison correctly anyway > > eg. both of the following give the warning, but the first prints out > $e and the second one doesn't. > > perl -we '$e = "e-169"; ($e < 1.0) and print "$e\n";' > perl -we '$e = "e-169"; ($e > 1.0) and print "$e\n";' > > Matt > > > > Message: 18 > > Date: Fri, 01 Aug 2003 13:16:37 +1000 > > From: Wes Barris > > Subject: [Bioperl-l] Errors generated from example code in HOW-TO > > section > > To: Bioperl Mailing List > > Message-ID: <3F29DB95.10901@csiro.au> > > Content-Type: text/plain; charset=us-ascii; format=flowed > > > > The HOW-TO section http://www.bioperl.org/HOWTOs/html/Graphics-HOWTO.html > > shows examples using this code fragment: > > > > while( my $hit = $result->next_hit ) { > > next unless $hit->significance < 1.0; > > my $feature = Bio::SeqFeature::Generic->new(-score => > > $hit->raw_score, > > -seq_id => $hit->name, > > > > If warnings are turned on, then the 2nd line above produces these errors: > > > > Argument "e-171" isn't numeric in numeric lt (<) at blasttoimg.pl line 49, > > line 191. > > Argument "e-163" isn't numeric in numeric lt (<) at blasttoimg.pl line 49, > > line 191. > > Argument "e-160" isn't numeric in numeric lt (<) at blasttoimg.pl line 49, > > line 191. > > Argument "e-160" isn't numeric in numeric lt (<) at blasttoimg.pl line 49, > > line 191. > > Argument "e-160" isn't numeric in numeric lt (<) at blasttoimg.pl line 49, > > line 191. > > > > I would suggest that when parsing a blast file, if the evalue does not > > begin with a digit, a '1' (one) should be placed at the beginning of the > > reported evalue. 
> > -- Aaron J Mackey Pearson Laboratory University of Virginia (434) 924-2821 amackey@virginia.edu From jon at compbio.dundee.ac.uk Fri Aug 1 07:31:11 2003 From: jon at compbio.dundee.ac.uk (Jonathan Barber) Date: Fri Aug 1 07:31:00 2003 Subject: [Bioperl-l] $_ assignment question In-Reply-To: References: <028452FC-C386-11D7-8186-000A959EB4C4@gnf.org> Message-ID: <20030801113111.GI24938@flea.compbio.dundee.ac.uk> On Thu, Jul 31, 2003 at 04:01:51PM -0400, Aaron J Mackey wrote: > > Uhh, let me put in my usual snobby and grumpy two cents and say that this > form of code Nazi-ism is really annoying, and introduces massively > difficult to follow CVS changes. Next we'll need to change all > occurrences of: > > $obj->dosomething unless $dont; > > to: > > if(!$dont) > { > $obj->dosomething(); > } > > so that some Java programmers don't get confused. I think this is wasted > effort. Indirect object method syntax is not evil when the method is a > well understood and used "keyword" type of thing like "new", "open", > "connect", etc. Many of us use it to mentally differentiate between > constructors and class methods: > > Net::SMTP->debug(1); > my $smtp = new Net::SMTP @args; > > Yes, I know a constructor is a class method, but (at least to me), it's a > different flavor that is very often seen written with indirect syntax in > Perl. On the flip side, you don't typically see: > > debug Net::SMTP 1; > > If *that* is the type of usage you'd like to get rid of, I have less > of a complaint. Fair enough. > Re: localized $_, I think you're barking up another pedantical tree. If > you'd like to do this cleanly, then I'd rather see you change this: > > while ($_ = $self->_readline) { > if (m/foobar/) { > # ... > > Into: > > while (my $line = $self->_readline) { > if ($line =~ m/foobar/) { > # ... > > Rather than: > > while (local $_ = $self->_readline) { > if(m/foobar/) { > # ... Agreed. 
But that's a lot harder to fix than sticking a local in front of the offending assignment, as the whole method call has to be checked for core functions that use $_, and ATM I just want to stop Bioperl messing around with $_ and giving me strange bugs. > Of course, this particular brand of pedanticness would indicate that this > construct: > > while() { > # ... > } > > ... should also never be used. Yep. At least not in a module. The reason being if I do this: for (qw(different sequence filehandes)) { $Bioperl_object->random_method($_); # method assigns to $_ do_something_else_with_fh($_); } then do_something_else() is not getting what I expect it to get. Now I could put a "my $fh" after the for, but why should I have too, I should expect the modules to play nicely. If this is intended to be the default behavour for Bioperl, then fine, but it needs to stated at the top of all the Bioperl docs in big letters. Either way, patches are required. > But I (grumpily) consent that the various ways Perl can perform actions > at a distance can in some cases cause frustrating problems for others. > > -Aaron > > P.S. The Conway book is great, but it's not ta biblia. -- Jon From jason at cgt.duhs.duke.edu Fri Aug 1 08:15:41 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Fri Aug 1 08:02:15 2003 Subject: [Bioperl-l] Errors generated from example code in HOW-TO section In-Reply-To: <3F29DB95.10901@csiro.au> References: <3F29DB95.10901@csiro.au> Message-ID: http://bugzilla.bioperl.org/show_bug.cgi?id=1474 It is already fixed on the main trunk - I don't think we intend to a 1.2.3 release. 
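For anyone on a release that predates that fix, the workaround Wes suggested can be applied in user code before the comparison. A minimal sketch, assuming the e-value arrives as a string such as "e-171"; `normalize_evalue` is a hypothetical helper written for illustration, not a BioPerl function:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): prefix a bare exponent
# such as "e-171" with "1" so that Perl parses it as a number.
# Values that already start with a digit are returned unchanged.
sub normalize_evalue {
    my ($raw) = @_;
    return $raw unless defined $raw;
    (my $fixed = $raw) =~ s/^([eE][-+]?\d+)$/1$1/;
    return $fixed;
}

for my $raw ('e-171', 'e-2', '3e-05', '0.0') {
    my $evalue = normalize_evalue($raw);
    # The comparison is now numeric and raises no "isn't numeric" warning.
    my $verdict = $evalue < 1.0 ? 'keep' : 'skip';
    print "$raw -> $evalue ($verdict)\n";
}
```

Something like `next unless normalize_evalue($hit->significance) < 1.0;` would then replace the line that triggers the warning.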
-jason On Fri, 1 Aug 2003, Wes Barris wrote: > The HOW-TO section http://www.bioperl.org/HOWTOs/html/Graphics-HOWTO.html > shows examples using this code fragment: > > while( my $hit = $result->next_hit ) { > next unless $hit->significance < 1.0; > my $feature = Bio::SeqFeature::Generic->new(-score => $hit->raw_score, > -seq_id => $hit->name, > > If warnings are turned on, then the 2nd line above produces these errors: > > Argument "e-171" isn't numeric in numeric lt (<) at blasttoimg.pl line 49, line 191. > Argument "e-163" isn't numeric in numeric lt (<) at blasttoimg.pl line 49, line 191. > Argument "e-160" isn't numeric in numeric lt (<) at blasttoimg.pl line 49, line 191. > Argument "e-160" isn't numeric in numeric lt (<) at blasttoimg.pl line 49, line 191. > Argument "e-160" isn't numeric in numeric lt (<) at blasttoimg.pl line 49, line 191. > > I would suggest that when parsing a blast file, if the evalue does not > begin with a digit, a '1' (one) should be placed at the beginning of the > reported evalue. > > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From Richard.Adams at ed.ac.uk Fri Aug 1 07:04:27 2003 From: Richard.Adams at ed.ac.uk (Richard Adams) Date: Fri Aug 1 08:24:27 2003 Subject: [Bioperl-l]more inheritance Message-ID: <3F2A493B.629B3BF2@ed.ac.uk> Just while people are talking about inheritance: I'm writing some Bio::Tools:Analysis modules (they're in the CVS) which run remote sequence analyses, and parse the results into a variety of formats, including Bio::Seq::Meta::Array objects (which hold residue specific data)and Bio::SeqFeature objects (which usually just hold significant results). 
e.g.,

my $tool = Bio::Tools::Analysis::Protein::Domcut->new(-seq => $seq); # predicts domain boundaries in proteins
$tool->run;                                    # submits query and retrieves result
my $meta = $tool->result('all');
my @fts  = $tool->result('Bio::SeqFeatureI');

The problem is whether $seq needs to be a PrimarySeq object (needed for
Bio::Seq::Meta::Array methods to work, as it extends from
Bio::LocatableSeq, which ISA PrimarySeq) or a Seq object (needed as
input if you want to hang SeqFeatures onto it), but what if you want
both?

My workaround (implementing a suggestion by Heikki) at the moment is to
have Bio::Seq::Meta::Array inheriting from Bio::LocatableSeq by default,
but if the constructor receives a -baseclass = Bio::Seq entry in its
argument list it alters @ISA to inherit from Bio::Seq. This works (i.e.,
you can get meta sequence methods AND SeqFeature methods now after
submitting a Bio::Seq object to the analysis), but I'm not sure that
dynamically altering inheritance is a good idea.

So in this case might it be better for LocatableSeq or Seq objects to
gain meta sequence methods by aggregation, rather than have meta
sequences inherit from them? Or alternatively just make the problem
clear in the documentation and let the user deal with it?

Would be grateful for any suggestions - I have several Analysis modules
to submit but would like to get this sorted out before committing them.
Cheers Richard -- Dr Richard Adams Bioinformatician, Psychiatric Genetics Group, Medical Genetics, Molecular Medicine Centre, Western General Hospital, Crewe Rd West, Edinburgh UK EH4 2XU Tel: 44 131 651 1084 richard.adams@ed.ac.uk From dhoworth at mrc-lmb.cam.ac.uk Fri Aug 1 08:45:11 2003 From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth) Date: Fri Aug 1 08:45:06 2003 Subject: [Bioperl-l] $_ assignment question References: <028452FC-C386-11D7-8186-000A959EB4C4@gnf.org> <20030801113111.GI24938@flea.compbio.dundee.ac.uk> Message-ID: <3F2A60D7.50403@mrc-lmb.cam.ac.uk> Jonathan Barber wrote: > On Thu, Jul 31, 2003 at 04:01:51PM -0400, Aaron J Mackey wrote: >>Of course, this particular brand of pedanticness would indicate that this >>construct: >> >>while() { >> # ... >>} >> >>... should also never be used. > > Yep. At least not in a module. The reason being if I do this: > > for (qw(different sequence filehandes)) { > $Bioperl_object->random_method($_); # method assigns to $_ > do_something_else_with_fh($_); > } > > then do_something_else() is not getting what I expect it to get. I was interested by this, so I ran the code (Perl 5.6.1). Turns out it won't run. Perl says: Modification of a read-only value attempted at Test.pm line 14. It does this even with warnings and strict OFF. I'd say this was strong supporting evidence that Jonathan is right :) It also seems that: (i) problems of this kind will show up pretty quickly and (ii) they can be isolated by putting: for ('once') { ... } around the tests of any (all!) method calls. 
Cheers, Dave


test.pl
=======
#!/usr/bin/perl

use lib ('.');
use Test;

for (qw(hello world)) {
    print "\nA $_\n\n";
    Test::g();
    print "\nB $_\n";
}

Test.pm
=======
#!/usr/bin/perl

package Test;

require Exporter;

our @ISA = qw(Exporter);
our @EXPORT = qw();
our $VERSION = 1.00;

sub g
{
    open FH, '<', 'test.pl';
    while (<FH>) {
        print $_;
    }
}

--
Dave Howorth
MRC Centre for Protein Engineering
Hills Road, Cambridge, CB2 2QH
01223 252960

From djoubert at mail.mcg.edu Fri Aug 1 08:56:10 2003
From: djoubert at mail.mcg.edu (Douglas Joubert)
Date: Fri Aug 1 08:58:40 2003
Subject: [Bioperl-l] Installation question
Message-ID:

Hello,

Fairly new to this group, and I have a question concerning installation,
so please bear with me

Following the instructions for WIN install via
http://bioperl.org/core/latest/install.win

-------->1.3.2. ActiveState for Perl 5.8.0

Everything installed correct (I think) but the last comment in the
command prompt is

"Successfully installed bioperl version 1.2.1 in ActivePerl 5.8.0.806"

I thought 1.2.2 was the latest version?

Cheers
Doug (librarian, not a programmer)

Douglas Joubert, M.L.I.S.
Instructor and Digital Information Librarian
Robert B. Greenblatt M.D. Library
Medical College of Georgia
Augusta, GA 30912-4400

From ajm6q at virginia.edu Fri Aug 1 09:06:04 2003
From: ajm6q at virginia.edu (Aaron J Mackey)
Date: Fri Aug 1 09:05:58 2003
Subject: [Bioperl-l] $_ assignment question
In-Reply-To: <3F2A60D7.50403@mrc-lmb.cam.ac.uk>
Message-ID:

You got caught by lists vs. arrays:

perl -wle 'for(qw(a b)) { $_ = "b"; print; }'
Modification of a read-only value attempted at -e line 1.

perl -wle '@a = qw(a b); for(@a) { $_ = "b"; print; }'
b
b

So problems with reassigning to $_ that might arise in the second case are
quite silent and insidious.
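The silent case can be reproduced without Bioperl. A minimal sketch, assuming a method that reads a filehandle with a bare `while (<$fh>)`; the subroutine names `clobbering_reader`, `safe_reader` and `names_after_loop` are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $data = "line one\nline two\n";

# Reads with the implicit global $_, as a careless module method might.
sub clobbering_reader {
    open my $fh, '<', \$data or die "cannot open in-memory file: $!";
    while (<$fh>) { }              # assigns each line, then undef, to $_
    close $fh;
    return;
}

# Same loop with a lexical variable; the caller's $_ is left alone.
sub safe_reader {
    open my $fh, '<', \$data or die "cannot open in-memory file: $!";
    while (my $line = <$fh>) { }
    close $fh;
    return;
}

# Loop over an array (so $_ aliases writable elements) and call $reader.
sub names_after_loop {
    my ($reader) = @_;
    my @names = qw(seq1 seq2);
    for (@names) {
        $reader->();
    }
    return @names;
}

for my $case ([clobbering => \&clobbering_reader], [safe => \&safe_reader]) {
    my ($label, $reader) = @$case;
    my @names = names_after_loop($reader);
    print "$label: ", join(',', map { defined $_ ? $_ : 'undef' } @names), "\n";
}
```

With the clobbering reader the caller's array elements should come back undefined and no warning is raised; with the lexical `$line` they are left as seq1 and seq2.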
-Aaron On Fri, 1 Aug 2003, Dave Howorth wrote: > Jonathan Barber wrote: > > On Thu, Jul 31, 2003 at 04:01:51PM -0400, Aaron J Mackey wrote: > >>Of course, this particular brand of pedanticness would indicate that this > >>construct: > >> > >>while() { > >> # ... > >>} > >> > >>... should also never be used. > > > > Yep. At least not in a module. The reason being if I do this: > > > > for (qw(different sequence filehandes)) { > > $Bioperl_object->random_method($_); # method assigns to $_ > > do_something_else_with_fh($_); > > } > > > > then do_something_else() is not getting what I expect it to get. > > I was interested by this, so I ran the code (Perl 5.6.1). Turns out it > won't run. Perl says: > > Modification of a read-only value attempted at Test.pm line 14. > > It does this even with warnings and strict OFF. I'd say this was strong > supporting evidence that Jonathan is right :) It also seems that: > (i) problems of this kind will show up pretty quickly and > (ii) they can be isolated by putting: > > for ('once') { ... } > > around the tests of any (all!) method calls. > > Cheers, Dave > > > test.pl > ======= > #!/usr/bin/perl > > use lib ('.'); > use Test; > > for (qw(hello world)) { > print "\nA $_\n\n"; > Test::g(); > print "\nB $_\n"; > } > > Test.pm > ======= > #!/usr/bin/perl > > package Test; > > require Exporter; > > our @ISA = qw(Exporter); > our @EXPORT = qw(); > our $VERSION = 1.00; > > sub g > { > open FH, '<', 'test.pl'; > while () { > print $_; > } > } > > > -- Aaron J Mackey Pearson Laboratory University of Virginia (434) 924-2821 amackey@virginia.edu From madismetsis at hotmail.com Fri Aug 1 09:08:53 2003 From: madismetsis at hotmail.com (Madis Metsis) Date: Fri Aug 1 09:08:43 2003 Subject: [Bioperl-l] problems with installing Bioperl on OSX Message-ID: Hi! I am installing Bioperl on a OSX machine and get this message: ***** Error: Unable to locate installed Perl libraries or Perl source code. 
It is recommended that you install perl in a standard location before building extensions. Some precompiled versions of perl do not contain these header files, so you cannot build extensions. In such a case, please build and install your perl from a fresh perl distribution. It usually solves this kind of problem. (You get this message, because MakeMaker could not find "/System/Library/Perl/darwin/CORE/perl.h") **** Thats true, perl.h is not in right place, since both Apple preinstalled and current Serverlogistics installer are not placing it in that place. How do I get around it. To install Perl AGAIN having 2 copies already on machine sounds.... Is there a line in Makefile.pl that could be modified to "show" to location of perl.h and the rest of perl. Thanks in advance for any kind of help Madis Metsis Center for Genomics and Bioinformatics Karolinska Institute Stockholm Sweden _________________________________________________________________ Help STOP SPAM with the new MSN 8 and get 2 months FREE* http://join.msn.com/?page=features/junkmail From yaofx at xymu.net Fri Aug 1 09:39:18 2003 From: yaofx at xymu.net (yaofx) Date: Fri Aug 1 09:39:18 2003 Subject: [Bioperl-l] About Bio::DB::GenBank! Message-ID: <3F2A6D86.9040006@xymu.net> Hello, I have installed Perl 5.6.1 for WIN32, and Bioperl version 1.2.1. The following is the script ,which can retrieve data from GenBank by sequences' gi, but can not get the results by accession number. I replace "get_Stream_by_id" with ""get_Stream_by_acc", also failed. The error message is : ------------- EXCEPTION ------------- MSG: WebDBSeqI Error - check query sequences! STACK Bio::DB::WebDBSeqI::get_seq_stream c:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:46 4 STACK Bio::DB::WebDBSeqI::get_Stream_by_id c:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 259 STACK toplevel web_gi2seq_vi.pl:50 -------------------------------------- BTW, I don't change any code about Bioperl package. What's matter with it? and how will i do next? 
Thanks in advance for any kind of help

Fengxia


#!/usr/bin/perl

$idlist = $ARGV[0];

if (@ARGV != 1){
    print "USAGE: perl web_id2seq.pl \n";
    exit(1);
}

$faoutfile = $idlist."_fa.txt";
(unlink $faoutfile) if (-e $faoutfile);

open (INPUT,$idlist);

while ($line = <INPUT>){
    chomp ($line);
    $line =~ s/\r//;
    push (@querylist,$line);
}
$list = join ",",@querylist;
close INPUT;

use Bio::SeqIO;
use Bio::DB::GenBank;
my $gb = new Bio::DB::GenBank;

my $seqout = new Bio::SeqIO(-file => ">$faoutfile", -format => 'fasta');
my $seqio = $gb->get_Stream_by_id([$list]);

while($seq = $seqio->next_seq ) {
    $seqout->write_seq($seq);
}

From dhoworth at mrc-lmb.cam.ac.uk Fri Aug 1 10:04:51 2003
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Fri Aug 1 10:04:45 2003
Subject: [Bioperl-l] $_ assignment question
References:
Message-ID: <3F2A7383.6050804@mrc-lmb.cam.ac.uk>

Aaron J Mackey wrote:
> You got caught by lists vs. arrays:

Well, I got caught but it wasn't by lists vs arrays. The for() statement
takes a list, so the array is turned into a list by being used in the
for statement. What I didn't realize is that for(list) aliases to the
list members so that they're r/w access. Now there's a whole new bag of
tricks I can use :)

So I retract my

>> (i) problems of this kind will show up pretty quickly

and agree with you when you say:

> So problems with reassigning to $_ that might arise in the second case are
> quite silent and insidious.

but I think it's still correct to say:

>> (ii) they can be isolated by putting:
>>for ('once') { ... }
>>around the tests of any (all!) method calls.

Cheers, Dave
--
Dave Howorth
MRC Centre for Protein Engineering
Hills Road, Cambridge, CB2 2QH
01223 252960

From jason at cgt.duhs.duke.edu Fri Aug 1 10:22:49 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Fri Aug 1 10:09:19 2003
Subject: [Bioperl-l] problems with installing Bioperl on OSX
In-Reply-To:
References:
Message-ID:

no idea - this is question for perl installation gurus.
What does % perl -V I assume you get the message when you try and do % perl Makefile.PL Are you sure that the version in your path is the version you want to run? I've installed perl 5.8.0 in /usr/local/bin on OSX and have not had problems. -jason On Fri, 1 Aug 2003, Madis Metsis wrote: > Hi! > > I am installing Bioperl on a OSX machine and get this message: > ***** > Error: Unable to locate installed Perl libraries or Perl source code. > > It is recommended that you install perl in a standard location before > building extensions. Some precompiled versions of perl do not contain > these header files, so you cannot build extensions. In such a case, > please build and install your perl from a fresh perl distribution. It > usually solves this kind of problem. > > (You get this message, because MakeMaker could not find > "/System/Library/Perl/darwin/CORE/perl.h") > **** > > Thats true, perl.h is not in right place, since both Apple preinstalled and > current Serverlogistics installer are not placing it in that place. > > How do I get around it. To install Perl AGAIN having 2 copies already on > machine sounds.... > Is there a line in Makefile.pl that could be modified to "show" to location > of perl.h and the rest of perl. 
>
> Thanks in advance for any kind of help
>
> Madis Metsis
>
> Center for Genomics and Bioinformatics
> Karolinska Institute
> Stockholm
> Sweden
>
> _________________________________________________________________
> Help STOP SPAM with the new MSN 8 and get 2 months FREE*
> http://join.msn.com/?page=features/junkmail
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From brian_osborne at cognia.com Fri Aug 1 12:40:58 2003
From: brian_osborne at cognia.com (Brian Osborne)
Date: Fri Aug 1 12:44:53 2003
Subject: [Bioperl-l] problems with installing Bioperl on OSX
In-Reply-To:
Message-ID:

Madis,

For installation in "non-standard" locations you need to do something like
this (from the INSTALL file):

INSTALLING BIOPERL IN A PERSONAL OR PRIVATE MODULE AREA

If you lack permission to install perl modules into the standard
site_perl/ system area you can configure bioperl to install itself
anywhere you choose. Ideally this would be a personal perl directory or
standard place where you plan to put all your 'local' or personal perl
modules.

Note: you _must_ have write permission to this area.

Simply pass a parameter to perl as it builds your system specific
makefile.

Example:

  perl Makefile.PL LIB=/home/dag/My_Local_Perl_Modules
  make
  make test
  make install

This tells perl to install bioperl in the desired place, e.g.:

  /home/dag/My_Local_Perl_Modules/Bio/Seq.pm

Then in your Bioperl script you would write:

  use lib "/home/dag/My_Local_Perl_Modules";
  use Bio::Seq;

The man pages will probably be installed in $LIB/man. For more
information on these sorts of custom installs see the documentation for
ExtUtils::MakeMaker.

See below for how to use modules that are not installed in the standard
Perl5 location.
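Once `use lib` is in place, it can be worth confirming which copy of a module Perl actually loaded. A small sketch along those lines, not taken from the INSTALL file: `module_path` is a hypothetical helper, the directory is the example path quoted above, and `File::Spec` stands in for `Bio::Seq` so the script runs even without Bioperl installed:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Example directory from the INSTALL excerpt; adjust to your own area.
use lib '/home/dag/My_Local_Perl_Modules';

use File::Spec;    # stand-in for Bio::Seq

# Hypothetical helper: return the file a loaded module came from,
# or undef if the module has not been loaded.
sub module_path {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return $INC{$file};
}

# Directories are searched in this order, so the 'use lib' entry comes first.
print "search path:\n";
print "  $_\n" for @INC;

my $path = module_path('File::Spec');
print "File::Spec loaded from: ", (defined $path ? $path : '(not loaded)'), "\n";
```

Swapping `Bio::Seq` in for `File::Spec` should show whether the private copy or a system-wide one is being picked up.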
BIO -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Jason Stajich Sent: Friday, August 01, 2003 10:23 AM To: Madis Metsis Cc: bioperl-l@bioperl.org Subject: Re: [Bioperl-l] problems with installing Bioperl on OSX no idea - this is question for perl installation gurus. What does % perl -V I assume you get the message when you try and do % perl Makefile.PL Are you sure that the version in your path is the version you want to run? I've installed perl 5.8.0 in /usr/local/bin on OSX and have not had problems. -jason On Fri, 1 Aug 2003, Madis Metsis wrote: > Hi! > > I am installing Bioperl on a OSX machine and get this message: > ***** > Error: Unable to locate installed Perl libraries or Perl source code. > > It is recommended that you install perl in a standard location before > building extensions. Some precompiled versions of perl do not contain > these header files, so you cannot build extensions. In such a case, > please build and install your perl from a fresh perl distribution. It > usually solves this kind of problem. > > (You get this message, because MakeMaker could not find > "/System/Library/Perl/darwin/CORE/perl.h") > **** > > Thats true, perl.h is not in right place, since both Apple preinstalled and > current Serverlogistics installer are not placing it in that place. > > How do I get around it. To install Perl AGAIN having 2 copies already on > machine sounds.... > Is there a line in Makefile.pl that could be modified to "show" to location > of perl.h and the rest of perl. 
> > Thanks in advance for any kind of help > > Madis Metsis > > Center for Genomics and Bioinformatics > Karolinska Institute > Stockholm > Sweden > > _________________________________________________________________ > Help STOP SPAM with the new MSN 8 and get 2 months FREE* > http://join.msn.com/?page=features/junkmail > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From ccavnor at systemsbiology.org Fri Aug 1 13:03:47 2003 From: ccavnor at systemsbiology.org (Christopher Cavnor) Date: Fri Aug 1 13:03:35 2003 Subject: [Bioperl-l] problems with installing Bioperl on OSX Message-ID: <64B351282A4BBA4A9EBB264DA6FDBC28A6185C@exchange.systemsbiology.net> I am installing Bioperl on a OSX machine and get this message: ***** Error: Unable to locate installed Perl libraries or Perl source code. It is recommended that you install perl in a standard location before building extensions. Some precompiled versions of perl do not contain these header files, so you cannot build extensions. In such a case, please build and install your perl from a fresh perl distribution. It usually solves this kind of problem. (You get this message, because MakeMaker could not find "/System/Library/Perl/darwin/CORE/perl.h") **** Thats true, perl.h is not in right place, since both Apple preinstalled and current Serverlogistics installer are not placing it in that place. How do I get around it. To install Perl AGAIN having 2 copies already on machine sounds.... >> not so bad - check out: http://developer.apple.com/internet/macosx/perl.html Is there a line in Makefile.pl that could be modified to "show" to location of perl.h and the rest of perl. >> % perl -V will do that. 
You can also prepend directories to the @INC array by doing (C shell
syntax):

% setenv PERL5LIB /where/to/look


Thanks in advance for any kind of help

Madis Metsis

Center for Genomics and Bioinformatics
Karolinska Institute
Stockholm
Sweden

_________________________________________________________________
Help STOP SPAM with the new MSN 8 and get 2 months FREE*
http://join.msn.com/?page=features/junkmail

_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From jmanning at genome.wi.mit.edu Fri Aug 1 13:18:22 2003
From: jmanning at genome.wi.mit.edu (Jonathan Manning)
Date: Fri Aug 1 13:18:10 2003
Subject: [Bioperl-l] About Bio::DB::GenBank!
In-Reply-To: <3F2A6D86.9040006@xymu.net>
References: <3F2A6D86.9040006@xymu.net>
Message-ID: <3F2AA0DE.4010605@genome.wi.mit.edu>

I just encountered the same error. It looks like both Bio::DB::GenBank
and Bio::DB::GenPept search the protein database. So, Bio::DB::GenBank
is not returning anything when you query for an accession.

To verify this, change:
  my $gb = new Bio::DB::GenBank;
to:
  my $gb = new Bio::DB::GenBank(-format => 'fasta',
                                -verbose => 1);

For me, this prints:
url is
http://www.ncbi.nih.gov/entrez/eutils/efetch.fcgi?rettype=fasta&db=protein&id=AC068609&tool=bioperl&retmode=text&usehistory=n

Sure enough, visiting this gives me a blank page. However, if I
substitute 'db=nucleotide' for 'db=protein' in that url, it works.

This bug seems to exist in 1.2 and 1.2.1, but I think is fixed in 1.2.2.
(at least in the CVS head I checked...)

Either upgrade to 1.2.2 or edit Bio/DB/GenBank.pm and change 'protein'
to 'nucleotide' in the BEGIN block.

~Jonathan


yaofx wrote:

> Hello,
>
> I have installed Perl 5.6.1 for WIN32, and Bioperl version 1.2.1.
> The following is the script ,which can retrieve data from GenBank by
> sequences' gi,
> but can not get the results by accession number.
> I replace "get_Stream_by_id" with "get_Stream_by_acc", also failed.
>
> The error message is :
>
> ------------- EXCEPTION -------------
> MSG: WebDBSeqI Error - check query sequences!
>
> STACK Bio::DB::WebDBSeqI::get_seq_stream c:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:464
> STACK Bio::DB::WebDBSeqI::get_Stream_by_id c:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:259
> STACK toplevel web_gi2seq_vi.pl:50
>
> --------------------------------------
>
> BTW, I don't change any code about Bioperl package.
> What's matter with it? and how will i do next?
>
> Thanks in advance for any kind of help
>
> Fengxia
>
>
> #!/usr/bin/perl
>
> $idlist = $ARGV[0];
>
> if (@ARGV != 1){
>     print "USAGE: perl web_id2seq.pl \n";
>     exit(1);
> }
>
> $faoutfile = $idlist."_fa.txt";
> (unlink $faoutfile) if (-e $faoutfile);
>
> open (INPUT,$idlist);
>
> while ($line = <INPUT>){
>     chomp ($line);
>     $line =~ s/\r//;
>     push (@querylist,$line);
> }
> $list = join ",",@querylist;
> close INPUT;
>
> use Bio::SeqIO;
> use Bio::DB::GenBank;
> my $gb = new Bio::DB::GenBank;
>
> my $seqout = new Bio::SeqIO(-file => ">$faoutfile", -format => 'fasta');
> my $seqio = $gb->get_Stream_by_id([$list]);
>
> while($seq = $seqio->next_seq ) {
>     $seqout->write_seq($seq);
> }
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

--
Jonathan Manning
Whitehead Institute Center for Genome Research
Finishing Process Analyst / Data Analyst

From hlapp at gnf.org Fri Aug 1 15:53:43 2003
From: hlapp at gnf.org (Hilmar Lapp)
Date: Fri Aug 1 15:53:32 2003
Subject: [Bioperl-l] Errors generated from example code in HOW-TO section
In-Reply-To:
Message-ID:

On Friday, August 1, 2003, at 05:15 AM, Jason Stajich wrote:

> I don't think we intend to a
> 1.2.3 release.

Why not? We should really release more often (and given the lack of releases from biosql and bioperl-db this is admittedly the blackest sheep speaking).
-hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gnf.org Fri Aug 1 16:04:09 2003 From: hlapp at gnf.org (Hilmar Lapp) Date: Fri Aug 1 16:03:57 2003 Subject: [Bioperl-l] About Bio::DB::GenBank! In-Reply-To: <3F2AA0DE.4010605@genome.wi.mit.edu> Message-ID: <4FE2ABEE-C45B-11D7-9E33-000A959EB4C4@gnf.org> On Friday, August 1, 2003, at 10:18 AM, Jonathan Manning wrote: > This bug seems to exist in 1.2 and 1.2.1, but I think is fixed in > 1.2.2. (at least in the CVS head I checked...) > Jason fixed it on the branch and the main trunk. It's in 1.2.2. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From yaofx at xymu.net Fri Aug 1 22:54:43 2003 From: yaofx at xymu.net (yaofx) Date: Fri Aug 1 22:55:02 2003 Subject: [Bioperl-l] About Bio::DB::GenBank! References: <3F2A6D86.9040006@xymu.net> <3F2AA0DE.4010605@genome.wi.mit.edu> Message-ID: <3F2B27F3.2000309@xymu.net> Upgrade to 1.2.2 and it works well. Thanks a lot!!! Fengxia Jonathan Manning wrote: > I just encountered the same error. It looks like both Bio::DB::GenBank > and Bio::DB::GenPept search the protein database. So, Bio::DB::GenBank > is not returning anything when you query for an accession. > > To verify this, change: > my $gb = new Bio::DB::GenBank; > to: > my $gb = new Bio::DB::GenBank(-format => 'fasta', > -verbose => 1); > > For me, this prints: > url is > http://www.ncbi.nih.gov/entrez/eutils/efetch.fcgi?rettype=fasta&db=protein&id=AC068609&tool=bioperl&retmode=text&usehistory=n > > > Sure enough, visiting this gives me a blank page. However, if I > substitute 'db=nucleotide' for 'db=protein' in that url, it works. 
> > This bug seems to exist in 1.2 and 1.2.1, but I think is fixed in > 1.2.2. (at least in the CVS head I checked...) > > Either upgrade to 1.2.2 or edit Bio/DB/GenBank.pm and change 'protein' > to 'nucleotide' in the BEGIN block. > > ~Jonathan > From sofia at neuro.utah.edu Fri Aug 1 13:59:46 2003 From: sofia at neuro.utah.edu (Sofia) Date: Sun Aug 3 12:11:36 2003 Subject: [Bioperl-l] problems with parsing hmmpfam Message-ID: <002f01c35856$b1d73a50$f500000a@planaria2> I am trying to parse a hmmpfam report. I am mostly sucessful until I have the following error: Use of uninitialized value in array element at /usr/lib/perl5/site_perl/5.8.0/Bio/SearchIO/hmmer.pm line 488, line 56473. ------------- EXCEPTION ------------- MSG: Somehow the Model table order does not match the order in the domains (got [no, expected AFP) STACK Bio::SearchIO::hmmer::next_result /usr/lib/perl5/site_perl/5.8.0/Bio/SearchIO/hmmer.pm:489 STACK toplevel ./parseHMM.pl:8 -------------------------------------- It seems to only occur when my scores for sequence family classification is "no hits above threshold" and my parsed for domains has a hit (though usually very poor score). How can I filter out these records with SearchIO?
Thanks, Sofia Output which causes me to have this error ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Query sequence: 26SrRNA Accession: [none] Description: [none] Scores for sequence family classification (score includes all domains): Model Description Score E-value N -------- ----------- ----- ------- --- [no hits above thresholds] Parsed for domains: Model Domain seq-f seq-t hmm-f hmm-t score E-value -------- ------- ----- ----- ----- ----- ----- ------- AFP 1/1 268 279 .. 1 12 [] 0.3 89 Alignments of top-scoring domains: AFP: domain 1 of 1, from 268 to 279: score 0.3, E = 89 *->tCTgStnCteAt<-* tCT +t Ct+A 26SrRNA 268 TCTCGTACTGAG 279 // Query sequence: 40Sribosomal.protS11 Accession: [none] Description: [none] Scores for sequence family classification (score includes all domains): Model Description Score E-value N -------- ----------- ----- ------- --- [no hits above thresholds] Parsed for domains: Model Domain seq-f seq-t hmm-f hmm-t score E-value -------- ------- ----- ----- ----- ----- ----- ------- AFP 1/1 127 138 .. 
1 12 [] 1.7 53 Alignments of top-scoring domains: AFP: domain 1 of 1, from 127 to 138: score 1.7, E = 53 *->tCTgStnCteAt<-* + Tg+tnC + 40Sribosom 127 AATGGTNCATTA 138 // ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- From jason at cgt.duhs.duke.edu Sun Aug 3 13:47:40 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Sun Aug 3 13:33:31 2003 Subject: [Bioperl-l] problems with parsing hmmpfam In-Reply-To: <002f01c35856$b1d73a50$f500000a@planaria2> References: <002f01c35856$b1d73a50$f500000a@planaria2> Message-ID: bioperl 1.2.2 and bioperl-live in CVS will parse these reports fine as far as I can tell so I would suggest upgrading to bioperl 1.2.2 -jason On Fri, 1 Aug 2003, Sofia wrote: > I am trying to parse a hmmpfam report. I am mostly sucessful until I have the following error: > > > Use of uninitialized value in array element at /usr/lib/perl5/site_perl/5.8.0/Bio/SearchIO/hmmer.pm line 488, line 56473. > > ------------- EXCEPTION ------------- > MSG: Somehow the Model table order does not match the order in the domains (got [no, expected AFP) > STACK Bio::SearchIO::hmmer::next_result /usr/lib/perl5/site_perl/5.8.0/Bio/SearchIO/hmmer.pm:489 > STACK toplevel ./parseHMM.pl:8 > > -------------------------------------- > It seems to only occur when my scores for sequence family classification is "no hits above threshold" and my parsed for domains has a hit (though usually very poor score). > > How can I filter out these records with SearchIO? 
> > Thanks, > Sofia > > > Output which causes me to have this error > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Query sequence: 26SrRNA > Accession: [none] > Description: [none] > > Scores for sequence family classification (score includes all domains): > Model Description Score E-value N > -------- ----------- ----- ------- --- > [no hits above thresholds] > > Parsed for domains: > Model Domain seq-f seq-t hmm-f hmm-t score E-value > -------- ------- ----- ----- ----- ----- ----- ------- > AFP 1/1 268 279 .. 1 12 [] 0.3 89 > > Alignments of top-scoring domains: > AFP: domain 1 of 1, from 268 to 279: score 0.3, E = 89 > *->tCTgStnCteAt<-* > tCT +t Ct+A > 26SrRNA 268 TCTCGTACTGAG 279 > > // > > Query sequence: 40Sribosomal.protS11 > Accession: [none] > Description: [none] > > Scores for sequence family classification (score includes all domains): > Model Description Score E-value N > -------- ----------- ----- ------- --- > [no hits above thresholds] > > Parsed for domains: > Model Domain seq-f seq-t hmm-f hmm-t score E-value > -------- ------- ----- ----- ----- ----- ----- ------- > AFP 1/1 127 138 .. 1 12 [] 1.7 53 > > Alignments of top-scoring domains: > AFP: domain 1 of 1, from 127 to 138: score 1.7, E = 53 > *->tCTgStnCteAt<-* > + Tg+tnC + > 40Sribosom 127 AATGGTNCATTA 138 > > // > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From yaofx at xymu.net Mon Aug 4 09:29:32 2003 From: yaofx at xymu.net (yaofx) Date: Mon Aug 4 09:29:25 2003 Subject: [Bioperl-l] about error messages! Message-ID: <3F2E5FBC.3070206@xymu.net> Hello, The following script is my first code using Bioperl module. 
It works well; now I want to add some code to capture error messages and save them to a log file, such as "Could not connect to the internet", "ID does not exist", "verbose ID", and so on. But I cannot find any information about that. Could you help me?

Thanks in advance!

Fengxia

#!/usr/bin/perl -w

$idlist = $ARGV[0];

if (@ARGV != 1){
    print "USAGE: perl web_id2seq.pl \n";
    exit(1);
}

$faoutfile = $idlist."_fa.txt";
(unlink $faoutfile) if (-e $faoutfile);

open (INPUT,$idlist);

# Read one ID per line, dropping any DOS carriage return
while ($line = <INPUT>){
    chomp ($line);
    $line =~ s/\r//;
    push (@querylist,$line);
}
$query = join ",",@querylist;
close INPUT;

use Bio::SeqIO;
use Bio::DB::GenBank;
my $gb = new Bio::DB::GenBank;

my $seqout = new Bio::SeqIO(-file => ">$faoutfile", -format => 'fasta');
my $seqio = $gb->get_Stream_by_id([$query]);

while($seq = $seqio->next_seq ) {
    $seqout->write_seq($seq);
}

From m2mwh at csc.liv.ac.uk Tue Aug 5 06:43:49 2003
From: m2mwh at csc.liv.ac.uk (Michael Hughes)
Date: Tue Aug 5 06:43:31 2003
Subject: [Bioperl-l] Extracting GenBank Information
Message-ID:

Hello

I am writing code to extract mouse inbred strain information from GenBank.

Using get_Seq_by_acc, I am able to bring up the GenBank file on screen, but I can't find a way to search through this file and extract the relevant information. I have tried saving it locally to search the output file, but I can only output the fasta format, which has no strain information (I have tried using $gb = new Bio::DB::GenBank (-format => 'genbank'); but I still receive fasta).

So, my questions are:

- is there a way to search the file generated online?
- if not, how do I output a full GenBank file to a local folder?
Thanks in advance for any help Michael Hughes From marino at tofu.tamu.edu Tue Aug 5 09:14:42 2003 From: marino at tofu.tamu.edu (Leonardo Marino-Ramirez) Date: Tue Aug 5 09:19:27 2003 Subject: [Bioperl-l] Extracting GenBank Information In-Reply-To: Message-ID: Hi Michael, What you want to do is get features for the genbank objects and extract them as follows: use Bio::DB::GenBank; my $gb = new Bio::DB::GenBank(); my $seq = $gb->get_Seq_by_acc('AF308740.1'); my $desc = $seq->desc(); my $length = $seq->length(); my $id = $seq->primary_id(); print "GI: $id\tDESC: $desc\tLEN: $length bp\n"; my @features = $seq->all_SeqFeatures(); ## Your favorite tags can be collected here my @cds = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures(); foreach my $feature (@features) { my $primary_tag = $feature->primary_tag(); my $start = $feature->start(); my $end = $feature->end(); my $strand = $feature->strand(); print "$primary_tag\t$start\t$end\t$strand\n"; foreach my $each_tag ($feature->get_all_tags()) { my @tag_values = $feature->each_tag_value($each_tag); print "\t$each_tag\t@tag_values\n"; } } foreach my $feature (@cds) { my $primary_tag = $feature->primary_tag(); my $protein = $feature->seq->translate->subseq(1,10); print "\n$primary_tag\t$protein\n"; } Regards, Leonardo On Tue, 5 Aug 2003, Michael Hughes wrote: > > Hello > > I am writing a code to extract mouse inbred strain information from > GenBank. > > Using get_Seq_by_acc, I am able to bring up the GenBank file on screen but > I can't find a way to search through this file and extract the relevant > information. > I have tried saving it locally to search the output file but I can only > output the fasta format which has no strain information (I have tried using > $gb = new Bio::DB::GenBank (-format => 'genbank'); but I still receive > fasta). > > So, my questions are: > > - is there a way to search the file generated on line? > - if not, how do I output a full GenBank file to a local folder? 
> > Thanks in advance for any help > > Michael Hughes > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- ___________________________________________________ _/ _/ Leonardo Marino-Ramirez _/ _/ _/_/_/ marino@tofu.tamu.edu _/ _/_/ _/_/ _/ 11915 Renwood Lane _/_/_/_/ _/ _/_/_/ Rockville, MD 20852 _/ _/ _/ Phone: (301) 770-2388 _/ _/ _/ http://marino-johnson.org/ ___________________________________________________ From rfn at uni-bremen.de Tue Aug 5 11:30:43 2003 From: rfn at uni-bremen.de (Rolf Nimzyk) Date: Tue Aug 5 11:12:17 2003 Subject: [Bioperl-l] Extracting subsequences from genbank files Message-ID: <200308051730.43996.rfn@uni-bremen.de> Hi, I use a sligthly modyfied example script from Jason to extract subsequences out of a genbank files created from a sequence by GeneMachine, http://genome.nhgri.nih.gov/genemachine/. With some genbank files it works fine but with others not. I don't now what the problem is. I am using Perl 5.8, BioPerl 1.2.2. Thanks in advance Rolf The script #!/usr/bin/perl -w # Contributed by Jason Stajich # simple extract the CDS features from a genbank file and # write out the CDS and Peptide sequences use strict; use Bio::SeqIO; my $filename = shift || die("pass in a genbank filename on the cmd line"); my $in = new Bio::SeqIO(-file => $filename, -format => 'genbank'); my $out = new Bio::SeqIO(-file => ">$filename.gene.fas"); while( my $seq = $in->next_seq ) { my @cds = grep { $_->primary_tag eq 'gene' } $seq->get_SeqFeatures(); foreach my $feature ( @cds ) { my $featureseq = $feature->spliced_seq; my $newid = $filename . ' possible gene ' . $feature->start . ".." . $feature->end; $featureseq->display_id($newid); $out->write_seq($featureseq); } } The exception Argument "bp" isn't numeric in numeric gt (>) at /usr/local/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm line 362, line 2376. 
------------- EXCEPTION ------------- MSG: You have to have start positive and length less than the total length of sequence [4682:4770] Total bp STACK Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:362 STACK Bio::PrimarySeqI::trunc /usr/local/lib/perl5/site_perl/5.8.0/Bio/PrimarySeqI.pm:456 STACK Bio::SeqFeature::Generic::seq /usr/local/lib/perl5/site_perl/5.8.0/Bio/SeqFeature/Generic.pm:604 STACK toplevel extract_cds_exon.pl:24 Part of the genbank file LOCUS zwischen_LOC_126015_und_LOC_34292888777 bp DNA linear 01-JAN-1900 DEFINITION zwischen LOC 126015 und LOC 342928.seq. ACCESSION zwischen_LOC_126015_und_LOC_342928 VERSION KEYWORDS . SOURCE Unknown. ORGANISM Unknown. Unclassified. FEATURES Location/Qualifiers source 1..88777 /mol_type="unassigned DNA" gene complement(join(1000..1176,2043..2243)) /gene="HMMGene" /note="HMMGene; Prob1=0.426, Prob2=0.602, Prob3=0.301 bestparse:cds_3" repeat_region 1108..1390 /note="RepeatMasker: AluSx" /rpt_family="SINE/Alu" repeat_region 1402..1512 /note="RepeatMasker: FLAM_C" /rpt_family="SINE/Alu" repeat_region 1541..1653 /note="RepeatMasker: MER57B" /rpt_family="LTR/ERV1" repeat_region 1654..1953 /note="RepeatMasker: AluSp" /rpt_family="SINE/Alu" repeat_region 1954..2272 /note="RepeatMasker: MER57B" /rpt_family="LTR/ERV1" gene complement(join(2043..2243,10387..10426)) /gene="Genscan" /note="GenScan; P1=Prom, P2=0.526" repeat_unit 2281..2321 /note="Sputnik" /rpt_type=pentanucleotide repeat_unit 2330..2398 /note="Sputnik" /rpt_type=pentanucleotide repeat_region complement(2385..2695) /note="RepeatMasker: AluSx" /rpt_family="SINE/Alu" repeat_region 2697..2734 /note="RepeatMasker: MER57B" /rpt_family="LTR/ERV1" repeat_region 2735..3247 /note="RepeatMasker: MER57B-int" /rpt_family="LTR/ERV1" repeat_region complement(3248..3551) /note="RepeatMasker: AluY" /rpt_family="SINE/Alu" repeat_region 3552..3646 /note="RepeatMasker: MER57B-int" /rpt_family="LTR/ERV1" repeat_region 3649..3939 
/note="RepeatMasker: AluSx" /rpt_family="SINE/Alu" repeat_region 3941..4263 /note="RepeatMasker: AluSq" /rpt_family="SINE/Alu" repeat_region 4276..4450 /note="RepeatMasker: MER57B-int" /rpt_family="LTR/ERV1" repeat_region 4445..5174 /note="RepeatMasker: MER57B-int" /rpt_family="LTR/ERV1" exon 4682..4770 /note="MZEF; P=0.545" gene join(4758..4770,8044..8208,9490..9656,22534..22662, 42550..42582,46069..46096,52948..53003) /gene="HMMGene" /note="HMMGene; Prob1=0.242, Prob2=0.894, Prob3=0.319, Prob4=0.589, Prob5=0.406, Prob6=0.405, Prob7=0.006, Prob8=0.000 bestparse:cds_1" repeat_region 5193..5300 /note="RepeatMasker: MER57-int" /rpt_family="LTR/ERV1" repeat_region 5303..5630 /note="RepeatMasker: AluSx" /rpt_family="SINE/Alu" repeat_unit 5587..5630 /gene="HMMGene" /note="Sputnik" /rpt_type=pentanucleotide repeat_region 5636..5741 /note="RepeatMasker: U6" /rpt_family="snRNA" repeat_region complement(5979..6276) /note="RepeatMasker: AluSq" /rpt_family="SINE/Alu" repeat_region 6290..6438 /note="RepeatMasker: MER57-int" /rpt_family="LTR/ERV1" repeat_region 6421..6876 /note="RepeatMasker: MER57-int" /rpt_family="LTR/ERV1" repeat_region 7097..7586 /note="RepeatMasker: MER57-int" /rpt_family="LTR/ERV1" repeat_region 7623..7791 /note="RepeatMasker: AluSg/x" /rpt_family="SINE/Alu" repeat_region complement(8267..8896) /note="RepeatMasker: AluSx" /rpt_family="SINE/Alu" repeat_region complement(8948..8998) /note="RepeatMasker: FLAM" /rpt_family="SINE/Alu" repeat_region complement(8999..9323) /note="RepeatMasker: AluSx" /rpt_family="SINE/Alu" gene complement(join(9914..10210,19268..19390,40859..40909)) /gene="HMMGene" /note="HMMGene; Prob1=0.245, Prob2=0.500, Prob3=0.397, Prob4=0.007 bestparse:cds_2" repeat_region 10551..10860 /note="RepeatMasker: AluSx" /rpt_family="SINE/Alu" From shin at cbs.umn.edu Tue Aug 5 17:17:52 2003 From: shin at cbs.umn.edu (Shin Enomoto) Date: Tue Aug 5 17:18:13 2003 Subject: [Bioperl-l] load_gff.pl question Message-ID: 
<45F418C6-C78A-11D7-879A-0003935652B4@biosci.cbs.umn.edu> I am getting erratic results with the load_gff.pl. 1) I was trying to load a table of 900 lines. Of the 900 entries, ~30 are highly unique and they all loaded normally. The remainder was ~600, ~250 and 4 of very similar items only differing slightly. It loaded 6/600, 5/250 and 2/4. 2) I have 10 large tables of ~250000 lines each. I was able to load the first table. load_gff.pl will not load any other tables. Where do I start to customize this script to allow loading of large number of similar entities? Shin Enomoto 295 ASLVM 1988 Fitch Ave. St. Paul, MN 55108 612-625-7737 From Yan.Fantei at unice.fr Wed Aug 6 04:30:29 2003 From: Yan.Fantei at unice.fr (=?ISO-8859-1?Q?Fante=EF?= Caujolle Yan) Date: Wed Aug 6 04:30:31 2003 Subject: [Bioperl-l] XML -> Flat Query-Anchored & Identities Message-ID: <1060158653.2819.23.camel@bioch-81.unice.fr> Hello all, How to create a "Flat Query-Anchored with Identities" report from xml blast result (option -m7 of Blast) ? Cheers, Yan Yan Fante? Caujolle INSERM U470 Centre de Biochimie PArc Valrose Nice 06108 France From simon.andrews at bbsrc.ac.uk Wed Aug 6 05:16:18 2003 From: simon.andrews at bbsrc.ac.uk (simon andrews (BI)) Date: Wed Aug 6 05:20:19 2003 Subject: [Bioperl-l] XML -> Flat Query-Anchored & Identities Message-ID: <2DC41140A89ED411989D00508BDCD9ED01E28AFE@bi-exsrv1.iapc.bbsrc.ac.uk> > -----Original Message----- > From: Fante? Caujolle Yan [mailto:Yan.Fantei@unice.fr] > Sent: 06 August 2003 09:31 > To: bioperl-l@bioperl.org > Subject: [Bioperl-l] XML -> Flat Query-Anchored & Identities > > > Hello all, > > How to create a "Flat Query-Anchored with Identities" report from xml > blast result (option -m7 of Blast) ? Not exactly sure of the output format you want, but you probably want to use the Bio::SearchIO module to parse your blast report. 
#!/usr/bin/perl -w
use strict;
use Bio::SearchIO;

my $search = Bio::SearchIO -> new (-format => 'blastxml',
                                   -file => 'your_blast_file.xml');

# Assuming you only have one report in your file.
# You can put this in a loop for a multiple report file
my $report = $search -> next_result();

while (my $hit = $report -> next_hit()){
  print "Found a hit to ", $hit->name;
  print " which is ", $hit->description() , "\n";

  # Again you can put this bit in a loop if you want
  # details for individual hsps
  my $first_hsp = $hit -> next_hsp();
  print "Total length of hit was ", $first_hsp->length('total'),"bp \n";
  # frac_identical() returns a fraction, so scale it to a percentage
  print "Total identity of hit was ", 100 * $first_hsp->frac_identical('total'),"% \n";
}

From Laurence.Amilhat at clermont.inra.fr Wed Aug 6 03:09:56 2003
From: Laurence.Amilhat at clermont.inra.fr (Laurence Amilhat)
Date: Wed Aug 6 08:28:43 2003
Subject: [Bioperl-l] Bio::Graphics
Message-ID: <5.1.1.6.0.20030806090354.00b28208@valmont>

Hi,

I try to learn how to use the module Bio::Graphics.
I found he How To from Lincoln Stein on the web. I try to practice with the examples, it's working except for the labels of the features that don't appear on my figure.
Does anybody ever use this module?
This is the example:

#!/usr/local/public/bin/perl

use strict;
use lib '/homej/bioinf/lamilhat/PERL_MODULE/lib/perl5/site_perl/5.005/BIOPERL/lib/site_perl/5.6.1/';
use Bio::Graphics;
use Bio::SeqFeature::Generic;

my $panel= Bio::Graphics::Panel->new(-length =>1000,-width =>800);
my $track=$panel->add_track(-glyph =>'generic',-label =>1);

while (<>)
{
    chomp;
    next if /^\#/;
    my ($name,$score,$start,$end)=split /\t+/;
    print STDERR "$name\n";
    my $feature= Bio::SeqFeature::Generic->new(-display_name=>$name,-score=>$score,-start=>$start,-end=>$end);
    $track->add_feature($feature);
}

print $panel->png;

And this is the Data to parse with the example:

#hit score start end
truc1 381 2 200
truc2 210 2 210
truc3 800 2 200
truc4 1000 380 921
truc5 812 402 972
truc6 1200 400 970
bum 400 300 620
pres1 127 310 700

Thanks,

Laurence.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
INRA, UMR INRA/UBP Amélioration et Santé des Plantes
234 avenue du Brézet
63039 Clermont-Ferrand Cedex 2

Tel 04 73 62 48 37
Fax 04 73 62 44 53
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From laurichj at bioinfo.ucr.edu Wed Aug 6 13:11:37 2003
From: laurichj at bioinfo.ucr.edu (Josh Lauricha)
Date: Wed Aug 6 13:11:17 2003
Subject: [Bioperl-l] BioPipe & LFS
Message-ID: <20030806171137.GA32555@bioinfo.ucr.edu>

Sorry for the quite off topic post,

I've been looking into difference batch systems, however I've only seen LFS mentioned by the BioPipe docs. Can anyone point me to the main LFS page? Since I don't know what it stands for, any search I do ends up with references to Linux From Scratch...

Thanks for your help.
-- ---------------------------- | Josh Lauricha | | laurichj@bioinfo.ucr.edu | | Bioinformatics, UCR | |--------------------------| From lailf at bii.a-star.edu.sg Wed Aug 6 13:25:10 2003 From: lailf at bii.a-star.edu.sg (LAI Loong Fong) Date: Wed Aug 6 13:21:14 2003 Subject: [Bioperl-l] BioPipe & LFS In-Reply-To: <20030806171137.GA32555@bioinfo.ucr.edu> Message-ID: Do you mean LSF, if it is then you can find more info at http://www.platform.com Regards LAI Loong Fong On 7/8/03 1:11 AM, "Josh Lauricha" wrote: > Sorry for the quite off topic post, > > I've been looking into difference batch systems, however I've only seen > LFS mentioned by the BioPipe docs. Can anyone point me to the main LFS > page? Since I don't know what it stands for, any search I do ends up > with references to Linux From Scratch... > > Thanks for your help. From laurichj at bioinfo.ucr.edu Wed Aug 6 13:28:54 2003 From: laurichj at bioinfo.ucr.edu (Josh Lauricha) Date: Wed Aug 6 13:28:33 2003 Subject: [Bioperl-l] BioPipe & LFS In-Reply-To: References: <20030806171137.GA32555@bioinfo.ucr.edu> Message-ID: <20030806172854.GB32555@bioinfo.ucr.edu> On Thu 01:25, LAI Loong Fong wrote: > Do you mean LSF, if it is then you can find more info at > http://www.platform.com > Well, that would explain it. I'd like to blame it on a typo... but its really just my own incompatence. Thanks -- ---------------------------- | Josh Lauricha | | laurichj@bioinfo.ucr.edu | | Bioinformatics, UCR | |--------------------------| From brian_osborne at cognia.com Wed Aug 6 13:27:24 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Wed Aug 6 13:30:44 2003 Subject: [Bioperl-l] BioPipe & LFS In-Reply-To: <20030806171137.GA32555@bioinfo.ucr.edu> Message-ID: Josh, Are you sure it's not LSF? "Load Sharing Facility"? Brian O. 
-----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Josh Lauricha Sent: Wednesday, August 06, 2003 1:12 PM To: bioperl-l@bioperl.org Subject: [Bioperl-l] BioPipe & LFS Sorry for the quite off topic post, I've been looking into difference batch systems, however I've only seen LFS mentioned by the BioPipe docs. Can anyone point me to the main LFS page? Since I don't know what it stands for, any search I do ends up with references to Linux From Scratch... Thanks for your help. -- ---------------------------- | Josh Lauricha | | laurichj@bioinfo.ucr.edu | | Bioinformatics, UCR | |--------------------------| _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From cain at cshl.org Wed Aug 6 15:21:17 2003 From: cain at cshl.org (Scott Cain) Date: Wed Aug 6 15:21:19 2003 Subject: [Bioperl-l] Re: load_gff.pl question In-Reply-To: <200308061731.h76HVH4U005979@localhost.localdomain> References: <200308061731.h76HVH4U005979@localhost.localdomain> Message-ID: <1060197700.1431.11.camel@localhost.localdomain> Shin, The problem you are running into is not really with load_gff.pl, but with the database schema. Assuming you are using MySQL, the table create statement for fdata looks like this: create table fdata ( fid int not null auto_increment, fref varchar(100) not null, fstart int unsigned not null, fstop int unsigned not null, fbin double(20,6) not null, ftypeid int not null, fscore float, fstrand enum('+','-'), fphase enum('0','1','2'), gid int not null, ftarget_start int unsigned, ftarget_stop int unsigned, primary key(fid), unique index(fref,fbin,fstart,fstop,ftypeid,gid), index(ftypeid), index(gid) The problem you have is with that unique index on (fref,fbin,fstart,fstop,ftypeid,gid). 
This index conflicts with your data, in that the similar lines are getting assigned the same gid (group id), since they look like the same thing. So, the quick way to fix this is to remove the 'unique' from the index declaration. That can be found in Bio/DB/GFF/Adaptor/dbi/mysql.pm. Then run load_gff.pl as usual. The longer way to fix this is look at your data and figure out why they are all getting assigned the same group id and make them sufficiently different so that they don't. Hope that helps, Scott On Wed, 2003-08-06 at 13:31, bioperl-l-request@portal.open-bio.org wrote: > Where do I start to customize this script to allow loading of large > number of similar entities? -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From cain at cshl.org Wed Aug 6 15:52:28 2003 From: cain at cshl.org (Scott Cain) Date: Wed Aug 6 15:52:30 2003 Subject: [Bioperl-l] Re: Bioperl-l Digest, Vol 4, Issue 3 In-Reply-To: <200308061731.h76HVH4U005979@localhost.localdomain> References: <200308061731.h76HVH4U005979@localhost.localdomain> Message-ID: <1060199572.1431.33.camel@localhost.localdomain> Laurence, I ran a script very similar to what you are using (the code I used is below), and I didn't have any problems, at least not if what is expected is the same as this: http://www.gmod.org/BioGraphicsTest.png. I suspect you are having a version problem. I am using bioperl-live (from CVS), but when I installed bioperl-1.2.2, it failed in the way you describe. Where is the tutorial you are reading? Perhaps it is not in sync with the most recent version of released bioperl. 
Here is the script I used:

#!/usr/local/bin/perl -w

use strict;
use Bio::Graphics;
use Bio::SeqFeature::Generic;

my $panel= Bio::Graphics::Panel->new(-length =>1000,-width =>800);
my $track=$panel->add_track(-glyph =>'generic',-label =>1);

while (<DATA>) {
    chomp;
    next if /^\#/;
    my ($name,$score,$start,$end)=split /\s+/;
    warn "$name\n";
    my $feature= Bio::SeqFeature::Generic->new(-display_name=>$name,-score=>$score,-start=>$start,-end=>$end);
    $track->add_feature($feature);
}

print $panel->png;

__DATA__
#hit score start end
truc1 381 2 200
truc2 210 2 210
truc3 800 2 200
truc4 1000 380 921
truc5 812 402 972
truc6 1200 400 970
bum 400 300 620
pres1 127 310 700

Scott

On Wed, 2003-08-06 at 13:31, bioperl-l-request@portal.open-bio.org wrote:
> I try to learn how to use the module Bio::Graphics.
> I found he How To from Lincoln Stein on the web. I try to practice with the
> examples, it's working except for the labels of the features that don't
> appear on my figure.
> Does anybody ever use this module?

--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain@cshl.org
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory

From kvddrift at earthlink.net Wed Aug 6 19:40:20 2003
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Wed Aug 6 19:42:03 2003
Subject: [Bioperl-l] GFF scripts
Message-ID: <57B466DA-C867-11D7-8307-003065A5FDCC@earthlink.net>

Hi,

What happens to the scripts in Bio-DB-GFF after installation? Do I need to set a specific flag during compiling/installation? I ask this because the gbrowse package uses this script to set up a database.

thanks,

- Koen.
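
[Editorial sketch on the load_gff.pl thread] Scott Cain's load_gff.pl reply earlier in this digest attributes the missing rows to the unique index on (fref,fbin,fstart,fstop,ftypeid,gid) in the fdata table. The effect can be illustrated in plain Perl without a database. This is only a sketch under stated assumptions: the rows below are made up, count_unique_rows is a hypothetical helper and not part of BioPerl, and the hash merely mimics the uniqueness check; it does not model how load_gff.pl or MySQL actually reports or discards the colliding rows.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Columns covered by the unique index in the fdata definition quoted by Scott.
my @key_cols = qw(fref fbin fstart fstop ftypeid gid);

# Made-up rows for illustration. The first two differ only in fscore,
# which is not part of the index, so they share one index key.
my @rows = (
    { fref => 'chr1', fbin => '1000.000000', fstart => 100, fstop => 200,
      ftypeid => 1, gid => 7, fscore => 0.9 },
    { fref => 'chr1', fbin => '1000.000000', fstart => 100, fstop => 200,
      ftypeid => 1, gid => 7, fscore => 0.8 },
    { fref => 'chr1', fbin => '1000.000000', fstart => 100, fstop => 200,
      ftypeid => 1, gid => 8, fscore => 0.8 },
);

# Hypothetical helper: count how many distinct index keys a set of rows has.
sub count_unique_rows {
    my ($rows, $cols) = @_;
    my %seen;
    $seen{ join "\t", @{$_}{@$cols} }++ for @$rows;
    return scalar keys %seen;
}

printf "%d rows offered, %d distinct index keys\n",
    scalar @rows, count_unique_rows(\@rows, \@key_cols);
```

Under these assumptions only the third row gets its own key, because its gid differs. That matches the second fix Scott suggests: make similar features end up in different groups rather than dropping the unique constraint.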
From wes.barris at csiro.au Wed Aug 6 20:23:17 2003 From: wes.barris at csiro.au (Wes Barris) Date: Wed Aug 6 20:23:17 2003 Subject: [Bioperl-l] Bio::Graphics In-Reply-To: <5.1.1.6.0.20030806090354.00b28208@valmont> References: <5.1.1.6.0.20030806090354.00b28208@valmont> Message-ID: <3F319BF5.6050301@csiro.au> Laurence Amilhat wrote: > Hi, > > I try to learn how to use the module Bio::Graphics. > I found he How To from Lincoln Stein on the web. I try to practice with > the examples, it's working except for the labels of the features that > don't appear on my figure. > Does anybody ever use this module? Hi Laurence, I have not been able to get this function working either. I am experiencing the exact same problem as you. I have tried this on two separate computers where bioperl was installed by two different people. Both are Redhat systems. One is running bioperl-1.2.1, the other is running bioperl-1.2.2. In both cases the labels are silently omitted. The example that I was trying is the first one on this page: http://bioperl.org/HOWTOs/html/Graphics-HOWTO.html It is virtually the same as your example with one small difference. You code includes this line: my $feature = Bio::SeqFeature::Generic->new(-display_name=>$name,-score=>$score, The example from the URL above uses a slight variation: my $feature = Bio::SeqFeature::Generic->new(-seq_id=>$name,-score=>$score, I have tried both versions. Neither of them produced a label in the resulting png file. I don't know what the significance of the two different attributes is. I can read the words of the documentation, but it didn't explain the difference of the two or if they were somehow related. Obviously, there is a bug somewhere in bioperl or in the provided examples (or in the provided installation instructions). Either that, or we have both made the exact same error in doing something. I just don't know what. 
I have posted a question about this exact same anomaly to this mailing list but have not received any suggestions yet. Perhaps this module is not too heavily used. > > This is the example: > #!/usr/local/public/bin/perl > > use strict; > use lib > '/homej/bioinf/lamilhat/PERL_MODULE/lib/perl5/site_perl/5.005/BIOPERL/li > b/site_perl/5.6.1/'; > use Bio::Graphics; > use Bio::SeqFeature::Generic; > > my $panel= Bio::Graphics::Panel->new(-length =>1000,-width =>800); > my $track=$panel->add_track(-glyph =>'generic',-label =>1); > > > while (<>) > { > chomp; > next if /^\#/; > my ($name,$score,$start,$end)=split /\t+/; > print STDERR "$name\n"; > my $feature= > Bio::SeqFeature::Generic->new(-display_name=>$name,-score=>$score,-start > =>$start,-end=>$end); > $track->add_feature($feature); > } > > print $panel->png; > > > And this is the Data to parse with the example: > #hit score start end > truc1 381 2 200 > truc2 210 2 210 > truc3 800 2 200 > truc4 1000 380 921 > truc5 812 402 972 > truc6 1200 400 970 > bum 400 300 620 > pres1 127 310 700 > > > Thanks, > > Laurence. > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > INRA, UMR INRA/UBP Am?lioration et Sant? 
des Plantes > 234 avenue du Br?zet > 63039 Clermont-Ferrand Cedex 2 > > Tel 04 73 62 48 37 > Fax 04 73 62 44 53 > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Wes Barris E-Mail: Wes.Barris@csiro.au From wes.barris at csiro.au Wed Aug 6 20:35:55 2003 From: wes.barris at csiro.au (Wes Barris) Date: Wed Aug 6 20:35:48 2003 Subject: [Bioperl-l] Re: Bioperl-l Digest, Vol 4, Issue 3 In-Reply-To: <1060199572.1431.33.camel@localhost.localdomain> References: <200308061731.h76HVH4U005979@localhost.localdomain> <1060199572.1431.33.camel@localhost.localdomain> Message-ID: <3F319EEB.7090207@csiro.au> Scott Cain wrote: > Laurence, > > I ran a script very similar to what you are using (the code I used is > below), and I didn't have any problems, at least not if what is expected > is the same as this: http://www.gmod.org/BioGraphicsTest.png. > > I suspect you are having a version problem. I am using bioperl-live > (from CVS), but when I installed bioperl-1.2.2, it failed in the way you > describe. Where is the tutorial you are reading? Perhaps it is not in > sync with the most recent version of released bioperl. Hi Scott, I have experienced the same exact problem described by Laurence. I am using the first example on this page: http://bioperl.org/HOWTOs/html/Graphics-HOWTO.html I have tried this on a Redhat-7.2/Bioperl-1.2.1 system and on a Redhat-8.0/Bioperl-1.2.2 system. I both cases, the label was silently omitted from the resulting png file. Your example code uses "-display_name" where the HOW-TO example code uses "-seq_id". Neither produces a label. Do you know what the difference is? Is installing from CVS the only way to resolve this? 
> > Here is the script I used: > #!/usr/local/bin/perl -w > > use strict; > use Bio::Graphics; > use Bio::SeqFeature::Generic; > > my $panel= Bio::Graphics::Panel->new(-length =>1000,-width =>800); > my $track=$panel->add_track(-glyph =>'generic',-label =>1); > > while () > { > chomp; > next if /^\#/; > my ($name,$score,$start,$end)=split /\s+/; > warn "$name\n"; > my $feature= > Bio::SeqFeature::Generic->new(-display_name=>$name,-score=>$score,-start=>$start,-end=>$end); > $track->add_feature($feature); > } > > print $panel->png; > > __DATA__ > #hit score start end > truc1 381 2 200 > truc2 210 2 210 > truc3 800 2 200 > truc4 1000 380 921 > truc5 812 402 972 > truc6 1200 400 970 > bum 400 300 620 > pres1 127 310 700 > > Scott > > On Wed, 2003-08-06 at 13:31, bioperl-l-request@portal.open-bio.org > wrote: > >>I try to learn how to use the module Bio::Graphics. >>I found he How To from Lincoln Stein on the web. I try to practice with the >>examples, it's working except for the labels of the features that don't >>appear on my figure. >>Does anybody ever use this module? > > -- Wes Barris E-Mail: Wes.Barris@csiro.au From shin at biosci.cbs.umn.edu Wed Aug 6 23:52:19 2003 From: shin at biosci.cbs.umn.edu (Shin Enomoto) Date: Wed Aug 6 23:51:46 2003 Subject: [Bioperl-l] Re: load_gff.pl question In-Reply-To: <1060197700.1431.11.camel@localhost.localdomain> Message-ID: <8AEE3822-C88A-11D7-9A79-000A2792630A@biosci.cbs.umn.edu> Thank you. After a good night's sleep I modified the GFF table one column at a time and found that (ref, source, method, start, end, gclass, name) mattered. What does fbin come from? I have a different question? When I load the following: VI_3 nelson cdna 404988 405465 . + . EST "gi|2099810|"; Note "CpEST.323 uniZAPCpIOWAsporoLib3 Cryptosporidium parvum cDNA 5' similar to C. elegans ORF M28.5 and H. sapiens nuclear protein-NHP2-like protein., mRNA sequence" with a [EST] glyph = generic in the configuration file. 
gbrowse script fails "glyph genric new not available". My work around was either to change the word EST to something else or use another glyph. What do you think is the conflict? On Wednesday, August 6, 2003, at 02:21 PM, Scott Cain wrote: > Shin, > > The problem you are running into is not really with load_gff.pl, but > with the database schema. Assuming you are using MySQL, the table > create statement for fdata looks like this: > > create table fdata ( > fid int not null auto_increment, > fref varchar(100) not null, > fstart int unsigned not null, > fstop int unsigned not null, > fbin double(20,6) not null, > ftypeid int not null, > fscore float, > fstrand enum('+','-'), > fphase enum('0','1','2'), > gid int not null, > ftarget_start int unsigned, > ftarget_stop int unsigned, > primary key(fid), > unique index(fref,fbin,fstart,fstop,ftypeid,gid), > index(ftypeid), > index(gid) > > The problem you have is with that unique index on > (fref,fbin,fstart,fstop,ftypeid,gid). This index conflicts with your > data, in that the similar lines are getting assigned the same gid > (group > id), since they look like the same thing. So, the quick way to fix > this > is to remove the 'unique' from the index declaration. That can be > found > in Bio/DB/GFF/Adaptor/dbi/mysql.pm. Then run load_gff.pl as usual. The > longer way to fix this is look at your data and figure out why they are > all getting assigned the same group id and make them sufficiently > different so that they don't. > > Hope that helps, > Scott > > On Wed, 2003-08-06 at 13:31, bioperl-l-request@portal.open-bio.org > wrote: >> Where do I start to customize this script to allow loading of large >> number of similar entities? > > -- > ----------------------------------------------------------------------- > - > Scott Cain, Ph. D. > cain@cshl.org > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > > Shin Enomoto 295 ASLVM 1988 Fitch Ave St. 
Paul, MN 55108 612-625-7737 From kohl at uni-duesseldorf.de Thu Aug 7 05:34:15 2003 From: kohl at uni-duesseldorf.de (jochen) Date: Thu Aug 7 03:29:15 2003 Subject: [Bioperl-l] get sequences over pubmed Message-ID: <3F321D17.3070102@uni-duesseldorf.de> Hallo, I try to get the sequences related to pubmed entry with bioperl. Just like to use the "display : Nukleotide links" on the pubmed web site. My first try was generated Query with Bio::DB::Query::GenBanK->new(-db=> pubmed, ..) but I don't find a way to get the sequence for the refference. So I try this : my $query_string = 'Vigilant [AUTH] AND 1503-1507 AND 1991 [PDAT]'; my $query = Bio::DB::Query::GenBank->new(-db=>'pubmed', -query=>$query_string); my $count = $query->count; if ($count == 1) { my $q = $query->query(); $query = Bio::DB::Query::GenBank->new(-db=>'nucleotide', -query=>$q); } $db = new Bio::DB::GenBank(-delay=>'10s'); my $stream = $db->get_Stream_by_query($query); while (my $seq = $stream->next_seq) { print " Seq : " . $seq->display_id() . "\n"; } But I'm not sure if the same query string find the only the sequences for the refference and same query-string (like the example) don't work for the 'nucleotide' db. I also looked by *Bio::Biblio* to find the solution but I'm to blind. Thanks for Help From brian_osborne at cognia.com Thu Aug 7 08:25:50 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Thu Aug 7 08:29:07 2003 Subject: [Bioperl-l] GFF scripts In-Reply-To: <57B466DA-C867-11D7-8307-003065A5FDCC@earthlink.net> Message-ID: Koen, This is from the INSTALL file in the package: INSTALLING BIOPERL SCRIPTS Bioperl comes with a set of production-quality scripts that are kept in the scripts/ directory. You can install these scripts if you'd like, simply follow the instructions on 'make install'. The installation directory is specified by the INSTALLSCRIPT variable in the Makefile, the default location is /usr/bin. 
Installation will copy the scripts to the specified directory, change the 'PLS' suffix to 'pl', and prepend 'bp_' to all the script names if they aren't so named already. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Koen van der Drift Sent: Wednesday, August 06, 2003 7:40 PM To: bioperl-l@portal.open-bio.org Subject: [Bioperl-l] GFF scripts Hi, What happens to the scripts in Bio-DB-GFF after installation? Do I need to set a specific flag during compiling/installation? I ask this because the gbrowse package uses this script to set up a database. thanks, - Koen. _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From Lobvi.Matamoros at crchul.ulaval.ca Wed Aug 6 19:02:27 2003 From: Lobvi.Matamoros at crchul.ulaval.ca (Lobvi Matamoros) Date: Thu Aug 7 08:33:35 2003 Subject: [Bioperl-l] BioPerl Installation Message-ID: <4.2.0.58.20030806170151.00a4eaa8@drs.crchul.ulaval.ca> Hi to every body: I am trying to install BioPerl (I have already installed ActivePerl 5.8.0 version with Win2000) everything goes fine downloading a zip file from http://bio.perl.org/DIST/bioperl-1.2.2.zip. Unziping the file to a directory C:\Perl\ BioPerl 1.2.2 and starting the installation process with perl Makefile.pl. After that when I issued the commands make, make test or make install I got the following message: ..make is not recognize as an internal or external command, operable program or batch file. What is wrong? Thanks in advance for your help. 
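A small check that may help with errors like the one above. This is only a sketch using Perl's core Config and File::Spec modules: it reports which make program this particular perl build was configured to use (the `make` entry of `%Config`) and whether a program of that name can be found on the PATH. It assumes the entry is a bare program name rather than a full path, and `find_on_path` is a helper name made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;        # core module: records how this perl was built
use File::Spec;    # core module: portable path handling

# Return the first file called $prog (optionally with one of the
# extensions in @$exts appended) found in the directories in @$dirs.
sub find_on_path {
    my ($prog, $dirs, $exts) = @_;
    for my $dir (@$dirs) {
        for my $ext (@$exts) {
            my $candidate = File::Spec->catfile($dir, $prog . $ext);
            return $candidate if -f $candidate;
        }
    }
    return;    # nothing found
}

# The make program this perl was configured with, e.g. 'make' or 'nmake'.
my $make = $Config{make} || 'make';
my @dirs = File::Spec->path;    # directories from the PATH environment variable
my @exts = $^O eq 'MSWin32' ? ('', '.exe', '.bat', '.cmd') : ('');

my $found = find_on_path($make, \@dirs, \@exts);
print "This perl expects the make program '$make'\n";
print defined $found
    ? "Found it at $found\n"
    : "No '$make' found on PATH, so '$make test' and '$make install' cannot run\n";
```

If the reported program is missing, the choice is between installing it and switching to an installer that does not need it.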
From patrick.demarta at libero.it Thu Aug 7 09:11:30 2003 From: patrick.demarta at libero.it (Patrick De Marta) Date: Thu Aug 7 09:15:21 2003 Subject: [Bioperl-l] BioPerl Installation References: <4.2.0.58.20030806170151.00a4eaa8@drs.crchul.ulaval.ca> Message-ID: <00de01c35ce5$6b3fdf00$0200a8c0@server> ----- Original Message ----- From: "Lobvi Matamoros" To: Sent: Thursday, August 07, 2003 1:02 AM Subject: [Bioperl-l] BioPerl Installation > Hi to every body: > > I am trying to install BioPerl (I have already installed ActivePerl 5.8.0 > version with Win2000) everything goes fine downloading a zip file from > http://bio.perl.org/DIST/bioperl-1.2.2.zip. Unziping the file to a > directory C:\Perl\ BioPerl 1.2.2 and starting the installation process with > perl Makefile.pl. After that when I issued the commands make, make test or > make install I got the following message: > > ..make is not recognize as an internal or external command, operable > program or batch file. > > What is wrong? Sorry, don't know this in particular I'm a novice perl and bioperl student :))) but i simply installed with: ppm (at command prompt) and then > install bioperl it worked fine. :-) From brian_osborne at cognia.com Thu Aug 7 09:12:25 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Thu Aug 7 09:15:45 2003 Subject: [Bioperl-l] BioPerl Installation In-Reply-To: <4.2.0.58.20030806170151.00a4eaa8@drs.crchul.ulaval.ca> Message-ID: Lobvi, You're trying a "unix-style" installation on Windows. This can be done but the specific error arises because you probably don't have the make program installed on your machine. If you look at the INSTALL.WIN file in the bioperl package you'll see it describes an installation using ActiveState and its PPM application. Alternatively, you can install Cygwin, also free, on your computer and do the installation unix-style since Cygwin is a Unix emulator. 
I've taken this approach because I'm comfortable with Unix and have no patience with MS-DOS/Command Prompt. If you want to continue along the same course I believe you have to install nmake for Windows, then use nmake instead of make - I will be corrected if I'm wrong, I've avoided this approach. Otherwise you'll need to decide between ActiveState/PPM or Cygwin. Or double-boot, and so on. If you choose Cygwin then de-install ActiveState, Cygwin supplies its own perl. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Lobvi Matamoros Sent: Wednesday, August 06, 2003 7:02 PM To: bioperl-l@bioperl.org Subject: [Bioperl-l] BioPerl Installation Hi to every body: I am trying to install BioPerl (I have already installed ActivePerl 5.8.0 version with Win2000) everything goes fine downloading a zip file from http://bio.perl.org/DIST/bioperl-1.2.2.zip. Unziping the file to a directory C:\Perl\ BioPerl 1.2.2 and starting the installation process with perl Makefile.pl. After that when I issued the commands make, make test or make install I got the following message: ..make is not recognize as an internal or external command, operable program or batch file. What is wrong? Thanks in advance for your help.
From brian_osborne at cognia.com Thu Aug 7 09:22:18 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Thu Aug 7 09:25:39 2003 Subject: [Bioperl-l] BioPerl Installation In-Reply-To: <00de01c35ce5$6b3fdf00$0200a8c0@server> Message-ID: Patrick, > ppm (at command prompt) > and then > > install bioperl > it worked fine. > :-) Cool! We rarely hear about the successful installations on Windows, just the complications. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Patrick De Marta Sent: Thursday, August 07, 2003 9:12 AM To: bioperl-l@bioperl.org; Lobvi Matamoros Subject: Re: [Bioperl-l] BioPerl Installation ----- Original Message ----- From: "Lobvi Matamoros" To: Sent: Thursday, August 07, 2003 1:02 AM Subject: [Bioperl-l] BioPerl Installation > Hi to every body: > > I am trying to install BioPerl (I have already installed ActivePerl 5.8.0 > version with Win2000) everything goes fine downloading a zip file from > http://bio.perl.org/DIST/bioperl-1.2.2.zip. Unziping the file to a > directory C:\Perl\ BioPerl 1.2.2 and starting the installation process with > perl Makefile.pl. After that when I issued the commands make, make test or > make install I got the following message: > > ..make is not recognize as an internal or external command, operable > program or batch file. > > What is wrong? Sorry, don't know this in particular I'm a novice perl and bioperl student :))) but i simply installed with: ppm (at command prompt) and then > install bioperl it worked fine.
:-) _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l
From cain at cshl.org Thu Aug 7 11:52:40 2003 From: cain at cshl.org (Scott Cain) Date: Thu Aug 7 11:52:41 2003 Subject: [Bioperl-l] Re: BioGraphics tutorial problem In-Reply-To: <5.1.1.6.0.20030807105834.00b28478@valmont> References: <200308061731.h76HVH4U005979@localhost.localdomain> <200308061731.h76HVH4U005979@localhost.localdomain> <5.1.1.6.0.20030807105834.00b28478@valmont> Message-ID: <1060271583.1429.15.camel@localhost.localdomain> Laurence and Wes, While I am sure there is another way to fix this problem, I can't seem to figure it out this morning. Installing from CVS will fix it, and it is not particularly painful to do (at least, not on a unix-like system). You can go to http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/?cvsroot=bioperl and click the "Download tarball" link to get bioperl-live. If you don't want to install the "bleeding edge" version of bioperl, you can wait either until I shake the cold I've got, or until Lincoln gets back from vacation :-) Scott On Thu, 2003-08-07 at 05:08, Laurence Amilhat wrote: > Dear Scott, > > Thank you very much for your answer. > The http://www.gmod.org/BioGraphicsTest.png is what am I expecting. > I installed the BioPerl-1.2.2 version.
I used the tutorial from > Lincoln Stein at : > http://stein.cshl.org/genome_informatics/BioGraphics > > So, the only way to get my script working is to use bio-perl-live from > CVS? > I saw that I need a secure ssh (I have one), a CVS and an account to > run script on bioperl live. > but I don't know what is CVS? > > Thank you, > > Sincerely, > > Laurence > > > > > > > At 15:52 06/08/2003 -0400, you wrote: > > Laurence, > > > > I ran a script very similar to what you are using (the code I used > > is > > below), and I didn't have any problems, at least not if what is > > expected > > is the same as this: http://www.gmod.org/BioGraphicsTest.png. > > > > I suspect you are having a version problem. I am using bioperl-live > > (from CVS), but when I installed bioperl-1.2.2, it failed in the way > > you > > describe. Where is the tutorial you are reading? Perhaps it is not > > in > > sync with the most recent version of released bioperl. > > > > Here is the script I used: > > #!/usr/local/bin/perl -w > > > > use strict; > > use Bio::Graphics; > > use Bio::SeqFeature::Generic; > > > > my $panel= Bio::Graphics::Panel->new(-length =>1000,-width =>800); > > my $track=$panel->add_track(-glyph =>'generic',-label =>1); > > > > while () > > { > > chomp; > > next if /^\#/; > > my ($name,$score,$start,$end)=split /\s+/; > > warn "$name\n"; > > my $feature= > > Bio::SeqFeature::Generic->new(-display_name=>$name,-score=>$score,-start=>$start,-end=>$end); > > $track->add_feature($feature); > > } > > > > print $panel->png; > > > > __DATA__ > > #hit score start end > > truc1 381 2 200 > > truc2 210 2 210 > > truc3 800 2 200 > > truc4 1000 380 921 > > truc5 812 402 972 > > truc6 1200 400 970 > > bum 400 300 620 > > pres1 127 310 700 > > > > Scott > > > > On Wed, 2003-08-06 at 13:31, bioperl-l-request@portal.open-bio.org > > wrote: > > > I try to learn how to use the module Bio::Graphics. > > > I found he How To from Lincoln Stein on the web. 
I try to practice > > with the > > > examples, it's working except for the labels of the features that > > don't > > > appear on my figure. > > > Does anybody ever use this module? > > > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. > > cain@cshl.org > > GMOD Coordinator (http://www.gmod.org/) > > 216-392-3087 > > Cold Spring Harbor Laboratory > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > INRA, UMR INRA/UBP Am?lioration et Sant? des Plantes > 234 avenue du Br?zet > 63039 Clermont-Ferrand Cedex 2 > > Tel 04 73 62 48 37 > Fax 04 73 62 44 53 > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From cain at cshl.org Thu Aug 7 12:06:58 2003 From: cain at cshl.org (Scott Cain) Date: Thu Aug 7 12:06:59 2003 Subject: [Bioperl-l] Re: load_gff.pl question In-Reply-To: <8AEE3822-C88A-11D7-9A79-000A2792630A@biosci.cbs.umn.edu> References: <8AEE3822-C88A-11D7-9A79-000A2792630A@biosci.cbs.umn.edu> Message-ID: <1060272443.1430.31.camel@localhost.localdomain> On Wed, 2003-08-06 at 23:52, Shin Enomoto wrote: > Thank you. > After a good night's sleep I modified the GFF table one column at a > time and found that (ref, source, method, start, end, gclass, name) > mattered. What does fbin come from? fbin is a value calculated at the time of the load to make searching ranges faster, it is derived from the start and end values, so any feature with the same start and end will have the same fbin value. > > I have a different question? > When I load the following: > VI_3 nelson cdna 404988 405465 . + . EST "gi|2099810|"; Note "CpEST.323 > uniZAPCpIOWAsporoLib3 Cryptosporidium parvum cDNA 5' similar to C. > elegans ORF M28.5 and H. 
sapiens nuclear protein-NHP2-like protein., > mRNA sequence" > > with a > [EST] > glyph = generic > > in the configuration file. > gbrowse script fails "glyph genric new not available". > > My work around was either to change the word EST to something else or > use another glyph. What do you think is the conflict? The '[EST]' part is just the name of the block, and so it shouldn't cause anything related to glyph drawing to fail. Could you send the whole EST configuration block? (I assume the 'genric' is a typo on your part, right?) Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From shin at cbs.umn.edu Thu Aug 7 12:28:34 2003 From: shin at cbs.umn.edu (Shin Enomoto) Date: Thu Aug 7 12:29:46 2003 Subject: [Bioperl-l] Re: load_gff.pl question In-Reply-To: <1060272443.1430.31.camel@localhost.localdomain> Message-ID: <30710004-C8F4-11D7-84A4-0003935652B4@biosci.cbs.umn.edu> It was a typo in the configuration file also that was causing the script to abort. On Thursday, August 7, 2003, at 11:07 AM, Scott Cain wrote: > On Wed, 2003-08-06 at 23:52, Shin Enomoto wrote: >> Thank you. >> After a good night's sleep I modified the GFF table one column at a >> time and found that (ref, source, method, start, end, gclass, name) >> mattered. What does fbin come from? > > fbin is a value calculated at the time of the load to make searching > ranges faster, it is derived from the start and end values, so any > feature with the same start and end will have the same fbin value. >> >> I have a different question? >> When I load the following: >> VI_3 nelson cdna 404988 405465 . + . EST "gi|2099810|"; Note >> "CpEST.323 >> uniZAPCpIOWAsporoLib3 Cryptosporidium parvum cDNA 5' similar to C. >> elegans ORF M28.5 and H. 
sapiens nuclear protein-NHP2-like protein., >> mRNA sequence" >> >> with a >> [EST] >> glyph = generic >> >> in the configuration file. >> gbrowse script fails "glyph genric new not available". >> >> My work around was either to change the word EST to something else or >> use another glyph. What do you think is the conflict? > > The '[EST]' part is just the name of the block, and so it shouldn't > cause anything related to glyph drawing to fail. Could you send the > whole EST configuration block? (I assume the 'genric' is a typo on > your > part, right?) > > Thanks, > Scott > > -- > ----------------------------------------------------------------------- > - > Scott Cain, Ph. D. > cain@cshl.org > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > > Shin Enomoto 295 ASLVM 1988 Fitch Ave. St. Paul, MN 55108 612-625-7737 From kvddrift at earthlink.net Thu Aug 7 17:39:54 2003 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu Aug 7 17:41:40 2003 Subject: [Bioperl-l] GFF scripts In-Reply-To: Message-ID: On Thursday, August 7, 2003, at 08:25 AM, Brian Osborne wrote: > Koen, > > This is from the INSTALL file in the package: > > INSTALLING BIOPERL SCRIPTS Brian, It's not in the version of Bioperl that I have (1.2.2). Is there a newer version available? thanks, - Koen. From kvddrift at earthlink.net Thu Aug 7 21:35:00 2003 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu Aug 7 21:36:47 2003 Subject: [Bioperl-l] GFF scripts In-Reply-To: Message-ID: <86D0276C-C940-11D7-B479-003065A5FDCC@earthlink.net> On Thursday, August 7, 2003, at 05:39 PM, Koen van der Drift wrote: > It's not in the version of Bioperl that I have (1.2.2). Is there a > newer version available? > > To answer my own question, I just downloaded the cvs version, and indeed now I can install the scripts. It must have been added after the latest release. 
I run into some other error now during the make test phase:

t/DBFasta....................ok 4/12AnyDBM_File doesn't define an EXISTS method at Bio/DB/Fasta.pm line 574
t/DBFasta....................dubious
        Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 5-12
        Failed 8/12 tests, 33.33% okay

I found another mention of this in the mailing list archive, and it looks like this was something that was fixed a while ago (if I understand it correctly). I am using perl 5.8.0 on Mac OS X 10.2.6

thanks,

- Koen.

From yaofx at xymu.net Thu Aug 7 08:46:34 2003
From: yaofx at xymu.net (yaofx)
Date: Thu Aug 7 22:13:46 2003
Subject: [Bioperl-l] BioPerl Installation
In-Reply-To: <4.2.0.58.20030806170151.00a4eaa8@drs.crchul.ulaval.ca>
References: <4.2.0.58.20030806170151.00a4eaa8@drs.crchul.ulaval.ca>
Message-ID: <3F324A2A.9080607@xymu.net>

Hi,

You should use "nmake" command, not "make".

Fengxia

Lobvi Matamoros wrote:

> Hi to every body:
>
> I am trying to install BioPerl (I have already installed ActivePerl
> 5.8.0 version with Win2000) everything goes fine downloading a zip
> file from http://bio.perl.org/DIST/bioperl-1.2.2.zip. Unziping the
> file to a directory C:\Perl\ BioPerl 1.2.2 and starting the
> installation process with perl Makefile.pl. After that when I issued
> the commands make, make test or make install I got the following message:
>
> ..make is not recognize as an internal or external command, operable
> program or batch file.
>
> What is wrong?
>
>
> Thanks in advance for your help.
> > > >------------------------------------------------------------------------ > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > From brian_osborne at cognia.com Fri Aug 8 07:28:05 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Aug 8 07:31:19 2003 Subject: [Bioperl-l] GFF scripts In-Reply-To: Message-ID: Koen, I moved this latest version of the file to the Web site but I forgot to mention it to you. My apologies. http://bioperl.org/Core/Latest/INSTALL Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Koen van der Drift Sent: Thursday, August 07, 2003 5:40 PM To: Brian Osborne Cc: bioperl-l@portal.open-bio.org Subject: Re: [Bioperl-l] GFF scripts On Thursday, August 7, 2003, at 08:25 AM, Brian Osborne wrote: > Koen, > > This is from the INSTALL file in the package: > > INSTALLING BIOPERL SCRIPTS Brian, It's not in the version of Bioperl that I have (1.2.2). Is there a newer version available? thanks, - Koen. _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From cain at cshl.org Fri Aug 8 11:01:11 2003 From: cain at cshl.org (Scott Cain) Date: Fri Aug 8 11:01:13 2003 Subject: [Bioperl-l] bioperl meeting at GMOD meeting in Sept Message-ID: <1060354898.1430.35.camel@localhost.localdomain> Hello, I know a few weeks ago there was a brief discussion about holding a bioperl meeting within the GMOD meeting in Berkeley on September 15-16. I am working on an agenda and would like to figure out if I need to set aside time for a bioperl working group. It would be one of a few groups running concurrently, either half day or whole day, depending on what is needed. If you want to attend, please let me know so a can gage interest. 
Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From John.Gray at stjude.org Fri Aug 8 14:32:48 2003 From: John.Gray at stjude.org (Gray, John) Date: Fri Aug 8 14:32:26 2003 Subject: [Bioperl-l] Will subseq give complement? Message-ID: <1E0CC447E59C974CA5C7160D2A2854ECC519FB@SJMEMXMB04.stjude.sjcrh.local> I am trying to write a script which pull only intron sequences from genomic contigs, and I am having trouble making subseq give me the complement of the sequence in the file. First I import the file into the bioperl object $seq using SeqIO, and then I process the features. When the feature is an 'mRNA', with a 'SplitLocation', then I put all the exon stop and start positions into @start and @end arrays. I then pull the intron sequence with the following command: $sequence = $seq->subseq($end[$j]+1,$start[$j+1]-1, ' ', $strand); This whole process works great, except that I always only get the top strand, regardless of the orientation of the feature. I have confirmed that in fact the $strand variable properly reflects the strand of the feature. Am I using this method properly? Can anyone help me? John T. Gray, Ph.D. Director, Vector Development & Production Experimental Hematology Division Hematology-Oncology St. Jude Children's Research Hospital ? (901) 495-4729 phone (901) 495-2176 fax John.Gray@stjude.org From John.Gray at stjude.org Fri Aug 8 14:38:17 2003 From: John.Gray at stjude.org (Gray, John) Date: Fri Aug 8 14:37:54 2003 Subject: [Bioperl-l] Will subseq give complement? Message-ID: <1E0CC447E59C974CA5C7160D2A2854ECC51F52@SJMEMXMB04.stjude.sjcrh.local> PS. I am using version 1.2, I think. John T. Gray, Ph.D. Director, Vector Development & Production Experimental Hematology Division Hematology-Oncology St. Jude Children's Research Hospital ? 
(901) 495-4729 phone
(901) 495-2176 fax
John.Gray@stjude.org

From jason at cgt.duhs.duke.edu Fri Aug 8 15:18:04 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Fri Aug 8 15:02:13 2003
Subject: [Bioperl-l] Will subseq give complement?
In-Reply-To: <1E0CC447E59C974CA5C7160D2A2854ECC519FB@SJMEMXMB04.stjude.sjcrh.local>
References: <1E0CC447E59C974CA5C7160D2A2854ECC519FB@SJMEMXMB04.stjude.sjcrh.local>
Message-ID:

Where does the API doc say you can pass in 4 arguments to subseq?

For what you want to do you need to either pass in a Bio::Location which is the subseq you want, like this which will respect the strand

use Bio::Location::Simple;

my $intronloc = Bio::Location::Simple->new(-start  => $end[$j]+1,
                                           -end    => $start[$j+1]-1,
                                           -strand => $strand);
# get the string back
my $subseq = $seq->subseq($intronloc);

OR you can do it all explicitly.

my $subseq;
# get a sequence object back, not a string
my $subobj = $seq->trunc($end[$j]+1,$start[$j+1]-1);
if( $strand < 0 ) {
    # reverse-complement the object, then get its string
    $subseq = $subobj->revcom->seq;
} else {
    $subseq = $subobj->seq();
}

FYI An undocumented (and experimental) part of the API is that you can pass in 3 arguments - start,end,replace and replace sequence in the underlying sequence just like you can with substr in perl.

-jason

On Fri, 8 Aug 2003, Gray, John wrote:

> I am trying to write a script which pull only intron sequences from
> genomic contigs, and I am having trouble making subseq give me the
> complement of the sequence in the file. First I import the file into
> the bioperl object $seq using SeqIO, and then I process the features.
> When the feature is an 'mRNA', with a 'SplitLocation', then I put all
> the exon stop and start positions into @start and @end arrays. I then
> pull the intron sequence with the following command:
>
> $sequence = $seq->subseq($end[$j]+1,$start[$j+1]-1,
> ' ', $strand);
>
> This whole process works great, except that I always only get the top strand, regardless of the orientation of the feature.
I have confirmed that in fact the $strand variable properly reflects the strand of the feature. > > Am I using this method properly? Can anyone help me? > > > > John T. Gray, Ph.D. > Director, Vector Development & Production > Experimental Hematology Division > Hematology-Oncology > St. Jude Children's Research Hospital > ? > (901) 495-4729 phone > (901) 495-2176 fax > John.Gray@stjude.org > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From cain at cshl.org Fri Aug 8 15:03:28 2003 From: cain at cshl.org (Scott Cain) Date: Fri Aug 8 15:03:29 2003 Subject: [Bioperl-l] Re: BioGraphics tutorial problem In-Reply-To: <5.1.1.6.0.20030807105834.00b28478@valmont> References: <200308061731.h76HVH4U005979@localhost.localdomain> <200308061731.h76HVH4U005979@localhost.localdomain> <5.1.1.6.0.20030807105834.00b28478@valmont> Message-ID: <1060369435.1435.9.camel@localhost.localdomain> Laurence and Wes, The cold was of the 24-hour variety so my head has cleared, and I have a solution for the BioGraphics tutorial problem. While it will not work exactly as written in Lincoln's tutorial with BioPerl 1.2.2, it will work with BioPerl 1.3 (when it is released). I've created a perl script that generates similar output via callbacks that works with BioPerl 1.2.2. I added a few bells and whistles--my script also puts a description (a secondary label) and colored the glyphs according to score. 
Here's the script:

#!/usr/local/bin/perl -w

use strict;
use Bio::Graphics;
use Bio::SeqFeature::Generic;

my $panel= Bio::Graphics::Panel->new(-length =>1000,-width =>800);
my $track=$panel->add_track(-glyph =>'generic',
                            -label => sub{my $self=shift;
                                          return $self->seq_id;},
                            -description=> sub{my $self=shift;
                                               return $self->score;},
                            -bgcolor => sub{my $self=shift;
                                            my $score = $self->score;
                                            if ($score >= 1000) {
                                                return 'red';
                                            } else {
                                                return 'green';
                                            }
                                           });

# read the hits from the __DATA__ section below
while (<DATA>)
{
    chomp;
    next if /^\#/;
    my ($name,$score,$start,$end)=split /\s+/;
    my $feature=Bio::SeqFeature::Generic->new(-seq_id=>$name,
                                              -score=>$score,
                                              -start=>$start,
                                              -end=>$end);
    $track->add_feature($feature);
}

print $panel->png;

__DATA__
#hit score start end
truc1 381 2 200
truc2 210 2 210
truc3 800 2 200
truc4 1000 380 921
truc5 812 402 972
truc6 1200 400 970
bum 400 300 620
pres1 127 310 700
-------------------------------------------------

Scott

On Thu, 2003-08-07 at 05:08, Laurence Amilhat wrote:
> Dear Scott,
>
> Thank you very much for your answer.
> The http://www.gmod.org/BioGraphicsTest.png is what I am expecting.
> I installed the BioPerl-1.2.2 version. I used the tutorial from
> Lincoln Stein at :
> http://stein.cshl.org/genome_informatics/BioGraphics
>
> So, the only way to get my script working is to use bio-perl-live from
> CVS?
> I saw that I need a secure ssh (I have one), a CVS and an account to
> run script on bioperl live.
> but I don't know what is CVS?
>
> Thank you,
>
> Sincerely,
>
> Laurence
>
>
> At 15:52 06/08/2003 -0400, you wrote:
> > Laurence,
> >
> > I ran a script very similar to what you are using (the code I used is
> > below), and I didn't have any problems, at least not if what is expected
> > is the same as this: http://www.gmod.org/BioGraphicsTest.png.
> >
> > I suspect you are having a version problem. I am using bioperl-live
> > (from CVS), but when I installed bioperl-1.2.2, it failed in the way you
> > describe. Where is the tutorial you are reading? Perhaps it is not in
> > sync with the most recent version of released bioperl.
> >
> > Here is the script I used:
> > #!/usr/local/bin/perl -w
> >
> > use strict;
> > use Bio::Graphics;
> > use Bio::SeqFeature::Generic;
> >
> > my $panel= Bio::Graphics::Panel->new(-length =>1000,-width =>800);
> > my $track=$panel->add_track(-glyph =>'generic',-label =>1);
> >
> > while (<DATA>)
> > {
> >     chomp;
> >     next if /^\#/;
> >     my ($name,$score,$start,$end)=split /\s+/;
> >     warn "$name\n";
> >     my $feature=
> >       Bio::SeqFeature::Generic->new(-display_name=>$name,-score=>$score,-start=>$start,-end=>$end);
> >     $track->add_feature($feature);
> > }
> >
> > print $panel->png;
> >
> > __DATA__
> > #hit score start end
> > truc1 381 2 200
> > truc2 210 2 210
> > truc3 800 2 200
> > truc4 1000 380 921
> > truc5 812 402 972
> > truc6 1200 400 970
> > bum 400 300 620
> > pres1 127 310 700
> >
> > Scott
> >
> > On Wed, 2003-08-06 at 13:31, bioperl-l-request@portal.open-bio.org wrote:
> > > I try to learn how to use the module Bio::Graphics.
> > > I found the How To from Lincoln Stein on the web. I try to practice with the
> > > examples, it's working except for the labels of the features that don't
> > > appear on my figure.
> > > Does anybody ever use this module?
> >
> > --
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D.
> > cain@cshl.org
> > GMOD Coordinator (http://www.gmod.org/)
> > 216-392-3087
> > Cold Spring Harbor Laboratory
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> INRA, UMR INRA/UBP Amélioration et Santé des Plantes
> 234 avenue du Brézet
> 63039 Clermont-Ferrand Cedex 2
>
> Tel 04 73 62 48 37
> Fax 04 73 62 44 53
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

--
------------------------------------------------------------------------
Scott Cain, Ph. D.
cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From hlapp at gnf.org Fri Aug 8 16:05:58 2003 From: hlapp at gnf.org (Hilmar Lapp) Date: Fri Aug 8 16:05:33 2003 Subject: [Bioperl-l] bioperl meeting at GMOD meeting in Sept Message-ID: <833E32F61B9F8746878F2A1865BECE604308B7@EXCHCLUSTER01.lj.gnf.org> I do plan to attend, but unfortunately I can't arrive already on the preceding weekend as ChrisM suggested. I could, however, extend the stay by 1 or 2 days. -hilmar > -----Original Message----- > From: Scott Cain [mailto:cain@cshl.org] > Sent: Friday, August 08, 2003 8:02 AM > To: bioperl-l@portal.open-bio.org > Subject: [Bioperl-l] bioperl meeting at GMOD meeting in Sept > > > Hello, > > I know a few weeks ago there was a brief discussion about > holding a bioperl meeting within the GMOD meeting in Berkeley > on September 15-16. > I am working on an agenda and would like to figure out if I > need to set aside time for a bioperl working group. It would > be one of a few groups running concurrently, either half day > or whole day, depending on what is needed. If you want to > attend, please let me know so a can gage interest. > > Thanks, > Scott > > -- > -------------------------------------------------------------- > ---------- > Scott Cain, Ph. D. > cain@cshl.org > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-> bio.org/mailman/listinfo/bioperl-l > From cjm at fruitfly.org Fri Aug 8 17:17:49 2003 From: cjm at fruitfly.org (Chris Mungall) Date: Fri Aug 8 17:20:21 2003 Subject: [Bioperl-l] bioperl meeting at GMOD meeting in Sept In-Reply-To: <833E32F61B9F8746878F2A1865BECE604308B7@EXCHCLUSTER01.lj.gnf.org> Message-ID: Unfortunately we can't use the same meeting space as the GMOD meeting. 
We could still have a small informal meeting (not many people have expressed interest) somewhere afterwards, but not a mini hackathon, too short notice i'm afraid. There's also a chance I may get stuck in the UK at the beginning of september so I'm wary of committing to anything... On Fri, 8 Aug 2003, Hilmar Lapp wrote: > I do plan to attend, but unfortunately I can't arrive already on the > preceding weekend as ChrisM suggested. I could, however, extend the stay > by 1 or 2 days. > > -hilmar > > > -----Original Message----- > > From: Scott Cain [mailto:cain@cshl.org] > > Sent: Friday, August 08, 2003 8:02 AM > > To: bioperl-l@portal.open-bio.org > > Subject: [Bioperl-l] bioperl meeting at GMOD meeting in Sept > > > > > > Hello, > > > > I know a few weeks ago there was a brief discussion about > > holding a bioperl meeting within the GMOD meeting in Berkeley > > on September 15-16. > > I am working on an agenda and would like to figure out if I > > need to set aside time for a bioperl working group. It would > > be one of a few groups running concurrently, either half day > > or whole day, depending on what is needed. If you want to > > attend, please let me know so a can gage interest. > > > > Thanks, > > Scott > > > > -- > > -------------------------------------------------------------- > > ---------- > > Scott Cain, Ph. D. 
> > cain@cshl.org
> > GMOD Coordinator (http://www.gmod.org/)
> > 216-392-3087
> > Cold Spring Harbor Laboratory
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From cjfields at uiuc.edu Fri Aug 8 17:54:35 2003
From: cjfields at uiuc.edu (Christopher Fields)
Date: Fri Aug 8 17:54:36 2003
Subject: [Bioperl-l] Re: BioGraphics tutorial problem
In-Reply-To: <1060369435.1435.9.camel@localhost.localdomain>
References: <200308061731.h76HVH4U005979@localhost.localdomain> <200308061731.h76HVH4U005979@localhost.localdomain> <5.1.1.6.0.20030807105834.00b28478@valmont> <1060369435.1435.9.camel@localhost.localdomain>
Message-ID: <1060379693.4728.5.camel@chrisfields.life.uiuc.edu>

Scott,

I have tested the Bio::Graphics module as well, using RedHat Linux 9 and
Bioperl 1.2.2. Everything worked well, until.... I installed Bioperl on
my wife's IBook under MacOS X 10.2 (BTW, everything went well). Trying
the same tutorial gave the result that Laurence described (no labels).
Is it possible that a font is missing from some distributions that
Bio::Graphics requires?

BTW (and a little off topic), I can test out Bioperl distributions on
Windows XP as well as the above two systems. I am operating a dual-boot
system; all three operating systems have Perl 5.8.0 installed (WinXP has
ActiveState Perl).

Chris Fields
Postdoctoral Researcher - Dept. of Biochemistry
University of Illinois at Urbana-Champaign

From kvddrift at earthlink.net Fri Aug 8 19:25:50 2003
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Fri Aug 8 19:27:31 2003
Subject: [Bioperl-l] GFF scripts
In-Reply-To:
Message-ID:

On Friday, August 8, 2003, at 07:28 AM, Brian Osborne wrote:

> Koen,
>
> I moved this latest version of the file to the Web site but I forgot to
> mention it to you. My apologies.
>
> http://bioperl.org/Core/Latest/INSTALL

Thanks, Brian. Unfortunately, the script itself is not yet in
Makefile.PL (neither in 1.2.2 or http://bioperl.org/Core/Latest/). It is
in cvs, though, so I installed bioperl from bioperl-live. I still get
the error in fasta.pm during the test phase.

- Koen.
From simabba at tin.it Sat Aug 9 08:51:41 2003
From: simabba at tin.it (simabba@tin.it)
Date: Sat Aug 9 08:51:21 2003
Subject: [Bioperl-l] help needed, please
Message-ID: <3F2E098F0000439A@ims4c.cp.tin.it>

Hello!
I'm learning the Perl language, so I have a lot of problems; please be
patient!
Some months ago I wrote a simple Perl script for submitting sequences to
blast. This script also uses Bio::DB::GenPept to retrieve Sequence objects.
This part of the code is:

  $gb = new Bio::DB::GenPept;
  $seq = $gb->get_Seq_by_id($hn);

where $hn is the ID of the best match given by blast.

The next step is to retrieve the species to which the sequence belongs and
its classification as Metazoa or Fungi or something else, so I wrote:

  $species = Bio::Species->new(-classification => [@classification]);
  $species=$seq->species() if (ref $seq);
  @classification=$species->classification() if (ref $seq);
  $species->classification(@classification);
  $bi = $species->binomial();
  $cl = join ' ', @classification;
  $cl=$1 if($cl =~ /(Metazoa|Viridiplantae|Fungi|Bacteria|Archaea|Eukaryota|Virus)/);
  @info_spe = ($bi, $cl);
  return "@info_spe";
  .....

The problem is that this script worked until one month ago, but now for
every analyzed sequence the script always returns the wrong species but
the correct classification of the species. I mean, if the putative
function of my sequence according to blast is "hypothetical protein
[Neurospora crassa]", the script returns for example "Mus musculus" as
species (clearly a mistake!) and "Metazoa" as classification (this is OK).

I tried to type the ID of the sequence in the code, in this way:

  $seq = $gb->get_Seq_by_id('ref|XP_330538.1|');

but the result is the same: the wrong species. The strange thing is that
the script worked correctly until one month ago. Has something changed in
GenPept???

Thank you for the help
Best regards
Simona

From kvddrift at earthlink.net Sun Aug 10 15:05:32 2003
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sun Aug 10 15:07:09 2003
Subject: installing scripts (was:Re: [Bioperl-l] GFF scripts)
Message-ID: <9D8E1E30-CB65-11D7-9CAA-003065A5FDCC@earthlink.net>

Hi,

I have been looking a bit more into installing the scripts with the
release version (1.2.2), not the cvs version. I copied the relevant
sections of the Makefile.PL of the cvs package to the 1.2.2 version. The
script indeed starts to execute get_scripts_to_install, but somehow it
doesn't get past the line

  rmtree ($dest_dir) if -e $dest_dir;

Currently I copied the following subroutines: get_scripts_to_install,
prompt_to_install and install_contents. I also added the line

  EXE_FILES => \@scripts_to_install,

to WriteMakefile. Plus of course the part to execute this:

  # Let the code begin...
  require 5.005;
  use ExtUtils::MakeMaker ;
  my @scripts_to_install = eval {get_scripts_to_install()};

My knowledge of perl is limited, so I am not sure how to proceed from
here. Is there a Makefile.PL that installs the scripts and works with
1.2.2? Or maybe I forgot to copy some essential lines from the newer
Makefile.PL?

Any help appreciated,

thanks,

- Koen.

From kvddrift at earthlink.net Sun Aug 10 17:33:05 2003
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sun Aug 10 17:34:44 2003
Subject: installing scripts (was:Re: [Bioperl-l] GFF scripts)
In-Reply-To: <9D8E1E30-CB65-11D7-9CAA-003065A5FDCC@earthlink.net>
Message-ID: <3A37AEAE-CB7A-11D7-9CAA-003065A5FDCC@earthlink.net>

> My knowledge of perl is limited, so I am not sure how to proceed from
> here. Is there a Makefile.PL that installs the scripts and works with
> 1.2.2? Or maybe I forgot to copy some essential lines from the newer
> Makefile.PL?
OK, I am a little bit further :) I solved the rmtree problem by adding
the following lines:

  use File::Path 'rmtree';
  use IO::File;
  use Config;

I did the other two 'just in case'. Now scripts get installed, but only
a few: bp_flanks.pl, bp_mrtrans.pl, bp_sreformat.pl and
bp_taxi4dspecies.pl.

Careful examination of the code reveals that only the scripts that have
a .PLS extension will be installed, which are exactly the 4 scripts
that get installed. All other scripts already have the .pl extension,
and thus will not be installed. So I modified the script as follows:

sub install_contents {
    my $dir = shift;
    my $dest = shift;
    my $bangline = $Config{startperl};

    my @files;
    opendir (D,$dir) or die "Can't open $dir: $!\n";
    while (my $script = readdir(D)) {
        # next unless $script =~ /\.PLS$/;
        my $in = IO::File->new("$dir/$script") or die "Can't open $dir/$script: $!";

        if ($script =~ /\.PLS$/) {
            $script =~ s/\.PLS$/\.pl/; # change from .PLS to .pl
        }
        next unless $script =~ /\.pl$/;

        ... etc

Which indeed seems to do the trick.

Would this change give any other problems that I might overlook?

thanks again,

- Koen.

From laurichj at bioinfo.ucr.edu Sun Aug 10 18:41:59 2003
From: laurichj at bioinfo.ucr.edu (Josh Lauricha)
Date: Sun Aug 10 18:41:29 2003
Subject: [Bioperl-l] Bio::Tools::Run options
Message-ID: <20030810224159.GA26892@bioinfo.ucr.edu>

I remember someone asking about this before, but I can't find that post.
Anyhow, is there any particular reason that the Bio::Tools::Run modules
have been written not to use named parameters? I just ask because using
named parameters seems to be the prevailing style for the rest of
bioperl, and I'm writing some modules that may end up being released.
--
----------------------------
| Josh Lauricha            |
| laurichj@bioinfo.ucr.edu |
| Bioinformatics, UCR      |
|--------------------------|

From billk at iinet.net.au Sun Aug 10 18:56:55 2003
From: billk at iinet.net.au (William Kenworthy)
Date: Sun Aug 10 18:56:56 2003
Subject: [Bioperl-l] Bio::DB::GenBank and proxy
Message-ID: <1060555356.29868.721.camel@rattus.Localdomain>

Hi, is there a way to get Bio::DB::GenBank to honour a local proxy from
the environment (i.e., HTTP_PROXY or similar), or does it have to be
explicitly specified via
"$db->proxy(['http','ftp'], 'http://proxy:8081' );"

BillK

From quickster333 at hotmail.com Sun Aug 10 19:29:31 2003
From: quickster333 at hotmail.com (Johnny Amos)
Date: Sun Aug 10 19:29:02 2003
Subject: [Bioperl-l] ClusterIO Parsing of dbSNP: Possible bug
Message-ID:

Hello,

I seem to have run into a bug in the ClusterIO parsing of dbSNP. The
functional_class() hash-element does not appear to be filled. I suspect
this occurs because the corresponding XML field
(NSE-FxnSet_fxn-class-contig) is enumerated. That is, it has the form:

From my review of Bio::ClusterIO::dbsnp.pm it appears that enumerated
tags are not handled correctly. The following script should return the
functional class for SNPs:

### BEGIN SCRIPT
use strict;
use Bio::ClusterIO;

# input file: a dbSNP XML file, given on the command line
my $infile = shift @ARGV or die "Usage: $0 <dbSNP XML file>\n";

my $parser = Bio::ClusterIO->new(
    -file   => $infile,
    -format => 'dbSNP'
);

while (my $record = $parser->next_cluster()) {
    if (my $class = $record->functional_class) {
        $class =~ s/^\s+//;
        $class =~ s/\s+$//;
        if ($class) { print "$class\n"; }
    }
}
### END SCRIPT

I have tested this on several chromosomes, with the same results in each
case. Can anyone confirm this behaviour, or see a problem with my code?
Johnny

From jason at cgt.duhs.duke.edu Sun Aug 10 19:47:44 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Sun Aug 10 19:31:36 2003
Subject: [Bioperl-l] Bio::Tools::Run options
In-Reply-To: <20030810224159.GA26892@bioinfo.ucr.edu>
References: <20030810224159.GA26892@bioinfo.ucr.edu>
Message-ID:

Named parameters are supported; they just don't have a leading '-' for
arguments which are destined for the underlying application. Those with
a leading '-' are for the module. I don't particularly like this too much
but it is the convention people seem to have been using.

-jason

On Sun, 10 Aug 2003, Josh Lauricha wrote:

> I remember someone asking about this before, but I can't find that post.
> Anyhow, is there any particular reason that the Bio::Tools::Run modules
> have been written not to use named parameters? I just ask because using
> named parameters seems to be the prevailing style for the rest of
> bioperl, and I'm writing some modules that may end up being released.
>
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From jason at cgt.duhs.duke.edu Sun Aug 10 19:50:53 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Sun Aug 10 19:34:14 2003
Subject: installing scripts (was:Re: [Bioperl-l] GFF scripts)
In-Reply-To: <3A37AEAE-CB7A-11D7-9CAA-003065A5FDCC@earthlink.net>
References: <3A37AEAE-CB7A-11D7-9CAA-003065A5FDCC@earthlink.net>
Message-ID:

On Sun, 10 Aug 2003, Koen van der Drift wrote:

>
> > My knowledge of perl is limited, so I am not sure how to proceed from
> > here. Is there a Makefile.PL that installs the scripts and works with
> > 1.2.2? Or maybe I forgot to copy some essential lines from the newer
> > Makefile.PL?
>
> OK, I am a little bit further :) I solved the rmtree problem by adding
> the following lines:
>
> use File::Path 'rmtree';
> use IO::File;
> use Config;

What version of File::Path is on your system? Perhaps the problem is the
tests were run on systems with newer File::Path which previously exported
rmtree.

>
> I did the other two 'just in case'. Now scripts get installed, but only
> a few: bp_flanks.pl, bp_mrtrans.pl, bp_sreformat.pl and
> bp_taxi4dspecies.pl.
>
> Careful examination of the code reveals that only the scripts that have
> a .PLS extension will be installed, which are exactly the 4 scripts
> that get installed. All other scripts already have the .pl extension,
> and thus will not be installed. So I modified the script as follows:
>

The intention is to ONLY install scripts which end in .PLS. This whole
process also fixes the perl path in the header of the script as well.

The problem is that this install code was ported from the CVS HEAD to the
branch and I don't think all the underlying scripts were renamed.

I think that all of these problems go away when you work off the main
trunk.

> sub install_contents {
>     my $dir = shift;
>     my $dest = shift;
>     my $bangline = $Config{startperl};
>
>     my @files;
>     opendir (D,$dir) or die "Can't open $dir: $!\n";
>     while (my $script = readdir(D)) {
>         # next unless $script =~ /\.PLS$/;
>         my $in = IO::File->new("$dir/$script") or die "Can't open $dir/$script: $!";
>
>         if ($script =~ /\.PLS$/) {
>             $script =~ s/\.PLS$/\.pl/; # change from .PLS to .pl
>         }
>         next unless $script =~ /\.pl$/;
>
>         ... etc
>
> Which indeed seems to do the trick.
>
> Would this change give any other problems that I might overlook?
>
>
> thanks again,
>
> - Koen.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From kvddrift at earthlink.net Sun Aug 10 20:43:35 2003
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sun Aug 10 20:45:13 2003
Subject: installing scripts (was:Re: [Bioperl-l] GFF scripts)
In-Reply-To:
Message-ID:

On Sunday, August 10, 2003, at 07:50 PM, Jason Stajich wrote:

>
> What version of File::Path is on your system? Perhaps the problem is the
> tests were run on systems with newer File::Path which previously exported
> rmtree.

I recently installed 5.8.0. The Makefile.PL I got from the trunk also has
the additional line to include rmtree, so I am not sure if the version
of File::Path is an issue.

>> Careful examination of the code reveals that only the scripts that have
>> a .PLS extension will be installed, which are exactly the 4 scripts
>> that get installed. All other scripts already have the .pl extension,
>> and thus will not be installed. So I modified the script as follows:
>>
>
> The intention is to ONLY install scripts which end in .PLS. This whole
> process also fixes the perl path in the header of the script as well.

I see - I didn't realize that. The perl path fix btw is not affected by
my change.

>
> The problem is that this install code was ported from the CVS HEAD to the
> branch and I don't think all the underlying scripts were renamed.
>
> I think that all of these problems go away when you work off the main
> trunk.

I am trying to make a package for fink to install bioperl on Mac OS X,
so I would rather not use the cvs trunk, but only 'real releases'. This
is to make sure that everyone who wants to use fink to install bioperl
uses the same code.

thanks for the info,

- Koen.
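
For readers following the .PLS discussion in this thread, the install step
can be sketched roughly as below. This is only an illustration under stated
assumptions, not the code from the bioperl Makefile.PL: the helper name
install_pls_scripts, the destination directory name and the two sample file
names are made up for the example. Only core modules are used (File::Path
with rmtree and mkpath imported explicitly, File::Spec, File::Temp, and
Config for $Config{startperl}). It follows the behaviour described above:
only files ending in .PLS are installed, they are renamed to .pl, and the
perl path on the first line is replaced with the local one.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;                          # $Config{startperl} is the local "#!" line
use File::Path qw(rmtree mkpath);    # name the imported functions explicitly
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: copy every *.PLS file in $src to $dest as *.pl,
# replacing an existing "#!" first line with this perl's own.
sub install_pls_scripts {
    my ($src, $dest) = @_;
    rmtree($dest) if -e $dest;       # start from a clean destination
    mkpath($dest);

    my @installed;
    opendir(my $dh, $src) or die "Can't open $src: $!\n";
    for my $script (sort readdir $dh) {
        next unless $script =~ /\.PLS$/;          # only .PLS files qualify
        (my $target = $script) =~ s/\.PLS$/.pl/;  # flanks.PLS -> flanks.pl

        my $in_path  = File::Spec->catfile($src,  $script);
        my $out_path = File::Spec->catfile($dest, $target);
        open(my $in,  '<', $in_path)  or die "Can't open $in_path: $!\n";
        open(my $out, '>', $out_path) or die "Can't write $out_path: $!\n";

        my $first = <$in>;
        if (defined $first) {
            print {$out} "$Config{startperl}\n";          # local perl path
            print {$out} $first unless $first =~ /^#!/;   # keep a non-shebang line
        }
        while (my $line = <$in>) {
            print {$out} $line;
        }
        close $in;
        close $out or die "Can't close $out_path: $!\n";
        chmod 0755, $out_path;
        push @installed, $target;
    }
    closedir $dh;
    return @installed;
}

# Small demonstration in throw-away directories.
my $src  = tempdir(CLEANUP => 1);
my $dest = File::Spec->catdir(tempdir(CLEANUP => 1), 'scripts_temp');
for my $name ('flanks.PLS', 'already.pl') {
    open(my $fh, '>', File::Spec->catfile($src, $name)) or die "Can't write $name: $!\n";
    print {$fh} "#!/usr/local/bin/perl -w\nprint \"hello\\n\";\n";
    close $fh;
}
my @installed = install_pls_scripts($src, $dest);
print "installed: @installed\n";
```

In this demonstration only flanks.PLS is picked up; already.pl is skipped,
which matches the behaviour that only a few scripts were installed from the
1.2.2 tarball. Installing files that already end in .pl would need the kind
of change to the filter that was proposed earlier in the thread.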
From jason at cgt.duhs.duke.edu Mon Aug 11 09:32:24 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Mon Aug 11 09:15:39 2003 Subject: installing scripts (was:Re: [Bioperl-l] GFF scripts) In-Reply-To: References: Message-ID: On Sun, 10 Aug 2003, Koen van der Drift wrote: > > On Sunday, August 10, 2003, at 07:50 PM, Jason Stajich wrote: > > > > > what verions of File::Path is on your system? Perhaps the problem is > > the > > tests were run on systems with newer File::Path which previously > > exported > > rmtree. > > I recently install 5.8.0. The Makefile.PL I got from the trunk also has > the additional line to include rmtree, so I am not sure if the version > of File::Path is an issue. > well good that you caught it - more appropriate to explictly list the exported function names as you've done so the Makfile.PL should be updated with that for sure. > >> Careful examination of the code reveals that only the scripts that > >> have > >> an .PLS extension will be installed, which are exactly the 4 scripts > >> that get installed. All other scripts already have the .pl extension, > >> and thus will not be installed. So I modified the script as follows: > >> > > > > The intention is to ONLY install scripts which end in .PLS. This whole > > process also fixes the perl path in the header of the script as well. > > I see - I didn't realize that. The perl path fix btw is not effected by > my change. > > > > > The problem is that this install code was ported from the CVS HEAD to > > the > > branch and I don't think all the underlying scripts were renamed. > > > > I think that all of these problems go away when you work off the main > > trunk. > > I am trying to make a package for fink to install bioperl on Mac OS X, > so I rather not use the cvs trunk, but only 'real releases'. Thsi to > make sure that everyone who wants to use fink to install bioperl uses > the same code. 
> understand - this is more appropriate - you're welcome to have those patches be for the 1.2.2 release but keeping them in the fink pkg. If we do another release on the 1.2 branch (1.2.3) we can try and get this right. Glad you're doing this. > thanks for the info, > > > - Koen. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From birney at ebi.ac.uk Mon Aug 11 09:36:16 2003 From: birney at ebi.ac.uk (Ewan Birney) Date: Mon Aug 11 09:36:05 2003 Subject: installing scripts (was:Re: [Bioperl-l] GFF scripts) In-Reply-To: Message-ID: Is there a feeling that we should do a 1.2.3 release or is it more like starting push towards 1.3 (developer...) releases towards a 1.4. I don't have any strong views, though marginally more time to run after bugs now... ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 . ----------------------------------------------------------------- From jason at cgt.duhs.duke.edu Mon Aug 11 10:02:46 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Mon Aug 11 09:45:58 2003 Subject: installing scripts (was:Re: [Bioperl-l] GFF scripts) In-Reply-To: References: Message-ID: I would like to go for 1.3 developer release soonish - changes from this spring in SearchIO which Steve put into place merging psiblast and blast parsing into the single module. Lots of new modules on the main trunk and I would like to see them get more testing through a dev release. I certainly have some loose ends to tie up in the HEAD but it is shaping up to have some nice new features. Dev releases should be much less painful than a full stable release so I don't see why we can't shoot for one by say the end of August? 
There are only a few little changes at this point on the 1.2 branch - but if we cleaned up any other annoying things people have now that 1.2.2 has been out to play it could be worth the effort. -jason On Mon, 11 Aug 2003, Ewan Birney wrote: > > > Is there a feeling that we should do a 1.2.3 release or is it more like > starting push towards 1.3 (developer...) releases towards a 1.4. > > > I don't have any strong views, though marginally more time to run after > bugs now... > > > > ----------------------------------------------------------------- > Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 > . > ----------------------------------------------------------------- > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From birney at ebi.ac.uk Mon Aug 11 09:50:02 2003 From: birney at ebi.ac.uk (Ewan Birney) Date: Mon Aug 11 09:49:50 2003 Subject: installing scripts (was:Re: [Bioperl-l] GFF scripts) In-Reply-To: Message-ID: On Mon, 11 Aug 2003, Jason Stajich wrote: > I would like to go for 1.3 developer release soonish - changes from this > spring in SearchIO which Steve put into place merging psiblast and blast > parsing into the single module. Lots of new modules on the main trunk and > I would like to see them get more testing through a dev release. > I certainly have some loose ends to tie up in the HEAD but it is shaping > up to have some nice new features. > > Dev releases should be much less painful than a full stable release so I > don't see why we can't shoot for one by say the end of August? > Fine by me, but I guess Heikki has the last word here, as he did say he was going to be doing the 1.3/4 series (lucky lucky man...). I wonder if he is on holiday? 
From kvddrift at earthlink.net Mon Aug 11 18:54:28 2003
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Mon Aug 11 18:56:07 2003
Subject: installing scripts (was:Re: [Bioperl-l] GFF scripts)
In-Reply-To:
Message-ID:

On Monday, August 11, 2003, at 09:32 AM, Jason Stajich wrote:

>>
> understand - this is more appropriate - you're welcome to have those
> patches be for the 1.2.2 release but keeping them in the fink pkg.  If
> we do another release on the 1.2 branch (1.2.3) we can try and get this
> right.  Glad you're doing this.
>
>

Glad I can help :)

There was an issue raised (by a well-known perl writer) on the
fink-mailinglist about using the .pl extension for the scripts. Here's an
excerpt:

If they are meant as commands typed by the user, PLEASE PLEASE do not put
".pl" on the end of the script!  This is not the Unix Way.

".pl" means "Perl Library".  It's meant to be used with "do" or "require"
within another script, not as something typed by a user.  Perl *programs*
did not have that on the end until Windows Perl came along, needing the
extension to know that it's really a Perl program.  Stupid Windows.

...

If the scripts are end-user scripts, put them in /sw/bin.  If they aren't,
put them in /sw/lib/$PACKAGENAME, and be sure to invoke them in a way that
/sw works if replaced by something else.

If I am not mistaken, these are end-user scripts, so I probably will
remove the .pl extension for the fink package, because Mac OS X is
UNIX, and put a warning in the package description. I post it also
here to see what the bioperl folks think of this - maybe there is
another solution I overlooked?

thanks,

- Koen.

From ymc at paxil.stanford.edu Mon Aug 11 19:21:12 2003
From: ymc at paxil.stanford.edu (Yee Man Chan)
Date: Mon Aug 11 19:20:44 2003
Subject: [Bioperl-l] Re: Bio::FPC
Message-ID:

Hi Jamie and all,

My boss asked me to parse an FPC file. So I searched the bioperl mailing
list archive for "FPC". I found that there was a discussion back in Nov
2002 about it.
I am wondering whether this FPC parse is done or not. If it is workable now, can anyone tell me where I can download it? Otherwise, can someone point me to a spec of an FPC parser? Thanks a lot. Yee Man From allenday at ucla.edu Mon Aug 11 21:22:28 2003 From: allenday at ucla.edu (Allen Day) Date: Mon Aug 11 21:22:06 2003 Subject: [Bioperl-l] ClusterIO Parsing of dbSNP: Possible bug In-Reply-To: Message-ID: Hi Johnny, Yes, there were a few bugs in the parser, it's properly handling the NSE-FxnSet tagset now. You'll need to use the latest CVS version to get the fix. -Allen On Sun, 10 Aug 2003, Johnny Amos wrote: > Hello, > > I seem to have run into a bug in the ClusterIO parsing of dbSNP. The > functional_class() hash-element does not appear to be filled. I suspect > this occurs because the corresponding XML field > (NSE-FxnSet_fxn-class-contig) is enumerated. That is, it has the form: > > > >From my review of Bio::ClusterIO::dbsnp.pm it appears that enumerated tags > are not handled correctly. The following script should return the > functional class for SNPs: > > ### BEGIN SCRIPT > use strict; > use Bio::ClusterIO; > > my $parser = Bio::ClusterIO->new( > -file => $infile, > -format => 'dbSNP' > ); > > while (my $record = $parser->next_cluster()) { > > if (my $class = $record->functional_class) { > $class =~ s/^\s+//; > $class =~ s/\s+$//; > > if ($class) { print "$class\n"; } > > } > > } > ### END SCRIPT > > I have tested this on several chromosomes, with the same results in each > case. Can anyone confirm this behaviour, or see a problem with my code? 
> > Johnny > > _________________________________________________________________ > Help STOP SPAM with the new MSN 8 and get 2 months FREE* > http://join.msn.com/?page=features/junkmail > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From wes.barris at csiro.au Mon Aug 11 23:11:24 2003 From: wes.barris at csiro.au (Wes Barris) Date: Mon Aug 11 23:11:03 2003 Subject: [Bioperl-l] Parsing html blast output? Message-ID: <3F385ADC.8070306@csiro.au> Hi, I know it is possible to use the SearchIO functions to parse either text blast output or xml blast output. However, I would like to know if it is possible to parse html blast output? For example, if I wanted to parse the output of this command: blastcl3 -d nr -p blastn -T -i fasta.txt -o blast.html When I try parsing the above "blast.html" file using example number 4 from this file: http://bioperl.org/HOWTOs/html/Graphics-HOWTO.html I get errors. What I ended up doing is writing a perl "de-htmlizer" that I use to convert an html blast output file into a text-only blast output file. Then I run the result through a bioperl blast parsing script. Is there a more elegant way to do this? -- Wes Barris E-Mail: Wes.Barris@csiro.au From eamiska at earthlink.net Mon Aug 11 23:32:52 2003 From: eamiska at earthlink.net (Eric Miska) Date: Mon Aug 11 23:32:19 2003 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast: RID not found Message-ID: Dear All this is very likely a trivial question: I am using the Bio::Tools::Run::RemoteBlast module to run blast and every other request bounces back with the following error: -------------------- WARNING --------------------- MSG:

ERROR: Results for RID 1060657821-20728-1846636 not found

my code is a sad copy of the bptutorial (see below):

I would really appreciate some help,

Eric

....

$database = 'nr';
@params = ('-prog' => 'blastn', '-data' => $database);
$Bio::Tools::Run::RemoteBlast::HEADER{'WORD SIZE'} = '7';
$Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'} = 'OFF';
$Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '1000';
$remote_blast_object = Bio::Tools::Run::RemoteBlast->new(@params);
$r = $remote_blast_object->submit_blast($seq1);
while ( my @rids = $remote_blast_object->each_rid ) {
    foreach my $rid ( @rids ) {
        sleep 20;
        $rc = $remote_blast_object->retrieve_blast($rid);
        if( !ref($rc) ) {    # $rc not a reference => either error
                             # or job not yet finished
            if( $rc < 0 ) {
                $remote_blast_object->remove_rid($rid);
                print"Error return code for BlastID code $rid ... \n";
            }
            sleep 5;
        } else {
            $remote_blast_object->remove_rid($rid);
            ...

From ediths at botinst.unizh.ch Tue Aug 12 03:25:05 2003
From: ediths at botinst.unizh.ch (Edith Schlagenhauf)
Date: Tue Aug 12 03:24:34 2003
Subject: [Bioperl-l] Parsing html blast output?
In-Reply-To: <3F385ADC.8070306@csiro.au>
References: <3F385ADC.8070306@csiro.au>
Message-ID:

Hi,

I usually let blastcl3 produce the raw text blast output for parsing and
after that use the Bioperl htmlizer. I don't know if this is more elegant
;-) but I guess it is less error-prone.

Edith

On Tue, 12 Aug 2003, Wes Barris wrote:

> Hi,
>
> I know it is possible to use the SearchIO functions to parse either
> text blast output or xml blast output.  However, I would like to know
> if it is possible to parse html blast output?  For example, if I wanted
> to parse the output of this command:
>
> blastcl3 -d nr -p blastn -T -i fasta.txt -o blast.html
>
> When I try parsing the above "blast.html" file using example number 4
> from this file:
>
> http://bioperl.org/HOWTOs/html/Graphics-HOWTO.html
>
> I get errors.
> > What I ended up doing is writing a perl "de-htmlizer" that I use to > convert an html blast output file into a text-only blast output file. > Then I run the result through a bioperl blast parsing script. Is > there a more elegant way to do this? > > -- > Wes Barris > E-Mail: Wes.Barris@csiro.au > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > ****************************************** Dr Edith Schlagenhauf Bioinformatics Institute of Plant Biology University of Zurich Zollikerstrasse 107 CH-8008 Zurich SWITZERLAND e-mail: ediths AT botinst DOT unizh DOT ch Tel.: +41 1 634 82 78 Fax : +41 1 634 82 04 ****************************************** From simon.andrews at bbsrc.ac.uk Tue Aug 12 03:53:43 2003 From: simon.andrews at bbsrc.ac.uk (simon andrews (BI)) Date: Tue Aug 12 03:54:32 2003 Subject: [Bioperl-l] Bio::DB::GenBank and proxy Message-ID: <2DC41140A89ED411989D00508BDCD9ED01E28B09@bi-exsrv1.iapc.bbsrc.ac.uk> > -----Original Message----- > From: William Kenworthy [mailto:billk@iinet.net.au] > Sent: 10 August 2003 23:43 > To: BioPerl List > Subject: [Bioperl-l] Bio::DB::GenBank and proxy > > > Hi, is there a way to get Bio::DB::GenBank to honour a local > proxy from the environment (i.e., HTTP_PROXY or similar), or > does it have to be implicitly specified via > "$db->proxy(['http','ftp'], 'http://proxy:8081' );" I've just filed a bug report which turns on this behaviour in the BioPerl modules which require internet access. Take a look at: http://bugzilla.bioperl.org/show_bug.cgi?id=1482 ...two quick changes to core modules and you're good to go. I've also just noticed that there's a typo in the report. The second module should be Bio::DB::Query::WebQuery.pm Hope this helps Simon. 
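
As an illustration of the proxy question above, here is a small sketch that looks up a proxy URL in the environment so it can be handed to the explicit proxy() call quoted in the message. The helper name proxy_from_env is an assumption, not part of BioPerl, and this is not the change filed in the bug report; LWP::UserAgent can also read these variables itself through its env_proxy option.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): return the proxy URL for
# a scheme such as 'http' from the environment, or undef when none is set.
# The lower-case name (http_proxy) is checked before the upper-case one.
sub proxy_from_env {
    my ($scheme) = @_;
    for my $name (lc("${scheme}_proxy"), uc("${scheme}_proxy")) {
        my $url = $ENV{$name};
        return $url if defined $url && length $url;
    }
    return;
}

# Possible use with the explicit call quoted above (needs BioPerl, so it
# is left commented out here):
#
#   my $db = Bio::DB::GenBank->new;
#   if (defined(my $proxy = proxy_from_env('http'))) {
#       $db->proxy(['http','ftp'], $proxy);
#   }
```

The explicit lookup keeps the decision in the calling script, at the cost of repeating it in every script that talks to a remote database.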
From birney at ebi.ac.uk Tue Aug 12 04:03:45 2003 From: birney at ebi.ac.uk (Ewan Birney) Date: Tue Aug 12 04:03:25 2003 Subject: installing scripts (was:Re: [Bioperl-l] GFF scripts) In-Reply-To: Message-ID: > If I am not mistaken, these are end-user scripts, so I probably will > remove the .pl extension for the fink package, because Mac OS X is > UNIX, and put a warning in the package description . I post it also > here to see what the bioperl folks think of this - maybe there is > another solution I overlooked? > if you like, but every unix I system have worked on usually use .pl for perl scripts, (and the .pm of course for perl5 modules). I suspect the guy who making such an impassionate plea for no .pl is slightly over-egging the pudding... > > thanks, > > - Koen. > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l >
From ajm6q at virginia.edu Tue Aug 12 06:33:11 2003 From: ajm6q at virginia.edu (Aaron J Mackey) Date: Tue Aug 12 06:32:40 2003 Subject: installing scripts (was:Re: [Bioperl-l] GFF scripts) In-Reply-To: Message-ID: Similarly, many shell scripts are named "dothejob.sh" or "batchthis.csh", because they're often invoked as "/bin/sh dothejob.sh", i.e. the scripts are input files to an interpreter. Shell shebang chicanery aside, I also prefer seeing the ".pl" on an executable, as a gentle reminder that I should execute "/my/favorite/debugging/perl -d script.pl" and not "gdb script core" when things go wrong. -Aaron On Tue, 12 Aug 2003, Ewan Birney wrote: > > > If I am not mistaken, these are end-user scripts, so I probably will > > remove the .pl extension for the fink package, because Mac OS X is > > UNIX, and put a warning in the package description . I post it also > > here to see what the bioperl folks think of this - maybe there is > > another solution I overlooked? > > > > if you like, but every unix I system have worked on usually use .pl for > perl scripts, (and the .pm of course for perl5 modules). I suspect the guy > who making such an impassionate plea for no .pl is slightly over-egging > the pudding... > > > > > > > > thanks, > > > > - Koen.
> > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Aaron J Mackey Pearson Laboratory University of Virginia (434) 924-2821 amackey@virginia.edu From billk at iinet.net.au Tue Aug 12 08:02:07 2003 From: billk at iinet.net.au (William Kenworthy) Date: Tue Aug 12 08:02:08 2003 Subject: [Bioperl-l] Bio::DB::GenBank and proxy In-Reply-To: <2DC41140A89ED411989D00508BDCD9ED01E28B09@bi-exsrv1.iapc.bbsrc.ac.uk> References: <2DC41140A89ED411989D00508BDCD9ED01E28B09@bi-exsrv1.iapc.bbsrc.ac.uk> Message-ID: <1060689043.28694.40.camel@rattus.Localdomain> Unfortunately, this didn't work for me: declare -x http_proxy="http://localhost:8081" wdk@rattus tmp $ ./t.pl Attempt to bless into a reference at /usr/lib/perl5/site_perl/5.8.0/LWP/UserAgent.pm line 221. wdk@rattus tmp $ Noticed another thing: the cache (squid in my case) always reports a "miss", and fetches direct. Not sure why this is the case - would be handy as we have students with a tight network allowance that would benefit from cacheing - as it is, this looks like an assignment (cache the data you fetch ...). 
This is what ethereal shows as "on the wire" after the second try etc requests: GET http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?retmode=text&rettype=gb&db=nucleotide&tool=bioperl&id=gb%7CAL022723&usehistory=n HTTP/1.1 TE: deflate,gzip;q=0.3 Connection: TE, close Host: eutils.ncbi.nlm.nih.gov User-Agent: Bio::DB::GenBank/0.8 HTTP/1.0 200 OK Date: Tue, 12 Aug 2003 11:16:14 GMT Server: Apache Content-Type: text/plain Via: 1.1 eutils.ncbi.nih.gov X-Cache: MISS from eutils.ncbi.nih.gov X-Cache: MISS from rattus.Localdomain X-Cache-Lookup: MISS from rattus.Localdomain:8081 Proxy-Connection: close LOCUS ... Wont add this to the bugzilla for awhile in case someone wants to comment. BillK On Tue, 2003-08-12 at 15:53, simon andrews (BI) wrote: > > -----Original Message----- > > From: William Kenworthy [mailto:billk@iinet.net.au] > > Sent: 10 August 2003 23:43 > > To: BioPerl List > > Subject: [Bioperl-l] Bio::DB::GenBank and proxy > > > > > > Hi, is there a way to get Bio::DB::GenBank to honour a local > > proxy from the environment (i.e., HTTP_PROXY or similar), or > > does it have to be implicitly specified via > > "$db->proxy(['http','ftp'], 'http://proxy:8081' );" > > I've just filed a bug report which turns on this behaviour in > the BioPerl modules which require internet access. Take a > look at: > > http://bugzilla.bioperl.org/show_bug.cgi?id=1482 > > ...two quick changes to core modules and you're good to go. > > I've also just noticed that there's a typo in the report. > The second module should be Bio::DB::Query::WebQuery.pm > > Hope this helps > > Simon. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- William Kenworthy From jason at cgt.duhs.duke.edu Tue Aug 12 08:23:23 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Tue Aug 12 08:06:18 2003 Subject: [Bioperl-l] Parsing html blast output? 
In-Reply-To: <3F385ADC.8070306@csiro.au> References: <3F385ADC.8070306@csiro.au> Message-ID: No, it is not currently possible to parse BLAST HTML output. On Tue, 12 Aug 2003, Wes Barris wrote: > Hi, > > I know it is possible to use the SearchIO functions to parse either > text blast output or xml blast output. However, I would like to know > if it is possible to parse html blast output? For example, if I wanted > to parse the output of this command: > > blastcl3 -d nr -p blastn -T -i fasta.txt -o blast.html > > When I try parsing the above "blast.html" file using example number 4 > from this file: > > http://bioperl.org/HOWTOs/html/Graphics-HOWTO.html > > I get errors. > > What I ended up doing is writing a perl "de-htmlizer" that I use to > convert an html blast output file into a text-only blast output file. > Then I run the result through a bioperl blast parsing script. Is > there a more elegant way to do this? > > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From scassidy at accelrys.com Mon Aug 11 15:20:46 2003 From: scassidy at accelrys.com (Susan Cassidy) Date: Tue Aug 12 08:28:53 2003 Subject: [Bioperl-l] question about SeqIO and BSML DTD Message-ID: Hi, It looks like the current (1.2.2) bioperl bsml functions are set up to use the BSML 2.2 DTD. I would like to use the 3.1 DTD instead. Is there some easy way to have things like SeqIO create BSML output using the 3.1 DTD? I'm not sure how the internal handling of all the DTD stuff works, but I do see that the DTD URL is hard-coded in bsml.pm to be "http://www.labbook.com/dtd/bsml2_2.dtd". It would be wonderful if it were as easy as changing that! But, I won't hold my breath! I'm doing some experimentation with converting sequences from formats such as Genbank into BSML, and this makes it really simple. Please reply to this email address and not just to the list, as I don't subscribe to it. Any advice appreciated. 
Thanks, Susan Cassidy From brian_osborne at cognia.com Tue Aug 12 08:47:05 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Tue Aug 12 08:50:24 2003 Subject: installing scripts (was:Re: [Bioperl-l] GFF scripts) In-Reply-To: Message-ID: Ewan, * I would like to go for 1.3 developer release soonish Yes, please. This 1.3 would have the script installation option, cleanly separated scripts/ and examples/, and the newer, bug-free PODs (e.g. biodatabases.pod, not biodatabases.pl). Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Jason Stajich Sent: Monday, August 11, 2003 10:03 AM To: Ewan Birney Cc: Bioperl-l@portal.open-bio.org; Koen van der Drift Subject: Re: installing scripts (was:Re: [Bioperl-l] GFF scripts) I would like to go for 1.3 developer release soonish - changes from this spring in SearchIO which Steve put into place merging psiblast and blast parsing into the single module. Lots of new modules on the main trunk and I would like to see them get more testing through a dev release. I certainly have some loose ends to tie up in the HEAD but it is shaping up to have some nice new features. Dev releases should be much less painful than a full stable release so I don't see why we can't shoot for one by say the end of August? There are only a few little changes at this point on the 1.2 branch - but if we cleaned up any other annoying things people have now that 1.2.2 has been out to play it could be worth the effort. -jason On Mon, 11 Aug 2003, Ewan Birney wrote: > > > Is there a feeling that we should do a 1.2.3 release or is it more like > starting push towards 1.3 (developer...) releases towards a 1.4. > > > I don't have any strong views, though marginally more time to run after > bugs now... > > > > ----------------------------------------------------------------- > Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 > . 
> -----------------------------------------------------------------
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu
_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From ajm6q at virginia.edu Tue Aug 12 09:04:12 2003
From: ajm6q at virginia.edu (Aaron J Mackey)
Date: Tue Aug 12 09:03:41 2003
Subject: [Bioperl-l] parsing BLAST html
Message-ID:

We keep seeing this "bug" report - is there a simple way to make this
"just work" using HTML::Strip?  I.e. a Bio::SearchIO::blasthtml that just
looks like (missing various error checking, etc):

package Bio::SearchIO::blasthtml;

@ISA = qw(Bio::SearchIO::blast);

sub _initialize {
    my ($self, @args) = @_;
    $self->{_hs} = new HTML::Strip;
    return $self->SUPER::_initialize(@args);
}

sub _readline {
    my $self = shift;
    my $line = $self->SUPER::_readline(@_);
    return $self->{_hs}->parse($line);
}

From jason at cgt.duhs.duke.edu Tue Aug 12 09:41:44 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Tue Aug 12 09:24:32 2003
Subject: [Bioperl-l] Re: parsing BLAST html
In-Reply-To:
References:
Message-ID:

I've never tried it - but given Sophia's last message perhaps it will
'just work' by doing the strip first.

-jason

On Tue, 12 Aug 2003, Aaron J Mackey wrote:

>
> We keep seeing this "bug" report - is there a simple way to make this
> "just work" using HTML::Strip?  I.e.
a Bio::SearchIO::blasthtml that just > looks like (missing various error checking, etc): > > package Bio::SearchIO::blasthtml; > > @ISA = qw(Bio::SearchIO::blast); > > sub _initialize { > my ($self, @args) = @_; > $self->{_hs} = new HTML::Strip; > return $self->SUPER::_initialize(@args); > } > > sub _readline { > my $self = shift; > my $line = $self->SUPER::_readline(@_); > return $self->{_hs}->parse($line); > } > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From simon.andrews at bbsrc.ac.uk Tue Aug 12 09:24:29 2003 From: simon.andrews at bbsrc.ac.uk (simon andrews (BI)) Date: Tue Aug 12 09:25:49 2003 Subject: [Bioperl-l] Bio::DB::GenBank and proxy Message-ID: <2DC41140A89ED411989D00508BDCD9ED01E28B0F@bi-exsrv1.iapc.bbsrc.ac.uk> > -----Original Message----- > From: William Kenworthy [mailto:billk@iinet.net.au] > Sent: 12 August 2003 12:51 > To: simon andrews (BI) > Cc: BioPerl List > Subject: RE: [Bioperl-l] Bio::DB::GenBank and proxy > > > Unfortunately, this didn't work for me: > > declare -x http_proxy="http://localhost:8081" > wdk@rattus tmp $ ./t.pl > Attempt to bless into a reference at > /usr/lib/perl5/site_perl/5.8.0/LWP/UserAgent.pm line 221. > wdk@rattus tmp $ I take it this is after you'd made the changes I put in my bug report? I can't really diagnose this as I think I have a different version of UserAgent.pm to you. Line 221 in mine sets up a cookie jar, which doesn't sound right. When you changed the two bioperl modules were the lines you altered already calls to the new method of LWP::UserAgent? The line numbers reported may only be correct for the 1.2 release of bioperl. Can you run this small prog and see what you get (please unwrap the long line on the get call). 
#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;

my $ua = LWP::UserAgent -> new (env_proxy => 1,
                                keep_alive => 1,
                                timeout => 30);

my $response = $ua -> get('http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?retmode=text&rettype=gb&db=nucleotide&tool=bioperl&id=gb%7CAL022723&usehistory=n');

my $text = $response -> as_string;

print substr($text,0,1000) , "\n";
#################################################################

You should hopefully get something like this:

#####################
HTTP/1.0 200 OK
Date: Tue, 12 Aug 2003 13:16:55 GMT
Via: 1.1 eutils.ncbi.nih.gov
Server: Apache
Content-Type: text/plain
Client-Date: Tue, 12 Aug 2003 19:17:13 GMT
Client-Response-Num: 1
Proxy-Connection: close
X-Cache: MISS from eutils.ncbi.nih.gov
X-Cache: MISS from BBSRC-wwwcache-service

LOCUS HS377H14 148834 bp DNA linear PRI 14-SEP-2001
DEFINITION Human DNA sequence from clone RP3-377H14 on chromosome
            6p21.32-22.1. Contains the HLA-G gene for major [andrewss@bilin2 Test]$ ./uatest.pl
HTTP/1.0 200 OK
Date: Tue, 12 Aug 2003 13:17:13 GMT
Via: 1.1 eutils.ncbi.nih.gov
Server: Apache
Content-Type: text/plain
Client-Date: Tue, 12 Aug 2003 19:17:32 GMT
Client-Response-Num: 1
Proxy-Connection: close
X-Cache: MISS from eutils.ncbi.nih.gov
X-Cache: MISS from BBSRC-wwwcache-service

LOCUS HS377H14 148834 bp DNA linear PRI 14-SEP-2001
DEFINITION Human DNA sequence from clone RP3-377H14 on chromosome
            6p21.32-22.1. Contains the HLA-G gene for major
            histocompatibility complex class I G (HLA 6.0) an MHC class I
            pseudogene, an RPL7A (60S Ribosomal Protein L7A) pseudogene, a
            gene for a novel MHC class 1 protein, an interferon-inducible
            protein 1-8U pseudogene, an RPL23A (60S Ribosomal Protein L23A)
            pseudogene, an HCGIX pseudogene, an MICB or PERB11.1
            pseudogene,the HLA-F gene for major histocompatibility complex
            class I F (CDA12), and four P5-1 pseudogenes.
Con ##################### If this works can you go back and check the amendments you made to the two bioperl scripts as this is all they are doing. If it still fails then its not a bioperl problem as such, but we can still try to track it down. > Noticed another thing: the cache (squid in my case) always > reports a "miss", and fetches direct. That's probably right. Since you're fetching potentially dynamic content (albeit through a GET request) you may find that squid refuses to cache it and will refetch it every time. Have a look in the squid documentation and there may be some way to tell it that it should cache dynamic GET requests as well. Hope this helps Simon. From brian_osborne at cognia.com Tue Aug 12 10:22:15 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Tue Aug 12 10:25:38 2003 Subject: installing scripts (was:Re: [Bioperl-l] GFF scripts) In-Reply-To: Message-ID: Koen and Jason, My apologies, I haven't been following your discussion closely. The scripts in scripts/ in bioperl-live all have the PLS suffix. This trunk also has script installation running, and a simplified approach to POD. Since Koen has said, reasonably, that he only wants to package a formal release perhaps the wisest thing for him to do is to wait for 1.3. IMO. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Jason Stajich Sent: Monday, August 11, 2003 9:32 AM To: Koen van der Drift Cc: Bioperl-l@portal.open-bio.org Subject: Re: installing scripts (was:Re: [Bioperl-l] GFF scripts) On Sun, 10 Aug 2003, Koen van der Drift wrote: > > On Sunday, August 10, 2003, at 07:50 PM, Jason Stajich wrote: > > > > > what verions of File::Path is on your system? Perhaps the problem is > > the > > tests were run on systems with newer File::Path which previously > > exported > > rmtree. > > I recently install 5.8.0. 
The Makefile.PL I got from the trunk also has > the additional line to include rmtree, so I am not sure if the version > of File::Path is an issue. > well good that you caught it - more appropriate to explictly list the exported function names as you've done so the Makfile.PL should be updated with that for sure. > >> Careful examination of the code reveals that only the scripts that > >> have > >> an .PLS extension will be installed, which are exactly the 4 scripts > >> that get installed. All other scripts already have the .pl extension, > >> and thus will not be installed. So I modified the script as follows: > >> > > > > The intention is to ONLY install scripts which end in .PLS. This whole > > process also fixes the perl path in the header of the script as well. > > I see - I didn't realize that. The perl path fix btw is not effected by > my change. > > > > > The problem is that this install code was ported from the CVS HEAD to > > the > > branch and I don't think all the underlying scripts were renamed. > > > > I think that all of these problems go away when you work off the main > > trunk. > > I am trying to make a package for fink to install bioperl on Mac OS X, > so I rather not use the cvs trunk, but only 'real releases'. Thsi to > make sure that everyone who wants to use fink to install bioperl uses > the same code. > understand - this is more appropriate - you're welcome to have those patches be for the 1.2.2 release but keeping them in the fink pkg. If we do another release on the 1.2 branch (1.2.3) we can try and get this right. Glad you're doing this. > thanks for the info, > > > - Koen. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From billk at iinet.net.au Tue Aug 12 10:38:52 2003 From: billk at iinet.net.au (William Kenworthy) Date: Tue Aug 12 10:38:53 2003 Subject: [Bioperl-l] Bio::DB::GenBank and proxy In-Reply-To: <2DC41140A89ED411989D00508BDCD9ED01E28B0F@bi-exsrv1.iapc.bbsrc.ac.uk> References: <2DC41140A89ED411989D00508BDCD9ED01E28B0F@bi-exsrv1.iapc.bbsrc.ac.uk> Message-ID: <1060696791.28694.62.camel@rattus.Localdomain> ach! - typo when I changed the files. Also works normally if I elect not to go through the proxy ('unset http_proxy'). Thanks, also for the hint about squid, will keep looking at that. BillK On Tue, 2003-08-12 at 21:24, simon andrews (BI) wrote: > > -----Original Message----- > > From: William Kenworthy [mailto:billk@iinet.net.au] > > Sent: 12 August 2003 12:51 > > To: simon andrews (BI) > > Cc: BioPerl List > > Subject: RE: [Bioperl-l] Bio::DB::GenBank and proxy > > > > > > Unfortunately, this didn't work for me: > > > > declare -x http_proxy="http://localhost:8081" > > wdk@rattus tmp $ ./t.pl > > Attempt to bless into a reference at > > /usr/lib/perl5/site_perl/5.8.0/LWP/UserAgent.pm line 221. > > wdk@rattus tmp $ > > I take it this is after you'd made the changes I put in my bug report? I can't really diagnose this as I think I have a different version of UserAgent.pm to you. Line 221 in mine sets up a cookie jar, which doesn't sound right. > > When you changed the two bioperl modules were the lines you altered already calls to the new method of LWP::UserAgent? The line numbers reported may only be correct for the 1.2 release of bioperl. 
> > Can you run this small prog and see what you get (please unwrap the long line on the get call). > > #!/usr/bin/perl -w > use strict; > use LWP::UserAgent; > > my $ua = LWP::UserAgent -> new (env_proxy => 1, > keep_alive => 1, > timeout => 30); > > my $response = $ua -> get('http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?retmode=text&rettype=gb&db=nucleotide&tool=bioperl&id=gb%7CAL022723&usehistory=n'); > > my $text = $response -> as_string; > > print substr($text,0,1000) , "\n"; > ################################################################# > > You should hopefully get something like this: > > ##################### > HTTP/1.0 200 OK > Date: Tue, 12 Aug 2003 13:16:55 GMT > Via: 1.1 eutils.ncbi.nih.gov > Server: Apache > Content-Type: text/plain > Client-Date: Tue, 12 Aug 2003 19:17:13 GMT > Client-Response-Num: 1 > Proxy-Connection: close > X-Cache: MISS from eutils.ncbi.nih.gov > X-Cache: MISS from BBSRC-wwwcache-service > > LOCUS HS377H14 148834 bp DNA linear PRI 14-SEP-2001 > DEFINITION Human DNA sequence from clone RP3-377H14 on chromosome > 6p21.32-22.1. Contains the HLA-G gene for major [andrewss@bilin2 Test]$ ./uatest.pl > HTTP/1.0 200 OK > Date: Tue, 12 Aug 2003 13:17:13 GMT > Via: 1.1 eutils.ncbi.nih.gov > Server: Apache > Content-Type: text/plain > Client-Date: Tue, 12 Aug 2003 19:17:32 GMT > Client-Response-Num: 1 > Proxy-Connection: close > X-Cache: MISS from eutils.ncbi.nih.gov > X-Cache: MISS from BBSRC-wwwcache-service > > LOCUS HS377H14 148834 bp DNA linear PRI 14-SEP-2001 > DEFINITION Human DNA sequence from clone RP3-377H14 on chromosome > 6p21.32-22.1. 
Contains the HLA-G gene for major histocompatibility > complex class I G (HLA 6.0) an MHC class I pseudogene, an RPL7A > (60S Ribosomal Protein L7A) pseudogene, a gene for a novel MHC > class 1 protein, an interferon-inducible protein 1-8U pseudogene, > an RPL23A (60S Ribosomal Protein L23A) pseudogene, an HCGIX > pseudogene, an MICB or PERB11.1 pseudogene,the HLA-F gene for major > histocompatibility complex class I F (CDA12), and four P5-1 > pseudogenes. Con > ##################### > > If this works can you go back and check the amendments you made to the two bioperl scripts as this is all they are doing. If it still fails then its not a bioperl problem as such, but we can still try to track it down. > > > > Noticed another thing: the cache (squid in my case) always > > reports a "miss", and fetches direct. > > That's probably right. Since you're fetching potentially dynamic content (albeit through a GET request) you may find that squid refuses to cache it and will refetch it every time. Have a look in the squid documentation and there may be some way to tell it that it should cache dynamic GET requests as well. > > Hope this helps > > Simon. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- William Kenworthy From markw at illuminae.com Tue Aug 12 17:00:41 2003 From: markw at illuminae.com (Mark Wilkinson) Date: Tue Aug 12 17:00:42 2003 Subject: [Bioperl-l] SeqIO BEGIN block kills Gbrowse Message-ID: <1060722074.1709.106.camel@localhost.localdomain> Hi all, Just a heads-up that the eval in the BEGIN block of SeqIO.pm mucks-up the gbrowse CGI (apparently because the error message is printed to the screen before the CGI header). I suppose this is something that is best fixed at the gbrowse end of the stick, but I thought I'd post a note about it here anyway. 
It it doesn't look like it will be an easy thing to fix given the early point at which that line is going to execute... M -- Mark Wilkinson Illuminae From jason at cgt.duhs.duke.edu Tue Aug 12 17:22:35 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Tue Aug 12 17:05:16 2003 Subject: [Bioperl-l] SeqIO BEGIN block kills Gbrowse In-Reply-To: <1060722074.1709.106.camel@localhost.localdomain> References: <1060722074.1709.106.camel@localhost.localdomain> Message-ID: It shouldn't print anything b/c it is in an eval block. What does it print to the screen? On Tue, 12 Aug 2003, Mark Wilkinson wrote: > Hi all, > > Just a heads-up that the eval in the BEGIN block of SeqIO.pm mucks-up > the gbrowse CGI (apparently because the error message is printed to the > screen before the CGI header). I suppose this is something that is best > fixed at the gbrowse end of the stick, but I thought I'd post a note > about it here anyway. It it doesn't look like it will be an easy thing > to fix given the early point at which that line is going to execute... > > M > > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From markw at illuminae.com Tue Aug 12 17:26:24 2003 From: markw at illuminae.com (Mark Wilkinson) Date: Tue Aug 12 17:26:26 2003 Subject: [BioPerl] Re: [Bioperl-l] SeqIO BEGIN block kills Gbrowse In-Reply-To: References: <1060722074.1709.106.camel@localhost.localdomain> Message-ID: <1060723619.1710.109.camel@localhost.localdomain> At the top of the CGI page you get a "can't locate stdin..." error message. The code continues to execute, but by then it is all mucked up. M On Tue, 2003-08-12 at 15:22, Jason Stajich wrote: > It shouldn't print anything b/c it is in an eval block. > What does it print to the screen? > > On Tue, 12 Aug 2003, Mark Wilkinson wrote: > > > Hi all, > > > > Just a heads-up that the eval in the BEGIN block of SeqIO.pm mucks-up > > the gbrowse CGI (apparently because the error message is printed to the > > screen before the CGI header). 
I suppose this is something that is best > > fixed at the gbrowse end of the stick, but I thought I'd post a note > > about it here anyway. It it doesn't look like it will be an easy thing > > to fix given the early point at which that line is going to execute... > > > > M > > > > > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Mark Wilkinson Illuminae From markw at illuminae.com Tue Aug 12 17:27:17 2003 From: markw at illuminae.com (Mark Wilkinson) Date: Tue Aug 12 17:27:26 2003 Subject: [BioPerl] Re: [Bioperl-l] SeqIO BEGIN block kills Gbrowse In-Reply-To: References: <1060722074.1709.106.camel@localhost.localdomain> Message-ID: <1060723672.1709.112.camel@localhost.localdomain> i can switch the faulty behaviour back on and send you a URL if you like. All I did was comment-out the eval to solve my immediate problem. M On Tue, 2003-08-12 at 15:22, Jason Stajich wrote: > It shouldn't print anything b/c it is in an eval block. > What does it print to the screen? > > On Tue, 12 Aug 2003, Mark Wilkinson wrote: > > > Hi all, > > > > Just a heads-up that the eval in the BEGIN block of SeqIO.pm mucks-up > > the gbrowse CGI (apparently because the error message is printed to the > > screen before the CGI header). I suppose this is something that is best > > fixed at the gbrowse end of the stick, but I thought I'd post a note > > about it here anyway. It it doesn't look like it will be an easy thing > > to fix given the early point at which that line is going to execute... 
> > > > M > > > > > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Mark Wilkinson Illuminae From jason at cgt.duhs.duke.edu Tue Aug 12 17:50:30 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Tue Aug 12 17:33:11 2003 Subject: [BioPerl] Re: [Bioperl-l] SeqIO BEGIN block kills Gbrowse In-Reply-To: <1060723672.1709.112.camel@localhost.localdomain> References: <1060722074.1709.106.camel@localhost.localdomain> <1060723672.1709.112.camel@localhost.localdomain> Message-ID: i guess so - i mean I run gbrowse on bioperl live with no problems at all so I don't know what would cause it. -j On Tue, 12 Aug 2003, Mark Wilkinson wrote: > i can switch the faulty behaviour back on and send you a URL if you > like. All I did was comment-out the eval to solve my immediate problem. > > M > > On Tue, 2003-08-12 at 15:22, Jason Stajich wrote: > > It shouldn't print anything b/c it is in an eval block. > > What does it print to the screen? > > > > On Tue, 12 Aug 2003, Mark Wilkinson wrote: > > > > > Hi all, > > > > > > Just a heads-up that the eval in the BEGIN block of SeqIO.pm mucks-up > > > the gbrowse CGI (apparently because the error message is printed to the > > > screen before the CGI header). I suppose this is something that is best > > > fixed at the gbrowse end of the stick, but I thought I'd post a note > > > about it here anyway. It it doesn't look like it will be an easy thing > > > to fix given the early point at which that line is going to execute... 
> > > > > > M > > > > > > > > > > -- > > Jason Stajich > > Duke University > > jason at cgt.mc.duke.edu > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From gadbermd at earthlink.net Tue Aug 12 16:04:00 2003 From: gadbermd at earthlink.net (gadbermd@earthlink.net) Date: Tue Aug 12 17:58:36 2003 Subject: [Bioperl-l] Help w/ AlignIO and consensus_iupac Message-ID: <5191866.1060725547213.JavaMail.nobody@bert.psp.pas.earthlink.net> Hi everyone, I am a BioPerl newbie and I was wondering if someone could help me figure out how to generate a consensus from a Clustalw .aln file? I tried to write this sample code: #!/usr/bin/perl use warnings; use Bio::AlignIO; my $usage = "Usage: test.pl \n"; my $in_file = shift or die $usage; $alignio = new Bio::AlignIO(-format => 'clustalw', -file => "$in_file"); $aln = $alignio->next_aln(); $str = $aln->consensus_iupac(); print $str; But it always generates an error message saying the sequence is a protein: % ./test.pl new.aln ------------- EXCEPTION ------------- MSG: Seq [gi|18397816|ref|NM_102852.1|/1-648] is a protein STACK Bio::SimpleAlign::consensus_iupac /usr/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm:1325 STACK toplevel ./test.pl:9 -------------------------------------- This occurs in spite of the fact that the Clustal output is all nucleotides, not proteins... 
% cat new.aln CLUSTAL W (1.82) multiple sequence alignment gi|18397816|ref|NM_102852.1| -------------------------------------------------- gi|33242920|gb|AY332478.1| GGTTAATTTTGGTTGGAGGTAGAGAGAGAGAGAGAGGGAGGGAGGGAGGA gi|18424168|ref|NM_125279.1| -------------------------------------------------- gi|18397816|ref|NM_102852.1| -----------------ATGAGG--AAAGGTAAGAGAGTGATA------- gi|33242920|gb|AY332478.1| GGAGGAGGAGGAGGAGGAGGAGG--AAGAACAGGAGGAAGATGGGGCGGG gi|18424168|ref|NM_125279.1| -----------------ATGGTTCCGAAAGTGGTCGACCTACA------- * * * * * etc. etc. etc. I have tried it with a number of different Clustal output files but it always complains about them containing proteins. I figure I have to be doing something wrong here. Thanks so much, Mike G. From sobrien at umail.ucsb.edu Tue Aug 12 15:45:33 2003 From: sobrien at umail.ucsb.edu (Sean O'Brien) Date: Tue Aug 12 18:45:54 2003 Subject: [Bioperl-l] problem with Graphics Message-ID: <200308121545.33474.sobrien@umail.ucsb.edu> Hi, I have been trying to get BioPerl to output png's, but I seem to be getting invalid png files. I have a fresh install of libgd, version 2.0.15 and my GD version is 2.07. I installed Bundle::BioPerl, and after having no luck, I installed BioPerl version 1.2.2 from the sources in current_core_stable.tar.gz. When I run 'make test' I get an ok for BioGraphics. Also, when I run the first script described in the Bio Graphics tutorial, it runs with no errors and outputs some data which appears as though it could be an image. However, the file seems to be of an invalid png format because it cannot be opened by display, galeon or the GIMP. This is pretty frustrating because everything apears to be installing/running fine, but then the image is somehow corrupted. What might I have done wrong/ need to do to make this work. Thanks. 
-S

From kvddrift at earthlink.net Tue Aug 12 19:07:11 2003
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Tue Aug 12 19:08:47 2003
Subject: installing scripts (was:Re: [Bioperl-l] GFF scripts)
In-Reply-To:
Message-ID:

On Tuesday, August 12, 2003, at 10:22 AM, Brian Osborne wrote:

> Since Koen
> has said, reasonably, that he only wants to package a formal release
> perhaps
> the wisest thing for him to do is to wait for 1.3. IMO.
>

I agree :)

I will submit the 1.2.2 package to fink soon, without installing the scripts. The user can always put the scripts somewhere if he or she desires. When 1.3 is released, I will update the package description.

Thanks for all the input.

- Koen.

From jason at cgt.duhs.duke.edu Tue Aug 12 23:47:56 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Tue Aug 12 23:30:48 2003
Subject: [Bioperl-l] Help w/ AlignIO and consensus_iupac
In-Reply-To: <5191866.1060725547213.JavaMail.nobody@bert.psp.pas.earthlink.net>
References: <5191866.1060725547213.JavaMail.nobody@bert.psp.pas.earthlink.net>
Message-ID:

From the consensus_iupac docs:

  Note that if your alignment sequences contain a lot of IUPAC ambiguity codes you often have to manually set alphabet. Bio::PrimarySeq::_guess_type thinks they indicate a protein sequence.

Do this in your code once you have an alignment object called $aln.

for my $seq ( $aln->each_seq ) {
    $seq->alphabet('dna'); # or rna if that is what you have
}

Then try calling the consensus method again.

-jason

On Tue, 12 Aug 2003 gadbermd@earthlink.net wrote:

> Hi everyone,
>
> I am a BioPerl newbie and I was wondering if someone could help me figure out how to generate a consensus from a Clustalw .aln file?
I tried to write this sample code: > > #!/usr/bin/perl > use warnings; > use Bio::AlignIO; > > my $usage = "Usage: test.pl \n"; > my $in_file = shift or die $usage; > > $alignio = new Bio::AlignIO(-format => 'clustalw', -file => "$in_file"); > $aln = $alignio->next_aln(); > $str = $aln->consensus_iupac(); > > print $str; > > > But it always generates an error message saying the sequence is a protein: > > > % ./test.pl new.aln > ------------- EXCEPTION ------------- > MSG: Seq [gi|18397816|ref|NM_102852.1|/1-648] is a protein > STACK Bio::SimpleAlign::consensus_iupac /usr/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm:1325 > STACK toplevel ./test.pl:9 > > -------------------------------------- > > > This occurs in spite of the fact that the Clustal output is all nucleotides, not proteins... > > > > % cat new.aln > CLUSTAL W (1.82) multiple sequence alignment > > > gi|18397816|ref|NM_102852.1| -------------------------------------------------- > gi|33242920|gb|AY332478.1| GGTTAATTTTGGTTGGAGGTAGAGAGAGAGAGAGAGGGAGGGAGGGAGGA > gi|18424168|ref|NM_125279.1| -------------------------------------------------- > > > > gi|18397816|ref|NM_102852.1| -----------------ATGAGG--AAAGGTAAGAGAGTGATA------- > gi|33242920|gb|AY332478.1| GGAGGAGGAGGAGGAGGAGGAGG--AAGAACAGGAGGAAGATGGGGCGGG > gi|18424168|ref|NM_125279.1| -----------------ATGGTTCCGAAAGTGGTCGACCTACA------- > * * * * * > > etc. etc. etc. > > > I have tried it with a number of different Clustal output files but it always complains about them containing proteins. I figure I have to be doing something wrong here. > > Thanks so much, > Mike G. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From srobb1 at gl.umbc.edu Tue Aug 12 10:40:51 2003 From: srobb1 at gl.umbc.edu (Sofia) Date: Wed Aug 13 08:27:27 2003 Subject: [Bioperl-l] Re: parsing BLAST html Message-ID: <004a01c360df$bb1d77f0$f500000a@planaria2> I use PerlIO::via::StripHTML and it works quite successfully - Sofia Hi Wes, Before I parse my html blast I use PerlIO::via::StripHTML. It removes all html and I save the new file as the orginalFileName.out. I like the html blast output because I save them later for another use. But if I didnt need them I would just use text output.

use strict;
use Bio::SearchIO;
use PerlIO::via::StripHTML;

# NOTE: the glob pattern was lost from the archived message;
# '*.html' is an assumption, not the poster's original text.
my @dir_html_files = glob('*.html');
foreach my $file (@dir_html_files) {
    my $outfile = "$file.out";
    open OUTFILE, ">$outfile"
        or die "Can't open $outfile: $!\n";
    # read the HTML report through the StripHTML layer so the tags are removed
    open INFILE, '<:via(StripHTML)', $file
        or die "Can't open $file: $!\n";
    while (<INFILE>) {
        print OUTFILE $_;
    }
    close INFILE;
    close OUTFILE;
}

-Sofia

----- Original Message -----
From: "Jason Stajich"
To: "Wes Barris"
Cc: "Bioperl Mailing List"
Sent: Tuesday, August 12, 2003 6:23 AM
Subject: Re: [Bioperl-l] Parsing html blast output?

> No, it is not currently possible to parse BLAST HTML output.
>
> On Tue, 12 Aug 2003, Wes Barris wrote:
>
> > Hi,
> >
> > I know it is possible to use the SearchIO functions to parse either
> > text blast output or xml blast output. However, I would like to know
> > if it is possible to parse html blast output? For example, if I wanted
> > to parse the output of this command:
> >
> > blastcl3 -d nr -p blastn -T -i fasta.txt -o blast.html
> >
> > When I try parsing the above "blast.html" file using example number 4
> > from this file:
> >
> > http://bioperl.org/HOWTOs/html/Graphics-HOWTO.html
> >
> > I get errors.
> >
> > What I ended up doing is writing a perl "de-htmlizer" that I use to
> > convert an html blast output file into a text-only blast output file.
> > Then I run the result through a bioperl blast parsing script. Is
> > there a more elegant way to do this?
> >
> >
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman

From brian_osborne at cognia.com Wed Aug 13 08:42:04 2003
From: brian_osborne at cognia.com (Brian Osborne)
Date: Wed Aug 13 08:45:24 2003
Subject: [Bioperl-l] Re: parsing BLAST html
In-Reply-To: <004a01c360df$bb1d77f0$f500000a@planaria2>
Message-ID:

Sofia,

Just making sure here. The output from StripHTML can be parsed by SearchIO? This probably belongs in the FAQ.

Brian O.

-----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Sofia Sent: Tuesday, August 12, 2003 10:41 AM To: Bioperl Mailing List Subject: [Bioperl-l] Re: parsing BLAST html I use PerlIO::via::StripHTML and it works quite successfully - Sofia Hi Wes, Before I parse my html blast I use PerlIO::via::StripHTML. It removes all html and I save the new file as the orginalFileName.out. I like the html blast output because I save them later for another use. But if I didnt need them I would just use text output. use strict; use Bio::SearchIO; use PerlIO::via::StripHTML; my @dir_html_files = ; foreach my $file (@dir_html_files){ my $outfile = $file."\.out"; open OUTFILE, ">$outfile"; open INFILE, '<:via(StripHTML)', $file or die "Can't open $outfile: $!\n"; while (){ print OUTFILE $_; } } -Sofia ----- Original Message ----- From: "Jason Stajich" To: "Wes Barris" Cc: "Bioperl Mailing List" Sent: Tuesday, August 12, 2003 6:23 AM Subject: Re: [Bioperl-l] Parsing html blast output? > No, it is not currently possible to parse BLAST HTML output. > > On Tue, 12 Aug 2003, Wes Barris wrote: > > > Hi, > > > > I know it is possible to use the SearchIO functions to parse either > > text blast output or xml blast output. However, I would like to know > > if it is possible to parse html blast output? For example, if I wanted > > to parse the output of this command: > > > > blastcl3 -d nr -p blastn -T -i fasta.txt -o blast.html > > > > When I try parsing the above "blast.html" file using example number 4 > > from this file: > > > > http://bioperl.org/HOWTOs/html/Graphics-HOWTO.html > > > > I get errors. > > > > What I ended up doing is writing a perl "de-htmlizer" that I use to > > convert an html blast output file into a text-only blast output file. > > Then I run the result through a bioperl blast parsing script. Is > > there a more elegant way to do this? 
> > > > > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman From cain at cshl.org Wed Aug 13 09:23:27 2003 From: cain at cshl.org (Scott Cain) Date: Wed Aug 13 09:23:29 2003 Subject: [Bioperl-l] SeqIO BEGIN block kills Gbrowse Message-ID: <1060781043.1430.9.camel@localhost.localdomain> Hi Mark, I think this is a CGI.pm problem--Lincoln and I have gone back and forth about this, but I seem to remember that the problem was fixed with an update of CGI.pm. Also, you might want to send/cc notes like this to the GBrowse list: gmod-gbrowse@lists.sourceforge.net. Thanks, Scott --- Original Message --- Date: 12 Aug 2003 15:01:14 -0600 From: Mark Wilkinson Subject: [Bioperl-l] SeqIO BEGIN block kills Gbrowse To: bioperl-l@bioperl.org Message-ID: <1060722074.1709.106.camel@localhost.localdomain> Content-Type: text/plain Hi all, Just a heads-up that the eval in the BEGIN block of SeqIO.pm mucks-up the gbrowse CGI (apparently because the error message is printed to the screen before the CGI header). I suppose this is something that is best fixed at the gbrowse end of the stick, but I thought I'd post a note about it here anyway. It it doesn't look like it will be an easy thing to fix given the early point at which that line is going to execute... M -- Mark Wilkinson Illuminae -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From matthias.wahl at gsf.de Wed Aug 13 19:02:28 2003 From: matthias.wahl at gsf.de (Matthias Wahl) Date: Wed Aug 13 09:47:28 2003 Subject: [Bioperl-l] Bio::DB::GFF problem Message-ID: <3F3AC384.5010100@gsf.de> Hi all! 

I am having trouble using Bio::DB::GFF with the following code:

my $aggregator = Bio::DB::GFF::Aggregator->new(-method    => 'gene_density',
                                               -sub_parts => 'EnsEMBL:gene_density');

my $gff_db = Bio::DB::GFF->new(-adaptor    => 'dbi::mysqlopt',
                               -dsn        => 'dbi:mysql:Mus_musculus_GFF',
                               -user       => 'xxxxx',
                               -pass       => 'xxxxx',
                               -aggregator => $aggregator
                              );

Calling

$gff_db->segment(-class=>'Chromosome', -value=>'1');

always returns undef (whatever arguments I use)!

The database has been generated by loading a GFF file of the following format:

1 EnsEMBL gene_density 1000001 2000000 0 Chromosome 1
1 EnsEMBL gene_density 2000001 3000000 0 Chromosome 1
1 EnsEMBL gene_density 3000001 4000000 1 Chromosome 1
1 EnsEMBL gene_density 4000001 5000000 12 Chromosome 1
1 EnsEMBL gene_density 5000001 6000000 4 Chromosome 1

with load_gff.PLS (columns are tab-separated; the 9th column consists of 'Chromosome' and the name, separated by a space), both with and without the associated sequence file.

Calling $gff_db->features() works fine. But I need aggregated features for generating a Bio::Graphics xyplot (to plot the gene density for a particular chromosome).

Many thanks,
Matthias

--
Matthias Wahl
GSF-National Research Center for Environment and Health
Institute of Developmental Genetics
Ingolstaedter Landstrasse 1
D-85764 Neuherberg
Germany
TEL: ++49 89 3187-4117,-2638
FAX: ++49 89 3187-3099
E-mail: matthias.wahl@gsf.de
WWW: http://www.gsf.de/idg

From pm66 at nyu.edu Wed Aug 13 11:20:49 2003
From: pm66 at nyu.edu (Philip MacMenamin)
Date: Wed Aug 13 11:20:34 2003
Subject: [Bioperl-l] epcr to gff parser.
Message-ID: <200308131521.h7DFL7JJ006819@mx3.nyu.edu>

Hi,

Apologies if this is a silly question, but how would you parse an ePCR file and output a GFF file?
I have been looking through the BioPerl documentation and

my $parser = new Bio::Tools::EPCR(-file => 'epcr_755_all.txt');

seems to look right for a start, but how do I output GFF?
I was looking at the SeqAnalysisParserFactoryI get_parser method, but I'm just not sure how this all works. I don't really want to write my own parser, since I'm sure something is out there already...

Thanks...

From jason at cgt.duhs.duke.edu Wed Aug 13 11:50:56 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Wed Aug 13 11:33:25 2003
Subject: [Bioperl-l] epcr to gff parser.
In-Reply-To: <200308131521.h7DFL7JJ006819@mx3.nyu.edu>
References: <200308131521.h7DFL7JJ006819@mx3.nyu.edu>
Message-ID:

use Bio::Tools::EPCR;
use Bio::Tools::GFF;

# writer for the GFF output file
my $out = new Bio::Tools::GFF(-file => ">newfile.gff");
# parser for the ePCR report
my $parser = new Bio::Tools::EPCR(-file => 'epcr_755_all.txt');
# each ePCR hit comes back as a feature object that the GFF writer can print
while( my $f = $parser->next_feature ) {
    $out->write_feature($f);
}

On Wed, 13 Aug 2003, Philip MacMenamin wrote:

> Hi,
>
> Apologies if this is a silly question, but how would you parse an ePCR file
> and output a GFF file?
> I have been looking through the BioPerl documentation and
>
> my $parser = new Bio::Tools::EPCR(-file => 'epcr_755_all.txt');
>
> seems to look right for a start, but how do I output GFF? I was looking at the
> SeqAnalysisParserFactoryI get_parser method, but I'm just not sure how this all
> works. I don't really want to write my own parser, since I'm sure something is
> out there already...
>
> Thanks...
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From markw at illuminae.com Wed Aug 13 11:43:35 2003
From: markw at illuminae.com (Mark Wilkinson)
Date: Wed Aug 13 11:43:36 2003
Subject: [BioPerl] [Bioperl-l] SeqIO BEGIN block kills Gbrowse
In-Reply-To: <1060781043.1430.9.camel@localhost.localdomain>
References: <1060781043.1430.9.camel@localhost.localdomain>
Message-ID: <1060789450.1710.54.camel@localhost.localdomain>

Thanks for the heads-up. I'll do that.

Sorry bio-perlers!
I didn't have the time/energy to look deeper than the error message itself, which pointed to SeqIO.pm.

M

On Wed, 2003-08-13 at 07:24, Scott Cain wrote:
> Hi Mark,
>
> I think this is a CGI.pm problem--Lincoln and I have gone back and forth
> about this, but I seem to remember that the problem was fixed with an
> update of CGI.pm.
>
> Also, you might want to send/cc notes like this to the GBrowse list:
> gmod-gbrowse@lists.sourceforge.net.
>
> Thanks,
> Scott
>
>
> --- Original Message ---
> Date: 12 Aug 2003 15:01:14 -0600
> From: Mark Wilkinson
> Subject: [Bioperl-l] SeqIO BEGIN block kills Gbrowse
> To: bioperl-l@bioperl.org
> Message-ID: <1060722074.1709.106.camel@localhost.localdomain>
> Content-Type: text/plain
>
> Hi all,
>
> Just a heads-up that the eval in the BEGIN block of SeqIO.pm mucks-up
> the gbrowse CGI (apparently because the error message is printed to the
> screen before the CGI header). I suppose this is something that is best
> fixed at the gbrowse end of the stick, but I thought I'd post a note
> about it here anyway. It it doesn't look like it will be an easy thing
> to fix given the early point at which that line is going to execute...
>
> M
>
> --
> Mark Wilkinson
> Illuminae

--
Mark Wilkinson
Illuminae

From ajm6q at virginia.edu Wed Aug 13 14:31:12 2003
From: ajm6q at virginia.edu (Aaron J Mackey)
Date: Wed Aug 13 14:49:08 2003
Subject: [Bioperl-l] Re: parsing BLAST html
In-Reply-To:
Message-ID:

Brian, you may want to add that something like this should also work:

use Bio::SearchIO;
use Bio::SearchIO::blast;
use HTML::Strip;

my $hs = new HTML::Strip;

# replace the blast parser's _readline method with one that
# auto-strips HTML:
{
    # SUPER:: is resolved against the package the sub is compiled in,
    # so the override has to be defined inside Bio::SearchIO::blast
    package Bio::SearchIO::blast;
    sub _readline {
        my ($self, @args) = @_;
        my $line = $self->SUPER::_readline(@args);
        # pass undef through so the parser still sees end-of-file
        return defined $line ? $hs->parse($line) : undef;
    }
}

my $io = new Bio::SearchIO -file => "etc", -format => "blast";
# etc ...

-Aaron

On Wed, 13 Aug 2003, Brian Osborne wrote:

> Sofia,
>
> Just making sure here.
The output from StripHTML can be parsed by SearchIO? > This probably belongs in the FAQ. > > Brian O. > > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Sofia > Sent: Tuesday, August 12, 2003 10:41 AM > To: Bioperl Mailing List > Subject: [Bioperl-l] Re: parsing BLAST html > > I use PerlIO::via::StripHTML and it works quite successfully > - > Sofia > > Hi Wes, > Before I parse my html blast I use PerlIO::via::StripHTML. It removes all > html and I save the new file as the orginalFileName.out. I like the html > blast output because I save them later for another use. But if I didnt need > them I would just use text output. > > use strict; > use Bio::SearchIO; > use PerlIO::via::StripHTML; > > my @dir_html_files = ; > foreach my $file (@dir_html_files){ > my $outfile = $file."\.out"; > open OUTFILE, ">$outfile"; > open INFILE, '<:via(StripHTML)', $file > or die "Can't open $outfile: $!\n"; > while (){ > print OUTFILE $_; > } > } > > -Sofia > ----- Original Message ----- > From: "Jason Stajich" > To: "Wes Barris" > Cc: "Bioperl Mailing List" > Sent: Tuesday, August 12, 2003 6:23 AM > Subject: Re: [Bioperl-l] Parsing html blast output? > > > > No, it is not currently possible to parse BLAST HTML output. > > > > On Tue, 12 Aug 2003, Wes Barris wrote: > > > > > Hi, > > > > > > I know it is possible to use the SearchIO functions to parse either > > > text blast output or xml blast output. However, I would like to know > > > if it is possible to parse html blast output? For example, if I wanted > > > to parse the output of this command: > > > > > > blastcl3 -d nr -p blastn -T -i fasta.txt -o blast.html > > > > > > When I try parsing the above "blast.html" file using example number 4 > > > from this file: > > > > > > http://bioperl.org/HOWTOs/html/Graphics-HOWTO.html > > > > > > I get errors. 
> > >
> > > What I ended up doing is writing a perl "de-htmlizer" that I use to
> > > convert an html blast output file into a text-only blast output file.
> > > Then I run the result through a bioperl blast parsing script.  Is
> > > there a more elegant way to do this?
> > >
> > >
> >
> > --
> > Jason Stajich
> > Duke University
> > jason at cgt.mc.duke.edu
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Aaron J Mackey
Pearson Laboratory
University of Virginia
(434) 924-2821
amackey@virginia.edu

From brian_osborne at cognia.com Wed Aug 13 16:19:32 2003
From: brian_osborne at cognia.com (Brian Osborne)
Date: Wed Aug 13 16:24:40 2003
Subject: [Bioperl-l] Re: parsing BLAST html
In-Reply-To:
Message-ID:

Aaron,

Understood.

Brian O.

_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From jason at cgt.duhs.duke.edu Thu Aug 14 08:33:10 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Thu Aug 14 08:15:21 2003
Subject: [Bioperl-l] the roundup (long)
Message-ID:

I've added a bunch of new things and fixed some bugs, wanted to try and
summarize before I get too busy and forget the details.  Here is the
roundup.

[Bio::PopGen]

New code in Bio::PopGen implements several statistics for use in testing
the neutrality of mutations in a population; these include Tajima's D,
Fu and Li's D, Fu and Li's F, as well as some utilities like Theta and
Pi.  These are still being put through the paces to ensure that they are
calculating everything properly.

A basic Coalescent simulator was already in Bioperl named
Bio::Tree::RandomTree.  This has been renamed
Bio::PopGen::Simulation::Coalescent and uses the revamped
Bio::Tree::AlleleNode objects.

Added an LD calculation implementation in composite_LD for unphased
data.  Will also have D-prime by the end of the week for haplotype data.
These are in Bio::PopGen::Statistics.

Bio::PopGen::PopStats has an implementation of Fst to test for
population structure.
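
As a rough illustration of two of the utilities named above (Pi and
Theta), here is a minimal plain-Perl sketch.  It is not the
Bio::PopGen::Statistics code: the sub names pairwise_pi and
watterson_theta and the toy aligned sequences are assumptions made for
this example.  Pi is taken as the average number of pairwise
differences, and Theta as Watterson's estimator, the number of
segregating sites S divided by a1 = 1 + 1/2 + ... + 1/(n-1).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Average number of pairwise differences between aligned,
# equal-length sequences (hypothetical helper, not a Bioperl method).
sub pairwise_pi {
    my @seqs = @_;
    my ($diffs, $pairs) = (0, 0);
    for my $i (0 .. $#seqs - 1) {
        for my $j ($i + 1 .. $#seqs) {
            # XOR of two strings gives "\0" wherever the characters match
            my $xor = $seqs[$i] ^ $seqs[$j];
            $diffs += ($xor =~ tr/\0//c);   # count the non-matching columns
            $pairs++;
        }
    }
    return $pairs ? $diffs / $pairs : 0;
}

# Watterson's theta: segregating sites divided by the harmonic sum a1.
sub watterson_theta {
    my @seqs = @_;
    my $n = @seqs;
    return 0 if $n < 2;
    my $segregating = 0;
    for my $pos (0 .. length($seqs[0]) - 1) {
        my %seen;
        $seen{ substr($_, $pos, 1) } = 1 for @seqs;
        $segregating++ if keys(%seen) > 1;
    }
    my $a1 = 0;
    $a1 += 1 / $_ for 1 .. $n - 1;
    return $segregating / $a1;
}

my @sample = qw(AAGT AAGT ACGT ACGA);   # toy data, four aligned alleles
printf "pi = %.4f, theta = %.4f\n",
    pairwise_pi(@sample), watterson_theta(@sample);
```

For the toy sample there are 7 differences over 6 pairs and 2
segregating sites, so the script reports pi = 1.1667 and theta = 1.0909.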
The PopGen::Individual, PopGen::Population, PopGen::Genotype interface
and implementations seem to be reaching a stable point.  I've also
unified these with the bioperl-pedigree code (a separate CVS module).
Some more small tweaks will probably go in over the coming months as
things get put through the paces, but I hope this can become stable code
for a while.

In order to get allele/genotype data into Bioperl I have added
Bio::PopGen::IO which can parse in csv delimited files as well as
prettybase format.  I expect to have the code to take SimpleAlign
objects and turn them into PopGen::Individuals written shortly.

To be sure and give credit - all the PopGen stuff is in collaboration
with Matthew Hahn.  We are preparing a tutorial on these objects which
should be out there in the Fall.

[Bio::Matrix]

I added Bio::Matrix::IO to implement a framework for simple matrix
parsing.  This is only to try and simplify things even though different
types of matrices are not equatable.  I added a scoring matrix parser
(IO::scoring) for BLOSUM/PAM matrix parsing.  It unsurprisingly produces
Bio::Matrix::Scoring objects.  Also added IO::phylip to parse phylip
distance matrices and thus produce Bio::Matrix::PhylipDist.

I also added a general purpose object Bio::Matrix::Generic which is a
starting place for putting column and row data.  This is probably NOT
the object you want to use for PWM and PSSMs - perhaps Stefan Kirov's
stuff will fit in here.

[memory leaks - Bio::Tree and Bio::SeqFeature]

Some memory leaks have been fixed for Trees and SeqFeatures.  Perl won't
clean up and break memory cycles unless you explicitly break them in the
DESTROY code.  However with our object hierarchy DESTROY is not
necessarily getting called by all subclasses unless we do the whole
chained destructor (analogous to our chained constructors).
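
The cycle problem described above can be reproduced in a few lines of
plain Perl.  This is a toy sketch, not Bioperl code: the Node class, its
cleanup method, and the $destroyed counter are invented for
illustration.  A parent holds its children and each child points back
at its parent, so the reference counts never reach zero unless the links
are cut by hand.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Node;   # hypothetical class, stands in for a tree node
our $destroyed = 0;

sub new {
    my ($class, $name) = @_;
    return bless { name => $name, parent => undef, children => [] }, $class;
}

sub add_child {
    my ($self, $child) = @_;
    push @{ $self->{children} }, $child;
    $child->{parent} = $self;          # parent <-> child: a reference cycle
}

# Explicit cleanup: drop the links so reference counts can reach zero.
sub cleanup {
    my ($self) = @_;
    $_->cleanup for @{ $self->{children} };
    $self->{children} = [];
    $self->{parent}   = undef;
}

sub DESTROY { $destroyed++ }           # count objects that really get freed

package main;

sub build_tree {
    my $root = Node->new('root');
    $root->add_child(Node->new('leaf'));
    return $root;
}

# Without cleanup both nodes survive the end of the scope.
{ my $tree = build_tree(); }
my $leaked = $Node::destroyed;

# With cleanup the cycle is broken and both nodes are freed.
{ my $tree = build_tree(); $tree->cleanup; }
my $freed = $Node::destroyed - $leaked;

print "destroyed without cleanup: $leaked, with cleanup: $freed\n";
```

The first block frees nothing and the second frees both nodes.  Weak
references (weaken from Scalar::Util) are another common way to avoid
holding the back-pointer strongly; the sketch sticks to explicit
cleanup because that is the approach discussed here.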
The way I solved it is to use the already written _register_for_cleanup
method which is called in the constructor to specify the cleanup method
instead of relying on DESTROY.  This seems to work.

The only downside is in the case of things like a Bio::Tree::Tree.  In
this case the tree structure is implicit in the Nodes and their pointers
to children/parents.  The problem comes in if we reuse all or part of
the tree to do some test - like this

    sub foo {
        my @nodes = @_;
        my $tree = new Bio::Tree::Tree(-nodes => \@nodes);
        ...
        # tree gets destroyed at the end of scope
    }

The problem is that by default the tree destruction means also
destroying the contained nodes, which is a problem if you want to use
those nodes for something later.  The solution is at the end to set the
root_node_pointer to undef and thus the tree has no way to destroy the
underlying nodes.  However this might be hard to remember to do all the
time.  I introduced a -nodelete option (method name nodelete) to the
constructor (default value is false) which if true will not destroy the
underlying nodes.

A similar problem existed for SeqFeature FeaturePairs; I've added the
code in SeqFeature::Generic, SeqFeature::FeaturePair, and
SeqFeature::Gene::GeneStructure & SeqFeature::Gene::Transcript which
should take care of this now.  I was able to successfully parse a large
number of genewise reports which each generated gene/transcript sets
which previously had caused my perl to crash running out of memory, so I
feel we have removed some (probably not all) of the leaks that get
introduced when there are cycles.

The memleak bugs were also fixed on the 1.2 branch for what it's worth.

[Bio::SearchIO]

Added some more SearchIO parsers.  Borrowing from Bala's Tools::Blat
implementation I made a SearchIO::psl parser which can parse PSL output.
It needs to be tweaked a little more to skip the header lines if they
are produced but works for me for output from Jim's lav2Psl code.
Additionally a SearchIO::blasttable has been added which can parse NCBI
-m 8 or -m 9 output for those just needing some minimal information.

[other bugs - from changelog]

 o Bio::SearchIO
   - Fixed bugs in BLAST parsing which couldn't parse NCBI gapped blast
     properly (was losing hit significance values due to the extra
     unexpected column).
   - Parsing of blastcl3 (netblast from NCBI) now can handle case of
     integer overflow (# of letters in nt seq dbs is > MAX_INT)
     although doesn't try to correct it - will get the negative number
     for you.  Added a test for this as well.
   - Fixed HMMER parsing bug which prevented parsing when a hmmpfam
     report has no top-level family classification scores but does have
     scores and alignments for individual domains.

On the 1.2 branch I also fixed a couple of places in SeqIO::genbank and
SeqIO::bsml where we weren't dereferencing the arrayref for keywords.

-jason

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From jason at cgt.duhs.duke.edu Thu Aug 14 08:38:07 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Thu Aug 14 08:20:17 2003
Subject: [Bioperl-l] question about SeqIO and BSML DTD
In-Reply-To:
References:
Message-ID:

Short answer is no - there is no easy way to do it because of the way
this module is written.  I have no idea how different 3.1 is from 2.2 as
to how much of a code change this is.

This module is still looking for an 'owner' or someone to maintain it -
preferably someone who actually uses BSML so they can be closer to some
of the nuances of the format.  Volunteers welcome.

-jason

On Mon, 11 Aug 2003, Susan Cassidy wrote:

> Hi,
> It looks like the current (1.2.2) bioperl bsml functions are set up to use
> the BSML 2.2 DTD.  I would like to use the 3.1 DTD instead.  Is there some
> easy way to have things like SeqIO create BSML output using the 3.1 DTD?
> I'm not sure how the internal handling of all the DTD stuff works, but I > do see that the DTD URL is hard-coded in bsml.pm to be > "http://www.labbook.com/dtd/bsml2_2.dtd". > > It would be wonderful if it were as easy as changing that! But, I won't > hold my breath! > > I'm doing some experimentation with converting sequences from formats such > as Genbank into BSML, and this makes it really simple. > > Please reply to this email address and not just to the list, as I don't > subscribe to it. > > Any advice appreciated. > > Thanks, > Susan Cassidy -- Jason Stajich Duke University jason at cgt.mc.duke.edu From anelda at sanbi.ac.za Thu Aug 14 08:58:28 2003 From: anelda at sanbi.ac.za (Anelda Boardman) Date: Thu Aug 14 08:58:00 2003 Subject: [Bioperl-l] Installing Bioperl - Dependencies problem Message-ID: <45502.196.38.142.111.1060865908.squirrel@webmail.sanbi.ac.za> Hallo, I've been trying to install Bioperl, but ran into some trouble with the dependencies... After running 'perl Makefile.PL' I got the message that Ace and DBD::mysql are needed. I downloaded DBD::mysql-2.9002 from CPAN and tried to install it according to the instructions. 'perl Makefile.PL' runs smoothly, but when I want to 'make', something goes wrong and I can't figure out what to do to solve the problem. Please could someone help me? 
This is part of the error message:

[root@jive DBD-mysql-2.1028]# make
cp lib/DBD/mysql.pm blib/lib/DBD/mysql.pm
cp lib/DBD/mysql/INSTALL.pod blib/lib/DBD/mysql/INSTALL.pod
cp lib/Mysql.pm blib/lib/Mysql.pm
cp lib/Mysql/Statement.pm blib/lib/Mysql/Statement.pm
cp lib/DBD/mysql.pod blib/lib/DBD/mysql.pod
cp lib/Bundle/DBD/mysql.pm blib/lib/Bundle/DBD/mysql.pm
cc -c -I/usr/local/lib/perl5/site_perl/5.8.0/i686-linux/auto/DBI
-I'/usr/include' -fno-strict-aliasing -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O3 -DVERSION=\"2.1028\"
-DXS_VERSION=\"2.1028\" -fpic
"-I/usr/local/lib/perl5/5.8.0/i686-linux/CORE" dbdimp.c
In file included from dbdimp.c:29:
dbdimp.h:31:49: mysql.h: No such file or directory
dbdimp.h:32:49: errmsg.h: No such file or directory
In file included from dbdimp.c:29:
dbdimp.h:116: parse error before "MYSQL"
dbdimp.h:116: warning: no semicolon at end of struct or union
dbdimp.h:122: parse error before '}' token
dbdimp.h:151: parse error before "MYSQL_RES"
dbdimp.h:151: warning: no semicolon at end of struct or union
dbdimp.h:164: parse error before '}' token
In file included from dbdimp.c:29:

What should I do?

Thanks!
Anelda

From jason at cgt.duhs.duke.edu Thu Aug 14 09:26:33 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Thu Aug 14 09:08:55 2003
Subject: [Bioperl-l] Installing Bioperl - Dependencies problem
In-Reply-To: <45502.196.38.142.111.1060865908.squirrel@webmail.sanbi.ac.za>
References: <45502.196.38.142.111.1060865908.squirrel@webmail.sanbi.ac.za>
Message-ID:

On Thu, 14 Aug 2003, Anelda Boardman wrote:

> Hallo,
>
> I've been trying to install Bioperl, but ran into some trouble with the
> dependencies...  After running 'perl Makefile.PL' I got the message that
> Ace and DBD::mysql are needed.
>

You only need these if you want to run Bio::DB::GFF (DBD::mysql) and
Bio::SeqIO::ace (Ace).

> I downloaded DBD::mysql-2.9002 from CPAN and tried to install it according
> to the instructions.
'perl Makefile.PL' runs smoothly, but when I want to > 'make', something goes wrong and I can't figure out what to do to solve > the problem. Please could someone help me? > > This is part of the error message: > DBD::mysql requires you have installed the mysql libraries on your system - if you are using a linux distro - something like mysql and mysql-devel are needed. But you only have to do this if you want to use modules which need DBD::mysql (Bio::DB::GFF with a mysql backend). > [root@jive DBD-mysql-2.1028]# make > cp lib/DBD/mysql.pm blib/lib/DBD/mysql.pm > cp lib/DBD/mysql/INSTALL.pod blib/lib/DBD/mysql/INSTALL.pod > cp lib/Mysql.pm blib/lib/Mysql.pm > cp lib/Mysql/Statement.pm blib/lib/Mysql/Statement.pm > cp lib/DBD/mysql.pod blib/lib/DBD/mysql.pod > cp lib/Bundle/DBD/mysql.pm blib/lib/Bundle/DBD/mysql.pm > cc -c -I/usr/local/lib/perl5/site_perl/5.8.0/i686-linux/auto/DBI > -I'/usr/include' -fno-strict-aliasing -D_LARGEFILE_SOURCE > -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O3 -DVERSION=\"2.1028\" > -DXS_VERSION=\"2.1028\" -fpic > "-I/usr/local/lib/perl5/5.8.0/i686-linux/CORE" dbdimp.c > In file included from dbdimp.c:29: > dbdimp.h:31:49: mysql.h: No such file or directory > dbdimp.h:32:49: errmsg.h: No such file or directory > In file included from dbdimp.c:29: > dbdimp.h:116: parse error before "MYSQL" > dbdimp.h:116: warning: no semicolon at end of struct or union > dbdimp.h:122: parse error before '}' token > dbdimp.h:151: parse error before "MYSQL_RES" > dbdimp.h:151: warning: no semicolon at end of struct or union > dbdimp.h:164: parse error before '}' token > In file included from dbdimp.c:29: > > > What should I do? > > Thanks! 
> Anelda > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From jamie at genome.arizona.edu Thu Aug 14 12:35:52 2003 From: jamie at genome.arizona.edu (Jamie Hatfield) Date: Thu Aug 14 12:35:54 2003 Subject: [Bioperl-l] Re: Bio::FPC In-Reply-To: References: Message-ID: <1060878865.562.26.camel@motox> Yes, actually. We are just now finishing up the fpc parser. I was planning on soon asking the group how I would go about submitting it? It consists of 5 modules that we have put in the MapIO and Map namespaces. Bio::MapIO::fpc.pm Bio::Map::physical.pm Bio::Map::fpcmarker.pm (sorry, but marker doesn't work) Bio::Map::clone.pm Bio::Map::contig.pm If you want to see how this object might be used, check out http://www.genome.arizona.edu/software/fpc/biofpc/index.html You'll see there documentation for the modules, and a few test cases or example usages. Also, we are trying to make a generic converter to let you load in a fpc file and generate the necessary GFF for GBrowse to display the fpc map. It's a quite simple display of the clones, markers, and contigs, but maybe that will be usefull as an alternative to WebFPC (a java view only version of fpc). It works for us, but might not work for everybody. We should be able to patch it up, though, if it's missing features. So, anyways, if somebody can let me know how to go about submitting it, we'll start the process. I looked through the FAQ and it basically said to just post information if you have a module that you would like to contribute, so, here's the information. Jamie On Mon, 2003-08-11 at 16:21, Yee Man Chan wrote: > Hi Jamie and all, > > My boss asked me to parse an FPC file. So I searched the bioperl > mailing list archive for "FPC". I found that there was a discussion back > in Nov 2002 about it. 
I am wondering whether this FPC parser is done or
> not. If it is workable now, can anyone tell me where I can download it?
> Otherwise, can someone point me to a spec of an FPC parser?
>
> Thanks a lot.
> Yee Man
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

From prakash at ece.arizona.edu Thu Aug 14 14:18:03 2003
From: prakash at ece.arizona.edu (Jayaprakash Rudraraju)
Date: Thu Aug 14 14:15:27 2003
Subject: [Bioperl-l] Padded position in Consensus sequence
In-Reply-To: <20030428151443.96788.qmail@web41104.mail.yahoo.com>
References: <20030428151443.96788.qmail@web41104.mail.yahoo.com>
Message-ID:

Hi,

I have written a small subroutine as part of a program to tag primers
in Consed.  The following subroutine gives the padded position on the
consensus sequence, given its position on the reference sequence.

    #!perl -w
    my $consensus = "AGG*TGAC**TA***AGTCCT*T";
    print map { "$_\t". padded_position($_) ."\n"} (1..16);

    sub padded_position {
        my ($unpad, $pads) = (@_, 0);
        $pads++ until $unpad == substr($consensus, 0, $unpad+$pads) =~ tr/ACGT//;
        $unpad+$pads;
    }

Can you suggest some more efficient or elegant solutions?  Even though
I have condensed it as much as I can, I am looking for simpler logic.

Prakash.

--
My favorite animal is steak.
                -- Fran Lebowitz (1950 - )

From jamie at genome.arizona.edu Thu Aug 14 14:35:55 2003
From: jamie at genome.arizona.edu (Jamie Hatfield)
Date: Thu Aug 14 14:35:57 2003
Subject: [Bioperl-l] Padded position in Consensus sequence
In-Reply-To:
References: <20030428151443.96788.qmail@web41104.mail.yahoo.com>
Message-ID: <1060886069.562.36.camel@motox>

The Bio::LocatableSeq object is good at doing what you're looking for.

    #!/usr/local/bin/perl -w
    use Bio::LocatableSeq;
    my $consensus = "AGG*TGAC**TA***AGTCCT*T";

    ## gaps are represented using '-' instead of '*' in this object.
    ## either way, though, we need a count of the number of gaps.
    ## if your seq already had '-' representing gaps, then
    ##   my $gaps = ($consensus =~ s/-/-/g);
    ## would still work.
    my $gaps = ($consensus =~ s/\*/-/g);
    my $seq = new Bio::LocatableSeq(-seq => $consensus,
                                    -id => "consensus",
                                    -start => 1,
                                    -end => length($consensus)-$gaps);

    print map { "$_\t" . $seq->column_from_residue_number($_) . "\n" }
              ($seq->start..$seq->end);

From prakash at ece.arizona.edu Thu Aug 14 15:11:01 2003
From: prakash at ece.arizona.edu (Jayaprakash Rudraraju)
Date: Thu Aug 14 15:07:04 2003
Subject: [Bioperl-l] Padded position in Consensus sequence
In-Reply-To: <1060886069.562.36.camel@motox>
References: <20030428151443.96788.qmail@web41104.mail.yahoo.com> <1060886069.562.36.camel@motox>
Message-ID:

Hi Jamie,

Just tested it.  Works great.  Thanks for the reply and nice to know
that you are just a few blocks away from here.  I work as a programmer
in Arizona Respiratory Center at the University Medical Center.  I don't
use modules in perl, except when I have to deal with excel spreadsheets.
Our lab is not involved in any software research projects, so I never
found it useful to use bio-perl modules.  But, I will read the module
documentation for Bio::LocatableSeq to also look for other functions,
which might be useful in future.

Thanks,
Prakash.

--
"Whatever women do they must do twice as well as men to be thought half
as good. Fortunately, this is not difficult."
                -- Charlotte Whitton

From ben at cse.wustl.edu Thu Aug 14 15:19:09 2003
From: ben at cse.wustl.edu (Ben Westover)
Date: Thu Aug 14 15:18:35 2003
Subject: [Bioperl-l] Please help me with Footprinter
Message-ID:

Dear Friends,

I am relatively new to bioperl and I am having trouble when I try to run
Footprinter.  When running what seems to be a fairly straightforward
piece of code I found at
http://docs.bioperl.org/bioperl-run/Bio/Tools/Run/FootPrinter.html
I get the following error:

Can't call method "close" on an undefined value at
/usr/lib/perl5/site_perl/5.6.1/Bio/Tools/Run/FootPrinter.pm line 418.

I am including the relevant bits of code/information in case anyone can
help me out.  My main question is where does the filename in $tfh1 come
from and how can I set it and why is it necessary?  Any help I can get
would be greatly appreciated.

Warm Regards,
Ben

The treefile is in the local directory and contains the following:
(A,(B,(C,(D,E))))

I created the array of sequences with a set of calls:

    my @seqs;
    for(my $i=0;$i<$n;$i++){
        my $tmp = Bio::Seq->new(-display_id => $id[$i], -seq => $seq[$i]);
        push(@seqs, $tmp);
    }

*** Below is the code ***

    my @footprinter_params = (
        'size'=>8,
        'max_mutations_per_branch'=>4,
        'sequence_type'=>'upstream',
        'subregion_size'=>30,
        'position_change_cost'=>3,
        'triplet_filtering'=>1,
        'pair_filtering'=>1,
        'post_filtering'=>1,
        'inversion_cost'=>1,
        'max_mutations'=>4,
        'program'=>"FootPrinter",
        'tree' =>"treefile",
        'verbose'=>1);
    my $footprinter_factory =
        Bio::Tools::Run::FootPrinter->new(@footprinter_params);

    my @fp = $footprinter_factory->run(@seqs);

*** The offending line is the call to $tfh1->close in _setinput, which is
called from run as shown below. ***

    sub _setinput {
        my ($self,@seq) = @_;
        my ($tfh1,$outfile1);
        $outfile1 = $self->outfile_name();
        if (defined $outfile1) {
            $self->io()->_initialize_io(-file => $tfh1);
        } else {
            ($tfh1,$outfile1) = $self->io->tempfile(-dir=>$self->tempdir);
        }
        my $out1 = Bio::SeqIO->new(-fh=> $tfh1 , '-format' => 'Fasta');
        foreach my $seq(@seq){
            $seq->isa("Bio::PrimarySeqI") || $self->throw("Need a Bio::PrimarySeq compliant object for FootPrinter");
            $out1->write_seq($seq);
        }
        $tfh1->close;
        undef($tfh1);
        return ($outfile1);
    }

    sub run {
        my ($self,@seq) = @_;

        #need at least 2 for comparative genomics duh.
        $#seq > 0 || $self->throw("Need at least two sequences");
        $self->tree || $self->throw("Need to specify a phylogenetic tree using -tree option");

        my $infile = $self->_setinput(@seq);

        my $param_string = $self->_setparams();
        my @footprint_feats = $self->_run($infile,$self->tree,$param_string);
        return @footprint_feats;

    }

From jason at cgt.duhs.duke.edu Thu Aug 14 17:20:58 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Thu Aug 14 17:03:09 2003
Subject: [Bioperl-l] Please help me with Footprinter
In-Reply-To:
References:
Message-ID:

Ugh, there are some major bugs in the module's code.

It should be written like this:

    if( $outfile1 ) {
        open($tfh1, ">$outfile1") || $self->throw("$outfile1: $!");
    } else {
        ($tfh1,$outfile1) = $self->io->tempfile(-dir=>$self->tempdir);
    }

And at the bottom of the method it should be written:

    $out1->close;   # close the SeqIO object
    close($tfh1);   # close the filehandle just in case
    undef($tfh1);   # really get rid of it
    return ($outfile1);

I'll make the necessary changes.  Quite possibly it was me who did this
in some sort of haste...

Updates coming to CVS in about 15 minutes...

-jason

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From jiang.qian at jhmi.edu Thu Aug 14 16:07:25 2003
From: jiang.qian at jhmi.edu (Jiang Qian)
Date: Thu Aug 14 17:27:01 2003
Subject: [Bioperl-l] question on installation
Message-ID: <3F3BEBFD.1EED0072@jhmi.edu>

Hi,

I am new to BioPerl.  I installed the system on a Linux machine through
the procedure of "perl Makefile.PL" => "make" => "make test" => "make
install".  Everything seems okay.  But when I tried the script from the
tutorial,

    use Bio::Perl;
    $seq_object = get_sequence('swissprot',"ROA1_HUMAN");
    write_sequence(">roa1.fasta",'fasta',$seq_object);

I got the error message " Your system does not have IO::String installed
so the DB retrieval method is not available at
/usr/lib/perl5/site_perl/5.6.1/Bio/Perl.pm line 466
Bio::Perl::get_sequence('swissprot', 'ROA1_HUMAN') called at
./try_bioperl.pl line 7".

What's the problem?  Can anybody help me with this?  Thanks,

-Jiang

From laurichj at bioinfo.ucr.edu Thu Aug 14 18:06:12 2003
From: laurichj at bioinfo.ucr.edu (Josh Lauricha)
Date: Thu Aug 14 18:05:38 2003
Subject: [Bioperl-l] question on installation
In-Reply-To: <3F3BF229.96A62D3E@jhmi.edu>
References: <3F3BEBFD.1EED0072@jhmi.edu> <20030814213802.GA14059@bioinfo.ucr.edu> <3F3BF229.96A62D3E@jhmi.edu>
Message-ID: <20030814220612.GA14978@bioinfo.ucr.edu>

On Thu 16:33, Jiang Qian wrote:
>
> Hi Josh,
>
> Thank you very much for your help. It works.
> I have one more question: > I tried to run the example script in bioperl/example directory. However, one > script gave me the following message: > "Can't locate Bio/Ontology/simpleGOparser.pm in @INC (@INC contains: > BEGIN failed--compilation aborted at ./simpleGOparser_example.pl line 5." It looks like the Bio::Ontology::simpleGOparser is no longer part of the main distribution, or it just got renamed. I'm not familiar with this stuff, so does anyone know? -- ---------------------------- | Josh Lauricha | | laurichj@bioinfo.ucr.edu | | Bioinformatics, UCR | |--------------------------| From steve_chervitz at affymetrix.com Thu Aug 14 18:54:44 2003 From: steve_chervitz at affymetrix.com (Steve Chervitz) Date: Thu Aug 14 18:54:04 2003 Subject: [Bioperl-l] the roundup (long) In-Reply-To: Message-ID: <4BEC936C-CEAA-11D7-BF9B-000A95765236@affymetrix.com> On Thursday, Aug 14, 2003, at 05:33 US/Pacific, Jason Stajich wrote: > I've added a bunch of new things and fixed some bugs, wanted to try and > summarize before I get too busy and forget the details. Good on you, Jason. > [Bio::SearchIO] > > Additionally a SearchIO::blasttable has been added which can parse > NCBI -m > 8 or -m 9 output for those just needing some minimal information. Just the other week, I was thinking that Bioperl needs this. Glad to see you've already done it. Parsing speed and memory usage should be improved, eh? Speaking of blast table parsing, it would really be useful if the NCBI blast tabular output included the length of the hit and query sequences. This would enable the frac_aligned methods of the Hit and HSP objects to work. Being able to determine the percent of the hit and query that are aligned is often vital. I recently mentioned this to a member of the NCBI blast development team (Tom Madden) and he seemed pretty receptive to putting this in. Don't know what the timeframe would be. 
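For anyone who already has the sequence lengths from another source, the fraction aligned can be worked out from the -m 8 columns by hand. A minimal sketch in plain Perl, assuming the usual 12-column tabular layout; parse_m8_line, frac_covered, the sample line and the lengths are all made up for illustration and are not part of Bioperl:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column order of NCBI blastall -m 8 tabular output.
my @COLS = qw(query hit pct_id aln_len mismatches gap_opens
              q_start q_end h_start h_end evalue bits);

# Hypothetical helper: split one tab-delimited -m 8 line into a hash ref.
sub parse_m8_line {
    my ($line) = @_;
    chomp $line;
    my %row;
    @row{@COLS} = split /\t/, $line;
    return \%row;
}

# Hypothetical helper: fraction of a sequence covered by one HSP,
# given the full sequence length from some outside source.
sub frac_covered {
    my ($start, $end, $length) = @_;
    ($start, $end) = ($end, $start) if $start > $end;   # minus-strand hits
    return ($end - $start + 1) / $length;
}

# Made-up example line and made-up lengths, for illustration only.
my $row = parse_m8_line(join "\t",
    qw(q1 h1 95.00 100 5 0 1 100 201 300 1e-50 180));
my %length = (q1 => 200, h1 => 400);

printf "%s vs %s: query %.2f, hit %.2f\n", $row->{query}, $row->{hit},
    frac_covered(@{$row}{qw(q_start q_end)}, $length{ $row->{query} }),
    frac_covered(@{$row}{qw(h_start h_end)}, $length{ $row->{hit} });
```

With the sample values this reports a query coverage of 0.50 and a hit coverage of 0.25. It only looks at a single HSP; combining several HSPs per hit would need overlap handling, which is left out here.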
Steve

From jason at cgt.duhs.duke.edu Thu Aug 14 22:24:43 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Thu Aug 14 22:06:40 2003
Subject: [Bioperl-l] question on installation
In-Reply-To: <3F3BEBFD.1EED0072@jhmi.edu>
References: <3F3BEBFD.1EED0072@jhmi.edu>
Message-ID:

As the error message states, you need to install an additional module
called IO::String.

http://www.cpan.org

On Thu, 14 Aug 2003, Jiang Qian wrote:

>
> Hi,
>
> I am new to BioPerl. I installed the system on a Linux machine through
> the procedure of "perl Makefile.PL" => "make" => "make test" => "make
> install". Everything seems okay. But when I tried the script from the
> tutorial,
>
> use Bio::Perl;
> $seq_object = get_sequence('swissprot',"ROA1_HUMAN");
> write_sequence(">roa1.fasta",'fasta',$seq_object);
>
> I got the error message "Your system does not have IO::String installed
> so the DB retrieval method is not available at
> /usr/lib/perl5/site_perl/5.6.1/Bio/Perl.pm line 466
> Bio::Perl::get_sequence('swissprot', 'ROA1_HUMAN') called at
> ./try_bioperl.pl line 7".
>
> What's the problem? Can anybody help me with this? Thanks,
>
> -Jiang
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From wes.barris at csiro.au Thu Aug 14 23:54:44 2003
From: wes.barris at csiro.au (Wes Barris)
Date: Thu Aug 14 23:54:36 2003
Subject: [Bioperl-l] problem with Graphics
In-Reply-To: <200308121545.33474.sobrien@umail.ucsb.edu>
References: <200308121545.33474.sobrien@umail.ucsb.edu>
Message-ID: <3F3C5984.3070401@csiro.au>

Sean O'Brien wrote:
> Hi,
>
> I have been trying to get BioPerl to output png's, but I seem to be getting
> invalid png files. I have a fresh install of libgd, version 2.0.15 and my GD
> version is 2.07.
> I installed Bundle::BioPerl, and after having no luck, I
> installed BioPerl version 1.2.2 from the sources in
> current_core_stable.tar.gz. When I run 'make test' I get an ok for
> BioGraphics. Also, when I run the first script described in the Bio Graphics
> tutorial, it runs with no errors and outputs some data which appears as
> though it could be an image. However, the file seems to be of an invalid png
> format because it cannot be opened by display, galeon or the GIMP. This is
> pretty frustrating because everything appears to be installing/running fine,
> but then the image is somehow corrupted. What might I have done wrong, or
> what do I need to do to make this work? Thanks.

Hi Sean,

The examples that output png files never worked for me either. To fix them,
I changed this line:

print $panel->png;

to something like this:

open(OUT, ">junk.png");
binmode OUT;
print OUT $panel->png;
close(OUT);
print("Wrote junk.png\n");

--
Wes Barris
E-Mail: Wes.Barris@csiro.au

From jimmyfernandez at hotmail.com Fri Aug 15 04:43:11 2003
From: jimmyfernandez at hotmail.com (Jimmy Fernandez)
Date: Fri Aug 15 04:42:33 2003
Subject: [Bioperl-l] octamers search
Message-ID:

Hi all

I have a list of octamers, each with a pair of scores x and y. I need to
search my sequences to see whether they contain any of these octamers
with a score of x+y >= 20. How may I proceed to write a simple script
that can allow me to do this, please?
Hope someone out there can help a newbie!
thanks in advance :)
jimmy

From Guido.Dieterich at gbf.de Tue Aug 19 07:21:53 2003
From: Guido.Dieterich at gbf.de (Guido Dieterich)
Date: Tue Aug 19 07:21:08 2003
Subject: [Bioperl-l] blast Bio::Tools::Run::StandAloneBlast BlastFindWords returned non-zero status
Message-ID: <3F420851.20700@gbf.de>

Hi,

i have a strange problem.
I use

$factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastp',
                                                 'outfile' => 'bl2seq.out');

$report = $factory->bl2seq($query2, $query1);

in case $query2->seq is "CMWDFDDXMPPADEDYSPWQLWLLA" Blast runs without
warnings

in case $query2->seq is "CMWDFDDXMPPADEDYSPWQLWLLS" Blast warns
# last AA can also be T or G
[bl2seq] WARNING: [000.000] BlastFindWords returned non-zero status
[bl2seq] WARNING: [000.000] SetUpBlastSearch failed.

????????????

Thanks

Guido

--
Dr. Guido Dieterich
Dipl.-Biologe
BioComputing
SB - Strukturbiologie
GBF - Gesellschaft fuer Biotechnologische Forschung
German Research Centre for Biotechnology
Mascheroder Weg 1
D-38124 Braunschweig
Tel: +(49) 531 6181 745
FAX: +(49) 531 2612 388
WWW: http://www.gbf.de
EMAIL: gdi@gbf.de
http://struktur.gbf.de/

It is not enough to know; one must also apply.
It is not enough to will; one must also do.
JOHANN WOLFGANG VON GOETHE, German poet (1749 - 1832)

From jason at cgt.duhs.duke.edu Tue Aug 19 08:43:15 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Tue Aug 19 08:23:50 2003
Subject: [Bioperl-l] blast Bio::Tools::Run::StandAloneBlast BlastFindWords returned non-zero status
In-Reply-To: <3F420851.20700@gbf.de>
References: <3F420851.20700@gbf.de>
Message-ID:

And if you take bioperl out of the loop and run bl2seq on the cmd line
what does it do?

On Tue, 19 Aug 2003, Guido Dieterich wrote:

> Hi,
>
> i have a strange problem.
> I use
>
> $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastp',
> 'outfile' => 'bl2seq.out');
>
> $report = $factory->bl2seq($query2, $query1);
>
> in case $query2->seq is "CMWDFDDXMPPADEDYSPWQLWLLA" Blast runs without
> warnings
>
> in case $query2->seq is "CMWDFDDXMPPADEDYSPWQLWLLS" Blast warns
> # last AA can also be T or G
> [bl2seq] WARNING: [000.000] BlastFindWords returned non-zero status
> [bl2seq] WARNING: [000.000] SetUpBlastSearch failed.
>
> ????????????
>
> Thanks
>
>
> Guido
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From kvddrift at earthlink.net Tue Aug 19 21:21:47 2003
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Tue Aug 19 21:23:23 2003
Subject: [Bioperl-l] t/DB test fails
Message-ID:

Hi,

From one of the developers at fink I got the following error report when
he tried to install bioperl on Mac OS X using fink:

2) complaints about missing (optional) perl modules:
SOAP::Lite Text::Shellwords HTML::Parser DBD::mysql Ace GD XML::Twig
XML::Parser::PerlSAX XML::Writer XML::Parser Graph::Directed IO::Scalar

3) 'make test' failed, which could be related to the missing modules.
Specifically:

t/DB................ok 61/78Use of uninitialized value in numeric gt (>) at t/DB.t line 244.
t/DB................ok 79/78Use of uninitialized value in numeric gt (>) at t/DB.t line 274.
t/DB................FAILED tests 62, 80
        Failed 2/78 tests, 97.44% okay (-1 skipped test: 75 okay, 96.15%)

I didn't see these errors on my system. Is this error related to one of
the optional modules that was not installed on the tester's Mac? If
that's the case, then I can make bioperl depend on that module. If not,
is there another reason for these errors?

thanks,

- Koen.
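
One way to see which of the optional modules from the list above are actually present on a given machine is simply to try loading each one. A rough sketch in plain Perl; have_module is a hypothetical helper, and the result only says what is installed, not whether a missing module explains the t/DB failures:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Optional modules named in the report above.
my @optional = qw(SOAP::Lite Text::Shellwords HTML::Parser DBD::mysql Ace GD
                  XML::Twig XML::Parser::PerlSAX XML::Writer XML::Parser
                  Graph::Directed IO::Scalar);

# Hypothetical helper: returns 1 if the named module can be loaded, else 0.
sub have_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (@optional) {
    printf "%-22s %s\n", $module, have_module($module) ? 'ok' : 'MISSING';
}
```

Comparing this listing between a machine where t/DB passes and one where it fails would at least narrow down which modules are worth looking at.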
From markw at illuminae.com Wed Aug 20 10:48:00 2003
From: markw at illuminae.com (Mark Wilkinson)
Date: Wed Aug 20 10:48:02 2003
Subject: [Bioperl-l] Bemusement with get_seq_by_gi in a CGI script
In-Reply-To:
References:
Message-ID: <1061390928.1720.43.camel@localhost.localdomain>

Hi all,

I've been banging my head for days on this problem, but I can't make
heads nor tails of it. The solution must be obvious, but I can't see
it.

The following script is running at
http://mobycentral.cbr.nrc.ca/cgi-bin/testseq.cgi

===============================================
use lib '/usr/local/apache/cgi-bin/bioperl/core';
print "Content-type: text/plain\n\n";
use Bio::DB::GenBank;
$d = Bio::DB::GenBank->new();
$seq = $d->get_Seq_by_gi('163483');
print "I didn't print the sequence!\n";
===============================================

If you look at the output you see that the genbank record is printed to
the screen, but the last print statement is not! If I run the same code
from the command line I don't see the record, and the last print
statement prints.

??!??

I'm at my wit's end trying to work out what is happening. This started
shortly after upgrading to mod_perl, so it might be associated with that
somehow, but I need some direction in trying to find my mistake.

Any advice appreciated - if only to save my sanity :-)

Cheers!
Mark

From simon.andrews at bbsrc.ac.uk Wed Aug 20 12:07:49 2003
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Wed Aug 20 12:09:48 2003
Subject: [Bioperl-l] Bemusement with get_seq_by_gi in a CGI script
Message-ID: <2DC41140A89ED411989D00508BDCD9ED01E28B28@bi-exsrv1.iapc.bbsrc.ac.uk>

> -----Original Message-----
> From: Mark Wilkinson [mailto:markw@illuminae.com]
> Sent: 20 August 2003 15:49
> To: Bioperl-l@portal.open-bio.org
> Subject: [Bioperl-l] Bemusement with get_seq_by_gi in a CGI script
>
> use lib '/usr/local/apache/cgi-bin/bioperl/core';
> print "Content-type: text/plain\n\n";
> use Bio::DB::GenBank;
> $d = Bio::DB::GenBank->new();
> $seq = $d->get_Seq_by_gi('163483');
> print "I didn't print the sequence!\n";
> ===============================================
>
> If you look at the output you see that the genbank record is
> printed to the screen, but the last print statement is not!
> If I run the same code from the command line I don't see the
> record, and the last print statement prints.

Actually that's not quite what's happening. If you view the raw http
traffic then what you actually get is the sequence, then the http
headers, then the "I didn't print the sequence" line.

lynx --dump http://mobycentral.cbr.nrc.ca/cgi-bin/testseq.cgi

Gives:

##########################################
LOCUS       BOVPANPRO     947 bp    mRNA    linear   MAM 29-APR-1996
DEFINITION  B.taurus prepreproelastase I mRNA, complete cds.
ACCESSION   M80838

[snip some stuff]

      781 tcttggataa ataatgccat tgccagcaac tgaacatctt cctgagtcca gtggtattcc
      841 caagatggtt ctgggattga cagcagaact tgaggccatc aaggaaaaaa ccagtctaag
      901 agactattga gccagatgtg gaaaagcaaa taaaatcgaa tatatgt
//

HTTP/1.1 200 OK
Date: Wed, 20 Aug 2003 15:40:02 GMT
Server: Apache/2.0.47 (Unix) mod_perl/1.99_09 Perl/v5.8.0 DAV/2
Connection: close
Content-Type: text/plain; charset=ISO-8859-1

I didn't print the sequence!
############################################

So your script is doing what it's supposed to, it's just that some other
stuff is getting out on STDOUT before your webserver is able to get in
on the act.

Having played a bit, this proves to be interesting:

#!/usr/bin/perl -w
use strict;
use Bio::DB::GenBank;

close STDOUT;

my $d = Bio::DB::GenBank->new();
my $seq = $d -> get_Seq_by_gi('163483');

This gives me:

print() on closed filehandle STDOUT at
/usr/lib/perl5/site_perl/5.8.0/Bio/DB/WebDBSeqI.pm line 701

So WebDBSeqI.pm is usurping STDOUT as part of its query. This probably
explains what you're getting. Apache will redirect STDOUT straight to
the return stream for the connection. This means it gets the output
intended for WebDBSeqI, and that appears in your program's output. You
then get the output you printed.

If this is right, you should have some interesting error messages in
your logs if you run your script with warnings enabled.

I can't see an immediate fix for this, short of running your fetch as a
completely detached process with a separate STDOUT, but that kind of
defeats the point of using mod_perl. The use of a pipe from STDOUT to
read the results of a web query seems pretty ingrained in WebDBSeqI.pm
and it may not be trivial to change it.

Maybe others will be able to think of a simpler work-round?

Simon.

From markw at illuminae.com Wed Aug 20 12:32:31 2003
From: markw at illuminae.com (Mark Wilkinson)
Date: Wed Aug 20 12:32:32 2003
Subject: [BioPerl] RE: [Bioperl-l] Bemusement with get_seq_by_gi in a CGI script
In-Reply-To: <2DC41140A89ED411989D00508BDCD9ED01E28B28@bi-exsrv1.iapc.bbsrc.ac.uk>
References: <2DC41140A89ED411989D00508BDCD9ED01E28B28@bi-exsrv1.iapc.bbsrc.ac.uk>
Message-ID: <1061397198.1718.73.camel@localhost.localdomain>

On Wed, 2003-08-20 at 10:07, simon andrews (BI) wrote:

> If this is right, you should have some interesting error messages in your logs if you run your script with warnings enabled.
I do indeed :-) This is what prompted me to write the little test script. Hmmm... well, I guess I will have to run the fetch in a separate process if that is what is required :-/ It's no big deal, I guess... albeit awkward. M From jason at cgt.duhs.duke.edu Wed Aug 20 13:05:12 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Wed Aug 20 12:45:20 2003 Subject: [Bioperl-l] Bemusement with get_seq_by_gi in a CGI script In-Reply-To: <2DC41140A89ED411989D00508BDCD9ED01E28B28@bi-exsrv1.iapc.bbsrc.ac.uk> References: <2DC41140A89ED411989D00508BDCD9ED01E28B28@bi-exsrv1.iapc.bbsrc.ac.uk> Message-ID: > So your script is doing what it's supposed to, it's just that some other > stuff is getting out on STDOUT before your webserver is able to get in > on the act. > > Having played a bit, this proves to be interesting: > > #!/usr/bin/perl -w > use strict; > use Bio::DB::GenBank; > > close STDOUT; > > my $d = Bio::DB::GenBank->new(); > my $seq = $d -> get_Seq_by_gi('163483'); > > > This gives me: > > print() on closed filehandle STDOUT at > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/WebDBSeqI.pm line 701 > > So WebDBSeqI.pm is usurping STDOUT as part of its query. This probably > explains what you're getting. Apache will redirect STDOUT straight to > the return stream for the connection. This means it gets the output > intended for WbDBSeq and it appears in your programs output. You then > get the output you printed. > This is part of Lincoln's rechaining of the IO and using fork - looking at his comments in the code. # Try to create a stream using POSIX fork-and-pipe facility. # this is a *big* win when fetching thousands of sequences from # a web database because we can return the first entry while # transmission is still in progress. # Also, no need to keep sequence in memory or in a temporary file. # If this fails (Windows, MacOS 9), we fall back to non-pipelined # access. 
You can turn this off by adding a -retrievaltype argument to the
DB::GenBank init:

my $db = new Bio::DB::GenBank(-retrievaltype => 'io_string');

The two alternatives are

-retrievaltype => 'io_string'
   (for in-memory holding of the sequence before parsing)
or
-retrievaltype => 'temp'
   (for use of tempfiles, but I'm not 100% sure this code has gotten a
    workout; the tempfiles may not be cleaned up until the program
    exits, which might be a problem for scripts running under mod_perl)

> If this is right, you should have some interesting error messages in
> your logs if you run your script with warnings enabled.
>
> I can't see an immediate fix for this, short of running your fetch as a
> completely detached process with a separate STDOUT, but that kind of
> defeats the point of using mod_perl. The use of a pipe from STDOUT to
> read the results of a web query seems pretty ingrained in WebDBSeqI.pm
> and it may not be trivial to change it.
>
> Maybe others will be able to think of a simpler work-round?
>
>
> Simon.
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From markw at illuminae.com Wed Aug 20 13:13:54 2003
From: markw at illuminae.com (Mark Wilkinson)
Date: Wed Aug 20 13:13:56 2003
Subject: [BioPerl] RE: [Bioperl-l] Bemusement with get_seq_by_gi in a CGI script
In-Reply-To:
References: <2DC41140A89ED411989D00508BDCD9ED01E28B28@bi-exsrv1.iapc.bbsrc.ac.uk>
Message-ID: <1061399682.1720.91.camel@localhost.localdomain>

On Wed, 2003-08-20 at 11:05, Jason Stajich wrote:

> You can turn this off by adding to the DB::GenBank init
> my $db = new Bio::DB::GenBank(-retrievaltype => 'io_string');

that solved it. thanks! I would never have thought to look there.
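
For reference, the test script from earlier in the thread with that one change applied might look like the sketch below. It assumes Bioperl and IO::String are installed and that the GenBank fetch succeeds; the final printf line is only an illustrative way of showing that the record arrived, not something from the original script:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GenBank;

# Send the CGI header first; with 'io_string' the fetch itself
# should no longer write the raw record to STDOUT.
print "Content-type: text/plain\n\n";

# Hold the fetched record in memory (via IO::String) instead of
# using the fork-and-pipe stream.
my $db  = Bio::DB::GenBank->new(-retrievaltype => 'io_string');
my $seq = $db->get_Seq_by_gi('163483');

# Illustrative only: report what was fetched.
printf "Fetched %s (%d bp)\n", $seq->display_id, $seq->length;
```

The trade-off, going by the comments quoted above, is that the whole record is held in memory before parsing, so the streaming advantage for very large batches is lost.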
M From kvddrift at earthlink.net Wed Aug 20 22:23:35 2003 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed Aug 20 22:24:58 2003 Subject: [Bioperl-l] cgi script problem Message-ID: <775F8A07-D37E-11D7-AC88-003065A5FDCC@earthlink.net> Hi, To test my bioperl installation through fink on Mac OS X, I took one of the cgi scripts (frend), and put it in my local cgi-bin folder. When I run the script on localhost, I get the following error: [Wed Aug 20 21:59:52 2003] [error] [client 127.0.0.1] Premature end of script headers: /Users/koen/Sites/cgi-bin/frend Can't locate Bio/Graphics/Panel.pm in @INC (@INC contains: /System/Library/Perl/darwin /System/Library/Perl /Library/Perl/darwin /Library/Perl /Library/Perl /Network/Library/Perl/darwin /Network/Library/Perl /Network/Library/Perl .) at /Users/koen/Sites/cgi-bin/frend line 5. BEGIN failed--compilation aborted at /Users/koen/Sites/cgi-bin/frend line 5. Bioperl indeed is installed in the fink directory /sw which is not included in the @INC above. But, if I run perl -V, /sw is included in @INC: @INC: /sw/lib/perl5/5.8.0/darwin /sw/lib/perl5/5.8.0 /sw/lib/perl5/darwin /sw/lib/perl5 /System/Library/Perl/darwin .... I have other scripts in cgi-bin, and they work, so I know that apache is configured ok. I also run bptutorial, and that works fine too. Any suggestions how I can fix this? thanks, - Koen. From kvddrift at earthlink.net Thu Aug 21 06:23:20 2003 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu Aug 21 06:24:43 2003 Subject: [Bioperl-l] cgi script problem [solved] In-Reply-To: <775F8A07-D37E-11D7-AC88-003065A5FDCC@earthlink.net> Message-ID: <7CC84F06-D3C1-11D7-A799-003065A5FDCC@earthlink.net> On Wednesday, August 20, 2003, at 10:23 PM, Koen van der Drift wrote: > > I have other scripts in cgi-bin, and they work, so I know that apache > is configured ok. I also run bptutorial, and that works fine too. > > > Any suggestions how I can fix this? Found it! 
I have to add the line

use lib '/path/to/fink/perl/modules/';

- Koen.

From jason at cgt.duhs.duke.edu Thu Aug 21 08:48:46 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Thu Aug 21 08:28:35 2003
Subject: [Bioperl-l] cgi script problem
In-Reply-To: <775F8A07-D37E-11D7-AC88-003065A5FDCC@earthlink.net>
References: <775F8A07-D37E-11D7-AC88-003065A5FDCC@earthlink.net>
Message-ID:

The Apache runtime environment is different from your own user
environment, so you need to add the /sw path either to the script or to
your Apache environment. The best way to do this will depend on whether
or not you are running mod_perl.

-jason

On Wed, 20 Aug 2003, Koen van der Drift wrote:

> Hi,
>
> To test my bioperl installation through fink on Mac OS X, I took one of
> the cgi scripts (frend), and put it in my local cgi-bin folder. When I
> run the script on localhost, I get the following error:
>
> [Wed Aug 20 21:59:52 2003] [error] [client 127.0.0.1] Premature end of
> script headers: /Users/koen/Sites/cgi-bin/frend
> Can't locate Bio/Graphics/Panel.pm in @INC (@INC contains:
> /System/Library/Perl/darwin /System/Library/Perl /Library/Perl/darwin
> /Library/Perl /Library/Perl /Network/Library/Perl/darwin
> /Network/Library/Perl /Network/Library/Perl .) at
> /Users/koen/Sites/cgi-bin/frend line 5.
> BEGIN failed--compilation aborted at /Users/koen/Sites/cgi-bin/frend
> line 5.
>
>
> Bioperl indeed is installed in the fink directory /sw which is not
> included in the @INC above. But, if I run perl -V, /sw is included in
> @INC:
>
> @INC:
> /sw/lib/perl5/5.8.0/darwin
> /sw/lib/perl5/5.8.0
> /sw/lib/perl5/darwin
> /sw/lib/perl5
> /System/Library/Perl/darwin
> ....
>
>
> I have other scripts in cgi-bin, and they work, so I know that apache
> is configured ok. I also run bptutorial, and that works fine too.
>
>
> Any suggestions how I can fix this?
>
>
> thanks,
>
>
> - Koen.
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From maasha at image.dk Thu Aug 21 08:39:02 2003 From: maasha at image.dk (Martin A. Hansen) Date: Thu Aug 21 08:39:37 2003 Subject: [Bioperl-l] parser for GCG flavor of FASTA ? Message-ID: <20030821123902.GA1554@image> hi does anyone have any code that can pipe GCG flavor FASTA reports to Bio::SearchIO ? martin From jason at cgt.duhs.duke.edu Thu Aug 21 09:18:45 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Thu Aug 21 08:58:35 2003 Subject: [Bioperl-l] parser for GCG flavor of FASTA ? In-Reply-To: <20030821123902.GA1554@image> References: <20030821123902.GA1554@image> Message-ID: Not I, but if you post an example report as a feature request to http://bugzilla.open-bio.org it might get on the to do list of a kind soul out there. On Thu, 21 Aug 2003, Martin A. Hansen wrote: > hi > > does anyone have any code that can pipe GCG flavor FASTA reports to > Bio::SearchIO ? > > > martin > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From andrewram200369 at hotmail.com Thu Aug 21 11:14:34 2003 From: andrewram200369 at hotmail.com (Andrew Ram) Date: Thu Aug 21 11:13:44 2003 Subject: [Bioperl-l] GFF to gene structure pictures Message-ID: Hi everyone I would like to convert my gene structures I have in GFF or GTF format to nice pictures probably using the BioGraphics tools. Can someone out there help me with any scripts? Thanks very much in advance-Looking forward to hearing from the group! Andrew _________________________________________________________________ Add photos to your messages with MSN 8. Get 2 months FREE*. 
http://join.msn.com/?page=features/featuredemail

From andrewram200369 at hotmail.com Thu Aug 21 11:18:08 2003
From: andrewram200369 at hotmail.com (Andrew Ram)
Date: Thu Aug 21 11:17:18 2003
Subject: [Bioperl-l] synteny
Message-ID:

Hi everyone, I would like to ask if anyone is familiar with any software
to plot synteny?
Thanks very much in advance.
A.Ram

_________________________________________________________________
Tired of spam? Get advanced junk mail protection with MSN 8.
http://join.msn.com/?page=features/junkmail

From jason at cgt.duhs.duke.edu Thu Aug 21 11:39:33 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Thu Aug 21 11:19:25 2003
Subject: [Bioperl-l] GFF to gene structure pictures
In-Reply-To:
References:
Message-ID:

Did you try Lincoln's HOWTO tutorial?

http://www.bioperl.org/HOWTOs/

There are also example scripts in scripts/graphics which have some
pretty useful starting point code.

-jason

On Thu, 21 Aug 2003, Andrew Ram wrote:

> Hi everyone
> I would like to convert my gene structures I have in GFF or GTF format to
> nice pictures probably using the BioGraphics tools. Can someone out there
> help me with any scripts?
> Thanks very much in advance-Looking forward to hearing from the group!
> Andrew
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From jason at cgt.duhs.duke.edu Thu Aug 21 11:45:56 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Thu Aug 21 11:25:47 2003
Subject: [Bioperl-l] synteny
In-Reply-To:
References:
Message-ID:

you might consider using Gbrowse and the synteny viewer that Lincoln wrote.
An example:
http://www.wormbase.org/db/seq/ebsyn?name=cb25.fpc0143:1..8000

You'll have to take a dive into the Gbrowse documentation to understand
how to set this up.

-jason

On Thu, 21 Aug 2003, Andrew Ram wrote:

> Hi everyone, I would like to ask if anyone is familiar with any software to
> plot synteny?
> Thanks very much in advance.
> A.Ram
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From hirst at interomex.com Thu Aug 21 11:34:42 2003
From: hirst at interomex.com (Martin Hirst)
Date: Thu Aug 21 11:26:43 2003
Subject: [Bioperl-l] automated response
Message-ID: <10308210834.AA105387241@interomex.com>

I will be out of the office starting August 20, 2003 and will not return
until September 2, 2003.

I am on Vacation. I will respond to your message when I return. If it is
urgent please contact Kane Tse at 6042678009

From simon.andrews at bbsrc.ac.uk Thu Aug 21 11:34:19 2003
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Thu Aug 21 11:36:59 2003
Subject: [Bioperl-l] synteny
Message-ID: <2DC41140A89ED411989D00508BDCD9ED01E28B2A@bi-exsrv1.iapc.bbsrc.ac.uk>

> -----Original Message-----
> From: Andrew Ram [mailto:andrewram200369@hotmail.com]
> Sent: 21 August 2003 16:18
> To: bioperl-l@bioperl.org
> Subject: [Bioperl-l] synteny
>
>
> Hi everyone, I would like to ask if anyone is familiar with any
> software to plot synteny?

Not a Perl answer, and it depends what you're looking for, but we use
ACT for this purpose

http://www.sanger.ac.uk/Software/ACT/

Simon.

From maasha at image.dk Thu Aug 21 09:02:21 2003
From: maasha at image.dk (Martin A.
Hansen) Date: Thu Aug 21 18:34:46 2003 Subject: [Bioperl-l] parser for GCG flavor of FASTA ? In-Reply-To: References: <20030821123902.GA1554@image> Message-ID: <20030821130221.GB1554@image> On Thu, Aug 21, 2003 at 09:18:45AM -0400, Jason Stajich wrote: > Not I, but if you post an example report as a feature request to > http://bugzilla.open-bio.org it might get on the to do list of a kind soul > out there. hm, i was thinking that maybe somebody already wrote the parser for this - so maybe ill wait around to see if anyone responds - and then request. anyways - ill attach a sample file. martin > > > On Thu, 21 Aug 2003, Martin A. Hansen wrote: > > > hi > > > > does anyone have any code that can pipe GCG flavor FASTA reports to > > Bio::SearchIO ? > > > > > > martin > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu -------------- next part -------------- !!SEQUENCE_LIST 1.0 (Nucleotide) FASTA of: BTG1.seq from: 1 to: 21 June 3, 2002 10:51 Oligo BTG1 seq TO: @/usr/users/ddbase/seq/seq.all Sequences: 9,324 Symbols: 31,508,009 Word Size: 6 Sequences too short to analyze: 9 (27 symbols) Sequences skipped due to type mismatch with query: 2 Searching with both strands of the query. 
Scoring matrix: GenRunData:fastadna.cmp Constant pamfactor used Gap creation penalty: 16 Gap extension penalty: 4 Histogram Key: Each histogram symbol represents 32 search set sequences Each inset symbol represents 1 search set sequences z-scores computed from opt scores z-score obs exp (=) (*) < 20 1920 0:============================================================ 22 2 0:= 24 0 0: 26 6 0:= 28 6 2:* 30 20 10:* 32 32 39:=* 34 91 107:===* 36 146 220:===== * 38 287 363:========= * 40 556 506:===============*== 42 691 619:===================*== 44 888 683:=====================*====== 46 818 695:=====================*==== 48 687 666:====================*= 50 555 607:==================* 52 442 534:============== * 54 307 456:========== * 56 314 381:========== * 58 314 313:=========* 60 258 253:=======*= 62 195 203:======* 64 137 162:=====* 66 147 128:===*= 68 108 100:===* 70 83 79:==* 72 91 62:=*= 74 37 48:=* 76 39 37:=* 78 38 29:*= 80 30 23:* 82 21 17:* 84 20 14:* 86 9 11:* 88 9 8:* 90 5 6:* 92 7 5:* :====*== 94 1 4:* := * 96 5 3:* :==*== 98 0 2:* : * 100 1 2:* :=* 102 0 1:* :* 104 0 1:* :* 106 0 1:* :* 108 0 1:* :* 110 1 0:= *= 112 0 0: * 114 0 0: * 116 0 0: * 118 0 0: * >120 0 0: * Joining threshold: 45, opt. threshold: 30, opt. width: 16, reg.-scaled The best scores are: init1 initn opt z-sc E(7402).. /usr/users/ddbase/seq/Mouse/Rat/Masami/mpla2/ac002406.emrod Begin: 26864 End: 26884 ! ID AC002406 standard; DNA; ROD; 194... 72 72 78 89.3 0.59 /usr/users/ddbase/seq/Anne_Mette/nrf1aga1amb22/OtherBand/nrf1aga4amb27-al355520.emhum Begin: 75031 End: 75045 ! ID AL355520 standard; DNA; HUM; 157... 75 75 75 85.2 1.2 /usr/users/ddbase/seq/Mouse/Rat/Masami/Lene-seqs/U-Repeat-TC104374.seq Begin: 915 End: 935 Strand: - ! TC104374 from TIGR. Similar to mpla... 40 40 78 109.4 1.3 \\End of List BTG1.seq /usr/users/ddbase/seq/Mouse/Rat/Masami/mpla2/ac002406.emrod ID AC002406 standard; DNA; ROD; 194985 BP. XX AC AC002406; XX SV AC002406.1 XX . . . 
SCORES Init1: 72 Initn: 72 Opt: 78 z-score: 89.3 E(): 0.59 >>/usr/users/ddbase/seq/Mouse/Rat/Masami/mpla2/ac002406.emrod (194985 nt) initn: 72 init1: 72 opt: 78 Z-score: 89.3 expect(): 0.59 85.7% identity in 21 nt overlap (1-21:26864-26884) 10 20 BTG1.seq GTGACAGTGCCATAGTTTGGA || |||||| || |||||||| ac002406.emr TAGAATTGGGAACAATCACCCATGGAAGGAGTTACAGTGACAAAGTTTGGAGCTGAGACA 26840 26850 26860 26870 26880 26890 ac002406.emr AAAGGATGGAACATCTAGAGACTGCCGTATCCAGAGATCCATCCCATAATTAGCCTCCAA 26900 26910 26920 26930 26940 26950 BTG1.seq /usr/users/ddbase/seq/Anne_Mette/nrf1aga1amb22/OtherBand/nrf1aga4amb27-al355520.emhum ID AL355520 standard; DNA; HUM; 157575 BP. XX AC AL355520; XX SV AL355520.8 XX . . . SCORES Init1: 75 Initn: 75 Opt: 75 z-score: 85.2 E(): 1.2 >>/usr/users/ddbase/seq/Anne_Mette/nrf1aga1amb22/OtherBand/nrf1aga4amb27-al355520.emhum (157575 nt) initn: 75 init1: 75 opt: 75 Z-score: 85.2 expect(): 1.2 100.0% identity in 15 nt overlap (6-20:75031-75045) 10 20 BTG1.seq GTGACAGTGCCATAGTTTGGA ||||||||||||||| nrf1aga4amb2 CTCGGAATCTGATTCCACATGGACATAGGAAGTGCCATAGTTTGGGTTATAAGTCAGCAT 75010 75020 75030 75040 75050 75060 nrf1aga4amb2 TTTTAATTTTATCTTTCAAATTTTTAAGTCTTTTGTAATTGGATTTATTGTCGATTTATT 75070 75080 75090 75100 75110 75120 BTG1.seq /rev /usr/users/ddbase/seq/Mouse/Rat/Masami/Lene-seqs/U-Repeat-TC104374.seq TC104374 from TIGR. Similar to mpla2rcg3m35.seq, etc. 
(inverse-U repeat) retrovirus-related pol polyprotein (reverse transcriptase {Mus musculus} SP|P11369|POL2_MOUSE RETROV SCORES Init1: 40 Initn: 40 Opt: 78 z-score: 109.4 E(): 1.3 >>/usr/users/ddbase/seq/Mouse/Rat/Masami/Lene-seqs/U-Repeat-TC104374.seq (6469 nt) initn: 40 init1: 40 opt: 78 Z-score: 109.4 expect(): 1.3 85.7% identity in 21 nt overlap (21-1:915-935) 20 10 BTG1.seq TCCAAACTATGGCACTGTCAC |||||||| |||| |||| || U-Repeat-TC1 TCTCTACATGGTCCATCCTTTCATCTCAGCTCCAAACTTTGGCTCTGTAACTCCTTCCAT 890 900 910 920 930 940 U-Repeat-TC1 GGGTGTTTTGTTCCCAAATCTAAGGAGGGGCATAGTGTCCACACTTCAGTCTTCATTCTT 950 960 970 980 990 1000 ! Distributed over 1 thread. ! Start time: Mon Jun 3 10:46:53 2002 ! Completion time: Mon Jun 3 10:52:00 2002 ! CPU time used: ! Database scan: 0:01:27.1 ! Post-scan processing: 0:00:01.6 ! Total CPU time: 0:01:28.8 ! Output File: btg1.fasta From brian_osborne at cognia.com Fri Aug 22 08:37:08 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Aug 22 08:40:05 2003 Subject: [Bioperl-l] bioperl-db authors Message-ID: Ewan, Jason, Hilmar and Elia, Please take a quick look at the README in the bioperl-db package. There are statements in there about certain features that are or are not supported, you need to make sure that this is up-to-date. Thanks again, Brian O. From pm66 at nyu.edu Fri Aug 22 18:23:37 2003 From: pm66 at nyu.edu (Philip MacMenamin) Date: Fri Aug 22 18:22:10 2003 Subject: [Bioperl-l] Bio::Graphics::Panel, -spacing => 0 constructor problem. Message-ID: <200308222223.h7MMN14c022854@mx3.nyu.edu> Hi, Am I right in thinking that the '-spacing' constructor for Bio::Graphics::Panel, if set to 0, should result in no space between tracks? ie That the tracks are either over-laying eachother, or squashed down onto the same plane as the previous one. I cannot get this to happen if this is its purpose. It continues to stack the tracks on the panel as per default. (So I have left it out, its not useful to see this). 
I know that it works, because I have seen it working in the WormBase UTRs. Here is some code: # if (scalar @threePrimeUTR >0) # { # $panel->add_track(generic=>\@threePrimeUTR, # -bgcolor => 'lightblue', # -fgcolor => 'black', # # -bump => +1, # -spacing => 0, # -utr_color => '#D0D0D0', ## what's this about? changing it makes no difference # -font2color => 'blue', # -height => 10, # -description => 1, # -label => '3 prime UTR' # ); # } I have also tried to set spacing to 0 on the tracks surrounding the UTRs, but to no avail. Also, in a slightly different vein, I can't seem to get the Bio::Graphics::Panel -start and -end constructor arguments to work either. All of which is making me increasingly suspicious of my Perl skills. It just makes no difference whether I provide these arguments or not. The segment or sequence object always overrides the start and end arguments. Not a massive problem, but it has confused me. More code: my $panel = Bio::Graphics::Panel->new( -segment => $segment, -width => 600, -key_color => '#ffffcc', -start =>$panelStart, -end =>$panelEnd, # -start => 4110000, ); Any help is of course appreciated. -- Philip. From rysz_c3 at yahoo.com Sat Aug 23 03:54:23 2003 From: rysz_c3 at yahoo.com (Flywheel) Date: Sat Aug 23 06:53:56 2003 Subject: [Bioperl-l] Energy Storage Message-ID: <200308231053.h7NArdvT002344@localhost.localdomain> Nice Hello from Flywheel Storage & Sun Tracking Good Bay to Rolling Blackout forever Safety Solution Rolling Blackout will always happened, since the SYSTEM - is design to protect itself in dangerous Over-Current Situations. Breakers shut down basic duty is defense whole system, and isolate from Grid; Bad "wave of over-current" travel to next Sub-Station and stimulate the same results on another, then another, then another. And a VOLTAGE is growing rapidly. Rolling Growing Disconnection in Domino Effect; we did seen twice; Accident ~ 30 years ago (baby boom) - for purpose recently.
To decrease all Over-Current situation in POWER LINE or reduce to Zero ALL EXCESS ENERGY USE Energy Storage 25 - 1000 MWh and no Sub - Station will ever have any risk again. Small units UPS 20 - 200 kWh protect Stories, mainframes, servers, computers and homes, give back emergency energy for hours & days and/or time for diesel generator to work Rest you can find on: www.suntracking.com www.flywheelstorage.com Have a nice Weekend ! From maasha at image.dk Mon Aug 25 10:14:39 2003 From: maasha at image.dk (Martin A. Hansen) Date: Mon Aug 25 10:15:03 2003 Subject: [Bioperl-l] Bio::SeachIO::Fasta problem Message-ID: <20030825141439.GB1558@image> hi im trying to parse fasta search reports with Bio::SeachIO.
however, i get this warning message: maasha@homer:~/bin$ parse_fasta btg1.fasta -------------------- WARNING --------------------- MSG: unrecognized FASTA Family report file! --------------------------------------------------- this indicates that there might be something wrong with the fasta report file, but im not sure what that could be. im i supposed to run a certain version of fasta? and with a certain set of options? e.g. i have noticed that running fasta from the wisconsin packages (GCG) outputs a double dot (..) between the introtext and the data: The best scores are: init1 initn opt z-sc E(7402).. whereas running "normal" fasta does not produce the double dot? and to really twist the fork i am failing in identifying the different fasta versions :/ anyways, here is the snippet of code im using to parse: #!/usr/bin/perl -w use strict; use Bio::SearchIO; my ( $script, $usage, $file ); $script = ( split "/", $0 )[ -1 ]; $usage = qq( $script by Martin A. Hansen, August 2003. $script parses a FASTA report file Usage: $script [file] [file] - file with fasta report ); print $usage and exit if not @ARGV; $file = shift @ARGV; # >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> MAIN <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< my ( $lines ); $lines = &parse_fasta( $file ); print "$_\n" foreach @{ $lines }; exit; # >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SUBROUTINES <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< sub parse_fasta { # Martin A. Hansen, August 2003. 
# parses FASTA reports using Bioperl my ( $file, # file with FASTA report ) = @_; # returns list of sequence lines my ( $result, $hit, $hit_name, $searchio, $white_space, $query_beg, $hsp, $hit_string, @lines, $query_string, $query_name ); $searchio = Bio::SearchIO->new( -format => 'fasta', -file => $file ); $result = $searchio->next_result; while ( $hit = $result->next_hit ) { $query_name = $result->query_name; $hit_name = $hit->name; $hsp = $hit->next_hsp; $query_string = $hsp->query_string; $query_beg = $hsp->query->start; $hit_string = $hsp->hit_string; $white_space = ' ' x ( $query_beg - 1 ); push @lines, { "QUERY_NAME" => $query_name, "QUERY_STRING" => $white_space . $query_string, "SUBJECT_NAME" => $hit_name, "SUBJECT_STRING" => $white_space . $hit_string, } } return wantarray ? @lines : \@lines; } # >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< __END__ any suggestions? martin From etbridge at tuethkeeney.com Mon Aug 25 11:37:51 2003 From: etbridge at tuethkeeney.com (Eileen T. Bridge) Date: Mon Aug 25 11:36:55 2003 Subject: [Bioperl-l] blastall parser Message-ID: Tony, We are trying to reach you to forward some documents to you. Please call us as soon as possible! Eileen Bridge Tueth, Keeney, Cooper, Mohan & Jackstadt, P.C. 425 South Woods Mill Road, Suite 300 St. Louis, MO 63017-3492 ph: 636-237-2570 fax: 636-237-2601 email: ebridge@tuethkeeney.com From jason at cgt.duhs.duke.edu Mon Aug 25 13:52:42 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Mon Aug 25 13:31:07 2003 Subject: [Bioperl-l] Bio::SeachIO::Fasta problem In-Reply-To: <20030825141439.GB1558@image> References: <20030825141439.GB1558@image> Message-ID: Martin - it's tested on FASTA 3.4 and some versions of 3.3. It can parse the -m 9 tabular output as well as standard default output (with or without Histograms).
Personally I would just use the latest distribution: ftp://ftp.virginia.edu/pub/fasta/fasta3.shar.Z It has not been tested with the GCG-ized FASTA and as you report it doesn't seem to work. I took the liberty of posting a bug report for you with an example report as this is the type of information needed for someone to diagnose a problem. I don't know that fixing this will get a priority given that it is pretty easy to install and run FASTA directly from Bill's distro and we can parse that output just fine. -jason On Mon, 25 Aug 2003, Martin A. Hansen wrote: > hi > > im trying to parse fasta search reports with Bio::SeachIO. however, i get this > warning message: > > maasha@homer:~/bin$ parse_fasta btg1.fasta > > -------------------- WARNING --------------------- > MSG: unrecognized FASTA Family report file! > --------------------------------------------------- > > this indicates that there might be something wrong with the fasta report file, > but im not sure what that could be. im i supposed to run a certain version of > fasta? and with a certain set of options? e.g. i have noticed that running > fasta from the wisconsin packages (GCG) outputs a double dot (..) between the > introtext and the data: > > The best scores are: init1 initn opt z-sc E(7402).. > > whereas running "normal" fasta does not produce the double dot? > > and to really twist the fork i am failing in identifying the different fasta > versions :/ > > anyways, here is the snippet of code im using to parse: > > > #!/usr/bin/perl -w > > use strict; > use Bio::SearchIO; > > my ( $script, $usage, $file ); > > $script = ( split "/", $0 )[ -1 ]; > > $usage = qq( > > $script by Martin A. Hansen, August 2003. 
> > $script parses a FASTA report file > > Usage: $script [file] > [file] - file with fasta report > > ); > > print $usage and exit if not @ARGV; > > $file = shift @ARGV; > > > # >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> MAIN <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< > > > my ( $lines ); > > $lines = &parse_fasta( $file ); > > print "$_\n" foreach @{ $lines }; > > exit; > > > # >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SUBROUTINES <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< > > > sub parse_fasta > { > # Martin A. Hansen, August 2003. > > # parses blast reports using Bioperl > > my ( $file, # file with blast report > ) = @_; > > # returns list of sequence lines > > my ( $result, $hit, $hit_name, $searchio, $white_space, $query_beg, $hsp, $hit_string, @lines, $query_string, $query_name ); > > $searchio = new Bio::SearchIO ( -format => 'fasta', -file => $file ); > $result = $searchio->next_result; > > while ( $hit = $result->next_hit ) > { > $query_name = $result->query_name; > $hit_name = $hit->name; > $hsp = $hit->next_hsp; > > $query_string = $hsp->query_string; > $query_beg = $hsp->query->start; > $hit_string = $hsp->hit_string; > > $white_space = ' ' x ( $query_beg - 1 ); > > push @lines, { > "QUERY_NAME" => $query_name, > "QUERY_STRING" => $white_space . $query_string, > "SUBJECT_NAME" => $hit_name, > "SUBJECT_STRING" => $white_space . $hit_string, > } > } > > return wantarray ? @lines : \@lines; > } > > > > > # >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< > > > __END__ > > > > any suggestions? 
> > > martin > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From ajm6q at virginia.edu Mon Aug 25 13:39:32 2003 From: ajm6q at virginia.edu (Aaron J Mackey) Date: Mon Aug 25 13:38:34 2003 Subject: [Bioperl-l] Bio::SeachIO::Fasta problem In-Reply-To: Message-ID: On Mon, 25 Aug 2003, Jason Stajich wrote: > I don't know that fixing this will get a priority given that it is pretty > easy to install and run FASTA directly from Bill's distro and we can parse > that output just fine. Let me also chime in and make it clear that parsing GCG-variant FASTA (or anything else, for that matter) will probably never make it to the top of my TODO list. Just in case anyone was holding their breath ;) But I wonder if the GCG people might be persuaded ... -Aaron From djoubert at mail.mcg.edu Mon Aug 25 16:13:49 2003 From: djoubert at mail.mcg.edu (Douglas Joubert) Date: Mon Aug 25 16:56:35 2003 Subject: [Bioperl-l] Bio::SeachIO::Fasta problem Message-ID: Greetings, I too received the "> MSG: unrecognized FASTA Family report file!" error when I was "attempting" to demonstrate a snippet of code that I lifted from one of Jason's ppt presentations (GenomeInformatics2002). I am a librarian, not a programmer, so I assumed I had installed BioPerl incorrectly. My fasta file was Blast output that I had saved in fasta format. Is this not the correct way to use this module? My text file started with >gi|14625690|emb|AL591499.7| so I thought I was OK. My question is this: what exactly does the hyperlink provided by Jason install? Cheers DJJ Douglas Joubert, M.L.I.S. Instructor and Digital Information Librarian Robert B. Greenblatt M.D. Library Medical College of Georgia Augusta, GA 30912-4400 >>> Jason Stajich 8/25/2003 1:52:42 PM >>> Martin - it's tested on FASTA 3.4 and some versions of 3.3.
It can parse the -m 9 tabluar output as well as standard default output (with or without Histograms). Personally I would just use the latest distribution: ftp://ftp.virginia.edu/pub/fasta/fasta3.shar.Z It has not been tested with the GCG-ized FASTA and as you report it doesn't seem to work. I took the liberty of posting a bug report for you with an example report as this is the type of information needed for someone to diagnose a problem. I don't know that fixing this will get a priority given that it is pretty easy to install and run FASTA directly from Bill's distro and we can parse that output just fine. -jason On Mon, 25 Aug 2003, Martin A. Hansen wrote: > hi > > im trying to parse fasta search reports with Bio::SeachIO. however, i get this > warning message: > > maasha@homer:~/bin$ parse_fasta btg1.fasta > > -------------------- WARNING --------------------- > MSG: unrecognized FASTA Family report file! > --------------------------------------------------- > > this indicates that there might be something wrong with the fasta report file, > but im not sure what that could be. im i supposed to run a certain version of > fasta? and with a certain set of options? e.g. i have noticed that running > fasta from the wisconsin packages (GCG) outputs a double dot (..) between the > introtext and the data: > > The best scores are: init1 initn opt z-sc E(7402).. > > whereas running "normal" fasta does not produce the double dot? > > and to really twist the fork i am failing in identifying the different fasta > versions :/ > > anyways, here is the snippet of code im using to parse: > > > #!/usr/bin/perl -w > > use strict; > use Bio::SearchIO; > > my ( $script, $usage, $file ); > > $script = ( split "/", $0 )[ -1 ]; > > $usage = qq( > > $script by Martin A. Hansen, August 2003. 
> > $script parses a FASTA report file > > Usage: $script [file] > [file] - file with fasta report > > ); > > print $usage and exit if not @ARGV; > > $file = shift @ARGV; > > > # >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> MAIN <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< > > > my ( $lines ); > > $lines = &parse_fasta( $file ); > > print "$_\n" foreach @{ $lines }; > > exit; > > > # >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SUBROUTINES <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< > > > sub parse_fasta > { > # Martin A. Hansen, August 2003. > > # parses blast reports using Bioperl > > my ( $file, # file with blast report > ) = @_; > > # returns list of sequence lines > > my ( $result, $hit, $hit_name, $searchio, $white_space, $query_beg, $hsp, $hit_string, @lines, $query_string, $query_name ); > > $searchio = new Bio::SearchIO ( -format => 'fasta', -file => $file ); > $result = $searchio->next_result; > > while ( $hit = $result->next_hit ) > { > $query_name = $result->query_name; > $hit_name = $hit->name; > $hsp = $hit->next_hsp; > > $query_string = $hsp->query_string; > $query_beg = $hsp->query->start; > $hit_string = $hsp->hit_string; > > $white_space = ' ' x ( $query_beg - 1 ); > > push @lines, { > "QUERY_NAME" => $query_name, > "QUERY_STRING" => $white_space . $query_string, > "SUBJECT_NAME" => $hit_name, > "SUBJECT_STRING" => $white_space . $hit_string, > } > } > > return wantarray ? @lines : \@lines; > } > > > > > # >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< > > > __END__ > > > > any suggestions? 
> > > martin > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From marcelvb at nikhef.nl Mon Aug 25 17:07:21 2003 From: marcelvb at nikhef.nl (Marcel van Batenburg) Date: Mon Aug 25 17:09:16 2003 Subject: [Bioperl-l] Re: load_gff.pl question In-Reply-To: <1060197700.1431.11.camel@localhost.localdomain> Message-ID: Hi Shin, I would not even use load_gff.pl for so many lines. Try bulk_load_gff.pl (ab initio) or fast_load_gff.pl (appending). Greetings, Marcel On 6 Aug 2003, Scott Cain wrote: > Shin, > > The problem you are running into is not really with load_gff.pl, but > with the database schema. Assuming you are using MySQL, the table > create statement for fdata looks like this: > > create table fdata ( > fid int not null auto_increment, > fref varchar(100) not null, > fstart int unsigned not null, > fstop int unsigned not null, > fbin double(20,6) not null, > ftypeid int not null, > fscore float, > fstrand enum('+','-'), > fphase enum('0','1','2'), > gid int not null, > ftarget_start int unsigned, > ftarget_stop int unsigned, > primary key(fid), > unique index(fref,fbin,fstart,fstop,ftypeid,gid), > index(ftypeid), > index(gid) > > The problem you have is with that unique index on > (fref,fbin,fstart,fstop,ftypeid,gid). This index conflicts with your > data, in that the similar lines are getting assigned the same gid (group > id), since they look like the same thing. So, the quick way to fix this > is to remove the 'unique' from the index declaration. That can be found > in Bio/DB/GFF/Adaptor/dbi/mysql.pm. Then run load_gff.pl as usual. 
The > longer way to fix this is look at your data and figure out why they are > all getting assigned the same group id and make them sufficiently > different so that they don't. > > Hope that helps, > Scott > > On Wed, 2003-08-06 at 13:31, bioperl-l-request@portal.open-bio.org > wrote: > > Where do I start to customize this script to allow loading of large > > number of similar entities? > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. cain@cshl.org > GMOD Coordinator (http://www.gmod.org/) 216-392-3087 > Cold Spring Harbor Laboratory > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From jamie at genome.arizona.edu Mon Aug 25 18:36:19 2003 From: jamie at genome.arizona.edu (Jamie Hatfield) Date: Mon Aug 25 18:36:20 2003 Subject: [Bioperl-l] Re: Bio::FPC In-Reply-To: <1060878865.562.26.camel@motox> References: <1060878865.562.26.camel@motox> Message-ID: <1061850911.562.96.camel@motox> Again, how do I go about submitting this? On Thu, 2003-08-14 at 09:34, Jamie Hatfield wrote: > Yes, actually. We are just now finishing up the fpc parser. I was > planning on soon asking the group how I would go about submitting it? > It consists of 5 modules that we have put in the MapIO and Map > namespaces. > Bio::MapIO::fpc.pm > Bio::Map::physical.pm > Bio::Map::fpcmarker.pm (sorry, but marker doesn't work) > Bio::Map::clone.pm > Bio::Map::contig.pm > > If you want to see how this object might be used, check out > http://www.genome.arizona.edu/software/fpc/biofpc/index.html > > You'll see there documentation for the modules, and a few test cases or > example usages. > > Also, we are trying to make a generic converter to let you load in a fpc > file and generate the necessary GFF for GBrowse to display the fpc map. 
> It's a quite simple display of the clones, markers, and contigs, but > maybe that will be usefull as an alternative to WebFPC (a java view only > version of fpc). It works for us, but might not work for everybody. We > should be able to patch it up, though, if it's missing features. > > So, anyways, if somebody can let me know how to go about submitting it, > we'll start the process. I looked through the FAQ and it basically said > to just post information if you have a module that you would like to > contribute, so, here's the information. > > Jamie From jmfreeman at comcast.net Mon Aug 25 21:15:21 2003 From: jmfreeman at comcast.net (James Freeman) Date: Mon Aug 25 21:12:31 2003 Subject: [Bioperl-l] Fwd: [Volunteer] submitting modules Message-ID: See below. Begin forwarded message: > From: "Samuel Thoraval" > Date: Mon Aug 25, 2003 19:46:03 US/Eastern > To: volunteer@open-bio.org > Subject: [Volunteer] submitting modules > > Hello, > > I have written a module for the Bioperl/Pise API. > I would like to submit it. > To which email should i send the details, code and request to ? > > Regards, > > Samuel Thoraval > _______________________________________________ > Volunteer mailing list > Volunteer@open-bio.org > http://open-bio.org/mailman/listinfo/volunteer > From jason at cgt.duhs.duke.edu Mon Aug 25 22:03:31 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Mon Aug 25 21:41:49 2003 Subject: [Bioperl-l] Re: Bio::FPC In-Reply-To: <1061850911.562.96.camel@motox> References: <1060878865.562.26.camel@motox> <1061850911.562.96.camel@motox> Message-ID: For now, submit as a feature request to http://bugzilla.open-bio.org/ Attach the code as a tarball after you have submitted the request. A core dev will look it over and assuming all is well get you set up with a CVS account. -jason On Mon, 25 Aug 2003, Jamie Hatfield wrote: > Again, how do I go about submitting this? > > On Thu, 2003-08-14 at 09:34, Jamie Hatfield wrote: > > Yes, actually. 
We are just now finishing up the fpc parser. I was > > planning on soon asking the group how I would go about submitting it? > > It consists of 5 modules that we have put in the MapIO and Map > > namespaces. > > Bio::MapIO::fpc.pm > > Bio::Map::physical.pm > > Bio::Map::fpcmarker.pm (sorry, but marker doesn't work) > > Bio::Map::clone.pm > > Bio::Map::contig.pm > > > > If you want to see how this object might be used, check out > > http://www.genome.arizona.edu/software/fpc/biofpc/index.html > > > > You'll see there documentation for the modules, and a few test cases or > > example usages. > > > > Also, we are trying to make a generic converter to let you load in a fpc > > file and generate the necessary GFF for GBrowse to display the fpc map. > > It's a quite simple display of the clones, markers, and contigs, but > > maybe that will be usefull as an alternative to WebFPC (a java view only > > version of fpc). It works for us, but might not work for everybody. We > > should be able to patch it up, though, if it's missing features. > > > > So, anyways, if somebody can let me know how to go about submitting it, > > we'll start the process. I looked through the FAQ and it basically said > > to just post information if you have a module that you would like to > > contribute, so, here's the information. > > > > Jamie > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From jason at cgt.duhs.duke.edu Mon Aug 25 22:04:26 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Mon Aug 25 21:42:47 2003 Subject: [Bioperl-l] Fwd: [Volunteer] submitting modules In-Reply-To: References: Message-ID: submit it as a feature request at http://bugzilla.open-bio.org. I'll leave it up to Catherine Letondal to make the okay for adding the code. -jason On Mon, 25 Aug 2003, James Freeman wrote: > See below. 
> > Begin forwarded message: > > > From: "Samuel Thoraval" > > Date: Mon Aug 25, 2003 19:46:03 US/Eastern > > To: volunteer@open-bio.org > > Subject: [Volunteer] submitting modules > > > > Hello, > > > > I have written a module for the Bioperl/Pise API. > > I would like to submit it. > > To which email should i send the details, code and request to ? > > > > Regards, > > > > Samuel Thoraval > > _______________________________________________ > > Volunteer mailing list > > Volunteer@open-bio.org > > http://open-bio.org/mailman/listinfo/volunteer > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From fangl at genomics.org.cn Tue Aug 26 05:49:24 2003 From: fangl at genomics.org.cn (Magic Fang) Date: Mon Aug 25 21:45:16 2003 Subject: [Bioperl-l] abott gff manipunation Message-ID: <200308260945671.SM01140@magicnb> dear my colleague, i am now manipunating a GFF file with bioperl the content is like: #Seq. Source Feature Start End Score Strand Phase Group CNS06C8G blat similarity 201148 202447 100 + . product 16S ribosomal RNA, chromosome I of strain GB-M1 of Encephalitozoon cuniculi (Microspora). CNS06C8G blat similarity 7536 8835 100 - . product 16S ribosomal RNA, chromosome I of strain GB-M1 of Encephalitozoon cuniculi (Microspora). CNS06C8G blat similarity 202483 204969 100 + . product 5.8S-23S ribosomal RNA, chromosome I of strain GB-M1 of Encephalitozoon cuniculi (Microspora). my code is like: #!/usr/bin/perl use Bio::Tools::GFF; my $gff = Bio::Tools::GFF->new(-fh => \*STDIN, -gff_version => 2); while($feat = $gff->next_feature()) { print $feat->seq_id, "\t", $feat->source_tag, "\t", $feat->primary_tag, "\t", $feat->start, "\t", $feat->end, "\t", $feat->frame, "\t", $feat->score, "\t", $feat->strand, "\n"; } $gff->close(); my question is how to get the info. in the group column. thank u. 
From jason at cgt.duhs.duke.edu Mon Aug 25 22:21:05 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Mon Aug 25 21:59:36 2003 Subject: [Bioperl-l] abott gff manipunation In-Reply-To: <200308260945671.SM01140@magicnb> References: <200308260945671.SM01140@magicnb> Message-ID: We store this as tag/value pairs. Since the group column on every line starts with 'product', you can get the data via my @productdata = $feature->get_tag_values('product'); Note an API change: for pre-bioperl 1.2 this was my @productdata = $feature->each_tag_value('product'); (although the old API is still supported for now). But having the group column be basically free text is a bad idea. Read up on the GFF format; you should try to provide a qualifier for each field there. http://www.sanger.ac.uk/Software/GFF/ http://www.sanger.ac.uk/Software/GFF/GFF_Spec.html -jason On Tue, 26 Aug 2003, Magic Fang wrote: > dear my colleague, > i am now manipunating a GFF file with bioperl > the content is like: > #Seq. Source Feature Start End Score Strand Phase Group > CNS06C8G blat similarity 201148 202447 100 + . product 16S ribosomal RNA, chromosome I of strain GB-M1 of Encephalitozoon cuniculi (Microspora). > CNS06C8G blat similarity 7536 8835 100 - . product 16S ribosomal RNA, chromosome I of strain GB-M1 of Encephalitozoon cuniculi (Microspora). > CNS06C8G blat similarity 202483 204969 100 + . product 5.8S-23S ribosomal RNA, chromosome I of strain GB-M1 of Encephalitozoon cuniculi (Microspora). > > my code is like: > #!/usr/bin/perl > > use Bio::Tools::GFF; > my $gff = Bio::Tools::GFF->new(-fh => \*STDIN, -gff_version => 2); > while($feat = $gff->next_feature()) { > print $feat->seq_id, "\t", > $feat->source_tag, "\t", > $feat->primary_tag, "\t", > $feat->start, "\t", > $feat->end, "\t", > $feat->frame, "\t", > $feat->score, "\t", > $feat->strand, "\n"; > } > $gff->close(); > > my question is how to get the info. in the group column. > thank u.
> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From fangl at genomics.org.cn Tue Aug 26 11:51:02 2003 From: fangl at genomics.org.cn (Magic Fang) Date: Tue Aug 26 03:46:50 2003 Subject: [Bioperl-l] about gff format to feature table format Message-ID: <200308261547218.SM01140@magicnb> dear my colleagues i have a gff format file, such as: #Seq. Source Feature Start End Score Strand Phase Group CNS06C8G GlimmerM Terminal 160 228 1 - 0 Gene 1 CNS06C8G GlimmerM Internal 3771 3924 1 - 0 Gene 1 CNS06C8G GlimmerM Initial 5902 5915 1 - 1 Gene 1 CNS06C8G GlimmerM Initial 6330 6555 1 + 0 Gene 2 CNS06C8G GlimmerM Internal 6626 6706 1 + 1 Gene 2 CNS06C8G GlimmerM Internal 6913 7347 1 + 1 Gene 2 CNS06C8G GlimmerM Internal 7896 7975 1 + 1 Gene 2 CNS06C8G GlimmerM Internal 8131 8160 1 + 0 Gene 2 CNS06C8G GlimmerM Internal 12054 12113 1 + 0 Gene 2 CNS06C8G GlimmerM Internal 15857 16217 1 + 0 Gene 2 CNS06C8G GlimmerM Terminal 16377 16657 1 + 1 Gene 2 CNS06C8G GlimmerM Terminal 17607 17732 1 - 0 Gene 3 CNS06C8G GlimmerM Internal 17813 18361 1 - 0 Gene 3 CNS06C8G GlimmerM Initial 18659 18844 1 - 0 Gene 3 CNS06C8G GlimmerM Initial 19172 19268 1 + 0 Gene 4 can bioperl merge the entries belong to one gene to one sequence feature, when i use Bio::Tools::GFF? thank you. From letondal at pasteur.fr Tue Aug 26 04:57:22 2003 From: letondal at pasteur.fr (Catherine Letondal) Date: Tue Aug 26 04:56:30 2003 Subject: [Bioperl-l] Fwd: [Volunteer] submitting modules (PiseWorkflow.pm) In-Reply-To: ; from jason@cgt.duhs.duke.edu on Mon, Aug 25, 2003 at 10:04:26PM -0400 References:

Message-ID: <20030826105722.A399257@electre.pasteur.fr> On Mon, Aug 25, 2003 at 10:04:26PM -0400, Jason Stajich wrote: > submit it as a feature request at http://bugzilla.open-bio.org. > > I'll leave it up to Catherine Letondal to make the okay for adding the > code. I'm OK with adding the code to CVS if it is OK with you. This is a nice, lightweight module for building a workflow. I have tested it and discussed its features with Samuel. For instance, say you want to build a workflow with an alignment and a phylogeny: you first instantiate the applications and set parameters, without running them. You then build the workflow and run it. Parallel branches of the workflow are submitted in parallel. use Bio::Tools::Run::AnalysisFactory::Pise; use Bio::Tools::Run::PiseWorkflow; my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(); my $clustalw = $factory->program('clustalw'); $clustalw->infile($ARGV[0]); my $protdist = $factory->program('protdist'); my $protpars = $factory->program('protpars'); my $fitch = $factory->program('fitch'); my $consense = $factory->program('consense'); my $workflow = Bio::Tools::Run::PiseWorkflow->new(); $workflow->addpipe(-method => $clustalw, -tomethod => $protdist, -pipetype => 'readseq_ok_alig'); $workflow->addpipe(-method => $clustalw, -tomethod => $protpars, -pipetype => 'readseq_ok_alig'); $workflow->addpipe(-method => $protdist, -tomethod => $fitch, -pipetype => 'phylip_dist'); $workflow->addpipe(-method => $protpars, -tomethod => $consense, -pipetype => 'phylip_tree'); $workflow->run(); Results are reported in two forms: an HTML file, created at the beginning of the run, that contains job status and links to the results, and a Perl file containing the same information in Perl format. Errors are just reported. So it's really lightweight compared to biopipe, for instance, but I believe still very useful. > > -jason > > On Mon, 25 Aug 2003, James Freeman wrote: > > > See below.
> > > > Begin forwarded message: > > > > > From: "Samuel Thoraval" > > > Date: Mon Aug 25, 2003 19:46:03 US/Eastern > > > To: volunteer@open-bio.org > > > Subject: [Volunteer] submitting modules > > > > > > Hello, > > > > > > I have written a module for the Bioperl/Pise API. > > > I would like to submit it. > > > To which email should i send the details, code and request to ? > > > > > > Regards, > > > > > > Samuel Thoraval -- Catherine Letondal -- Pasteur Institute Computing Center From fangl at genomics.org.cn Tue Aug 26 13:10:11 2003 From: fangl at genomics.org.cn (Magic Fang) Date: Tue Aug 26 05:06:02 2003 Subject: [Bioperl-l] about gff format to feature table format Message-ID: <200308261706968.SM01140@magicnb> dear my colleagues i have a gff format file, such as: #Seq. Source Feature Start End Score Strand Phase Group CNS06C8G GlimmerM Terminal 160 228 1 - 0 Gene 1 CNS06C8G GlimmerM Internal 3771 3924 1 - 0 Gene 1 CNS06C8G GlimmerM Initial 5902 5915 1 - 1 Gene 1 CNS06C8G GlimmerM Initial 6330 6555 1 + 0 Gene 2 CNS06C8G GlimmerM Internal 6626 6706 1 + 1 Gene 2 CNS06C8G GlimmerM Internal 6913 7347 1 + 1 Gene 2 CNS06C8G GlimmerM Internal 7896 7975 1 + 1 Gene 2 CNS06C8G GlimmerM Internal 8131 8160 1 + 0 Gene 2 CNS06C8G GlimmerM Internal 12054 12113 1 + 0 Gene 2 CNS06C8G GlimmerM Internal 15857 16217 1 + 0 Gene 2 CNS06C8G GlimmerM Terminal 16377 16657 1 + 1 Gene 2 CNS06C8G GlimmerM Terminal 17607 17732 1 - 0 Gene 3 CNS06C8G GlimmerM Internal 17813 18361 1 - 0 Gene 3 CNS06C8G GlimmerM Initial 18659 18844 1 - 0 Gene 3 CNS06C8G GlimmerM Initial 19172 19268 1 + 0 Gene 4 can bioperl merge the entries belong to one gene to one sequence feature, when i use Bio::Tools::GFF? thank you. 
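For the GFF question above, the grouping step can at least be sketched by hand. The following is a minimal plain-Perl sketch, not a statement of what Bio::Tools::GFF does: it assumes whitespace-separated columns exactly as shown in the message, treats the last column ("Gene 1", "Gene 2", ...) as the grouping key, and writes each gene as a feature-table style location. The sub names group_rows and location_string are made up for illustration.

```perl
use strict;
use warnings;

# Group GFF rows by their last column and collect the exon coordinates.
# Assumes nine whitespace-separated columns, as in the message above.
sub group_rows {
    my @rows = @_;
    my %genes;
    for my $row (@rows) {
        next if $row =~ /^\s*(#|$)/;    # skip the header comment and blank lines
        my ($seq, $source, $type, $start, $end, $score, $strand, $phase, $group)
            = split ' ', $row, 9;
        $genes{$group} ||= { seq => $seq, strand => $strand, exons => [] };
        push @{ $genes{$group}{exons} }, [ $start, $end ];
    }
    return \%genes;
}

# Build a feature-table style location such as complement(join(1..5,9..12)).
sub location_string {
    my ($gene) = @_;
    my @exons = sort { $a->[0] <=> $b->[0] } @{ $gene->{exons} };
    my @parts = map { "$_->[0]..$_->[1]" } @exons;
    my $loc   = @parts > 1 ? 'join(' . join(',', @parts) . ')' : $parts[0];
    return $gene->{strand} eq '-' ? "complement($loc)" : $loc;
}

# Rows copied from the message (genes 1, 3 and 4 only, to keep this short).
my @rows = split /\n/, <<'END_GFF';
#Seq. Source Feature Start End Score Strand Phase Group
CNS06C8G GlimmerM Terminal 160 228 1 - 0 Gene 1
CNS06C8G GlimmerM Internal 3771 3924 1 - 0 Gene 1
CNS06C8G GlimmerM Initial 5902 5915 1 - 1 Gene 1
CNS06C8G GlimmerM Terminal 17607 17732 1 - 0 Gene 3
CNS06C8G GlimmerM Internal 17813 18361 1 - 0 Gene 3
CNS06C8G GlimmerM Initial 18659 18844 1 - 0 Gene 3
CNS06C8G GlimmerM Initial 19172 19268 1 + 0 Gene 4
END_GFF

my $genes = group_rows(@rows);
for my $id (sort keys %$genes) {
    printf "%-8s %-10s %s\n", $id, $genes->{$id}{seq}, location_string($genes->{$id});
}
```

Each gene ends up as one entry with a single joined location, which could then be turned into whatever feature object is wanted. Whether the exons of a minus-strand gene should be listed in ascending order inside complement(join(...)), as done here, is an assumption worth checking against the target format.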
From brian_osborne at cognia.com Tue Aug 26 08:28:56 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Tue Aug 26 08:31:49 2003 Subject: [Bioperl-l] Re: Bio::FPC In-Reply-To: <1061850911.562.96.camel@motox> Message-ID: Jamie, One of the challenges in Bioperl is creating a single coherent set of modules from the many individual contributions. Could you tell us a bit about your modules and how they overlap functionally with the existing modules in Bio::Map? If you take a look at those modules you can see that a good number of the more steadfast Bioperl authors have contributed to Bio::Map, I'm sure that they'd like to see your modules integrate neatly with the existing code. I'm not one of these authors, I'm simply responding because it seems that you'd like to get some discussion going. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Jamie Hatfield Sent: Monday, August 25, 2003 6:35 PM To: BioPerl-List Subject: Re: [Bioperl-l] Re: Bio::FPC Again, how do I go about submitting this? On Thu, 2003-08-14 at 09:34, Jamie Hatfield wrote: > Yes, actually. We are just now finishing up the fpc parser. I was > planning on soon asking the group how I would go about submitting it? > It consists of 5 modules that we have put in the MapIO and Map > namespaces. > Bio::MapIO::fpc.pm > Bio::Map::physical.pm > Bio::Map::fpcmarker.pm (sorry, but marker doesn't work) > Bio::Map::clone.pm > Bio::Map::contig.pm > > If you want to see how this object might be used, check out > http://www.genome.arizona.edu/software/fpc/biofpc/index.html > > You'll see there documentation for the modules, and a few test cases or > example usages. > > Also, we are trying to make a generic converter to let you load in a fpc > file and generate the necessary GFF for GBrowse to display the fpc map. 
> It's a quite simple display of the clones, markers, and contigs, but > maybe that will be usefull as an alternative to WebFPC (a java view only > version of fpc). It works for us, but might not work for everybody. We > should be able to patch it up, though, if it's missing features. > > So, anyways, if somebody can let me know how to go about submitting it, > we'll start the process. I looked through the FAQ and it basically said > to just post information if you have a module that you would like to > contribute, so, here's the information. > > Jamie _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From jamie at genome.arizona.edu Tue Aug 26 11:47:40 2003 From: jamie at genome.arizona.edu (Jamie Hatfield) Date: Tue Aug 26 11:47:41 2003 Subject: [Bioperl-l] Re: Bio::FPC In-Reply-To: References: Message-ID: <1061912784.562.112.camel@motox> Yes, definitly, discussion is great! We had a little bit of a discussion about this back in November 2002, when I proposed the idea, and it was suggested by Heikki to try to fit it into either Bio::Map or Bio::Assembly. Maybe I picked the wrong one? How about this... I will describe a little bit about what fpc is, for those who don't know, and those who know Bio::Map and Bio::Assembly will tell me if it fits in their design. ok? FPC stands for FingerPrinted Contigs. Its main purpose is to assemble clones into contiguous regions of overlaps, based on the fingerprint of the clones. These fingerprints can be from agarose (sp?) gels, or HICF, or simulated, or whatever. Maybe this is more like Assembly? Anyway, you have the clones, and there are also markers that hit the clones, and aid in assembling the clones into contigs. These are the main 3 classes. Clones, Contigs, Markers. Contigs contain Clones. Markers hit Clones. Clones are hit by markers. 
Contigs 1--m Clones Markers m--m Clones Is that a sufficient description of FPC, or do we need more to make a good decision? Thanks for initiating the discussion, Brian. Jamie On Tue, 2003-08-26 at 05:28, Brian Osborne wrote: > Jamie, > > One of the challenges in Bioperl is creating a single coherent set of > modules from the many individual contributions. Could you tell us a bit > about your modules and how they overlap functionally with the existing > modules in Bio::Map? If you take a look at those modules you can see that a > good number of the more steadfast Bioperl authors have contributed to > Bio::Map, I'm sure that they'd like to see your modules integrate neatly > with the existing code. > > I'm not one of these authors, I'm simply responding because it seems that > you'd like to get some discussion going. > > Brian O. > > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Jamie Hatfield > Sent: Monday, August 25, 2003 6:35 PM > To: BioPerl-List > Subject: Re: [Bioperl-l] Re: Bio::FPC > > Again, how do I go about submitting this? > > On Thu, 2003-08-14 at 09:34, Jamie Hatfield wrote: > > Yes, actually. We are just now finishing up the fpc parser. I was > > planning on soon asking the group how I would go about submitting it? > > It consists of 5 modules that we have put in the MapIO and Map > > namespaces. > > Bio::MapIO::fpc.pm > > Bio::Map::physical.pm > > Bio::Map::fpcmarker.pm (sorry, but marker doesn't work) > > Bio::Map::clone.pm > > Bio::Map::contig.pm > > > > If you want to see how this object might be used, check out > > http://www.genome.arizona.edu/software/fpc/biofpc/index.html > > > > You'll see there documentation for the modules, and a few test cases or > > example usages. > > > > Also, we are trying to make a generic converter to let you load in a fpc > > file and generate the necessary GFF for GBrowse to display the fpc map. 
> > It's a quite simple display of the clones, markers, and contigs, but > > maybe that will be usefull as an alternative to WebFPC (a java view only > > version of fpc). It works for us, but might not work for everybody. We > > should be able to patch it up, though, if it's missing features. > > > > So, anyways, if somebody can let me know how to go about submitting it, > > we'll start the process. I looked through the FAQ and it basically said > > to just post information if you have a module that you would like to > > contribute, so, here's the information. > > > > Jamie > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > From brian_osborne at cognia.com Tue Aug 26 12:39:56 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Tue Aug 26 12:43:22 2003 Subject: [Bioperl-l] Re: Bio::FPC In-Reply-To: <1061912784.562.112.camel@motox> Message-ID: Jamie, And a "marker" can be a genetic marker, yes? A la Bio::Map::Marker? If you take a look at this module you'll see that its definition of marker allows any marker to have different positions in different maps (contig "map", genetic map, physical map). This seems to overlap with your notion of marker. Here's my first impression. There's a parser, MapIO::mapmaker for mapmaker, mapmaker makes maps from segregation data, genetic data. Your fpc makes physical maps, yet physical and genetic maps can be merged to create "integrated maps". Your fpcmarker must be closely related to Bio::Map::Marker, in fact it's not clear that there should be an fpcmarker. I would think that a Marker object could be a reasonably rich one, and it could be created by fpc or any other program, it really shouldn't matter much how it's created (in fact, all this new PopGen code must be ordering markers to make maps, I'd think). Perhaps you should be using some of the existing code in Bio/Map? Your thoughts? Brian O. 
-----Original Message----- From: Jamie Hatfield [mailto:jamie@genome.arizona.edu] Sent: Tuesday, August 26, 2003 11:46 AM To: Brian Osborne Cc: BioPerl-List Subject: RE: [Bioperl-l] Re: Bio::FPC Yes, definitly, discussion is great! We had a little bit of a discussion about this back in November 2002, when I proposed the idea, and it was suggested by Heikki to try to fit it into either Bio::Map or Bio::Assembly. Maybe I picked the wrong one? How about this... I will describe a little bit about what fpc is, for those who don't know, and those who know Bio::Map and Bio::Assembly will tell me if it fits in their design. ok? FPC stands for FingerPrinted Contigs. Its main purpose is to assemble clones into contiguous regions of overlaps, based on the fingerprint of the clones. These fingerprints can be from agarose (sp?) gels, or HICF, or simulated, or whatever. Maybe this is more like Assembly? Anyway, you have the clones, and there are also markers that hit the clones, and aid in assembling the clones into contigs. These are the main 3 classes. Clones, Contigs, Markers. Contigs contain Clones. Markers hit Clones. Clones are hit by markers. Contigs 1--m Clones Markers m--m Clones Is that a sufficient description of FPC, or do we need more to make a good decision? Thanks for initiating the discussion, Brian. Jamie On Tue, 2003-08-26 at 05:28, Brian Osborne wrote: > Jamie, > > One of the challenges in Bioperl is creating a single coherent set of > modules from the many individual contributions. Could you tell us a bit > about your modules and how they overlap functionally with the existing > modules in Bio::Map? If you take a look at those modules you can see that a > good number of the more steadfast Bioperl authors have contributed to > Bio::Map, I'm sure that they'd like to see your modules integrate neatly > with the existing code. > > I'm not one of these authors, I'm simply responding because it seems that > you'd like to get some discussion going. > > Brian O. 
> > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Jamie Hatfield > Sent: Monday, August 25, 2003 6:35 PM > To: BioPerl-List > Subject: Re: [Bioperl-l] Re: Bio::FPC > > Again, how do I go about submitting this? > > On Thu, 2003-08-14 at 09:34, Jamie Hatfield wrote: > > Yes, actually. We are just now finishing up the fpc parser. I was > > planning on soon asking the group how I would go about submitting it? > > It consists of 5 modules that we have put in the MapIO and Map > > namespaces. > > Bio::MapIO::fpc.pm > > Bio::Map::physical.pm > > Bio::Map::fpcmarker.pm (sorry, but marker doesn't work) > > Bio::Map::clone.pm > > Bio::Map::contig.pm > > > > If you want to see how this object might be used, check out > > http://www.genome.arizona.edu/software/fpc/biofpc/index.html > > > > You'll see there documentation for the modules, and a few test cases or > > example usages. > > > > Also, we are trying to make a generic converter to let you load in a fpc > > file and generate the necessary GFF for GBrowse to display the fpc map. > > It's a quite simple display of the clones, markers, and contigs, but > > maybe that will be usefull as an alternative to WebFPC (a java view only > > version of fpc). It works for us, but might not work for everybody. We > > should be able to patch it up, though, if it's missing features. > > > > So, anyways, if somebody can let me know how to go about submitting it, > > we'll start the process. I looked through the FAQ and it basically said > > to just post information if you have a module that you would like to > > contribute, so, here's the information. 
> > > > Jamie > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > From birney at ebi.ac.uk Tue Aug 26 13:02:11 2003 From: birney at ebi.ac.uk (Ewan Birney) Date: Tue Aug 26 13:01:00 2003 Subject: [Bioperl-l] Re: Bio::FPC In-Reply-To: Message-ID: On Tue, 26 Aug 2003, Brian Osborne wrote: > Jamie, > > And a "marker" can be a genetic marker, yes? A la Bio::Map::Marker? If you > take a look at this module you'll see that its definition of marker allows > any marker to have different positions in different maps (contig "map", > genetic map, physical map). This seems to overlap with your notion of > marker. > > Here's my first impression. There's a parser, MapIO::mapmaker for mapmaker, > mapmaker makes maps from segregation data, genetic data. Your fpc makes > physical maps, yet physical and genetic maps can be merged to create > "integrated maps". Your fpcmarker must be closely related to > Bio::Map::Marker, in fact it's not clear that there should be an fpcmarker. > I would think that a Marker object could be a reasonably rich one, and it > could be created by fpc or any other program, it really shouldn't matter > much how it's created (in fact, all this new PopGen code must be ordering > markers to make maps, I'd think). Perhaps you should be using some of the > existing code in Bio/Map? Your thoughts? > Brian - I doubt the pop gen stuff will overlap at all with this stuff. but the marker comment is right, though I can well believe there needs to be specific FPC hooks for markers used for FPC stuff.... > Brian O. > > -----Original Message----- > From: Jamie Hatfield [mailto:jamie@genome.arizona.edu] > Sent: Tuesday, August 26, 2003 11:46 AM > To: Brian Osborne > Cc: BioPerl-List > Subject: RE: [Bioperl-l] Re: Bio::FPC > > Yes, definitly, discussion is great! 
> > We had a little bit of a discussion about this back in November 2002, > when I proposed the idea, and it was suggested by Heikki to try to fit > it into either Bio::Map or Bio::Assembly. Maybe I picked the wrong > one? How about this... I will describe a little bit about what fpc is, > for those who don't know, and those who know Bio::Map and Bio::Assembly > will tell me if it fits in their design. ok? > > FPC stands for FingerPrinted Contigs. Its main purpose is to assemble > clones into contiguous regions of overlaps, based on the fingerprint of > the clones. These fingerprints can be from agarose (sp?) gels, or HICF, > or simulated, or whatever. Maybe this is more like Assembly? > > Anyway, you have the clones, and there are also markers that hit the > clones, and aid in assembling the clones into contigs. These are the > main 3 classes. Clones, Contigs, Markers. Contigs contain Clones. > Markers hit Clones. Clones are hit by markers. > > Contigs 1--m Clones > Markers m--m Clones > > Is that a sufficient description of FPC, or do we need more to make a > good decision? > > Thanks for initiating the discussion, Brian. > > Jamie > > On Tue, 2003-08-26 at 05:28, Brian Osborne wrote: > > Jamie, > > > > One of the challenges in Bioperl is creating a single coherent set of > > modules from the many individual contributions. Could you tell us a bit > > about your modules and how they overlap functionally with the existing > > modules in Bio::Map? If you take a look at those modules you can see that > a > > good number of the more steadfast Bioperl authors have contributed to > > Bio::Map, I'm sure that they'd like to see your modules integrate neatly > > with the existing code. > > > > I'm not one of these authors, I'm simply responding because it seems that > > you'd like to get some discussion going. > > > > Brian O. 
> > > > -----Original Message----- > > From: bioperl-l-bounces@portal.open-bio.org > > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Jamie Hatfield > > Sent: Monday, August 25, 2003 6:35 PM > > To: BioPerl-List > > Subject: Re: [Bioperl-l] Re: Bio::FPC > > > > Again, how do I go about submitting this? > > > > On Thu, 2003-08-14 at 09:34, Jamie Hatfield wrote: > > > Yes, actually. We are just now finishing up the fpc parser. I was > > > planning on soon asking the group how I would go about submitting it? > > > It consists of 5 modules that we have put in the MapIO and Map > > > namespaces. > > > Bio::MapIO::fpc.pm > > > Bio::Map::physical.pm > > > Bio::Map::fpcmarker.pm (sorry, but marker doesn't work) > > > Bio::Map::clone.pm > > > Bio::Map::contig.pm > > > > > > If you want to see how this object might be used, check out > > > http://www.genome.arizona.edu/software/fpc/biofpc/index.html > > > > > > You'll see there documentation for the modules, and a few test cases or > > > example usages. > > > > > > Also, we are trying to make a generic converter to let you load in a fpc > > > file and generate the necessary GFF for GBrowse to display the fpc map. > > > It's a quite simple display of the clones, markers, and contigs, but > > > maybe that will be usefull as an alternative to WebFPC (a java view only > > > version of fpc). It works for us, but might not work for everybody. We > > > should be able to patch it up, though, if it's missing features. > > > > > > So, anyways, if somebody can let me know how to go about submitting it, > > > we'll start the process. I looked through the FAQ and it basically said > > > to just post information if you have a module that you would like to > > > contribute, so, here's the information. 
> > > > > > Jamie > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From saluja at medical-data-solutions.com Tue Aug 26 14:24:24 2003 From: saluja at medical-data-solutions.com (Sunil Saluja) Date: Tue Aug 26 14:24:25 2003 Subject: [Bioperl-l] remote blast Message-ID: <1061922355.10458.8.camel@localhost.localdomain> Has there been a recent change in the way in which ncbi responds to remote blast requests using the biperl module Bio::Tools::Run::RemoteBlast ? I had no problem using this module a couple of weeks ago, and I was able to run several queries with success. I am now not able to retreive anything. When I use known queries I still get nothing. If I go to the website and manually enter a query, everything works. I know that the pubmed servers no longer work with older versions of endnote (6.x), and that this is a very recent change. Have they done something recently which is making this remote blast module not work? Thanks, Sunil Saluja From Wiepert.Mathieu at mayo.edu Tue Aug 26 14:38:35 2003 From: Wiepert.Mathieu at mayo.edu (Wiepert, Mathieu) Date: Tue Aug 26 14:37:44 2003 Subject: [Bioperl-l] Blast ridline changed? RemoteBlast seems to be failing... Message-ID: <2F41CC6C9777D311ACBD009027B108EA06E9A676@excsrv32.mayo.edu> Hi, I was doing remote blasts, but they all fail saying RID not found. The RID format seems to have changed to include something like a .BLASTQ3 at the end, so the pattern in RemoteBlast doesn't match. 
$RIDLINE = 'RID\s+=\s+(\d+-\d+-\d+)'; I think the above could be changed to something like $RIDLINE = 'RID\s+=\s+(\d+-\d+-\d+\.BLASTQ\d)'; It would seem that not many people are using RemoteBlast, or else this would have been noticed before? Or I could be doing something different, but it is a blastn with mostly default parameters. Also, I am not sure if the RID format will always get the BLASTQ3 at the end, or what the permanent pattern is. Does anyone know? Thanks, -mat From sunil.saluja at TCH.Harvard.edu Tue Aug 26 14:24:41 2003 From: sunil.saluja at TCH.Harvard.edu (Sunil Saluja) Date: Tue Aug 26 14:40:07 2003 Subject: [Bioperl-l] Remote Blast Message-ID: <244ae224ae.224ae244ae@tch.harvard.edu> Has there been a recent change in the way in which NCBI responds to remote blast requests using the bioperl module Bio::Tools::Run::RemoteBlast ? I had no problem using this module a couple of weeks ago, and I was able to run several queries with success. I am now not able to retrieve anything. When I use known queries I still get nothing. If I go to the website and manually enter a query, everything works. I know that the PubMed servers no longer work with older versions of EndNote (6.x), and that this is a very recent change. Have they done something recently which is making this remote blast module not work? Thanks, Sunil Saluja Sunil K. Saluja MD Fellow in Medical Informatics Fellow in Neonatal-Perinatal Medicine Children's Hospital Boston Harvard Medical School From Wiepert.Mathieu at mayo.edu Tue Aug 26 14:49:27 2003 From: Wiepert.Mathieu at mayo.edu (Wiepert, Mathieu) Date: Tue Aug 26 14:48:42 2003 Subject: [Bioperl-l] remote blast Message-ID: <2F41CC6C9777D311ACBD009027B108EA06E9A679@excsrv32.mayo.edu> Hi, Guess I spoke too fast, someone else has the problem ;-) Can anyone confirm the format of the RIDLINE for us now? It is an easy fix to make, but I would like to make sure I have the pattern correctly noted.
Thanks, -mat -----Original Message----- From: Sunil Saluja [mailto:saluja@medical-data-solutions.com] Sent: Tuesday, August 26, 2003 1:26 PM To: bioperl-l@bioperl.org Subject: [Bioperl-l] remote blast Has there been a recent change in the way in which NCBI responds to remote blast requests using the bioperl module Bio::Tools::Run::RemoteBlast ? I had no problem using this module a couple of weeks ago, and I was able to run several queries with success. I am now not able to retrieve anything. When I use known queries I still get nothing. If I go to the website and manually enter a query, everything works. I know that the PubMed servers no longer work with older versions of EndNote (6.x), and that this is a very recent change. Have they done something recently which is making this remote blast module not work? Thanks, Sunil Saluja From quickster333 at hotmail.com Tue Aug 26 15:03:12 2003 From: quickster333 at hotmail.com (Johnny Amos) Date: Tue Aug 26 15:02:12 2003 Subject: [Bioperl-l] RemoteBlast failing? Message-ID: Hello, Was there some change with the handling of BLASTs at NCBI? I haven't tried for a couple of weeks, but today all my scripts are failing. Johnny From Wiepert.Mathieu at mayo.edu Tue Aug 26 16:42:02 2003 From: Wiepert.Mathieu at mayo.edu (Wiepert, Mathieu) Date: Tue Aug 26 16:41:09 2003 Subject: [Bioperl-l] RemoteBlast failing? Message-ID: <2F41CC6C9777D311ACBD009027B108EA06E9A67F@excsrv32.mayo.edu> Hi, I got an email from NCBI on this, here is the response... "We are doing some internal reorganization and the RID is for us to keep track of the searches going through different setup.
I believe after the reorganization is done, the RID should return to its old format. I will check with our developers to make sure. If you use them to retrieve results in your script, I would suggest that the script be more tolerant and take the whole thing." So, I guess we need to take in everything as suggested. I am somewhat surprised at the capriciousness of changing the exposed RID format while reorganizing, but not much can be done about it. -mat -----Original Message----- From: Johnny Amos [mailto:quickster333@hotmail.com] Sent: Tuesday, August 26, 2003 2:03 PM To: bioperl-l@bioperl.org Subject: [Bioperl-l] RemoteBlast failing? Hello, Was there some change with the handling of BLASTs at NCBI? I haven't tried for a couple of weeks, but today all my scripts are failing. Johnny From serge at iem.sp.ru Wed Aug 27 09:15:30 2003 From: serge at iem.sp.ru (Sergey V. Orlov) Date: Wed Aug 27 09:14:41 2003 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast: RID not found Message-ID: <3F4CAEF2.66D16697@iem.sp.ru> Hi, I had the same problem. It's caused by the wrong regular expression on line 168 of the RemoteBlast.pm module. Just change $RIDLINE = 'RID\s+=\s+(\d+-\d+-\d+)'; to something like $RIDLINE = 'RID\s+=\s+(\d+-\d+-\d+\.BLASTQ3)'; and it'll work. Good luck, Serge.
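The NCBI advice quoted above ("be more tolerant and take the whole thing") can be sketched as a pattern that captures every non-whitespace character after "RID =" rather than hard-coding a suffix such as .BLASTQ3. This is only an illustration: the RID values below are invented, extract_rid is a made-up helper rather than anything in RemoteBlast.pm, and the real server response may wrap the RID line in markup that this sketch does not handle.

```perl
use strict;
use warnings;

# Tolerant pattern: capture every non-whitespace character after "RID = ",
# so the match does not depend on whether a suffix such as ".BLASTQ3" is present.
my $RIDLINE = 'RID\s+=\s+(\S+)';

# Hypothetical helper: return the captured RID, or undef if the line has none.
sub extract_rid {
    my ($line) = @_;
    return $line =~ /$RIDLINE/ ? $1 : undef;
}

# Made-up example lines in the old and the new style.
my @lines = (
    'RID = 1061922355-10458-8',
    'RID = 1061922355-10458-8.BLASTQ3',
    'no request id on this line',
);

for my $line (@lines) {
    my $rid = extract_rid($line);
    print defined $rid ? "RID: $rid\n" : "no RID in: $line\n";
}
```

The trade-off is that \S+ also accepts anything else that happens to follow the equals sign without a space, so it is only as safe as the response format is clean.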
From Wiepert.Mathieu at mayo.edu Wed Aug 27 10:07:25 2003 From: Wiepert.Mathieu at mayo.edu (Wiepert, Mathieu) Date: Wed Aug 27 10:06:50 2003 Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast: RID not found Message-ID: <2F41CC6C9777D311ACBD009027B108EA06E9A682@excsrv32.mayo.edu> Hi, Unfortunately, that can't be the fix, since NCBI has said that the structure of the RID, with '.BLASTQ3' appended is not to be relied upon. How much it is not to be relied upon has yet to be determined? But, I think it is fair to say that $RIDLINE = 'RID\s+=\s+(\S+)'; might work. Or is that too broad? I would think all contiguous non-white space after the = would be OK. -mat -----Original Message----- From: Sergey V. Orlov [mailto:serge@iem.sp.ru] Sent: Wednesday, August 27, 2003 8:16 AM To: bioperl-l@portal.open-bio.org Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast: RID not found Hi, I had got the same problem. It's caused by wrong regular expression on line 168 in RemoteBlast.pm module. Just change $RIDLINE = 'RID\s+=\s+(\d+-\d+-\d+)'; to something like $RIDLINE = 'RID\s+=\s+(\d+-\d+-\d+\.BLASTQ3)'; and it'll work. Good luck, Serge. _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From dag at sonsorol.org Wed Aug 27 11:46:07 2003 From: dag at sonsorol.org (Chris Dagdigian) Date: Wed Aug 27 11:45:18 2003 Subject: [Bioperl-l] Open-Bio server downtime and IP address changes scheduled for Tuesday September 2nd 2003 Message-ID: <3F4CD23F.8090807@sonsorol.org> Hi Everyone, Apologies for the mass cross-posting but this email is about server and IP changes that will affect all of our projects and servers. Simply put -- Wyeth, the company that provides us with our hosting and wonderful T3 connection to the internet is cutting their internet connection circuits over from one ISP to a different Tier 1 internet backbone. 
Technically the changeover will be swift as the circuit and new routers/firewalls are already in place. Should be a matter of bringing down the old gear and lighting up the new stuff. The backbone change will have a significant affect on us though -- all of our server IP addresses will change. The change is scheduled for the evening (EST/EDT timezone) of September 2nd 2003. I'll be onsite at Wyeth in the datacenter as the change occurs so that I can bring down our servers and plug in the new IP addresses. The really nice thing is that all of our primary and secondary DNS nameservers are hosted at places other than Wyeth. This means that we can almost instantly be pushing out the new correct IP addresses for all of our open-bio.org, biojava.org etc. domain names. If I can get my act together during the day on Tuesday I'll start seeding our DNS servers with shorter TTL values which will speed up the spread of the new information. For people with 'fresh' DNS data our servers will appear back on the internet within 30 minutes or so. For people behind nameserver caches that do not refresh all that often please expect our servers to "vanish" from the internet for a period of about 8-24 hours while the new information propagates out through the internet. Regards, Chris open-bio.org -- Chris Dagdigian, BioTeam Inc. - Independent Bio-IT & Informatics consulting Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193 PGP KeyID: 83D4310E Yahoo IM: craffi Web: http://bioteam.net From chauser at duke.edu Wed Aug 27 12:11:29 2003 From: chauser at duke.edu (Charles Hauser) Date: Wed Aug 27 12:11:31 2003 Subject: [Bioperl-l] extract feature seq when split between 2 GenBank accessions Message-ID: <1062000753.30158.37.camel@pandorina.biology.duke.edu> All, I'd like to extract the CDS from genbank records and have found that in some instances these are distributed among >1 genbank accession (see below). 
I have a script which does fine if CDS is fully contained within 1 accession, other than storing all accession seqs in a hash is there a good way to deal with these? Charles LOCUS AY095303S1 2375 bp DNA linear PLN 21-JAN-2003 DEFINITION Chlamydomonas reinhardtii c-type cytochrome synthesis 1 (CCS1) gene, ccs1-ac206 allele, 5'UTR and exons 1 through 6. ACCESSION AY095303 VERSION AY095303.1 GI:25986619 CDS join(207..330,512..825,1045..1233,1418..1798,2000..2131, 2253..2345,AY095304.1:6..303,AY095304.1:495..677, AY095304.1:863..1098) /gene="CCS1" LOCUS AY095303S2 1505 bp DNA linear PLN 21-JAN-2003 DEFINITION Chlamydomonas reinhardtii c-type cytochrome synthesis 1 (CCS1) gene, ccs1-ac206 allele, exons 7, 8 and 9, 3'UTR and complete cds. ACCESSION AY095304 VERSION AY095304.1 GI:25986620 CDS join(AY095303.1:207..330,AY095303.1:512..825, AY095303.1:1045..1233,AY095303.1:1418..1798, AY095303.1:2000..2131,AY095303.1:2253..2345,6..303, 495..677,863..1098) /gene="CCS1" From jason at cgt.duhs.duke.edu Wed Aug 27 13:05:08 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Wed Aug 27 12:42:58 2003 Subject: [Bioperl-l] extract feature seq when split between 2 GenBank accessions In-Reply-To: <1062000753.30158.37.camel@pandorina.biology.duke.edu> References: <1062000753.30158.37.camel@pandorina.biology.duke.edu> Message-ID: If you are getting the seq via spliced_seq you can pass in a Bio::DB::RandomAccessI (either a [local] Bio::Index::Fasta or [remote] Bio::DB::GenBank, etc db handle) to the spliced_seq object. Now I think there is a bug because spliced seq is sorting the locations before processing on them which has been reported but not fixed (I am really hoping for some more bugfixing developers out there folks!) but it should work through that system once that bug is fixed. I would just use a Bio::DB::Fasta/Bio::Index::Fasta where you have the accessions indexed instead of reading in all the possible seqs and storing in a hash to keep the memory requirements down. 
You can also use the DB::Failover + DB::FileCache to cache local/remote calls if you need to mix local and remote dbs. -jason On Wed, 27 Aug 2003, Charles Hauser wrote: > All, > > I'd like to extract the CDS from genbank records and have found that in > some instances these are distributed among >1 genbank accession (see > below). > > I have a script which does fine if CDS is fully contained within 1 > accession, other than storing all accession seqs in a hash is there a > good way to deal with these? > > Charles > > > LOCUS AY095303S1 2375 bp DNA linear PLN 21-JAN-2003 > DEFINITION Chlamydomonas reinhardtii c-type cytochrome synthesis 1 (CCS1) > gene, ccs1-ac206 allele, 5'UTR and exons 1 through 6. > ACCESSION AY095303 > VERSION AY095303.1 GI:25986619 > > CDS join(207..330,512..825,1045..1233,1418..1798,2000..2131, > 2253..2345,AY095304.1:6..303,AY095304.1:495..677, > AY095304.1:863..1098) > /gene="CCS1" > > > > > LOCUS AY095303S2 1505 bp DNA linear PLN 21-JAN-2003 > DEFINITION Chlamydomonas reinhardtii c-type cytochrome synthesis 1 (CCS1) > gene, ccs1-ac206 allele, exons 7, 8 and 9, 3'UTR and complete cds. > ACCESSION AY095304 > VERSION AY095304.1 GI:25986620 > CDS join(AY095303.1:207..330,AY095303.1:512..825, > AY095303.1:1045..1233,AY095303.1:1418..1798, > AY095303.1:2000..2131,AY095303.1:2253..2345,6..303, > 495..677,863..1098) > /gene="CCS1" > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From chauser at duke.edu Wed Aug 27 14:35:49 2003 From: chauser at duke.edu (Charles Hauser) Date: Wed Aug 27 14:35:50 2003 Subject: [Bioperl-l] extract feature seq when split between 2 GenBank accessions In-Reply-To: References: <1062000753.30158.37.camel@pandorina.biology.duke.edu> Message-ID: <1062009413.30158.99.camel@pandorina.biology.duke.edu> Jason, Thanks. 
On Wed, 2003-08-27 at 13:05, Jason Stajich wrote:
> If you are getting the seq via spliced_seq you can pass in a
> Bio::DB::RandomAccessI (either a [local] Bio::Index::Fasta or [remote]
> Bio::DB::GenBank, etc db handle) to the spliced_seq object.
>
> Now I think there is a bug because spliced seq is sorting the locations
> before processing on them which has been reported but not fixed
> (I am really hoping for some more bugfixing developers out there folks!)
> but it should work through that system once that bug is fixed.

I'm using spliced_seq, and it's returning sequence w/ N's for the
segments derived from joins outside the current accession.

>
> I would just use a Bio::DB::Fasta/Bio::Index::Fasta where you have the
> accessions indexed instead of reading in all the possible seqs and storing
> in a hash to keep the memory requirements down.  You can also use the
> DB::Failover + DB::FileCache to cache local/remote calls if you need to
> mix local and remote dbs.

I'll try this -thanks.

Charles

From jason at cgt.duhs.duke.edu Wed Aug 27 15:29:18 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Wed Aug 27 15:07:01 2003
Subject: [Bioperl-l] extract feature seq when split between 2 GenBank accessions
In-Reply-To: <1062009413.30158.99.camel@pandorina.biology.duke.edu>
References: <1062000753.30158.37.camel@pandorina.biology.duke.edu>
 <1062009413.30158.99.camel@pandorina.biology.duke.edu>
Message-ID:

> > Now I think there is a bug because spliced seq is sorting the locations
> > before processing on them which has been reported but not fixed
> > (I am really hoping for some more bugfixing developers out there folks!)
> > but it should work through that system once that bug is fixed.
>
> I'm using spliced_seq, and it's returning sequence w/ N's for the
> segments derived from joins outside the current accession.

Just fixed this in the CVS live.  We have to trust the location order
when it is mixed with remote locations.
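
To illustrate why the order of a join() has to be trusted once remote segments are involved, here is a minimal pure-Perl sketch. It does not use BioPerl and does not reproduce the actual spliced_seq fix; parse_join, splice_segments and the toy accessions A1.1 and B2.1 are hypothetical names made up for this example, and a plain hash stands in for an indexed database handle.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse a simple GenBank join(...) string into ordered segments.
# Each segment is [accession_or_undef, start, end]; undef means
# "the record the feature was read from".
sub parse_join {
    my ($loc) = @_;
    $loc =~ s/\s+//g;    # location strings wrap across lines
    my ($inner) = $loc =~ /^join\((.*)\)$/
        or die "not a simple join(): $loc";
    my @segments;
    for my $part (split /,/, $inner) {
        my ($acc, $start, $end) = $part =~ /^(?:([\w.]+):)?(\d+)\.\.(\d+)$/
            or die "unsupported segment: $part";
        push @segments, [$acc, $start, $end];
    }
    return @segments;
}

# Concatenate the segments strictly in the order given. Remote
# accessions are looked up in %$seqs (hypothetical stand-in for a db).
sub splice_segments {
    my ($local_acc, $seqs, @segments) = @_;
    my $out = '';
    for my $seg (@segments) {
        my ($acc, $start, $end) = @$seg;
        $acc = $local_acc unless defined $acc;
        my $seq = $seqs->{$acc};
        defined $seq or die "no sequence for $acc";
        # GenBank coordinates are 1-based and inclusive
        $out .= substr($seq, $start - 1, $end - $start + 1);
    }
    return $out;
}

# Toy data, invented for illustration only.
my %seqs = (
    'A1.1' => 'AAAACCCCGGGG',
    'B2.1' => 'CCTTCC',
);
my @cds_segments = parse_join('join(1..4,9..12,B2.1:3..4)');
print splice_segments('A1.1', \%seqs, @cds_segments), "\n";   # AAAAGGGGTT
```

Sorting these segments by start coordinate would move B2.1:3..4 ahead of 9..12 and give AAAATTGGGG instead, which is why coordinates from different accessions cannot be compared with each other.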
There may be a couple more special cases which the code I put in won't
catch, so those who have problems, please submit them so we can add
tests.

>
> >
> > I would just use a Bio::DB::Fasta/Bio::Index::Fasta where you have the
> > accessions indexed instead of reading in all the possible seqs and storing
> > in a hash to keep the memory requirements down.  You can also use the
> > DB::Failover + DB::FileCache to cache local/remote calls if you need to
> > mix local and remote dbs.
>
> I'll try this -thanks.
>
> Charles
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From jason at cgt.duhs.duke.edu Wed Aug 27 16:02:24 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Wed Aug 27 15:40:04 2003
Subject: [Bioperl-l] doing a 1.2.3 release
Message-ID:

In my mind there are enough things that have been fixed since 1.2.2 on
the branch to justify the effort in releasing another bugfix release on
the stable branch.  I would like to merge some of the
HTML|TextResultWriter fixes from the main trunk, otherwise I think it
could be an easy push out the door.

-jason

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From letondal at pasteur.fr Wed Aug 27 16:48:07 2003
From: letondal at pasteur.fr (Catherine Letondal)
Date: Wed Aug 27 16:47:09 2003
Subject: [Bioperl-l] PiseWorkflow.pm
In-Reply-To: <20030826105722.A399257@electre.pasteur.fr>; from letondal@pasteur.fr on Tue, Aug 26, 2003 at 10:57:22AM +0200
References:

<20030826105722.A399257@electre.pasteur.fr> Message-ID: <20030827224807.A412193@electre.pasteur.fr> > > > Begin forwarded message: > > > > > > > From: "Samuel Thoraval" > > > > Date: Mon Aug 25, 2003 19:46:03 US/Eastern > > > > To: volunteer@open-bio.org > > > > Subject: [Volunteer] submitting modules > > > > > > > > Hello, > > > > > > > > I have written a module for the Bioperl/Pise API. > > > > I would like to submit it. > > > > To which email should i send the details, code and request to ? > > > > > > > > Regards, > > > > > > > > Samuel Thoraval > Samuel's PiseWorkflow module has been added in CVS. -- Catherine Letondal -- Pasteur Institute Computing Center From jamie at genome.arizona.edu Wed Aug 27 18:05:06 2003 From: jamie at genome.arizona.edu (Jamie Hatfield) Date: Wed Aug 27 18:05:07 2003 Subject: [Bioperl-l] Re: Bio::FPC In-Reply-To: References: Message-ID: <1062021826.562.128.camel@motox> Bio::Map::Marker is really more like our clones. A clone has a range that it exists in contig (or map). But FPCMarkers don't have a position in a map. They "hit" a clone. That is why I felt it was necessary to create a new class. I don't see how these two ideas overlap. On Tue, 2003-08-26 at 10:02, Ewan Birney wrote: > > > On Tue, 26 Aug 2003, Brian Osborne wrote: > > > Jamie, > > > > And a "marker" can be a genetic marker, yes? A la Bio::Map::Marker? If you > > take a look at this module you'll see that its definition of marker allows > > any marker to have different positions in different maps (contig "map", > > genetic map, physical map). This seems to overlap with your notion of > > marker. > > > > Here's my first impression. There's a parser, MapIO::mapmaker for mapmaker, > > mapmaker makes maps from segregation data, genetic data. Your fpc makes > > physical maps, yet physical and genetic maps can be merged to create > > "integrated maps". Your fpcmarker must be closely related to > > Bio::Map::Marker, in fact it's not clear that there should be an fpcmarker. 
> > I would think that a Marker object could be a reasonably rich one, and it > > could be created by fpc or any other program, it really shouldn't matter > > much how it's created (in fact, all this new PopGen code must be ordering > > markers to make maps, I'd think). Perhaps you should be using some of the > > existing code in Bio/Map? Your thoughts? > > > > Brian - I doubt the pop gen stuff will overlap at all with this stuff. but > the marker comment is right, though I can well believe there needs to be > specific FPC hooks for markers used for FPC stuff.... > > > > > > Brian O. > > > > -----Original Message----- > > From: Jamie Hatfield [mailto:jamie@genome.arizona.edu] > > Sent: Tuesday, August 26, 2003 11:46 AM > > To: Brian Osborne > > Cc: BioPerl-List > > Subject: RE: [Bioperl-l] Re: Bio::FPC > > > > Yes, definitly, discussion is great! > > > > We had a little bit of a discussion about this back in November 2002, > > when I proposed the idea, and it was suggested by Heikki to try to fit > > it into either Bio::Map or Bio::Assembly. Maybe I picked the wrong > > one? How about this... I will describe a little bit about what fpc is, > > for those who don't know, and those who know Bio::Map and Bio::Assembly > > will tell me if it fits in their design. ok? > > > > FPC stands for FingerPrinted Contigs. Its main purpose is to assemble > > clones into contiguous regions of overlaps, based on the fingerprint of > > the clones. These fingerprints can be from agarose (sp?) gels, or HICF, > > or simulated, or whatever. Maybe this is more like Assembly? > > > > Anyway, you have the clones, and there are also markers that hit the > > clones, and aid in assembling the clones into contigs. These are the > > main 3 classes. Clones, Contigs, Markers. Contigs contain Clones. > > Markers hit Clones. Clones are hit by markers. > > > > Contigs 1--m Clones > > Markers m--m Clones > > > > Is that a sufficient description of FPC, or do we need more to make a > > good decision? 
> > > > Thanks for initiating the discussion, Brian. > > > > Jamie > > > > On Tue, 2003-08-26 at 05:28, Brian Osborne wrote: > > > Jamie, > > > > > > One of the challenges in Bioperl is creating a single coherent set of > > > modules from the many individual contributions. Could you tell us a bit > > > about your modules and how they overlap functionally with the existing > > > modules in Bio::Map? If you take a look at those modules you can see that > > a > > > good number of the more steadfast Bioperl authors have contributed to > > > Bio::Map, I'm sure that they'd like to see your modules integrate neatly > > > with the existing code. > > > > > > I'm not one of these authors, I'm simply responding because it seems that > > > you'd like to get some discussion going. > > > > > > Brian O. > > > > > > -----Original Message----- > > > From: bioperl-l-bounces@portal.open-bio.org > > > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Jamie Hatfield > > > Sent: Monday, August 25, 2003 6:35 PM > > > To: BioPerl-List > > > Subject: Re: [Bioperl-l] Re: Bio::FPC > > > > > > Again, how do I go about submitting this? > > > > > > On Thu, 2003-08-14 at 09:34, Jamie Hatfield wrote: > > > > Yes, actually. We are just now finishing up the fpc parser. I was > > > > planning on soon asking the group how I would go about submitting it? > > > > It consists of 5 modules that we have put in the MapIO and Map > > > > namespaces. > > > > Bio::MapIO::fpc.pm > > > > Bio::Map::physical.pm > > > > Bio::Map::fpcmarker.pm (sorry, but marker doesn't work) > > > > Bio::Map::clone.pm > > > > Bio::Map::contig.pm > > > > > > > > If you want to see how this object might be used, check out > > > > http://www.genome.arizona.edu/software/fpc/biofpc/index.html > > > > > > > > You'll see there documentation for the modules, and a few test cases or > > > > example usages. 
> > > > > > > > Also, we are trying to make a generic converter to let you load in a fpc > > > > file and generate the necessary GFF for GBrowse to display the fpc map. > > > > It's a quite simple display of the clones, markers, and contigs, but > > > > maybe that will be usefull as an alternative to WebFPC (a java view only > > > > version of fpc). It works for us, but might not work for everybody. We > > > > should be able to patch it up, though, if it's missing features. > > > > > > > > So, anyways, if somebody can let me know how to go about submitting it, > > > > we'll start the process. I looked through the FAQ and it basically said > > > > to just post information if you have a module that you would like to > > > > contribute, so, here's the information. > > > > > > > > Jamie > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > From birney at ebi.ac.uk Wed Aug 27 19:07:21 2003 From: birney at ebi.ac.uk (Ewan Birney) Date: Wed Aug 27 19:06:08 2003 Subject: [Bioperl-l] doing a 1.2.3 release In-Reply-To: Message-ID: On Wed, 27 Aug 2003, Jason Stajich wrote: > In my mind there are enough things that have been fixed since 1.2.2 on the > branch to justify the effort in releasing another bugfix release on the > stable branch. I would like to merge some of the HTML|TextResultWriter > fixes from the main trunk, otherwise I think could be an easy push out the > door. > I'd agree. I also think we need to start on the 1.3 series. Heikki... feel free to chime in...? Jason - I am ok for some bug hunting for a while on either branch or trunk - any not-so-nasty-that-I-go-mad-but-useful-to-fix-bugs out there? 
> -jason
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From wes.barris at csiro.au Wed Aug 27 19:57:04 2003
From: wes.barris at csiro.au (Wes Barris)
Date: Wed Aug 27 19:56:29 2003
Subject: [Bioperl-l] How do you add a consensus to AlignIO?
Message-ID: <3F4D4550.6010902@csiro.au>

Hi,

I am trying to write a bioperl clustalw to msf converter.  I also want
to add a consensus sequence to the alignment.  Here is my code:

#!/usr/local/bin/perl -w
#
use strict;
use Bio::AlignIO;
use Bio::SeqIO;

my $usage = "Usage: $0 \n";
my $infile = shift or die $usage;
my $outfile = shift or die $usage;

my $instream = new Bio::AlignIO(-format=>'clustalw', -file=>$infile);
my $outstream = new Bio::AlignIO(-format=>'msf', -file=>">$outfile");
my $aln = $instream->next_aln();
my $consensus = new Bio::Seq(-seq=>$aln->consensus_string(), -id=>'btcn1000');

$aln->id('alignment.msf');      # set the alignment name
$aln->set_displayname_flat();   # remove the /start-end from the names
$aln->add_seq($consensus);      # add the consensus sequence

$outstream->write_aln($aln);

When I run this, I get this error:

wes@bioserver> clustaltomsf.pl j.aln j.msf

------------- EXCEPTION Bio::Seq -------------
MSG: Unable to process non locatable sequences [
STACK Bio::SimpleAlign::add_seq /usr/lib/perl5/site_perl/5.6.1/Bio/SimpleAlign.pm:245
STACK toplevel clustaltomsf.pl:18

----------------------------------------------

Can anyone tell me what I am doing wrong?  What is a non locatable sequence?
How do I make a locatable one?

--
Wes Barris
E-Mail: Wes.Barris@csiro.au

From jason at cgt.duhs.duke.edu Wed Aug 27 21:37:53 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Wed Aug 27 21:15:37 2003
Subject: [Bioperl-l] How do you add a consensus to AlignIO?
In-Reply-To: <3F4D4550.6010902@csiro.au>
References: <3F4D4550.6010902@csiro.au>
Message-ID:

On Thu, 28 Aug 2003, Wes Barris wrote:

> Hi,
>
> I am trying to write a bioperl clustalw to msf converter.  I also want
> to add a consensus sequence to the alignment.  Here is my code:
>
> #!/usr/local/bin/perl -w
> #
> use strict;
> use Bio::AlignIO;
> use Bio::SeqIO;
>
> my $usage = "Usage: $0 \n";
> my $infile = shift or die $usage;
> my $outfile = shift or die $usage;
>
> my $instream = new Bio::AlignIO(-format=>'clustalw', -file=>$infile);
> my $outstream = new Bio::AlignIO(-format=>'msf', -file=>">$outfile");
> my $aln = $instream->next_aln();

Try:
my $consensus = new Bio::LocatableSeq(-seq=>$aln->consensus_string(),
                                      -id=>'btcn1000');

>
> $aln->id('alignment.msf');      # set the alignment name
> $aln->set_displayname_flat();   # remove the /start-end from the names
> $aln->add_seq($consensus);      # add the consensus sequence
>
> $outstream->write_aln($aln);
>
> When I run this, I get this error:
>
> wes@bioserver> clustaltomsf.pl j.aln j.msf
>
> ------------- EXCEPTION Bio::Seq -------------
> MSG: Unable to process non locatable sequences [
> STACK Bio::SimpleAlign::add_seq /usr/lib/perl5/site_perl/5.6.1/Bio/SimpleAlign.pm:245
> STACK toplevel clustaltomsf.pl:18
>
> ----------------------------------------------
>
> Can anyone tell me what I am doing wrong?  What is a non locatable sequence?
> How do I make a locatable one?
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From wes.barris at csiro.au Wed Aug 27 22:55:45 2003
From: wes.barris at csiro.au (Wes Barris)
Date: Wed Aug 27 22:54:51 2003
Subject: [Bioperl-l] How do you add a consensus to AlignIO?
In-Reply-To:
References: <3F4D4550.6010902@csiro.au>
Message-ID: <3F4D6F31.9020905@csiro.au>

Jason Stajich wrote:

> Try:
> my $consensus = new Bio::LocatableSeq(-seq=>$aln->consensus_string(),
>                                       -id=>'btcn1000');

Thanks.  That did the trick almost (I had to add -start and -end to the
argument list).
However, it is not producing the consensus in the way that I want.
Here is a portion of the resulting msf file:

AU278862   ---------- ---------C TCTACAGAAT CTGTGTTTAT TTTGTTTCAG
AU278567   ---------- ---------- ---ACAGAAT CTGTGTTTAT TTTGTTTCAG
AU277959   -------AGA TTTTGACATC TCTACAGAAT CTGTGTTTAT TTTGTTTCAG
AU278008   CCTTCTTANA TTTTGACATC TCTACAGAAT CTGNGTTTAT TTTGTTTCAG
AU278623   -------AAA TTTTGACATC TCTACANAA- CTGTGTTTAT TTTGTTTCAN
AU278682   -------AAA TTTTGACATC TCTACANAAT CTGTGTTTAT TTTGTTTCAN
BM031781   ---------- ---------- ---------- ---------- ----------
consensus  -------A-A TTTTGACATC TCTACAGAAT CTGTGTTTAT TTTGTTTCAG

I need the consensus to span the entire alignment length like this:

consensus  CCTTCTTAAA TTTTGACATC TCTACAGAAT CTGTGTTTAT TTTGTTTCAG

i.e. I need it to ignore where the aligned sequences do not exist.
Is there a way to make it do that?

--
Wes Barris
E-Mail: Wes.Barris@csiro.au

From juguang at tll.org.sg Thu Aug 28 05:55:01 2003
From: juguang at tll.org.sg (Juguang Xiao)
Date: Thu Aug 28 05:53:56 2003
Subject: [Bioperl-l] taxonomy and speices
Message-ID:

Hi guys,

I tried to write a simple bioperl-db script that works like the search
on http://www.ncbi.nih.gov/Taxonomy/taxonomyhome.html/ and returns the
full taxonomy path plus all sub-taxonomy nodes.  Say I search for
'mouse'; it would return the full path as

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus; mouse

and all sub-taxonomy nodes would also be returned, like 'asian house
mouse', 'european house mouse', etc.

However, the guru Hilmar told me that the current bioperl-db works on
Bio::Species, not Bio::Taxonomy, so bioperl-db cannot satisfy this
requirement until its code is adapted to Taxonomy, after Taxonomy
replaces Species.  Hence I investigated the species-related modules,
found some puzzles, and would like to volunteer an idea and the code.
Bio::Taxonomy was written by Dan Kortschak, and its main and only
functional method (other than get/set, I mean), 'classify', converts a
Species object into an array of names.  It wastes such a nice module
name ;-)

Jason wrote Bio::Taxonomy::Node, and Bio::DB::Taxonomy, which accesses
NCBI Entrez over HTTP or reads the NCBI Tax dump files.
Bio::Taxonomy::Node is tied closely to Bio::DB::Taxonomy, hence it
cannot be adapted to the bioperl-db system so easily.

My plan to reform them is described below.

DATA STRUCTURE

Taxonomy should be abstracted as a hash with rank names as keys, such
as 'class' and 'genus', and identifiers as values, such as an NCBI
taxid, a scientific name or a Taxonomy::Node object.

$taxonomy = {
    '_rank' => ['root', 'superkingdom', ..., 'species', 'subspecies'..., 'no rank'],
        # copied from the current Taxonomy module.
    '_hierarchy' => {
        # Though the keys are unordered in this hash, their order is
        # defined in '_rank'.
        ...
        'class' => 40674, # or Mammalia, or the Taxonomy::Node
        'genus' => 'Mus',
        'species' => $tax_Node_musculus
        ....
    },
    '_factory' => $factory, # explained later.
};

NOTE: the new taxonomy can represent levels other than species, e.g. it
is flexible enough to represent an object at genus level without a
species.

$taxNode_mammalia = {
    'object_id' => 40674, # NCBI taxid; called 'object_id' for
                          # consistency with Bio::IdentifiableI
    'rank' => 'class',
    'name' => 'Mammalia', # scientific name
    'common_name' => 'mammals', # Genbank common name, as NCBI site uses the term.
    'alias' => { # a hash with name_class as key and variant name as value
        '' => ''
    },
    '_factory' => $factory
};

$taxNode_mouse = {
    'object_id' => 10090,
    'rank' => 'species',
    'names' => { # This is a general solution!!
        'specific' => ['musculus'],
        'common' => ['mouse', 'Mickey'],
        'includes' => ['nude mice']
    }
};

OBJECTS

Bio::Taxonomy will override all methods in Bio::Species, for the sake
of backwards compatibility.
If the tax object represents a level higher than species, the sub
'binomial' returns undef; otherwise it simply builds the result by
combining the genus and species.  The sub 'classification' will look
like

foreach (@ranks) {
    unshift @classification, $taxonomy{$_} if exists $taxonomy{$_};
}

Bio::Taxonomy::Node has NO reference to either the parent node or the
taxonomy object, so that Node objects can be freely shared among
Taxonomy objects.  Tricky: once a Node object is created, its content
should not be changed.  If a Taxonomy needs one of its Nodes modified,
it has to make a new Node, in case that Node is shared by another
Taxonomy.

Definitely, we need a Taxonomy factory, like Jason's Bio::DB::Taxonomy
or what we are going to create in bioperl-db.  Both Taxonomy objects
and Node ones have a reference to this factory, so that a Taxonomy can
be created automatically, and a Node can ask who its parent is
($node->get_parent_node, e.g. $node->_factory->find_parent_node($node)).

Comments, please, and I will transform the idea into the code.  Thanks.

Juguang

------------ATGCCGAGCTTNNNNCT--------------
Juguang Xiao
Bioinformatics Engineer
Temasek Life Sciences Laboratory, National University of Singapore
1 Research Link, Singapore 117604
fax: (+65) 68727007
juguang at tll.org.sg

From brian_osborne at cognia.com Thu Aug 28 08:49:12 2003
From: brian_osborne at cognia.com (Brian Osborne)
Date: Thu Aug 28 08:52:20 2003
Subject: [Bioperl-l] taxonomy and speices
In-Reply-To:
Message-ID:

Juguang,

It sounds like you've formulated a solution, but let me describe
another approach, if only to plug BioSQL for those who haven't
installed it.  With a BioSQL database installed one can run Aaron's
load_taxononomy.pl (found in the biosql package), which loads the
current taxonomy data from NCBI.  There you'll find each taxon labeled
by name ("Arabidopsis"), node_rank ("genus"), and parent_taxon_id.
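
As a rough sketch of that kind of lookup, assuming the taxon rows have already been fetched into memory, the full path and the children can both be derived from the parent_taxon_id links alone. This is not a BioSQL query and not bioperl-db code; %taxon, lineage and descendants are hypothetical names, and the ids are made up rather than real NCBI taxids.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy rows: taxon_id => [name, node_rank, parent_taxon_id].
# Ids are invented for illustration; the root has no parent.
my %taxon = (
    1 => ['Mammalia',     'class',   undef],
    2 => ['Rodentia',     'order',   1],
    3 => ['Mus',          'genus',   2],
    4 => ['Mus musculus', 'species', 3],
    5 => ['Mus spretus',  'species', 3],
);

# Follow parent links up to the root; return names from root to leaf.
sub lineage {
    my ($id) = @_;
    my @path;
    while (defined $id) {
        my $row = $taxon{$id} or die "unknown taxon $id";
        unshift @path, $row->[0];
        $id = $row->[2];
    }
    return @path;
}

# Collect all descendants breadth-first by scanning for rows whose
# parent is the node currently being expanded.
sub descendants {
    my ($id) = @_;
    my @found;
    my @queue = ($id);
    while (defined(my $parent = shift @queue)) {
        for my $child (sort { $a <=> $b } keys %taxon) {
            my $p = $taxon{$child}[2];
            next unless defined $p && $p == $parent;
            push @found, $taxon{$child}[0];
            push @queue, $child;
        }
    }
    return @found;
}

print join('; ', lineage(4)), "\n";
print join(', ', descendants(3)), "\n";
```

Scanning every row for each parent is fine for a toy table; against a real database one would more likely index on the parent column or use the nested-set fields instead.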
Yes, this approach is a bit more "mechanical" than yours but a
straightforward script will get both the "full path" or the children
from the database.  Sidelight: see
http://www.oreillynet.com/pub/a/network/2002/11/27/bioconf.html for
Aaron's nice article on the meaning of the right_value and left_value
fields.

If you do write the code you've suggested please send the final script,
it sounds like a good one for our examples/ directory.

Brian O.

From brian_osborne at cognia.com Thu Aug 28 09:39:27 2003
From: brian_osborne at cognia.com (Brian Osborne)
Date: Thu Aug 28 09:42:34 2003
Subject: [Bioperl-l] Re: Bio::FPC
In-Reply-To: <1062021826.562.128.camel@motox>
Message-ID:

Jamie,

Is it fair to say that "hitting a clone" is the same thing as "having a
position in a clone that starts at the beginning of the clone and ends
at the end of a clone"?  Or "the FPCMarker's range goes from the
beginning to the end of a clone"?

Brian O.
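
The relationships discussed in this thread (contigs contain clones, markers hit clones, and a marker's range might be derived from the clones it hits) can be sketched with plain Perl hashes. This is only an illustration under assumed toy data; it is not the Bio::Map or FPC module code, and %clone, %marker_hits, clones_hit, contigs_hit and marker_range are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical toy data. Each clone lies in one contig and covers a
# range there (contigs 1--m clones).
my %clone = (
    c1 => { contig => 'ctg1', start => 1,  end => 40 },
    c2 => { contig => 'ctg1', start => 25, end => 70 },
    c3 => { contig => 'ctg2', start => 5,  end => 30 },
);

# Markers carry no position of their own; they only list the clones
# they hit (markers m--m clones).
my %marker_hits = (
    m1 => ['c1', 'c2'],
    m2 => ['c2', 'c3'],
);

# Which clones does a marker hit?
sub clones_hit {
    my ($marker) = @_;
    return @{ $marker_hits{$marker} || [] };
}

# Which contigs does a marker hit (each contig listed once)?
sub contigs_hit {
    my ($marker) = @_;
    my %seen;
    return grep { !$seen{$_}++ }
           map  { $clone{$_}{contig} } clones_hit($marker);
}

# Derived range of a marker within one contig: the span covered by
# the clones it hits there. Returns an empty list if it hits none.
sub marker_range {
    my ($marker, $contig) = @_;
    my @hits = grep { $clone{$_}{contig} eq $contig } clones_hit($marker);
    return unless @hits;
    my ($min, $max);
    for my $c (@hits) {
        my ($s, $e) = @{ $clone{$c} }{qw(start end)};
        $min = $s if !defined $min || $s < $min;
        $max = $e if !defined $max || $e > $max;
    }
    return ($min, $max);
}

print join(',', contigs_hit('m2')), "\n";
print join('..', marker_range('m1', 'ctg1')), "\n";
```

Whether a derived range like this is an acceptable substitute for a true marker position is exactly the open design question in the discussion; the sketch only shows that both queries (clones and contigs per marker) can be answered from the hit list.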
From jamie at genome.arizona.edu Thu Aug 28 10:23:12 2003
From: jamie at genome.arizona.edu (Jamie Hatfield)
Date: Thu Aug 28 10:23:13 2003
Subject: [Bioperl-l] Re: Bio::FPC
In-Reply-To:
References:
Message-ID: <1062080518.562.134.camel@motox>

Yes, that is valid.  So then I assume you're saying that the marker can
just hit the contig instead?  That would be fine if 1) the marker could
hit a range 2) the marker could store all the clones it hits.  We need
to be able to query the marker and ask it which clones and which
contigs it hits.  I don't think that would be possible with the Marker
Object, would it?

Thanks for keeping up this discussion, by the way.  I'd like to try to
get this finished up and added in correctly.  A few people have asked
how they get ahold of the fpc parser modules, and I've been telling
them to wait until I can get it part of bioperl in a proper manner.

Jamie

On Thu, 2003-08-28 at 06:39, Brian Osborne wrote:
> Jamie,
>
> Is it fair to say that "hitting a clone" is the same thing as "having a
> position in a clone that starts at the beginning of the clone and ends at
> the end of a clone"?
Or "the FPCMarker's range goes from the beginning to > the end of a clone"? > > Brian O. > > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Jamie Hatfield > Sent: Wednesday, August 27, 2003 6:04 PM > To: Ewan Birney > Cc: Brian Osborne; BioPerl-List > Subject: RE: [Bioperl-l] Re: Bio::FPC > > Bio::Map::Marker is really more like our clones. A clone has a range > that it exists in contig (or map). But FPCMarkers don't have a position > in a map. They "hit" a clone. That is why I felt it was necessary to > create a new class. I don't see how these two ideas overlap. > > > On Tue, 2003-08-26 at 10:02, Ewan Birney wrote: > > > > > > On Tue, 26 Aug 2003, Brian Osborne wrote: > > > > > Jamie, > > > > > > And a "marker" can be a genetic marker, yes? A la Bio::Map::Marker? If > you > > > take a look at this module you'll see that its definition of marker > allows > > > any marker to have different positions in different maps (contig "map", > > > genetic map, physical map). This seems to overlap with your notion of > > > marker. > > > > > > Here's my first impression. There's a parser, MapIO::mapmaker for > mapmaker, > > > mapmaker makes maps from segregation data, genetic data. Your fpc makes > > > physical maps, yet physical and genetic maps can be merged to create > > > "integrated maps". Your fpcmarker must be closely related to > > > Bio::Map::Marker, in fact it's not clear that there should be an > fpcmarker. > > > I would think that a Marker object could be a reasonably rich one, and > it > > > could be created by fpc or any other program, it really shouldn't matter > > > much how it's created (in fact, all this new PopGen code must be > ordering > > > markers to make maps, I'd think). Perhaps you should be using some of > the > > > existing code in Bio/Map? Your thoughts? > > > > > > > Brian - I doubt the pop gen stuff will overlap at all with this stuff. 
but > > the marker comment is right, though I can well believe there needs to be > > specific FPC hooks for markers used for FPC stuff.... > > > > > > > > > > > Brian O. > > > > > > -----Original Message----- > > > From: Jamie Hatfield [mailto:jamie@genome.arizona.edu] > > > Sent: Tuesday, August 26, 2003 11:46 AM > > > To: Brian Osborne > > > Cc: BioPerl-List > > > Subject: RE: [Bioperl-l] Re: Bio::FPC > > > > > > Yes, definitly, discussion is great! > > > > > > We had a little bit of a discussion about this back in November 2002, > > > when I proposed the idea, and it was suggested by Heikki to try to fit > > > it into either Bio::Map or Bio::Assembly. Maybe I picked the wrong > > > one? How about this... I will describe a little bit about what fpc is, > > > for those who don't know, and those who know Bio::Map and Bio::Assembly > > > will tell me if it fits in their design. ok? > > > > > > FPC stands for FingerPrinted Contigs. Its main purpose is to assemble > > > clones into contiguous regions of overlaps, based on the fingerprint of > > > the clones. These fingerprints can be from agarose (sp?) gels, or HICF, > > > or simulated, or whatever. Maybe this is more like Assembly? > > > > > > Anyway, you have the clones, and there are also markers that hit the > > > clones, and aid in assembling the clones into contigs. These are the > > > main 3 classes. Clones, Contigs, Markers. Contigs contain Clones. > > > Markers hit Clones. Clones are hit by markers. > > > > > > Contigs 1--m Clones > > > Markers m--m Clones > > > > > > Is that a sufficient description of FPC, or do we need more to make a > > > good decision? > > > > > > Thanks for initiating the discussion, Brian. > > > > > > Jamie > > > > > > On Tue, 2003-08-26 at 05:28, Brian Osborne wrote: > > > > Jamie, > > > > > > > > One of the challenges in Bioperl is creating a single coherent set of > > > > modules from the many individual contributions. 
Could you tell us a > bit > > > > about your modules and how they overlap functionally with the existing > > > > modules in Bio::Map? If you take a look at those modules you can see > that > > > a > > > > good number of the more steadfast Bioperl authors have contributed to > > > > Bio::Map, I'm sure that they'd like to see your modules integrate > neatly > > > > with the existing code. > > > > > > > > I'm not one of these authors, I'm simply responding because it seems > that > > > > you'd like to get some discussion going. > > > > > > > > Brian O. > > > > > > > > -----Original Message----- > > > > From: bioperl-l-bounces@portal.open-bio.org > > > > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Jamie > Hatfield > > > > Sent: Monday, August 25, 2003 6:35 PM > > > > To: BioPerl-List > > > > Subject: Re: [Bioperl-l] Re: Bio::FPC > > > > > > > > Again, how do I go about submitting this? > > > > > > > > On Thu, 2003-08-14 at 09:34, Jamie Hatfield wrote: > > > > > Yes, actually. We are just now finishing up the fpc parser. I was > > > > > planning on soon asking the group how I would go about submitting > it? > > > > > It consists of 5 modules that we have put in the MapIO and Map > > > > > namespaces. > > > > > Bio::MapIO::fpc.pm > > > > > Bio::Map::physical.pm > > > > > Bio::Map::fpcmarker.pm (sorry, but marker doesn't work) > > > > > Bio::Map::clone.pm > > > > > Bio::Map::contig.pm > > > > > > > > > > If you want to see how this object might be used, check out > > > > > http://www.genome.arizona.edu/software/fpc/biofpc/index.html > > > > > > > > > > You'll see there documentation for the modules, and a few test cases > or > > > > > example usages. > > > > > > > > > > Also, we are trying to make a generic converter to let you load in a > fpc > > > > > file and generate the necessary GFF for GBrowse to display the fpc > map. 
> > > > > It's a quite simple display of the clones, markers, and contigs, but > > > > > maybe that will be useful as an alternative to WebFPC (a java view > only > > > > > version of fpc). It works for us, but might not work for everybody. > We > > > > > should be able to patch it up, though, if it's missing features. > > > > > > > > > > So, anyways, if somebody can let me know how to go about submitting > it, > > > > > we'll start the process. I looked through the FAQ and it basically > said > > > > > to just post information if you have a module that you would like to > > > > > contribute, so, here's the information. > > > > > > > > > > Jamie From brian_osborne at cognia.com Thu Aug 28 11:31:21 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Thu Aug 28 11:34:34 2003 Subject: [Bioperl-l] Re: Bio::FPC In-Reply-To: <1062080518.562.134.camel@motox> Message-ID: Jamie, From the looks of it, it seems that a Marker can have one or more Positions and a Position can have a range and a Map. But you asked about iterating over a Marker's Positions to find the "clones" (and both "clones" and "contigs" are smaller and larger Maps respectively, yes?). I'm seeing the ability to get/set but not anything like "next_position", but this is after a minute's inspection. I'd suggest you take a closer look at Mappable, Position, and Map to get the real answer.
If there's really nothing resembling this I'm certain you could create it, I think it's to your advantage to use, and possibly enrich, Bioperl's existing objects so that in the future you'll have access to all that's inside. But you knew that already... By the way, say "Hi!" to Rod Wing for me, he's a friend of mine from way back. Brian O. From jason at cgt.duhs.duke.edu Thu Aug 28 12:34:38 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Thu Aug 28 12:12:03 2003 Subject: [Bioperl-l] doing a 1.2.3 release In-Reply-To: References: Message-ID: > I'd agree. I also think we need to start on the 1.3 series. Heikki... > feel free to chime in...? > > > Jason - I am ok for some bug hunting for a while on either branch or trunk > - any not-so-nasty-that-I-go-mad-but-useful-to-fix-bugs out there? Fix translate() to check for completeness on 5' and 3' end http://bugzilla.open-bio.org/show_bug.cgi?id=1476 PDB residue recognition code http://bugzilla.open-bio.org/show_bug.cgi?id=1485 Look at Keith's old bug, I don't know if it is still broken http://bugzilla.open-bio.org/show_bug.cgi?id=992 For fun bugs to fix on the main trunk NCBI Contig entries - do we really handle them correctly now? http://bugzilla.open-bio.org/show_bug.cgi?id=1319 Depending on how Juguang's Bio::Species/ Bio::Taxomomy::Node connection evolves I think this bug will get fixed in the process.
http://bugzilla.open-bio.org/show_bug.cgi?id=1244 From jason at cgt.duhs.duke.edu Thu Aug 28 12:45:10 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Thu Aug 28 12:23:23 2003 Subject: [Bioperl-l] taxonomy and speices In-Reply-To: References: Message-ID: Glad you are taking this on. I think you can think about just dropping Bio::Taxonomy::Tree and Bio::Taxonomy::Taxon if they don't provide anything useful any more. Dan has moved on AFAIK and I couldn't make them work in a way that I think was useful, so I wrote Bio::Taxonomy::Node to be an entity in the entire taxonomy with ability to move up or down classification levels. Here is how I envisioned this working. Bio::Species (or its successor) will be the collection of all the information above a node in the Taxonomy hierarchy. So as you have described we can talk about sub species, etc. This is what I think Dan had in mind with Bio::Taxonomy::Taxon, meaning the tip nodes in the Hierarchy. I think it is fine to fix/replace any code in Bio::Taxonomy as I don't think it has been used anywhere yet. If you can think about how to keep using the Factory object for access to the taxonomy so that underneath the factory can be locally indexed taxdmp from NCBI as I have implemented, the limited HTTP access that NCBI provides to this data, or a BioSQL system. I would try and use the taxon tables that Aaron setup in BioSQL - it was our intention all along to provide this Factory with access to biosql, we just have not had time to get it working. Once this is integrated in, we can start to rely on using taxonid numbers for lookups and query constraints more easily which I think will be a big plus. -jason On Thu, 28 Aug 2003, Juguang Xiao wrote: > Hi guys, > > I tried to write a simple bioperl-db scripts functioning like the > search on http://www.ncbi.nih.gov/Taxonomy/taxonomyhome.html/ , to > return a full taxonomy path, and all sub taxonomy nodes. 
Say, if I > search 'mouse', it will return the full path as > > Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; > Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus; mouse > > And all sub taxonomy nodes will also be returned, like 'asian house > mouse', 'european house mouse', etc. > > However, the Guru Hilmar told me that current bioperl-db works on > Bio::Species, but not Bio::Taxonomy, and that bioperl-db cannot satisfy > my above requirement until the code adopts Taxonomy, once Taxonomy > replaces Species. Hence I investigated the species-related modules, > found some puzzles, and would like to volunteer the idea and the code. > > Bio::Taxonomy is written by Dan Kortschak, and the main and only > functional method (rather than get/set, I mean), 'classify', is to > convert a Species object into an array of names. It wastes such a nice > module name ;-) > > Jason wrote Bio::Taxonomy::Node, and Bio::DB::Taxonomy, which accesses > NCBI Entrez over HTTP OR reads the NCBI Tax dump files. > Bio::Taxonomy::Node is tied to Bio::DB::Taxonomy closely, hence it > cannot be adapted to the bioperl-db system so easily. > > My plan to reform them is described below. > > DATA STRUCTURE > Taxonomy should be abstracted as a hash with the keys as rank names, > such as 'class', 'genus', and values as the identifiers, such as NCBI > taxid, scientific name or Taxonomy::Node object. > > $taxonomy = { > '_rank' => ['root', 'superkingdom', ..., 'species', 'subspecies'..., > 'no rank'], # copied from the current Taxonomy module. > > '_hierarchy' => { # Though the keys are unordered in this hash, its > order is defined in rank. > ... > 'class' => 40674, # or Mammalia, or the Taxonomy::Node > 'genus' => 'Mus', > 'species' => $tax_Node_musculus > .... > }, > '_factory' => $factory, # explained later. > }; > > NOTE: the new taxonomy can represent more than species level, e.g. it > is flexible enough to represent an object at genus level without species.
> > $taxNode_mammalia = { > 'object_id' => 40674, # NCBI taxid; it is called > 'object_id' for consistency with Bio::IdentifiableI > 'rank' => 'class', > 'name' => 'Mammalia', # scientific name > 'common_name' => 'mammals', # Genbank common name, as NCBI site uses > the term. > 'alias' => { # a hash with name_class as key and variant name as value > '' => '' > }, > '_factory' => $factory > }; > > > $taxNode_mouse = { > 'object_id' => 10090, > 'rank' => 'species', > 'names' => { # This is a general solution!! > 'specific' => ['musculus'], > 'common' => ['mouse', 'Mickey'], > 'includes' => ['nude mice'] > } > }; > > OBJECTS > > Bio::Taxonomy will override all methods in Bio::Species, for the sake > of backwards compatibility. If the tax object represents a level higher > than species, the sub 'binomial' returns undef, otherwise it simply builds > the result by combining the genus and species; the sub 'classification' > will look like > > foreach(@ranks){ > unshift @classification, $taxonomy{$_} if defined $taxonomy{$_}; > } > > > Bio::Taxonomy::Node has NO reference to either the parent node or > taxonomy object, so that Node objects can be freely shared among > Taxonomy objects. Tricky: once a Node object is created, its content > should not be changed. If a Taxonomy requires one of its Nodes modified, it > has to make a new Node, in case that Node was shared by another Taxonomy. > > Definitely, we need a Taxonomy factory, like Jason's Bio::DB::Taxonomy > or what we are going to create in bioperl-db. Both Taxonomy objects and > Node ones have a reference to this factory, so that Taxonomy can be > created automatically, and Node can ask who its parent is > ($node->get_parent_node, e. g. > $node->_factory->find_parent_node($node)). > > Comments, please, and I will transform the idea into the code. > > > Thanks.
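The rank-ordered hash proposed above can be sketched in plain Perl. This is only an illustration under stated assumptions, not BioPerl code: the rank list is abbreviated, the hierarchy holds bare names rather than Node objects or taxids, and the subs `classification` and `binomial` are hypothetical stand-ins for the methods the proposal mentions. The loop uses a plain `defined` test, so ranks missing from the hash are skipped.

```perl
use strict;
use warnings;

# Abbreviated rank list, ordered from most general to most specific.
# (Assumption: the real list would be taken from Bio::Taxonomy.)
my @ranks = qw(superkingdom kingdom phylum class order family genus species);

# Hierarchy keyed by rank name; unknown ranks are simply absent.
my %hierarchy = (
    superkingdom => 'Eukaryota',
    class        => 'Mammalia',
    genus        => 'Mus',
    species      => 'musculus',
);

# Collect the known names, most specific first (the order that
# unshift-ing while walking the ranks from general to specific gives).
sub classification {
    my ($hier) = @_;
    my @classification;
    for my $rank (@ranks) {
        unshift @classification, $hier->{$rank} if defined $hier->{$rank};
    }
    return @classification;
}

# Return "Genus species", or nothing when the object sits above species level.
sub binomial {
    my ($hier) = @_;
    return unless defined $hier->{genus} && defined $hier->{species};
    return "$hier->{genus} $hier->{species}";
}

print join('; ', classification(\%hierarchy)), "\n";
print scalar(binomial(\%hierarchy)), "\n";
```

With the sample hash this prints `musculus; Mus; Mammalia; Eukaryota` followed by `Mus musculus`; a hash without a `species` key makes `binomial` return undef in scalar context.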
> > Juguang > > > > ------------ATGCCGAGCTTNNNNCT-------------- > Juguang Xiao > Bioinformatics Engineer > Temasek Life Sciences Laboratory, National University of Singapore > 1 Research Link, Singapore 117604 > fax: (+65) 68727007 > > juguang at tll.org.sg > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From birney at ebi.ac.uk Thu Aug 28 12:47:54 2003 From: birney at ebi.ac.uk (Ewan Birney) Date: Thu Aug 28 12:46:36 2003 Subject: [Bioperl-l] doing a 1.2.3 release In-Reply-To: Message-ID: On Thu, 28 Aug 2003, Jason Stajich wrote: > > > I'd agree. I also think we need to start on the 1.3 series. Heikki... > > feel free to chime in...? > > > > > > Jason - I am ok for some bug hunting for a while on either branch or trunk > > - any not-so-nasty-that-I-go-mad-but-useful-to-fix-bugs out there? > > Fix translate() to check for completeness on 5' and 3' end > http://bugzilla.open-bio.org/show_bug.cgi?id=1476 Ok, I'm doing this one; I will probably just use Matt's proposed fix after a review. > PDB residue recognition code > http://bugzilla.open-bio.org/show_bug.cgi?id=1485 Ditto > Look at Keith's old bug, I don't know if it is still broken > http://bugzilla.open-bio.org/show_bug.cgi?id=992 > Aaaaaaaaaagh. The fuzzies. I will give it a go. I have just cvs update'd on the branch and splicedseq.t is also failing on my box at the moment. I'll fix. From redwards at utmem.edu Thu Aug 28 13:01:10 2003 From: redwards at utmem.edu (Rob Edwards) Date: Thu Aug 28 13:00:13 2003 Subject: [Bioperl-l] doing a 1.2.3 release In-Reply-To: Message-ID: <39528B78-D979-11D7-96EB-000A959E1622@utmem.edu> I would also add: Enhancement to Bio::DB::GenBank to allow subsequence retrieval http://bugzilla.open-bio.org/show_bug.cgi?id=1405 To this list. Two versions of patches are in bugzilla, so it should (?) be easy. 
Rob On Thursday, August 28, 2003, at 11:34 AM, Jason Stajich wrote: > >> I'd agree. I also think we need to start on the 1.3 series. Heikki... >> feel free to chime in...? >> >> >> Jason - I am ok for some bug hunting for a while on either branch or >> trunk >> - any not-so-nasty-that-I-go-mad-but-useful-to-fix-bugs out there? > > Fix translate() to check for completeness on 5' and 3' end > http://bugzilla.open-bio.org/show_bug.cgi?id=1476 > PDB residue recognition code > http://bugzilla.open-bio.org/show_bug.cgi?id=1485 > Look at Keith's old bug, I don't know if it is still broken > http://bugzilla.open-bio.org/show_bug.cgi?id=992 > > For fun bugs to fix on the main trunk > NCBI Contig entries - do we really handle them correctly now? > http://bugzilla.open-bio.org/show_bug.cgi?id=1319 > > Depending on how Juguang's Bio::Species/ Bio::Taxomomy::Node connection > evolves I think this bug will get fixed in the process. > http://bugzilla.open-bio.org/show_bug.cgi?id=1244 > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From birney at ebi.ac.uk Thu Aug 28 13:06:54 2003 From: birney at ebi.ac.uk (Ewan Birney) Date: Thu Aug 28 13:05:36 2003 Subject: [Bioperl-l] doing a 1.2.3 release In-Reply-To: Message-ID: Jason or other Perl Gods... There is something very, very kooky going on with the new DESTROY method in Bio::SeqIO. 
What is happening is that on the line: if( $db ) { if( ref($db) && !$db->isa('Bio::DB::RandomAccessI') ) { $self->warn("Must pass in a valid Bio::DB::RandomAccessI object for access to remote l\ ocations for spliced_seq"); $db = undef; } } if( $db) in SeqFeatureI (I broke up the if statement to isolate it), perl is somehow calling the garbage collector, and for reasons beyond me ends up saying: Bio::SeqFeatureI::spliced_seq(Bio/SeqFeatureI.pm:459): 459: my ($mixed,$mixedloc,$fstrand) = (0); DB<3> n Bio::SeqFeatureI::spliced_seq(Bio/SeqFeatureI.pm:461): 461: if( $db ) { DB<3> s Can't call method "isa" on an undefined value at Bio/SeqFeatureI.pm line 461. Bio::SeqIO::DESTROY(Bio/SeqIO.pm:627): 627: my $self = shift; Notice that the debugger has just now entered the Bio::SeqIO::DESTROY method. Commenting out though just moves this towards Bio/Root/RootI DESTROY... Really confused. Perl is at: Ewan-Birneys-Computer:~/src/bioperl-branch-1-2] birney% perl -version This is perl, v5.6.0 built for darwin Copyright 1987-2000, Larry Wall Perl may be copied only under the terms of either the Artistic License or the GNU General Public License, which may be found in the Perl 5.0 source kit. Complete documentation for Perl, including FAQ lists, should be found on this system using `man perl' or `perldoc perl'. If you have access to the Internet, point your browser at http://www.perl.com/, the Perl Home Page. Is there a "gotcha" here? This is v. annoying.... From jason at cgt.duhs.duke.edu Thu Aug 28 13:39:01 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Thu Aug 28 13:16:27 2003 Subject: [Bioperl-l] doing a 1.2.3 release In-Reply-To: References: Message-ID: Yeah I don't know what is going on - t/SeqFeature.t worked fine for me before. If I change it to add an extra (defined $db) in the elsif statement it works, perhaps the die is coming from the new Bio::DB::InMemoryCache as my elsif wasn't strict enough? if( defined $db && ref($db) && ! 
$db->isa('Bio::DB::RandomAccessI') ) { $self->warn("Must pass in a valid Bio::DB::RandomAccessI object for access to remote locations for spliced_seq"); $db = undef; } elsif( defined $db && $HasInMemory && ! $db->isa('Bio::DB::InMemoryCache') ) { $db = new Bio::DB::InMemoryCache(-seqdb => $db); } On Thu, 28 Aug 2003, Ewan Birney wrote: > > Jason or other Perl Gods... > > > There is something very, very kooky going on with the new DESTROY method > in Bio::SeqIO. What is happening is that on the line: > > > if( $db ) { > if( ref($db) && !$db->isa('Bio::DB::RandomAccessI') ) { > $self->warn("Must pass in a valid Bio::DB::RandomAccessI > object for access to remote l\ > ocations for spliced_seq"); > $db = undef; > } > } > > > if( $db) > > in SeqFeatureI (I broke up the if statement to isolate it), perl is > somehow calling the garbage collector, and for reasons beyond me ends up > saying: > > > Bio::SeqFeatureI::spliced_seq(Bio/SeqFeatureI.pm:459): > 459: my ($mixed,$mixedloc,$fstrand) = (0); > DB<3> n > Bio::SeqFeatureI::spliced_seq(Bio/SeqFeatureI.pm:461): > 461: if( $db ) { > DB<3> s > Can't call method "isa" on an undefined value at Bio/SeqFeatureI.pm line > 461. > Bio::SeqIO::DESTROY(Bio/SeqIO.pm:627): > 627: my $self = shift; > > > > Notice that the debugger has just now entered the Bio::SeqIO::DESTROY > method. Commenting out though just moves this towards Bio/Root/RootI > DESTROY... > > > Really confused. Perl is at: > > Ewan-Birneys-Computer:~/src/bioperl-branch-1-2] birney% perl -version > > This is perl, v5.6.0 built for darwin > > Copyright 1987-2000, Larry Wall > > Perl may be copied only under the terms of either the Artistic License or > the > GNU General Public License, which may be found in the Perl 5.0 source kit. > > Complete documentation for Perl, including FAQ lists, should be found on > this system using `man perl' or `perldoc perl'. If you have access to the > Internet, point your browser at http://www.perl.com/, the Perl Home Page. 
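A note on the error itself: `Can't call method "isa" on an undefined value` is what Perl raises whenever a method is invoked on an undefined scalar, so each branch that calls `->isa` needs its own check. Below is a minimal sketch of such a guard. It assumes `Scalar::Util` is available (it ships with Perl from 5.8; on the 5.6.0 mentioned in this thread it would have to be installed separately). `My::RandomAccess` and `valid_db` are hypothetical names used only for illustration; this is not the fix that was committed to BioPerl.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical stand-in for a class implementing Bio::DB::RandomAccessI.
package My::RandomAccess;
sub new { my $class = shift; return bless {}, $class; }

package main;

# Return $db when it is an object of (or derived from) $class, else undef.
# blessed() is false for undef and for plain strings, so ->isa is only
# reached when $db really is an object.
sub valid_db {
    my ($db, $class) = @_;
    return undef unless blessed($db) && $db->isa($class);
    return $db;
}

for my $candidate (undef, 'not an object', My::RandomAccess->new) {
    my $ok = valid_db($candidate, 'My::RandomAccess');
    print defined $ok ? "usable\n" : "rejected\n";
}
```

The loop prints `rejected`, `rejected`, `usable`: neither the undefined value nor the plain string ever reaches the method call.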
> > > > Is there a "gotcha" here? This is v. annoying.... > > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From chauser at duke.edu Thu Aug 28 15:18:26 2003 From: chauser at duke.edu (Charles Hauser) Date: Thu Aug 28 15:18:28 2003 Subject: [Bioperl-l] bp_extract_feature_seq.pl : --feature CDS error Message-ID: <1062098372.31518.73.camel@pandorina.biology.duke.edu> I updated my bioperl to cvs today and now am unable to extract CDS features from a genbank report using the script Using bp_extract_feature_seq.pl. using: -i foo.gbk --format genbank --feature gene script returns concatenated fasta data, but ends with: Can't call method "isa" on an undefined value at /usr/local/src/bioperl/core/Bio/SeqFeatureI.pm line 457, line 1781. line 1781 is the '//'line at the end of the gbk report. Attempting to extract CDS feature seqs --feature CDS returns no data just the error listed above. Charles From jason at cgt.duhs.duke.edu Thu Aug 28 15:53:05 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Thu Aug 28 15:30:29 2003 Subject: [Bioperl-l] bp_extract_feature_seq.pl : --feature CDS error In-Reply-To: <1062098372.31518.73.camel@pandorina.biology.duke.edu> References: <1062098372.31518.73.camel@pandorina.biology.duke.edu> Message-ID: We just fixed this - see Ewan's last message. I think I've committed a cleaner fix to the last one. I think the problem may have been in InMemoryCache instantiation instead. -jason On Thu, 28 Aug 2003, Charles Hauser wrote: > I updated my bioperl to cvs today and now am unable to extract CDS > features from a genbank report using the script Using > bp_extract_feature_seq.pl. 
> > > using: > > -i foo.gbk --format genbank --feature gene > > script returns concatenated fasta data, but ends with: > Can't call method "isa" on an undefined value at > /usr/local/src/bioperl/core/Bio/SeqFeatureI.pm line 457, line > 1781. > > line 1781 is the '//'line at the end of the gbk report. > > > Attempting to extract CDS feature seqs --feature CDS returns no data > just the error listed above. > > Charles > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From ymc at paxil.stanford.edu Thu Aug 28 17:42:39 2003 From: ymc at paxil.stanford.edu (Yee Man Chan) Date: Thu Aug 28 17:41:44 2003 Subject: [Bioperl-l] Re: Smith-Waterman question In-Reply-To: <3F4E7161.9010102@foobox.com> Message-ID: t1, t2 are both DNA sequences. Their alignment has nothing to do with BLOSUM62... To verify your score, I suggest you run ssearch34 from FASTA package. Yee Man On Thu, 28 Aug 2003, Ahmed Moustafa wrote: > Hi Yee, > > I was testing my implementation of Smith-Waterman algorithm with your t1 > and t2 sequences. I'd like to verify the alignment score with you. Mine > was 56101.00 with BLOSUM62, open gap 10.0 and extend gap 0.5. > > Could you please send me your score? > > Thanks in advance! > Ahmed > > -- > Ahmed Moustafa > Programmer/Kaiser Permanente > > From jbronson at acsu.buffalo.edu Thu Aug 28 18:36:42 2003 From: jbronson at acsu.buffalo.edu (Joshua Bronson) Date: Thu Aug 28 18:37:36 2003 Subject: [Bioperl-l] aligning sequences with Bio::Tools::pSW Message-ID: <20030828223642.GA7244@resnet147-170> I'm interested in the a portion of some virus polyproteins. To find the portion, I'm aligning the polyprotein against other known proteins. I want the computer to give me a best guess and align the smaller protein end-to-end, but currently it's not doing that. 
It will only give me portions of the protein that align strongly. None of the proteins are aligning end-to-end, unless I align a protein against itself. Bio::Tools:pSW is what I'm using currently. Bioperl doesn't seem to have an interface to do pairwise alignments with Clustalw, and I'm experiencing problems using standalone blast. Anyone have any ideas? From wes.barris at csiro.au Thu Aug 28 18:40:31 2003 From: wes.barris at csiro.au (Wes Barris) Date: Thu Aug 28 18:39:50 2003 Subject: [Bioperl-l] How do you add a consensus to AlignIO? In-Reply-To: References: Message-ID: <3F4E84DF.4080803@csiro.au> Brian Osborne wrote: > Wes, > > I'm puzzled. I'm looking at SimpleAlign::_consensus_aa and it certainly > looks like this method should ignore the $gapchar when it calculates the > consensus at any given position. Does your method look like this: Hi Brian, Thanks for responding. No, my SimpleAlign.pm does not look like this. I am using bioperl-1.2.2. You must be using bioperl-live from CVS. I added the two "gapchar" changes to my bioper-1.2.2 code and it works like a charm. Thanks! > > sub _consensus_aa { > my $self = shift; > my $point = shift; > my $threshold_percent = shift || -1 ; > my ($seq,%hash,$count,$letter,$key); > my $gapchar = $self->gap_char; > foreach $seq ( $self->each_seq() ) { > $letter = substr($seq->seq,$point,1); > $self->throw("--$point-----------") if $letter eq ''; > ($letter eq $gapchar || $letter =~ /\./) && next; > # print "Looking at $letter\n"; > $hash{$letter}++; > } > my $number_of_sequences = $self->no_sequences(); > my $threshold = $number_of_sequences * $threshold_percent / 100. ; > $count = -1; > $letter = '?'; > > foreach $key ( sort keys %hash ) { > # print "Now at $key $hash{$key}\n"; > if( $hash{$key} > $count && $hash{$key} >= $threshold) { > $letter = $key; > $count = $hash{$key}; > } > } > return $letter; > } > > ? > > Brian O. 
> > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Wes Barris > Sent: Wednesday, August 27, 2003 10:56 PM > To: Bioperl Mailing List > Subject: Re: [Bioperl-l] How do you add a consensus to AlignIO? > > Jason Stajich wrote: > > >>Try: >> my $consensus = new Bio::LocatableSeq(-seq=>$aln->consensus_string(), >> -id=>'btcn1000'); > > > Thanks. That did the trick almost (I had to add -start and -end to the > argument list). However, it is not producing the consensus in the way > that I want. Here is a portion of the resulting msf file: > > AU278862 ---------- ---------C TCTACAGAAT CTGTGTTTAT TTTGTTTCAG > AU278567 ---------- ---------- ---ACAGAAT CTGTGTTTAT TTTGTTTCAG > AU277959 -------AGA TTTTGACATC TCTACAGAAT CTGTGTTTAT TTTGTTTCAG > AU278008 CCTTCTTANA TTTTGACATC TCTACAGAAT CTGNGTTTAT TTTGTTTCAG > AU278623 -------AAA TTTTGACATC TCTACANAA- CTGTGTTTAT TTTGTTTCAN > AU278682 -------AAA TTTTGACATC TCTACANAAT CTGTGTTTAT TTTGTTTCAN > BM031781 ---------- ---------- ---------- ---------- ---------- > consensus -------A-A TTTTGACATC TCTACAGAAT CTGTGTTTAT TTTGTTTCAG > > I need the consensus to span the entire alignment length like this: > > consensus CCTTCTTAAA TTTTGACATC TCTACAGAAT CTGTGTTTAT TTTGTTTCAG > > i.e. I need it to ignore where the aligned sequences do not exist. Is there > a way to make it do that? 
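The gap-skipping behaviour discussed above can be sketched without BioPerl at all. The following is only an illustration of the idea (a per-column plurality vote that ignores gap characters), not the SimpleAlign implementation; the function name `gapless_consensus` is made up for this example, ambiguity codes such as N are counted like any other letter, and ties go to the alphabetically first letter. The rows are the first ten columns of the alignment quoted above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Plurality consensus over equal-length aligned strings.
# Gap characters ('-' by default, and '.') do not vote; a column
# that contains only gaps is reported as '?'.
sub gapless_consensus {
    my ($rows, $gapchar) = @_;
    $gapchar = '-' unless defined $gapchar;
    my $consensus = '';
    for my $col (0 .. length($rows->[0]) - 1) {
        my %count;
        for my $row (@$rows) {
            my $letter = substr($row, $col, 1);
            next if $letter eq $gapchar || $letter eq '.';
            $count{$letter}++;
        }
        my ($best, $n) = ('?', 0);
        for my $letter (sort keys %count) {
            ($best, $n) = ($letter, $count{$letter}) if $count{$letter} > $n;
        }
        $consensus .= $best;
    }
    return $consensus;
}

my @rows = (
    '----------',    # AU278862
    '----------',    # AU278567
    '-------AGA',    # AU277959
    'CCTTCTTANA',    # AU278008
    '-------AAA',    # AU278623
    '-------AAA',    # AU278682
    '----------',    # BM031781
);
print gapless_consensus(\@rows), "\n";
```

Because gaps never vote, a column covered by a single sequence still yields that sequence's letter, which is what lets the consensus span the whole alignment.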
> > -- > Wes Barris > E-Mail: Wes.Barris@csiro.au > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Wes Barris E-Mail: Wes.Barris@csiro.au From shawnh at fugu-sg.org Thu Aug 28 19:28:27 2003 From: shawnh at fugu-sg.org (Shawn Hoon) Date: Thu Aug 28 19:25:52 2003 Subject: [Bioperl-l] aligning sequences with Bio::Tools::pSW In-Reply-To: <20030828223642.GA7244@resnet147-170> References: <20030828223642.GA7244@resnet147-170> Message-ID: <53B80BA0-D9AF-11D7-B98A-000A95783436@fugu-sg.org> On Friday, August 29, 2003, at 6:36 AM, Joshua Bronson wrote: > I'm interested in the a portion of some virus polyproteins. To find > the portion, I'm aligning the polyprotein against other known > proteins. I want the computer to give me a best guess and align the > smaller protein end-to-end, but currently it's not doing that. It will > only give me portions of the protein that align strongly. None of the > proteins are aligning end-to-end, unless I align a protein against > itself. > > Bio::Tools:pSW is what I'm using currently. Bioperl doesn't seem to > have an interface to do pairwise alignments with Clustalw, and I'm > experiencing problems using standalone blast. Anyone have any ideas? > We actually support these alignment programs as wrappers in the bioperl-run package. We currently have Clustalw, Lagan, Fasta and TCoffee which you can try under the Bio::Tools::Run::Alignment::* namespace. > bioperl-run 1.2.2 available here: http://www.bioperl.org/DIST/current_run_stable.tar.gz shawn > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -shawn From wes.barris at csiro.au Thu Aug 28 19:57:44 2003 From: wes.barris at csiro.au (Wes Barris) Date: Thu Aug 28 19:57:03 2003 Subject: [Bioperl-l] ace to msf format? 
Message-ID: <3F4E96F8.9060706@csiro.au> Can anyone give me a hint as to how I could use bioperl to read in an ACE assembly and write out an MSF formatted alignment? This shows what I have figured out so far: #!/usr/local/bin/perl -w # use strict; use Bio::Assembly::IO; # my $usage = "Usage: $0 \n"; my $infile = shift or die $usage; my $io = new Bio::Assembly::IO(-file=>$infile, -format=>'ace'); my $assembly = $io->next_assembly; my $aln = $assembly->all_contigs(); -- Wes Barris E-Mail: Wes.Barris@csiro.au From ajm6q at virginia.edu Fri Aug 29 07:41:29 2003 From: ajm6q at virginia.edu (Aaron J Mackey) Date: Fri Aug 29 07:40:24 2003 Subject: [Bioperl-l] aligning sequences with Bio::Tools::pSW In-Reply-To: <20030828223642.GA7244@resnet147-170> Message-ID: pSW.pm is an implementation of the Smith-Waterman algorithm, a *local* pairwise alignment algorithm, which means that the best alignment found need not start and finish at the beginning and ending of either sequence. You seem to want a *global* alignment algorithm, such as Needleman-Wunsch (with "free" end-gap penalties for globally aligning the larger sequence: this is how to achieve hybrid local/global pairwise alignments); ClustalW should be able to give you what you need. -Aaron On Thu, 28 Aug 2003, Joshua Bronson wrote: > I'm interested in the a portion of some virus polyproteins. To find > the portion, I'm aligning the polyprotein against other known > proteins. I want the computer to give me a best guess and align the > smaller protein end-to-end, but currently it's not doing that. It will > only give me portions of the protein that align strongly. None of the > proteins are aligning end-to-end, unless I align a protein against > itself. > > Bio::Tools:pSW is what I'm using currently. Bioperl doesn't seem to > have an interface to do pairwise alignments with Clustalw, and I'm > experiencing problems using standalone blast. Anyone have any ideas? 
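To make the local-versus-global distinction above concrete, here is a small self-contained sketch in plain Perl. It is not pSW, ClustalW, or any BioPerl API: `local_score` is a textbook Smith-Waterman score, and `fit_score` is a Needleman-Wunsch variant in which only the ends of the longer sequence are free, so the shorter sequence is forced to align end to end. The scoring values (match +1, mismatch -1, linear gap -1) and both function names are assumptions chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max);

# Illustrative scores only; real tools use substitution matrices
# and affine gap penalties.
my ($MATCH, $MISMATCH, $GAP) = (1, -1, -1);

# Local (Smith-Waterman) score: best-scoring pair of substrings.
# Cells are floored at zero, so weak flanking regions are dropped.
sub local_score {
    my ($s, $t) = @_;
    my @prev = (0) x (length($t) + 1);
    my $best = 0;
    for my $i (1 .. length $s) {
        my @cur = (0);
        for my $j (1 .. length $t) {
            my $sub = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? $MATCH : $MISMATCH;
            $cur[$j] = max(0, $prev[$j - 1] + $sub, $prev[$j] + $GAP, $cur[$j - 1] + $GAP);
            $best = max($best, $cur[$j]);
        }
        @prev = @cur;
    }
    return $best;
}

# "Fitting" score: all of $short, end to end, against the best region
# of $long. Skipping the start or end of $long costs nothing.
sub fit_score {
    my ($short, $long) = @_;
    my @prev = (0) x (length($long) + 1);    # free leading end gap in $long
    for my $i (1 .. length $short) {
        my @cur = ($i * $GAP);               # residues of $short are never skipped for free
        for my $j (1 .. length $long) {
            my $sub = substr($short, $i - 1, 1) eq substr($long, $j - 1, 1) ? $MATCH : $MISMATCH;
            $cur[$j] = max($prev[$j - 1] + $sub, $prev[$j] + $GAP, $cur[$j - 1] + $GAP);
        }
        @prev = @cur;
    }
    return max(@prev);                       # free trailing end gap in $long
}

printf "local=%d fit=%d\n",
    local_score('GATTACA', 'TTGATTTTTT'),
    fit_score('GATTACA', 'TTGATTTTTT');
```

With these toy sequences the local score comes only from the strongly matching GATT core, while the fitting score also pays for the poorly matching tail of the short sequence; that is the trade-off behind asking for an end-to-end alignment of the smaller protein.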
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Aaron J Mackey Pearson Laboratory University of Virginia (434) 924-2821 amackey@virginia.edu From brian_osborne at cognia.com Fri Aug 29 11:13:00 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Aug 29 11:15:50 2003 Subject: [Bioperl-l] aligning sequences with Bio::Tools::pSW In-Reply-To: <20030828223642.GA7244@resnet147-170> Message-ID: Joshua, > Bioperl doesn't seem to have an interface to do pairwise alignments with Clustalw, Bioperl is exactly what you want! See section IV.2.3 of the bptutorial, "Aligning multiple sequences (Clustalw.pm, TCoffee.pm)". With the resulting SimpleAlign object you'll have all sorts of useful methods for analyzing and "slicing" the alignment. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Joshua Bronson Sent: Thursday, August 28, 2003 6:37 PM To: Bioperl list Subject: [Bioperl-l] aligning sequences with Bio::Tools::pSW I'm interested in the a portion of some virus polyproteins. To find the portion, I'm aligning the polyprotein against other known proteins. I want the computer to give me a best guess and align the smaller protein end-to-end, but currently it's not doing that. It will only give me portions of the protein that align strongly. None of the proteins are aligning end-to-end, unless I align a protein against itself. Bio::Tools:pSW is what I'm using currently. Bioperl doesn't seem to have an interface to do pairwise alignments with Clustalw, and I'm experiencing problems using standalone blast. Anyone have any ideas? 
_______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From lstein at cshl.edu Fri Aug 29 11:39:58 2003 From: lstein at cshl.edu (Lincoln Stein) Date: Fri Aug 29 11:38:58 2003 Subject: [Bioperl-l] Bio::Graphics In-Reply-To: <5.1.1.6.0.20030806090354.00b28208@valmont> References: <5.1.1.6.0.20030806090354.00b28208@valmont> Message-ID: <200308291139.59003.lstein@cshl.edu> Sorry for responding so late to this e-mail. You were looking at an out of date tutorial that no longer matches the code base. However, the tutorial has now been updated and the examples should work. Lincoln On Wednesday 06 August 2003 03:09 am, Laurence Amilhat wrote: > Hi, > > I try to learn how to use the module Bio::Graphics. > I found he How To from Lincoln Stein on the web. I try to practice with the > examples, it's working except for the labels of the features that don't > appear on my figure. > Does anybody ever use this module? > > This is the example: > #!/usr/local/public/bin/perl > > use strict; > use lib > '/homej/bioinf/lamilhat/PERL_MODULE/lib/perl5/site_perl/5.005/BIOPERL/lib/s >ite_perl/5.6.1/'; use Bio::Graphics; > use Bio::SeqFeature::Generic; > > my $panel= Bio::Graphics::Panel->new(-length =>1000,-width =>800); > my $track=$panel->add_track(-glyph =>'generic',-label =>1); > > > while (<>) > { > chomp; > next if /^\#/; > my ($name,$score,$start,$end)=split /\t+/; > print STDERR "$name\n"; > my $feature= > Bio::SeqFeature::Generic->new(-display_name=>$name,-score=>$score,-start=>$ >start,-end=>$end); $track->add_feature($feature); > } > > print $panel->png; > > > And this is the Data to parse with the example: > #hit score start end > truc1 381 2 200 > truc2 210 2 210 > truc3 800 2 200 > truc4 1000 380 921 > truc5 812 402 972 > truc6 1200 400 970 > bum 400 300 620 > pres1 127 310 700 > > > Thanks, > > Laurence. 
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > INRA, UMR INRA/UBP Amélioration et Santé des Plantes > 234 avenue du Brézet > 63039 Clermont-Ferrand Cedex 2 > > Tel 04 73 62 48 37 > Fax 04 73 62 44 53 > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -- ======================================================================== Lincoln D. Stein Cold Spring Harbor Laboratory lstein@cshl.org Cold Spring Harbor, NY ======================================================================== From william.j.burtle at gsk.com Fri Aug 29 07:55:34 2003 From: william.j.burtle at gsk.com (william.j.burtle@gsk.com) Date: Fri Aug 29 12:12:09 2003 Subject: [Bioperl-l] aligning sequences with Bio::Tools::pSW Message-ID: Joshua, you might also try the EMBOSS tools (particularly stretcher and needle): http://www.emboss.org/ and the bioperl interface: http://doc.bioperl.org/releases/bioperl-1.2/Bio/Factory/EMBOSS.html - Bill Burtle "Aaron J Mackey" Sent by: bioperl-l-bounces@portal.open-bio.org 29-Aug-2003 07:41 Please respond to "Aaron J. Mackey" To: "Joshua Bronson" cc: "Bioperl list" Subject: Re: [Bioperl-l] aligning sequences with Bio::Tools::pSW pSW.pm is an implementation of the Smith-Waterman algorithm, a *local* pairwise alignment algorithm, which means that the best alignment found need not start and finish at the beginning and ending of either sequence. You seem to want a *global* alignment algorithm, such as Needleman-Wunsch (with "free" end-gap penalties for globally aligning the larger sequence: this is how to achieve hybrid local/global pairwise alignments); ClustalW should be able to give you what you need. -Aaron On Thu, 28 Aug 2003, Joshua Bronson wrote: > I'm interested in the a portion of some virus polyproteins. To find > the portion, I'm aligning the polyprotein against other known > proteins. I want the computer to give me a best guess and align the > smaller protein end-to-end, but currently it's not doing that.
It will > only give me portions of the protein that align strongly. None of the > proteins are aligning end-to-end, unless I align a protein against > itself. > > Bio::Tools:pSW is what I'm using currently. Bioperl doesn't seem to > have an interface to do pairwise alignments with Clustalw, and I'm > experiencing problems using standalone blast. Anyone have any ideas? > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Aaron J Mackey Pearson Laboratory University of Virginia (434) 924-2821 amackey@virginia.edu _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From brian_osborne at cognia.com Fri Aug 29 12:21:47 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Aug 29 12:24:50 2003 Subject: [Bioperl-l] ace to msf format? In-Reply-To: <3F4E96F8.9060706@csiro.au> Message-ID: Wes, I don't think this is possible in Bioperl. To put it more generally, AlignIO can't accommodate Assembly objects currently. AlignIO is the module that takes in a variety of alignment formats and interconverts them, analogous to SeqIO. I'll be corrected if I'm wrong. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Wes Barris Sent: Thursday, August 28, 2003 7:58 PM To: Bioperl Mailing List Subject: [Bioperl-l] ace to msf format? Can anyone give me a hint as to how I could use bioperl to read in an ACE assembly and write out an MSF formatted alignment? 
This shows what I have figured out so far: #!/usr/local/bin/perl -w # use strict; use Bio::Assembly::IO; # my $usage = "Usage: $0 \n"; my $infile = shift or die $usage; my $io = new Bio::Assembly::IO(-file=>$infile, -format=>'ace'); my $assembly = $io->next_assembly; my $aln = $assembly->all_contigs(); -- Wes Barris E-Mail: Wes.Barris@csiro.au _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From brian_osborne at cognia.com Fri Aug 29 12:25:04 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Aug 29 12:28:01 2003 Subject: [Bioperl-l] aligning sequences with Bio::Tools::pSW In-Reply-To: Message-ID: Bill, AlignIO does stretcher too? Cool. So that means water, needle, and stretcher have the same output format. Are there any others in the EMBOSS suite with this output format? Thanks again, Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of william.j.burtle@gsk.com Sent: Friday, August 29, 2003 7:56 AM To: Joshua Bronson Cc: Bioperl list Subject: Re: [Bioperl-l] aligning sequences with Bio::Tools::pSW Joshua, you might also try the EMBOSS tools (particularly stretcher and needle): http://www.emboss.org/ and the bioperl interface: http://doc.bioperl.org/releases/bioperl-1.2/Bio/Factory/EMBOSS.html - Bill Burtle "Aaron J Mackey" Sent by: bioperl-l-bounces@portal.open-bio.org 29-Aug-2003 07:41 Please respond to "Aaron J. Mackey" To: "Joshua Bronson" cc: "Bioperl list" Subject: Re: [Bioperl-l] aligning sequences with Bio::Tools::pSW pSW.pm is an implementation of the Smith-Waterman algorithm, a *local* pairwise alignment algorithm, which means that the best alignment found need not start and finish at the beginning and ending of either sequence. 
You seem to want a *global* alignment algorithm, such as Needleman-Wunsch (with "free" end-gap penalties for globally aligning the larger sequence: this is how to achieve hybrid local/global pairwise alignments); ClustalW should be able to give you what you need. -Aaron On Thu, 28 Aug 2003, Joshua Bronson wrote: > I'm interested in the a portion of some virus polyproteins. To find > the portion, I'm aligning the polyprotein against other known > proteins. I want the computer to give me a best guess and align the > smaller protein end-to-end, but currently it's not doing that. It will > only give me portions of the protein that align strongly. None of the > proteins are aligning end-to-end, unless I align a protein against > itself. > > Bio::Tools:pSW is what I'm using currently. Bioperl doesn't seem to > have an interface to do pairwise alignments with Clustalw, and I'm > experiencing problems using standalone blast. Anyone have any ideas? > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Aaron J Mackey Pearson Laboratory University of Virginia (434) 924-2821 amackey@virginia.edu _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From jason at cgt.duhs.duke.edu Fri Aug 29 13:02:20 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Fri Aug 29 12:39:40 2003 Subject: [Bioperl-l] aligning sequences with Bio::Tools::pSW In-Reply-To: References: Message-ID: You can specify different output formats from the default EMBOSS alignment report - specifically at least {msf, fasta}. Local alignments are problematic because you cannot reconstruct where the alignments came from in the whole seq because of the EMBOSS MSF and FASTA output formats doesn't necessarily report start/end in the original seq. 
The EMBOSS alignment output format can be parsed with AlignIO::emboss, although I still think there are some special cases where it might not be parsing correctly for local alignments. -jason On Fri, 29 Aug 2003, Brian Osborne wrote: > Bill, > > AlignIO does stretcher too? Cool. So that means water, needle, and stretcher > have the same output format. Are there any others in the EMBOSS suite with > this output format? > > Thanks again, > > Brian O. > > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of > william.j.burtle@gsk.com > Sent: Friday, August 29, 2003 7:56 AM > To: Joshua Bronson > Cc: Bioperl list > Subject: Re: [Bioperl-l] aligning sequences with Bio::Tools::pSW > > Joshua, > > you might also try the EMBOSS tools (particularly stretcher and needle): > http://www.emboss.org/ > > and the bioperl interface: > http://doc.bioperl.org/releases/bioperl-1.2/Bio/Factory/EMBOSS.html > > - Bill Burtle > > > > > > > > "Aaron J Mackey" > > Sent by: bioperl-l-bounces@portal.open-bio.org > 29-Aug-2003 07:41 > Please respond to "Aaron J. Mackey" > > > > > To: "Joshua Bronson" > > cc: "Bioperl list" > Subject: Re: [Bioperl-l] aligning sequences with > Bio::Tools::pSW > > > pSW.pm is an implementation of the Smith-Waterman algorithm, a *local* > pairwise alignment algorithm, which means that the best alignment found > need not start and finish at the beginning and ending of either sequence. > You seem to want a *global* alignment algorithm, such as Needleman-Wunsch > (with "free" end-gap penalties for globally aligning the larger sequence: > this is how to achieve hybrid local/global pairwise alignments); ClustalW > should be able to give you what you need. > > -Aaron > > On Thu, 28 Aug 2003, Joshua Bronson wrote: > > > I'm interested in the a portion of some virus polyproteins. To find > > the portion, I'm aligning the polyprotein against other known > > proteins.
I want the computer to give me a best guess and align the > > smaller protein end-to-end, but currently it's not doing that. It will > > only give me portions of the protein that align strongly. None of the > > proteins are aligning end-to-end, unless I align a protein against > > itself. > > > > Bio::Tools:pSW is what I'm using currently. Bioperl doesn't seem to > > have an interface to do pairwise alignments with Clustalw, and I'm > > experiencing problems using standalone blast. Anyone have any ideas? > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > Aaron J Mackey > Pearson Laboratory > University of Virginia > (434) 924-2821 > amackey@virginia.edu > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From lstein at cshl.edu Fri Aug 29 12:47:25 2003 From: lstein at cshl.edu (Lincoln Stein) Date: Fri Aug 29 12:46:38 2003 Subject: [Bioperl-l] doing a 1.2.3 release In-Reply-To: References: Message-ID: <200308291247.25288.lstein@cshl.edu> What are the timetables for 1.2.3 and 1.3? Lincoln On Wednesday 27 August 2003 07:07 pm, Ewan Birney wrote: > On Wed, 27 Aug 2003, Jason Stajich wrote: > > In my mind there are enough things that have been fixed since 1.2.2 on > > the branch to justify the effort in releasing another bugfix release on > > the stable branch. I would like to merge some of the > > HTML|TextResultWriter fixes from the main trunk, otherwise I think could > > be an easy push out the door. > > I'd agree. I also think we need to start on the 1.3 series. Heikki... 
> feel free to chime in...? > > > Jason - I am ok for some bug hunting for a while on either branch or trunk > - any not-so-nasty-that-I-go-mad-but-useful-to-fix-bugs out there? > > > -jason > > > > -- > > Jason Stajich > > Duke University > > jason at cgt.mc.duke.edu > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ======================================================================== Lincoln D. Stein Cold Spring Harbor Laboratory lstein@cshl.org Cold Spring Harbor, NY ======================================================================== From brian_osborne at cognia.com Fri Aug 29 12:45:21 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Aug 29 12:48:26 2003 Subject: [Bioperl-l] aligning sequences with Bio::Tools::pSW In-Reply-To: Message-ID: Jason, I guess I'm asking something simpler, and I'm not an EMBOSS user. Which of the EMBOSS alignment programs use this "default EMBOSS alignment" format, aside from water, needle, and stretcher. All of them? Currently the docs just say water and needle, which isn't exactly right. Apart from the issue, admittedly important, about loss of the initial or input coordinates. Brian O. -----Original Message----- From: Jason Stajich [mailto:jason@cgt.duhs.duke.edu] Sent: Friday, August 29, 2003 1:02 PM To: Brian Osborne Cc: william.j.burtle@gsk.com; Joshua Bronson; Bioperl list Subject: RE: [Bioperl-l] aligning sequences with Bio::Tools::pSW You can specify different output formats from the default EMBOSS alignment report - specifically at least {msf, fasta}. 
Local alignments are problematic because you cannot reconstruct where the alignments came from in the whole seq because of the EMBOSS MSF and FASTA output formats doesn't necessarily report start/end in the original seq. The EMBOSS alignment output format can be parsed with AlignIO::emboss can parse this format although I still think there are some special cases where it might not be parsing correctly for local alignments. -jason On Fri, 29 Aug 2003, Brian Osborne wrote: > Bill, > > AlignIO does stretcher too? Cool. So that means water, needle, and stretcher > have the same output format. Are there any others in the EMBOSS suite with > this output format? > > Thanks again, > > Brian O. > > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of > william.j.burtle@gsk.com > Sent: Friday, August 29, 2003 7:56 AM > To: Joshua Bronson > Cc: Bioperl list > Subject: Re: [Bioperl-l] aligning sequences with Bio::Tools::pSW > > Joshua, > > you might also try the EMBOSS tools (particularly stretcher and needle): > http://www.emboss.org/ > > and the bioperl interface: > http://doc.bioperl.org/releases/bioperl-1.2/Bio/Factory/EMBOSS.html > > - Bill Burtle > > > > > > > > "Aaron J Mackey" > > Sent by: bioperl-l-bounces@portal.open-bio.org > 29-Aug-2003 07:41 > Please respond to "Aaron J. Mackey" > > > > > To: "Joshua Bronson" > > cc: "Bioperl list" > Subject: Re: [Bioperl-l] aligning sequences with > Bio::Tools::pSW > > > pSW.pm is an implementation of the Smith-Waterman algorithm, a *local* > pairwise alignment algorithm, which means that the best alignment found > need not start and finish at the beginning and ending of either sequence. 
> You seem to want a *global* alignment algorithm, such as Needleman-Wunsch > (with "free" end-gap penalties for globally aligning the larger sequence: > this is how to achieve hybrid local/global pairwise alignments); ClustalW > should be able to give you what you need. > > -Aaron > > On Thu, 28 Aug 2003, Joshua Bronson wrote: > > > I'm interested in the a portion of some virus polyproteins. To find > > the portion, I'm aligning the polyprotein against other known > > proteins. I want the computer to give me a best guess and align the > > smaller protein end-to-end, but currently it's not doing that. It will > > only give me portions of the protein that align strongly. None of the > > proteins are aligning end-to-end, unless I align a protein against > > itself. > > > > Bio::Tools:pSW is what I'm using currently. Bioperl doesn't seem to > > have an interface to do pairwise alignments with Clustalw, and I'm > > experiencing problems using standalone blast. Anyone have any ideas? > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > Aaron J Mackey > Pearson Laboratory > University of Virginia > (434) 924-2821 > amackey@virginia.edu > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From birney at ebi.ac.uk Fri Aug 29 12:50:50 2003 From: birney at ebi.ac.uk (Ewan Birney) Date: Fri Aug 29 12:49:29 2003 Subject: [Bioperl-l] doing a 1.2.3 release In-Reply-To: <200308291247.25288.lstein@cshl.edu> Message-ID: On Fri, 29 Aug 2003, Lincoln Stein wrote: > What are the timetables for 1.2.3 and 1.3? 
> 1.2.3 soon(ish) 1.3 Heikki should decide. I am not on campus at the moment, so will try to track him down next week... From birney at ebi.ac.uk Fri Aug 29 12:56:41 2003 From: birney at ebi.ac.uk (Ewan Birney) Date: Fri Aug 29 12:55:19 2003 Subject: [Bioperl-l] bugs on branch; tests on main trunk Message-ID: I have fixed the translate() and pdb res bug on the branch. On the main trunk I have put more external module protection into the new tests. It would be nice if people writing new tests who knew they needed external modules would do this directly. Heikki or Rob --- does RestrictionEnzyme *really* need Storable? Storable doesn't come by default on systems, so if it didn't need it then it would be more useful not to use it. Any chance of this? I put a require eval() in tutorial around the restriction enzyme stuff. Chris (and the unflattening crew...) the Unflattener is issuing a lot of warnings with -w --- any chance of one of you looking at it? However, I now have on the main trunk: All tests successful, 33 subtests skipped. Files=168, Tests=7643, 383 wallclock secs (287.31 cusr + 25.46 csys = 312.77 CPU) Pretty darn impressive. From jason at cgt.duhs.duke.edu Fri Aug 29 13:19:53 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Fri Aug 29 12:57:06 2003 Subject: [Bioperl-l] EMBOSS Alignment program [was aligning sequences with Bio::Tools::pSW] In-Reply-To: References: Message-ID: Any of the programs which provide the -aformat option I guess. I don't think demoalign is really part of the distro though so safe to ignore it.

[jason@sonogno acd]$ pwd
/usr/local/pkg/emboss/share/EMBOSS/acd
[jason@sonogno acd]$ grep "aformat:" *
demoalign.acd: aformat: "SRS"
matcher.acd: aformat: "markx0"
merger.acd: aformat: "simple"
needle.acd: aformat: "srspair"
stretcher.acd: aformat: "markx0"
supermatcher.acd: aformat: "simple"
water.acd: aformat: "srspair"

On Fri, 29 Aug 2003, Brian Osborne wrote:

> Jason,
>
> I guess I'm asking something simpler, and I'm not an EMBOSS user. Which of
> the EMBOSS alignment programs use this "default EMBOSS alignment" format,
> aside from water, needle, and stretcher? All of them? Currently the docs
> just say water and needle, which isn't exactly right.
>
> Apart from the issue, admittedly important, about loss of the initial or
> input coordinates.
>
> Brian O.
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason@cgt.duhs.duke.edu]
> Sent: Friday, August 29, 2003 1:02 PM
> To: Brian Osborne
> Cc: william.j.burtle@gsk.com; Joshua Bronson; Bioperl list
> Subject: RE: [Bioperl-l] aligning sequences with Bio::Tools::pSW
>
> You can specify different output formats from the default EMBOSS alignment
> report - specifically at least {msf, fasta}.
>
> Local alignments are problematic because you cannot reconstruct where the
> alignments came from in the whole seq, because the EMBOSS MSF and
> FASTA output formats don't necessarily report start/end in the original
> seq.
>
> The EMBOSS alignment output format can be parsed with AlignIO::emboss,
> although I still think there are some special cases
> where it might not be parsing correctly for local alignments.
>
> -jason
> On Fri, 29 Aug 2003, Brian Osborne wrote:
>
> > Bill,
> >
> > AlignIO does stretcher too? Cool. So that means water, needle, and stretcher
> > have the same output format. Are there any others in the EMBOSS suite with
> > this output format?
> >
> > Thanks again,
> >
> > Brian O.
> > > > -----Original Message----- > > From: bioperl-l-bounces@portal.open-bio.org > > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of > > william.j.burtle@gsk.com > > Sent: Friday, August 29, 2003 7:56 AM > > To: Joshua Bronson > > Cc: Bioperl list > > Subject: Re: [Bioperl-l] aligning sequences with Bio::Tools::pSW > > > > Joshua, > > > > you might also try the EMBOSS tools (particularly stretcher and needle): > > http://www.emboss.org/ > > > > and the bioperl interface: > > http://doc.bioperl.org/releases/bioperl-1.2/Bio/Factory/EMBOSS.html > > > > - Bill Burtle > > > > > > > > > > > > > > > > "Aaron J Mackey" > > > > Sent by: bioperl-l-bounces@portal.open-bio.org > > 29-Aug-2003 07:41 > > Please respond to "Aaron J. Mackey" > > > > > > > > > > To: "Joshua Bronson" > > > > cc: "Bioperl list" > > Subject: Re: [Bioperl-l] aligning sequences with > > Bio::Tools::pSW > > > > > > pSW.pm is an implementation of the Smith-Waterman algorithm, a *local* > > pairwise alignment algorithm, which means that the best alignment found > > need not start and finish at the beginning and ending of either sequence. > > You seem to want a *global* alignment algorithm, such as Needleman-Wunsch > > (with "free" end-gap penalties for globally aligning the larger sequence: > > this is how to achieve hybrid local/global pairwise alignments); ClustalW > > should be able to give you what you need. > > > > -Aaron > > > > On Thu, 28 Aug 2003, Joshua Bronson wrote: > > > > > I'm interested in the a portion of some virus polyproteins. To find > > > the portion, I'm aligning the polyprotein against other known > > > proteins. I want the computer to give me a best guess and align the > > > smaller protein end-to-end, but currently it's not doing that. It will > > > only give me portions of the protein that align strongly. None of the > > > proteins are aligning end-to-end, unless I align a protein against > > > itself. > > > > > > Bio::Tools:pSW is what I'm using currently. 
Bioperl doesn't seem to > > > have an interface to do pairwise alignments with Clustalw, and I'm > > > experiencing problems using standalone blast. Anyone have any ideas? > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > -- > > Aaron J Mackey > > Pearson Laboratory > > University of Virginia > > (434) 924-2821 > > amackey@virginia.edu > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From brian_osborne at cognia.com Fri Aug 29 12:59:27 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Aug 29 13:02:33 2003 Subject: [Bioperl-l] EMBOSS Alignment program [was aligning sequences withBio::Tools::pSW] In-Reply-To: Message-ID: Jason, Excellent. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Jason Stajich Sent: Friday, August 29, 2003 1:20 PM To: Brian Osborne Cc: william.j.burtle@gsk.com; Bioperl list; Joshua Bronson Subject: [Bioperl-l] EMBOSS Alignment program [was aligning sequences withBio::Tools::pSW] Any of the programs which provide the -aformat option I guess. I don't think demoalign is really part of the distro though so safe to ignore it. 
[jason@sonogno acd]$ pwd /usr/local/pkg/emboss/share/EMBOSS/acd [jason@sonogno acd]$ grep "aformat:" * demoalign.acd: aformat: "SRS" matcher.acd: aformat: "markx0" merger.acd: aformat: "simple" needle.acd: aformat: "srspair" stretcher.acd: aformat: "markx0" supermatcher.acd: aformat: "simple" water.acd: aformat: "srspair" On Fri, 29 Aug 2003, Brian Osborne wrote: > Jason, > > I guess I'm asking something simpler, and I'm not an EMBOSS user. Which of > the EMBOSS alignment programs use this "default EMBOSS alignment" format, > aside from water, needle, and stretcher. All of them? Currently the docs > just say water and needle, which isn't exactly right. > > Apart from the issue, admittedly important, about loss of the initial or > input coordinates. > > Brian O. > > -----Original Message----- > From: Jason Stajich [mailto:jason@cgt.duhs.duke.edu] > Sent: Friday, August 29, 2003 1:02 PM > To: Brian Osborne > Cc: william.j.burtle@gsk.com; Joshua Bronson; Bioperl list > Subject: RE: [Bioperl-l] aligning sequences with Bio::Tools::pSW > > You can specify different output formats from the default EMBOSS alignment > report - specifically at least {msf, fasta}. > > Local alignments are problematic because you cannot reconstruct where the > alignments came from in the whole seq because of the EMBOSS MSF and > FASTA output formats doesn't necessarily report start/end in the original > seq. > > The EMBOSS alignment output format can be parsed with AlignIO::emboss can > parse this format although I still think there are some special cases > where it might not be parsing correctly for local alignments. > > -jason > On Fri, 29 Aug 2003, Brian Osborne wrote: > > > Bill, > > > > AlignIO does stretcher too? Cool. So that means water, needle, and > stretcher > > have the same output format. Are there any others in the EMBOSS suite with > > this output format? > > > > Thanks again, > > > > Brian O. 
> > > > -----Original Message----- > > From: bioperl-l-bounces@portal.open-bio.org > > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of > > william.j.burtle@gsk.com > > Sent: Friday, August 29, 2003 7:56 AM > > To: Joshua Bronson > > Cc: Bioperl list > > Subject: Re: [Bioperl-l] aligning sequences with Bio::Tools::pSW > > > > Joshua, > > > > you might also try the EMBOSS tools (particularly stretcher and needle): > > http://www.emboss.org/ > > > > and the bioperl interface: > > http://doc.bioperl.org/releases/bioperl-1.2/Bio/Factory/EMBOSS.html > > > > - Bill Burtle > > > > > > > > > > > > > > > > "Aaron J Mackey" > > > > Sent by: bioperl-l-bounces@portal.open-bio.org > > 29-Aug-2003 07:41 > > Please respond to "Aaron J. Mackey" > > > > > > > > > > To: "Joshua Bronson" > > > > cc: "Bioperl list" > > Subject: Re: [Bioperl-l] aligning sequences with > > Bio::Tools::pSW > > > > > > pSW.pm is an implementation of the Smith-Waterman algorithm, a *local* > > pairwise alignment algorithm, which means that the best alignment found > > need not start and finish at the beginning and ending of either sequence. > > You seem to want a *global* alignment algorithm, such as Needleman-Wunsch > > (with "free" end-gap penalties for globally aligning the larger sequence: > > this is how to achieve hybrid local/global pairwise alignments); ClustalW > > should be able to give you what you need. > > > > -Aaron > > > > On Thu, 28 Aug 2003, Joshua Bronson wrote: > > > > > I'm interested in the a portion of some virus polyproteins. To find > > > the portion, I'm aligning the polyprotein against other known > > > proteins. I want the computer to give me a best guess and align the > > > smaller protein end-to-end, but currently it's not doing that. It will > > > only give me portions of the protein that align strongly. None of the > > > proteins are aligning end-to-end, unless I align a protein against > > > itself. > > > > > > Bio::Tools:pSW is what I'm using currently. 
Bioperl doesn't seem to > > > have an interface to do pairwise alignments with Clustalw, and I'm > > > experiencing problems using standalone blast. Anyone have any ideas? > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > -- > > Aaron J Mackey > > Pearson Laboratory > > University of Virginia > > (434) 924-2821 > > amackey@virginia.edu > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > Jason Stajich > Duke University > jason at cgt.mc.duke.edu > > -- Jason Stajich Duke University jason at cgt.mc.duke.edu _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From lstein at cshl.edu Fri Aug 29 13:10:39 2003 From: lstein at cshl.edu (Lincoln Stein) Date: Fri Aug 29 13:09:39 2003 Subject: [Bioperl-l] doing a 1.2.3 release In-Reply-To: References: Message-ID: <200308291310.39582.lstein@cshl.edu> What branch should I commit to for 1.2.3? branch-1-2? Lincoln On Friday 29 August 2003 12:50 pm, Ewan Birney wrote: > On Fri, 29 Aug 2003, Lincoln Stein wrote: > > What are the timetables for 1.2.3 and 1.3? > > 1.2.3 soon(ish) > > 1.3 Heikki should decide. I am not on campus at the moment, so will try to > track him down next week... > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ======================================================================== Lincoln D. 
Stein Cold Spring Harbor Laboratory
lstein@cshl.org Cold Spring Harbor, NY
========================================================================


From jason at cgt.duhs.duke.edu Fri Aug 29 13:39:45 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Fri Aug 29 13:17:10 2003
Subject: [Bioperl-l] doing a 1.2.3 release
In-Reply-To: <200308291310.39582.lstein@cshl.edu>
References: <200308291310.39582.lstein@cshl.edu>
Message-ID:

yes.

stable branch tag -----> branch-1-2
main trunk (where 1.3 will branch/release from) -----> HEAD (default)

Fixes made should thus go in two places.

For those unfamiliar with how to get access to the branch and the main trunk:

cvs -d:...:bioperl co -d bioperl-1.2-branch -r branch-1-2 bioperl-live
cvs -d:...:bioperl co -d bioperl-live bioperl-live

This will create 2 directories, one called bioperl-1.2-branch and one
called bioperl-live

----
Let's try for 1.2.3 in 2 weeks? We need to check out some of the keywords
problems one more time on the branch - there were problems in how the
RichSeq object implemented keywords and we need to make sure the SeqIO
parsers/writers do the right thing.

-jason

On Fri, 29 Aug 2003, Lincoln Stein wrote:

> What branch should I commit to for 1.2.3? branch-1-2?
>
> Lincoln

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu


From cjfields at uiuc.edu Fri Aug 29 13:26:01 2003
From: cjfields at uiuc.edu (Christopher Fields)
Date: Fri Aug 29 13:26:03 2003
Subject: [Bioperl-l] Bio::Graphics
In-Reply-To: <200308291139.59003.lstein@cshl.edu>
References: <5.1.1.6.0.20030806090354.00b28208@valmont> <200308291139.59003.lstein@cshl.edu>
Message-ID: <1062178023.3695.110.camel@chrisfields.life.uiuc.edu>

I tried the tutorial on RedHat Linux 9.0 and it works, but I tried it on
my wife's IBook (Mac OSX 10.2.6) and couldn't get the labels to come up
either. I posted a reply here (bioperl-l) a while back about it but
didn't get a reply. Could it be a font problem or LANG setting?
My RH 9.0 system has LANG=en_US, but I think the IBook (OS X) has LANG=C
(neither set to UTF-8; I had problems with this in the past on both
systems).

On Fri, 2003-08-29 at 10:39, Lincoln Stein wrote:
> Sorry for responding so late to this e-mail. You were looking at an out of
> date tutorial that no longer matches the code base. However, the tutorial
> has now been updated and the examples should work.
>
> Lincoln
>
> On Wednesday 06 August 2003 03:09 am, Laurence Amilhat wrote:
> > Hi,
> >
> > I try to learn how to use the module Bio::Graphics.
> > I found the How To from Lincoln Stein on the web. I try to practice with the
> > examples, it's working except for the labels of the features that don't
> > appear on my figure.
> > Does anybody ever use this module?
> >
> > This is the example:
> > #!/usr/local/public/bin/perl
> >
> > use strict;
> > use lib
> > '/homej/bioinf/lamilhat/PERL_MODULE/lib/perl5/site_perl/5.005/BIOPERL/lib/site_perl/5.6.1/';
> > use Bio::Graphics;
> > use Bio::SeqFeature::Generic;
> >
> > my $panel= Bio::Graphics::Panel->new(-length =>1000,-width =>800);
> > my $track=$panel->add_track(-glyph =>'generic',-label =>1);
> >
> >
> > while (<>)
> > {
> > chomp;
> > next if /^\#/;
> > my ($name,$score,$start,$end)=split /\t+/;
> > print STDERR "$name\n";
> > my $feature=
> > Bio::SeqFeature::Generic->new(-display_name=>$name,-score=>$score,-start=>$start,-end=>$end);
> > $track->add_feature($feature);
> > }
> >
> > print $panel->png;
> >
> >
> > And this is the Data to parse with the example:
> > #hit score start end
> > truc1 381 2 200
> > truc2 210 2 210
> > truc3 800 2 200
> > truc4 1000 380 921
> > truc5 812 402 972
> > truc6 1200 400 970
> > bum 400 300 620
> > pres1 127 310 700
> >
> >
> > Thanks,
> >
> > Laurence.
> >
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > INRA, UMR INRA/UBP Amélioration et Santé des Plantes
> > 234 avenue du Brézet
> > 63039 Clermont-Ferrand Cedex 2
> >
> > Tel 04 73 62 48 37
> > Fax 04 73 62 44 53
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

--
________________________
Christopher Fields
Postdoctoral Researcher - Dept. of Biochemistry
University of Illinois at Urbana-Champaign


From lstein at cshl.edu Fri Aug 29 15:30:57 2003
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri Aug 29 15:30:25 2003
Subject: [Bioperl-l] Bio::Graphics
In-Reply-To: <1062178023.3695.110.camel@chrisfields.life.uiuc.edu>
References: <5.1.1.6.0.20030806090354.00b28208@valmont> <200308291139.59003.lstein@cshl.edu> <1062178023.3695.110.camel@chrisfields.life.uiuc.edu>
Message-ID: <200308291530.57893.lstein@cshl.edu>

Are you sure that you are using the same version of bioperl on both boxes?
The fonts are compiled in and LANG is ignored by Bio::Graphics.

Lincoln

On Friday 29 August 2003 01:27 pm, Christopher Fields wrote:
> I tried the tutorial on RedHat Linux 9.0 and it works, but I tried it on
> my wife's IBook (Mac OSX 10.2.6) and couldn't get the labels to come up
> either. I posted a reply here (bioperl-l) a while back about it but
> didn't get a reply. Could it be a font problem or LANG setting? My RH
> 9.0 system has LANG=en_US, but I think the IBook (OS X) has LANG=C
> (neither set to UTF-8; I had problems with this in the past on both
> systems).
>
> On Fri, 2003-08-29 at 10:39, Lincoln Stein wrote:
> > Sorry for responding so late to this e-mail. You were looking at an out
> > of date tutorial that no longer matches the code base. However, the
> > tutorial has now been updated and the examples should work.
> >
> > Lincoln
> >
> > On Wednesday 06 August 2003 03:09 am, Laurence Amilhat wrote:
> > > Hi,
> > >
> > > I try to learn how to use the module Bio::Graphics.
> > > I found the How To from Lincoln Stein on the web. I try to practice with
> > > the examples, it's working except for the labels of the features that
> > > don't appear on my figure.
> > > Does anybody ever use this module?
> > >
> > > This is the example:
> > > #!/usr/local/public/bin/perl
> > >
> > > use strict;
> > > use lib
> > > '/homej/bioinf/lamilhat/PERL_MODULE/lib/perl5/site_perl/5.005/BIOPERL/lib/site_perl/5.6.1/';
> > > use Bio::Graphics;
> > > use Bio::SeqFeature::Generic;
> > >
> > > my $panel= Bio::Graphics::Panel->new(-length =>1000,-width =>800);
> > > my $track=$panel->add_track(-glyph =>'generic',-label =>1);
> > >
> > >
> > > while (<>)
> > > {
> > > chomp;
> > > next if /^\#/;
> > > my ($name,$score,$start,$end)=split /\t+/;
> > > print STDERR "$name\n";
> > > my $feature=
> > > Bio::SeqFeature::Generic->new(-display_name=>$name,-score=>$score,-start=>$start,-end=>$end);
> > > $track->add_feature($feature);
> > > }
> > >
> > > print $panel->png;
> > >
> > >
> > > And this is the Data to parse with the example:
> > > #hit score start end
> > > truc1 381 2 200
> > > truc2 210 2 210
> > > truc3 800 2 200
> > > truc4 1000 380 921
> > > truc5 812 402 972
> > > truc6 1200 400 970
> > > bum 400 300 620
> > > pres1 127 310 700
> > >
> > >
> > > Thanks,
> > >
> > > Laurence.
> > >
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > INRA, UMR INRA/UBP Amélioration et Santé des Plantes
> > > 234 avenue du Brézet
> > > 63039 Clermont-Ferrand Cedex 2
> > >
> > > Tel 04 73 62 48 37
> > > Fax 04 73 62 44 53
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

--
========================================================================
Lincoln D.
Stein Cold Spring Harbor Laboratory
lstein@cshl.org Cold Spring Harbor, NY
========================================================================


From avilella at lycos.es Fri Aug 29 16:49:51 2003
From: avilella at lycos.es (avilella)
Date: Fri Aug 29 16:49:52 2003
Subject: [Bioperl-l] Problem with load_seqdatabase -> Redhat9 problem
In-Reply-To: <38D196D0-97B0-11D7-BF5D-000393B4BFF6@gnf.org>
References: <38D196D0-97B0-11D7-BF5D-000393B4BFF6@gnf.org>
Message-ID: <1062190245.4812.19.camel@localhost.localdomain>

Hi,

I finally tracked down the cause of the strange swissprot parsing
problem that I was having (on Redhat 9), which wasn't reproducible
on a different (Mandrake 9.1) Linux box:

It's due to Redhat 9's bad UTF-8 handling:

Michael G Schwern says:

RedHat 9 shipped with a prerelease version of Perl 5.8.1 with broken UTF-8
handling. If you set your LANG environment variable to something which
is not UTF8 (de_DE should work, or C) things should start working again.
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=87682

I solved the problem by setting LANG to, for example:

export LANG=en_US

and the swissprot problem disappeared...

I hope it helps somebody,

Best regards,

Albert Vilella

P.S.: this problem affects other perl module installations, so be aware
of that...

On Fri, 2003-06-06 at 01:48, Hilmar Lapp wrote:
> Strange. Why should there be a difference I'm wondering, since they
> both use the same module for parsing. I've downloaded sprot41 and
> investigate as soon as I get to it. I think there was a similar report
> not long ago that was then resolved somehow.
> > -hilmar > > On Wednesday, June 4, 2003, at 07:53 AM, albert vilella wrote: > > > Hi, > > > > I've been trying to load a swissprot dataset into a biosql database > > using load_seqdatabase.pl, but I get an error: > > > > ./load_seqdatabase.pl -host localhost -dbname biosql -dbuser root > > -dbpass '*******' -namespace bioperl -format swiss > > /data/database/sprot41.dat > > > > ------------- EXCEPTION ------------- > > MSG: swissprot stream with no ID. Not swissprot in my book > > STACK Bio::SeqIO::swiss::next_seq > > /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/swiss.pm:180 > > STACK toplevel ./load_seqdatabase.pl:386 > > > > -------------------------------------- > > > > Apparently, the next_seq subrutine gets stucked in the first entry > > while > > parsing the ID field: > > > > swiss.pm > > ---------------------------------------------------------------------- > > > > $line =~ /^ID\s+([^\s_]+)(_([^\s_]+))?\s+([^\s;]+);\s+([^\s;]+);/ || > > $self->throw("swissprot stream with no ID. Not swissprot in my book"); > > > > ---------------------------------------------------------------------- > > > > This is strange because I can read the same entry in the same file > > with: > > > > #! /usr/bin/perl -w > > > > use strict; > > use Bio::SeqIO; > > use Bio::Seq; > > > > my $file = shift @ARGV; > > my $in = Bio::SeqIO->new ( -file => $file, > > -format => 'swiss'); > > my $seq = $in->next_seq(); > > print "Seq: ", $seq->accession_number(), " -- ", $seq->desc(), "\n\n"; > > > > Anybody experiencing similar problems? Any guess of what is happening? > > > > Thanks in advance, > > > > Albert Vilella > > Molecular Evolution - Dept. 
Genetics > > Universitat de Barcelona > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l From brian_osborne at cognia.com Fri Aug 29 13:00:08 2003 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Aug 29 17:16:06 2003 Subject: [Bioperl-l] aligning sequences with Bio::Tools::pSW In-Reply-To: Message-ID: Bill, Useful snippet, thank you. Brian O. -----Original Message----- From: william.j.burtle@gsk.com [mailto:william.j.burtle@gsk.com] Sent: Friday, August 29, 2003 12:50 PM To: Brian Osborne Cc: Bioperl list Subject: RE: [Bioperl-l] aligning sequences with Bio::Tools::pSW Most of the EMBOSS tools let you specify an "aformat" flag. See toward the bottom of: http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Themes/AlignFormats.html Here is a code snippet (using 'matcher'). In this particular case, I had specified "pair" format, and 2 alternatives (i.e., the top two hits): my $factory = new Bio::Factory::EMBOSS(-verbose => undef); my $prog = $factory->program('matcher'); $prog->run({ '-sequencea' => Bio::Seq->new(-id => "seq1", -seq => $seq1), '-sequenceb' => Bio::Seq->new(-id => "seq2", -seq => $seq2), '-aformat' => "pair", '-alternatives' => 2, '-outfile' => $outfile}); ## later in the program..... my $alignio_fmt = "emboss"; my $align_res = new Bio::AlignIO(-format => $alignio_fmt, -file => $outfile); - Bill "Brian Osborne" Sent by: bioperl-l-bounces@portal.open-bio.org 29-Aug-2003 12:25 To: william.j.burtle@gsk.com, "Joshua Bronson" cc: "Bioperl list" Subject: RE: [Bioperl-l] aligning sequences with Bio::Tools::pSW Bill, AlignIO does stretcher too? Cool. So that means water, needle, and stretcher have the same output format. Are there any others in the EMBOSS suite with this output format? Thanks again, Brian O. 
-----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of william.j.burtle@gsk.com Sent: Friday, August 29, 2003 7:56 AM To: Joshua Bronson Cc: Bioperl list Subject: Re: [Bioperl-l] aligning sequences with Bio::Tools::pSW Joshua, you might also try the EMBOSS tools (particularly stretcher and needle): http://www.emboss.org/ and the bioperl interface: http://doc.bioperl.org/releases/bioperl-1.2/Bio/Factory/EMBOSS.html - Bill Burtle "Aaron J Mackey" Sent by: bioperl-l-bounces@portal.open-bio.org 29-Aug-2003 07:41 Please respond to "Aaron J. Mackey" To: "Joshua Bronson" cc: "Bioperl list" Subject: Re: [Bioperl-l] aligning sequences with Bio::Tools::pSW pSW.pm is an implementation of the Smith-Waterman algorithm, a *local* pairwise alignment algorithm, which means that the best alignment found need not start and finish at the beginning and ending of either sequence. You seem to want a *global* alignment algorithm, such as Needleman-Wunsch (with "free" end-gap penalties for globally aligning the larger sequence: this is how to achieve hybrid local/global pairwise alignments); ClustalW should be able to give you what you need. -Aaron On Thu, 28 Aug 2003, Joshua Bronson wrote: > I'm interested in the a portion of some virus polyproteins. To find > the portion, I'm aligning the polyprotein against other known > proteins. I want the computer to give me a best guess and align the > smaller protein end-to-end, but currently it's not doing that. It will > only give me portions of the protein that align strongly. None of the > proteins are aligning end-to-end, unless I align a protein against > itself. > > Bio::Tools:pSW is what I'm using currently. Bioperl doesn't seem to > have an interface to do pairwise alignments with Clustalw, and I'm > experiencing problems using standalone blast. Anyone have any ideas? 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Aaron J Mackey
Pearson Laboratory
University of Virginia
(434) 924-2821
amackey@virginia.edu


_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l


From william.j.burtle at gsk.com Fri Aug 29 12:50:21 2003
From: william.j.burtle at gsk.com (william.j.burtle@gsk.com)
Date: Fri Aug 29 17:16:34 2003
Subject: [Bioperl-l] aligning sequences with Bio::Tools::pSW
Message-ID:

Most of the EMBOSS tools let you specify an "aformat" flag. See toward the
bottom of:
http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Themes/AlignFormats.html

Here is a code snippet (using 'matcher'). In this particular case, I had
specified "pair" format, and 2 alternatives (i.e., the top two hits):

    use Bio::Factory::EMBOSS;
    use Bio::Seq;
    use Bio::AlignIO;

    # $seq1 and $seq2 (sequence strings) and $outfile (a file name) are
    # assumed to be set earlier in the program.
    my $factory = Bio::Factory::EMBOSS->new(-verbose => undef);
    my $prog    = $factory->program('matcher');

    $prog->run({ '-sequencea'    => Bio::Seq->new(-id => "seq1", -seq => $seq1),
                 '-sequenceb'    => Bio::Seq->new(-id => "seq2", -seq => $seq2),
                 '-aformat'      => "pair",
                 '-alternatives' => 2,
                 '-outfile'      => $outfile });

    ## later in the program.....

    my $alignio_fmt = "emboss";
    my $align_res   = Bio::AlignIO->new(-format => $alignio_fmt,
                                        -file   => $outfile);

- Bill


"Brian Osborne"
Sent by: bioperl-l-bounces@portal.open-bio.org
29-Aug-2003 12:25

To: william.j.burtle@gsk.com, "Joshua Bronson"
cc: "Bioperl list"
Subject: RE: [Bioperl-l] aligning sequences with Bio::Tools::pSW

Bill,

AlignIO does stretcher too? Cool. So that means water, needle, and stretcher
have the same output format. Are there any others in the EMBOSS suite with
this output format?

Thanks again,

Brian O.
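
Following on from the matcher snippet above, here is a minimal sketch of how the report written to $outfile might be read back and summarised. It is an illustration only, under these assumptions: BioPerl is installed, and the Bio::AlignIO 'emboss' parser accepts the report produced with the chosen -aformat (it may have unhandled corner cases for local alignments). The subroutine name summarise_emboss_report and the printed layout are hypothetical and do not come from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read every alignment in an
# EMBOSS report and return one summary hashref per alignment.
sub summarise_emboss_report {
    my ($file) = @_;

    # Loaded at run time so this file still compiles where BioPerl is absent.
    require Bio::AlignIO;

    my $in = Bio::AlignIO->new(-format => 'emboss', -file => $file);
    my @summaries;
    while (my $aln = $in->next_aln) {
        push @summaries, {
            length   => $aln->length,                 # number of alignment columns
            identity => $aln->percentage_identity,    # percent identical columns
            seqs     => [
                # '+{' forces a hash reference rather than a code block
                map { +{ id => $_->display_id, start => $_->start, end => $_->end } }
                    $aln->each_seq
            ],
        };
    }
    return @summaries;
}

# Usage: perl summarise.pl matcher.out [more reports ...]
for my $file (@ARGV) {
    for my $s (summarise_emboss_report($file)) {
        printf "%s: length %d, identity %.1f%%\n",
            $file, $s->{length}, $s->{identity};
        printf "  %-12s %d..%d\n", $_->{id}, $_->{start}, $_->{end}
            for @{ $s->{seqs} };
    }
}
```

With '-alternatives' => 2 the report should hold two alignments, so the loop over next_aln would be expected to yield two summaries. The start and end values are whatever the parser recovers from the report, which for local alignments may not be the coordinates in the original sequences.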
-----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of william.j.burtle@gsk.com Sent: Friday, August 29, 2003 7:56 AM To: Joshua Bronson Cc: Bioperl list Subject: Re: [Bioperl-l] aligning sequences with Bio::Tools::pSW Joshua, you might also try the EMBOSS tools (particularly stretcher and needle): http://www.emboss.org/ and the bioperl interface: http://doc.bioperl.org/releases/bioperl-1.2/Bio/Factory/EMBOSS.html - Bill Burtle "Aaron J Mackey" Sent by: bioperl-l-bounces@portal.open-bio.org 29-Aug-2003 07:41 Please respond to "Aaron J. Mackey" To: "Joshua Bronson" cc: "Bioperl list" Subject: Re: [Bioperl-l] aligning sequences with Bio::Tools::pSW pSW.pm is an implementation of the Smith-Waterman algorithm, a *local* pairwise alignment algorithm, which means that the best alignment found need not start and finish at the beginning and ending of either sequence. You seem to want a *global* alignment algorithm, such as Needleman-Wunsch (with "free" end-gap penalties for globally aligning the larger sequence: this is how to achieve hybrid local/global pairwise alignments); ClustalW should be able to give you what you need. -Aaron On Thu, 28 Aug 2003, Joshua Bronson wrote: > I'm interested in the a portion of some virus polyproteins. To find > the portion, I'm aligning the polyprotein against other known > proteins. I want the computer to give me a best guess and align the > smaller protein end-to-end, but currently it's not doing that. It will > only give me portions of the protein that align strongly. None of the > proteins are aligning end-to-end, unless I align a protein against > itself. > > Bio::Tools:pSW is what I'm using currently. Bioperl doesn't seem to > have an interface to do pairwise alignments with Clustalw, and I'm > experiencing problems using standalone blast. Anyone have any ideas? 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- Aaron J Mackey Pearson Laboratory University of Virginia (434) 924-2821 amackey@virginia.edu _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From rpraca1 at yahoo.com Sun Aug 31 05:00:38 2003 From: rpraca1 at yahoo.com (Resume) Date: Sun Aug 31 08:00:10 2003 Subject: [Bioperl-l] Engineer Message-ID: <200308311200.h7VC01fg021186@localhost.localdomain> OBJECTIVE: Richard's Resume for Job, Consulting or Service Sr. MECHANICAL & DESIGN ENGINEER Project Mgr, Electro-Mechanical Design, Mfg, Product Development, R&D, CADD Tel: ( 408) 309-7006 rpraca9@yahoo.com EXPERIENCE: 08/93-present Sr. MECHANICAL & DESIGN ENGINEER, CADD Mgr "Mech-Tronic" Engineering, Design & Product Development Service; Project Management, Mfg, Tooling, Product Improvement; Developing Working Product, Design and Prototype Production; Controlling & Managing testing, redesign, manufacturing documentation preparation; and full realization of Task using technical approach, schedules, and budget to achieve Final Product. Selecting vendors and specialist to supply performance satisfying customers expectation and to create a new, modern product with advanced properties, quality and market desirable working abilities and sellable values. Preparing propositions & presentations, Review Manufacturing Process and Quality Control, Inspecting standards, Engineering Computations and Technical Improvements, Supplying and Production Automation. Project Management & Development. 
Preparing technical documentation, calculations, engineering, design, layouts, drawings, 3D and Solid Modeling, development & propositions. Hard Drive Design, testing, balancing, recalculating and redesign. Manufacturing and Assembly Equipment design and build. Systems Integration. Tooling and Operations Development, Implementation & Automation for mass production. Energy, Electric Vehicles and Computerized Transportation System Design & Implement. Solar Panels. Solid Works, Pro-E, CAD Management and Operations, Analyzing, Micro Station, ACAD 10-2002 & LT, Win 3x & 2000', Net, Internet, Softdesk, Structural Design. Network, Security, Electrical Design & Installations; Commercial, Industrial, Residential, Fire Alarms, Smoke Detectors, Lights, Plugs, Panels, Power Supply, Electro - Solar Installations, CADD automation. Dynamics, Kinematics, thermodynamics / heat transfer & FEA analyzing. Manufacturing hydraulic & pneumatic equipment, machinery and control systems, mechanisms, robotics device, precision machine elements. Inspecting and control manufacturing standards, analyzing stresses and tolerances, selecting materials, engineering computations and technical improvements, documentation, projects development, supplying. Automation, Conveyors, Spiral Elevator, Fast Cannery Transportation, Electronic and Electrical Components, Master Control Center, Sheet Metal Oven Rebuilding Project. Power Systems, UPS. Automation equipment and machine control using for machines and robotics precision mechanisms motion programming, fluid mechanics, pneumatics systems, mounting and positioning devices, electro-mechanical and vacuum mechanisms, design and analysis of structures, castings, welded frames, mechanical detailing. Computers: DOS, SUN UNIX, MAC, WP, dB, Lotus, Network, Windows & Applications; MS Project, Excel, Access, Word, PFS, Graphics, CAD / CAM, Excel, Basic, C, Fortran, Analyzes. CADD systems, Algor, VAX, Net. 
MCAD, Acad's 2002 & Softdesk, Script, Nastran, Infusoft, LAN, Tektronix, Cadkey, Lisp, Personal Designer, ACAD / Computer Instructor, Machine Design, Robotics & Automation, WIND, SOLAR, METRIC.
EDUCATION: Institute for Business & Technology, California CADD Engineer, Programming, Design, Management Electro - Mechanical College Mechanical Engineering - BSME, ASEE, MBA Open for Travel * Salary open * Permanent preferred or Consulting

From wes.barris at csiro.au Sun Aug 31 18:49:19 2003
From: wes.barris at csiro.au (Wes Barris)
Date: Sun Aug 31 18:48:25 2003
Subject: [Bioperl-l] ace to msf format?
In-Reply-To:
References:
Message-ID: <3F527B6F.5030206@csiro.au>

Brian Osborne wrote:

> Wes,
>
> I don't think this is possible in Bioperl. To put it more generally, AlignIO
> can't accommodate Assembly objects currently. AlignIO is the module that
> takes in a variety of alignment formats and interconverts them, analogous to
> SeqIO. I'll be corrected if I'm wrong.
>
> Brian O.

I am kind of new to this so I could be wrong but isn't an Assembly a group
of alignments? So, from one assemble, a group of alignments could be
generated?

>
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Wes Barris
> Sent: Thursday, August 28, 2003 7:58 PM
> To: Bioperl Mailing List
> Subject: [Bioperl-l] ace to msf format?
>
> Can anyone give me a hint as to how I could use bioperl to read in
> an ACE assembly and write out an MSF formatted alignment? This shows
> what I have figured out so far:
>
> #!/usr/local/bin/perl -w
> #
> use strict;
> use Bio::Assembly::IO;
> #
> my $usage = "Usage: $0 \n";
> my $infile = shift or die $usage;
>
> my $io = new Bio::Assembly::IO(-file=>$infile, -format=>'ace');
> my $assembly = $io->next_assembly;
>
> my $aln = $assembly->all_contigs();
>
> --
> Wes Barris
> E-Mail: Wes.Barris@csiro.au
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Wes Barris
E-Mail: Wes.Barris@csiro.au

From jason at cgt.duhs.duke.edu Sun Aug 31 22:28:43 2003
From: jason at cgt.duhs.duke.edu (Jason Stajich)
Date: Sun Aug 31 22:05:06 2003
Subject: [Bioperl-l] ace to msf format?
In-Reply-To: <3F527B6F.5030206@csiro.au>
References: <3F527B6F.5030206@csiro.au>
Message-ID:

Each Contig is a Bio::Align::AlignI - so in theory you can manipulate them
as if they are Bio::SimpleAlign objects. Robson can clarify if there are
any caveats there.

But you want to do this to have access to each contig in the scaffold:

    foreach my $contig ( $scaffold->all_contigs ) {
        # process Bio::Assembly::Contig object here
    }

Your code below is calling it in scalar context which will just have $aln
being set to the length of the returned array.

-jason

On Mon, 1 Sep 2003, Wes Barris wrote:

> Brian Osborne wrote:
>
> > Wes,
> >
> > I don't think this is possible in Bioperl. To put it more generally, AlignIO
> > can't accommodate Assembly objects currently. AlignIO is the module that
> > takes in a variety of alignment formats and interconverts them, analogous to
> > SeqIO. I'll be corrected if I'm wrong.
> >
> > Brian O.
>
> I am kind of new to this so I could be wrong but isn't an Assembly a group
> of alignments? So, from one assemble, a group of alignments could be
> generated?
>
> >
> > -----Original Message-----
> > From: bioperl-l-bounces@portal.open-bio.org
> > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Wes Barris
> > Sent: Thursday, August 28, 2003 7:58 PM
> > To: Bioperl Mailing List
> > Subject: [Bioperl-l] ace to msf format?
> >
> > Can anyone give me a hint as to how I could use bioperl to read in
> > an ACE assembly and write out an MSF formatted alignment? This shows
> > what I have figured out so far:
> >
> > #!/usr/local/bin/perl -w
> > #
> > use strict;
> > use Bio::Assembly::IO;
> > #
> > my $usage = "Usage: $0 \n";
> > my $infile = shift or die $usage;
> >
> > my $io = new Bio::Assembly::IO(-file=>$infile, -format=>'ace');
> > my $assembly = $io->next_assembly;
> >
> > my $aln = $assembly->all_contigs();
> >
> > --
> > Wes Barris
> > E-Mail: Wes.Barris@csiro.au
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
>
>
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu
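
The scalar-versus-list context point above can be tried without Bioperl installed. What follows is only a minimal sketch in plain Perl: all_contigs here is a hypothetical stand-in sub, not the Bio::Assembly::Scaffold method, and it assumes the real method returns an array (as the reply implies). The contig names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for $assembly->all_contigs. It returns an array,
# so the value the caller sees depends on the calling context.
sub all_contigs {
    my @found = ('Contig1', 'Contig2', 'Contig3');   # made-up contig names
    return @found;
}

# Scalar context: an array evaluated as a scalar yields its element count.
my $aln = all_contigs();

# List context: the caller receives every element.
my @contigs = all_contigs();

print "scalar context gives: $aln\n";
print "list context gives: @contigs\n";

# Looping in list context visits each contig in turn.
foreach my $contig (@contigs) {
    print "processing $contig\n";
}
```

With the real module the same pattern would presumably be a foreach over $assembly->all_contigs, as in the reply, rather than assigning the call to a single scalar.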