From granjeau at tagc.univ-mrs.fr  Thu Mar  1 02:36:43 2007
From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137)
Date: Thu, 01 Mar 2007 08:36:43 +0100
Subject: [Bioperl-l] retrieven ids
In-Reply-To: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com>
References: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com>
Message-ID: <45E6828B.4080808@tagc.univ-mrs.fr>

Hi,

I am not sure it's the key answer, but the FAQ may help you:
http://www.bioperl.org/wiki/FAQ#How_do_I_retrieve_a_nucleotide_coding_sequence_when_I_have_a_protein_gi_number.3F

Cheers,
--Samuel

Luba Pardo wrote:
> Hi everyone,
> I wonder if someone could give me advice on the following:
> I want to retrieve the DNA coding sequence of a RefSeq protein id. I do not
> want to translate the protein back to DNA, but rather get the DNA coding
> sequence ID and then retrieve the DNA sequence from GenBank. Is there any
> module that allows getting all possible ids for a sequence given a protein
> gi?
>
> Thank you very much in advance,
> L. Pardo
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From lubapardo at gmail.com  Thu Mar  1 02:48:27 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Thu, 1 Mar 2007 08:48:27 +0100
Subject: [Bioperl-l] retrieven ids
In-Reply-To: <57419.10.0.7.57.1172691089.squirrel@gscmail.wustl.edu>
References: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com>
 <57419.10.0.7.57.1172691089.squirrel@gscmail.wustl.edu>
Message-ID: <58ff33550702282348w7263f9c1o8a1d4bd6270c4fd0@mail.gmail.com>

Thank you very much.
L. Pardo

On 28/02/07, Dave Messina wrote:
>
> Whenever I'm unsure of how to do something, I first look to see if one of
> the HOWTOs on bioperl.org covers it. In this case, the Features HOWTO has
> example code which I think will do what you want.
> > Genbank records typically have the coding sequence of a protein as a > feature, so I would do something like: > > - use the RefSeq protein IDs to query Entrez and get back the Genbank > records. > > - read the Features HOWTO to refresh my memory on the syntax for grabbing > features. > > That HOWTO is at: > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation > > - whip up a little script to loop through the Genbank records one at a > time with SeqIO and pull out the cDNA sequence features. > > > Dave > > > From granjeau at tagc.univ-mrs.fr Thu Mar 1 05:09:11 2007 From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137) Date: Thu, 01 Mar 2007 11:09:11 +0100 Subject: [Bioperl-l] _rearrange In-Reply-To: References: Message-ID: <45E6A647.4060605@tagc.univ-mrs.fr> Hi, May be you will find information in http://www.bioperl.org/wiki/Advanced_BioPerl#rearrange.28.29 http://www.bioperl.org/wiki/Bioperl_Best_Practices Cheers, --Samuel Caroline Johnston wrote: > hi, > > Is there a discussion of the rationale behind the _rearrange method > somewhere? I'm probably just being gormless, but I think I'm missing the > point a bit. > > Is it okay for a method just to expect named params like > ->foo(arg1=>'stuff', arg2=>'things'); ? 
> > Cxx > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From michael.watson at bbsrc.ac.uk Thu Mar 1 05:58:16 2007 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu, 1 Mar 2007 10:58:16 -0000 Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna In-Reply-To: <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com> References: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk> <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com> Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2EBE3@iahce2ksrv1.iah.bbsrc.ac.uk> In fact, those pad_left and pad_right arguments have no effect whatsoever (using bioperl 1.5.2_100) my $panel = Bio::Graphics::Panel->new(-key_style => between, -offset => $start, -length => $stop - $start + 1, -width => 800 -pad_left =>5000, -pad_right =>5000 ); Even if I set them to 5000, the image looks exactly as if I had not set them. The only way I can get around this is to edit Glyph/dna.pm lines 184 and 186 such that $x2+3 is changed to $x2-20 - then the axes are drawn on the image instead of outside of it. This is obviously a hack, which upsets my karma. Mick ________________________________ From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein Sent: 15 February 2007 18:53 To: michael watson (IAH-C) Cc: BioPerl-List Subject: Re: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna Hi Michael, When you set up the panel, do this: Bio::Graphics::Panel->new(-blah -blah, -pad_left => 20, -pad_right => 20); This will leave enough room on the left and right for you to see the Y axis. 
Otherwise it runs off the edge of the image (ok, this is a mis-design, but it was the only way to solve a chicken-and-egg problem about who gets to say how wide the panel is) Lincoln On 2/15/07, michael watson (IAH-C) wrote: Hi OK I have some great images out of this glyph, but I can't see the axis, and nor is it labelled (ie does it go from 0 - 100%?) so isn't great for publication. The docs say: "NOTE: -gc_window=>'auto' gives nice results and is recommended for drawing GC content. The GC content axes draw slightly outside the panel, so you may wish to add some extra padding on the right and left. " Any idea how to do this? Basically, I want a nice GC graph with the axis quite clearly labelled, and a nice "%GC" title next to it :) Thanks Mick The information contained in this message may be confidential or legally privileged and is intended solely for the addressee. If you have received this message in error please delete it & notify the originator immediately. Unauthorised use, disclosure, copying or alteration of this message is forbidden & may be unlawful. The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Institute. This email and associated attachments has been checked locally for viruses but we can accept no responsibility once it has left our systems. Communications on Institute computers are monitored to secure the effective operation of the systems and for other lawful purposes. _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From michael.watson at bbsrc.ac.uk Thu Mar 1 06:01:39 2007 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu, 1 Mar 2007 11:01:39 -0000 Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9503E2EBE3@iahce2ksrv1.iah.bbsrc.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk> <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com> <8975119BCD0AC5419D61A9CF1A923E9503E2EBE3@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2EBE4@iahce2ksrv1.iah.bbsrc.ac.uk> On further inspection, the lack of a comma was causing my karma upset - apologies. Mick ________________________________ From: michael watson (IAH-C) Sent: 01 March 2007 10:58 To: 'lincoln.stein at gmail.com' Cc: BioPerl-List Subject: RE: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna In fact, those pad_left and pad_right arguments have no effect whatsoever (using bioperl 1.5.2_100) my $panel = Bio::Graphics::Panel->new(-key_style => between, -offset => $start, -length => $stop - $start + 1, -width => 800 -pad_left =>5000, -pad_right =>5000 ); Even if I set them to 5000, the image looks exactly as if I had not set them. The only way I can get around this is to edit Glyph/dna.pm lines 184 and 186 such that $x2+3 is changed to $x2-20 - then the axes are drawn on the image instead of outside of it. This is obviously a hack, which upsets my karma. 
_______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From heikki at sanbi.ac.za Thu Mar 1 06:02:30 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Thu, 1 Mar 2007 13:02:30 +0200 Subject: [Bioperl-l] Bio::SeqIO::FTHelper In-Reply-To: References: Message-ID: <200703011302.30855.heikki@sanbi.ac.za> Chris, It was meant to collect code that was common to all three main databases using similar feature tables. Now might be the time to optimise the parsing speed by removing it. Do you have a plan how to do it? -Heikki On Tuesday 27 February 2007 22:57:40 Chris Fields wrote: > Could anyone tell me what FTHelper is used for? From what I gather > it rolls up seqfeature data into a lightweight object but then > creates a SeqFeature::Generic anyway (at least for GenBank/EMBL/ > Swiss), which seems to be a waste of memory and time. Is there > something I'm missing (besides my sanity of course)? 
> > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From lubapardo at gmail.com Thu Mar 1 09:47:23 2007 From: lubapardo at gmail.com (Luba Pardo) Date: Thu, 1 Mar 2007 15:47:23 +0100 Subject: [Bioperl-l] (no subject) Message-ID: <58ff33550703010647j1e7908f3sf3c01a74eeceaca4@mail.gmail.com> Dear all, Sorry if the questions is too basic but I am trying to learn BioPerl modules. So I am trying to get the CDS sequence from a gi identification protein using the "features" method. I started to run the example of the FAQ doc (How do I retrieve a nucleotide coding sequence when I have a protein gi number?) , but I can not get the script to run. 
the script is:

use Bio::Factory::FTLocationFactory;
use Bio::DB::GenPept;
use Bio::DB::GenBank;

my $gp = Bio::DB::GenPept->new;
my $gb = Bio::DB::GenBank->new;

# factory to turn strings into Bio::Location objects
my $loc_factory = Bio::Factory::FTLocationFactory->new;

my $protein_gi = '405830';
my $prot_obj = $gp->get_Seq_by_id($protein_gi);;

foreach my $feat ( $prot_obj->top_SeqFeatures ) {
   if ( $feat->primary_tag eq 'CDS' ) {
      # example: 'coded_by="U05729.1:1..122"'
      my @coded_by = $feat->each_tag_value('coded_by');
      my ($nuc_acc,$loc_str) = split /\:/, $coded_by[0];
      my $nuc_obj = $gb->get_Seq_by_acc($nuc_acc);
      # create Bio::Location object from a string
      my $loc_object = $loc_factory->from_string($loc_str);
      # create a Feature object by using a Location
      my $feat_obj = Bio::SeqFeature::Generic->new(-location =>$loc_object);
      # associate the Feature object with the nucleotide Seq object
      $nuc_obj->add_SeqFeature($feat_obj);
      my $cds_obj = $feat_obj->spliced_seq;
      print "CDS sequence is ",$cds_obj->seq,"\n";
   }
}

The error I got is

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Must specify a query or list of uids to fetch
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359
STACK: Bio::DB::NCBIHelper::get_request /usr/lib/perl5/site_perl/5.8.1/Bio/DB/NCBIHelper.pm:192
STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/lib/perl5/site_perl/5.8.1/Bio/DB/WebDBSeqI.pm:432
STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/lib/perl5/site_perl/5.8.1/Bio/DB/NCBIHelper.pm:361
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc /usr/lib/perl5/site_perl/5.8.1/Bio/DB/WebDBSeqI.pm:172
STACK: feature1.pl:16

But I cannot see which part of the script requires me to specify a list
of gi numbers. That's very odd. Am I interpreting the script wrong?
I also tried: get_Seq_by_acc

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: acc complement(join(AL593843.9 does not exist
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc /usr/lib/perl5/site_perl/5.8.1/Bio/DB/WebDBSeqI.pm:181
STACK: feature1.pl:16

Can anyone let me know what I am doing wrong?
Thank you very much in advance
L. Pardo

From jay at jays.net  Thu Mar  1 10:51:38 2007
From: jay at jays.net (Jay Hannah)
Date: Thu, 1 Mar 2007 09:51:38 -0600 (CST)
Subject: [Bioperl-l] Enhance Bio::PrimarySeqI::trunc() for Bio::Location::Split ?
Message-ID:

In my GenBank files when I'm sitting on a CDS usually I can just call

$feature->seq->seq;

and out pops the exact nucleotide sequence which codes my protein. Very
cool.

Unfortunately, I have a crazy GenBank file which contains a CDS with a
split range like this: CDS join(1959..2355,1..92)

When I try to use $feature->seq->seq I don't end up with just the properly
pieced together coding region, I end up with the *entire* nucleotide
sequence.

This seems to be happening because

Bio::SeqFeature::Generic::seq
506:   my $seq = $self->{'_gsf_seq'}->trunc($self->start(), $self->end());
(which is calling Bio::PrimarySeqI::trunc)

works fine when Bio::SeqFeature::Generic is using

   '_location' => Bio::Location::Simple=HASH(0x1804344)
      '_end' => 2842
      '_location_type' => 'EXACT'
      '_root_verbose' => 0
      '_seqid' => 'AE015930'
      '_start' => 1601
      '_strand' => 1

but when things get complicated and Bio::SeqFeature::Generic is using

   '_location' => Bio::Location::Split=HASH(0x1d1f130)
      '_seqid' => 'PNECG'
      '_splittype' => 'JOIN'
      '_sublocations' => ARRAY(0x1d1e654)
         0  Bio::Location::Simple=HASH(0x1d1f290)
            '_end' => 2355
            '_location_type' => 'EXACT'
            '_root_verbose' => 0
            '_seqid' => 'PNECG'
            '_start' => 1959
            '_strand' => 1
         1  Bio::Location::Simple=HASH(0x1d1f338)
            '_end' => 92
            '_location_type' => 'EXACT'
            '_root_verbose' => 0
            '_seqid' => 'PNECG'
            '_start' => 1
            '_strand' => 1

Simply passing $self->start and $self->end into trunc() will not pull
off the appropriate magic.

Question 1: Perhaps my data was bad and I should refuse to process
join(1959..2355,1..92)? My accession is M12730, and if I download that
from NCBI now it looks like they've changed it so my problem no longer
exists in that sequence anyway. There are already 71 examples of CDS join
in various files in t/data, and *none* of those examples jump backwards.
Should I write this off as bad data or try to enhance BioPerl? I'm happy
to throw my painful M12730 on the end of t/data/test.genbank and write
tests for it if anyone thinks it is important.

Question 2: Even if we can just ignore my M12730, though, I think there's
still a problem afoot. Below I demo L26462 (already sitting in
t/data/test.genbank) which has a

   CDS join(866..957,1088..1310,2161..2289)

In this case (as my tests below demonstrate), $feature->seq->seq is
pulling the right range of nucleotide, but it's also pulling the gaps
(introns). Isn't that wrong? Shouldn't it skip the introns?

So...
is the appropriate approach to try to enhance
Bio::PrimarySeqI::trunc() for Bio::Location::Split? Or should trunc() be
left alone and Bio::SeqFeature::Generic::seq() needs to get smarter?

Or...?

Thanks, oh mighty BioWizards! :)

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah

-----------------
Tack this on the end of t/genbank.t and the length test at the end fails:
-----------------
# Enhance Bio::PrimarySeqI::trunc() for Bio::Location::Split ?
my $stream = Bio::SeqIO->new(-file => Bio::Root::IO->catfile("t","data","test.genbank"),
                              -verbose => $verbose,
                              -format => 'genbank');
my $seq = $stream->next_seq;
while ($seq->accession ne "M37762") {
   $seq = $stream->next_seq;
}
# M37762 has a CDS 76..819, which should work fine.
ok(my @features = $seq->get_SeqFeatures(), "get_SeqFeatures()");
my $feat;
foreach my $feat2 ( @features ) {
   next unless ($feat2->primary_tag eq "CDS");
   my @db_xrefs = $feat2->annotation->get_Annotations("db_xref");
   if (grep { $_ eq "GI:179403" } @db_xrefs) {
      $feat = $feat2;
      last;
   }
}
my ($protein_seq) = $feat->annotation->get_Annotations("translation");
ok($protein_seq =~ /^MTILFLTMVISYFGCMKA.*GWRFIRIDTSCVCTLTIKRGR$/, "protein sequence");
my ($nucleotide_seq) = $feat->seq->seq;
ok($nucleotide_seq =~ /^ATGACCATCCTTTTCCTT.*ACCATTAAAAGGGGAAGATAG$/, "nucleotide sequence");
is(length($nucleotide_seq), 744, "nucleotide length");

# Jump down to L26462 which has a CDS join(866..957,1088..1310,2161..2289), which is broken?
while ($seq->accession ne "L26462") {
   $seq = $stream->next_seq;
}
ok(my @features = $seq->get_SeqFeatures(), "get_SeqFeatures()");
my $feat;
foreach my $feat2 ( @features ) {
   next unless ($feat2->primary_tag eq "CDS");
   my @db_xrefs = $feat2->annotation->get_Annotations("db_xref");
   if (grep { $_ eq "GI:532506" } @db_xrefs) {
      $feat = $feat2;
      last;
   }
}
my ($protein_seq) = $feat->annotation->get_Annotations("translation");
ok($protein_seq =~ /^MVHLTPEEKSAVTALWGK.*VQAAYQKVVAGVANALAHKYH$/, "protein sequence");
my ($nucleotide_seq) = $feat->seq->seq;
ok($nucleotide_seq =~ /^ATGGTGCATCTGACTCCT.*CTGGCCCACAAGTATCACTAA$/, "nucleotide sequence - correct CDS range");
#print "[$nucleotide_seq]\n";
ok($nucleotide_seq !~ /^ACCTCCTATTTGACACCA.*TGCTAGTCTCCCGGAACTATC$/, "nucleotide sequence - full nucleotide should not match");
is(length($nucleotide_seq), 444, "nucleotide length");

# I have an old(?) version of M12730 which lists
#    CDS join(1959..2355,1..92)
#    /db_xref="GI:150830"
# Crazy ranges like that don't work at all, you end up with the full nucleotide sequence...
# But NCBI doesn't list M12730 that way any more, so now I would be OK?

# ------------------

From cjfields at uiuc.edu  Thu Mar  1 10:24:03 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Mar 2007 09:24:03 -0600
Subject: [Bioperl-l] Bio::SeqIO::FTHelper
In-Reply-To: <200703011302.30855.heikki@sanbi.ac.za>
References: <200703011302.30855.heikki@sanbi.ac.za>
Message-ID: <8AA2B586-4C2B-4379-BFF0-0CEB15FAE68E@uiuc.edu>

I do have a rough outline of what I think could be done:

http://www.bioperl.org/wiki/Handler-based_SeqIO_parsers

where you could switch out handlers to deal with incoming data chunks.
Any suggestions there are welcome.

I'll probably commit examples of the above in the next week or two
(GenBank, EMBL, Swiss parsers using the same handlers) which don't use
FTHelper. So far I have all three passing tests based on
genbank/embl/swiss.t but they need a few more tweaks before I commit.
chris On Mar 1, 2007, at 5:02 AM, Heikki Lehvaslaiho wrote: > Chris, > > It was meant to collect code that was common to all three main > databases using > similar feature tables. > > Now might be the time to optimise the parsing speed by removing it. > Do you > have a plan how to do it? > > -Heikki > > On Tuesday 27 February 2007 22:57:40 Chris Fields wrote: >> Could anyone tell me what FTHelper is used for? From what I gather >> it rolls up seqfeature data into a lightweight object but then >> creates a SeqFeature::Generic anyway (at least for GenBank/EMBL/ >> Swiss), which seems to be a waste of memory and time. Is there >> something I'm missing (besides my sanity of course)? >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Thu Mar 1 10:57:02 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 1 Mar 2007 09:57:02 -0600 Subject: [Bioperl-l] Enhance Bio::PrimarySeqI::trunc() for Bio::Location::Split ? In-Reply-To: References: Message-ID: Jay, Have you tried using $feature->spliced_seq() instead of seq()? 
Using seq() retrieves the full sequence for the split location (from start of first sublocation to end of last), while spliced_seq() splices the sublocation sequences together, which is what I think you want. chris On Mar 1, 2007, at 9:51 AM, Jay Hannah wrote: > In my GenBank files when I'm sitting on a CDS usually I can just call > > $feature->seq->seq; > > and out pops the exact nucleotide sequence which codes my protein. > Very > cool. > > Unfortunately, I have a crazy GenBank file which contains a CDS with a > split range like this: CDS join(1959..2355,1..92) > > When I try to use $feature->seq->seq I don't end up with just the > properly > pieced together coding region, I end up with the *entire* nucleotide > sequence. > > This seems to be happening because > > Bio::SeqFeature::Generic::seq > 506: my $seq = $self->{'_gsf_seq'}->trunc($self->start(), $self- > >end()); > (which is calling Bio::PrimarySeqI::trunc) > > works fine when Bio::SeqFeature::Generic is using > > '_location' => Bio::Location::Simple=HASH(0x1804344) > '_end' => 2842 > '_location_type' => 'EXACT' > '_root_verbose' => 0 > '_seqid' => 'AE015930' > '_start' => 1601 > '_strand' => 1 > > but when things get complicated and Bio::SeqFeature::Generic is using > > '_location' => Bio::Location::Split=HASH(0x1d1f130) > '_seqid' => 'PNECG' > '_splittype' => 'JOIN' > '_sublocations' => ARRAY(0x1d1e654) > 0 Bio::Location::Simple=HASH(0x1d1f290) > '_end' => 2355 > '_location_type' => 'EXACT' > '_root_verbose' => 0 > '_seqid' => 'PNECG' > '_start' => 1959 > '_strand' => 1 > 1 Bio::Location::Simple=HASH(0x1d1f338) > '_end' => 92 > '_location_type' => 'EXACT' > '_root_verbose' => 0 > '_seqid' => 'PNECG' > '_start' => 1 > '_strand' => 1 > > Simply passing $self->start and $self->end into trunc() will not pull > off the appropriate magic. > > Question 1: Perhaps my data was bad and I should refuse to process > join(1959..2355,1..92)? 
> > # ------------------ From sac at bioperl.org Thu Mar 1 11:30:59 2007 From: sac at bioperl.org (Steve Chervitz) Date: Thu, 1 Mar 2007 09:30:59 -0700 Subject: [Bioperl-l] [Bioperl-announce-l] BioPerl leadership additions In-Reply-To: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org> References: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org> Message-ID: <000101c75c1e$fecb7770$6400a8c0@CodonSolutions.local> Welcome to the club, Chris & Sendu. Always good to have an infusion of new blood and capable, motivated hands. Steve On 2/26/07, Jason Stajich wrote: > > Dear BioPerl Users and Developers, > > I want to announce a addition in the leadership of BioPerl. > Christopher Fields and and Sendu Bala are now members of the BioPerl > Core developer group to recognize their ongoing leadership in the > project. Chris and Sendu were instrumental in the 1.5.2 Developer > release and have made a significant commitment and contribution to > the quality of the code and the documentation of the project. We > have invited them to be part of the core to recognize their work and > to feel comfortable to ask them to do more. ;-) > > The Core group was established to insure that someone was responsible > for making code releases, vetting new developers for CVS write > accounts, and generally dealing with things that might otherwise slip > through the cracks. We are very excited to have more people > contributing to and maintaining the toolkit. We look forward to > their help along with all the other developers, as we work towards a > 1.6 release release this year. > > As always, while their is a need for some individuals to lead the > project, we encourage contributions from all levels of expertise to > improve the code, documentation, and tutorials of the project. > > We plan to discuss the progress of the toolkit at this year's > Bioinformatics Open Source Conference held in Vienna, Austria in > conjunction with the SIG meetings at ISMB. 
We are trying to use
> BOSC 2007 as a chance for the developers of Open Bioinformatics
> Foundation sponsored and related projects to coordinate future
> development and release cycles.
>
> Jason Stajich on behalf of the Core developers
>
> _______________________________________________
> Bioperl-announce-l mailing list
> Bioperl-announce-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l
>

From arareko at campus.iztacala.unam.mx  Thu Mar  1 11:30:59 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 1 Mar 2007 09:30:59 -0700
Subject: [Bioperl-l] [Bioperl-announce-l] BioPerl leadership additions
In-Reply-To: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
References: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
Message-ID: <000001c75c1e$fec90670$6400a8c0@CodonSolutions.local>

Congrats Chris & Sendu! Very well-deserved. Keep up the great work.

Cheers!
Mauricio.

Jason Stajich wrote:
> Dear BioPerl Users and Developers,
>
> I want to announce an addition to the leadership of BioPerl.
> Christopher Fields and Sendu Bala are now members of the BioPerl
> Core developer group to recognize their ongoing leadership in the
> project. Chris and Sendu were instrumental in the 1.5.2 Developer
> release and have made a significant commitment and contribution to
> the quality of the code and the documentation of the project. We
> have invited them to be part of the core to recognize their work and
> to feel comfortable to ask them to do more. ;-)
>
> The Core group was established to ensure that someone was responsible
> for making code releases, vetting new developers for CVS write
> accounts, and generally dealing with things that might otherwise slip
> through the cracks.
We are very excited to have more people > contributing to and maintaining the toolkit. We look forward to > their help along with all the other developers, as we work towards a > 1.6 release release this year. > > As always, while their is a need for some individuals to lead the > project, we encourage contributions from all levels of expertise to > improve the code, documentation, and tutorials of the project. > > We plan to discuss the progress of the toolkit at this year's > Bioinformatics Open Source Conference held in Vienna, Austria in > conjunction with the SIG meetings at ISMB. We are trying to use > BOSC 2007 as a chance for the developers of Open Bioinformatics > Foundation sponsored and related projects to coordinate future > development and release cycles. > > Jason Stajich on behalf of the Core developers > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Gen?tica Unidad de Morfofisiolog?a y Funci?n Facultad de Estudios Superiores Iztacala, UNAM _______________________________________________ Bioperl-announce-l mailing list Bioperl-announce-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l From johnsonm at gmail.com Thu Mar 1 11:49:20 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Thu, 1 Mar 2007 10:49:20 -0600 Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer In-Reply-To: References: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>

Message-ID:

On 2/28/07, Hilmar Lapp wrote:
>
> I'm not sure how the user would be able to take out the child hitting
> ctrl-c if you run it through system() (except if the parent
> terminates first - but maybe then terminating a run-away child is in
> good order).

Quoting the perlfunc docs on system:

    Since "SIGINT" and "SIGQUIT" are ignored during the execution of
    "system", if you expect your program to terminate on receipt of
    these signals you will need to arrange to do so yourself based on
    the return value.

        @args = ("command", "arg1", "arg2");
        system(@args) == 0
            or die "system @args failed: $?"

    You can check all the failure possibilities by inspecting $? like
    this:

        if ($? == -1) {
            print "failed to execute: $!\n";
        }
        elsif ($? & 127) {
            printf "child died with signal %d, %s coredump\n",
                ($? & 127), ($? & 128) ? 'with' : 'without';
        }
        else {
            printf "child exited with value %d\n", $? >> 8;
        }

    or more portably by using the W*() calls of the POSIX extension;
    see perlport for more information.

    When the arguments get executed via the system shell, results and
    return codes will be subject to its quirks and capabilities. See
    "'STRING'" in perlop and "exec" for details.

So, during a call to system(), a CTRL-C (SIGINT) won't take out the
parent, but it will take out the child, unless the child has caught it
and handled it. If you don't care why the child failed, just that it
did, I suppose the distinction is a subtle one.

> I haven't read the IPC::run POD in full detail but you will want to
> make sure that if the parent gets killed the child does get killed
> too, or otherwise you'll have a run-away process that novices will
> have trouble with understanding or terminating.

I'll double check.

> Other than that though IPC::run seems like a useful module, so
> incurring this as a dependency should be fine.
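[Editorial sketch, not part of the original message.] The perlfunc excerpt above can be turned into a small reusable helper. The following is a minimal sketch using core Perl only; the names describe_status and run_child are invented for illustration and are not part of BioPerl or of any wrapper module discussed in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decode a wait status of the kind system() leaves in $?.
# $errno_text is only consulted when the status is -1 (command could not start).
# describe_status is a hypothetical helper name, not an existing API.
sub describe_status {
    my ($status, $errno_text) = @_;
    if ($status == -1) {
        my $why = defined $errno_text ? $errno_text : 'unknown error';
        return "failed to execute: $why";
    }
    if (my $signal = $status & 127) {
        # Low 7 bits hold the signal number, bit 8 flags a core dump.
        my $core = ($status & 128) ? 'with' : 'without';
        return "child died with signal $signal, $core coredump";
    }
    # Otherwise the exit code sits in the high byte.
    return "child exited with value " . ($status >> 8);
}

# Run a command in list form (no shell involved) and report what happened.
# run_child is likewise a hypothetical name.
sub run_child {
    my @cmd = @_;
    system(@cmd);
    my ($status, $err) = ($?, "$!");   # copy both before anything can reset them
    return ($status, describe_status($status, $err));
}

# Demo: ask the running perl binary to exit with code 3.
my ($status, $message) = run_child($^X, '-e', 'exit 3');
print "$message\n";
```

Because the parent ignores SIGINT while system() is running, a caller that wants CTRL-C to stop the whole program could check whether ($status & 127) equals the SIGINT number and exit on its own, which is the "arrange to do so yourself" step the documentation mentions.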
>
From thiago.venancio at gmail.com Thu Mar 1 13:02:14 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Thu, 1 Mar 2007 15:02:14 -0300
Subject: [Bioperl-l] frac_aligned_query returning results >1.
Message-ID: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com>

Hi all,

I have read a lot of threads regarding my issue, but have not found a
working answer yet.

I am having problems with frac_aligned_query(). It is returning results
greater than 1. I have just updated my SearchUtils.pm from:
http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Search/SearchUtils.pm

The problem persists and, additionally, I get several warnings like:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Undefined sub-sequence (1507,1507) . Valid range = 1444 - 1507
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:328
STACK: Bio::Search::HSP::HSPI::matches /usr/share/perl5/Bio/Search/HSP/HSPI.pm:711
STACK: Bio::Search::SearchUtils::_adjust_contigs /usr/share/perl5/Bio/Search/SearchUtils.pm:489
STACK: Bio::Search::SearchUtils::tile_hsps /usr/share/perl5/Bio/Search/SearchUtils.pm:200
STACK: Bio::Search::Hit::GenericHit::frac_aligned_query /usr/share/perl5/Bio/Search/Hit/GenericHit.pm:1145
STACK: ./geraStatGenome.pl:17

My code is pretty clean:

    while( my $hit = $result->next_hit ) {
        print $result->query_name."\t".$hit->frac_aligned_query('query')."\t".$hit->frac_identical( 'query' )."\n";
        last;
    }

Thanks.

Thiago

--
"The way to get started is to quit talking and begin doing."
Walt Disney

========================
Thiago Motta Venancio, MSc
PhD student in Bioinformatics
University of Sao Paulo
========================

From cjfields at uiuc.edu Thu Mar 1 13:27:10 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Mar 2007 12:27:10 -0600
Subject: [Bioperl-l] frac_aligned_query returning results >1.
In-Reply-To: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com> References: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com> Message-ID: <8D7B0767-46A6-4083-B94E-D6490B241B84@uiuc.edu> This is related to a reported bug: http://bugzilla.open-bio.org/show_bug.cgi?id=2193 The relevant code used to tile HSPs is a bit brittle and sometimes leads to errors like this. The error (which is actually a thrown exception) is wrapped in an eval block and converted to a warn for that reason. I'm not familiar with the tiling algorithm used, maybe Steve can add some input? chris On Mar 1, 2007, at 12:02 PM, Thiago Venancio wrote: > Hi all, > > I have read a lot of threads regarding my issue, but still didn't > get any > efficient answer yet. > > I am with problems with frac_aligned_query(). It is returning "> 1" > results. > I have just updated my SearhUtils.pm from: > http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/ > Bio/Search/SearchUtils.pm > > > The problem persists and, additionally, I get several warnings like: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Undefined sub-sequence (1507,1507) . Valid range = 1444 - 1507 > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:328 > STACK: Bio::Search::HSP::HSPI::matches > /usr/share/perl5/Bio/Search/HSP/HSPI.pm:711 > STACK: Bio::Search::SearchUtils::_adjust_contigs > /usr/share/perl5/Bio/Search/SearchUtils.pm:489 > STACK: Bio::Search::SearchUtils::tile_hsps > /usr/share/perl5/Bio/Search/SearchUtils.pm:200 > STACK: Bio::Search::Hit::GenericHit::frac_aligned_query > /usr/share/perl5/Bio/Search/Hit/GenericHit.pm:1145 > STACK: ./geraStatGenome.pl:17 > > My code is pretty clean: > > while( my $hit = $result->next_hit ) { > print > $result->query_name."\t".$hit->frac_aligned_query('query')."\t". > $hit->frac_identical( > 'query' )."\n"; > last; > } > > > Thanks. 
> > Thiago > > > -- > "The way to get started is to quit talking and begin doing." > Walt Disney > > ======================== > Thiago Motta Venancio, MSc > PhD student in Bioinformatics > University of Sao Paulo > ======================== > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From Kevin.M.Brown at asu.edu Thu Mar 1 13:28:22 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 1 Mar 2007 11:28:22 -0700 Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails In-Reply-To: <45E5F43C.9080902@sendu.me.uk> References: <45E5F43C.9080902@sendu.me.uk> Message-ID: <1A4207F8295607498283FE9E93B775B402CC9D60@EX02.asurite.ad.asu.edu> Are you certain that GD has SVG enabled in it? Sounds like this error is from outside the bioperl panel and is instead from GD and the GD perl module. > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala > Sent: Wednesday, February 28, 2007 2:30 PM > To: bioperl list > Cc: Lincoln Stein > Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails > > I have GD 2.35 and GD::SVG 2.33 installed. > > I have a working script in which a Bio::Graphics::Panel > object is made and output with: > > print $panel->png; > > This is fine. Changing it to: > > print $panel->svg; > > Gives the error: > > Can't locate object method "svg" via package "GD:Image" at > /.../Bio/Graphics/Panel.pm line 971, line 192. > > > Am I supposed to do something else to get this to work? > > > Cheers, > Sendu. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Thu Mar 1 14:51:19 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 1 Mar 2007 13:51:19 -0600 Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails In-Reply-To: <1A4207F8295607498283FE9E93B775B402CC9D60@EX02.asurite.ad.asu.edu> References: <45E5F43C.9080902@sendu.me.uk> <1A4207F8295607498283FE9E93B775B402CC9D60@EX02.asurite.ad.asu.edu> Message-ID: Does SVG output via GD still require GD::SVG (or SVG::GD, I can't remember which)? chris On Mar 1, 2007, at 12:28 PM, Kevin Brown wrote: > Are you certain that GD has SVG enabled in it? Sounds like this error > is from outside the bioperl panel and is instead from GD and the GD > perl > module. .. From stefan.kirov at bms.com Thu Mar 1 15:11:11 2007 From: stefan.kirov at bms.com (Stefan Kirov) Date: Thu, 01 Mar 2007 15:11:11 -0500 Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails In-Reply-To: References: <45E5F43C.9080902@sendu.me.uk> <1A4207F8295607498283FE9E93B775B402CC9D60@EX02.asurite.ad.asu.edu> Message-ID: <45E7335F.8070102@bms.com> Chris Fields wrote: > Does SVG output via GD still require GD::SVG (or SVG::GD, I can't > remember which)? > > chris > > On Mar 1, 2007, at 12:28 PM, Kevin Brown wrote: > > >> Are you certain that GD has SVG enabled in it? Sounds like this error >> is from outside the bioperl panel and is instead from GD and the GD >> perl >> module. >> > .. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Guys, I think you missed parts of the discussion yesterday, it was the object constructor, which decides if it should use GD or GD::SVG... 
Stefan From cjfields at uiuc.edu Thu Mar 1 15:14:41 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 1 Mar 2007 14:14:41 -0600 Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails In-Reply-To: <45E7335F.8070102@bms.com> References: <45E5F43C.9080902@sendu.me.uk> <1A4207F8295607498283FE9E93B775B402CC9D60@EX02.asurite.ad.asu.edu> <45E7335F.8070102@bms.com> Message-ID: <5D75FAFC-F71A-4528-8650-818C2CFC85FF@uiuc.edu> Nope, I saw that. I was just curious; I hadn't used GD in a while but will be soon... chris On Mar 1, 2007, at 2:11 PM, Stefan Kirov wrote: > Chris Fields wrote: >> Does SVG output via GD still require GD::SVG (or SVG::GD, I can't >> remember which)? >> >> chris >> >> On Mar 1, 2007, at 12:28 PM, Kevin Brown wrote: >> >> >>> Are you certain that GD has SVG enabled in it? Sounds like this >>> error >>> is from outside the bioperl panel and is instead from GD and the >>> GD perl >>> module. >>> >> .. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > Guys, I think you missed parts of the discussion yesterday, it was > the object constructor, which decides if it should use GD or > GD::SVG... > Stefan From stefan.kirov at bms.com Thu Mar 1 15:20:46 2007 From: stefan.kirov at bms.com (Stefan Kirov) Date: Thu, 01 Mar 2007 15:20:46 -0500 Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails In-Reply-To: <45E7335F.8070102@bms.com> References: <45E5F43C.9080902@sendu.me.uk> <1A4207F8295607498283FE9E93B775B402CC9D60@EX02.asurite.ad.asu.edu> <45E7335F.8070102@bms.com> Message-ID: <45E7359E.5030104@bms.com> Stefan Kirov wrote: > Chris Fields wrote: > >> Does SVG output via GD still require GD::SVG (or SVG::GD, I can't >> remember which)? >> >> chris >> >> On Mar 1, 2007, at 12:28 PM, Kevin Brown wrote: >> >> >> >>> Are you certain that GD has SVG enabled in it? 
Sounds like this error >>> is from outside the bioperl panel and is instead from GD and the GD >>> perl >>> module. >>> >>> >> .. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > Guys, I think you missed parts of the discussion yesterday, it was the > object constructor, which decides if it should use GD or GD::SVG... > Stefan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > OK, sorry.. In any case yes, it requires GD::SVG since the constructor instantiate GD::SVG object if you pass -image_class=~'svg' Stefan From jay at jays.net Thu Mar 1 16:15:03 2007 From: jay at jays.net (Jay Hannah) Date: Thu, 1 Mar 2007 15:15:03 -0600 (CST) Subject: [Bioperl-l] Enhance Bio::PrimarySeqI::trunc() for Bio::Location::Split ? In-Reply-To: References:

Message-ID: On Thu, 1 Mar 2007, Chris Fields wrote: > Have you tried using $feature->spliced_seq() instead of seq()? Using > seq() retrieves the full sequence for the split location (from start > of first sublocation to end of last), while spliced_seq() splices the > sublocation sequences together, which is what I think you want. Genius. No wonder they promoted you into the core developer group. :) Using this: my ($nucleotide_seq) = $feat->spliced_seq(-nosort => 1)->seq; Gives me what I expected against these: # M37762 CDS 76..819 # L26462 CDS join(866..957,1088..1310,2161..2289) # M12730 CDS join(1959..2355,1..92) I'm happy to submit my patches for t/genbank.t and t/data/test.genbank if that would make the universe a slightly better place. (...or t/SeqFeature.t or t/splicedseq.t, which appear to be the tests that have spliced_seq calls in them so far...) Thanks! j seqlab.net http://www.bioperl.org/wiki/User:Jhannah From lstein at cshl.edu Thu Mar 1 15:39:12 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 1 Mar 2007 15:39:12 -0500 Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails In-Reply-To: <45E7359E.5030104@bms.com> References: <45E5F43C.9080902@sendu.me.uk> <1A4207F8295607498283FE9E93B775B402CC9D60@EX02.asurite.ad.asu.edu> <45E7335F.8070102@bms.com> <45E7359E.5030104@bms.com> Message-ID: <6dce9a0b0703011239m7065a570l6d19e6c7065fca45@mail.gmail.com> You need to have GD::SVG installed and then instantiate the panel with: -image_class=>'GD::SVG' Lincoln On 3/1/07, Stefan Kirov wrote: > > Stefan Kirov wrote: > > Chris Fields wrote: > > > >> Does SVG output via GD still require GD::SVG (or SVG::GD, I can't > >> remember which)? > >> > >> chris > >> > >> On Mar 1, 2007, at 12:28 PM, Kevin Brown wrote: > >> > >> > >> > >>> Are you certain that GD has SVG enabled in it? Sounds like this error > >>> is from outside the bioperl panel and is instead from GD and the GD > >>> perl > >>> module. > >>> > >>> > >> .. 
> >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > >> > > Guys, I think you missed parts of the discussion yesterday, it was the > > object constructor, which decides if it should use GD or GD::SVG... > > Stefan > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > OK, sorry.. > In any case yes, it requires GD::SVG since the constructor instantiate > GD::SVG object if you pass -image_class=~'svg' > Stefan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From stefan.kirov at bms.com Thu Mar 1 16:03:03 2007 From: stefan.kirov at bms.com (Stefan Kirov) Date: Thu, 01 Mar 2007 16:03:03 -0500 Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails In-Reply-To: <6dce9a0b0703011239m7065a570l6d19e6c7065fca45@mail.gmail.com> References: <45E5F43C.9080902@sendu.me.uk> <1A4207F8295607498283FE9E93B775B402CC9D60@EX02.asurite.ad.asu.edu> <45E7335F.8070102@bms.com> <45E7359E.5030104@bms.com> <6dce9a0b0703011239m7065a570l6d19e6c7065fca45@mail.gmail.com> Message-ID: <45E73F87.1090104@bms.com> Lincoln Stein wrote: > You need to have GD::SVG installed and then instantiate the panel with: > -image_class=>'GD::SVG' > Yes, silly me I was looking at the code and did not realize that =~/svg/ is only to check, and the actual class name is taken of the arg list. Sorry Chris. 
Stefan > Lincoln > > On 3/1/07, Stefan Kirov wrote: > >> Stefan Kirov wrote: >> >>> Chris Fields wrote: >>> >>> >>>> Does SVG output via GD still require GD::SVG (or SVG::GD, I can't >>>> remember which)? >>>> >>>> chris >>>> >>>> On Mar 1, 2007, at 12:28 PM, Kevin Brown wrote: >>>> >>>> >>>> >>>> >>>>> Are you certain that GD has SVG enabled in it? Sounds like this error >>>>> is from outside the bioperl panel and is instead from GD and the GD >>>>> perl >>>>> module. >>>>> >>>>> >>>>> >>>> .. >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >>>> >>> Guys, I think you missed parts of the discussion yesterday, it was the >>> object constructor, which decides if it should use GD or GD::SVG... >>> Stefan >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >> OK, sorry.. >> In any case yes, it requires GD::SVG since the constructor instantiate >> GD::SVG object if you pass -image_class=~'svg' >> Stefan >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > > From lstein at cshl.edu Thu Mar 1 16:10:52 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 1 Mar 2007 16:10:52 -0500 Subject: [Bioperl-l] [Bioperl-announce-l] BioPerl leadership additions In-Reply-To: <8f200b4c0702270959n7521f693l915bfabe7ccb7ef7@mail.gmail.com> References: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org> <8f200b4c0702270959n7521f693l915bfabe7ccb7ef7@mail.gmail.com> Message-ID: <6dce9a0b0703011310n6c4ec150hfb7835ed576461d4@mail.gmail.com> Hear hear! Lincoln On 2/27/07, Steve Chervitz wrote: > > Welcome to the club, Chris & Sendu. 
Always good to have an infusion of new > blood and capable, motivated hands. > > Steve > > On 2/26/07, Jason Stajich wrote: > > > > Dear BioPerl Users and Developers, > > > > I want to announce a addition in the leadership of BioPerl. > > Christopher Fields and and Sendu Bala are now members of the BioPerl > > Core developer group to recognize their ongoing leadership in the > > project. Chris and Sendu were instrumental in the 1.5.2 Developer > > release and have made a significant commitment and contribution to > > the quality of the code and the documentation of the project. We > > have invited them to be part of the core to recognize their work and > > to feel comfortable to ask them to do more. ;-) > > > > The Core group was established to insure that someone was responsible > > for making code releases, vetting new developers for CVS write > > accounts, and generally dealing with things that might otherwise slip > > through the cracks. We are very excited to have more people > > contributing to and maintaining the toolkit. We look forward to > > their help along with all the other developers, as we work towards a > > 1.6 release release this year. > > > > As always, while their is a need for some individuals to lead the > > project, we encourage contributions from all levels of expertise to > > improve the code, documentation, and tutorials of the project. > > > > We plan to discuss the progress of the toolkit at this year's > > Bioinformatics Open Source Conference held in Vienna, Austria in > > conjunction with the SIG meetings at ISMB. We are trying to use > > BOSC 2007 as a chance for the developers of Open Bioinformatics > > Foundation sponsored and related projects to coordinate future > > development and release cycles. 
> > > > Jason Stajich on behalf of the Core developers > > > > _______________________________________________ > > Bioperl-announce-l mailing list > > Bioperl-announce-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Thu Mar 1 16:23:49 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 1 Mar 2007 16:23:49 -0500 Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9503E2EBE3@iahce2ksrv1.iah.bbsrc.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk> <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com> <8975119BCD0AC5419D61A9CF1A923E9503E2EBE3@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <6dce9a0b0703011323i44dea645ha5aba6361dbc964@mail.gmail.com> I'm glad you picked that up. I would have never noticed the missing comma. NB: if you set "use warnings" at the top of your script, then you would have gotten an error message about subtraction with an undefined variable. Lincoln On 3/1/07, michael watson (IAH-C) wrote: > > In fact, those pad_left and pad_right arguments have no effect whatsoever > (using bioperl 1.5.2_100) > > my $panel = Bio::Graphics::Panel->new(-key_style => between, > -offset => $start, > -length => $stop - $start + 1, > -width => 800 > -pad_left =>5000, > -pad_right =>5000 > ); > Even if I set them to 5000, the image looks exactly as if I had not set > them. 
> > The only way I can get around this is to edit Glyph/dna.pm lines 184 and > 186 such that $x2+3 is changed to $x2-20 - then the axes are drawn on the > image instead of outside of it. This is obviously a hack, which upsets my > karma. > > Mick > > ------------------------------ > *From:* lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] *On > Behalf Of *Lincoln Stein > *Sent:* 15 February 2007 18:53 > *To:* michael watson (IAH-C) > *Cc:* BioPerl-List > *Subject:* Re: [Bioperl-l] The axis of GC content in > Bio::Graphics::glyph:dna > > Hi Michael, > > When you set up the panel, do this: > > Bio::Graphics::Panel->new(-blah -blah, > -pad_left => 20, > -pad_right => 20); > > This will leave enough room on the left and right for you to see the Y > axis. Otherwise it runs off the edge of the image (ok, this is a mis-design, > but it was the only way to solve a chicken-and-egg problem about who gets to > say how wide the panel is) > > Lincoln > > On 2/15/07, michael watson (IAH-C) wrote: > > > > Hi > > > > OK I have some great images out of this glyph, but I can't see the axis, > > and nor is it labelled (ie does it go from 0 - 100%?) so isn't great for > > publication. The docs say: > > > > "NOTE: -gc_window=>'auto' gives nice results and is recommended for > > drawing GC content. The GC content axes draw slightly outside the > > panel, so you may wish to add some extra padding on the right and > > left. " > > > > Any idea how to do this? > > > > Basically, I want a nice GC graph with the axis quite clearly labelled, > > and a nice "%GC" title next to it :) > > > > Thanks > > > > Mick > > > > The information contained in this message may be confidential or legally > > privileged and is intended solely for the addressee. If you have > > received this message in error please delete it & notify the originator > > immediately. > > Unauthorised use, disclosure, copying or alteration of this message is > > forbidden & may be unlawful. 
> > The contents of this e-mail are the views of the sender and do not > > necessarily represent the views of the Institute. > > This email and associated attachments has been checked locally for > > viruses but we can accept no responsibility once it has left our > > systems. > > Communications on Institute computers are monitored to secure the > > effective operation of the systems and for other lawful purposes. > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Thu Mar 1 16:25:09 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 1 Mar 2007 15:25:09 -0600 Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails In-Reply-To: <45E73F87.1090104@bms.com> References: <45E5F43C.9080902@sendu.me.uk> <1A4207F8295607498283FE9E93B775B402CC9D60@EX02.asurite.ad.asu.edu> <45E7335F.8070102@bms.com> <45E7359E.5030104@bms.com> <6dce9a0b0703011239m7065a570l6d19e6c7065fca45@mail.gmail.com> <45E73F87.1090104@bms.com> Message-ID: <6727E8E1-F2D9-4C4E-843F-FC6D53ADAAA7@uiuc.edu> No problemo. 
chris On Mar 1, 2007, at 3:03 PM, Stefan Kirov wrote: > Lincoln Stein wrote: >> You need to have GD::SVG installed and then instantiate the panel >> with: >> -image_class=>'GD::SVG' >> > Yes, silly me I was looking at the code and did not realize that =~/ > svg/ > is only to check, and the actual class name is taken of the arg list. > Sorry Chris. > Stefan >> Lincoln >> >> On 3/1/07, Stefan Kirov wrote: >> >>> Stefan Kirov wrote: >>> >>>> Chris Fields wrote: >>>> >>>> >>>>> Does SVG output via GD still require GD::SVG (or SVG::GD, I can't >>>>> remember which)? >>>>> >>>>> chris >>>>> >>>>> On Mar 1, 2007, at 12:28 PM, Kevin Brown wrote: >>>>> >>>>> >>>>> >>>>> >>>>>> Are you certain that GD has SVG enabled in it? Sounds like >>>>>> this error >>>>>> is from outside the bioperl panel and is instead from GD and >>>>>> the GD >>>>>> perl >>>>>> module. >>>>>> >>>>>> >>>>>> >>>>> .. >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>>> >>>> Guys, I think you missed parts of the discussion yesterday, it >>>> was the >>>> object constructor, which decides if it should use GD or GD::SVG... >>>> Stefan >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >>> OK, sorry.. 
>>> In any case yes, it requires GD::SVG since the constructor >>> instantiate >>> GD::SVG object if you pass -image_class=~'svg' >>> Stefan >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Thu Mar 1 16:29:18 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 1 Mar 2007 15:29:18 -0600 Subject: [Bioperl-l] Enhance Bio::PrimarySeqI::trunc() for Bio::Location::Split ? In-Reply-To: References:

Message-ID:

On Mar 1, 2007, at 3:15 PM, Jay Hannah wrote:

> On Thu, 1 Mar 2007, Chris Fields wrote:
>> Have you tried using $feature->spliced_seq() instead of seq()? Using
>> seq() retrieves the full sequence for the split location (from start
>> of first sublocation to end of last), while spliced_seq() splices the
>> sublocation sequences together, which is what I think you want.
>
> Genius. No wonder they promoted you into the core developer group. :)
>
> Using this:
>     my ($nucleotide_seq) = $feat->spliced_seq(-nosort => 1)->seq;
>
> Gives me what I expected against these:
>
> # M37762 CDS 76..819
> # L26462 CDS join(866..957,1088..1310,2161..2289)
> # M12730 CDS join(1959..2355,1..92)
>
> I'm happy to submit my patches for t/genbank.t and t/data/test.genbank if
> that would make the universe a slightly better place. (...or
> t/SeqFeature.t or t/splicedseq.t, which appear to be the tests that have
> spliced_seq calls in them so far...)
>
> Thanks!
>
> j
> seqlab.net
> http://www.bioperl.org/wiki/User:Jhannah

The more tests the merrier, I say. I would only put in one example,
though (maybe the last one, M12730, since it's from a gene in a circular
sequence split across the start). I'm still planning on testing out some
variations of Bio::Location::SplitLocationI (which impacts spliced_seq())
and have started a page on it, so any added tests would be great.

chris

From harris at cshl.edu Thu Mar 1 15:09:16 2007
From: harris at cshl.edu (Todd Harris)
Date: Thu, 1 Mar 2007 13:09:16 -0700
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To:
References: <45E5F43C.9080902@sendu.me.uk>
Message-ID: <1172779760.7B32453@fd9.dngr.org>

Hi Chris -

I don't believe that GD (or the underlying gd library, for that matter)
can generate SVG, but I could be wrong. SVG output can be generated from
GD using either GD::SVG or SVG::GD, two modules that accomplish the same
task through a similar strategy.
Todd

On Thu, 1 Mar 2007 2:05 pm, Chris Fields wrote:
> Does SVG output via GD still require GD::SVG (or SVG::GD, I can't
> remember which)?
>
> chris
>
> On Mar 1, 2007, at 12:28 PM, Kevin Brown wrote:
>
>> Are you certain that GD has SVG enabled in it? Sounds like this error
>> is from outside the bioperl panel and is instead from GD and the GD
>> perl module.
> ..

From johnsonm at gmail.com Thu Mar 1 17:46:12 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Thu, 1 Mar 2007 16:46:12 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
In-Reply-To: <45E61AA9.9030906@sendu.me.uk>
References: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu> <45E61AA9.9030906@sendu.me.uk>
Message-ID:

Now that I'm using _set_from_args() and trying to get all the options
and switches working that I never use, it occurs to me that a 4-in-1
module for Glimmer2/Glimmer3/GlimmerM/GlimmerHMM is not going to fly
due to the options and switches being different. At this point, I think
I'm going to end up with a Genemark module, a Glimmer2 module, and a
Glimmer3 module. Feh.

On 2/28/07, Sendu Bala wrote:
>
> Mark Johnson wrote:
> > I'm using _rearrange() now. I'll look at _set_from_args(). Is either
> > one preferred to the other?
>
> _set_from_args() is implemented using _rearrange() iirc. In any case,
> they do different things but _set_from_args() just makes creating
> wrapper modules a lot simpler. Another example: compare revisions 1.15
> and 1.16 of Bio::Tools::Run::Alignment::Lagan where I reimplemented it
> to use _set_from_args() and _setparams().
>
> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-run/Bio/Tools/Run/Alignment/Lagan.pm.diff?r1=text&tr1=1.15&r2=text&tr2=1.16&diff_format=h
>
> So, it's new, but I'd recommend new modules, especially wrappers, make
> use of it.
>

From bix at sendu.me.uk Thu Mar 1 18:06:54 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 01 Mar 2007 23:06:54 +0000
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
In-Reply-To:
References: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu> <45E61AA9.9030906@sendu.me.uk>
Message-ID: <45E75C8E.7010809@sendu.me.uk>

Mark Johnson wrote:
> Now that I'm using _set_from_args() and trying to get all the options
> and switches working that I never use, it occurs to me that a 4-in-1
> module for Glimmer2/Glimmer3/GlimmerM/GlimmerHMM is not going to fly
> due to the options and switches being different. At this point, I think
> I'm going to end up with a Genemark module, a Glimmer2 module, and a
> Glimmer3 module. Feh.

I think a 4in1 would still be possible. Presumably at some point you
know which one you will run, so let the user set everything in the
single new() even if it doesn't make sense, but then form argument
strings with

    sub _setparams {
        ...
        if ($glimmer2) {
            my $param_string = $self->SUPER::_setparams(
                -params => [@glim2params],
                -dash   => 1);
        }
        elsif ($glimmer3) {
            ...

Or if you want to be stricter in new(), do something like:

    sub new {
        my ($class, @args) = @_;
        my $self = $class->SUPER::new(@args);
        my ($type) = $self->_rearrange([qw(TYPE)], @args);
        if ($type eq 'glimmer2') {
            $self->_set_from_args(\@args, -methods => [@glim2params],
                                          -create  => 1);
        }
        elsif ($type eq ...

You'll have to figure out something yourself if you want to warn about
the user supplying args that their requested type doesn't use.

All that said, if these Glimmer things are different programs with
different uses (and not simply different versions of the same thing with
the same function), by all means make separate modules.
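
The type-based dispatch discussed above can be sketched in plain Perl.
This is only a minimal illustration and not BioPerl code: %PARAMS_FOR,
build_param_string() and the option names are hypothetical placeholders
(they are not real Glimmer switches), and the sketch assumes nothing more
than that each program type has its own list of accepted options.

```perl
use strict;
use warnings;

# Hypothetical option lists, one per program type (placeholders, not real
# Glimmer switches).
my %PARAMS_FOR = (
    glimmer2 => [qw(min_len cutoff)],
    glimmer3 => [qw(min_len cutoff table)],
);

# Return a command-line style argument string for the requested program type.
# Options the type does not accept are reported with a warning and skipped.
sub build_param_string {
    my ($type, %args) = @_;
    my $allowed = $PARAMS_FOR{$type}
        or die "Unknown program type '$type'\n";

    my %is_allowed = map { $_ => 1 } @$allowed;
    my @unused = sort grep { !$is_allowed{$_} } keys %args;
    warn "Ignoring options not used by $type: @unused\n" if @unused;

    # Walk the declared list so the option order is deterministic.
    my @parts;
    for my $name (@$allowed) {
        push @parts, "-$name $args{$name}" if defined $args{$name};
    }
    return join ' ', @parts;
}

print build_param_string('glimmer3', min_len => 90, table => 11), "\n";
print build_param_string('glimmer2', min_len => 90, table => 11), "\n";
```

The second call warns that "table" is not used by the glimmer2 type and
leaves it out of the string, which is one way to handle the "warn about
unused args" case; a real wrapper would more likely do this inside new()
or _setparams().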
From cjfields at uiuc.edu Thu Mar 1 18:08:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Mar 2007 17:08:46 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
In-Reply-To:
References: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu> <45E61AA9.9030906@sendu.me.uk>
Message-ID: <9CE93EC0-A9DB-4A5B-8CE7-F15795375587@uiuc.edu>

I have been working on an Infernal wrapper (not finished yet but getting
there) which does this:

    # when run() is called, cmsearch is the program run...
    my $factory = Bio::Tools::Run::Infernal->new('-program' => 'cmsearch',
                                                 @params);

in Infernal.pm:

    # for each program...
    my %INFERNAL_PROGRAM = (
        ...
        cmsearch => [qw(h W informat toponly local noalign dumptrees thresh X
                        inside null2 learninserts hmmfb hmmweinberg hmmpad
                        hmmonly hthresh beta noqdb qdbfile hbanded hbandp
                        banddump sums scan2bands)],
        ...
    );

The parameters are then set in new() based only on those listed for the
program; I'm still toying with whether the program needs to be specified
in the constructor prior to a run. There are probably other variations on
this using AUTOLOAD and _set_from_args() etc.

chris

On Mar 1, 2007, at 4:46 PM, Mark Johnson wrote:

> Now that I'm using _set_from_args() and trying to get all the
> options and switches working that I never use, it occurs to me that
> a 4-in-1 module for Glimmer2/Glimmer3/GlimmerM/GlimmerHMM is not
> going to fly due to the options and switches being different. At
> this point, I think I'm going to end up with a Genemark module, a
> Glimmer2 module, and a Glimmer3 module. Feh.

From lstein at cshl.edu Thu Mar 1 18:12:29 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 1 Mar 2007 18:12:29 -0500
Subject: [Bioperl-l] Bio::Factory::EMBOSS, CGI and taint
In-Reply-To: <45E55884.9010908@uq.edu.au>
References: <45E55884.9010908@uq.edu.au>
Message-ID: <6dce9a0b0703011512k360bd94dv82e143d4477ebcea@mail.gmail.com>

You'll need to set the %ENV hash to a known safe state.
e.g.:

    $ENV{PATH} = "/bin:/usr/bin:/usr/local/bin";

Lincoln

On 2/28/07, Neil Saunders wrote:
>
> Dear Bioperlers,
>
> I'm trying to understand an error that occurs when Bio::Factory::EMBOSS
> is used in a CGI script. Using BioPerl 1.5.2 on Ubuntu Dapper, Apache
> 2.0.55, Perl 5.8.7.
>
> If I load this test CGI script (cgi.pl) in a browser:
>
> BEGIN CODE
> ----------
> #!/usr/bin/perl -Tw
> use strict;
> use CGI;
> use Bio::Factory::EMBOSS;
>
> my $cgi = new CGI;
> my $f = new Bio::Factory::EMBOSS;
>
> print $cgi->header,
>       $cgi->start_html,
>       $cgi->end_html;
> --------
> END CODE
>
> I get a 500 server error and the Apache error log reads:
> [error] [client 192.168.0.3] Premature end of script headers: cgi.pl
>
> I can fix this in 2 ways:
>
> (1) Move the "my $f = new Bio::Factory::EMBOSS" line to the end of the
> script, which isn't a very useful fix.
> (2) Remove the -T switch from the shebang line
>
> There seem to be a few old posts on the list regarding "taint-safe"
> modules. It seems that the new Bio::Factory::EMBOSS object is
> interfering with the headers in some way, but I'm no CGI.pm guru and
> wondered if anyone could shed light on this.
>
> thanks,
> Neil
> --
> School of Molecular and Microbial Sciences
> University of Queensland
> Brisbane 4072 Australia
>
> http://nsaunders.wordpress.com
>

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From arareko at campus.iztacala.unam.mx Thu Mar 1 21:52:49 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 01 Mar 2007 20:52:49 -0600
Subject: [Bioperl-l] BioPerl in MiniCPAN
Message-ID: <45E79181.9090404@campus.iztacala.unam.mx>

Folks,

Just found this post by Brian D Foy at the O'Reilly ONLamp Blog; BioPerl
takes up a reasonable part of the picture:

http://www.oreillynet.com/onlamp/blog/2007/02/minicpan_and_grandperspective.html

It would be interesting to see the same graphic for the whole CPAN
repository... :)

Cheers,
Mauricio.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From heikki at sanbi.ac.za Fri Mar 2 01:08:15 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 2 Mar 2007 08:08:15 +0200
Subject: [Bioperl-l] Bio::SeqIO::FTHelper
In-Reply-To: <8AA2B586-4C2B-4379-BFF0-0CEB15FAE68E@uiuc.edu>
References: <200703011302.30855.heikki@sanbi.ac.za> <8AA2B586-4C2B-4379-BFF0-0CEB15FAE68E@uiuc.edu>
Message-ID: <200703020808.15664.heikki@sanbi.ac.za>

This sounds great. Is the speed increase noticeable?

-Heikki

On Thursday 01 March 2007 17:24:03 Chris Fields wrote:
> I do have a rough outline of what I think could be done:
>
> http://www.bioperl.org/wiki/Handler-based_SeqIO_parsers
>
> where you could switch out handlers to deal with incoming data
> chunks. Any suggestions there are welcome.
>
> I'll probably commit examples of the above in the next week or two
> (GenBank, EMBL, Swiss parsers using the same handlers) which don't
> use FTHelper. So far I have all three passing tests based on
> genbank/embl/swiss.t but they need a few more tweaks before I commit.
> > chris > > On Mar 1, 2007, at 5:02 AM, Heikki Lehvaslaiho wrote: > > Chris, > > > > It was meant to collect code that was common to all three main > > databases using > > similar feature tables. > > > > Now might be the time to optimise the parsing speed by removing it. > > Do you > > have a plan how to do it? > > > > -Heikki > > > > On Tuesday 27 February 2007 22:57:40 Chris Fields wrote: > >> Could anyone tell me what FTHelper is used for? From what I gather > >> it rolls up seqfeature data into a lightweight object but then > >> creates a SeqFeature::Generic anyway (at least for GenBank/EMBL/ > >> Swiss), which seems to be a waste of memory and time. Is there > >> something I'm missing (besides my sanity of course)? > >> > >> chris > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > ______ _/ _/_____________________________________________________ > > _/ _/ > > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > > _/ _/ _/ SANBI, South African National Bioinformatics Institute > > _/ _/ _/ University of Western Cape, South Africa > > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > > ___ _/_/_/_/_/________________________________________________________ > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign

-- 
______ _/ _/_____________________________________________________
_/ _/
_/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za
_/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho
_/ _/ _/ SANBI, South African National Bioinformatics Institute
_/ _/ _/ University of Western Cape, South Africa
_/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From bix at sendu.me.uk Fri Mar 2 06:23:45 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 02 Mar 2007 11:23:45 +0000
Subject: [Bioperl-l] frac_aligned_query returning results >1.
In-Reply-To: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com>
References: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com>
Message-ID: <45E80941.6020406@sendu.me.uk>

Thiago Venancio wrote:
> Hi all,
>
> I have read a lot of threads regarding my issue, but still haven't
> found a working answer.
>
> I am having problems with frac_aligned_query(). It is returning "> 1"
> results.
> I have just updated my SearchUtils.pm from:
> http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Search/SearchUtils.pm

It shouldn't return results greater than 1. Please send me a minimal
blast report that gives such results. Make sure you only have one copy of
SearchUtils.pm and that it is the latest version (or that you are
definitely using that latest version).

> The problem persists and, additionally, I get several warnings like:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Undefined sub-sequence (1507,1507) . Valid range = 1444 - 1507
> STACK: Error::throw

I don't know about that problem. See Chris's reply.

From bix at sendu.me.uk Fri Mar 2 06:25:25 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 02 Mar 2007 11:25:25 +0000
Subject: [Bioperl-l] frac_aligned_query returning results >1.
In-Reply-To: <8D7B0767-46A6-4083-B94E-D6490B241B84@uiuc.edu>
References: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com> <8D7B0767-46A6-4083-B94E-D6490B241B84@uiuc.edu>
Message-ID: <45E809A5.9060407@sendu.me.uk>

Chris Fields wrote:
> This is related to a reported bug:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2193
>
> The relevant code used to tile HSPs is a bit brittle and sometimes
> leads to errors like this. The error (which is actually a thrown
> exception) is wrapped in an eval block and converted to a warn for
> that reason. I'm not familiar with the tiling algorithm used, maybe
> Steve can add some input?

Depending on what exactly you're talking about here, I may have
re-written that algorithm. Nice to know the bug survived ;)

From bix at sendu.me.uk Fri Mar 2 06:42:26 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 02 Mar 2007 11:42:26 +0000
Subject: [Bioperl-l] Problem of Installing Bioperl
In-Reply-To: <405300.74546.qm@web60523.mail.yahoo.com>
References: <405300.74546.qm@web60523.mail.yahoo.com>
Message-ID: <45E80DA2.6050303@sendu.me.uk>

Chan Kuang Lim wrote:
> Thank you for your reply, but I still can't solve my problem. The folder
> 'ppm-VzM4DH' does not exist on my system, so there is no
> bioperl-1.5.2_100.tgz. How can I get it?

You should post back to the mailing list; I don't have a Windows machine
to test things out on. Others with more Windows experience may be able to
help you.

I can suggest making sure you have the latest version of (ActiveState)
perl and the GUI PPM installer. As a last resort you can ensure you have
nmake installed and try installing with CPAN on the command-line.

It will no doubt be helpful if you supply complete, unedited details of
what you do and the errors you receive so we can diagnose your problem
successfully.

From n.haigh at sheffield.ac.uk Fri Mar 2 07:43:04 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 02 Mar 2007 12:43:04 +0000
Subject: [Bioperl-l] Problem of Installing Bioperl
In-Reply-To: <459942.77644.qm@web60518.mail.yahoo.com>
References: <459942.77644.qm@web60518.mail.yahoo.com>
Message-ID: <45E81BD8.3030304@sheffield.ac.uk>

Chan Kuang Lim wrote:
> I have a problem installing bioperl on Windows using the command-line
> installation.
> In the cmd window, after
> ppm-shell
> search bioperl
> install 2
>
> many downloads completed, but the next line is:
> Unpacking bioperl-1.5.2_100...ppm install failed: Can't extract files from C:.............../Bioperl-1.5.2_100.tgz
>
>
> Hope you can answer my question. Thank you.
>
> Regards,
> Chan Kuang Lim
> Malaysia
>
>

I should be able to help out, but I'm a little busy at the moment. If
you are still having problems, let us know the details of your system,
e.g. what version of Windows, whether you are logged in as an
administrator, what version of ActivePerl and what version of Perl.

Cheers
Nath

From cjfields at uiuc.edu Fri Mar 2 08:33:08 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Mar 2007 07:33:08 -0600
Subject: [Bioperl-l] frac_aligned_query returning results >1.
In-Reply-To: <45E809A5.9060407@sendu.me.uk>
References: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com> <8D7B0767-46A6-4083-B94E-D6490B241B84@uiuc.edu> <45E809A5.9060407@sendu.me.uk>
Message-ID:

On Mar 2, 2007, at 5:25 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> This is related to a reported bug:
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2193
>> The relevant code used to tile HSPs is a bit brittle and
>> sometimes leads to errors like this. The error (which is
>> actually a thrown exception) is wrapped in an eval block and
>> converted to a warn for that reason. I'm not familiar with the
>> tiling algorithm used, maybe Steve can add some input?
>
> Depending on what exactly you're talking about here, I may have
> re-written that algorithm. Nice to know the bug survived ;)

Yep, I saw your commits (revs.
1.16, 1.17, 1.19, and 1.20). I can check code prior to that to see if it
changes anything for better or worse or gets rid of the bug (probably
later today or tomorrow), though I can't see why your revisions would
make it worse. If anything they're now more accurate.

Thiago can also try; just pull up a revision prior to the ones listed
above and see if it helps:

http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Search/SearchUtils.pm?cvsroot=bioperl

Jason had previously indicated problems with tiling (i.e. similar
exceptions were thrown) prior to your commits so I don't think your
changes are related, but one never knows.

Chris

From bix at sendu.me.uk Fri Mar 2 08:35:34 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 02 Mar 2007 13:35:34 +0000
Subject: [Bioperl-l] (no subject)
In-Reply-To: <58ff33550703010647j1e7908f3sf3c01a74eeceaca4@mail.gmail.com>
References: <58ff33550703010647j1e7908f3sf3c01a74eeceaca4@mail.gmail.com>
Message-ID: <45E82826.30007@sendu.me.uk>

Luba Pardo wrote:
> Dear all,
> Sorry if the question is too basic but I am trying to learn BioPerl
> modules. So I am trying to get the CDS sequence from a protein gi
> identifier using the "features" method. I started to run the example of
> the FAQ doc (How do I retrieve a nucleotide coding sequence when I have
> a protein gi number?)
[ http://www.bioperl.org/wiki/FAQ#How_do_I_retrieve_a_nucleotide_coding_sequence_when_I_have_a_protein_gi_number.3F ]

[snip]
> my $protein_gi = '405830';
> my $prot_obj = $gp->get_Seq_by_id($protein_gi);
> foreach my $feat ( $prot_obj->top_SeqFeatures ) {
>   if ( $feat->primary_tag eq 'CDS' ) {
>     # example: 'coded_by="U05729.1:1..122"'
>     my @coded_by = $feat->each_tag_value('coded_by');
>     my ($nuc_acc, $loc_str) = split /\:/, $coded_by[0];
[snip]
> The error I got is
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Must specify a query or list of uids to fetch
[snip]
> But I cannot see which part of the script requires me to specify a
> list of gi. That's very odd. Am I interpreting the script wrong?

If you had used warnings you'd have seen a problem on the line with the
split: @coded_by is empty. This is because you aren't supplying a
protein GI. In this case it would be 405831, not 405830. 405830 is
already the nucleotide GI so you don't need to do this stuff with
coded_by. Use the code in the next section of the FAQ instead:

http://www.bioperl.org/wiki/FAQ#How_do_I_get_the_complete_spliced_nucleotide_sequence_from_the_CDS_section.3F

From lubapardo at gmail.com Fri Mar 2 08:47:26 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Fri, 2 Mar 2007 14:47:26 +0100
Subject: [Bioperl-l] (no subject)
In-Reply-To: <45E82826.30007@sendu.me.uk>
References: <58ff33550703010647j1e7908f3sf3c01a74eeceaca4@mail.gmail.com> <45E82826.30007@sendu.me.uk>
Message-ID: <58ff33550703020547r26edb40pb9af8dc6556e27d1@mail.gmail.com>

Thank you all for your advice. It certainly made my weekend!
Indeed, I could run the example using the RefSeq accession number. As
suggested earlier by Samuel, I ran the query through the RefSeq
get_Seq_by_id() method and it worked even without stripping the trailing
version number.
I am attaching the modified script I ran (I also checked the translated
protein to verify I got the correct CDS):

use Bio::Factory::FTLocationFactory;
use Bio::SeqFeature::Generic;
use Bio::DB::RefSeq;
use Bio::DB::GenBank;

my $gp = Bio::DB::RefSeq->new;
my $gb = Bio::DB::GenBank->new;

# factory to turn strings into Bio::Location objects
my $loc_factory = Bio::Factory::FTLocationFactory->new;

open (IN, "refids.txt") or die "\n I can't open the file\n";
open (OUT, ">>refseqfast.txt") or die "\n I can't write it\n";

# one RefSeq protein accession per input line
while (<IN>) {
    chomp;
    my $protein_acc = $_;
    #print "que onda $protein_acc\n";
    #die;
    my $prot_obj = $gp->get_Seq_by_id($protein_acc);
    foreach my $feat ( $prot_obj->top_SeqFeatures ) {
        if ( $feat->primary_tag eq 'CDS' ) {
            # example: 'coded_by="U05729.1:1..122"'
            my @coded_by = $feat->each_tag_value('coded_by');
            my ($nuc_acc, $loc_str) = split /\:/, $coded_by[0];
            print " $nuc_acc\n";
            # stripping the version suffix turned out not to be needed:
            # $nuc_acc =~ s/(\w+)\.\d+/$1/;
            print " $nuc_acc\n";
            my $nuc_obj = $gb->get_Seq_by_id($nuc_acc);
            # create Bio::Location object from a string
            my $loc_object = $loc_factory->from_string($loc_str);
            # create a Feature object by using a Location
            my $feat_obj = Bio::SeqFeature::Generic->new(-location => $loc_object);
            # associate the Feature object with the nucleotide Seq object
            $nuc_obj->add_SeqFeature($feat_obj);
            my $cds_obj = $feat_obj->spliced_seq;
            print OUT ">", $nuc_acc, "\n", $cds_obj->seq, "\n";
        }
    }
}

On 02/03/07, Sendu Bala wrote:
>
> Luba Pardo wrote:
> > Dear all,
> > Sorry if the question is too basic but I am trying to learn BioPerl
> > modules. So I am trying to get the CDS sequence from a protein gi
> > identifier using the "features" method. I started to run the example
> > of the FAQ doc (How do I retrieve a nucleotide coding sequence when I
> > have a protein gi number?)
> > [ > > http://www.bioperl.org/wiki/FAQ#How_do_I_retrieve_a_nucleotide_coding_sequence_when_I_have_a_protein_gi_number.3F > ] > > [snip] > > my $protein_gi = '405830'; > > my $prot_obj = $gp->get_Seq_by_id($protein_gi);; > > foreach my $feat ( $prot_obj->top_SeqFeatures ) { > > if ( $feat->primary_tag eq 'CDS' ) { > > # example: 'coded_by="U05729.1:1..122"' > > my @coded_by = $feat->each_tag_value('coded_by'); > > my ($nuc_acc, $loc_str) = split /\:/, $coded_by[0]; > [snip] > > The error I got is > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: Must specify a query or list of uids to fetch > [snip] > > But I can not see where part of the script is that I have to specify a > list > > of gi. That very odd. Am I interpreting the script wrong? > > If you use warnings you'd have seen a problem on the line with the > split: @coded_by is empty. This is because you aren't supplying a > protein GI. In this case it would be 405831, not 405830. 405830 is > already the nucleotide GI so you don't need to do this stuff with > coded_by. Use the code in the next section of the FAQ instead: > > > http://www.bioperl.org/wiki/FAQ#How_do_I_get_the_complete_spliced_nucleotide_sequence_from_the_CDS_section.3F > > From thiago.venancio at gmail.com Fri Mar 2 06:38:48 2007 From: thiago.venancio at gmail.com (Thiago Venancio) Date: Fri, 2 Mar 2007 08:38:48 -0300 Subject: [Bioperl-l] Bioperl-l] frac_aligned_query returning results >1. Message-ID: <44255ea80703020338t608c1d71k810baf92ede1180e@mail.gmail.com> Hi Sendu and Chris, Thanks for the help. As I mentioned, I have updated my SearchUtils file from: http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Search/SearchUtils.pm I am also using the lates BioPerl version, installed from CPAN. Please find a buggy blast report attached. In this case, the frac_aligned_query() outputs "1.04", but I have others with " 1.57" for example. 
Just for a quantitative aspect, I got ">1" values in only 61 / 53,377. The line where I call the function is : print $result->query_name."\t".$hit->frac_aligned_query()."\t".$hit->frac_identical( 'query' )."\n"; Thiago -- "The way to get started is to quit talking and begin doing." Walt Disney ======================== Thiago Motta Venancio, MSc PhD student in Bioinformatics University of Sao Paulo ======================== -------------- next part -------------- BLASTN 2.2.6 [Apr-09-2003] Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Query= AEDES_02704.C (1069 letters) Database: aaegypti.SCAFFOLDS-MASKED.AEDES1.fa 4758 sequences; 1,383,971,543 total letters Searching..........done Score E Sequences producing significant alignments: (bits) Value supercontig:1:supercont1.157:1:2064756:1 supercontig supercont1.157 858 0.0 >supercontig:1:supercont1.157:1:2064756:1 supercontig supercont1.157 Length = 2064756 Score = 858 bits (433), Expect = 0.0 Identities = 448/453 (98%) Strand = Plus / Plus Query: 12 cattttaaatgcatatattgggtgccatcatgactgcctgactcctaaacttgacctcga 71 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 759400 cattttaaatgcatatattgggtgccatcatgactgcctgactcctaaacttgacctcga 759459 Query: 72 ggcctatattctatcccttcttacatgtagtggcttaatcctagattgctggtactcacg 131 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 759460 ggcctatattctatcccttcttacatgtagtggcttaatcctagattgctggtactcacg 759519 Query: 132 gcaccggctattatgctcccgccgcctctagctccaccagcctctgcatcctttctgacg 191 |||||||| |||||||||||||||||| |||||||||||||||| ||||||||||||||| Sbjct: 759520 gcaccggccattatgctcccgccgcctttagctccaccagcctccgcatcctttctgacg 759579 Query: 192 gtctgcgctcgccgtcgtcgcacgctctgtgctcctgcaccacctcagctgaagcggcta 251 
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 759580 gtctgcgctcgccgtcgtcgcacgctctgtgctcctgcaccacctcagctgaagcggctg 759639 Query: 252 gctctcgacggtcattcaccgccgacgaccttgcactactgtgttgggaacccctggtcg 311 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 759640 gctctcgacggtcattcaccgccgacgaccttgcactactgtgttgggaacccctggtcg 759699 Query: 312 tgagtcacagtccgctcttcctccgatgtgccaaatgtcaaacgctgatatggctacgga 371 |||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||| Sbjct: 759700 tgagtcacagtccgctcttcctccgatgtgtcaaatgtcaaacgctgatatggctacgga 759759 Query: 372 cgatcctccaccgcggtgttgggaacccctgagtgtagactcgctgggccttccttcgat 431 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 759760 cgatcctccaccgcggtgttgggaacccctgagtgtagactcgctgggccttccttcgat 759819 Query: 432 gagccaaagaacgaaactgcaacgaaaaaaccc 464 ||||||||||||||||||||||||||||||||| Sbjct: 759820 gagccaaagaacgaaactgcaacgaaaaaaccc 759852 Score = 803 bits (405), Expect = 0.0 Identities = 441/453 (97%) Strand = Plus / Plus Query: 12 cattttaaatgcatatattgggtgccatcatgactgcctgactcctaaacttgacctcga 71 ||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||| Sbjct: 768455 cattttaaatgcatatattgggtgccatcatgactacctgactcctaaacttgacctcga 768514 Query: 72 ggcctatattctatcccttcttacatgtagtggcttaatcctagattgctggtactcacg 131 ||||||||||||||| ||||||||||||||||||||||||||||||| |||||||||||| Sbjct: 768515 ggcctatattctatctcttcttacatgtagtggcttaatcctagatttctggtactcacg 768574 Query: 132 gcaccggctattatgctcccgccgcctctagctccaccagcctctgcatcctttctgacg 191 |||||||| |||||||||||||||||| |||||||||||||||| |||| |||||||||| Sbjct: 768575 gcaccggccattatgctcccgccgcctttagctccaccagcctccgcattctttctgacg 768634 Query: 192 gtctgcgctcgccgtcgtcgcacgctctgtgctcctgcaccacctcagctgaagcggcta 251 ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||| Sbjct: 768635 gtctgcgctcgccgtcgtcgcacgctctgtgctcctgcaccccctcagctgaagcggctg 768694 Query: 252 gctctcgacggtcattcaccgccgacgaccttgcactactgtgttgggaacccctggtcg 311 
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 768695 gctctcgacggtcattcaccgccgacgaccttgcactactgtgttgggaacccctggtcg 768754 Query: 312 tgagtcacagtccgctcttcctccgatgtgccaaatgtcaaacgctgatatggctacgga 371 ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||| Sbjct: 768755 tgagtcacagtccgctcttcctccgatgtgccaaatgtcaactgctgatatggctacgga 768814 Query: 372 cgatcctccaccgcggtgttgggaacccctgagtgtagactcgctgggccttccttcgat 431 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 768815 cgatcctccaccgcggtgttgggaacccctgagtgtagactcgctgggccttccttcgat 768874 Query: 432 gagccaaagaacgaaactgcaacgaaaaaaccc 464 |||||||||||| |||||||||||||||||||| Sbjct: 768875 gagccaaagaacaaaactgcaacgaaaaaaccc 768907 Score = 317 bits (160), Expect = 3e-84 Identities = 170/172 (98%), Gaps = 1/172 (0%) Strand = Plus / Plus Query: 899 ttcctttcaccgtatggccacgtgacgatgatgagctctggcgcctttcgcgtccggatg 958 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 769407 ttcctttcaccgtatggccacgtgacgatgatgagctctggcgcctttcgcgtccggatg 769466 Query: 959 ttttccgccacggccggtaacaactatgtaacctttcactatggaaaactgcaaaagttt 1018 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 769467 ttttccgccacggccggtaacaactatgtaacctttcactatggaaaactgcaaaagttc 769526 Query: 1019 tctttcgccggcttt-aacggcactaactttttgtggcaaaatccgcttatt 1069 ||||||||||||||| |||||||||||||||||||||||||||||||||||| Sbjct: 769527 tctttcgccggctttaaacggcactaactttttgtggcaaaatccgcttatt 769578 Score = 311 bits (157), Expect = 2e-82 Identities = 167/169 (98%), Gaps = 1/169 (0%) Strand = Plus / Plus Query: 902 ctttcaccgtatggccacgtgacgatgatgagctctggcgcctttcgcgtccggatgttt 961 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 760355 ctttcaccgtatggccacgtgacgatgatgagctctggcgcctttcgcgtccggatgttt 760414 Query: 962 tccgccacggccggtaacaactatgtaacctttcactatggaaaactgcaaaagttttct 1021 |||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||| Sbjct: 760415 tccgccacggccggtaacaactatgtaacctttcactatggaaaactgcaaaagttctct 
760474 Query: 1022 ttcgccggcttt-aacggcactaactttttgtggcaaaatccgcttatt 1069 |||||||||||| |||||||||||||||||||||||||||||||||||| Sbjct: 760475 ttcgccggctttaaacggcactaactttttgtggcaaaatccgcttatt 760523 Score = 293 bits (148), Expect = 5e-77 Identities = 151/152 (99%) Strand = Plus / Plus Query: 587 ccccctttctttccgtcggaatcacgatccgagcagtagcgtcagcagacaatggatttg 646 ||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||| Sbjct: 769138 ccccctttctttccgtcggaatcgcgatccgagcagtagcgtcagcagacaatggatttg 769197 Query: 647 cggcgttcttattgttgaacaatgcgtgcccgtcctgatgccagcgatgatctcgtgatc 706 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 769198 cggcgttcttattgttgaacaatgcgtgcccgtcctgatgccagcgatgatctcgtgatc 769257 Query: 707 ggacaatcacgtcggtttcgaagcggttggcc 738 |||||||||||||||||||||||||||||||| Sbjct: 769258 ggacaatcacgtcggtttcgaagcggttggcc 769289 Score = 293 bits (148), Expect = 5e-77 Identities = 151/152 (99%) Strand = Plus / Plus Query: 587 ccccctttctttccgtcggaatcacgatccgagcagtagcgtcagcagacaatggatttg 646 ||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||| Sbjct: 760083 ccccctttctttccgtcggaatcgcgatccgagcagtagcgtcagcagacaatggatttg 760142 Query: 647 cggcgttcttattgttgaacaatgcgtgcccgtcctgatgccagcgatgatctcgtgatc 706 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 760143 cggcgttcttattgttgaacaatgcgtgcccgtcctgatgccagcgatgatctcgtgatc 760202 Query: 707 ggacaatcacgtcggtttcgaagcggttggcc 738 |||||||||||||||||||||||||||||||| Sbjct: 760203 ggacaatcacgtcggtttcgaagcggttggcc 760234 Score = 242 bits (122), Expect = 2e-61 Identities = 125/126 (99%) Strand = Plus / Plus Query: 463 cccaataaggggtttctcgtcgtgtgtgggtgcttcgagggaagcgacagtcaacctggc 522 |||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||| Sbjct: 768959 cccaataaggggtttctcgtcgtgtgtgggtgcttcgagggaagcgacagccaacctggc 769018 Query: 523 tcgcttcgaggtcctcctacggctcttgcagtggctatgcgctattggtggatgcaaaat 582 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 769019 
tcgcttcgaggtcctcctacggctcttgcagtggctatgcgctattggtggatgcaaaat 769078 Query: 583 cgtccc 588 |||||| Sbjct: 769079 cgtccc 769084 Score = 242 bits (122), Expect = 2e-61 Identities = 125/126 (99%) Strand = Plus / Plus Query: 463 cccaataaggggtttctcgtcgtgtgtgggtgcttcgagggaagcgacagtcaacctggc 522 |||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||| Sbjct: 759904 cccaataaggggtttctcgtcgtgtgtgggtgcttcgagggaagcgacagccaacctggc 759963 Query: 523 tcgcttcgaggtcctcctacggctcttgcagtggctatgcgctattggtggatgcaaaat 582 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 759964 tcgcttcgaggtcctcctacggctcttgcagtggctatgcgctattggtggatgcaaaat 760023 Query: 583 cgtccc 588 |||||| Sbjct: 760024 cgtccc 760029 Score = 123 bits (62), Expect = 1e-25 Identities = 65/66 (98%) Strand = Plus / Plus Query: 737 ccactctaaaatgcttcttcgtgccttctaggtcgtcgatatttgccgcgaaaaccgtga 796 |||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||| Sbjct: 769344 ccactctaaaatgcttcttcgtgccttctaggtcgttgatatttgccgcgaaaaccgtga 769403 Query: 797 tccttc 802 |||||| Sbjct: 769404 tccttc 769409 Score = 121 bits (61), Expect = 4e-25 Identities = 64/65 (98%) Strand = Plus / Plus Query: 737 ccactctaaaatgcttcttcgtgccttctaggtcgtcgatatttgccgcgaaaaccgtga 796 |||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||| Sbjct: 760289 ccactctaaaatgcttcttcgtgccttctaggtcgttgatatttgccgcgaaaaccgtga 760348 Query: 797 tcctt 801 ||||| Sbjct: 760349 tcctt 760353 Score = 105 bits (53), Expect = 2e-20 Identities = 68/73 (93%) Query: 806 gttaaaaataatgaagattacacgtcatctaaactccatttatgcgatgtaaacatgacg 865 ||||||||||||||||||||||| |||| |||||| ||||||||| ||| |||||||||| Sbjct: 1251522 gttaaaaataatgaagattacacatcatgtaaacttcatttatgcaatgcaaacatgacg 1251463 Query: 866 tcatgtaaattta 878 ||||||||||||| Sbjct: 1251462 tcatgtaaattta 1251450 Score = 97.6 bits (49), Expect = 6e-18 Identities = 70/77 (90%) Strand = Plus / Plus Query: 802 cacagttaaaaataatgaagattacacgtcatctaaactccatttatgcgatgtaaacat 861 ||||||||||| |||||||||||| |||| ||||||||| || 
|||||||||||||||| Sbjct: 1251086 cacagttaaaactaatgaagattaaacgttatctaaactttatatatgcgatgtaaacat 1251145 Query: 862 gacgtcatgtaaattta 878 || |||||||||||||| Sbjct: 1251146 gaagtcatgtaaattta 1251162 Score = 61.9 bits (31), Expect = 3e-07 Identities = 37/39 (94%) Strand = Plus / Minus Query: 802 cacagttaaaaataatgaagattacacgtcatctaaact 840 |||||| ||||||||||||||||||||||||| |||||| Sbjct: 1601368 cacagtaaaaaataatgaagattacacgtcatgtaaact 1601330 From thiago.venancio at gmail.com Fri Mar 2 07:29:43 2007 From: thiago.venancio at gmail.com (Thiago Venancio) Date: Fri, 2 Mar 2007 09:29:43 -0300 Subject: [Bioperl-l] frac aligned query Message-ID: <44255ea80703020429m53a2eb7ek6588011bd8400a0a@mail.gmail.com> Hi Sendu and Chris, Sorry for mailing again, my previous email was blocked by the list (suspicious header). Thanks for the help. As I mentioned, I have updated my SearchUtils file from: http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Search/SearchUtils.pm I am also using the lates BioPerl version, installed from CPAN. Please find a buggy blast report attached. In this case, the frac_aligned_query() outputs "1.04", but I have others with " 1.57" for example. Just for a quantitative aspect, I got ">1" values in only 61 / 53,377. The line where I call the function is : print $result->query_name."\t".$hit->frac_aligned_query()."\t".$hit->frac_identical( 'query' )."\n"; Thiago On 3/2/07, Sendu Bala wrote: > > Chris Fields wrote: > > This is related to a reported bug: > > > > http://bugzilla.open-bio.org/show_bug.cgi?id=2193 > > > > The relevant code used to tile HSPs is a bit brittle and sometimes > > leads to errors like this. The error (which is actually a thrown > > exception) is wrapped in an eval block and converted to a warn for > > that reason. I'm not familiar with the tiling algorithm used, maybe > > Steve can add some input? > > Depending on what exactly you're talking about here, I may have > re-written that algorithm. 
Nice to know the bug survived ;) > -- "The way to get started is to quit talking and begin doing." Walt Disney ======================== Thiago Motta Venancio, MSc PhD student in Bioinformatics University of Sao Paulo ======================== -------------- next part -------------- BLASTN 2.2.6 [Apr-09-2003] Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Query= AEDES_02704.C (1069 letters) Database: aaegypti.SCAFFOLDS-MASKED.AEDES1.fa 4758 sequences; 1,383,971,543 total letters Searching..........done Score E Sequences producing significant alignments: (bits) Value supercontig:1:supercont1.157:1:2064756:1 supercontig supercont1.157 858 0.0 >supercontig:1:supercont1.157:1:2064756:1 supercontig supercont1.157 Length = 2064756 Score = 858 bits (433), Expect = 0.0 Identities = 448/453 (98%) Strand = Plus / Plus Query: 12 cattttaaatgcatatattgggtgccatcatgactgcctgactcctaaacttgacctcga 71 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 759400 cattttaaatgcatatattgggtgccatcatgactgcctgactcctaaacttgacctcga 759459 Query: 72 ggcctatattctatcccttcttacatgtagtggcttaatcctagattgctggtactcacg 131 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 759460 ggcctatattctatcccttcttacatgtagtggcttaatcctagattgctggtactcacg 759519 Query: 132 gcaccggctattatgctcccgccgcctctagctccaccagcctctgcatcctttctgacg 191 |||||||| |||||||||||||||||| |||||||||||||||| ||||||||||||||| Sbjct: 759520 gcaccggccattatgctcccgccgcctttagctccaccagcctccgcatcctttctgacg 759579 Query: 192 gtctgcgctcgccgtcgtcgcacgctctgtgctcctgcaccacctcagctgaagcggcta 251 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 759580 gtctgcgctcgccgtcgtcgcacgctctgtgctcctgcaccacctcagctgaagcggctg 759639 Query: 252 gctctcgacggtcattcaccgccgacgaccttgcactactgtgttgggaacccctggtcg 311 
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 759640 gctctcgacggtcattcaccgccgacgaccttgcactactgtgttgggaacccctggtcg 759699 Query: 312 tgagtcacagtccgctcttcctccgatgtgccaaatgtcaaacgctgatatggctacgga 371 |||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||| Sbjct: 759700 tgagtcacagtccgctcttcctccgatgtgtcaaatgtcaaacgctgatatggctacgga 759759 Query: 372 cgatcctccaccgcggtgttgggaacccctgagtgtagactcgctgggccttccttcgat 431 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 759760 cgatcctccaccgcggtgttgggaacccctgagtgtagactcgctgggccttccttcgat 759819 Query: 432 gagccaaagaacgaaactgcaacgaaaaaaccc 464 ||||||||||||||||||||||||||||||||| Sbjct: 759820 gagccaaagaacgaaactgcaacgaaaaaaccc 759852 Score = 803 bits (405), Expect = 0.0 Identities = 441/453 (97%) Strand = Plus / Plus Query: 12 cattttaaatgcatatattgggtgccatcatgactgcctgactcctaaacttgacctcga 71 ||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||| Sbjct: 768455 cattttaaatgcatatattgggtgccatcatgactacctgactcctaaacttgacctcga 768514 Query: 72 ggcctatattctatcccttcttacatgtagtggcttaatcctagattgctggtactcacg 131 ||||||||||||||| ||||||||||||||||||||||||||||||| |||||||||||| Sbjct: 768515 ggcctatattctatctcttcttacatgtagtggcttaatcctagatttctggtactcacg 768574 Query: 132 gcaccggctattatgctcccgccgcctctagctccaccagcctctgcatcctttctgacg 191 |||||||| |||||||||||||||||| |||||||||||||||| |||| |||||||||| Sbjct: 768575 gcaccggccattatgctcccgccgcctttagctccaccagcctccgcattctttctgacg 768634 Query: 192 gtctgcgctcgccgtcgtcgcacgctctgtgctcctgcaccacctcagctgaagcggcta 251 ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||| Sbjct: 768635 gtctgcgctcgccgtcgtcgcacgctctgtgctcctgcaccccctcagctgaagcggctg 768694 Query: 252 gctctcgacggtcattcaccgccgacgaccttgcactactgtgttgggaacccctggtcg 311 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 768695 gctctcgacggtcattcaccgccgacgaccttgcactactgtgttgggaacccctggtcg 768754 Query: 312 tgagtcacagtccgctcttcctccgatgtgccaaatgtcaaacgctgatatggctacgga 371 
||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||| Sbjct: 768755 tgagtcacagtccgctcttcctccgatgtgccaaatgtcaactgctgatatggctacgga 768814 Query: 372 cgatcctccaccgcggtgttgggaacccctgagtgtagactcgctgggccttccttcgat 431 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 768815 cgatcctccaccgcggtgttgggaacccctgagtgtagactcgctgggccttccttcgat 768874 Query: 432 gagccaaagaacgaaactgcaacgaaaaaaccc 464 |||||||||||| |||||||||||||||||||| Sbjct: 768875 gagccaaagaacaaaactgcaacgaaaaaaccc 768907 Score = 317 bits (160), Expect = 3e-84 Identities = 170/172 (98%), Gaps = 1/172 (0%) Strand = Plus / Plus Query: 899 ttcctttcaccgtatggccacgtgacgatgatgagctctggcgcctttcgcgtccggatg 958 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 769407 ttcctttcaccgtatggccacgtgacgatgatgagctctggcgcctttcgcgtccggatg 769466 Query: 959 ttttccgccacggccggtaacaactatgtaacctttcactatggaaaactgcaaaagttt 1018 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 769467 ttttccgccacggccggtaacaactatgtaacctttcactatggaaaactgcaaaagttc 769526 Query: 1019 tctttcgccggcttt-aacggcactaactttttgtggcaaaatccgcttatt 1069 ||||||||||||||| |||||||||||||||||||||||||||||||||||| Sbjct: 769527 tctttcgccggctttaaacggcactaactttttgtggcaaaatccgcttatt 769578 Score = 311 bits (157), Expect = 2e-82 Identities = 167/169 (98%), Gaps = 1/169 (0%) Strand = Plus / Plus Query: 902 ctttcaccgtatggccacgtgacgatgatgagctctggcgcctttcgcgtccggatgttt 961 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 760355 ctttcaccgtatggccacgtgacgatgatgagctctggcgcctttcgcgtccggatgttt 760414 Query: 962 tccgccacggccggtaacaactatgtaacctttcactatggaaaactgcaaaagttttct 1021 |||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||| Sbjct: 760415 tccgccacggccggtaacaactatgtaacctttcactatggaaaactgcaaaagttctct 760474 Query: 1022 ttcgccggcttt-aacggcactaactttttgtggcaaaatccgcttatt 1069 |||||||||||| |||||||||||||||||||||||||||||||||||| Sbjct: 760475 ttcgccggctttaaacggcactaactttttgtggcaaaatccgcttatt 760523 Score = 293 bits (148), 
Expect = 5e-77 Identities = 151/152 (99%) Strand = Plus / Plus Query: 587 ccccctttctttccgtcggaatcacgatccgagcagtagcgtcagcagacaatggatttg 646 ||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||| Sbjct: 769138 ccccctttctttccgtcggaatcgcgatccgagcagtagcgtcagcagacaatggatttg 769197 Query: 647 cggcgttcttattgttgaacaatgcgtgcccgtcctgatgccagcgatgatctcgtgatc 706 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 769198 cggcgttcttattgttgaacaatgcgtgcccgtcctgatgccagcgatgatctcgtgatc 769257 Query: 707 ggacaatcacgtcggtttcgaagcggttggcc 738 |||||||||||||||||||||||||||||||| Sbjct: 769258 ggacaatcacgtcggtttcgaagcggttggcc 769289 Score = 293 bits (148), Expect = 5e-77 Identities = 151/152 (99%) Strand = Plus / Plus Query: 587 ccccctttctttccgtcggaatcacgatccgagcagtagcgtcagcagacaatggatttg 646 ||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||| Sbjct: 760083 ccccctttctttccgtcggaatcgcgatccgagcagtagcgtcagcagacaatggatttg 760142 Query: 647 cggcgttcttattgttgaacaatgcgtgcccgtcctgatgccagcgatgatctcgtgatc 706 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 760143 cggcgttcttattgttgaacaatgcgtgcccgtcctgatgccagcgatgatctcgtgatc 760202 Query: 707 ggacaatcacgtcggtttcgaagcggttggcc 738 |||||||||||||||||||||||||||||||| Sbjct: 760203 ggacaatcacgtcggtttcgaagcggttggcc 760234 Score = 242 bits (122), Expect = 2e-61 Identities = 125/126 (99%) Strand = Plus / Plus Query: 463 cccaataaggggtttctcgtcgtgtgtgggtgcttcgagggaagcgacagtcaacctggc 522 |||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||| Sbjct: 768959 cccaataaggggtttctcgtcgtgtgtgggtgcttcgagggaagcgacagccaacctggc 769018 Query: 523 tcgcttcgaggtcctcctacggctcttgcagtggctatgcgctattggtggatgcaaaat 582 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 769019 tcgcttcgaggtcctcctacggctcttgcagtggctatgcgctattggtggatgcaaaat 769078 Query: 583 cgtccc 588 |||||| Sbjct: 769079 cgtccc 769084 Score = 242 bits (122), Expect = 2e-61 Identities = 125/126 (99%) Strand = Plus / Plus Query: 463 
cccaataaggggtttctcgtcgtgtgtgggtgcttcgagggaagcgacagtcaacctggc 522 |||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||| Sbjct: 759904 cccaataaggggtttctcgtcgtgtgtgggtgcttcgagggaagcgacagccaacctggc 759963 Query: 523 tcgcttcgaggtcctcctacggctcttgcagtggctatgcgctattggtggatgcaaaat 582 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 759964 tcgcttcgaggtcctcctacggctcttgcagtggctatgcgctattggtggatgcaaaat 760023 Query: 583 cgtccc 588 |||||| Sbjct: 760024 cgtccc 760029 Score = 123 bits (62), Expect = 1e-25 Identities = 65/66 (98%) Strand = Plus / Plus Query: 737 ccactctaaaatgcttcttcgtgccttctaggtcgtcgatatttgccgcgaaaaccgtga 796 |||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||| Sbjct: 769344 ccactctaaaatgcttcttcgtgccttctaggtcgttgatatttgccgcgaaaaccgtga 769403 Query: 797 tccttc 802 |||||| Sbjct: 769404 tccttc 769409 Score = 121 bits (61), Expect = 4e-25 Identities = 64/65 (98%) Strand = Plus / Plus Query: 737 ccactctaaaatgcttcttcgtgccttctaggtcgtcgatatttgccgcgaaaaccgtga 796 |||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||| Sbjct: 760289 ccactctaaaatgcttcttcgtgccttctaggtcgttgatatttgccgcgaaaaccgtga 760348 Query: 797 tcctt 801 ||||| Sbjct: 760349 tcctt 760353 Score = 105 bits (53), Expect = 2e-20 Identities = 68/73 (93%) Query: 806 gttaaaaataatgaagattacacgtcatctaaactccatttatgcgatgtaaacatgacg 865 ||||||||||||||||||||||| |||| |||||| ||||||||| ||| |||||||||| Sbjct: 1251522 gttaaaaataatgaagattacacatcatgtaaacttcatttatgcaatgcaaacatgacg 1251463 Query: 866 tcatgtaaattta 878 ||||||||||||| Sbjct: 1251462 tcatgtaaattta 1251450 Score = 97.6 bits (49), Expect = 6e-18 Identities = 70/77 (90%) Strand = Plus / Plus Query: 802 cacagttaaaaataatgaagattacacgtcatctaaactccatttatgcgatgtaaacat 861 ||||||||||| |||||||||||| |||| ||||||||| || |||||||||||||||| Sbjct: 1251086 cacagttaaaactaatgaagattaaacgttatctaaactttatatatgcgatgtaaacat 1251145 Query: 862 gacgtcatgtaaattta 878 || |||||||||||||| Sbjct: 1251146 gaagtcatgtaaattta 1251162 Score = 61.9 bits (31), Expect = 
3e-07 Identities = 37/39 (94%) Strand = Plus / Minus Query: 802 cacagttaaaaataatgaagattacacgtcatctaaact 840 |||||| ||||||||||||||||||||||||| |||||| Sbjct: 1601368 cacagtaaaaaataatgaagattacacgtcatgtaaact 1601330 From cjfields at uiuc.edu Fri Mar 2 09:35:34 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 2 Mar 2007 08:35:34 -0600 Subject: [Bioperl-l] Bio::SeqIO::FTHelper In-Reply-To: <200703020808.15664.heikki@sanbi.ac.za> References: <200703011302.30855.heikki@sanbi.ac.za> <8AA2B586-4C2B-4379-BFF0-0CEB15FAE68E@uiuc.edu> <200703020808.15664.heikki@sanbi.ac.za> Message-ID: <7EC38884-D3E1-470D-9FAC-548797433B9D@uiuc.edu> The current parsers are slightly faster, but not enough to make a huge difference unless you're parsing thousands of sequences. However, it does demonstrate that a good deal of the performance issues stem from object creation and not parsing, an issue that is already known. For instance, if you do everything up to (but skip) instantiation of an object, like a SeqFeature/Annotation/Species, the parsing speeds up dramatically dependent on the number of objects created. I also saw significant increases in speed when using FTHelper (instead of SeqFeatures) or Bio::Taxon (instead of Bio::Species), so lighter objects definitely help. I basically just separate the two key steps into two distinct tasks (driver and handler); I haven't thought much about validation though I would probably separate that into a third task. Regardless, the current drivers are flexible enough to deal with the occasional oddity and not die. It's much easier to maintain and extend; for instance if you wanted to develop lightweight objects it's now easier to accomplish (i.e. rewrite/overload a handler vs. rewrite next_seq () ), and you can separately develop a faster driver via next_seq() as long as it threw the same data structure. Multiple parsers can also use the same handler. I currently have GenBank/EMBL/SwissProt all sharing the same handler and passing all tests. 
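
The driver/handler split described above can be illustrated with a minimal sketch. This is not the code from the Handler-based SeqIO parsers proposal: the sub names (drive, make_hash_handler, make_counting_handler), the chunk layout (NAME/DATA) and the toy "KEY value" record format are all assumptions made up for this example. It only tries to show why keeping parsing (driver) apart from object creation (handler) lets you swap in a lighter handler without touching the parser.

```perl
use strict;
use warnings;

# Driver: splits a flat-file record into generic chunks and hands each one
# to the handler it was given. It never builds objects itself.
# Hypothetical toy format: "KEY   value" lines, with "//" ending a record.
sub drive {
    my ($lines, $handler) = @_;
    for my $line (@$lines) {
        if ($line =~ m{^//}) {
            $handler->({ NAME => 'END_RECORD' });
        }
        elsif ($line =~ /^(\S+)\s+(.*)$/) {
            $handler->({ NAME => $1, DATA => $2 });
        }
    }
    return;
}

# Handler 1: builds lightweight plain-hash records (no heavy objects).
sub make_hash_handler {
    my ($records) = @_;
    my %current;
    return sub {
        my ($chunk) = @_;
        if ($chunk->{NAME} eq 'END_RECORD') {
            push @$records, +{ %current };
            %current = ();
        }
        else {
            $current{ $chunk->{NAME} } = $chunk->{DATA};
        }
    };
}

# Handler 2: creates nothing at all, it only counts records.
sub make_counting_handler {
    my ($count_ref) = @_;
    return sub {
        my ($chunk) = @_;
        $$count_ref++ if $chunk->{NAME} eq 'END_RECORD';
    };
}

# Made-up example data, not real database entries.
my @input = (
    'LOCUS       EXAMPLE1',
    'DEFINITION  first example record',
    '//',
    'LOCUS       EXAMPLE2',
    'DEFINITION  second example record',
    '//',
);

# The same driver runs with either handler.
my @records;
drive(\@input, make_hash_handler(\@records));

my $count = 0;
drive(\@input, make_counting_handler(\$count));

print "$_->{LOCUS}: $_->{DEFINITION}\n" for @records;
print "records seen: $count\n";
```

In this sketch a different driver (say, for another flat-file format) could reuse either handler as long as it emits the same chunk structure, which is the property the message attributes to the shared GenBank/EMBL/SwissProt handler.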
chris On Mar 2, 2007, at 12:08 AM, Heikki Lehvaslaiho wrote: > This sounds great. Is the speed increase noticeable? > > -Heikki > > > On Thursday 01 March 2007 17:24:03 Chris Fields wrote: >> I do have a rough outline of what I think could be done: >> >> http://www.bioperl.org/wiki/Handler-based_SeqIO_parsers >> >> where you could switch out handlers to deal with incoming data >> chunks. Any suggestions there are welcome. >> >> I'll probably commit examples of the above in the next week or two >> (GenBank, EMBL, Swiss parsers using the same handlers) which don't >> use FTHelper. So far I have all three passing tests based on >> genbank/ >> embl/swiss.t but they need a few more tweaks before I commit. >> >> chris >> >> On Mar 1, 2007, at 5:02 AM, Heikki Lehvaslaiho wrote: >>> Chris, >>> >>> It was meant to collect code that was common to all three main >>> databases using >>> similar feature tables. >>> >>> Now might be the time to optimise the parsing speed by removing it. >>> Do you >>> have a plan how to do it? >>> >>> -Heikki >>> >>> On Tuesday 27 February 2007 22:57:40 Chris Fields wrote: >>>> Could anyone tell me what FTHelper is used for? From what I gather >>>> it rolls up seqfeature data into a lightweight object but then >>>> creates a SeqFeature::Generic anyway (at least for GenBank/EMBL/ >>>> Swiss), which seems to be a waste of memory and time. Is there >>>> something I'm missing (besides my sanity of course)? 
>>>> >>>> chris >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> ______ _/ _/ >>> _____________________________________________________ >>> _/ _/ >>> _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za >>> _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho >>> _/ _/ _/ SANBI, South African National Bioinformatics >>> Institute >>> _/ _/ _/ University of Western Cape, South Africa >>> _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 >>> ___ _/_/_/_/_/ >>> ________________________________________________________ >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign > > > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From Kevin.M.Brown at asu.edu Fri Mar 2 10:21:16 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Fri, 2 Mar 2007 08:21:16 -0700 Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails In-Reply-To: <6dce9a0b0703011239m7065a570l6d19e6c7065fca45@mail.gmail.com> References: <45E5F43C.9080902@sendu.me.uk><1A4207F8295607498283FE9E93B775B402CC9D60@EX02.asurite.ad.asu.edu><45E7335F.8070102@bms.com> <45E7359E.5030104@bms.com> <6dce9a0b0703011239m7065a570l6d19e6c7065fca45@mail.gmail.com> Message-ID: <1A4207F8295607498283FE9E93B775B402CC9E79@EX02.asurite.ad.asu.edu> > You need to have GD::SVG installed and then instantiate the > panel with: > -image_class=>'GD::SVG' If this is the case, then why have an SVG method in Bio::Graphics::Panel if it doesn't do this for you. Either the method should be removed and the normal $panel->gd method should be called to get an image out or calling that method should setup and create the SVG for the user. Either way I don't see anything in the documentation or wiki that points out this "gotcha". From stefan.kirov at bms.com Fri Mar 2 10:41:34 2007 From: stefan.kirov at bms.com (Stefan Kirov) Date: Fri, 02 Mar 2007 10:41:34 -0500 Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails In-Reply-To: <1A4207F8295607498283FE9E93B775B402CC9E79@EX02.asurite.ad.asu.edu> References: <45E5F43C.9080902@sendu.me.uk> <1A4207F8295607498283FE9E93B775B402CC9D60@EX02.asurite.ad.asu.edu> <45E7335F.8070102@bms.com> <45E7359E.5030104@bms.com> <6dce9a0b0703011239m7065a570l6d19e6c7065fca45@mail.gmail.com> <1A4207F8295607498283FE9E93B775B402CC9E79@EX02.asurite.ad.asu.edu> Message-ID: <45E845AE.6060400@bms.com> Kevin Brown wrote: >> You need to have GD::SVG installed and then instantiate the >> panel with: >> -image_class=>'GD::SVG' >> > > If this is the case, then why have an SVG method in Bio::Graphics::Panel > if it doesn't do this for you. 
Either the method should be removed and
> the normal $panel->gd method should be called to get an image out or
> calling that method should set up and create the SVG for the user.
> Either way I don't see anything in the documentation or wiki that points
> out this "gotcha".
>
>

I don't think it is that easy, since you cannot simply switch between graphics libraries, but perhaps the svg method should check the class that was used and throw an error if it is not GD::SVG.

Stefan

From bix at sendu.me.uk Fri Mar 2 11:05:16 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 02 Mar 2007 16:05:16 +0000
Subject: [Bioperl-l] frac_aligned_query returning results >1.
In-Reply-To: <44255ea80703020336y7e423b94rd07acd380fe4b8fd@mail.gmail.com>
References: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com> <8D7B0767-46A6-4083-B94E-D6490B241B84@uiuc.edu> <45E809A5.9060407@sendu.me.uk> <44255ea80703020336y7e423b94rd07acd380fe4b8fd@mail.gmail.com>
Message-ID: <45E84B3C.5000402@sendu.me.uk>

Thiago Venancio wrote:
> Hi Sendu and Chris,
>
> Thanks for the help.
> As I mentioned, I have updated my SearchUtils file from:
> http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Search/SearchUtils.pm
>
>
> I am also using the latest BioPerl version, installed from CPAN.
>
> Please find a buggy blast report attached.
> In this case, the frac_aligned_query() outputs "1.04", but I have others
> with "1.57" for example.
>
> Just for a quantitative aspect, I got ">1" values in only 61 / 53,377.

Many thanks for that.

I've committed another fix for SearchUtils, so please get revision 1.23 and try again. Hopefully all 61 will no longer be >1, but if any are, please send me sample blast files again.

For anyone interested, the bug was due to a completely unbelievable oversight on my part in the contig merging algorithm: I forgot to deal with contigs that were fully contained by others. Wow!
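
To illustrate the kind of oversight described above, here is a minimal sketch of merging ranges into contigs where a range may be fully contained in another. It is not the SearchUtils code: merge_ranges and frac_covered are hypothetical helpers, and the ranges and the query length of 100 are made-up toy values. The point is only that a contained range must leave the contig unchanged, otherwise the covered length, and hence a "fraction aligned", can exceed 1.

```perl
use strict;
use warnings;

# Merge [start, end] ranges (1-based, inclusive) into non-overlapping contigs.
sub merge_ranges {
    my @sorted = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my @merged;
    for my $range (@sorted) {
        my ($start, $end) = @$range;
        if (@merged && $start <= $merged[-1][1]) {
            # Overlapping or fully contained: only ever extend the contig end.
            # A contained range ($end <= current end) changes nothing.
            $merged[-1][1] = $end if $end > $merged[-1][1];
        }
        else {
            push @merged, [ $start, $end ];
        }
    }
    return @merged;
}

# Fraction of a sequence of the given length covered by the merged ranges.
sub frac_covered {
    my ($length, @ranges) = @_;
    my $covered = 0;
    $covered += $_->[1] - $_->[0] + 1 for merge_ranges(@ranges);
    return $covered / $length;
}

# Toy data: [10,30] is fully contained in [1,60].
my @ranges = ([1, 60], [10, 30], [50, 100]);

# Naive approach: summing range lengths counts overlaps more than once.
my $naive = 0;
$naive += $_->[1] - $_->[0] + 1 for @ranges;

printf "naive:  %.2f\n", $naive / 100;                 # prints "naive:  1.32"
printf "merged: %.2f\n", frac_covered(100, @ranges);   # prints "merged: 1.00"
```

Sorting by start position first means each new range only has to be compared with the most recent contig, and taking the larger of the two ends handles overlap and containment with the same line.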
From johnsonm at gmail.com Fri Mar 2 11:10:55 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Fri, 2 Mar 2007 10:10:55 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
In-Reply-To: <45E75C8E.7010809@sendu.me.uk>
References: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu> <45E61AA9.9030906@sendu.me.uk> <45E75C8E.7010809@sendu.me.uk>
Message-ID:

> I think a 4in1 would still be possible. Presumably at some point you
> know which one you will run, so let the user set everything in the
> single new() even if it doesn't make sense, but then form argument
> strings with

Something like that occurred to me while driving home last night. That ought to separate things cleanly enough, especially if I validate the options against the selected program. I wasn't really thrilled with the idea of code duplication between multiple modules, either.

> All that said, if these Glimmer things are different programs with
> different uses (and not simply different versions of the same thing with
> the same function), by all means make separate modules.
>

It's a 'family' of gene predictors, two eukaryotic, two prokaryotic. They're just similar enough to need similar solutions, and just different enough to be slightly annoying. 8)

From thiago.venancio at gmail.com Fri Mar 2 11:14:20 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Fri, 2 Mar 2007 13:14:20 -0300
Subject: [Bioperl-l] frac_aligned_query returning results >1.
In-Reply-To: <45E84B3C.5000402@sendu.me.uk>
References: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com> <8D7B0767-46A6-4083-B94E-D6490B241B84@uiuc.edu> <45E809A5.9060407@sendu.me.uk> <44255ea80703020336y7e423b94rd07acd380fe4b8fd@mail.gmail.com> <45E84B3C.5000402@sendu.me.uk>
Message-ID: <44255ea80703020814v31495221i72f65db532c0dd9b@mail.gmail.com>

Hi Sendu,

Great to know you fixed the problem.
I have updated SearchUtils and it seems to be correct now.

Best!
Thiago On 3/2/07, Sendu Bala wrote: > > Thiago Venancio wrote: > > Hi Sendu and Chris, > > > > Thanks for the help. > > As I mentioned, I have updated my SearchUtils file from: > > > http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Search/SearchUtils.pm > > < > http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Search/SearchUtils.pm > > > > > > I am also using the lates BioPerl version, installed from CPAN. > > > > Please find a buggy blast report attached. > > In this case, the frac_aligned_query() outputs "1.04", but I have others > > with " 1.57" for example. > > > > Just for a quantitative aspect, I got ">1" values in only 61 / 53,377. > > Many thanks for that. > > I've committed another fix for SearchUtils so please get revision 1.23 > and try again. Hopefully all 61 will no longer be >1, but if any are > please send me sample blast files again. > > For anyone interested, the bug was due to a completely unbelievable > oversight on my part in the contig merging algorithm: I forgot to deal > with contigs that were fully contained by others. Wow! > -- "The way to get started is to quit talking and begin doing." Walt Disney ======================== Thiago Motta Venancio, MSc PhD student in Bioinformatics University of Sao Paulo ======================== From johnsonm at gmail.com Fri Mar 2 11:15:34 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Fri, 2 Mar 2007 10:15:34 -0600 Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer In-Reply-To: <9CE93EC0-A9DB-4A5B-8CE7-F15795375587@uiuc.edu> References: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu> <45E61AA9.9030906@sendu.me.uk> <9CE93EC0-A9DB-4A5B-8CE7-F15795375587@uiuc.edu> Message-ID: On 3/1/07, Chris Fields wrote: > > I have been working on an Infernal wrapper (not finished yet but > getting there) which does this: Speaking of Infernal...that's on my shopping list, too. 
Though we don't invoke cmsearch directly, we use the Sanger rfam_scan wrapper (http://www.sanger.ac.uk/Software/Rfam/help/scripts/search/rfam_scan.pl), which does a pre-cmsearch blast to determine which models to run.

I wonder whether to wrap the wrapper, or just incorporate the guts of rfam_scan into a Bioperl wrapper (if the licensing is compatible)?

From cjfields at uiuc.edu Fri Mar 2 11:48:57 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Mar 2007 10:48:57 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
In-Reply-To:
References: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu> <45E61AA9.9030906@sendu.me.uk> <9CE93EC0-A9DB-4A5B-8CE7-F15795375587@uiuc.edu>
Message-ID: <21B5F8B1-5795-4D30-A693-E4166DE84C35@uiuc.edu>

On Mar 2, 2007, at 10:15 AM, Mark Johnson wrote:

> On 3/1/07, Chris Fields wrote:
> I have been working on an Infernal wrapper (not finished yet but
> getting there) which does this:
>
> Speaking of Infernal...that's on my shopping list, too. Though we
> don't invoke cmsearch directly, we use the Sanger rfam_scan wrapper
> (http://www.sanger.ac.uk/Software/Rfam/help/scripts/search/rfam_scan.pl),
> which does a pre-cmsearch blast to determine which
> models to run.
>
> I wonder whether to wrap the wrapper, or just incorporate the guts
> of rfam_scan into a Bioperl wrapper (if the licensing is compatible)?

You could modify it to use both Bio::Tools::Run::StandAloneBlast and Bio::Tools::Run::Infernal. I'm still toying with the latter (only cmsearch works) but will probably get back to it this weekend as I have a ton of Infernal searches to do.

BTW, the latest Infernal (v0.72, which the wrapper supports) allows you to pre-run an HMM before running a full-blown CM search, and allows you to specify a bit score cutoff to decrease the noise level. Much faster, particularly when running full genome searches...
chris

From gowthaman.ramasamy at sbri.org Fri Mar 2 16:03:44 2007
From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy)
Date: Fri, 2 Mar 2007 13:03:44 -0800
Subject: [Bioperl-l] Parsing CDS info from GFF file
Message-ID:

Hi List,

I am trying to find a way to get the coordinates of the whole CDS (start codon to stop codon) from a GFF file. However, the GFF file only has the coordinates of the individual coding exons (CDS features). Is there any tool, module or script available for this? It should handle multi-exon genes and both the + and - strands. A set of example GFF entries is below.

Many thanks in advance,
gowtham
SBRI, Seattle.

1400 TIGR gene 127456 128386 . + . ID=1400.t00213;Name=hypothetical protein
1400 TIGR mRNA 127456 128386 . + . ID=1400.m02493;Parent=1400.t00213
1400 TIGR five_prime_utr 127456 127993 . + . ID=utr5p_of_1400.m02493;Parent=1400.m02493
1400 TIGR exon 127456 128386 . + . ID=1400.e05831;Parent=1400.m02493
1400 TIGR CDS 127994 128314 . + 0 ID=cds_of_1400.m02493;Parent=1400.m02493
1400 TIGR three_prime_utr 128315 128386 . + . ID=utr3p_of_1400.m02493;Parent=1400.m02493
1400 TIGR gene 232655 233965 . - . ID=1400.t00271;Name=pleckstrin homology domain protein, putative
1400 TIGR mRNA 232655 233965 . - . ID=1400.m02876;Parent=1400.t00271
1400 TIGR five_prime_utr 233477 233965 . - . ID=utr5p_of_1400.m02876;Parent=1400.m02876
1400 TIGR exon 233339 233965 . - . ID=1400.e05827;Parent=1400.m02876
1400 TIGR CDS 233339 233476 . - 0 ID=cds_of_1400.m02876;Parent=1400.m02876
1400 TIGR exon 233011 233182 . - . ID=1400.e05826;Parent=1400.m02876
1400 TIGR CDS 233011 233182 . - 0 ID=cds_of_1400.m02876;Parent=1400.m02876
1400 TIGR exon 232655 232781 . - . ID=1400.e05825;Parent=1400.m02876
1400 TIGR CDS 232729 232781 . - 1 ID=cds_of_1400.m02876;Parent=1400.m02876
1400 TIGR three_prime_utr 232655 232728 . - .
ID=utr3p_of_1400.m02876;Parent=1400.m02876 From sac at bioperl.org Fri Mar 2 18:00:28 2007 From: sac at bioperl.org (Steve Chervitz) Date: Fri, 2 Mar 2007 15:00:28 -0800 Subject: [Bioperl-l] frac_aligned_query returning results >1. In-Reply-To: <44255ea80703020814v31495221i72f65db532c0dd9b@mail.gmail.com> References: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com> <8D7B0767-46A6-4083-B94E-D6490B241B84@uiuc.edu> <45E809A5.9060407@sendu.me.uk> <44255ea80703020336y7e423b94rd07acd380fe4b8fd@mail.gmail.com> <45E84B3C.5000402@sendu.me.uk> <44255ea80703020814v31495221i72f65db532c0dd9b@mail.gmail.com> Message-ID: <8f200b4c0703021500k8fbaa8cj7af8971389e7379@mail.gmail.com> Glad you fixed the problem, Sendu. I thought this might have been due to a problem in HSPI::matches() since it was reporting (1507,1507) as an invalid range within (1444,1507), when it should be valid (the last position). So it looked like an edge condition bug, but I didn't confirm. So there still could be a lingering problem in the matches() function, or in the way the matches string is parsed from the report. Speaking of which, HSPI::matches() is quite BLAST-specific. It's even format specific, since it won't work if you are parsing in tabular blast reports as they lack any string of match symbols. I thought about moving the matches implementation in HSPI into BlastHSP.pm, but that module appears to not be used anymore. Not sure the way to go here. Steve On 3/2/07, Thiago Venancio wrote: > > Hi Sendu, > > Great to know you fixed the problem. > I have updated the SearchUtils and seems to be correct now. > > Best! > > Thiago > > > On 3/2/07, Sendu Bala wrote: > > > > Thiago Venancio wrote: > > > Hi Sendu and Chris, > > > > > > Thanks for the help. 
> > > As I mentioned, I have updated my SearchUtils file from: > > > > > > http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Search/SearchUtils.pm > > > < > > > http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Search/SearchUtils.pm > > > > > > > > > I am also using the lates BioPerl version, installed from CPAN. > > > > > > Please find a buggy blast report attached. > > > In this case, the frac_aligned_query() outputs "1.04", but I have > others > > > with " 1.57" for example. > > > > > > Just for a quantitative aspect, I got ">1" values in only 61 / 53,377. > > > > Many thanks for that. > > > > I've committed another fix for SearchUtils so please get revision 1.23 > > and try again. Hopefully all 61 will no longer be >1, but if any are > > please send me sample blast files again. > > > > For anyone interested, the bug was due to a completely unbelievable > > oversight on my part in the contig merging algorithm: I forgot to deal > > with contigs that were fully contained by others. Wow! > > > > > > -- > "The way to get started is to quit talking and begin doing." 
> Walt Disney
>
> ========================
> Thiago Motta Venancio, MSc
> PhD student in Bioinformatics
> University of Sao Paulo
> ========================
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From alexamies at gmail.com Fri Mar 2 21:42:15 2007
From: alexamies at gmail.com (Alex Amies)
Date: Fri, 2 Mar 2007 18:42:15 -0800
Subject: [Bioperl-l] New Article on Approaches to Web Development for Bioinformatics
Message-ID: <1ad8057e0703021842y683853f5k1c97dbf362f20dda@mail.gmail.com>

I have written an article on Approaches to Web Development for Bioinformatics at

http://medicalcomputing.net/tools_dna1.php

There is a fairly large section on BioPerl at

http://medicalcomputing.net/tools_dna13.php

I hope that someone gets something useful out of it. I am also looking for feedback on it and, in particular, please let me know about any mistakes in it.

The intent of the article is to give an overview of various approaches to developing web-based tools for bioinformatics. It describes the alternatives at each layer of the system, including the data layer and sources of data, the application programming layer, the web layer, and bioinformatics tools and software libraries.

Alex

From shameer at ncbs.res.in Sat Mar 3 00:02:55 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Sat, 3 Mar 2007 10:32:55 +0530 (IST)
Subject: [Bioperl-l] New Article on Approaches to Web Development for Bioinformatics
In-Reply-To: <1ad8057e0703021842y683853f5k1c97dbf362f20dda@mail.gmail.com>
References: <1ad8057e0703021842y683853f5k1c97dbf362f20dda@mail.gmail.com>
Message-ID: <52557.192.168.1.1.1172898175.squirrel@mail.ncbs.res.in>

Hi Alex,

It's interesting. Can you send me a PDF copy of your article?
Cheers, > I have written an article on Approaches to Web Development for > Bioinformatics at > > http://medicalcomputing.net/tools_dna1.php > > There is a fairly large section on BioPerl at > > http://medicalcomputing.net/tools_dna13.php > > I hope that someone gets something useful out of it. I also looking for > feedback on it and, in particular, please let me know about any mistakes > in > it. > > The intent of the article is to give an overview of various approaches to > developing web based tools for bioinformatics. It describes the > alternatives > at each layer of the system, including the data layer and sources of data, > the application programming layer, the web layer, and bioinformatics tools > and software libraries. > > Alex > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Shameer Khadar Jr. Research Fellow Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group National Centre for Biological Sciences (TIFR) UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India T - 91-080-23636420-32 EXT 4241 F - 91-080-23636662/23636675 W - http://www.ncbs.res.in -------------------------------------------------- "Refrain from illusions, insist on work and not words, patiently seek divine and scientific truth." 
From bix at sendu.me.uk Sat Mar 3 02:46:07 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Sat, 03 Mar 2007 07:46:07 +0000
Subject: [Bioperl-l] New Article on Approaches to Web Development for Bioinformatics
In-Reply-To: <1ad8057e0703021842y683853f5k1c97dbf362f20dda@mail.gmail.com>
References: <1ad8057e0703021842y683853f5k1c97dbf362f20dda@mail.gmail.com>
Message-ID: <45E927BF.4060506@sendu.me.uk>

Alex Amies wrote:
> I have written an article on Approaches to Web Development for
> Bioinformatics at
>
> http://medicalcomputing.net/tools_dna1.php
>
> There is a fairly large section on BioPerl at
>
> http://medicalcomputing.net/tools_dna13.php
>
> I hope that someone gets something useful out of it. I also looking for
> feedback on it and, in particular, please let me know about any mistakes in
> it.

Thanks for that. Can I suggest you remove the instruction to install
BioPerl 1.4 and replace it with one to install the latest version? I.e.,
summarise the information at:

http://www.bioperl.org/wiki/Installing_BioPerl

Or just point people to that page.

From alexamies at gmail.com Sat Mar 3 13:15:41 2007
From: alexamies at gmail.com (Alex Amies)
Date: Sat, 3 Mar 2007 10:15:41 -0800
Subject: [Bioperl-l] New Article on Approaches to Web Development for Bioinformatics
In-Reply-To: <45E927BF.4060506@sendu.me.uk>
References: <1ad8057e0703021842y683853f5k1c97dbf362f20dda@mail.gmail.com>
	<45E927BF.4060506@sendu.me.uk>
Message-ID: <1ad8057e0703031015g7b2d26c2wbda0715e54001612@mail.gmail.com>

Sendu,

Thanks for your comment. Looking into it a bit more I am confused. I see
from the BioPerl download page that Bioperl 1.4.0, which is the version I
used, is listed as the latest Stable Release, even though it was released in
Dec-2003. Bioperl 1.5.2 is listed as a Developer Release. Is that right?
Also, the links to the 1.4.0 zip files are dead.
Alex On 3/2/07, Sendu Bala wrote: > Alex Amies wrote: > > I have written an article on Approaches to Web Development for > > Bioinformatics at > > > > http://medicalcomputing.net/tools_dna1.php > > > > There is a fairly large section on BioPerl at > > > > http://medicalcomputing.net/tools_dna13.php > > > > I hope that someone gets something useful out of it. I also looking for > > feedback on it and, in particular, please let me know about any mistakes in > > it. > > Thanks for that. Can I suggest you remove the instruction to install > BioPerl 1.4 and replace it with one to install the latest version? Ie. > summarise information at: > > http://www.bioperl.org/wiki/Installing_BioPerl > > Or just point people to that page. > From alexamies at gmail.com Sat Mar 3 13:46:50 2007 From: alexamies at gmail.com (Alex Amies) Date: Sat, 3 Mar 2007 10:46:50 -0800 Subject: [Bioperl-l] New Article on Approaches to Web Development for Bioinformatics In-Reply-To: <52557.192.168.1.1.1172898175.squirrel@mail.ncbs.res.in> References: <1ad8057e0703021842y683853f5k1c97dbf362f20dda@mail.gmail.com> <52557.192.168.1.1.1172898175.squirrel@mail.ncbs.res.in> Message-ID: <1ad8057e0703031046t3fbd4c07h8d8ec7ea3b864e4@mail.gmail.com> Shameer, I have put a pdf version of the article here: http://medicalcomputing.net/WebDevelopmentBioinformatics.pdf Alex On 3/2/07, Shameer Khadar wrote: > Hi alex, > > Its interesting, Can you send me a pdf copy of your article ? > Cheers, > > > I have written an article on Approaches to Web Development for > > Bioinformatics at > > > > http://medicalcomputing.net/tools_dna1.php > > > > There is a fairly large section on BioPerl at > > > > http://medicalcomputing.net/tools_dna13.php > > > > I hope that someone gets something useful out of it. I also looking for > > feedback on it and, in particular, please let me know about any mistakes > > in > > it. 
> >
> > The intent of the article is to give an overview of various approaches to
> > developing web based tools for bioinformatics. It describes the
> > alternatives
> > at each layer of the system, including the data layer and sources of data,
> > the application programming layer, the web layer, and bioinformatics tools
> > and software libraries.
> >
> > Alex
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
>
> --
> Shameer Khadar
> Jr. Research Fellow
> Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group
> National Centre for Biological Sciences (TIFR)
> UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India
> T - 91-080-23636420-32 EXT 4241 F - 91-080-23636662/23636675
> W - http://www.ncbs.res.in
> --------------------------------------------------
> "Refrain from illusions, insist on work and not words,
> patiently seek divine and scientific truth."
>
>

From thiago.venancio at gmail.com Sat Mar 3 07:41:39 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Sat, 3 Mar 2007 09:41:39 -0300
Subject: [Bioperl-l] frac_aligned_query returning results >1.
In-Reply-To: <8f200b4c0703021500k8fbaa8cj7af8971389e7379@mail.gmail.com>
References: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com>
	<8D7B0767-46A6-4083-B94E-D6490B241B84@uiuc.edu>
	<45E809A5.9060407@sendu.me.uk>
	<44255ea80703020336y7e423b94rd07acd380fe4b8fd@mail.gmail.com>
	<45E84B3C.5000402@sendu.me.uk>
	<44255ea80703020814v31495221i72f65db532c0dd9b@mail.gmail.com>
	<8f200b4c0703021500k8fbaa8cj7af8971389e7379@mail.gmail.com>
Message-ID: <44255ea80703030441u1034aec4h19b9d93a3f74cc33@mail.gmail.com>

Hi all.

Sorry about this, but the bug persists. Although the number of problematic
cases is very low (3 out of 35139), they are still present.

Please find attached an example buggy blast report.
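
Note for readers of this archive (not part of the original message): one way a fraction above 1 can arise is when overlapping or nested query ranges are counted more than once. The sketch below is not the SearchUtils code. It is a minimal illustration in core Perl under that assumption, and covered_length is a hypothetical helper. The ranges are the query coordinates of the seven supercont1.60 HSPs in the BLAST report attached to this message, and 821 is the query length stated there.

```perl
use strict;
use warnings;

# Hypothetical helper: merge closed integer ranges [start, end] and return
# the number of positions covered by at least one range.
sub covered_length {
    my @ranges = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my $covered = 0;
    my ($cur_start, $cur_end);
    for my $r (@ranges) {
        my ($start, $end) = @$r;
        if (!defined $cur_start) {
            ($cur_start, $cur_end) = ($start, $end);
        }
        elsif ($start <= $cur_end + 1) {
            # Overlapping, adjacent, or fully contained range:
            # extend the current block only if this range reaches further.
            $cur_end = $end if $end > $cur_end;
        }
        else {
            # Gap before this range: close the current block, start a new one.
            $covered += $cur_end - $cur_start + 1;
            ($cur_start, $cur_end) = ($start, $end);
        }
    }
    $covered += $cur_end - $cur_start + 1 if defined $cur_start;
    return $covered;
}

# Query coordinates of the HSPs for supercont1.60, copied from the report.
my @hsps = ([336, 728], [336, 728], [1, 338], [94, 338],
            [1, 98], [727, 820], [778, 821]);
my $query_length = 821;

# Naive total: every HSP length is added, so shared positions count twice.
my $naive = 0;
$naive += $_->[1] - $_->[0] + 1 for @hsps;

printf "naive fraction:  %.2f\n", $naive / $query_length;
printf "merged fraction: %.2f\n", covered_length(@hsps) / $query_length;
```

Run as written, the naive sum gives 1.95, while the merged ranges cover the whole query and give 1.00. Sorting by start and keeping a running end is what lets a fully contained range, such as 94-338 inside 1-338, add nothing to the total.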

The line I use to call the function is:
print $result->query_name."\t".$hit->frac_aligned_query."\n";

The warning below still appears many times while processing reports, so I
think it is not due to the same bug.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Undefined sub-sequence (821,821). Valid range = 778 - 821
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:328
STACK: Bio::Search::HSP::HSPI::matches /usr/share/perl5/Bio/Search/HSP/HSPI.pm:711
STACK: Bio::Search::SearchUtils::_adjust_contigs /usr/share/perl5/Bio/Search/SearchUtils.pm:421
STACK: Bio::Search::SearchUtils::tile_hsps /usr/share/perl5/Bio/Search/SearchUtils.pm:200
STACK: Bio::Search::Hit::GenericHit::frac_aligned_query /usr/share/perl5/Bio/Search/Hit/GenericHit.pm:1145
STACK: ./geraStatGenome.pl:34
-----------------------------------------------------------

I have checked the code, but I have no idea what is happening in this
case. The attached file produces the ">1" result and also triggers the
exception, so it could be useful.

Thiago


On 3/2/07, Steve Chervitz wrote:
>
> Glad you fixed the problem, Sendu.
>
> I thought this might have been due to a problem in HSPI::matches() since
> it was reporting (1507,1507) as an invalid range within (1444,1507), when it
> should be valid (the last position). So it looked like an edge condition
> bug, but I didn't confirm. So there still could be a lingering problem in
> the matches() function, or in the way the matches string is parsed from the
> report.
>
> Speaking of which, HSPI::matches() is quite BLAST-specific. It's even
> format specific, since it won't work if you are parsing in tabular blast
> reports as they lack any string of match symbols. I thought about moving the
> matches implementation in HSPI into BlastHSP.pm, but that module appears
> to not be used anymore. Not sure the way to go here.
> > Steve > > On 3/2/07, Thiago Venancio < thiago.venancio at gmail.com> wrote: > > > Hi Sendu, > > > > Great to know you fixed the problem. > > I have updated the SearchUtils and seems to be correct now. > > > > Best! > > > > Thiago > > > > > > On 3/2/07, Sendu Bala wrote: > > > > > > Thiago Venancio wrote: > > > > Hi Sendu and Chris, > > > > > > > > Thanks for the help. > > > > As I mentioned, I have updated my SearchUtils file from: > > > > > > > > > http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Search/SearchUtils.pm > > > > < > > > > > http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Search/SearchUtils.pm > > > > > > > > > > > > I am also using the lates BioPerl version, installed from CPAN. > > > > > > > > Please find a buggy blast report attached. > > > > In this case, the frac_aligned_query() outputs "1.04", but I have > > others > > > > with " 1.57" for example. > > > > > > > > Just for a quantitative aspect, I got ">1" values in only 61 / > > 53,377. > > > > > > Many thanks for that. > > > > > > I've committed another fix for SearchUtils so please get revision 1.23 > > > and try again. Hopefully all 61 will no longer be >1, but if any are > > > please send me sample blast files again. > > > > > > For anyone interested, the bug was due to a completely unbelievable > > > oversight on my part in the contig merging algorithm: I forgot to deal > > > with contigs that were fully contained by others. Wow! > > > > > > > > > > > -- > > "The way to get started is to quit talking and begin doing." > > Walt Disney > > > > ======================== > > Thiago Motta Venancio, MSc > > PhD student in Bioinformatics > > University of Sao Paulo > > ======================== > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- "The way to get started is to quit talking and begin doing." 
Walt Disney ======================== Thiago Motta Venancio, MSc PhD student in Bioinformatics University of Sao Paulo ======================== -------------- next part -------------- BLASTN 2.2.6 [Apr-09-2003] Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Query= AEDES_05359.C (821 letters) Database: aaegypti.SCAFFOLDS-MASKED.AEDES1.fa 4758 sequences; 1,383,971,543 total letters Searching..........done Score E Sequences producing significant alignments: (bits) Value supercontig:1:supercont1.60:1:2993848:1 supercontig supercont1.60 779 0.0 >supercontig:1:supercont1.60:1:2993848:1 supercontig supercont1.60 Length = 2993848 Score = 779 bits (393), Expect = 0.0 Identities = 393/393 (100%) Strand = Plus / Minus Query: 336 cctttcatttttacggtgaccttcaccatcggcttctgatgacggcaaaaacgtgtgtgc 395 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2976894 cctttcatttttacggtgaccttcaccatcggcttctgatgacggcaaaaacgtgtgtgc 2976835 Query: 396 ctaagttacatgtgccaaaagtttctatttctaccgagtcttgcgtcgtgtgtcgtgagt 455 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2976834 ctaagttacatgtgccaaaagtttctatttctaccgagtcttgcgtcgtgtgtcgtgagt 2976775 Query: 456 gaagattgggaagagaacgaaagcctactaaaagcttttttggcatggtgacaagtctcc 515 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2976774 gaagattgggaagagaacgaaagcctactaaaagcttttttggcatggtgacaagtctcc 2976715 Query: 516 acgtcttgcgaaatggcgtttccttttatagccacgggtgttcccacacgttttggggta 575 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2976714 acgtcttgcgaaatggcgtttccttttatagccacgggtgttcccacacgttttggggta 2976655 Query: 576 ctgtcgggagtagttgctatacgttcaacaggtttaattttgccttgtccgacatgaact 635 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2976654 
ctgtcgggagtagttgctatacgttcaacaggtttaattttgccttgtccgacatgaact 2976595 Query: 636 ttttcgggttgtccaggtgtaggagttgcagctacgagttggcgcaacaggaatgagatc 695 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2976594 ttttcgggttgtccaggtgtaggagttgcagctacgagttggcgcaacaggaatgagatc 2976535 Query: 696 attttgttttgttctggcgggattcttactgct 728 ||||||||||||||||||||||||||||||||| Sbjct: 2976534 attttgttttgttctggcgggattcttactgct 2976502 Score = 726 bits (366), Expect = 0.0 Identities = 388/394 (98%), Gaps = 1/394 (0%) Strand = Plus / Minus Query: 336 cctttcatttttacggtgaccttcaccatcggcttctgatgacggcaaaaacgtgtgtgc 395 |||||||||||||||||||||||||||||||||||||||||||| |||||||||||||| Sbjct: 2955826 cctttcatttttacggtgaccttcaccatcggcttctgatgacgacaaaaacgtgtgtgt 2955767 Query: 396 ctaagttacatgtgccaaaagtttctatttctaccgagtcttgcgtcgtgtgtcgtgagt 455 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2955766 ctaagttacatgtgccaaaagtttctatttctaccgagtcttgcgtcgtgtgtcgtgagt 2955707 Query: 456 gaagattgggaagagaacgaaagcctactaaaagcttttttggcatggtgacaagtctcc 515 | | |||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2955706 ggaaattgggaagagaacgaaagcctactaaaagcttttttggcatggtgacaagtctcc 2955647 Query: 516 acgtcttgcgaaatggcgtttccttttatagccacgggtgttcccacacg-ttttggggt 574 |||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||| Sbjct: 2955646 acgtcttgcgaaatggcgtttccttttatagccacgggtgttcccacacgtttttggggt 2955587 Query: 575 actgtcgggagtagttgctatacgttcaacaggtttaattttgccttgtccgacatgaac 634 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2955586 actgtcgggagtagttgctatacgttcaacaggtttaattttgccttgtccgacatgaac 2955527 Query: 635 tttttcgggttgtccaggtgtaggagttgcagctacgagttggcgcaacaggaatgagat 694 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2955526 tttttcgggttgtccaggtgtaggagttgcagctacgagttggcgcaacaggaatgagat 2955467 Query: 695 cattttgttttgttctggcgggattcttactgct 728 |||||||||||||||||| ||||||||||||||| Sbjct: 2955466 cattttgttttgttctggggggattcttactgct 
2955433 Score = 630 bits (318), Expect = e-178 Identities = 333/338 (98%) Strand = Plus / Minus Query: 1 gaaactttgtaattaagtgtaaaatatctgcctatctgtgaatttcgccagactatcaat 60 |||||||||||||||||||||||||| ||||||| ||||||||||||||||||||||||| Sbjct: 2966288 gaaactttgtaattaagtgtaaaatacctgcctacctgtgaatttcgccagactatcaat 2966229 Query: 61 ccatggttaacttttgtcctatcgtcaagatatagtttacaaagatagattattgattat 120 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2966228 ccatggttaacttttgtcctatcgtcaagatatagtttacaaagatagattattgattat 2966169 Query: 121 tgatcttaccaagaaacttgttgattacttcgatcgagacctggaatgattgcacacaca 180 |||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2966168 tgatcttacccagaaacttgttgattacttcgatcgagacctggaatgattgcacacaca 2966109 Query: 181 gcaatgctctgacacctacttcttcgtacaatatttctgcctctttgttatcatcgtctt 240 ||||||||||||||||||||||||||||||||||||||||||||| ||||||||| |||| Sbjct: 2966108 gcaatgctctgacacctacttcttcgtacaatatttctgcctcttcgttatcatcatctt 2966049 Query: 241 cgtgggcaattgggtccgaaccctccgaatcaaatttgtcgggctctacttctcttttgg 300 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2966048 cgtgggcaattgggtccgaaccctccgaatcaaatttgtcgggctctacttctcttttgg 2965989 Query: 301 gattcatagtggcttgacctcaagcgctaattaatcct 338 |||||||||||||||||||||||||||||||||||||| Sbjct: 2965988 gattcatagtggcttgacctcaagcgctaattaatcct 2965951 Score = 486 bits (245), Expect = e-135 Identities = 245/245 (100%) Strand = Plus / Minus Query: 94 agtttacaaagatagattattgattattgatcttaccaagaaacttgttgattacttcga 153 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2980127 agtttacaaagatagattattgattattgatcttaccaagaaacttgttgattacttcga 2980068 Query: 154 tcgagacctggaatgattgcacacacagcaatgctctgacacctacttcttcgtacaata 213 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2980067 tcgagacctggaatgattgcacacacagcaatgctctgacacctacttcttcgtacaata 2980008 Query: 214 tttctgcctctttgttatcatcgtcttcgtgggcaattgggtccgaaccctccgaatcaa 273 
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2980007 tttctgcctctttgttatcatcgtcttcgtgggcaattgggtccgaaccctccgaatcaa 2979948 Query: 274 atttgtcgggctctacttctcttttgggattcatagtggcttgacctcaagcgctaatta 333 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2979947 atttgtcgggctctacttctcttttgggattcatagtggcttgacctcaagcgctaatta 2979888 Query: 334 atcct 338 ||||| Sbjct: 2979887 atcct 2979883 Score = 194 bits (98), Expect = 3e-47 Identities = 98/98 (100%) Strand = Plus / Minus Query: 1 gaaactttgtaattaagtgtaaaatatctgcctatctgtgaatttcgccagactatcaat 60 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2982763 gaaactttgtaattaagtgtaaaatatctgcctatctgtgaatttcgccagactatcaat 2982704 Query: 61 ccatggttaacttttgtcctatcgtcaagatatagttt 98 |||||||||||||||||||||||||||||||||||||| Sbjct: 2982703 ccatggttaacttttgtcctatcgtcaagatatagttt 2982666 Score = 145 bits (73), Expect = 2e-32 Identities = 87/94 (92%) Strand = Plus / Minus Query: 727 cttaaaaaattctacnnnnnnngtttacaatatcaaaactacagtcgacacacatatttt 786 ||||||||||||||| |||||||||||||||||||||||||||||||||||||| Sbjct: 2976416 cttaaaaaattctactttttttgtttacaatatcaaaactacagtcgacacacatatttt 2976357 Query: 787 gttaatttgtaggtgttgcttcgattcatcttca 820 |||||||||||||||||||||||||||||||||| Sbjct: 2976356 gttaatttgtaggtgttgcttcgattcatcttca 2976323 Score = 71.9 bits (36), Expect = 3e-10 Identities = 42/44 (95%) Strand = Plus / Minus Query: 778 acatattttgttaatttgtaggtgttgcttcgattcatcttcac 821 ||||||||||||||||||||||||| ||||||||||||| |||| Sbjct: 2955299 acatattttgttaatttgtaggtgtggcttcgattcatcatcac 2955256 >supercontig:1:supercont1.971:1:313087:1 supercontig supercont1.971 Length = 313087 Score = 139 bits (70), Expect = 1e-30 Identities = 103/114 (90%) Strand = Plus / Plus Query: 192 acacctacttcttcgtacaatatttctgcctctttgttatcatcgtcttcgtgggcaatt 251 |||| ||||||||||||| ||||||||||||||| ||||||||||||||||||||| ||| Sbjct: 202647 acacttacttcttcgtacgatatttctgcctcttcgttatcatcgtcttcgtgggccatt 202706 Query: 252 
gggtccgaaccctccgaatcaaatttgtcgggctctacttctcttttgggattc 305 |||||||| ||||||||| ||| ||| |||||||||| |||| ||| ||||||| Sbjct: 202707 gggtccgatccctccgaagcaattttttcgggctctatttctttttcgggattc 202760 Score = 71.9 bits (36), Expect = 3e-10 Identities = 48/52 (92%) Strand = Plus / Plus Query: 123 atcttaccaagaaacttgttgattacttcgatcgagacctggaatgattgca 174 |||||||||||||||||||| ||||| || ||||||||| |||||||||||| Sbjct: 202569 atcttaccaagaaacttgtttattacgtctatcgagacccggaatgattgca 202620 Score = 71.9 bits (36), Expect = 3e-10 Identities = 72/84 (85%) Strand = Plus / Plus Query: 22 aaatatctgcctatctgtgaatttcgccagactatcaatccatggttaacttttgtccta 81 ||||| ||||||| |||||| |||||||| || ||||||||||| |||||||||||||| Sbjct: 202376 aaataactgcctacctgtgattttcgccatacattcaatccatggataacttttgtccta 202435 Query: 82 tcgtcaagatatagtttacaaaga 105 || ||||||| ||| ||||||| Sbjct: 202436 tctccaagatacggttcacaaaga 202459 Database: aaegypti.SCAFFOLDS-MASKED.AEDES1.fa Posted date: Nov 6, 2006 5:26 PM Number of letters in database: 1,383,971,543 Number of sequences in database: 4758 Lambda K H 1.37 0.711 1.31 Gapped Lambda K H 1.37 0.711 1.31 Matrix: blastn matrix:1 -3 Gap Penalties: Existence: 5, Extension: 2 Number of Hits to DB: 758,150 Number of Sequences: 4758 Number of extensions: 758150 Number of successful extensions: 7086 Number of sequences better than 1.0e-05: 4 Number of HSP's better than 0.0 without gapping: 4 Number of HSP's successfully gapped in prelim test: 0 Number of HSP's that attempted gapping in prelim test: 7012 Number of HSP's gapped (non-prelim): 73 length of query: 821 length of database: 1,383,971,543 effective HSP length: 20 effective length of query: 801 effective length of database: 1,383,876,383 effective search space: 1108484982783 effective search space used: 1108484982783 T: 0 A: 0 X1: 6 (11.9 bits) X2: 15 (29.7 bits) S1: 12 (24.3 bits) S2: 29 (58.0 bits) From lstein at cshl.edu Sat Mar 3 14:15:43 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Sat, 3 Mar 2007 12:15:43 -0700 Subject: 
[Bioperl-l] [Bioperl-announce-l] BioPerl leadership additions In-Reply-To: <8f200b4c0702270959n7521f693l915bfabe7ccb7ef7@mail.gmail.com> References: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org><8f200b4c0702270959n7521f693l915bfabe7ccb7ef7@mail.gmail.com> Message-ID: <000001c75dc8$5710a720$6400a8c0@CodonSolutions.local> Hear hear! Lincoln On 2/27/07, Steve Chervitz wrote: > > Welcome to the club, Chris & Sendu. Always good to have an infusion of new > blood and capable, motivated hands. > > Steve > > On 2/26/07, Jason Stajich wrote: > > > > Dear BioPerl Users and Developers, > > > > I want to announce a addition in the leadership of BioPerl. > > Christopher Fields and and Sendu Bala are now members of the BioPerl > > Core developer group to recognize their ongoing leadership in the > > project. Chris and Sendu were instrumental in the 1.5.2 Developer > > release and have made a significant commitment and contribution to > > the quality of the code and the documentation of the project. We > > have invited them to be part of the core to recognize their work and > > to feel comfortable to ask them to do more. ;-) > > > > The Core group was established to insure that someone was responsible > > for making code releases, vetting new developers for CVS write > > accounts, and generally dealing with things that might otherwise slip > > through the cracks. We are very excited to have more people > > contributing to and maintaining the toolkit. We look forward to > > their help along with all the other developers, as we work towards a > > 1.6 release release this year. > > > > As always, while their is a need for some individuals to lead the > > project, we encourage contributions from all levels of expertise to > > improve the code, documentation, and tutorials of the project. > > > > We plan to discuss the progress of the toolkit at this year's > > Bioinformatics Open Source Conference held in Vienna, Austria in > > conjunction with the SIG meetings at ISMB. 
We are trying to use > > BOSC 2007 as a chance for the developers of Open Bioinformatics > > Foundation sponsored and related projects to coordinate future > > development and release cycles. > > > > Jason Stajich on behalf of the Core developers > > > > _______________________________________________ > > Bioperl-announce-l mailing list > > Bioperl-announce-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu _______________________________________________ Bioperl-announce-l mailing list Bioperl-announce-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l From cjfields at uiuc.edu Sat Mar 3 17:07:40 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 3 Mar 2007 16:07:40 -0600 Subject: [Bioperl-l] frac_aligned_query returning results >1. In-Reply-To: <44255ea80703030441u1034aec4h19b9d93a3f74cc33@mail.gmail.com> References: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com> <8D7B0767-46A6-4083-B94E-D6490B241B84@uiuc.edu> <45E809A5.9060407@sendu.me.uk> <44255ea80703020336y7e423b94rd07acd380fe4b8fd@mail.gmail.com> <45E84B3C.5000402@sendu.me.uk> <44255ea80703020814v31495221i72f65db532c0dd9b@mail.gmail.com> <8f200b4c0703021500k8fbaa8cj7af8971389e7379@mail.gmail.com> <44255ea80703030441u1034aec4h19b9d93a3f74cc33@mail.gmail.com> Message-ID: <5BE8C067-BFCB-436F-BFED-1644618E8686@uiuc.edu> Thiago, Could you file a bug report and add the relevant files as attachments? 
http://www.bioperl.org/wiki/Bugs http://bugzilla.open-bio.org/ chris On Mar 3, 2007, at 6:41 AM, Thiago Venancio wrote: > Hi all. > > Sorry about this, but the bug persists. Although the number of > problematic > cases is too low (3 out of 35139), they are present. > > Please find attached an example buggy blast report. > > The line I use to call the function is: > print $result->query_name."\t".$hit->frac_aligned_query."\n"; > > The warning bellow is still appearing a lot of times during processing > reports, so I think it is not due to the same bug. > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Undefined sub-sequence (821,821). Valid range = 778 - 821 > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:328 > STACK: Bio::Search::HSP::HSPI::matches > /usr/share/perl5/Bio/Search/HSP/HSPI.pm:711 > STACK: Bio::Search::SearchUtils::_adjust_contigs > /usr/share/perl5/Bio/Search/SearchUtils.pm:421 > STACK: Bio::Search::SearchUtils::tile_hsps > /usr/share/perl5/Bio/Search/SearchUtils.pm:200 > STACK: Bio::Search::Hit::GenericHit::frac_aligned_query > /usr/share/perl5/Bio/Search/Hit/GenericHit.pm:1145 > STACK: ./geraStatGenome.pl:34 > ----------------------------------------------------------- > > I have checked the code, but I have no idea about what is happening > in this > case. the attached file produced the ">1" result and pops the > exception > error, so it could be useful. > > Thiago > > > On 3/2/07, Steve Chervitz wrote: >> >> Glad you fixed the problem, Sendu. >> >> I thought this might have been due to a problem in HSPI::matches() >> since >> it was reporting (1507,1507) as an invalid range within >> (1444,1507), when it >> should be valid (the last position). So it looked like an edge >> condition >> bug, but I didn't confirm. So there still could be a lingering >> problem in >> the matches() function, or in the way the matches string is parsed >> from the >> report. 
>> >> Speaking of which, HSPI::matches() is quite BLAST-specific. It's even >> format specific, since it won't work if you are parsing in tabular >> blast >> reports as they lack any string of match symbols. I thought about >> moving the >> matches implementation in HSPI into BlastHSP.pm, but that module >> appears >> to not be used anymore. Not sure the way to go here. >> >> Steve >> >> On 3/2/07, Thiago Venancio < thiago.venancio at gmail.com> wrote: >> >> > Hi Sendu, >> > >> > Great to know you fixed the problem. >> > I have updated the SearchUtils and seems to be correct now. >> > >> > Best! >> > >> > Thiago >> > >> > >> > On 3/2/07, Sendu Bala wrote: >> > > >> > > Thiago Venancio wrote: >> > > > Hi Sendu and Chris, >> > > > >> > > > Thanks for the help. >> > > > As I mentioned, I have updated my SearchUtils file from: >> > > > >> > > >> > http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl- >> live/Bio/Search/SearchUtils.pm >> > > > < >> > > >> > http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl- >> live/Bio/Search/SearchUtils.pm >> > > > >> > > > >> > > > I am also using the lates BioPerl version, installed from CPAN. >> > > > >> > > > Please find a buggy blast report attached. >> > > > In this case, the frac_aligned_query() outputs "1.04", but I >> have >> > others >> > > > with " 1.57" for example. >> > > > >> > > > Just for a quantitative aspect, I got ">1" values in only 61 / >> > 53,377. >> > > >> > > Many thanks for that. >> > > >> > > I've committed another fix for SearchUtils so please get >> revision 1.23 >> > > and try again. Hopefully all 61 will no longer be >1, but if >> any are >> > > please send me sample blast files again. >> > > >> > > For anyone interested, the bug was due to a completely >> unbelievable >> > > oversight on my part in the contig merging algorithm: I forgot >> to deal >> > > with contigs that were fully contained by others. Wow! 
>> > > >> > >> > >> > >> > -- >> > "The way to get started is to quit talking and begin doing." >> > Walt Disney >> > >> > ======================== >> > Thiago Motta Venancio, MSc >> > PhD student in Bioinformatics >> > University of Sao Paulo >> > ======================== >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> >> > > > -- > "The way to get started is to quit talking and begin doing." > Walt Disney > > ======================== > Thiago Motta Venancio, MSc > PhD student in Bioinformatics > University of Sao Paulo > ======================== > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From arareko at campus.iztacala.unam.mx Sat Mar 3 17:32:46 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Sat, 03 Mar 2007 16:32:46 -0600 Subject: [Bioperl-l] New Article on Approaches to Web Development for Bioinformatics In-Reply-To: <1ad8057e0703021842y683853f5k1c97dbf362f20dda@mail.gmail.com> References: <1ad8057e0703021842y683853f5k1c97dbf362f20dda@mail.gmail.com> Message-ID: <45E9F78E.8040406@campus.iztacala.unam.mx> Hi Alex, I think you've put a very nice & concise introductory article. I'd like to comment a little on some sections I've read: * Introduction > "Given that you have an idea for analyzing or presenting data in a > particular was, a complete bioinformatics web application depends of > these basic pieces, which is what this article is all about: > > 1. A source of data... > 2. An application programming language... > 3. A web application platform... > 4. Optionally, a data store... > 5. Optionally, you would reuse software tools..." 
Even though you do a small mention about Web Services at the very end of the article (under Application Integration -> Programmatic Integration), I believe that Web Services can be another optional (or even basic) piece of a web application. In fact, many web applications consist only of Web Services without HTML user interfaces. * Application Development Languages > "There are many different programming platforms and tools available to > solve bioinformatics problems. It can be bewildering at first, but it > makes more sense to build on top of some of these tools rather than > build from scratch. Some the problems with using these tools for a > bioinformatics portal are > > 1. Many tools are written... > 2. Some tools have particular prerequisites... > 3. Many may not be in a form... > 4. The context that gives meaning... > > Standardization on a particular platform can help manageability but > for most organizations a compromise between standardization and > adoption of several different platforms will allow many people to > develop software in platforms that they are already comfortable with > and allow the reuse of a large amount of freely available software..." I would add to the problems list the fact that building web (or other kind of) applications on top of a platform whose codebase is evolving constantly, can make them very difficult to maintain. The case of EnsEMBL comes to my mind here: they opted to stick with BioPerl 1.2.3 as a core library and haven't moved onto a higher version of it because the EnsEMBL code is so vast, that a simple upgrade of BioPerl would break a lot of their code. AFAIK, it's because of this and the slowness at some parts of BioPerl that EnsEMBL is gradually saying goodbye to BioPerl. Also, I think that depending on the amount of available code you plan to import into your application, sometimes having a whole platform at the very bottom can add unnecessary extra weight to your application. 
More weight could be equal to less speed, this is critical in web development. * Application Integration -> Navigation > "The basic way that users will navigate into and around your > application should be using HTTP GET and POST requests with specific > URL's. Users bookmark these URL's and other applications will link to > them. Most applications developers did not realize it at first, but > these URL's are, in fact, an interface into your application that you > must maintain in a consistent way as you change and evolve your > software. Otherwise, they will find dead links..." Just as I clicked the bookmark button for your article :) The same principle could apply to its filenames. A URL of the form: http://medicalcomputing.net/tools_dna17.php is less indicative of the real content of the article and can mislead potential readers. Optimising the URL's will make them better to be indexed by search engines, something like: http://medicalcomputing.net/web-development-bioinformatics17.php would do the trick. To conclude my comments, I was surprised to see a section about BioPHP and not about other more-known toolkits like BioPython or BioRuby. What about their role in web development? Python is also a common language for web programming and with all the recent *hot* stuff like Ruby On Rails, it's very likely that both Bio* toolkits are more than ready for deploying web applications. I'm Cc'ing this to their respective mailing lists to see if someone wants to give you some feedback about them in order to complement your article. Other than that, I really liked your work :) Cheers, Mauricio. Alex Amies wrote: > I have written an article on Approaches to Web Development for > Bioinformatics at > > http://medicalcomputing.net/tools_dna1.php > > There is a fairly large section on BioPerl at > > http://medicalcomputing.net/tools_dna13.php > > I hope that someone gets something useful out of it. 
I also looking for > feedback on it and, in particular, please let me know about any mistakes in > it. > > The intent of the article is to give an overview of various approaches to > developing web based tools for bioinformatics. It describes the alternatives > at each layer of the system, including the data layer and sources of data, > the application programming layer, the web layer, and bioinformatics tools > and software libraries. > > Alex > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From jason at bioperl.org Sat Mar 3 21:56:32 2007 From: jason at bioperl.org (Jason Stajich) Date: Sat, 3 Mar 2007 18:56:32 -0800 Subject: [Bioperl-l] New Article on Approaches to Web Development for Bioinformatics In-Reply-To: <1ad8057e0703031015g7b2d26c2wbda0715e54001612@mail.gmail.com> References: <1ad8057e0703021842y683853f5k1c97dbf362f20dda@mail.gmail.com> <45E927BF.4060506@sendu.me.uk> <1ad8057e0703031015g7b2d26c2wbda0715e54001612@mail.gmail.com> Message-ID: <2FF966D6-04AE-41F6-BFD8-56962940CED3@bioperl.org> http://bioperl.org/wiki/FAQ#What_is_the_difference_between_1.5.2_and_1.4.0.3F_What_do_you_mean_developer_release.3F We'll do a new stable series (1.6) sometime this year. There have been several API changes that need to be vetted or removed before we can really make 1.6. Chris, Sendu, Torsten, and many others have been doing a great job bringing things up to speed for a very solid new stable release series. zipfiles are up now. thanks. -jason On Mar 3, 2007, at 10:15 AM, Alex Amies wrote: > Sendu, > > Thanks for your comment. Looking into it a bit more I am confused.
> > I see from the BioPerl download page that Bioperl 1.4.0, which is the > version I used, is listed at the latest Stable Release, even though it > was released in Dec-2003. Bioperl 1.5.2 is listed as a Developer > Release. Is that right? Also, the links to the links to the 1.4.0 > zip files are dead. > > Alex > > On 3/2/07, Sendu Bala wrote: >> Alex Amies wrote: >>> I have written an article on Approaches to Web Development for >>> Bioinformatics at >>> >>> http://medicalcomputing.net/tools_dna1.php >>> >>> There is a fairly large section on BioPerl at >>> >>> http://medicalcomputing.net/tools_dna13.php >>> >>> I hope that someone gets something useful out of it. I also >>> looking for >>> feedback on it and, in particular, please let me know about any >>> mistakes in >>> it. >> >> Thanks for that. Can I suggest you remove the instruction to install >> BioPerl 1.4 and replace it with one to install the latest version? >> Ie. >> summarise information at: >> >> http://www.bioperl.org/wiki/Installing_BioPerl >> >> Or just point people to that page. >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/
From alexamies at gmail.com Sat Mar 3 22:09:51 2007 From: alexamies at gmail.com (Alex Amies) Date: Sat, 3 Mar 2007 19:09:51 -0800 Subject: [Bioperl-l] New Article on Approaches to Web Development for Bioinformatics In-Reply-To: <45E9F78E.8040406@campus.iztacala.unam.mx> References: <1ad8057e0703021842y683853f5k1c97dbf362f20dda@mail.gmail.com> <45E9F78E.8040406@campus.iztacala.unam.mx> Message-ID: <1ad8057e0703031909v4880f5f1t3c4159b75c36bcca@mail.gmail.com> Mauricio, Thanks for your comments. You are right that I could have said a lot more about web services. I plan on doing that but I haven't got there yet. Actually, with all the hype about web services I have been surprised to find the programming model so complicated. As you mention, I certainly could have thought out my own URL's better. I have been surprised not to find more PHP activity in bioinformatics. To me, besides being a lightweight and pleasant language to program in it is incredibly economical for hosting Internet applications and there is a huge open source community around PHP in general. The same can be said of Perl. It is because of my own ignorance and lack of time that I have not investigated Python and Ruby. I may do in the future and write about them. Alex On 3/3/07, Mauricio Herrera Cuadra wrote: > Hi Alex, > > I think you've put a very nice & concise introductory article. I'd like > to comment a little on some sections I've read: > > * Introduction > > > "Given that you have an idea for analyzing or presenting data in a > > particular was, a complete bioinformatics web application depends of > > these basic pieces, which is what this article is all about: > > > > 1. A source of data... > > 2. An application programming language... > > 3. A web application platform... > > 4. Optionally, a data store... > > 5.
Optionally, you would reuse software tools..." > > Even though you do a small mention about Web Services at the very end of > the article (under Application Integration -> Programmatic Integration), > I believe that Web Services can be another optional (or even basic) > piece of a web application. In fact, many web applications consist only > of Web Services without HTML user interfaces. > > * Application Development Languages > > > "There are many different programming platforms and tools available to > > solve bioinformatics problems. It can be bewildering at first, but it > > makes more sense to build on top of some of these tools rather than > > build from scratch. Some the problems with using these tools for a > > bioinformatics portal are > > > > 1. Many tools are written... > > 2. Some tools have particular prerequisites... > > 3. Many may not be in a form... > > 4. The context that gives meaning... > > > > Standardization on a particular platform can help manageability but > > for most organizations a compromise between standardization and > > adoption of several different platforms will allow many people to > > develop software in platforms that they are already comfortable with > > and allow the reuse of a large amount of freely available software..." > > I would add to the problems list the fact that building web (or other > kind of) applications on top of a platform whose codebase is evolving > constantly, can make them very difficult to maintain. The case of > EnsEMBL comes to my mind here: they opted to stick with BioPerl 1.2.3 as > a core library and haven't moved onto a higher version of it because the > EnsEMBL code is so vast, that a simple upgrade of BioPerl would break a > lot of their code. AFAIK, it's because of this and the slowness at some > parts of BioPerl that EnsEMBL is gradually saying goodbye to BioPerl. 
> > Also, I think that depending on the amount of available code you plan to > import into your application, sometimes having a whole platform at the > very bottom can add unnecessary extra weight to your application. More > weight could be equal to less speed, this is critical in web development. > > * Application Integration -> Navigation > > > "The basic way that users will navigate into and around your > > application should be using HTTP GET and POST requests with specific > > URL's. Users bookmark these URL's and other applications will link to > > them. Most applications developers did not realize it at first, but > > these URL's are, in fact, an interface into your application that you > > must maintain in a consistent way as you change and evolve your > > software. Otherwise, they will find dead links..." > > Just as I clicked the bookmark button for your article :) The same > principle could apply to its filenames. A URL of the form: > http://medicalcomputing.net/tools_dna17.php is less indicative of the > real content of the article and can mislead potential readers. > Optimising the URL's will make them better to be indexed by search > engines, something like: > http://medicalcomputing.net/web-development-bioinformatics17.php would > do the trick. > > To conclude my comments, I was surprised to see a section about BioPHP > and not about other more-known toolkits like BioPython or BioRuby. What > about their role in web development? Python is also a common language > for web programming and with all the recent *hot* stuff like Ruby On > Rails, it's very likely that both Bio* toolkits are more than ready for > deploying web applications. I'm Cc'ing this to their respective mailing > lists to see if someone wants to give you some feedback about them in > order to complement your article. Other than that, I really liked your > work :) > > Cheers, > Mauricio. 
> > Alex Amies wrote: > > I have written an article on Approaches to Web Development for > > Bioinformatics at > > > > http://medicalcomputing.net/tools_dna1.php > > > > There is a fairly large section on BioPerl at > > > > http://medicalcomputing.net/tools_dna13.php > > > > I hope that someone gets something useful out of it. I also looking for > > feedback on it and, in particular, please let me know about any mistakes in > > it. > > > > The intent of the article is to give an overview of various approaches to > > developing web based tools for bioinformatics. It describes the alternatives > > at each layer of the system, including the data layer and sources of data, > > the application programming layer, the web layer, and bioinformatics tools > > and software libraries. > > > > Alex > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > MAURICIO HERRERA CUADRA > arareko at campus.iztacala.unam.mx > Laboratorio de Genética > Unidad de Morfofisiología y Función > Facultad de Estudios Superiores Iztacala, UNAM > > > > From hlapp at gmx.net Sun Mar 4 09:54:06 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 4 Mar 2007 09:54:06 -0500 Subject: [Bioperl-l] Phyloinformatics Summer of Code Message-ID: The Phyloinformatics Hackathon group, a significant fraction of which is fielded by BioPerl-affiliated people (see http://phyloinformatics.net/Participants), is preparing to apply for the Google Summer of Code program. The page for collecting ideas etc is at http://phyloinformatics.net/Phyloinformatics_Summer_of_Code_2007 This is mostly a stub right now but will rapidly (have to) be fleshed out over the next couple of days (the deadline for application is March 9). Please feel free to add ideas that you have directly (wiki registration is open), or email them to me.
If we are accepted, we'll (hopefully) have students over the summer, some of which will possibly work on BioPerl-related projects. (There will be non-BioPerl projects as well.) These may be newcomers to BioPerl, newcomers to distributed OSS development, or even programming newbies ... Given the helping hand that this community has readily extended to newbies in the past, I'm hoping that you'll help us help them overcome the initial barriers, too. If anyone is willing to go beyond that and would be willing to help out as a mentor (or back-up mentor), that'd be awesome; just drop me an email. Cheers, -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From lubapardo at gmail.com Mon Mar 5 07:34:37 2007 From: lubapardo at gmail.com (Luba Pardo) Date: Mon, 5 Mar 2007 13:34:37 +0100 Subject: [Bioperl-l] Probles with clustalw Message-ID: <58ff33550703050434r1fa2d3femfb3bfe2686258f48@mail.gmail.com> Hello, I am learning how to use the module of Bio::Tools::Run::Alignment::Clustalw. I started to run the example provided at the module documentation, but it does not work with my input file. the script is use warnings; BEGIN { $ENV{CLUSTALDIR} = '/home/luba/bin/clustalx1.82.linux/';} use Bio::Tools::Run::Alignment::Clustalw; #use Bio::Root::Root::Run::WrapperBase; # Build a clustalw alignment factory my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM'); my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params); # Pass the factory a list of sequences to be aligned. my $inputfilename = 'cysprot.fa'; my $aln = $factory->align($inputfilename); # $aln is a SimpleAlign object. 
# or my $seq_array_ref = \@seq_array; # where @seq_array is an array of Bio::Seq objects my $aln = $factory->align($seq_array_ref); # Or one can pass the factory a pair of (sub)alignments #to be aligned against each other, e.g.: my $aln = $factory->profile_align($aln1,$aln2); # where $aln1 and $aln2 are Bio::SimpleAlign objects. # Or one can pass the factory an alignment and one or more unaligned # sequences to be added to the alignment. For example: my $aln = $factory->profile_align($aln1,$seq); # $seq is a Bio::Seq object. # Get a tree of the sequences my $tree = $factory->tree(\@seq_array); # Get both an alignment and a tree my ($aln, $tree) = $factory->run(\@seq_array); # Do a footprinting analysis on the supplied sequences, getting back the # most conserved sub-alignments my @results = $factory->footprint(\@seq_array); foreach my $result (@results) { print $result->consensus_string, "\n"; } ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Bad input data (sequences need an id ) or less than 2 sequences in ARRAY(0x8861280) ! STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359 STACK: Bio::Tools::Run::Alignment::Clustalw::align /usr/lib/perl5/site_perl/5.8.1/Bio/Tools/Run/Alignment/Clustalw.pm:484 STACK: clustal1.pl:17 ----------------------------------------------------------- The input file is OK as I run the program clustalw with my input file of fasta sequences and it worked. Also if I leave the sentence # $aln = $factory->align($seq_array_ref); as a comment I get another error: EXCEPTION: Bio::Root::NotImplemented ------------- MSG: Abstract method "Bio::Tools::Run::WrapperBase::run" is not implemented by package Bio::Tools::Run::Alignment::Clustalw. This is not your fault - author of Bio::Tools::Run::Alignment::Clustalw should be blamed! 
STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359 STACK: Bio::Root::RootI::throw_not_implemented /usr/lib/perl5/site_perl/5.8.1/Bio/Root/RootI.pm:522 STACK: Bio::Tools::Run::WrapperBase::run /usr/lib/perl5/site_perl/5.8.1/Bio/Tools/Run/WrapperBase.pm:95 STACK: clustal1.pl:32 ---------------------------------------------------------------- I know I must ber doing a very simple erro, but I can not run the example. Can anybody give an advice/ Thanks in advance, L. Pardo From bix at sendu.me.uk Mon Mar 5 07:41:19 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 05 Mar 2007 12:41:19 +0000 Subject: [Bioperl-l] frac_aligned_query returning results >1. In-Reply-To: <44255ea80703030441u1034aec4h19b9d93a3f74cc33@mail.gmail.com> References: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com> <8D7B0767-46A6-4083-B94E-D6490B241B84@uiuc.edu> <45E809A5.9060407@sendu.me.uk> <44255ea80703020336y7e423b94rd07acd380fe4b8fd@mail.gmail.com> <45E84B3C.5000402@sendu.me.uk> <44255ea80703020814v31495221i72f65db532c0dd9b@mail.gmail.com> <8f200b4c0703021500k8fbaa8cj7af8971389e7379@mail.gmail.com> <44255ea80703030441u1034aec4h19b9d93a3f74cc33@mail.gmail.com> Message-ID: <45EC0FEF.7050308@sendu.me.uk> Thiago Venancio wrote: > Hi all. > > Sorry about this, but the bug persists. Although the number of > problematic cases is too low (3 out of 35139), they are present. > > Please find attached an example buggy blast report. > > The line I use to call the function is: > print $result->query_name."\t".$hit->frac_aligned_query."\n"; > > The warning bellow is still appearing a lot of times during processing > reports, so I think it is not due to the same bug. > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Undefined sub-sequence (821,821). 
Valid range = 778 - 821 > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:328 > STACK: Bio::Search::HSP::HSPI::matches > /usr/share/perl5/Bio/Search/HSP/HSPI.pm:711 > STACK: Bio::Search::SearchUtils::_adjust_contigs > /usr/share/perl5/Bio/Search/SearchUtils.pm:421 > STACK: Bio::Search::SearchUtils::tile_hsps > /usr/share/perl5/Bio/Search/SearchUtils.pm:200 > STACK: Bio::Search::Hit::GenericHit::frac_aligned_query > /usr/share/perl5/Bio/Search/Hit/GenericHit.pm:1145 > STACK: ./geraStatGenome.pl:34 > ----------------------------------------------------------- > > I have checked the code, but I have no idea about what is happening in > this case. the attached file produced the ">1" result and pops the > exception error, so it could be useful. Are you sure you attached the correct file? Are you sure you're using the latest version of all relevant modules (GenericHit, SearchUtils, HSPI, SearchIO, SearchIO::blast, possibly others)? What is the exact code you're using when you generate the problems? I see nothing wrong on my end: correct answer and no exception warning. ---> thiago.pl #!/usr/bin/perl -w use warnings; use strict; use Bio::SearchIO; my $sin = Bio::SearchIO->new(-format => 'blast', -file => 'buggyBlast.txt'); my $result = $sin->next_result; my $hit = $result->next_hit; print $result->query_name."\t".$hit->frac_aligned_query."\n"; exit; <--- $ perl thiago.pl AEDES_05359.C 1.00 $ From thiago.venancio at gmail.com Mon Mar 5 07:56:08 2007 From: thiago.venancio at gmail.com (Thiago Venancio) Date: Mon, 5 Mar 2007 09:56:08 -0300 Subject: [Bioperl-l] frac_aligned_query returning results >1. 
In-Reply-To: <45EC0FEF.7050308@sendu.me.uk> References: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com> <8D7B0767-46A6-4083-B94E-D6490B241B84@uiuc.edu> <45E809A5.9060407@sendu.me.uk> <44255ea80703020336y7e423b94rd07acd380fe4b8fd@mail.gmail.com> <45E84B3C.5000402@sendu.me.uk> <44255ea80703020814v31495221i72f65db532c0dd9b@mail.gmail.com> <8f200b4c0703021500k8fbaa8cj7af8971389e7379@mail.gmail.com> <44255ea80703030441u1034aec4h19b9d93a3f74cc33@mail.gmail.com> <45EC0FEF.7050308@sendu.me.uk> Message-ID: <44255ea80703050456t7bddc7d6j6afe24776f039df@mail.gmail.com> Hi all, So I deduce the problem is in my end, but I have updated all the packages through CPAN, using the command: perl -MCPAN -e "install Bundle::BioPerl" I think this should update all tthe essencial packages, no ? Thanks, Thiago On 3/5/07, Sendu Bala wrote: > > Thiago Venancio wrote: > > Hi all. > > > > Sorry about this, but the bug persists. Although the number of > > problematic cases is too low (3 out of 35139), they are present. > > > > Please find attached an example buggy blast report. > > > > The line I use to call the function is: > > print $result->query_name."\t".$hit->frac_aligned_query."\n"; > > > > The warning bellow is still appearing a lot of times during processing > > reports, so I think it is not due to the same bug. > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: Undefined sub-sequence (821,821). 
Valid range = 778 - 821 > > STACK: Error::throw > > STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:328 > > STACK: Bio::Search::HSP::HSPI::matches > > /usr/share/perl5/Bio/Search/HSP/HSPI.pm:711 > > STACK: Bio::Search::SearchUtils::_adjust_contigs > > /usr/share/perl5/Bio/Search/SearchUtils.pm:421 > > STACK: Bio::Search::SearchUtils::tile_hsps > > /usr/share/perl5/Bio/Search/SearchUtils.pm:200 > > STACK: Bio::Search::Hit::GenericHit::frac_aligned_query > > /usr/share/perl5/Bio/Search/Hit/GenericHit.pm:1145 > > STACK: ./geraStatGenome.pl:34 > > ----------------------------------------------------------- > > > > I have checked the code, but I have no idea about what is happening in > > this case. the attached file produced the ">1" result and pops the > > exception error, so it could be useful. > > Are you sure you attached the correct file? Are you sure you're using > the latest version of all relevant modules (GenericHit, SearchUtils, > HSPI, SearchIO, SearchIO::blast, possibly others)? What is the exact > code you're using when you generate the problems? > > I see nothing wrong on my end: correct answer and no exception warning. > > ---> thiago.pl > > #!/usr/bin/perl -w > use warnings; > use strict; > > use Bio::SearchIO; > > my $sin = Bio::SearchIO->new(-format => 'blast', > -file => 'buggyBlast.txt'); > my $result = $sin->next_result; > my $hit = $result->next_hit; > > print $result->query_name."\t".$hit->frac_aligned_query."\n"; > > exit; > <--- > > > $ perl thiago.pl > AEDES_05359.C 1.00 > $ > > -- "The way to get started is to quit talking and begin doing." Walt Disney ======================== Thiago Motta Venancio, MSc PhD student in Bioinformatics University of Sao Paulo ======================== From bix at sendu.me.uk Mon Mar 5 08:01:40 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 05 Mar 2007 13:01:40 +0000 Subject: [Bioperl-l] frac_aligned_query returning results >1. 
In-Reply-To: <44255ea80703050456t7bddc7d6j6afe24776f039df@mail.gmail.com> References: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com> <8D7B0767-46A6-4083-B94E-D6490B241B84@uiuc.edu> <45E809A5.9060407@sendu.me.uk> <44255ea80703020336y7e423b94rd07acd380fe4b8fd@mail.gmail.com> <45E84B3C.5000402@sendu.me.uk> <44255ea80703020814v31495221i72f65db532c0dd9b@mail.gmail.com> <8f200b4c0703021500k8fbaa8cj7af8971389e7379@mail.gmail.com> <44255ea80703030441u1034aec4h19b9d93a3f74cc33@mail.gmail.com> <45EC0FEF.7050308@sendu.me.uk> <44255ea80703050456t7bddc7d6j6afe24776f039df@mail.gmail.com> Message-ID: <45EC14B4.5040002@sendu.me.uk> Thiago Venancio wrote: > Hi all, > > So I deduce the problem is in my end, but I have updated all the > packages through CPAN, using the command: > > perl -MCPAN -e "install Bundle::BioPerl" > > I think this should update all tthe essencial packages, no ? No: Bundle::BioPerl doesn't install BioPerl, only some of its optional external dependencies (and is out of date in any case). Since you need the latest version of everything, get a complete CVS checkout: http://www.bioperl.org/wiki/Using_CVS#Instructions_for_downloading_any_BioPerl_repository_using_anonymous_CVS Or install BioPerl 1.5.2 and overwrite just the needed modules I outlined from CVS: http://www.bioperl.org/wiki/Installing_BioPerl From lubapardo at gmail.com Mon Mar 5 08:40:51 2007 From: lubapardo at gmail.com (Luba Pardo) Date: Mon, 5 Mar 2007 14:40:51 +0100 Subject: [Bioperl-l] Problems with clustalw Message-ID: <58ff33550703050540u77ab7553peeb0eeafa8079701@mail.gmail.com> Hello, I am learning how to use the module of Bio::Tools::Run::Alignment::Clustalw. I started to run the example provided at the module documentation, but it does not work with my input file. 
the script is use warnings; BEGIN { $ENV{CLUSTALDIR} = '/home/luba/bin/clustalx1.82.linux/';} use Bio::Tools::Run::Alignment::Clustalw; #use Bio::Root::Root::Run::WrapperBase; # Build a clustalw alignment factory my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM'); my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params); # Pass the factory a list of sequences to be aligned. my $inputfilename = 'cysprot.fa'; my $aln = $factory->align($inputfilename); # $aln is a SimpleAlign object. # or my $seq_array_ref = \@seq_array; # where @seq_array is an array of Bio::Seq objects my $aln = $factory->align($seq_array_ref); # Or one can pass the factory a pair of (sub)alignments #to be aligned against each other, e.g.: my $aln = $factory->profile_align($aln1,$aln2); # where $aln1 and $aln2 are Bio::SimpleAlign objects. # Or one can pass the factory an alignment and one or more unaligned # sequences to be added to the alignment. For example: my $aln = $factory->profile_align($aln1,$seq); # $seq is a Bio::Seq object. # Get a tree of the sequences my $tree = $factory->tree(\@seq_array); # Get both an alignment and a tree my ($aln, $tree) = $factory->run(\@seq_array); # Do a footprinting analysis on the supplied sequences, getting back the # most conserved sub-alignments my @results = $factory->footprint(\@seq_array); foreach my $result (@results) { print $result->consensus_string, "\n"; } ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Bad input data (sequences need an id ) or less than 2 sequences in ARRAY(0x8861280) ! STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359 STACK: Bio::Tools::Run::Alignment::Clustalw::align /usr/lib/perl5/site_perl/5.8.1/Bio/Tools/Run/Alignment/Clustalw.pm:484 STACK: clustal1.pl:17 ----------------------------------------------------------- The input file is OK as I run the program clustalw with my input file of fasta sequences and it worked. 
Also if I leave the sentence # $aln = $factory->align($seq_array_ref); as a comment I get another error: EXCEPTION: Bio::Root::NotImplemented ------------- MSG: Abstract method "Bio::Tools::Run::WrapperBase::run" is not implemented by package Bio::Tools::Run::Alignment::Clustalw. This is not your fault - author of Bio::Tools::Run::Alignment::Clustalw should be blamed! STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359 STACK: Bio::Root::RootI::throw_not_implemented /usr/lib/perl5/site_perl/5.8.1/Bio/Root/RootI.pm:522 STACK: Bio::Tools::Run::WrapperBase::run /usr/lib/perl5/site_perl/5.8.1/Bio/Tools/Run/WrapperBase.pm:95 STACK: clustal1.pl:32 ---------------------------------------------------------------- I know I must ber doing a very simple erro, but I can not run the example. Can anybody give an advice/ Thanks in advance, L. Pardo From thiago.venancio at gmail.com Mon Mar 5 08:43:06 2007 From: thiago.venancio at gmail.com (Thiago Venancio) Date: Mon, 5 Mar 2007 10:43:06 -0300 Subject: [Bioperl-l] frac_aligned_query returning results >1. In-Reply-To: <45EC14B4.5040002@sendu.me.uk> References: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com> <45E809A5.9060407@sendu.me.uk> <44255ea80703020336y7e423b94rd07acd380fe4b8fd@mail.gmail.com> <45E84B3C.5000402@sendu.me.uk> <44255ea80703020814v31495221i72f65db532c0dd9b@mail.gmail.com> <8f200b4c0703021500k8fbaa8cj7af8971389e7379@mail.gmail.com> <44255ea80703030441u1034aec4h19b9d93a3f74cc33@mail.gmail.com> <45EC0FEF.7050308@sendu.me.uk> <44255ea80703050456t7bddc7d6j6afe24776f039df@mail.gmail.com> <45EC14B4.5040002@sendu.me.uk> Message-ID: <44255ea80703050543m6bdb4e5bq2c32b1bb6e200eb8@mail.gmail.com> Hi Sendu, You absolutely right !! I have updated all the packages and it's fine now. I have done this process several times, but this time, I don't know why I did things in this way. Thanks. 
Thiago On 3/5/07, Sendu Bala wrote: > > Thiago Venancio wrote: > > Hi all, > > > > So I deduce the problem is in my end, but I have updated all the > > packages through CPAN, using the command: > > > > perl -MCPAN -e "install Bundle::BioPerl" > > > > I think this should update all tthe essencial packages, no ? > > No: Bundle::BioPerl doesn't install BioPerl, only some of its optional > external dependencies (and is out of date in any case). > > Since you need the latest version of everything, get a complete CVS > checkout: > > http://www.bioperl.org/wiki/Using_CVS#Instructions_for_downloading_any_BioPerl_repository_using_anonymous_CVS > > Or install BioPerl 1.5.2 and overwrite just the needed modules I > outlined from CVS: > http://www.bioperl.org/wiki/Installing_BioPerl > -- "The way to get started is to quit talking and begin doing." Walt Disney ======================== Thiago Motta Venancio, MSc PhD student in Bioinformatics University of Sao Paulo ======================== From cjfields at uiuc.edu Mon Mar 5 09:11:37 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 5 Mar 2007 08:11:37 -0600 Subject: [Bioperl-l] frac_aligned_query returning results >1. In-Reply-To: <44255ea80703050543m6bdb4e5bq2c32b1bb6e200eb8@mail.gmail.com> References: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com> <45E809A5.9060407@sendu.me.uk> <44255ea80703020336y7e423b94rd07acd380fe4b8fd@mail.gmail.com> <45E84B3C.5000402@sendu.me.uk> <44255ea80703020814v31495221i72f65db532c0dd9b@mail.gmail.com> <8f200b4c0703021500k8fbaa8cj7af8971389e7379@mail.gmail.com> <44255ea80703030441u1034aec4h19b9d93a3f74cc33@mail.gmail.com> <45EC0FEF.7050308@sendu.me.uk> <44255ea80703050456t7bddc7d6j6afe24776f039df@mail.gmail.com> <45EC14B4.5040002@sendu.me.uk> <44255ea80703050543m6bdb4e5bq2c32b1bb6e200eb8@mail.gmail.com> Message-ID: <3196D1F3-7039-4E5B-BDFB-F7BDCEBD616F@uiuc.edu> Good to hear. 
Sendu's fix unfortunately doesn't work for the bug Torsten posted in Bugzilla (which are BLASTP reports, I believe), but I'll try a clean update to see if it changes anything. chris On Mar 5, 2007, at 7:43 AM, Thiago Venancio wrote: > Hi Sendu, > > You absolutely right !! > > I have updated all the packages and it's fine now. I have done this > process > several times, but this time, I don't know why I did things in this > way. > > Thanks. > > Thiago > > On 3/5/07, Sendu Bala wrote: >> >> Thiago Venancio wrote: >>> Hi all, >>> >>> So I deduce the problem is in my end, but I have updated all the >>> packages through CPAN, using the command: >>> >>> perl -MCPAN -e "install Bundle::BioPerl" >>> >>> I think this should update all tthe essencial packages, no ? >> >> No: Bundle::BioPerl doesn't install BioPerl, only some of its >> optional >> external dependencies (and is out of date in any case). >> >> Since you need the latest version of everything, get a complete CVS >> checkout: >> >> http://www.bioperl.org/wiki/Using_CVS#Instructions_for_downloading_any_BioPerl_repository_using_anonymous_CVS >> >> Or install BioPerl 1.5.2 and overwrite just the needed modules I >> outlined from CVS: >> http://www.bioperl.org/wiki/Installing_BioPerl >> > > > > -- > "The way to get started is to quit talking and begin doing." > Walt Disney > > ======================== > Thiago Motta Venancio, MSc > PhD student in Bioinformatics > University of Sao Paulo > ======================== > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From lubapardo at gmail.com Tue Mar 6 05:56:52 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Tue, 6 Mar 2007 11:56:52 +0100
Subject: [Bioperl-l] clustalw
Message-ID: <58ff33550703060256r3a9881ecgf08ef1b4ca5d2045@mail.gmail.com>

Hello,
I tried to post this question yesterday (sorry if you get the email several
times).
I am trying to run a script for Clustalw based on few examples. I always get
an error:

EXCEPTION: Bio::Root::Exception -------------
MSG: Bad input data (sequences need an id ) or less than 2 sequences in
ARRAY(0x8861280) !
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359
STACK: Bio::Tools::Run::Alignment::Clustalw::align
/usr/lib/perl5/site_perl/5.8.1/Bio/Tools/Run/Alignment/Clustalw.pm:484
STACK: clustal1.pl:17

or

EXCEPTION: Bio::Root::NotImplemented -------------
MSG: Abstract method "Bio::Tools::Run::WrapperBase::run" is not implemented
by package Bio::Tools::Run::Alignment::Clustalw.
This is not your fault - author of Bio::Tools::Run::Alignment::Clustalw
should be blamed!

No matter if I used a scalar inputfile or a SeqIO object. I also run the
clustalw.pl script, but I can not run the example number 3.
Is there anything going with the clustalw module? Because clustalw runs OK
with my files if I use it without Bioperl.
THIS IS THE SCRIPT

BEGIN {$ENV{CLUSTALDIR} = '/home/luba/bin/clustalx1.82.linux/';}

use Bio::SeqIO;
use Bio::Tools::Run::Alignment::Clustalw;
use Bio::SimpleAlign;
use Bio::AlignIO;
#use strict;
use warnings;

my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);

my $str = Bio::SeqIO->new(-file=> 'clustalw.fa ', '-format' => 'Fasta');
my @seq_array =();
while ( my $seq = $str->next_seq() ) {push (@seq_array, $seq) ;}
my $seq_array_ref = \@seq_array;

my $aln = $factory->align($seq_array_ref);

# Get a tree of the sequences
$tree = $factory->tree(\@seq_array);

# Get both an alignment and a tree
($aln, $tree) = $factory->run(\@seq_array);

# Do a footprinting analysis on the supplied sequences, getting back the
# most conserved sub-alignments
my @results = $factory->footprint(\@seq_array);
foreach my $result (@results) {
    print $result->consensus_string, "\n";
}

Thanks in advance,
L. Pardo

From bix at sendu.me.uk Tue Mar 6 08:33:59 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 06 Mar 2007 13:33:59 +0000
Subject: [Bioperl-l] clustalw
In-Reply-To: <58ff33550703060256r3a9881ecgf08ef1b4ca5d2045@mail.gmail.com>
References: <58ff33550703060256r3a9881ecgf08ef1b4ca5d2045@mail.gmail.com>
Message-ID: <45ED6DC7.80204@sendu.me.uk>

Luba Pardo wrote:
> Hello,
> I tried to post this question yesterday (sorry if you get the email several
> times).

We did. Please only send one email and trust that it will make it to the
list.

> I am trying to run a script for Clustalw based on few examples. I always get
> an error:
>
> EXCEPTION: Bio::Root::Exception -------------
> MSG: Bad input data (sequences need an id ) or less than 2 sequences in
> ARRAY(0x8861280) !
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359
> STACK: Bio::Tools::Run::Alignment::Clustalw::align
> /usr/lib/perl5/site_perl/5.8.1/Bio/Tools/Run/Alignment/Clustalw.pm:484
> STACK: clustal1.pl:17

As the Exception message states, you probably didn't supply 2 or more
sequences. See how many elements @seq_array has after your while loop.
What exactly is 'clustalw.fa'? Is it really a plain, unaligned
multi-fasta file with 2 or more sequences in it?

> or
>
> EXCEPTION: Bio::Root::NotImplemented -------------
> MSG: Abstract method "Bio::Tools::Run::WrapperBase::run" is not implemented
> by package Bio::Tools::Run::Alignment::Clustalw.
> This is not your fault - author of Bio::Tools::Run::Alignment::Clustalw
> should be blamed!

You're using code from the synopsis of the 'live' (latest, CVS-only)
version of Bio::Tools::Run::Alignment::Clustalw but do not have that
version installed. The run() method was only added recently. If you
actually want the run() method, update the Clustalw module from CVS.
http://www.bioperl.org/wiki/Getting_BioPerl#CVS

> THIS IS THE SCRIPT
>
> BEGIN {$ENV{CLUSTALDIR} = '/home/luba/bin/clustalx1.82.linux/';}
>
> use Bio::SeqIO;
> use Bio::Tools::Run::Alignment::Clustalw;
> use Bio::SimpleAlign;
> use Bio::AlignIO;
> #use strict;
> use warnings;
>
> my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
> my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
>
> my $str = Bio::SeqIO->new(-file=> 'clustalw.fa ', '-format' => 'Fasta');
> my @seq_array =();
> while ( my $seq = $str->next_seq() ) {push (@seq_array, $seq) ;}
> my $seq_array_ref = \@seq_array;
>
> my $aln = $factory->align($seq_array_ref);
>
> # Get a tree of the sequences
> $tree = $factory->tree(\@seq_array);
>
> # Get both an alignment and a tree
> ($aln, $tree) = $factory->run(\@seq_array);
>
> # Do a footprinting analysis on the supplied sequences, getting back the
> # most conserved sub-alignments
> my @results = $factory->footprint(\@seq_array);
> foreach my $result (@results) {
>     print $result->consensus_string, "\n";
> }

You need to learn to read and understand the synopsis code before trying
to use it. The synopsis code usually isn't intended to be used
whole-sale. Rather, as in this case, it demonstrates a few useful things
that might not make sense all in the same script.

So there's no need for you to get an alignment with the align() method,
a tree with the tree() method and then get the alignment and tree again
with the run() method. You also don't need to do footprinting with
footprint() unless you're actually interested in footprinting!

tree() and footprint() won't work for you because, again, those are
recent additions to the module. Upgrade from CVS if you really want to
footprint.
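
[Editor's sketch, not part of the original thread.] The advice above
boils down to: read the sequences, confirm there are at least two and
that each has an id, and then call only align(). The snippet below is a
minimal sketch of that, assuming BioPerl, bioperl-run and a clustalw
binary are installed. It reuses the CLUSTALDIR path, the input file name
and the parameters from the posted script, and the output file name
'test.aln' from later in this thread; adjust all of them for your own
system. Note that the posted script opens 'clustalw.fa ' with a trailing
space inside the quotes; whether that matters depends on how the file is
opened, so the sketch simply drops it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: this is where the clustalw executable lives (path taken
# from the posted script). It is set in BEGIN so it is in place before
# the Clustalw wrapper module is loaded.
BEGIN { $ENV{CLUSTALDIR} = '/home/luba/bin/clustalx1.82.linux/'; }

use Bio::SeqIO;
use Bio::AlignIO;
use Bio::Tools::Run::Alignment::Clustalw;

# Read every sequence from the (unaligned) multi-FASTA input file.
my $in = Bio::SeqIO->new(-file => 'clustalw.fa', -format => 'fasta');
my @seqs;
while ( my $seq = $in->next_seq ) {
    push @seqs, $seq;
}

# The checks suggested in the reply: at least 2 sequences, each with an id.
printf "Read %d sequence(s)\n", scalar @seqs;
die "Need at least 2 sequences to align\n" if @seqs < 2;
my @no_id = grep { !defined $_->display_id || $_->display_id eq '' } @seqs;
die scalar(@no_id), " sequence(s) have no id\n" if @no_id;

# Only align() is called; tree(), run() and footprint() are left out
# because, per the reply, they need the CVS version of the module.
my $factory = Bio::Tools::Run::Alignment::Clustalw->new(
    ktuple => 2,
    matrix => 'BLOSUM',
);
my $aln = $factory->align(\@seqs);

# Write the alignment out in clustalw format.
my $out = Bio::AlignIO->new(-format => 'clustalw', -file => '>test.aln');
$out->write_aln($aln);
```

If the count printed is 0 or 1, the problem is in the input file rather
than in the Clustalw wrapper.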

From bix at sendu.me.uk Tue Mar 6 10:30:32 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 06 Mar 2007 15:30:32 +0000
Subject: [Bioperl-l] clustalw
In-Reply-To: <58ff33550703060610i5acdb035w340eaff9b6e664cc@mail.gmail.com>
References: <58ff33550703060256r3a9881ecgf08ef1b4ca5d2045@mail.gmail.com>
	<45ED6DC7.80204@sendu.me.uk>
	<58ff33550703060610i5acdb035w340eaff9b6e664cc@mail.gmail.com>
Message-ID: <45ED8918.6050104@sendu.me.uk>

Luba Pardo wrote:
> I am sorry if my questions are too simple or if this is something I
> should not be asking.

They're not too simple, and certainly are appropriate for the Bioperl
mailing list. You should keep your replies on the list as well so others
have an opportunity to answer them.

> I have 8 sequences in Fasta format. The format is exactly the same as in
> other files (e.g. the ones provided in the examples), and I got the
> same error.

For the code you gave earlier, again you will need to update the
Clustalw module. Download it from the website and replace your existing
copy. If you need help with that, ask and I'm sure someone can give
directions.

> Regarding the tree and the footprinting, well, I need the alignment
> (together with the tree) to use the PALM software. I did not have the
> way to see whether I needed to use CSV, sorry for that too.
> > Indeed, I tried even simpler things like this: > > > > #use strict; > use warnings; > > BEGIN {$ENV{CLUSTALDIR} = '/home/luba/bin/clustalx1.82.linux/';} > use Bio::SeqIO; > use Bio::Tools::Run::Alignment::Clustalw; > use Bio::AlignIO; > > my $seq1 = new Bio::PrimarySeq(-seq => > 'MAVNPELAPFTLSRGIPSFDDQALSTIIQLQDCIQQAIQQLNYSTAEFLAELLYAECSILDKSSVYWSDAVYLYALSLFLNKSYHTAFQISKEFKEYHLGIAYIFGRCALQLSQGVNEAILTLLSIINVFSSNSSNTRINMVLNSNLVHIPDLATLNCLLGNLYMKLDHSKEGAFYHSEALAINPYLWESYEAICKMRATVDLKRVFFDIAGKKSNSHNNNAASSFPSTSLSHFEPRSQPSLYSKTNKNGNNNINNNVNTLFQSSNSPPSTSASSFSSIQHFSRSQQQQANTSIRTCQNKNTQTPKNPAINSKTSSALPNNISMNLVSPSSKQPTISSLAKVYNRNKLLTTPPSKLLNNDRNHQNNNNNNNNNNNNNNNNNNNNNNNNIINKTTFKTPRNLYSSTGRLTTSKKNPRSLIISNSILTSDYQITLPEIMYNFALILRSSSQYNSFKAIRLFESQIPSHIKDTMPWCLVQLGKLHFEIINYDMSLKYFNRLKDLQPARVKDMEIFSTLLWHLHDKVKSSNLANGLMDTMPNKPETWCCIGNLLSLQKDHDAAIKAFEKATQLDPNFAYAYTLQGHEHSSNDSSDSAKTCYRKALACDPQHYNAYYGLGTSAMKLGQYEEALLYFEKARSINPVNVVLICCCGGSLEKLGYKEKALQYYELACHLQPTSSLSKYKMGQLLYSMTRYNVALQTFEELVKLVPDDATAHYLLGQTYRIVGRKKDAIKELTVAMNLDPKGNQVIIDELQKCHMQE', > -id => 'seq1'); > > my $seq2 = new Bio::PrimarySeq( -seq => > 'CLIFXRLLLIQMIHPQARRAFTFLQQQEPYRIQSMEQLSTLLWHLADLPALSHLSQSLISISRSSPQAWIAVGNCFSLQKDHDEAMRCFRRATQVDEGCAYAWTLCGYEAVEMEEYERAMAFYRTAIRTDARHYNAWYVLFFFFFFFFVPGDIDSXPKKGMEWGXFISKRIDRGMRSIILKEPSKSIQLIPFFYVALVWXVGVSSYPLETMTNIDFPKKKKALEKSNDVVQALHFYERASKYAPTSAMVQFKRIRALVALQRYDEAISALVPLTHSAPDEANVFFLLGKCLLKKERRQEATMAFTNARELEPK', > -id => 'seq2'); > > my $factory = new Bio::Tools::Run::Alignment::Clustalw('ktuple' => 2, > 'matrix' => 'BLOSUM'); > > my $aln = $factory->align([$seq1,$seq2]); > > my $alignout ->write_aln($aln); > > print " $alignout is the alignment\n"; > > > but it did not work either. Look at the error message you get. It would have said something like 'Can't call method "write_aln" on an undefined value at ...' You're trying to call the method write_aln() on something that doesn't exist ($alignout). 
You need to create $alignout first as an instance of an object that has
a write_aln() method, then you can call the method. write_aln() belongs
to Bio::AlignIO, so do:

my $alignout = Bio::AlignIO->new(-format => 'clustalw',
                                 -file => ">test.aln");
$alignout->write_aln($aln);

The alignment will now be in the file 'test.aln'.

From bix at sendu.me.uk Wed Mar 7 09:36:41 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 07 Mar 2007 14:36:41 +0000
Subject: [Bioperl-l] frac_aligned_query returning results >1.
In-Reply-To: <3196D1F3-7039-4E5B-BDFB-F7BDCEBD616F@uiuc.edu>
References: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com>
	<45E809A5.9060407@sendu.me.uk>
	<44255ea80703020336y7e423b94rd07acd380fe4b8fd@mail.gmail.com>
	<45E84B3C.5000402@sendu.me.uk>
	<44255ea80703020814v31495221i72f65db532c0dd9b@mail.gmail.com>
	<8f200b4c0703021500k8fbaa8cj7af8971389e7379@mail.gmail.com>
	<44255ea80703030441u1034aec4h19b9d93a3f74cc33@mail.gmail.com>
	<45EC0FEF.7050308@sendu.me.uk>
	<44255ea80703050456t7bddc7d6j6afe24776f039df@mail.gmail.com>
	<45EC14B4.5040002@sendu.me.uk>
	<44255ea80703050543m6bdb4e5bq2c32b1bb6e200eb8@mail.gmail.com>
	<3196D1F3-7039-4E5B-BDFB-F7BDCEBD616F@uiuc.edu>
Message-ID: <45EECDF9.4050204@sendu.me.uk>

Chris Fields wrote:
> Good to hear. Sendu's fix unfortunately doesn't work for the bug
> Torsten posted in Bugzilla (which are BLASTP reports, I believe), but
> I'll try a clean update to see if it changes anything.

Those warnings should now be fixed as well. Let me know if not.

From cjfields at uiuc.edu Wed Mar 7 09:40:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Mar 2007 08:40:42 -0600
Subject: [Bioperl-l] frac_aligned_query returning results >1.
In-Reply-To: <45EECDF9.4050204@sendu.me.uk> References: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com> <45E809A5.9060407@sendu.me.uk> <44255ea80703020336y7e423b94rd07acd380fe4b8fd@mail.gmail.com> <45E84B3C.5000402@sendu.me.uk> <44255ea80703020814v31495221i72f65db532c0dd9b@mail.gmail.com> <8f200b4c0703021500k8fbaa8cj7af8971389e7379@mail.gmail.com> <44255ea80703030441u1034aec4h19b9d93a3f74cc33@mail.gmail.com> <45EC0FEF.7050308@sendu.me.uk> <44255ea80703050456t7bddc7d6j6afe24776f039df@mail.gmail.com> <45EC14B4.5040002@sendu.me.uk> <44255ea80703050543m6bdb4e5bq2c32b1bb6e200eb8@mail.gmail.com> <3196D1F3-7039-4E5B-BDFB-F7BDCEBD616F@uiuc.edu> <45EECDF9.4050204@sendu.me.uk> Message-ID: Yep, fixed now. Thanks Sendu! chris On Mar 7, 2007, at 8:36 AM, Sendu Bala wrote: > Chris Fields wrote: >> Good to hear. Sendu's fix unfortunately doesn't work for the bug >> Torsten posted in Bugzilla (which are BLASTP reports, I believe), >> but I'll try a clean update to see if it changes anything. > > Those warnings should now be fixed as well. Let me know if not. > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bertrand.beckert at gmail.com Wed Mar 7 11:26:36 2007 From: bertrand.beckert at gmail.com (bertrand beckert) Date: Wed, 7 Mar 2007 17:26:36 +0100 Subject: [Bioperl-l] problem for Bio::Tools::Run::RNAMotif Message-ID: <500217090703070826m66e29f52k83a06866738df9d6@mail.gmail.com> hello, I am using RNAmotif suite of programs... I just try the module Bio::Tools::Run::RNAMotif, the lastest version ( RNAMotif.pm,v 1.4 2007/02/07--) In order to test how it function i just try the short example... but as you can imagine it don't work.... error: laptop:~/rnamot_test$ ./test_rnamotif.pl Can't locate object method "_set_from_args" via package "Bio::Tools::Run::RNAMotif" at /usr/local/share/perl/5.8.8/Bio/Tools/Run/RNAMotif.pm line 176, line 1. laptop:~/rnamot_test$ could you help me? 
thanks Bertrand BECKERT IBMC - UPR 9002 du CNRS - ARN 15, rue Rene Descartes F-67084 STRASBOURG Cedex bertrand.beckert at gmail.com From cjfields at uiuc.edu Wed Mar 7 11:55:31 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 7 Mar 2007 10:55:31 -0600 Subject: [Bioperl-l] problem for Bio::Tools::Run::RNAMotif In-Reply-To: <500217090703070826m66e29f52k83a06866738df9d6@mail.gmail.com> References: <500217090703070826m66e29f52k83a06866738df9d6@mail.gmail.com> Message-ID: <525AA65D-EEF8-4BF3-8DD7-3155E3D86E3F@uiuc.edu> If you're running Bio::Tools::Run::RNAMotif (which is only in CVS currently) and BioPerl 1.5.2 you're essentially mixing versions, so I think you'll need to update your core bioperl from CVS. I can't remember precisely but I think _set_from_args() was committed to CVS after the last BioPerl release (1.5.2); the last bioperl-run release was also 1.5.2, which lacked my wrapper modules for RNAMotif, Infernal, etc. Note I'm still working on those modules (Infernal, RNAMotif, ERPIN, etc), hence the reason they weren't in the last release. It's possible I may change some of the interface design to make it more straightforward (ERPIN and Infernal in particular). chris PS- let me know how it works out. On Mar 7, 2007, at 10:26 AM, bertrand beckert wrote: > hello, > > I am using RNAmotif suite of programs... > I just try the module Bio::Tools::Run::RNAMotif, the lastest version ( > RNAMotif.pm,v 1.4 2007/02/07--) > > In order to test how it function i just try the short example... > but as > you can imagine it don't work.... > > error: > laptop:~/rnamot_test$ ./test_rnamotif.pl > Can't locate object method "_set_from_args" via package > "Bio::Tools::Run::RNAMotif" at > /usr/local/share/perl/5.8.8/Bio/Tools/Run/RNAMotif.pm line 176, > line > 1. > laptop:~/rnamot_test$ > > could you help me? 
> > thanks > > > Bertrand BECKERT > IBMC - UPR 9002 du CNRS - ARN > 15, rue Rene Descartes > F-67084 STRASBOURG Cedex > > bertrand.beckert at gmail.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Wed Mar 7 11:59:54 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 07 Mar 2007 16:59:54 +0000 Subject: [Bioperl-l] problem for Bio::Tools::Run::RNAMotif In-Reply-To: <500217090703070826m66e29f52k83a06866738df9d6@mail.gmail.com> References: <500217090703070826m66e29f52k83a06866738df9d6@mail.gmail.com> Message-ID: <45EEEF8A.2040107@sendu.me.uk> bertrand beckert wrote: > hello, > > I am using RNAmotif suite of programs... > I just try the module Bio::Tools::Run::RNAMotif, the lastest version ( > RNAMotif.pm,v 1.4 2007/02/07--) > > In order to test how it function i just try the short example... but as > you can imagine it don't work.... > > error: > laptop:~/rnamot_test$ ./test_rnamotif.pl > Can't locate object method "_set_from_args" via package > "Bio::Tools::Run::RNAMotif" at > /usr/local/share/perl/5.8.8/Bio/Tools/Run/RNAMotif.pm line 176, line > 1. > laptop:~/rnamot_test$ > > could you help me? You'll also need to update Bio::Root::RootI and Bio::Tools::Run::WrapperBase from CVS. From n.haigh at sheffield.ac.uk Thu Mar 8 06:49:20 2007 From: n.haigh at sheffield.ac.uk (Nathan Haigh) Date: Thu, 08 Mar 2007 11:49:20 +0000 Subject: [Bioperl-l] Devel::Cover Message-ID: <45EFF840.6010300@sheffield.ac.uk> I seem to remember Devel::Cover being mentioned somewhere as a means to see how well BioPerl code coverage was with tests. 
I've just been having a play around with this for an application I'm writing, and the code coverage metrics it produces are/have been pretty useful in determining where additional tests are needed in order to fully stress test my modules/application. I was just wondering if anyone got around to looking at this for BioPerl? I'm just running the tests at the moment on bioperl-live. If anyone is interested in the results, I can put them on a webserver? Cheers Nath From cjfields at uiuc.edu Thu Mar 8 07:13:52 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 8 Mar 2007 06:13:52 -0600 Subject: [Bioperl-l] Devel::Cover In-Reply-To: <45EFF840.6010300@sheffield.ac.uk> References: <45EFF840.6010300@sheffield.ac.uk> Message-ID: <4E14E34B-1294-4FD6-8110-C3CEFD6ACDE8@uiuc.edu> On Mar 8, 2007, at 5:49 AM, Nathan Haigh wrote: > I seem to remember Devel::Cover being mentioned somewhere as a > means to > see how well BioPerl code coverage was with tests. > > I've just been having a play around with this for an application I'm > writing, and the code coverage metrics it produces are/have been > pretty > useful in determining where additional tests are needed in order to > fully stress test my modules/application. > > I was just wondering if anyone got around to looking at this for > BioPerl? I'm just running the tests at the moment on bioperl-live. If > anyone is interested in the results, I can put them on a webserver? > > Cheers > Nath I guess it depends on how verbose the output is. We could add it to the wiki if it were small enough, or add it as a routinely updated archived file elsewhere. If you have a script you could add it to bioperl-live in /maintenance. Glancing through tests while updating makes me think we should come up with an advanced HOWTO for writing tests that encompasses Devel::Cover and Test::More. If you have any notes re: Devel::Cover you could start up a new page and add them in. 
chris

From bix at sendu.me.uk Thu Mar 8 07:26:51 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 08 Mar 2007 12:26:51 +0000
Subject: [Bioperl-l] Devel::Cover
In-Reply-To: <45EFF840.6010300@sheffield.ac.uk>
References: <45EFF840.6010300@sheffield.ac.uk>
Message-ID: <45F0010B.7000101@sendu.me.uk>

Nathan Haigh wrote:
> I seem to remember Devel::Cover being mentioned somewhere as a means to
> see how well BioPerl code coverage was with tests.
>
> I've just been having a play around with this for an application I'm
> writing, and the code coverage metrics it produces are/have been pretty
> useful in determining where additional tests are needed in order to
> fully stress test my modules/application.
>
> I was just wondering if anyone got around to looking at this for
> BioPerl? I'm just running the tests at the moment on bioperl-live. If
> anyone is interested in the results, I can put them on a webserver?

I was going to get round to it eventually.

Build.PL supports these things, making use of Devel::Cover et al.:

./Build help testcover
./Build help testpod
./Build help testpodcoverage

I started running

./Build testcover

and its amazingly slow. So if you run it and capture the output
somewhere that might be useful.


Cheers,
Sendu.

From n.haigh at sheffield.ac.uk Thu Mar 8 07:48:53 2007
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Thu, 08 Mar 2007 12:48:53 +0000
Subject: [Bioperl-l] Devel::Cover
In-Reply-To: <45F0010B.7000101@sendu.me.uk>
References: <45EFF840.6010300@sheffield.ac.uk> <45F0010B.7000101@sendu.me.uk>
Message-ID: <45F00635.7010000@sheffield.ac.uk>

Sendu Bala wrote:
> Nathan Haigh wrote:
>> I seem to remember Devel::Cover being mentioned somewhere as a means to
>> see how well BioPerl code coverage was with tests.
>> >> I've just been having a play around with this for an application I'm >> writing, and the code coverage metrics it produces are/have been pretty >> useful in determining where additional tests are needed in order to >> fully stress test my modules/application. >> >> I was just wondering if anyone got around to looking at this for >> BioPerl? I'm just running the tests at the moment on bioperl-live. If >> anyone is interested in the results, I can put them on a webserver? > > I was going to get round to it eventually. > > Build.PL supports these things, making use of Devel::Cover et al.: > > ./Build help testcover > ./Build help testpod > ./Build help testpodcoverage > > I started running > > ./Build testcover > > and its amazingly slow. So if you run it and capture the output > somewhere that might be useful. > > > Cheers, > Sendu. Ah, that's another cool thing about Build.PL - I was going about this the long way :-[ I'll post results somewhere shortly and let you know! Nath From cjfields at uiuc.edu Thu Mar 8 08:12:05 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 8 Mar 2007 07:12:05 -0600 Subject: [Bioperl-l] Devel::Cover In-Reply-To: <45F00635.7010000@sheffield.ac.uk> References: <45EFF840.6010300@sheffield.ac.uk> <45F0010B.7000101@sendu.me.uk> <45F00635.7010000@sheffield.ac.uk> Message-ID: <3CB8C952-7493-4B1B-8071-A2B4A5D5A708@uiuc.edu> On Mar 8, 2007, at 6:48 AM, Nathan Haigh wrote: > Sendu Bala wrote: >> ... >> >> I was going to get round to it eventually. >> >> Build.PL supports these things, making use of Devel::Cover et al.: >> >> ./Build help testcover >> ./Build help testpod >> ./Build help testpodcoverage >> >> I started running >> >> ./Build testcover >> >> and its amazingly slow. So if you run it and capture the output >> somewhere that might be useful. >> >> >> Cheers, >> Sendu. > > Ah, that's another cool thing about Build.PL - I was going about this > the long way :-[ > > I'll post results somewhere shortly and let you know! 
>
> Nath

It does seem like a better way. Can you run these on single modules
subsets of modules?

Definitely need to look into Module::Build docs more when I have the
time...

chris

From bix at sendu.me.uk Thu Mar 8 08:20:40 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 08 Mar 2007 13:20:40 +0000
Subject: [Bioperl-l] Devel::Cover
In-Reply-To: <3CB8C952-7493-4B1B-8071-A2B4A5D5A708@uiuc.edu>
References: <45EFF840.6010300@sheffield.ac.uk>
	<45F0010B.7000101@sendu.me.uk>
	<45F00635.7010000@sheffield.ac.uk>
	<3CB8C952-7493-4B1B-8071-A2B4A5D5A708@uiuc.edu>
Message-ID: <45F00DA8.8030308@sendu.me.uk>

Chris Fields wrote:
>
> On Mar 8, 2007, at 6:48 AM, Nathan Haigh wrote:
>
>> Sendu Bala wrote:
>
>>> ./Build testcover
>
>> Ah, that's another cool thing about Build.PL - I was going about this
>> the long way :-[
>
> It does seem like a better way. Can you run these on single modules
> subsets of modules?

Yes, same syntax as normal testing.

./Build testcover --test_files t/Ontology.t --test_files t/Map.t

> Definitely need to look into Module::Build docs more when I have the
> time...

I've no idea how to interpret the resulting coverage.html file either.
Must read the Devel docs...

From n.haigh at sheffield.ac.uk Thu Mar 8 08:23:23 2007
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Thu, 08 Mar 2007 13:23:23 +0000
Subject: [Bioperl-l] Devel::Cover
In-Reply-To: <45F00DA8.8030308@sendu.me.uk>
References: <45EFF840.6010300@sheffield.ac.uk>
	<45F0010B.7000101@sendu.me.uk>
	<45F00635.7010000@sheffield.ac.uk>
	<3CB8C952-7493-4B1B-8071-A2B4A5D5A708@uiuc.edu>
	<45F00DA8.8030308@sendu.me.uk>
Message-ID: <45F00E4B.1030404@sheffield.ac.uk>

-- snip --
>>
>> It does seem like a better way. Can you run these on single modules
>> subsets of modules?
>
> Yes, same syntax as normal testing.
>
> ./Build testcover --test_files t/Ontology.t --test_files t/Map.t
>
>
>> Definitely need to look into Module::Build docs more when I have the
>> time...
>
> I've no idea how to interpret the resulting coverage.html file either.
> Must read the Devel docs...

Thanks for that Sendu!

For some explaination of the coverage see:
Devel::Cover::Tutorial
http://search.cpan.org/dist/Devel-Cover/lib/Devel/Cover/Tutorial.pod

Nath

From n.haigh at sheffield.ac.uk Thu Mar 8 09:00:39 2007
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Thu, 08 Mar 2007 14:00:39 +0000
Subject: [Bioperl-l] Devel::Cover
In-Reply-To: <45F00DA8.8030308@sendu.me.uk>
References: <45EFF840.6010300@sheffield.ac.uk>
	<45F0010B.7000101@sendu.me.uk>
	<45F00635.7010000@sheffield.ac.uk>
	<3CB8C952-7493-4B1B-8071-A2B4A5D5A708@uiuc.edu>
	<45F00DA8.8030308@sendu.me.uk>
Message-ID: <45F01707.1040109@sheffield.ac.uk>

Find the Devel::Cover report for bioperl-live here:

http://www.bioinf.shef.ac.uk/public/bioperl-live/cover_db/coverage.html

First things to note are:
1) There appear to be quite a few modules without POD for all methods
(pod column).
2) The test suite doesn't test all subs in the modules (sub column).

More rigorous code coverage would be achieved by ensuring tests were
designed to execute all statements (stmt column). Details of the other
columns can be found in Devel::Cover::Tutorial but essentially inform
how well all the different possible routes in condition statements are
covered by the tests.

Have fun!
Nath

From bix at sendu.me.uk Thu Mar 8 09:20:16 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 08 Mar 2007 14:20:16 +0000
Subject: [Bioperl-l] Devel::Cover
In-Reply-To: <45F01707.1040109@sheffield.ac.uk>
References: <45EFF840.6010300@sheffield.ac.uk>
	<45F0010B.7000101@sendu.me.uk>
	<45F00635.7010000@sheffield.ac.uk>
	<3CB8C952-7493-4B1B-8071-A2B4A5D5A708@uiuc.edu>
	<45F00DA8.8030308@sendu.me.uk>
	<45F01707.1040109@sheffield.ac.uk>
Message-ID: <45F01BA0.3070400@sendu.me.uk>

Nathan Haigh wrote:
> Find the Devel::Cover report for bioperl-live here:
>
> http://www.bioinf.shef.ac.uk/public/bioperl-live/cover_db/coverage.html

Thanks for that.
> First things to note are: > 1) There appear to be quite a few modules without POD for all methods > (pod column). > 2) The test suite doesn't test all subs in the modules (sub column). > > More rigorous code coverage would be achieved by ensuring tests were > designed to execute all statements (stmt column). Details of the other > columns can be found in Devel::Cover::Tutorial but essentially inform > how well all the different possible routes in condition statements are > covered by the tests. > > Have fun! Really interesting, but fun? I'm actually kind of frightened by it ;) In some crazy dream I'd like to see the pod and sub columns at 100% for all modules in time for Bioperl 1.6. From a brief scan though it seems like an incredible amount of work would be needed. Doing the POD would be relatively easy. Does anyone feel inspired to take on that particular challenge I wonder? PS. Did you run this with BIOPERLDEBUG=1 ? I have t/RemoteBlast.t hanging on me atm. From n.haigh at sheffield.ac.uk Thu Mar 8 09:27:49 2007 From: n.haigh at sheffield.ac.uk (Nathan Haigh) Date: Thu, 08 Mar 2007 14:27:49 +0000 Subject: [Bioperl-l] Devel::Cover In-Reply-To: <45F01BA0.3070400@sendu.me.uk> References: <45EFF840.6010300@sheffield.ac.uk> <45F0010B.7000101@sendu.me.uk> <45F00635.7010000@sheffield.ac.uk> <3CB8C952-7493-4B1B-8071-A2B4A5D5A708@uiuc.edu> <45F00DA8.8030308@sendu.me.uk> <45F01707.1040109@sheffield.ac.uk> <45F01BA0.3070400@sendu.me.uk> Message-ID: <45F01D65.70302@sheffield.ac.uk> Sendu Bala wrote: > Nathan Haigh wrote: >> Find the Devel::Cover report for bioperl-live here: >> >> http://www.bioinf.shef.ac.uk/public/bioperl-live/cover_db/coverage.html > > Thanks for that. > > >> First things to note are: >> 1) There appear to be quite a few modules without POD for all methods >> (pod column). >> 2) The test suite doesn't test all subs in the modules (sub column). 
>>
>> More rigorous code coverage would be achieved by ensuring tests were
>> designed to execute all statements (stmt column). Details of the other
>> columns can be found in Devel::Cover::Tutorial but essentially inform
>> how well all the different possible routes in condition statements are
>> covered by the tests.
>>
>> Have fun!
>
> Really interesting, but fun? I'm actually kind of frightened by it ;)
> In some crazy dream I'd like to see the pod and sub columns at 100%
> for all modules in time for Bioperl 1.6. From a brief scan though it
> seems like an incredible amount of work would be needed.
>
> Doing the POD would be relatively easy. Does anyone feel inspired to
> take on that particular challenge I wonder?
>
>
> PS. Did you run this with BIOPERLDEBUG=1 ? I have t/RemoteBlast.t
> hanging on me atm.

Yep - ideally, those two columns should all be at 100%. Yes I ran it
with BIOPERLDEBUG=1

I didn't watch the output for any anomalies, although I did get a couple
of fails:

Failed Test        Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/HtSNP.t             2   512     7   14 200.00%  1-7
t/alignUtilities.t  255 65280    35   42 120.00%  15-35

I've just added some JavaScript to allow the table to be sorted by any
column - a bit easier to find those low % modules now 8-) - I'm
notifying the authors of Devel::Cover as it's nice feature if they
include it in their code.

I'll have a look at addressing some of the pod issues as and when I get
a bit of time.
Nath From dmessina at wustl.edu Thu Mar 8 09:49:24 2007 From: dmessina at wustl.edu (Dave Messina) Date: Thu, 08 Mar 2007 08:49:24 -0600 Subject: [Bioperl-l] Devel::Cover In-Reply-To: <45F01D65.70302@sheffield.ac.uk> References: <45EFF840.6010300@sheffield.ac.uk> <45F0010B.7000101@sendu.me.uk> <45F00635.7010000@sheffield.ac.uk> <3CB8C952-7493-4B1B-8071-A2B4A5D5A708@uiuc.edu> <45F00DA8.8030308@sendu.me.uk> <45F01707.1040109@sheffield.ac.uk> <45F01BA0.3070400@sendu.me.uk> <45F01D65.70302@sheffield.ac.uk> Message-ID: <45F02274.4020205@wustl.edu> Great idea, Nathan -- lay bare the code metrics to spur us into action. :) > I'll have a look at addressing some of the pod issues as and when I get > a bit of time. > I'm not sure how much my schedule will permit, but I will chip in on the pods and subs, too. Dave From cjfields at uiuc.edu Thu Mar 8 09:50:45 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 8 Mar 2007 08:50:45 -0600 Subject: [Bioperl-l] Devel::Cover In-Reply-To: <45F01D65.70302@sheffield.ac.uk> References: <45EFF840.6010300@sheffield.ac.uk> <45F0010B.7000101@sendu.me.uk> <45F00635.7010000@sheffield.ac.uk> <3CB8C952-7493-4B1B-8071-A2B4A5D5A708@uiuc.edu> <45F00DA8.8030308@sendu.me.uk> <45F01707.1040109@sheffield.ac.uk> <45F01BA0.3070400@sendu.me.uk> <45F01D65.70302@sheffield.ac.uk> Message-ID: <41618AA5-7642-4AC5-BF50-1F163467BE91@uiuc.edu> On Mar 8, 2007, at 8:27 AM, Nathan Haigh wrote: > Sendu Bala wrote: >> ... >> PS. Did you run this with BIOPERLDEBUG=1 ? I have t/RemoteBlast.t >> hanging on me atm. > > Yep - ideally, those two columns should all be at 100%. 
Yes I ran it > with BIOPERLDEBUG=1 > > I didn't watch the output for any anomalies, although I did get a > couple > of fails: > > Failed Test Stat Wstat Total Fail Failed List of Failed > ---------------------------------------------------------------------- > --------- > t/HtSNP.t 2 512 7 14 200.00% 1-7 > t/alignUtilities.t 255 65280 35 42 120.00% 15-35 The alignUtilities.t flop is my fault; forgot to remove those when I changed bracket_strings to a SimpleAlign class method. I committed a change to CVS and they should pass now. I'll add those tests back to SimpleAlign when I can. As for other work, I still have more work to do on the RNA_SearchIO tests since they're throwing warnings due to seqfeatures w/o sequences, but they should all pass. The SNP tests pass on my end, but I really need to run a clean checkout and rerun tests. > I've just added some JavaScript to allow the table to be sorted by any > column - a bit easier to find those low % modules now 8-) - I'm > notifying the authors of Devel::Cover as it's nice feature if they > include it in their code. > > I'll have a look at addressing some of the pod issues as and when I > get > a bit of time. > Nath I like that! You should talk to Jason, Mauricio, or Chris D. about adding this to the bioperl website with live updates (like PDOC or Deobfuscator). It would be a valuable resource and give us close to real-time progress on what modules need work, etc. 
chris From n.haigh at sheffield.ac.uk Thu Mar 8 09:53:14 2007 From: n.haigh at sheffield.ac.uk (Nathan Haigh) Date: Thu, 08 Mar 2007 14:53:14 +0000 Subject: [Bioperl-l] Devel::Cover In-Reply-To: <45F01D65.70302@sheffield.ac.uk> References: <45EFF840.6010300@sheffield.ac.uk> <45F0010B.7000101@sendu.me.uk> <45F00635.7010000@sheffield.ac.uk> <3CB8C952-7493-4B1B-8071-A2B4A5D5A708@uiuc.edu> <45F00DA8.8030308@sendu.me.uk> <45F01707.1040109@sheffield.ac.uk> <45F01BA0.3070400@sendu.me.uk> <45F01D65.70302@sheffield.ac.uk> Message-ID: <45F0235A.7080808@sheffield.ac.uk> -- snip -- > > I've just added some JavaScript to allow the table to be sorted by any > column - a bit easier to find those low % modules now 8-) - I'm > notifying the authors of Devel::Cover as it's nice feature if they > include it in their code. > Should have checked before posting - the sorting should work correctly now! From lubapardo at gmail.com Thu Mar 8 09:56:06 2007 From: lubapardo at gmail.com (Luba Pardo) Date: Thu, 8 Mar 2007 15:56:06 +0100 Subject: [Bioperl-l] Simplealign question Message-ID: <58ff33550703080656hd4c9979w76d867aa8575bc32@mail.gmail.com> Hello all, I am trying to understand the objects of the Bio::SimpleAlign module. I read in the doc of the module that you get an array of Seq objects if you use the each_seq method. So, can you get back the sequences used in the alignment from the align object using the each_seq method? For example: # Extract sequences and check values for the alignment column $pos (this is part of the doc. example) foreach $seq ($aln->each_seq) { } what the $seq is? I printed and is a locatableseq object, but if it is a Seq object, should I be able to get the actual seq string? I tried something like ... my $seq_obj= Bio::Seq->new(); my $x = $seq_obj->seq; but it does not print anything. Does someone know if this makes sense? Regards, L. 
Pardo From cjfields at uiuc.edu Thu Mar 8 10:08:12 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 8 Mar 2007 09:08:12 -0600 Subject: [Bioperl-l] Devel::Cover In-Reply-To: <45F01BA0.3070400@sendu.me.uk> References: <45EFF840.6010300@sheffield.ac.uk> <45F0010B.7000101@sendu.me.uk> <45F00635.7010000@sheffield.ac.uk> <3CB8C952-7493-4B1B-8071-A2B4A5D5A708@uiuc.edu> <45F00DA8.8030308@sendu.me.uk> <45F01707.1040109@sheffield.ac.uk> <45F01BA0.3070400@sendu.me.uk> Message-ID: On Mar 8, 2007, at 8:20 AM, Sendu Bala wrote: > Nathan Haigh wrote: >> Find the Devel::Cover report for bioperl-live here: >> http://www.bioinf.shef.ac.uk/public/bioperl-live/cover_db/ >> coverage.html > > Thanks for that. > > >> First things to note are: >> 1) There appear to be quite a few modules without POD for all methods >> (pod column). >> 2) The test suite doesn't test all subs in the modules (sub column). >> More rigorous code coverage would be achieved by ensuring tests were >> designed to execute all statements (stmt column). Details of the >> other >> columns can be found in Devel::Cover::Tutorial but essentially inform >> how well all the different possible routes in condition statements >> are >> covered by the tests. >> Have fun! > > Really interesting, but fun? I'm actually kind of frightened by it ;) > In some crazy dream I'd like to see the pod and sub columns at 100% > for all modules in time for Bioperl 1.6. From a brief scan though > it seems like an incredible amount of work would be needed. I'm looking at these not as absolute values as much as a relative indicator of areas that need improvement, but the two areas you mention (pod, sub) should be higher. Things like branch and condition coverage will be much harder to get to 100% and may be unrealistic. Saying that, I think we could aim for threshold values, like have a minimum of ~75-80% for the total, or have a minimal coverage of 80-90% for specific things like subs, POD, etc, with the gold standard being 100%. 
> Doing the POD would be relatively easy. Does anyone feel inspired
> to take on that particular challenge I wonder?

Docs? That should be up to the developer, but we know how that goes ; >

I think the issue with POD is laid out in the project priority list, but maybe it should be moved up.

> PS. Did you run this with BIOPERLDEBUG=1 ? I have t/RemoteBlast.t
> hanging on me atm.

chris

From gyang at plantbio.uga.edu Wed Mar 7 12:24:06 2007
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Wed, 07 Mar 2007 12:24:06 -0500
Subject: [Bioperl-l] How to run Blast with a user defined DNA substitution scoring matrix?
In-Reply-To: 20060203194450.792e8d4e@dogwood.plantbio.uga.edu
Message-ID: <20070307172406.8e82d12a@dogwood.plantbio.uga.edu>

Hi, All,
I need to run blast using a user defined DNA scoring matrix (may sound funny but I am really serious). Can anybody give me a hint on it?
Thanks a lot,
Guojun Yang

From bix at sendu.me.uk Thu Mar 8 11:32:51 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 08 Mar 2007 16:32:51 +0000
Subject: [Bioperl-l] Simplealign question
In-Reply-To: <58ff33550703080656hd4c9979w76d867aa8575bc32@mail.gmail.com>
References: <58ff33550703080656hd4c9979w76d867aa8575bc32@mail.gmail.com>
Message-ID: <45F03AB3.7050506@sendu.me.uk>

Luba Pardo wrote:
> Hello all,
> I am trying to understand the objects of the Bio::SimpleAlign module. I read
> in the doc of the module that you get an array of Seq objects if you use the
> each_seq method.
> So, can you get back the sequences used in the alignment from the align
> object using the each_seq method? For example:
>
> # Extract sequences and check values for the alignment column $pos (this is
> part of the doc. example)
> foreach $seq ($aln->each_seq) {
>
> }
> what the $seq is? I printed and is a locatableseq object, but if it is a Seq
> object, should I be able to get the actual seq string? I tried something
> like
> ...
> my $seq_obj= Bio::Seq->new(); > > my $x = $seq_obj->seq; > but it does not print anything. You just created a new Bio::Seq object and didn't supply any sequence to it, so of course asking for the sequence isn't going to give you anything. When you get a $seq from each_seq() in your foreach loop, $seq is already a Bio::LocatableSeq as you already discovered (which inherits from Bio::PrimarySeq) so you don't create a new one. You just use it via the methods it has: http://doc.bioperl.org/bioperl-live/Bio/PrimarySeq.html foreach $seq ($aln->each_seq) { print $seq->seq, "\n"; } I think it will be very beneficial to you to go away and learn about basic object oriented programming: http://perldoc.perl.org/perlboot.html http://perldoc.perl.org/perltoot.html or even non-perl-specific resources. From n.haigh at sheffield.ac.uk Thu Mar 8 11:38:31 2007 From: n.haigh at sheffield.ac.uk (Nathan Haigh) Date: Thu, 08 Mar 2007 16:38:31 +0000 Subject: [Bioperl-l] new/unannounced methods Message-ID: <45F03C07.6040303@sheffield.ac.uk> I've come across a couple of methods that are commented as being new/unannounced e.g.: Bio::Location::Atomic::trunc As it's not been documented with POD it reduces the POD coverage metric. I wondered if it makes sense to have these types of methods initially made private with the use of a leading underscore until it's unveiling? This way, programmers know not to touch the method from outside the module and the POD coverage doesn't see it as being missed. Any thoughts? 
Nath From bix at sendu.me.uk Thu Mar 8 11:44:05 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 08 Mar 2007 16:44:05 +0000 Subject: [Bioperl-l] Devel::Cover In-Reply-To: <41618AA5-7642-4AC5-BF50-1F163467BE91@uiuc.edu> References: <45EFF840.6010300@sheffield.ac.uk> <45F0010B.7000101@sendu.me.uk> <45F00635.7010000@sheffield.ac.uk> <3CB8C952-7493-4B1B-8071-A2B4A5D5A708@uiuc.edu> <45F00DA8.8030308@sendu.me.uk> <45F01707.1040109@sheffield.ac.uk> <45F01BA0.3070400@sendu.me.uk> <45F01D65.70302@sheffield.ac.uk> <41618AA5-7642-4AC5-BF50-1F163467BE91@uiuc.edu> Message-ID: <45F03D55.8040000@sendu.me.uk> Chris Fields wrote: > > On Mar 8, 2007, at 8:27 AM, Nathan Haigh wrote: [ ./Build testcover -> http://www.bioinf.shef.ac.uk/public/bioperl-live/cover_db/coverage.html ] >> I've just added some JavaScript to allow the table to be sorted by any >> column - a bit easier to find those low % modules now 8-) - I'm >> notifying the authors of Devel::Cover as it's nice feature if they >> include it in their code. >> >> I'll have a look at addressing some of the pod issues as and when I get >> a bit of time. >> Nath > > I like that! You should talk to Jason, Mauricio, or Chris D. about > adding this to the bioperl website with live updates (like PDOC or > Deobfuscator). It would be a valuable resource and give us close to > real-time progress on what modules need work, etc. Seconded. Can we make this happen please? 
From cjfields at uiuc.edu Thu Mar 8 11:51:41 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 8 Mar 2007 10:51:41 -0600 Subject: [Bioperl-l] Devel::Cover In-Reply-To: <45F03D55.8040000@sendu.me.uk> References: <45EFF840.6010300@sheffield.ac.uk> <45F0010B.7000101@sendu.me.uk> <45F00635.7010000@sheffield.ac.uk> <3CB8C952-7493-4B1B-8071-A2B4A5D5A708@uiuc.edu> <45F00DA8.8030308@sendu.me.uk> <45F01707.1040109@sheffield.ac.uk> <45F01BA0.3070400@sendu.me.uk> <45F01D65.70302@sheffield.ac.uk> <41618AA5-7642-4AC5-BF50-1F163467BE91@uiuc.edu> <45F03D55.8040000@sendu.me.uk> Message-ID: <9FFEF20D-4C58-4A9B-A0B1-CDF22B377935@uiuc.edu> On Mar 8, 2007, at 10:44 AM, Sendu Bala wrote: > Chris Fields wrote: >> On Mar 8, 2007, at 8:27 AM, Nathan Haigh wrote: > > [ > ./Build testcover > > -> > > http://www.bioinf.shef.ac.uk/public/bioperl-live/cover_db/ > coverage.html (took the support email off the response)... I did notice a number of modules are missing (Bio::DB::EUtilities* for instance). Wonder why that happened... chris From n.haigh at sheffield.ac.uk Thu Mar 8 11:55:23 2007 From: n.haigh at sheffield.ac.uk (Nathan Haigh) Date: Thu, 08 Mar 2007 16:55:23 +0000 Subject: [Bioperl-l] Devel::Cover In-Reply-To: <45F03D55.8040000@sendu.me.uk> References: <45EFF840.6010300@sheffield.ac.uk> <45F0010B.7000101@sendu.me.uk> <45F00635.7010000@sheffield.ac.uk> <3CB8C952-7493-4B1B-8071-A2B4A5D5A708@uiuc.edu> <45F00DA8.8030308@sendu.me.uk> <45F01707.1040109@sheffield.ac.uk> <45F01BA0.3070400@sendu.me.uk> <45F01D65.70302@sheffield.ac.uk> <41618AA5-7642-4AC5-BF50-1F163467BE91@uiuc.edu> <45F03D55.8040000@sendu.me.uk> Message-ID: <45F03FFB.1040601@sheffield.ac.uk> -- snip -- >> >> I like that! You should talk to Jason, Mauricio, or Chris D. about >> adding this to the bioperl website with live updates (like PDOC or >> Deobfuscator). It would be a valuable resource and give us close to >> real-time progress on what modules need work, etc. > > Seconded. Can we make this happen please? 
> Would you like me to do anything? Nath From bix at sendu.me.uk Thu Mar 8 11:55:41 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 08 Mar 2007 16:55:41 +0000 Subject: [Bioperl-l] Devel::Cover In-Reply-To: <9FFEF20D-4C58-4A9B-A0B1-CDF22B377935@uiuc.edu> References: <45EFF840.6010300@sheffield.ac.uk> <45F0010B.7000101@sendu.me.uk> <45F00635.7010000@sheffield.ac.uk> <3CB8C952-7493-4B1B-8071-A2B4A5D5A708@uiuc.edu> <45F00DA8.8030308@sendu.me.uk> <45F01707.1040109@sheffield.ac.uk> <45F01BA0.3070400@sendu.me.uk> <45F01D65.70302@sheffield.ac.uk> <41618AA5-7642-4AC5-BF50-1F163467BE91@uiuc.edu> <45F03D55.8040000@sendu.me.uk> <9FFEF20D-4C58-4A9B-A0B1-CDF22B377935@uiuc.edu> Message-ID: <45F0400D.2020108@sendu.me.uk> Chris Fields wrote: > > On Mar 8, 2007, at 10:44 AM, Sendu Bala wrote: >> [ >> ./Build testcover >> >> -> >> >> http://www.bioinf.shef.ac.uk/public/bioperl-live/cover_db/coverage.html > > (took the support email off the response)... > > I did notice a number of modules are missing (Bio::DB::EUtilities* for > instance). Wonder why that happened... No idea. I did a quick one for myself with BIOPERLDEBUG=0 and got results for EUtilities: http://bix.sendu.me.uk/bioperl_live_cover/coverage.html (It's already out of date though - nice catches on the wrong method names Nathan!) From dmessina at wustl.edu Thu Mar 8 11:56:13 2007 From: dmessina at wustl.edu (Dave Messina) Date: Thu, 08 Mar 2007 10:56:13 -0600 Subject: [Bioperl-l] new/unannounced methods In-Reply-To: <45F03C07.6040303@sheffield.ac.uk> References: <45F03C07.6040303@sheffield.ac.uk> Message-ID: <45F0402D.6030103@wustl.edu> Nathan Haigh wrote: > I've come across a couple of methods that are commented as being > new/unannounced e.g.: > Bio::Location::Atomic::trunc > > As it's not been documented with POD it reduces the POD coverage metric. > I wondered if it makes sense to have these types of methods initially > made private with the use of a leading underscore until it's unveiling? 
> This way, programmers know not to touch the method from outside the > module and the POD coverage doesn't see it as being missed. > Maybe add POD that it's experimental and convert Jason's comments into the POD description? From cjfields at uiuc.edu Thu Mar 8 11:59:20 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 8 Mar 2007 10:59:20 -0600 Subject: [Bioperl-l] new/unannounced methods In-Reply-To: <45F03C07.6040303@sheffield.ac.uk> References: <45F03C07.6040303@sheffield.ac.uk> Message-ID: <639699B6-0C26-4CCD-9DF2-C0B0BDAD97E3@uiuc.edu> On Mar 8, 2007, at 10:38 AM, Nathan Haigh wrote: > I've come across a couple of methods that are commented as being > new/unannounced e.g.: > Bio::Location::Atomic::trunc See: http://bugzilla.open-bio.org/show_bug.cgi?id=1572 > As it's not been documented with POD it reduces the POD coverage > metric. > I wondered if it makes sense to have these types of methods initially > made private with the use of a leading underscore until it's > unveiling? > This way, programmers know not to touch the method from outside the > module and the POD coverage doesn't see it as being missed. > > Any thoughts? > > Nath That would be cheating, wouldn't it? ; > I think if a method is intended for public consumption it should be vetted via Devel::Cover. If it is obviously meant to be a private method then it should get the underscore and be passed over. trunc(), judging from the bug report, doesn't seem to be a private method (i.e. the idea was that one could obtain a truncated location using this method). If anything it needs tests and documentation, but apparently it doesn't work as intended (or does it?). 
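
A minimal sketch of the "add POD that says it's experimental" suggestion made in this thread. It does not reproduce Bio::Location::Atomic::trunc: the My::Location class, its constructor arguments, and the two-argument trunc() signature below are all hypothetical stand-ins, used only to show what a BioPerl-style POD entry for a method of uncertain status might look like.

```perl
package My::Location;    # hypothetical class, NOT Bio::Location::Atomic
use strict;
use warnings;

# Tiny constructor and accessors so the example can run on its own.
sub new {
    my ($class, %args) = @_;
    return bless { start => $args{-start}, end => $args{-end} }, $class;
}

sub start { $_[0]->{start} }
sub end   { $_[0]->{end} }

=head2 trunc

 Title   : trunc
 Usage   : my $sub = $loc->trunc($start, $end);
 Function: EXPERIMENTAL. Returns the part of this location that
           overlaps the window $start..$end. The interface may
           change or disappear without notice.
 Returns : a new My::Location, or undef if there is no overlap
 Args    : window start and end, in the same coordinates as $loc

=cut

sub trunc {
    my ($self, $start, $end) = @_;

    # Clip the location to the requested window.
    my $new_start = $self->start > $start ? $self->start : $start;
    my $new_end   = $self->end   < $end   ? $self->end   : $end;

    return if $new_start > $new_end;    # no overlap
    return My::Location->new(-start => $new_start, -end => $new_end);
}

1;
```

With an entry like this the method keeps its public name, so existing callers are not broken, while the documentation states plainly that its status is uncertain.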
chris From bix at sendu.me.uk Thu Mar 8 12:04:16 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 08 Mar 2007 17:04:16 +0000 Subject: [Bioperl-l] new/unannounced methods In-Reply-To: <45F03C07.6040303@sheffield.ac.uk> References: <45F03C07.6040303@sheffield.ac.uk> Message-ID: <45F04210.90204@sendu.me.uk> Nathan Haigh wrote: > I've come across a couple of methods that are commented as being > new/unannounced e.g.: > Bio::Location::Atomic::trunc > > As it's not been documented with POD it reduces the POD coverage metric. > I wondered if it makes sense to have these types of methods initially > made private with the use of a leading underscore until it's unveiling? > This way, programmers know not to touch the method from outside the > module and the POD coverage doesn't see it as being missed. > > Any thoughts? I got the impression people are already trying to use them anyway, so changing the method name may not be ideal. I'd suggest adding POD that made it clear it was a method of dubious status. Thank you for your efforts! From cjfields at uiuc.edu Thu Mar 8 12:04:45 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 8 Mar 2007 11:04:45 -0600 Subject: [Bioperl-l] Devel::Cover In-Reply-To: <45F03FFB.1040601@sheffield.ac.uk> References: <45EFF840.6010300@sheffield.ac.uk> <45F0010B.7000101@sendu.me.uk> <45F00635.7010000@sheffield.ac.uk> <3CB8C952-7493-4B1B-8071-A2B4A5D5A708@uiuc.edu> <45F00DA8.8030308@sendu.me.uk> <45F01707.1040109@sheffield.ac.uk> <45F01BA0.3070400@sendu.me.uk> <45F01D65.70302@sheffield.ac.uk> <41618AA5-7642-4AC5-BF50-1F163467BE91@uiuc.edu> <45F03D55.8040000@sendu.me.uk> <45F03FFB.1040601@sheffield.ac.uk> Message-ID: <761DC49A-46EA-42A5-A0FE-0FC1C4BC7CEA@uiuc.edu> On Mar 8, 2007, at 10:55 AM, Nathan Haigh wrote: > -- snip -- >>> >>> I like that! You should talk to Jason, Mauricio, or Chris D. about >>> adding this to the bioperl website with live updates (like PDOC or >>> Deobfuscator). 
It would be a valuable resource and give us close to >>> real-time progress on what modules need work, etc. >> >> Seconded. Can we make this happen please? >> > > Would you like me to do anything? > > Nath I think you (and Sendu) already have; these responses have been sent along to open-bio support. We may hear from The Powers That Be shortly. chris From n.haigh at sheffield.ac.uk Thu Mar 8 12:17:45 2007 From: n.haigh at sheffield.ac.uk (Nathan Haigh) Date: Thu, 08 Mar 2007 17:17:45 +0000 Subject: [Bioperl-l] new/unannounced methods In-Reply-To: <45F04210.90204@sendu.me.uk> References: <45F03C07.6040303@sheffield.ac.uk> <45F04210.90204@sendu.me.uk> Message-ID: <45F04539.1000804@sheffield.ac.uk> Sendu Bala wrote: > Nathan Haigh wrote: >> I've come across a couple of methods that are commented as being >> new/unannounced e.g.: >> Bio::Location::Atomic::trunc >> >> As it's not been documented with POD it reduces the POD coverage metric. >> I wondered if it makes sense to have these types of methods initially >> made private with the use of a leading underscore until it's unveiling? >> This way, programmers know not to touch the method from outside the >> module and the POD coverage doesn't see it as being missed. >> >> Any thoughts? > > I got the impression people are already trying to use them anyway, so > changing the method name may not be ideal. I'd suggest adding POD that > made it clear it was a method of dubious status. > > Thank you for your efforts! > Done From n.haigh at sheffield.ac.uk Thu Mar 8 12:38:02 2007 From: n.haigh at sheffield.ac.uk (Nathan Haigh) Date: Thu, 08 Mar 2007 17:38:02 +0000 Subject: [Bioperl-l] Devel::Cover Pod metric Message-ID: <45F049FA.6070208@sheffield.ac.uk> In Bio::SearchIO::exponerate There doesn't appear to be POD for write_result(), result_count() or report_count(). However, only report_count is flagged as not having POD - any idea why this might be? 
Nath From cjfields at uiuc.edu Thu Mar 8 13:36:39 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 8 Mar 2007 12:36:39 -0600 Subject: [Bioperl-l] problem for Bio::Tools::Run::RNAMotif In-Reply-To: <500217090703080013w5cbed61bv23edc1502d4d3d5@mail.gmail.com> References: <500217090703070826m66e29f52k83a06866738df9d6@mail.gmail.com> <525AA65D-EEF8-4BF3-8DD7-3155E3D86E3F@uiuc.edu> <500217090703080013w5cbed61bv23edc1502d4d3d5@mail.gmail.com> Message-ID: Let me know if you have any suggestions. Thanks! chris On Mar 8, 2007, at 2:13 AM, bertrand beckert wrote: > yes, it seems to work! > > thank. > > Bertrand > > 2007/3/7, Chris Fields : If you're running > Bio::Tools::Run::RNAMotif (which is only in CVS > currently) and BioPerl 1.5.2 you're essentially mixing versions, so I > think you'll need to update your core bioperl from CVS. I can't > remember precisely but I think _set_from_args() was committed to CVS > after the last BioPerl release (1.5.2); the last bioperl-run release > was also 1.5.2, which lacked my wrapper modules for RNAMotif, > Infernal, etc. > > Note I'm still working on those modules (Infernal, RNAMotif, ERPIN, > etc), hence the reason they weren't in the last release. It's > possible I may change some of the interface design to make it more > straightforward (ERPIN and Infernal in particular). > > chris > > PS- let me know how it works out. > > On Mar 7, 2007, at 10:26 AM, bertrand beckert wrote: > > > hello, > > > > I am using RNAmotif suite of programs... > > I just try the module Bio::Tools::Run::RNAMotif, the lastest > version ( > > RNAMotif.pm,v 1.4 2007/02/07--) > > > > In order to test how it function i just try the short example... > > but as > > you can imagine it don't work.... > > > > error: > > laptop:~/rnamot_test$ ./test_rnamotif.pl > > Can't locate object method "_set_from_args" via package > > "Bio::Tools::Run::RNAMotif" at > > /usr/local/share/perl/5.8.8/Bio/Tools/Run/RNAMotif.pm line 176, > > line > > 1. 
> > laptop:~/rnamot_test$ > > > > could you help me? > > > > thanks > > > > > > Bertrand BECKERT > > IBMC - UPR 9002 du CNRS - ARN > > 15, rue Rene Descartes > > F-67084 STRASBOURG Cedex > > > > bertrand.beckert at gmail.com > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > > > > -- > Bertrand BECKERT > PhD student > IBMC - UPR 9002 du CNRS - ARN > 15, rue Rene Descartes > F-67084 STRASBOURG Cedex > > b.beckert at ibmc.u-strasbg.fr > bertrand.beckert at gmail.com Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From darin.london at duke.edu Tue Mar 6 11:03:59 2007 From: darin.london at duke.edu (Darin London) Date: Tue, 06 Mar 2007 11:03:59 -0500 Subject: [Bioperl-l] Announcing BOSC 2007 Message-ID: <45ED90EF.7030000@duke.edu> The BOSC Organizing Committee are proud to announce BOSC 2007, occurring in Vienna, Austria on July 19th, 20th. The conference this year promises to be exciting, as the BOSC developers attempt to define and solve currently intractable problems in Bioinformatics. Please refer to the following website for complete information, and requests for submissions. Thank you, and we hope to see you in Vienna. http://open-bio.org/wiki/BOSC_2007 The BOSC organizing Committee Please pass this email on to anyone that would be interested. 
From pmiguel at purdue.edu Fri Mar 9 11:03:15 2007 From: pmiguel at purdue.edu (Phillip San Miguel) Date: Fri, 09 Mar 2007 11:03:15 -0500 Subject: [Bioperl-l] new/unannounced methods In-Reply-To: <639699B6-0C26-4CCD-9DF2-C0B0BDAD97E3@uiuc.edu> References: <45F03C07.6040303@sheffield.ac.uk> <639699B6-0C26-4CCD-9DF2-C0B0BDAD97E3@uiuc.edu> Message-ID: <45F18543.2030309@purdue.edu> Chris Fields wrote: > On Mar 8, 2007, at 10:38 AM, Nathan Haigh wrote: > > >> I've come across a couple of methods that are commented as being >> new/unannounced e.g.: >> Bio::Location::Atomic::trunc >> > > See: > > http://bugzilla.open-bio.org/show_bug.cgi?id=1572 > > >> As it's not been documented with POD it reduces the POD coverage >> metric. >> I wondered if it makes sense to have these types of methods initially >> made private with the use of a leading underscore until it's >> unveiling? >> This way, programmers know not to touch the method from outside the >> module and the POD coverage doesn't see it as being missed. >> >> Any thoughts? >> >> Nath >> > > That would be cheating, wouldn't it? ; > > > I think if a method is intended for public consumption it should be > vetted via Devel::Cover. If it is obviously meant to be a private > method then it should get the underscore and be passed over. > > trunc(), judging from the bug report, doesn't seem to be a private > method (i.e. the idea was that one could obtain a truncated location > using this method). If anything it needs tests and documentation, > but apparently it doesn't work as intended (or does it?). > > chris Not sure if this constitutes a namespace collision but Bio::PrimarySeqI has a method trunc(). It is like subseq() but returns an object, rather than a string. Phillip From david.ray at mail.wvu.edu Thu Mar 8 19:41:11 2007 From: david.ray at mail.wvu.edu (daray) Date: Thu, 8 Mar 2007 16:41:11 -0800 (PST) Subject: [Bioperl-l] Repeatmasker scripts Message-ID: <9385755.post@talk.nabble.com> I'm new here and new to BioPerl. 
Please let me know if I am breaking any rules.

Does anyone know of scripts designed to parse repeatmasker output so that split repetitive elements can be recovered as a single row? I tried a search but I am either searching for the wrong terms or there isn't anything to find. Below is an example of what I would like to do but any comparable system would be useful.

This is the output, single elements are split over two lines.

591 7.1 0 2.4 Mluc_cont1.010442 1392 1478 -967 C Tc2_ML1_coding Unknown 0 1296 1212 1
3825 4.7 0.6 1 Mluc_cont1.010442 1470 1959 -486 C Tc2_ML1_coding Unknown -808 488 1 1
1816 7 0 7 Mluc_cont1.010866 3614 3890 -836 C Tc2_ML1_coding Unknown -1037 259 1 2
596 3.6 2.4 2.4 Mluc_cont1.011200 1155 1239 -847 C Tc2_ML1_coding Unknown 0 1296 1212 3
3848 5.2 0.8 0.6 Mluc_cont1.011200 1231 1717 -369 C Tc2_ML1_coding Unknown -808 488 1 3

It would be nice to combine the multiple entries as follows or do something similar.

591 7.1 0 2.4 Mluc_cont1.010442 1392 1959 -967 C Tc2_ML1_coding Unknown 0 1296 1212 1
1816 7 0 7 Mluc_cont1.010866 3614 3890 -836 C Tc2_ML1_coding Unknown -1037 259 1 2
596 3.6 2.4 2.4 Mluc_cont1.011200 1155 1717 -847 C Tc2_ML1_coding Unknown 0 1296 1212 3

Any help would be appreciated.

From jason at bioperl.org Fri Mar 9 12:13:15 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 9 Mar 2007 09:13:15 -0800
Subject: [Bioperl-l] new/unannounced methods
In-Reply-To: <45F18543.2030309@purdue.edu>
References: <45F03C07.6040303@sheffield.ac.uk> <639699B6-0C26-4CCD-9DF2-C0B0BDAD97E3@uiuc.edu> <45F18543.2030309@purdue.edu>
Message-ID:

Different objects, so there is no conflict. The location trunc() is used when truncating a sequence, which helps create the sequence object that you get back from the Seq trunc(), so it makes sense to keep the names the same.
-jason On Mar 9, 2007, at 8:03 AM, Phillip San Miguel wrote: > Chris Fields wrote: >> On Mar 8, 2007, at 10:38 AM, Nathan Haigh wrote: >> >> >>> I've come across a couple of methods that are commented as being >>> new/unannounced e.g.: >>> Bio::Location::Atomic::trunc >>> >> >> See: >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=1572 >> >> >>> As it's not been documented with POD it reduces the POD coverage >>> metric. >>> I wondered if it makes sense to have these types of methods >>> initially >>> made private with the use of a leading underscore until it's >>> unveiling? >>> This way, programmers know not to touch the method from outside the >>> module and the POD coverage doesn't see it as being missed. >>> >>> Any thoughts? >>> >>> Nath >>> >> >> That would be cheating, wouldn't it? ; > >> >> I think if a method is intended for public consumption it should be >> vetted via Devel::Cover. If it is obviously meant to be a private >> method then it should get the underscore and be passed over. >> >> trunc(), judging from the bug report, doesn't seem to be a private >> method (i.e. the idea was that one could obtain a truncated location >> using this method). If anything it needs tests and documentation, >> but apparently it doesn't work as intended (or does it?). >> >> chris > Not sure if this constitutes a namespace collision but > Bio::PrimarySeqI > has a method trunc(). It is like subseq() but returns an object, > rather > than a string. > > Phillip -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/
From mkiwala at watson.wustl.edu Fri Mar 9 12:37:47 2007 From: mkiwala at watson.wustl.edu (Michael Kiwala) Date: Fri, 09 Mar 2007 11:37:47 -0600 Subject: [Bioperl-l] new/unannounced methods In-Reply-To: <45F18543.2030309@purdue.edu> References: <45F03C07.6040303@sheffield.ac.uk> <639699B6-0C26-4CCD-9DF2-C0B0BDAD97E3@uiuc.edu> <45F18543.2030309@purdue.edu> Message-ID: <45F19B6B.7040709@watson.wustl.edu> Phillip San Miguel wrote: > Chris Fields wrote: > >> On Mar 8, 2007, at 10:38 AM, Nathan Haigh wrote: >> >> >> >>> I've come across a couple of methods that are commented as being >>> new/unannounced e.g.: >>> Bio::Location::Atomic::trunc >>> >>> >> See: >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=1572 >> >> >> >>> As it's not been documented with POD it reduces the POD coverage >>> metric. >>> I wondered if it makes sense to have these types of methods initially >>> made private with the use of a leading underscore until it's >>> unveiling? >>> This way, programmers know not to touch the method from outside the >>> module and the POD coverage doesn't see it as being missed. >>> >>> Any thoughts? >>> >>> Nath >>> >>> >> That would be cheating, wouldn't it? ; > >> >> I think if a method is intended for public consumption it should be >> vetted via Devel::Cover. If it is obviously meant to be a private >> method then it should get the underscore and be passed over. >> >> trunc(), judging from the bug report, doesn't seem to be a private >> method (i.e. the idea was that one could obtain a truncated location >> using this method).
If anything it needs tests and documentation, >> but apparently it doesn't work as intended (or does it?). >> >> chris >> > Not sure if this constitutes a namespace collision but Bio::PrimarySeqI > has a method trunc(). It is like subseq() but returns an object, rather > than a string. > > Phillip > Actually, I think it's polymorphism. What I imagine is that eventually you should be able to call trunc() on any PrimarySeqI implementing object and it will trunc() itself and any children it has that also can('trunc'). So if the sequence you are trying to trunc() implements FeatureHolderI like SeqI does then (one day, once this is all coded) the sequence object will also do the Right Thing with the features it holds (currently, it does not). Similarly, revcom(), cat(), and excise()* should work the same way. Is anyone out there currently working on this? If not I am volunteering. I'd like to move Bio::SeqUtils::trunc_with_features() and friends to Bio::SeqI::trunc(), etc. *excise() is a method I'd like to add that would allow one to remove a section out of the middle of a sequence, such as a transposable element that you don't want to submit to GenBank. From xikun.wu at bbsrc.ac.uk Fri Mar 9 12:41:38 2007 From: xikun.wu at bbsrc.ac.uk (xikun wu (IAH-C)) Date: Fri, 9 Mar 2007 17:41:38 -0000 Subject: [Bioperl-l] sub_SeqFeature Message-ID: <2060EFE0-C7DC-43B8-96B8-11AA281BB5D6@bbsrc.ac.uk> Hi Hilmar, Sorry for the unexpected contact! I am using bioperl-1.4 but got problem in using "sub_SeqFeature", nothing was returned from it. Any hints from you will be really appreciated! 
Here are the codes:

################################
use lib "/usr/local/share/bioperl-1.4";
use Bio::Seq;
use Bio::SeqIO;

die "usage: perl test.pl \n" unless @ARGV==1;
my ($embl)=@ARGV;
my $in=Bio::SeqIO->new('-file'=>$embl, '-format' => 'EMBL');
while(my $seq=$in->next_seq())
{
    my @feats=$seq->get_all_SeqFeatures();
    foreach my $feat (@feats)
    {
        if($feat->primary_tag eq "CDS")
        {
            my ($gene_id)=$feat->each_tag_value('gene');
            my @subfeat=$feat->sub_SeqFeature();
            my $exon_number=scalar(@subfeat);
            print"$gene_id\t$exon_number\t@subfeat\n";
        }
    }
}
__END__
##################################

The output is:

gene.46709 0
gene.46836 0
gene.46137 0
gene.46285 0
......

Thank you very much!

Best wishes,
Xikun

****************************
Xikun Wu
PhD in Bioinformatics
Institute for Animal Health
Compton
Nr Newbury
Berkshire, RG20 7NN
U.K.
Tel: 01635 577275
Fax: 01635 577263

From rahall2 at ualr.edu Fri Mar 9 14:34:59 2007
From: rahall2 at ualr.edu (Roger Hall)
Date: Fri, 9 Mar 2007 13:34:59 -0600
Subject: [Bioperl-l] FW: Justice Department: FBI acted illegally on data
Message-ID: <006001c76282$06a25180$4601a8c0@LIBERAL2>

Ouch. Wrong list. Don't you hate it when that happens?
Roger -----Original Message----- From: Roger Hall [mailto:rahall2 at ualr.edu] Sent: Friday, March 09, 2007 1:34 PM To: Bioperl-L (bioperl-l at lists.open-bio.org) Subject: Justice Department: FBI acted illegally on data And I'm just sooooooooooooooooooooo surprised! Who could have ever guessed that power would corrupt? http://www.msnbc.msn.com/id/11100916/ Thanks! Roger Hall Technical Director MidSouth Bioinformatics Center University of Arkansas at Little Rock (501) 569-8074 From rahall2 at ualr.edu Fri Mar 9 14:34:18 2007 From: rahall2 at ualr.edu (Roger Hall) Date: Fri, 9 Mar 2007 13:34:18 -0600 Subject: [Bioperl-l] Justice Department: FBI acted illegally on data Message-ID: <005601c76281$ee30fe80$4601a8c0@LIBERAL2> And I'm just sooooooooooooooooooooo surprised! Who could have ever guessed that power would corrupt? http://www.msnbc.msn.com/id/11100916/ Thanks! Roger Hall Technical Director MidSouth Bioinformatics Center University of Arkansas at Little Rock (501) 569-8074 From hlapp at gmx.net Fri Mar 9 15:09:58 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 9 Mar 2007 15:09:58 -0500 Subject: [Bioperl-l] Fwd: sub_SeqFeature References: <8975119BCD0AC5419D61A9CF1A923E9504604FA1@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <199D9184-E49C-40E5-B140-03CC95794359@gmx.net> Begin forwarded message: > From: "xikun wu $IAH-C$" > Date: March 9, 2007 12:41:38 PM EST > To: > Subject: sub_SeqFeature > > Hi Hilmar, > > Sorry for the unexpected contact! > > I am using bioperl-1.4 but got problem in using "sub_SeqFeature", > nothing was returned from it. Any hints from you will be really > appreciated! 
> [...]
-- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Fri Mar 9 19:23:07 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 9 Mar 2007 18:23:07 -0600 Subject: [Bioperl-l] FW: Justice Department: FBI acted illegally on data In-Reply-To: <006001c76282$06a25180$4601a8c0@LIBERAL2> References: <006001c76282$06a25180$4601a8c0@LIBERAL2> Message-ID: <81A1AE87-35BF-42FD-8485-6D04AE5AB183@uiuc.edu> No problem. Nice to hear from you Roger! chris On Mar 9, 2007, at 1:34 PM, Roger Hall wrote: > Ouch. Wrong list. > > Don't you hate it when that happens? > > Roger > > -----Original Message----- > From: Roger Hall [mailto:rahall2 at ualr.edu] > Sent: Friday, March 09, 2007 1:34 PM > To: Bioperl-L (bioperl-l at lists.open-bio.org) > Subject: Justice Department: FBI acted illegally on data > > And I'm just sooooooooooooooooooooo surprised! Who could have > ever guessed that power would corrupt? > > http://www.msnbc.msn.com/id/11100916/ > > Thanks! > > Roger Hall > Technical Director > MidSouth Bioinformatics Center > University of Arkansas at Little Rock > (501) 569-8074 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Fri Mar 9 21:55:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Mar 2007 20:55:05 -0600
Subject: [Bioperl-l] new/unannounced methods
In-Reply-To: <45F19B6B.7040709@watson.wustl.edu>
References: <45F03C07.6040303@sheffield.ac.uk>
 <639699B6-0C26-4CCD-9DF2-C0B0BDAD97E3@uiuc.edu>
 <45F18543.2030309@purdue.edu>
 <45F19B6B.7040709@watson.wustl.edu>
Message-ID:

On Mar 9, 2007, at 11:37 AM, Michael Kiwala wrote:

> Phillip San Miguel wrote:
...
>> Not sure if this constitutes a namespace collision but
>> Bio::PrimarySeqI
>> has a method trunc(). It is like subseq() but returns an object,
>> rather
>> than a string.
>>
>> Phillip
>>
> Actually, I think it's polymorphism. What I imagine is that eventually
> you should be able to call trunc() on any PrimarySeqI implementing
> object and it will trunc() itself and any children it has that also
> can('trunc'). So if the sequence you are trying to trunc() implements
> FeatureHolderI like SeqI does then (one day, once this is all
> coded) the
> sequence object will also do the Right Thing with the features it
> holds
> (currently, it does not). Similarly, revcom(), cat(), and excise()*
> should work the same way.

One would think so, yes.

> Is anyone out there currently working on this? If not I am
> volunteering.
> I'd like to move Bio::SeqUtils::trunc_with_features() and friends to
> Bio::SeqI::trunc(), etc.

I don't think you'll hear too many protests if you want to take it
on. Just make sure to add decent tests as needed and document
everything (one lesson learned from all this!).

The trunc() method is implemented in Bio::PrimarySeqI, so it probably
should stay there. It looks like the method already takes into
account the possibility the object may be a Bio::SeqI or other
PrimarySeqI besides Bio::PrimarySeq. You could just add a check for
$seq->isa('Bio::FeatureHolderI') there, then truncate features
accordingly.

BTW, how would you handle CDS seqfeatures and 'translation' tag data
or other seqfeat-based stuff that's location-dependent?

> *excise() is a method I'd like to add that would allow one to remove a
> section out of the middle of a sequence, such as a transposable
> element
> that you don't want to submit to GenBank.

Sounds fine with me, though this may belong in Bio::PrimarySeqI as
well (at least I think so, since it seems like it would apply to any
sequence).

chris

From hlapp at gmx.net Sat Mar 10 12:52:22 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 10 Mar 2007 12:52:22 -0500
Subject: [Bioperl-l] new/unannounced methods
In-Reply-To:
References: <45F03C07.6040303@sheffield.ac.uk>
 <639699B6-0C26-4CCD-9DF2-C0B0BDAD97E3@uiuc.edu>
 <45F18543.2030309@purdue.edu>
 <45F19B6B.7040709@watson.wustl.edu>
Message-ID:

On Mar 9, 2007, at 9:55 PM, Chris Fields wrote:

>> Is anyone out there currently working on this? If not I am
>> volunteering.
>> I'd like to move Bio::SeqUtils::trunc_with_features() and friends to
>> Bio::SeqI::trunc(), etc.
>
> I don't think you'll hear too many protests if you want to take it
> on. Just make sure to add decent tests as needed and document
> everything (one lesson learned from all this!).
>
> The trunc() method is implemented in Bio::PrimarySeqI, so it probably
> should stay there.

Yes, but PrimarySeqI's don't have features.

You could say that it may nevertheless know that there is a derived
class of Seqs that do. However, why not override the method in, e.g.,
Bio::SeqI, call the inherited function, and then add whatever is
needed to truncate the features, too.
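
The override idea above could look roughly like the following sketch. It
uses toy classes, not the real BioPerl ones: Toy::PrimarySeq stands in
for a Bio::PrimarySeqI implementation and Toy::Seq for a feature-holding
Bio::SeqI. The hash-based features and the policy of keeping only
features that lie entirely inside the window are assumptions made for
illustration; how overlapping or location-dependent features should be
treated is left open in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy stand-ins, NOT the real BioPerl classes.
package Toy::PrimarySeq;

sub new {
    my ($class, %args) = @_;
    return bless { seq => $args{seq} }, $class;
}

sub seq { return $_[0]{seq} }

# Sequence-only truncation, 1-based inclusive coordinates.
sub trunc {
    my ($self, $start, $end) = @_;
    die "bad range $start..$end\n"
        if $start < 1 || $end < $start || $end > length($self->{seq});
    my $class = ref $self;    # build an object of the caller's own class
    return $class->new(
        seq => substr($self->{seq}, $start - 1, $end - $start + 1),
    );
}

package Toy::Seq;
our @ISA = ('Toy::PrimarySeq');

sub new {
    my ($class, %args) = @_;
    my $self = $class->SUPER::new(%args);
    $self->{features} = $args{features} || [];
    return $self;
}

sub features { return @{ $_[0]{features} } }

# Override: let the parent cut the sequence, then carry the features over.
sub trunc {
    my ($self, $start, $end) = @_;
    my $out = $self->SUPER::trunc($start, $end);
    for my $f ($self->features) {
        # Assumed policy: drop anything not fully inside the window.
        next if $f->{start} < $start || $f->{end} > $end;
        push @{ $out->{features} }, {
            %$f,
            start => $f->{start} - $start + 1,    # shift to new coordinates
            end   => $f->{end}   - $start + 1,
        };
    }
    return $out;
}

package main;

my $seq = Toy::Seq->new(
    seq      => 'AAAACCCCGGGGTTTT',
    features => [
        { tag => 'inside',   start => 6, end => 10 },
        { tag => 'overlaps', start => 2, end => 7  },
    ],
);

my $cut = $seq->trunc(5, 12);
print $cut->seq, "\n";                                          # CCCCGGGG
print "$_->{tag} $_->{start}..$_->{end}\n" for $cut->features;  # inside 2..6
```

Keeping the feature handling in the subclass means the parent trunc()
never needs to know that features exist, which is the separation being
argued for here.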

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at uiuc.edu Sat Mar 10 15:47:32 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 10 Mar 2007 14:47:32 -0600
Subject: [Bioperl-l] new/unannounced methods
In-Reply-To:
References: <45F03C07.6040303@sheffield.ac.uk>
 <639699B6-0C26-4CCD-9DF2-C0B0BDAD97E3@uiuc.edu>
 <45F18543.2030309@purdue.edu>
 <45F19B6B.7040709@watson.wustl.edu>
Message-ID:

On Mar 10, 2007, at 11:52 AM, Hilmar Lapp wrote:

>
> On Mar 9, 2007, at 9:55 PM, Chris Fields wrote:
>
>>> Is anyone out there currently working on this? If not I am
>>> volunteering.
>>> I'd like to move Bio::SeqUtils::trunc_with_features() and friends to
>>> Bio::SeqI::trunc(), etc.
>>
>> I don't think you'll hear too many protests if you want to take it
>> on. Just make sure to add decent tests as needed and document
>> everything (one lesson learned from all this!).
>>
>> The trunc() method is implemented in Bio::PrimarySeqI, so it probably
>> should stay there.
>
> Yes, but PrimarySeqI's don't have features.
>
> You could say that it may nevertheless know that there is a derived
> class of Seqs that do. However, why not override the method in, e.g.,
> Bio::SeqI, call the inherited function, and then add whatever is
> needed to truncate the features, too.
>
> -hilmar

Yes, that's true; it may make it simpler to keep that code confined
to SeqI. My point was that PrimarySeqI::trunc() already seems to
anticipate the object might be something else besides a simple
PrimarySeq, so one could use $self->isa('Bio::FeatureHolderI') there:

...
    my $seqclass;
    if($self->can_call_new()) {
        $seqclass = ref($self);
    } else {
        $seqclass = 'Bio::PrimarySeq';
        $self->_attempt_to_load_Seq();
    }
    my $out = $seqclass->new( '-seq' => $str,
                              '-display_id' => $self->display_id,
                              '-accession_number' => $self->accession_number,
                              '-alphabet' => $self->alphabet,
                              '-desc' => $self->desc(),
                              '-verbose' => $self->verbose
                            );
    return $out;
...

but it would be just as easy to do this in Bio::SeqI after calling
$self->SUPER::trunc().

chris

From torsten.seemann at infotech.monash.edu.au Sat Mar 10 23:32:04 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Sun, 11 Mar 2007 15:32:04 +1100
Subject: [Bioperl-l] How to run Blast with a user defined DNA substitution scoring matrix?
In-Reply-To: <20070307172406.8e82d12a@dogwood.plantbio.uga.edu>
References: <20060203194450.792e8d4e@dogwood.plantbio.uga.edu>
 <20070307172406.8e82d12a@dogwood.plantbio.uga.edu>
Message-ID:

> I need to run blast using a user defined DNA scoring matrix (may sound funny but I am really serious). Can anybody give me a hint on it?

This mailing list is for help with using BioPerl modules, not for
general bioinformatics tools.

In Bio::Tools::Run::StandAloneBlast->new(), you can pass in any valid
parameter for 'blastall' and it will be used when running blastall.

Standard blastall has parameters for the match reward (-r) and the
mismatch penalty (-q), but I don't believe it can do a full custom
matrix for DNA alignments.

You should look at the EMBOSS tools "water" and "needle", which take a
"-datafile" parameter to supply a custom DNA matrix.

--Torsten

From lzhtom at hotmail.com Sun Mar 11 22:33:44 2007
From: lzhtom at hotmail.com (zhihua li)
Date: Mon, 12 Mar 2007 02:33:44 +0000
Subject: [Bioperl-l] a problem with standalone blast
Message-ID:

Hi all,

I know this is probably not the right question to ask here, because it
is solely about blast, although I did try using the standalone blastall
module. But I couldn't find any other suitable mailing list, so I hope
people here can help me out.

Recently I got a problem when I tried to blast a nucleotide sequence
against a database. The problem occurred when I was using the blast
module in perl, but then I realized it was caused by blast directly.

The database was generated with formatdb and is correct, because all
other queries went through quite well.

The problematic query sequence is:
>ENSG00000162825.7
GGGGAAGAAGATCAAAGAAGAAAGAAGAAGGGGAAGAAAAGAAGGGGAAGAAG

The only difference it has compared with other successful queries is
that it's shorter.

After running blastn I got the following message:
[blastall] WARNING: ENSG00000162825.7: Could not calculate ungapped
Karlin-Altschul parameters due to an invalid query sequence or its
translation.
Please verify the query sequence(s) and/or filtering options.

At first I thought it might have been masked out too much during the
filtering process. So I ran it again with the filtering switched off.
After that I got the following error:
[blastall] ERROR: Threshold for extending hits, default if zero
blastp 11, blastn 0, blastx 12, tblastn 13
tblastx 13, megablast 0 [F] is bad or out of range [? to ?]

Could you help me out with this?

Thanks a lot!

Zhihua Li

From thiago.venancio at gmail.com Mon Mar 12 06:44:47 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Mon, 12 Mar 2007 07:44:47 -0300
Subject: [Bioperl-l] a problem with standalone blast
In-Reply-To:
References:
Message-ID: <44255ea80703120344o74d73a5egeb6dddca01e3a08@mail.gmail.com>

Hi Zhihua,

I had this problem once. It is most probably caused by a combination of
small subject sequences with low complexity regions. This seems to be
your case...

T.

On 3/11/07, zhihua li wrote:
>
> HI all,
>
> I know this is probably not a right question to ask here, 'cause it's
> solely blast, although i did try using standalone blastall module. But I
> couldn't find any other suitable mailing lists. So i hope guys here could
> help me out.
>
> Recently I got a problem when I tried to blast a nucleotide sequence
> against a database. The problem occured when I was using blast module in
> perl, but then i realized it was caused by blast directly.
>
> The database was generated with formatdb and is correct, 'cause all other
> queries went through quite well.
>
> The problematic query sequence is:
> >ENSG00000162825.7
> GGGGAAGAAGATCAAAGAAGAAAGAAGAAGGGGAAGAAAAGAAGGGGAAGAAG
>
> The only difference it has compared with other successful queries is that
> it's shorter .
>
> After running blastn I got the following message:
> [blastall] WARNING: ENSG00000162825.7: Could not calculate ungapped
> Karlin-Altschul parameters due to an invalid query sequence or its
> translation. Please verify the query sequence(s) and/or filtering options.
>
> At first I thought it might have been masked out too much during the
> filtering process. So I ran it again with the filtering switched off.
> After that I got following error:
> [blastall] ERROR: Threshold for extending hits, default if zero
> blastp 11, blastn 0, blastx 12, tblastn 13
> tblastx 13, megablast 0 [F] is bad or out of range [? to ?]
>
> Could you help me out with this?
>
> Thanks a lot!
>
>
> Zhihua Li
>

--
"The way to get started is to quit talking and begin doing."
Walt Disney

========================
Thiago Motta Venancio, MSc
PhD student in Bioinformatics
University of Sao Paulo
========================

From e-just at northwestern.edu Mon Mar 12 18:02:01 2007
From: e-just at northwestern.edu (Eric Just)
Date: Mon, 12 Mar 2007 17:02:01 -0500
Subject: [Bioperl-l] Tandem Repeats finder parser
Message-ID:

Hi there,

I have written a simple parser for Tandem Repeats Finder output. Is
there any interest in including this module in Bioperl? If so, I will
conform it to Bioperl standards, write some tests, then send it to
whomever is interested.

Eric

From cjfields at uiuc.edu Mon Mar 12 18:47:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 12 Mar 2007 17:47:19 -0500
Subject: [Bioperl-l] Tandem Repeats finder parser
In-Reply-To:
References:
Message-ID: <3993B838-360D-4BE5-9B04-A4D05650C758@uiuc.edu>

I don't see a problem. The best place for it is prob.
in the Bio::Tools namespace (unless you have other ideas).

chris

On Mar 12, 2007, at 5:02 PM, Eric Just wrote:

> Hi there,
>
> I have written a simple parser for Tandem Repeats Finder output. Is
> there any interest in including this module in Bioperl? If so, I will
> conform it to Bioperl standards, write some tests, then send it to
> whomever is interested.
>
> Eric

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From lzhtom at hotmail.com Mon Mar 12 20:12:54 2007
From: lzhtom at hotmail.com (zhihua li)
Date: Tue, 13 Mar 2007 00:12:54 +0000
Subject: [Bioperl-l] a problem with standalone blast
In-Reply-To: <20070312221519.GA27940@eniac.jgi-psf.org>
Message-ID:

Thanks guys. Now I know what happened. This did come from a low
complexity region being masked out almost completely, and I did mistake
the option '-F' for '-f'.

Thanks for all the help!

From n.saunders at uq.edu.au Tue Mar 13 06:27:09 2007
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Tue, 13 Mar 2007 20:27:09 +1000
Subject: [Bioperl-l] Bio::Tools::Run::Signalp
Message-ID: <45F67C7D.40200@uq.edu.au>

Dear BioPerlers,

Is there any better documentation for Bio::Tools::Run::Signalp other
than the POD? I'm trying to test it out using little more than the POD
synopsis and I get this error:

------------------------------------------------------------
Use of uninitialized value in concatenation (.) or string at
/usr/local/share/perl/5.8.7/Bio/Tools/Run/Signalp.pm line 235, line 1.
sh: Illegal option -t

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Signalp call ( -t euk -trunc 50 /tmp/81hT1BUrRk/98TZbtK884 > /tmp/81hT1BUrRk/XZjUtdxewD) crashed: 512
------------------------------------------------------------

It's unclear how to pass parameters to signalp using this module or even
if it works with my signalp version (3.0). Any pointers greatly
appreciated.

Neil
--
School of Molecular and Microbial Sciences
University of Queensland
Brisbane 4072 Australia
http://nsaunders.wordpress.com

From n.haigh at sheffield.ac.uk Tue Mar 13 08:38:15 2007
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Tue, 13 Mar 2007 12:38:15 +0000
Subject: [Bioperl-l] Test Suite and Test::Exception
Message-ID: <45F69B37.2000505@sheffield.ac.uk>

I've really started looking into tests while writing some of my own
modules and I've found Test::Exception pretty good at testing things you
would expect to die and live. I know things like this can be done with
an eval block, but Test::Exception is nice and tidy and provides
throws_ok etc. to test if a certain message was thrown (using a string
or regex). I think it could be pretty useful in BioPerl tests,
especially if tests are to be devised to cover some of the metrics
provided by Devel::Cover, e.g. branch coverage.

For a quick example: test that an error is thrown when providing an
inappropriate parameter to one of the bioperl-run modules (e.g.
Clustalw.pm).

-- start test code --
# Build a clustalw alignment factory with a parameter it should reject.
# The constructor is only called inside dies_ok{}; calling it outside
# the block would kill the script before the test could run.
my @params = ('unknown_param_name' => 2);
dies_ok{ Bio::Tools::Run::Alignment::Clustalw->new(@params);}
    "Correctly died when using an unknown parameter";
-- end test code --

However, Test::Exception isn't installed by default. In addition, these
types of tests could inflate the test suite - is this, or could this be
an issue?

Anywho, any thoughts on this?
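
For comparison, the eval-block route mentioned above can be sketched
with core modules only (Test::More, no Test::Exception). Toy::Factory
and exception_from() are hypothetical names invented for this sketch;
Toy::Factory merely stands in for a wrapper that rejects unknown
parameters, and the real Clustalw module may report the problem with
different wording.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 3;

# Hypothetical stand-in for a run wrapper that rejects unknown parameters.
package Toy::Factory;

my %KNOWN = map { $_ => 1 } qw(ktuple matrix);

sub new {
    my ($class, %params) = @_;
    for my $name (sort keys %params) {
        die "Unknown parameter '$name'\n" unless $KNOWN{$name};
    }
    return bless {%params}, $class;
}

package main;

# Run a code ref; return the exception text, or undef if it lived.
sub exception_from {
    my ($code) = @_;
    my $lived = eval { $code->(); 1 };
    return undef if $lived;
    return (defined $@ && length $@) ? $@ : 'unknown error';
}

my $err = exception_from(sub { Toy::Factory->new(unknown_param_name => 2) });

# First check that it died at all, then that it died for the right reason.
ok(defined $err, 'constructor died on an unknown parameter');
like($err, qr/Unknown parameter 'unknown_param_name'/,
     'and the message names the offending parameter');

is(exception_from(sub { Toy::Factory->new(ktuple => 2) }), undef,
   'a known parameter is accepted');
```

Matching the message as well as the death is the same idea as
throws_ok: a test that only checks for "died" would also pass on an
unrelated failure such as a missing file.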
Nath From cjfields at uiuc.edu Tue Mar 13 09:04:50 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 13 Mar 2007 08:04:50 -0500 Subject: [Bioperl-l] Test Suite and Test::Exception In-Reply-To: <45F69B37.2000505@sheffield.ac.uk> References: <45F69B37.2000505@sheffield.ac.uk> Message-ID: You could add it to t/lib, which is what we do with Test::Simple/ More. It seems like a worthwhile addition. I have to agree with one of the CPAN reviewers, though, when using it in cases like your example; I would use throws_ok{}, which makes sure an exception was thrown and checks $@ using a qr{} (just to make sure the exception was the expected one and not something else like a nonexistent file, etc.). chris On Mar 13, 2007, at 7:38 AM, Nathan Haigh wrote: > I've really started looking into tests while writing some of my own > modules and I've found Test::Exception pretty good at testing > things you > would expect to die and live. I know things like this can be done with > an eval block, but Test::Exception is nice and tidy and provides > throws_ok etc to test if a certain message was thrown (using a > string or > regex). I think it could be pretty useful for use in BioPerl tests, > especially if tests are to be devised to cover some of the metrics > provided by Devel::Cover e.g. Branch coverage. > > For a quick example: test that an error is thrown when providing an > inappropriate parameter to one of the bioperl-run modules (e.g. > Clustalw.pm). > > -- start test code -- > # Build a clustalw alignment factory > my @params = ('unknown_param_name' => 2); > $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params); > dies_ok{ Bio::Tools::Run::Alignment::Clustalw->new(@params);} > "Correctly died when using an unknown parameter"; > -- end test code -- > > However, Test::Exception isn't installed by default. In addition, > these > types of tests could inflate the test suit - is this, or could this be > an issue? > > Anywho, any thoughts on this? 
> Nath > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From n.haigh at sheffield.ac.uk Tue Mar 13 09:49:43 2007 From: n.haigh at sheffield.ac.uk (Nathan Haigh) Date: Tue, 13 Mar 2007 13:49:43 +0000 Subject: [Bioperl-l] Test Suite and Test::Exception In-Reply-To: References: <45F69B37.2000505@sheffield.ac.uk> Message-ID: <45F6ABF7.1060505@sheffield.ac.uk> Yep, I agree with using throws_ok{}. Another thing regarding testing, has anyone thought about having an automated test run via cron on CVS HEAD and have the results made available on the website? Nath Chris Fields wrote: > You could add it to t/lib, which is what we do with > Test::Simple/More. It seems like a worthwhile addition. I have to > agree with one of the CPAN reviewers, though, when using it in cases > like your example; I would use throws_ok{}, which makes sure an > exception was thrown and checks $@ using a qr{} (just to make sure the > exception was the expected one and not something else like a > nonexistent file, etc.). > > chris > > On Mar 13, 2007, at 7:38 AM, Nathan Haigh wrote: > >> I've really started looking into tests while writing some of my own >> modules and I've found Test::Exception pretty good at testing things you >> would expect to die and live. I know things like this can be done with >> an eval block, but Test::Exception is nice and tidy and provides >> throws_ok etc to test if a certain message was thrown (using a string or >> regex). I think it could be pretty useful for use in BioPerl tests, >> especially if tests are to be devised to cover some of the metrics >> provided by Devel::Cover e.g. Branch coverage. 
>> >> For a quick example: test that an error is thrown when providing an >> inappropriate parameter to one of the bioperl-run modules (e.g. >> Clustalw.pm). >> >> -- start test code -- >> # Build a clustalw alignment factory >> my @params = ('unknown_param_name' => 2); >> $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params); >> dies_ok{ Bio::Tools::Run::Alignment::Clustalw->new(@params);} >> "Correctly died when using an unknown parameter"; >> -- end test code -- >> >> However, Test::Exception isn't installed by default. In addition, these >> types of tests could inflate the test suit - is this, or could this be >> an issue? >> >> Anywho, any thoughts on this? >> Nath >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > From cjfields at uiuc.edu Tue Mar 13 10:14:58 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 13 Mar 2007 09:14:58 -0500 Subject: [Bioperl-l] Test Suite and Test::Exception In-Reply-To: <45F6ABF7.1060505@sheffield.ac.uk> References: <45F69B37.2000505@sheffield.ac.uk> <45F6ABF7.1060505@sheffield.ac.uk> Message-ID: I always liked the Pugs smoketest system, where Pugs tests can be run and you get nice htmlized output, then the results can be sent to a smokeserver: http://m19s28.vlinux.de/cgi-bin/pugs-smokeserv.pl? It supposedly can be configured to test non-Pugs perl5-related stuff. chris On Mar 13, 2007, at 8:49 AM, Nathan Haigh wrote: > Yep, I agree with using throws_ok{}. > > Another thing regarding testing, has anyone thought about having an > automated test run via cron on CVS HEAD and have the results made > available on the website? > > Nath > > Chris Fields wrote: >> You could add it to t/lib, which is what we do with >> Test::Simple/More. 
It seems like a worthwhile addition. I have to >> agree with one of the CPAN reviewers, though, when using it in cases >> like your example; I would use throws_ok{}, which makes sure an >> exception was thrown and checks $@ using a qr{} (just to make sure >> the >> exception was the expected one and not something else like a >> nonexistent file, etc.). >> >> chris >> >> On Mar 13, 2007, at 7:38 AM, Nathan Haigh wrote: >> >>> I've really started looking into tests while writing some of my own >>> modules and I've found Test::Exception pretty good at testing >>> things you >>> would expect to die and live. I know things like this can be done >>> with >>> an eval block, but Test::Exception is nice and tidy and provides >>> throws_ok etc to test if a certain message was thrown (using a >>> string or >>> regex). I think it could be pretty useful for use in BioPerl tests, >>> especially if tests are to be devised to cover some of the metrics >>> provided by Devel::Cover e.g. Branch coverage. >>> >>> For a quick example: test that an error is thrown when providing an >>> inappropriate parameter to one of the bioperl-run modules (e.g. >>> Clustalw.pm). >>> >>> -- start test code -- >>> # Build a clustalw alignment factory >>> my @params = ('unknown_param_name' => 2); >>> $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params); >>> dies_ok{ Bio::Tools::Run::Alignment::Clustalw->new(@params);} >>> "Correctly died when using an unknown parameter"; >>> -- end test code -- >>> >>> However, Test::Exception isn't installed by default. In addition, >>> these >>> types of tests could inflate the test suit - is this, or could >>> this be >>> an issue? >>> >>> Anywho, any thoughts on this? >>> Nath >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. 
Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From e-just at northwestern.edu Tue Mar 13 10:37:42 2007
From: e-just at northwestern.edu (Eric Just)
Date: Tue, 13 Mar 2007 09:37:42 -0500
Subject: [Bioperl-l] Tandem Repeats finder parser
In-Reply-To: <45F6AE21.7000709@bms.com>
References: <3993B838-360D-4BE5-9B04-A4D05650C758@uiuc.edu>
 <45F6AE21.7000709@bms.com>
Message-ID:

Hi Stefan

Trf does indeed store the parameters in the filename, however it also
has a parameters line in the output file which is where I pull the
parameters from:

Parameters: 2 7 7 80 10 50 12

That may be a new feature; I'm using Tandem Repeats Finder version
4.00. I was planning on using that since the file name is not
necessarily stable (a user can rename the file). Sound reasonable?

Eric

On 3/13/07, Stefan Kirov wrote:
> Eric,
> The last time I used it, trf had a funny way of assigning filenames. Do
> you plan to parse the filenames and extract the parameters from there,
> or you are going to stick only to the content?
> If you can parse the filenames, that might be very useful.
> Stefan
>
> Chris Fields wrote:
> > I don't see a problem. The best place for it is prob. in the
> > Bio::Tools namespace (unless you have other ideas).
> >
> > chris
> >
> > On Mar 12, 2007, at 5:02 PM, Eric Just wrote:
> >
> >
> >> Hi there,
> >>
> >> I have written a simple parser for Tandem Repeats Finder output. Is
> >> there any interest in including this module in Bioperl? If so, I will
> >> conform it to Bioperl standards, write some tests, then send it to
> >> whomever is interested.
> >> > >> Eric > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > Christopher Fields > > Postdoctoral Researcher > > Lab of Dr. Robert Switzer > > Dept of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > From stefan.kirov at bms.com Tue Mar 13 09:58:57 2007 From: stefan.kirov at bms.com (Stefan Kirov) Date: Tue, 13 Mar 2007 08:58:57 -0500 Subject: [Bioperl-l] Tandem Repeats finder parser In-Reply-To: <3993B838-360D-4BE5-9B04-A4D05650C758@uiuc.edu> References: <3993B838-360D-4BE5-9B04-A4D05650C758@uiuc.edu> Message-ID: <45F6AE21.7000709@bms.com> Eric, The last time I used it, trf had a funny way of assigning filenames. Do you plan to parse the filenames and extract the parameters from there, or you are going to stick only to the content? If you can parse the filenames, that might be very useful. Stefan Chris Fields wrote: > I don't see a problem. The best place for it is prob. in the > Bio::Tools namespace (unless you have other ideas). > > chris > > On Mar 12, 2007, at 5:02 PM, Eric Just wrote: > > >> Hi there, >> >> I have written a simple parser for Tandem Repeats Finder output. Is >> there any interest in including this module in Bioperl? If so, I will >> conform it to Bioperl standards, write some tests, then send it to >> whomever is interested. >> >> Eric >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at uiuc.edu Tue Mar 13 11:27:27 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 13 Mar 2007 10:27:27 -0500 Subject: [Bioperl-l] Tandem Repeats finder parser In-Reply-To: References: <3993B838-360D-4BE5-9B04-A4D05650C758@uiuc.edu> <45F6AE21.7000709@bms.com> Message-ID: Sounds fine by me. As for the versioning issue you can always state the minimal version supported in POD. chris On Mar 13, 2007, at 9:37 AM, Eric Just wrote: > Hi Stefan > > Trf does indeed store the parameters in the filename, however it also > has a parameters line in the output file which is where I pull the > parameters from: > > Parameters: 2 7 7 80 10 50 12 > > That may be a new feature, I'm using Tandem Repeats Rinder version > 4.00 I was planning on using that since the file name is not > necessarily stable (a user can rename the file). Sound reasonable? > > Eric > > > On 3/13/07, Stefan Kirov wrote: >> Eric, >> The last time I used it, trf had a funny way of assigning >> filenames. Do >> you plan to parse the filenames and extract the parameters from >> there, >> or you are going to stick only to the content? >> If you can parse the filenames, that might be very useful. >> Stefan >> >> Chris Fields wrote: >>> I don't see a problem. The best place for it is prob. in the >>> Bio::Tools namespace (unless you have other ideas). >>> >>> chris >>> >>> On Mar 12, 2007, at 5:02 PM, Eric Just wrote: >>> >>> >>>> Hi there, >>>> >>>> I have written a simple parser for Tandem Repeats Finder >>>> output. Is >>>> there any interest in including this module in Bioperl? If so, >>>> I will >>>> conform it to Bioperl standards, write some tests, then send it to >>>> whomever is interested. 
>>>> >>>> Eric >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> Christopher Fields >>> Postdoctoral Researcher >>> Lab of Dr. Robert Switzer >>> Dept of Biochemistry >>> University of Illinois Urbana-Champaign >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From johnsonm at gmail.com Tue Mar 13 12:31:30 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Tue, 13 Mar 2007 11:31:30 -0500 Subject: [Bioperl-l] Tandem Repeats finder parser In-Reply-To: References: Message-ID: I'm going to need a trf parser. I'd be happy to use yours instead of writing one myself. I'll be eagerly watching cvs. 8) On 3/12/07, Eric Just wrote: > Hi there, > > I have written a simple parser for Tandem Repeats Finder output. Is > there any interest in including this module in Bioperl? If so, I will > conform it to Bioperl standards, write some tests, then send it to > whomever is interested. 
> > Eric > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From stefan.kirov at bms.com Tue Mar 13 12:47:16 2007 From: stefan.kirov at bms.com (Stefan Kirov) Date: Tue, 13 Mar 2007 11:47:16 -0500 Subject: [Bioperl-l] Tandem Repeats finder parser In-Reply-To: References: <3993B838-360D-4BE5-9B04-A4D05650C758@uiuc.edu> <45F6AE21.7000709@bms.com> Message-ID: <45F6D594.4020104@bms.com> Eric Just wrote: > Hi Stefan > > Trf does indeed store the parameters in the filename, however it also > has a parameters line in the output file which is where I pull the > parameters from: > > Parameters: 2 7 7 80 10 50 12 > > That may be a new feature, I'm using Tandem Repeats Finder version > 4.00. I was planning on using that since the file name is not > necessarily stable (a user can rename the file). Sound reasonable? > Sure, this is perfectly fine. Stefan > Eric > > > On 3/13/07, Stefan Kirov wrote: > >> Eric, >> The last time I used it, trf had a funny way of assigning filenames. Do >> you plan to parse the filenames and extract the parameters from there, >> or you are going to stick only to the content? >> If you can parse the filenames, that might be very useful. >> Stefan >> >> Chris Fields wrote: >> >>> I don't see a problem. The best place for it is prob. in the >>> Bio::Tools namespace (unless you have other ideas). >>> >>> chris >>> >>> On Mar 12, 2007, at 5:02 PM, Eric Just wrote: >>> >>> >>> >>>> Hi there, >>>> >>>> I have written a simple parser for Tandem Repeats Finder output. Is >>>> there any interest in including this module in Bioperl? If so, I will >>>> conform it to Bioperl standards, write some tests, then send it to >>>> whomever is interested.
>>>> >>>> Eric >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> Christopher Fields >>> Postdoctoral Researcher >>> Lab of Dr. Robert Switzer >>> Dept of Biochemistry >>> University of Illinois Urbana-Champaign >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From n.saunders at uq.edu.au Tue Mar 13 14:12:40 2007 From: n.saunders at uq.edu.au (Neil Saunders) Date: Wed, 14 Mar 2007 04:12:40 +1000 Subject: [Bioperl-l] Bio::Tools::Run::Signalp Message-ID: <45F6E998.7040109@uq.edu.au> dear BioPerlers, Re: my last post regarding Bio::Tools::Run::Signalp, my error came from an incorrect $ENV{'SIGNALPDIR'}. My new error is: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: parsing problem in signalp which stems from the Bio::Tools::Signalp parser. So I'm (a) wondering whether this module can parse SignalP 3.0 output and (b) still curious as to how to specify parameters to signalp using the Run module. 
thanks, Neil -- School of Molecular and Microbial Sciences University of Queensland Brisbane 4072 Australia http://nsaunders.wordpress.com From jason at bioperl.org Tue Mar 13 14:34:48 2007 From: jason at bioperl.org (Jason Stajich) Date: Tue, 13 Mar 2007 11:34:48 -0700 Subject: [Bioperl-l] Bio::Tools::Run::Signalp In-Reply-To: <45F6E998.7040109@uq.edu.au> References: <45F6E998.7040109@uq.edu.au> Message-ID: <9C84DB98-BC9B-48B9-8F73-B66D57AD1C96@bioperl.org> On Mar 13, 2007, at 11:12 AM, Neil Saunders wrote: > dear BioPerlers, > > Re: my last post regarding Bio::Tools::Run::Signalp, my error came > from an > incorrect $ENV{'SIGNALPDIR'}. > > My new error is: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: parsing problem in signalp > > which stems from the Bio::Tools::Signalp parser. So I'm (a) > wondering whether > this module can parse SignalP 3.0 output and (b) still curious as > to how to > specify parameters to signalp using the Run module. > For the parsing: Unfortunately, I doubt that it was written for SignalP 3.0, since it was probably written before that came out and it hasn't been updated much since then. I think it needs an owner at this point as well, as the original devs on this have moved on. Can you perhaps provide good example files for 3.0 output? For the Running: It was written quite a while ago by the fugu folks and I think they just hardcoded the parameters -- if you look at the _run code you see this in the setup string: "-t euk -trunc 50 " So this is an example of an orphaned module that needs someone to bring it up to date with the current standards of the bioperl-run package. Sorry to not have better news than that....
-j > thanks, > Neil > -- > School of Molecular and Microbial Sciences > University of Queensland > Brisbane 4072 Australia > > http://nsaunders.wordpress.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070313/5eb708a3/attachment.html -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2613 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070313/5eb708a3/attachment.bin From cjfields at uiuc.edu Tue Mar 13 15:19:35 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 13 Mar 2007 14:19:35 -0500 Subject: [Bioperl-l] Bio::Tools::Run::Signalp In-Reply-To: <9C84DB98-BC9B-48B9-8F73-B66D57AD1C96@bioperl.org> References: <45F6E998.7040109@uq.edu.au> <9C84DB98-BC9B-48B9-8F73-B66D57AD1C96@bioperl.org> Message-ID: <782E8C3B-9656-4AB9-A77E-0B51CEA79046@uiuc.edu> Not sure if the one Emmanuel Quevillon added to Bugzilla is updated for Signalp 3.0, but it is fairly recent. It just hasn't been added in yet: http://bugzilla.open-bio.org/show_bug.cgi?id=2203 Neil, could you try that one out, or file a bug with some example data? chris On Mar 13, 2007, at 1:34 PM, Jason Stajich wrote: > On Mar 13, 2007, at 11:12 AM, Neil Saunders wrote: > >> dear BioPerlers, >> >> Re: my last post regarding Bio::Tools::Run::Signalp, my error came >> from an >> incorrect $ENV{'SIGNALPDIR'}. 
>> >> My new error is: >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: parsing problem in signalp >> >> which stems from the Bio::Tools::Signalp parser. So I'm (a) >> wondering whether >> this module can parse SignalP 3.0 output and (b) still curious as >> to how to >> specify parameters to signalp using the Run module. >> > For the parsing: > Unfortunately, I doubt that it was written for SignalP 3.0 since it > was probably written before that came out and it hasn't been > updated much since then. I think it needs a owner at this point > as well as the original devs on this have moved on. > Can you perhaps provide good example files for 3.0 output? > > For the Running: > It was written quite a while ago by the fugu folks and I think they > just hardcoded the parameters -- if you look at > _run code you see this in the setup string: > "-t euk -trunc 50 " > > So this is an example of an orphaned module that needs someone to > bring it up to date with the current standards of the bioperl-run > package. > Sorry to not have better news than that.... > > -j > >> thanks, >> Neil >> -- >> School of Molecular and Microbial Sciences >> University of Queensland >> Brisbane 4072 Australia >> >> http://nsaunders.wordpress.com >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Miller Research Fellow > University of California, Berkeley > lab: 510.642.8441 > http://pmb.berkeley.edu/~taylor/people/js.html > http://fungalgenomes.org/ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From jayk at u.arizona.edu Wed Mar 14 00:15:25 2007 From: jayk at u.arizona.edu (Jay Konieczka) Date: Tue, 13 Mar 2007 21:15:25 -0700 Subject: [Bioperl-l] Bio::SearchIO blastxml for multiple queries Message-ID: <44041B68-5A3E-4295-956F-8599AA2F7D34@u.arizona.edu> Hi, My blastxml output, which is supposed to contain multiple queries, each having multiple hits, is improperly parsed by Bio::SearchIO - only the first query is recognized. I realize there has been plenty of talk about this issue since it arose, but I've been having trouble finding instructions for a patch. Is there an easy fix for this? I'm running blastall version 2.2.15 (using blastn) and attempting to parse the output with bioperl 1.4. Thanks for any help, Jay -- Jay H. Konieczka Ph.D. Student, Antin Lab Molecular & Cellular Biology University of Arizona Phone: 1.520.591.3446 1656 E. Mabel, MRB 317 Tucson, AZ 85724 USA _____________________ From zhousuxia88 at 126.com Wed Mar 14 00:49:46 2007 From: zhousuxia88 at 126.com (zhousuxia88 at 126.com) Date: Wed, 14 Mar 2007 12:49:46 +0800 (CST) Subject: [Bioperl-l] Mail from zhousuxia88 Message-ID: <45F77EEA.0000BC.15326@bj126app3.126.com> From n.haigh at sheffield.ac.uk Wed Mar 14 04:02:21 2007 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Wed, 14 Mar 2007 08:02:21 +0000 Subject: [Bioperl-l] Bio::SearchIO blastxml for multiple queries In-Reply-To: <44041B68-5A3E-4295-956F-8599AA2F7D34@u.arizona.edu> References: <44041B68-5A3E-4295-956F-8599AA2F7D34@u.arizona.edu> Message-ID: <45F7AC0D.80801@sheffield.ac.uk> I believe a lot of work went into the Bio::SearchIO::* modules between bioperl 1.4 and the 1.5.2 developer release. It's probably wise to update your bioperl installation to the latest version and try this again.
Nath Jay Konieczka wrote: > Hi, > > My blastxml output, which is supposed to contain multiple queries, > each having multiple hits, is improperly parsed by Bio::SearchIO - > only the first query is recognized. I realize there has been plenty > of talk about this issue since it arose, but I've been having trouble > finding instructions for a patch. Is there an easy fix for this? > I'm running blastall version 2.2.15 (using blastn) and attempting to > parse the output with bioperl 1.4. > > Thanks for any help, > > Jay > > -- > Jay H. Konieczka > Ph.D. Student, Antin Lab > Molecular & Cellular Biology > University of Arizona > > Phone: 1.520.591.3446 > > 1656 E. Mabel, MRB 317 > Tucson, AZ 85724 USA > _____________________ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From progig01 at yahoo.fr Wed Mar 14 06:59:01 2007 From: progig01 at yahoo.fr (al bob) Date: Wed, 14 Mar 2007 11:59:01 +0100 (CET) Subject: [Bioperl-l] NCBI Blast via HTTP Message-ID: <851565.10139.qm@web23302.mail.ird.yahoo.com> Hi, I'm wondering if there is a problem with the remote execution of NCBI Blast via HTTP using the bioperl object Bio::Tools::Run::RemoteBlast. I used it last year and it worked perfectly, but since the 20th of January it no longer works: it appears to run very slowly and doesn't give a result even after a long wait. Thanks for any advice. Bob,
From cjfields at uiuc.edu Wed Mar 14 09:05:20 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 14 Mar 2007 08:05:20 -0500 Subject: [Bioperl-l] Bio::SearchIO blastxml for multiple queries In-Reply-To: <44041B68-5A3E-4295-956F-8599AA2F7D34@u.arizona.edu> References: <44041B68-5A3E-4295-956F-8599AA2F7D34@u.arizona.edu> Message-ID: <06C49D58-8B94-44EF-B218-70160E29D448@uiuc.edu> This has been fixed in CVS. You will need to update to 1.5.2 since you're running bioperl 1.4, but I would recommend just installing from CVS if that's possible. I will probably do some tweaking to the blastxml parser to better deal with multiple reports. At the moment it parses the full XML output and caches the Result objects, so if you have a huge XML file it will likely choke your system. chris On Mar 13, 2007, at 11:15 PM, Jay Konieczka wrote: > Hi, > > My blastxml output, which is supposed to contain multiple queries, > each having multiple hits, is improperly parsed by Bio::SearchIO - > only the first query is recognized. I realize there has been plenty > of talk about this issue since it arose, but I've been having trouble > finding instructions for a patch. Is there an easy fix for this? > I'm running blastall version 2.2.15 (using blastn) and attempting to > parse the output with bioperl 1.4. > > Thanks for any help, > > Jay > > -- > Jay H. Konieczka > Ph.D. Student, Antin Lab > Molecular & Cellular Biology > University of Arizona > > Phone: 1.520.591.3446 > > 1656 E. Mabel, MRB 317 > Tucson, AZ 85724 USA > _____________________ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From granjeau at tagc.univ-mrs.fr Wed Mar 14 09:19:45 2007 From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137) Date: Wed, 14 Mar 2007 14:19:45 +0100 Subject: [Bioperl-l] Reading a XML sequence (UniParc) into a BioSeq object In-Reply-To: <45E2A59B.6080300@tagc.univ-mrs.fr> References: <45E2A59B.6080300@tagc.univ-mrs.fr> Message-ID: <45F7F671.4030508@tagc.univ-mrs.fr> Hi, Since nobody gave me a clue nor told me that my question is silly (it should be ;-) ), I finally wrote a hack: an object that inherits from BioFetch and overrides the postprocess_data method, converting UniParc XML to Swiss format. The really nice approach of parsing UniParc XML and creating an object directly was too hard for me. It's amazing what BioPerl can do. Regards, --Samuel

=head1 NAME

ICIM::Bio::DB::BioFetch - Database object interface to BioFetch retrieval

=head1 SYNOPSIS

see Bio::DB::BioFetch

=head1 DESCRIPTION

See Bio::DB::BioFetch for main description.

The BEGIN block adds a few databases.

The postprocess_data method converts UniParc XML format to Swiss format
for string transfer type.

=head1 SEE ALSO

This module inherits from BioFetch.
http://doc.bioperl.org/bioperl-live/Bio/DB/BioFetch.html

This module is a light copy of BioFetch.

=head1 AUTHOR

Email Samuel Granjeaud, E<lt>granjeau at tagc.univ-mrs.frE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are usually preceded with a _

=cut

# Let the code begin...
package ICIM::Bio::DB::BioFetch;

use strict;
use warnings;

use Bio::Root::IO;

use base qw(Bio::DB::BioFetch);

BEGIN {

    $Bio::DB::BioFetch::FORMATMAP{ipi} = {
        default   => 'swiss', # default BioFetch format/SeqIO module pair
        swissprot => 'swiss', # alternative BioFetch format/module pair
        fasta     => 'fasta', # alternative BioFetch format/module pair
        namespace => 'ipi',
    };
    $Bio::DB::BioFetch::FORMATMAP{uniparc} = {
        default   => 'swiss', # default BioFetch format/SeqIO module pair
        swissprot => 'swiss', # alternative BioFetch format/module pair
        fasta     => 'fasta', # alternative BioFetch format/module pair
        namespace => 'uniparc',
    };
}

=head2 postprocess_data

 Title   : postprocess_data
 Usage   : $self->postprocess_data ( 'type' => 'string',
                                     'location' => \$datastr);
 Function: process downloaded data before loading into a Bio::SeqIO
 Returns : void
 Args    : hash with two keys - 'type' can be 'string' or 'file'
                              - 'location' either file location or string
                                reference containing data

=cut

sub postprocess_data {
    my ($self,%args) = @_;

    # check for errors in the stream
    if ($args{'type'} eq 'string') {
        my $stringref = $args{'location'};
        if ($$stringref =~ /^ERROR (\d+) (.+)/m) {
            $self->throw("BioFetch Error $1: $2");
        }

        # Post-process: convert UniParc XML format to Swiss format
        if ($$stringref =~ /^$/msg) {
            # Get an entry
            my $seqEntry = $1;
            $seqEntry =~ s/[\n\r]+/\n/g;
            # Get ID
            my ($id) = ( $seqEntry =~ //m );
            # Get DR, database cross-references
            my @dr = ();
            while ($seqEntry =~ /$/mg) {
                push (@dr, "DR $1; $2; $3; $4; $5.");
            }
            # Get SQ, sequence itself
            my ($len,$crc,$seq) = ( $seqEntry =~ /$(.+?)<\/sequence>/ms );
            $seq =~ s/^/ /mg;
            $seq =~ s/(\w{10})/ $1/mg;
            $seq =~ s/(\w{10})(\w{1,9})$/$1 $2/m;
            $seq =~ s/^( \w{1,9})$/ $1/m;
            # Append to results
            push( @pSeq,
                sprintf("ID %-20s Reviewed; % 5d AA.\n",$id,$len),
                join("\n",@dr,),"\nSQ SEQUENCE $len AA; $crc CRC64;$seq//\n" );
            }
            # Replace input string by results
            $$stringref = join('',@pSeq);
        }
    }

    elsif ($args{'type'} eq 'file') {
        open
            (F,$args{'location'}) or $self->throw("Couldn't open $args{location}: $!");
        # this is dumb, but the error may be anywhere on the first three lines because the
        # CGI headers are sometimes printed out by the server...
        my @data = (scalar <F>,scalar <F>,scalar <F>);
        if (join('',@data) =~ /^ERROR (\d+) (.+)/m) {
            $self->throw("BioFetch Error $1: $2");
        }
        close F;
    }

    else {
        $self->throw("Don't know how to postprocess data of type $args{'type'}");
    }
}

1;

Samuel GRANJEAUD - IR/IFR137 wrote: > Hello ! > > I would like to fill a BioSeq object with the output from a dbfetch > request at EBI on UniParc database (which replies only XML code, as I am > interested in references). If somebody could tell which BioPerl object > to use or a way to convert it to Swiss format or could tell me the way > to do it or has got a piece of code (is > http://doc.bioperl.org/bioperl-live/Bio/SeqIO/interpro.html a good > starting point), I would appreciate a lot. > > Best regards, > --Samuel > > > > active="Y" created="04-Jan-2005" last="15-Dec-2006"/> > version_i="1" active="N" created="15-Feb-2005" last="06-Feb-2007"/> > active="Y" created="03-Apr-2006" last="27-Nov-2006"/> > active="N" created="07-Mar-2005" last="07-Mar-2005"/> > active="N" created="06-Sep-2005" last="06-Oct-2006"/> > active="N" created="15-Aug-2005" last="02-Dec-2005"/> > > > MSTRSVSSSSYRRMFGGPGTASRPSSSRSYVTTSTRTYSLGSALRPSTSRSLYASSPGGV > YATRSSAVRLRSSVPGVRLLQDSVDFSLADAINTEFKNTRTNEKVELQELNDRFANYIDK > VRFLEQQNKILLAELEQLKGQGKSRLGDLYEEEMRELRRQVDQLTNDKARVEVERDNLAE > DIMRLREKLQEEMLQREEAENTLQSFRQDVDNASLARLDLERKVESLQEEIAFLKKLHEE > EIQELQAQIQEQHVQIDVDVSKPDLTAALRDVRQQYESVAAKNLQEAEEWYKSKFADLSE > AANRNNDALRQAKQESTEYRRQVQSLTCEVDALKGTNESLERQMREMEENFAVEAANYQD > TIGRLQDEIQNMKEEMARHLREYQDLLNVKMALDIEIATYRKLLEGEESRISLPLPNFSS > LNLRGKHFISL > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From emmanuel.quevillon at
versailles.inra.fr Wed Mar 14 09:36:55 2007 From: emmanuel.quevillon at versailles.inra.fr (Emmanuel Quevillon) Date: Wed, 14 Mar 2007 14:36:55 +0100 Subject: [Bioperl-l] Bio::Tools::Run::Signalp In-Reply-To: <782E8C3B-9656-4AB9-A77E-0B51CEA79046@uiuc.edu> References: <45F6E998.7040109@uq.edu.au> <9C84DB98-BC9B-48B9-8F73-B66D57AD1C96@bioperl.org> <782E8C3B-9656-4AB9-A77E-0B51CEA79046@uiuc.edu> Message-ID: <45F7FA77.7000304@versailles.inra.fr> -----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 Chris Fields wrote: > Not sure if the one Emmanuel Quevillon added to Bugzilla is updated > for Signalp 3.0, but it is fairly recent. It just hasn't been added > in yet: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2203 > > Neil, could you try that one out, or file a bug with some example data? > > chris > > On Mar 13, 2007, at 1:34 PM, Jason Stajich wrote: > >> On Mar 13, 2007, at 11:12 AM, Neil Saunders wrote: >> >>> dear BioPerlers, >>> >>> Re: my last post regarding Bio::Tools::Run::Signalp, my error came >>> from an >>> incorrect $ENV{'SIGNALPDIR'}. >>> >>> My new error is: >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: parsing problem in signalp >>> >>> which stems from the Bio::Tools::Signalp parser. So I'm (a) >>> wondering whether >>> this module can parse SignalP 3.0 output and (b) still curious as >>> to how to >>> specify parameters to signalp using the Run module. >>> >> For the parsing: >> Unfortunately, I doubt that it was written for SignalP 3.0 since it >> was probably written before that came out and it hasn't been >> updated much since then. I think it needs a owner at this point >> as well as the original devs on this have moved on. >> Can you perhaps provide good example files for 3.0 output? 
>> >> For the Running: >> It was written quite a while ago by the fugu folks and I think they >> just hardcoded the parameters -- if you look at >> _run code you see this in the setup string: >> "-t euk -trunc 50 " >> >> So this is an example of an orphaned module that needs someone to >> bring it up to date with the current standards of the bioperl-run >> package. >> Sorry to not have better news than that.... >> >> -j >> >>> thanks, >>> Neil >>> -- >>> School of Molecular and Microbial Sciences >>> University of Queensland >>> Brisbane 4072 Australia >>> >>> http://nsaunders.wordpress.com >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> -- >> Jason Stajich >> Miller Research Fellow >> University of California, Berkeley >> lab: 510.642.8441 >> http://pmb.berkeley.edu/~taylor/people/js.html >> http://fungalgenomes.org/ >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Hi, I just added the test script like the output (Signalp v3.0). Can you test it from cvs? Let me know how it goes. Sorry I've not done it before. 
Regards - -- Emmanuel - --------------------------------------------------------------------- Emmanuel Quevillon INRA-URGI / Bayer CropScience 523 Place des Terrasses http://urgi.versailles.inra.fr 91000 EVRY http://gpi.versailles.inra.fr Tel : 01 60 87 37 39 http://www.bayercropscience.com PGP public key server : http://pgp.mit.edu/ Key ID : 0x7888852B - --------------------------------------------------------------------- -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFF9/p3ZxKwy3iIhSsRA9fYAJ47juQc/PNfstkw34BeFawgy+lYIgCgkPEZ T3z4XiTS1b6XJJVL//fz4Fo= =CrWc -----END PGP SIGNATURE----- From cjfields at uiuc.edu Wed Mar 14 10:59:30 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 14 Mar 2007 09:59:30 -0500 Subject: [Bioperl-l] Reading a XML sequence (UniParc) into a BioSeq object In-Reply-To: <45F7F671.4030508@tagc.univ-mrs.fr> References: <45E2A59B.6080300@tagc.univ-mrs.fr> <45F7F671.4030508@tagc.univ-mrs.fr> Message-ID: That's probably the best short-term fix though I'm sure it's quite a bit slower than a direct UniParc XML-to-Bio::Seq via SeqIO. I am looking into adding a few more XML::SAX-based parsers (INSDSeqXML, GBSeqXML, EMBLXML, etc), so we could add UniProt XML to the list (which I think Uniparc uses, correct?). chris On Mar 14, 2007, at 8:19 AM, Samuel GRANJEAUD - IR/IFR137 wrote: > Hi, > > Since nobody gave me a clue nor told me that my question is silly (it > should be ;-) ), I finally realized a hack within an object that > inherits from BioFetch and overloads post process method, converting > uniparc XML to swiss format. The really nice approach of parsing > uniparc > XML and creating a object was too hard for me. > > It's amazing what BioPerl can do. > > Regards, > --Samuel > > =head1 NAME > > ICIM::Bio::DB::BioFetch - Database object interface to BioFetch > retrieval > > =head1 SYNOPSIS > > see Bio::DB::BioFetch > > =head1 DESCRIPTION > > See Bio::DB::BioFetch for main description. 
> > The Begin code adds a few databases. > > The post_process method converts UniParc XML format to Swiss format > for string transfer type. > > =head1 SEE ALSO > > This module inherits from BioFetch. > http://doc.bioperl.org/bioperl-live/Bio/DB/BioFetch.html > > This module is a light copy of BioFetch. > > =head1 AUTHOR > > Email Samuel Granjeaud, Egranjeau at tagc.univ-mrs.frE > > =head1 APPENDIX > > The rest of the documentation details each of the object > methods. Internal methods are usually preceded with a _ > > =cut > > # Let the code begin... > > package ICIM::Bio::DB::BioFetch; > > use strict; > use warnings; > > use Bio::Root::IO; > > use base qw(Bio::DB::BioFetch); > > BEGIN { > > $Bio::DB::BioFetch::FORMATMAP{ipi} = { > default => 'swiss', # default BioFetch format/SeqIOmodule > pair > swissprot => 'swiss', # alternative BioFetch format/module > pair > fasta => 'fasta', # alternative BioFetch format/module > pair > namespace => 'ipi', > }; > $Bio::DB::BioFetch::FORMATMAP{uniparc} = { > default => 'swiss', # default BioFetch format/SeqIOmodule > pair > swissprot => 'swiss', # alternative BioFetch format/module > pair > fasta => 'fasta', # alternative BioFetch format/module > pair > namespace => 'uniparc', > }; > } > > =head2 postprocess_data > > Title : postprocess_data > Usage : $self->postprocess_data ( 'type' => 'string', > 'location' => \$datastr); > Function: process downloaded data before loading into a Bio::SeqIO > Returns : void > Args : hash with two keys - 'type' can be 'string' or 'file' > - 'location' either file location or > string > reference containing data > > =cut > > sub postprocess_data { > my ($self,%args) = @_; > > # check for errors in the stream > if ($args{'type'} eq 'string') { > my $stringref = $args{'location'}; > if ($$stringref =~ /^ERROR (\d+) (.+)/m) { > $self->throw("BioFetch Error $1: $2"); > } > > # Post-process: convert UniParc XML format in swiss format > if ($$stringref =~ /^ > my @pSeq = (); > while ($$stringref =~ 
/^($/ > msg) { > # Get an entry > my $seqEntry = $1; > $seqEntry =~ s/[\n\r]+/\n/g; > # Get ID > my ($id) = ( $seqEntry =~ //m ); > # Get DR, database croos-references > my @dr = (); > while ($seqEntry =~ / .+? active="(\S+)" created="(\S+)" last="(\S+)"\/>$/mg) { > push (@dr, "DR $1; $2; $3; $4; $5."); > } > # Get SQ, sequence itself > my ($len,$crc,$seq) = ( $seqEntry =~ / length="(\S+)" crc64="(\S+)">$(.+?)<\/sequence>/ms ); > $seq =~ s/^/ /mg; > $seq =~ s/(\w{10})/ $1/mg; > $seq =~ s/(\w{10})(\w{1,9})$/$1 $2/m; > $seq =~ s/^( \w{1,9})$/ $1/m; > # Append to results > push( @pSeq, > sprintf("ID %-20s Reviewed; % 5d AA.\n", > $id,$len), > join("\n", at dr,),"\nSQ SEQUENCE $len AA; $crc > CRC64;$seq//\n" ); > } > # Replace input string by results > $$stringref = join('', at pSeq); > > } > } > > elsif ($args{'type'} eq 'file') { > open (F,$args{'location'}) or $self->throw("Couldn't open > $args{location}: $!"); > # this is dumb, but the error may be anywhere on the first > three > lines because the > # CGI headers are sometimes printed out by the server... > my @data = (scalar ,scalar ,scalar ); > if (join('', at data) =~ /^ERROR (\d+) (.+)/m) { > $self->throw("BioFetch Error $1: $2"); > } > close F; > } > > else { > $self->throw("Don't know how to postprocess data of type > $args{'type'}"); > } > } > > 1; > > > Samuel GRANJEAUD - IR/IFR137 wrote: >> Hello ! >> >> I would like to fill a BioSeq object with the output from a dbfetch >> request at EI on UniParc database (which replies only XML code, as >> I am >> interested in references). If somebody could tell which BioPerl >> object >> to use or a way or convert it in Swiss format or could tell me the >> way >> to do it or has got a piece of code (is >> http://doc.bioperl.org/bioperl-live/Bio/SeqIO/interpro.html a good >> starting point), I would appreciate a lot. 
>> >> Best regards, >> --Samuel >> >> >> >> > active="Y" created="04-Jan-2005" last="15-Dec-2006"/> >> > version_i="1" active="N" created="15-Feb-2005" last="06-Feb-2007"/> >> > active="Y" created="03-Apr-2006" last="27-Nov-2006"/> >> > active="N" created="07-Mar-2005" last="07-Mar-2005"/> >> > active="N" created="06-Sep-2005" last="06-Oct-2006"/> >> > active="N" created="15-Aug-2005" last="02-Dec-2005"/> >> >> >> MSTRSVSSSSYRRMFGGPGTASRPSSSRSYVTTSTRTYSLGSALRPSTSRSLYASSPGGV >> YATRSSAVRLRSSVPGVRLLQDSVDFSLADAINTEFKNTRTNEKVELQELNDRFANYIDK >> VRFLEQQNKILLAELEQLKGQGKSRLGDLYEEEMRELRRQVDQLTNDKARVEVERDNLAE >> DIMRLREKLQEEMLQREEAENTLQSFRQDVDNASLARLDLERKVESLQEEIAFLKKLHEE >> EIQELQAQIQEQHVQIDVDVSKPDLTAALRDVRQQYESVAAKNLQEAEEWYKSKFADLSE >> AANRNNDALRQAKQESTEYRRQVQSLTCEVDALKGTNESLERQMREMEENFAVEAANYQD >> TIGRLQDEIQNMKEEMARHLREYQDLLNVKMALDIEIATYRKLLEGEESRISLPLPNFSS >> LNLRGKHFISL >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From e-just at northwestern.edu Wed Mar 14 11:00:21 2007 From: e-just at northwestern.edu (Eric Just) Date: Wed, 14 Mar 2007 10:00:21 -0500 Subject: [Bioperl-l] Tandem Repeats finder parser In-Reply-To: References: Message-ID: Ok, Bio::Tools::TandemRepeatsFinder is now in CVS. It only parses the data file that is created when TRF is run with the -d option. It does not parse any of the html files that are produced. Look at the POD for a quick usage summary as well as t/TandemRepeatsFinder.t for how to extract each piece of data. Let me know if you have any questions or suggestions. 
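For illustration only, here is a minimal sketch of reading the "Parameters:" line of a TRF data file, as quoted earlier in this thread. It is not the code of Bio::Tools::TandemRepeatsFinder. The helper name parse_trf_parameters and the field names are assumptions, taken from the order of the arguments on the TRF command line (Match, Mismatch, Delta, PM, PI, Minscore, MaxPeriod).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Tools::TandemRepeatsFinder.
# Field order is assumed to follow the TRF command line:
# Match Mismatch Delta PM PI Minscore MaxPeriod.
my @PARAM_NAMES = qw(match mismatch delta pm pi minscore maxperiod);

sub parse_trf_parameters {
    my ($line) = @_;

    # Accept only a line of the form "Parameters: n n n n n n n"
    return unless $line =~ /^\s*Parameters:\s*(\d+(?:\s+\d+){6})\s*$/;

    # Pair the seven numbers with their assumed names
    my @values = split /\s+/, $1;
    my %params;
    @params{@PARAM_NAMES} = @values;
    return \%params;
}

# The line quoted in this thread
my $params = parse_trf_parameters('Parameters: 2 7 7 80 10 50 12');
printf "%s=%s\n", $_, $params->{$_} for @PARAM_NAMES;
```

Reading the parameters from the file content rather than from the file name keeps the result valid even if a user renames the file, which is the reason given above for preferring this line.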
(From TRF docs:) ---------------------------------------------------------------------------------------------------- -d: A data file is produced if this option is present. This file is a text file which contains the same information, in the same order, as the summary table file, plus consensus pattern and repeat sequences. This file contains no labeling and is suitable for additional processing, for example with a perl script, outside of the program. Eric On 3/13/07, Mark Johnson wrote: > I'm going to need a trf parser. I'd be happy to use yours instead of > writing one myself. I'll be eagerly watching cvs. 8) > > On 3/12/07, Eric Just wrote: > > Hi there, > > > > I have written a simple parser for Tandem Repeats Finder output. Is > > there any interest in including this module in Bioperl? If so, I will > > conform it to Bioperl standards, write some tests, then send it to > > whomever is interested. > > > > Eric > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From granjeau at tagc.univ-mrs.fr Wed Mar 14 11:39:32 2007 From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137) Date: Wed, 14 Mar 2007 16:39:32 +0100 Subject: [Bioperl-l] Reading a XML sequence (UniParc) into a BioSeq object In-Reply-To: References: <45E2A59B.6080300@tagc.univ-mrs.fr> <45F7F671.4030508@tagc.univ-mrs.fr> Message-ID: <45F81734.4050509@tagc.univ-mrs.fr> Chris Fields wrote: > That's probably the best short-term fix though I'm sure it's quite a > bit slower than a direct UniParc XML-to-Bio::Seq via SeqIO. I am > looking into adding a few more XML::SAX-based parsers (INSDSeqXML, > GBSeqXML, EMBLXML, etc), so we could add UniProt XML to the list > (which I think Uniparc uses, correct?). I think so, but I am not sure of that since I am a dummy of XML and UniParc is using a very small subset of uniprot features. 
The best thing is to have an expert look at http://www.pir.uniprot.org/support/docs/uniprot.xsd http://www.pir.uniprot.org/support/docs/xml_news.htm Adding the listed XML parsers to BioPerl would be nice. Regards, --Samuel From cjfields at uiuc.edu Wed Mar 14 12:16:52 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 14 Mar 2007 11:16:52 -0500 Subject: [Bioperl-l] NCBI Blast via HTTP In-Reply-To: <851565.10139.qm@web23302.mail.ird.yahoo.com> References: <851565.10139.qm@web23302.mail.ird.yahoo.com> Message-ID: I tried this today and had no issues. The NCBI BLAST server is typically busy so you may have to wait, depending on the type of BLAST you intend to run, whether you use stats, etc. You can check on the status if you set RemoteBlast::verbose() to 1. Warning: it's pretty noisy as verbose returns the HTMLized page containing the length of time to recheck the RID. chris On Mar 14, 2007, at 5:59 AM, al bob wrote: > Hi, > > I'm wondering if there is a problem with the remote > execution of the NCBI Blast via HTTP using bioperl > object Bio::Tools::Run::RemoteBlast. I used it last > year and it was perfect but I don't know why after > 20th of January, it doesn't work any more : it appears > to run very slowly but it doesn't give a result even > after a long time waiting. > > Thanks for any advice. > > Bob, > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From dwaner at scitegic.com Wed Mar 14 12:10:40 2007 From: dwaner at scitegic.com (dwaner at scitegic.com) Date: Wed, 14 Mar 2007 09:10:40 -0700 Subject: [Bioperl-l] NCBI Blast via HTTP In-Reply-To: <851565.10139.qm@web23302.mail.ird.yahoo.com> Message-ID: We have seen this too, and I think I have tracked it down to recent changes in NCBI's results page format. Specifically, there is no longer the QBlastInfo block that BioPerl uses to detect that the results are ready. There is also no closing PRE tag that BioPerl uses to detect the end of the blast results text. Also, in looking at the BioPerl code, I see that it is assuming that "QBlastInfoBegin" and "Status=READY" are on separate lines. This is not always the case, and has caused failures for us in the past. I am looking into this now, and communicating with NCBI about the format changes. I will keep the BioPerl list updated when I know more. David Waner Bioinformatics Software Engineer Accelrys/SciTegic From cjfields at uiuc.edu Wed Mar 14 12:42:49 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 14 Mar 2007 11:42:49 -0500 Subject: [Bioperl-l] NCBI Blast via HTTP In-Reply-To: References: <851565.10139.qm@web23302.mail.ird.yahoo.com> Message-ID: Ah, spoke too soon. There seems to be an issue with the RID being removed properly when retrieving BLAST in text format which causes an infinite loop; this is popping up in RemoteBlast tests. I'll look into that. chris On Mar 14, 2007, at 11:16 AM, Chris Fields wrote: > I tried this today and had no issues. The NCBI BLAST server is > typically busy so you may have to wait dep. on the type of BLAST you > intend on running, whether you use stats, etc. You can check on the > status if you set RemoteBlast::verbose() to 1. Warning: it's pretty > noisy as verbose returns the HTMLized page containing the length of > time to recheck the RID. 
> > chris > > On Mar 14, 2007, at 5:59 AM, al bob wrote: > >> Hi, >> >> I'm wondering if there is a problem with the remote >> execution of the NCBI Blast via HTTP using bioperl >> object Bio::Tools::Run::RemoteBlast. I used it last >> year and it was perfect but I don't know why after >> 20th of January, it doesn't work any more : it appears >> to run very slowly but it doesn't give a result even >> after a long time waiting. >> >> Thanks for any advice. >> >> Bob, >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Wed Mar 14 13:05:56 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 14 Mar 2007 12:05:56 -0500 Subject: [Bioperl-l] NCBI Blast via HTTP In-Reply-To: References: Message-ID: David, The problem I am seeing (with verbose set to 1) is repeated queries to retrieve the report using an RID that isn't removed. The report retrieval works for all formats (Text, Tabular, XML) but the RID isn't removed in the case of tabular or text output, so the client gets stuck in an infinite loop.
XML for some reason works just fine, so I suggest anyone who uses RemoteBlast to switch to XML format parsing for the time being until this is fixed. my $remote_blastxml = Bio::Tools::Run::RemoteBlast->new ('-prog' => $prog, '-data' => $db, '-readmethod' => 'xml', '-expect' => $e_val, ); $remote_blastxml->retrieve_parameter('FORMAT_TYPE', 'XML'); then submit as normal. chris On Mar 14, 2007, at 11:10 AM, dwaner at scitegic.com wrote: > We have seen this too, and I think I have tracked it down to recent > changes in NCBI's results page format. Specifically, there is no > longer > the QBlastInfo block that BioPerl uses to detect that the results are > ready. There is also no closing PRE tag that BioPerl uses to detect > the > end of the blast results text. > > Also, in looking at the BioPerl code, I see that it is assuming that > "QBlastInfoBegin" and "Status=READY" are on separate lines. This is > not > always the case, and has caused failures for us in the past. > > I am looking into this now, and communicating with NCBI about the > format > changes. I will keep the BioPerl list updated when I know more. > > David Waner > Bioinformatics Software Engineer > Accelrys/SciTegic > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Wed Mar 14 15:14:51 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 14 Mar 2007 14:14:51 -0500 Subject: [Bioperl-l] NCBI Blast via HTTP In-Reply-To: References: Message-ID: I have a tentative fix for this which I've committed to CVS. When Jason set up the module he checked for the xml header tag as well as the status line but had no specific check for text/tabular output. 
I added a check for the first BLAST line which should break out of the loop for all BLAST types (famous last words...) which passes our tests. chris On Mar 14, 2007, at 12:25 PM, dwaner at scitegic.com wrote: > > Chris, > > I have been monitoring the HTTP traffic when submitting blast > requests, and the reason that XML format works and TEXT format > doesn't is that the XML format still contains the "QBlastInfo... > stauts=READY" block, but the TEXT format doesn't. I'm trying to > get NCBI to restore this information to the TEXT result message. > Apparently NCBI had a similar problem in February and restored the > QBlastInfo comment block on request from someone at Accelrys. I > don't know why they changed it back - and they don't seem to know > either. I will let you know if we can get this fixed on NCBI's end. > > Regarding XML Blast, is that working in 1.5.1? We are still > relying on TEXT format because it was our understanding that XML > parsing was not yet working in the BioPerl version that we are > currently using with Pipeline Pilot. > > David > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From dwaner at scitegic.com Wed Mar 14 16:36:40 2007 From: dwaner at scitegic.com (dwaner at scitegic.com) Date: Wed, 14 Mar 2007 13:36:40 -0700 Subject: [Bioperl-l] NCBI Blast via HTTP Message-ID: Chris (and other interested parties): I just ran our tests again and NCBI has restored the block to the TEXT format blast results. BioPerl remote blast should now work, even without your changes. David Waner Bioinformatics Software Engineer Accelrys/Scitegic From cjfields at uiuc.edu Wed Mar 14 17:27:11 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 14 Mar 2007 16:27:11 -0500 Subject: [Bioperl-l] NCBI Blast via HTTP In-Reply-To: References: Message-ID: <0FC81BC8-5561-497E-8F95-B13E59047774@uiuc.edu> I'll leave the changes in for now, just in case. It seems to catch both instances. 
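The readiness check discussed in this thread can be sketched roughly as follows. This is only an illustration, not the code in Bio::Tools::Run::RemoteBlast: the helper name qblast_status and the sample pages are made up for the example, and the QBlastInfoBegin/QBlastInfoEnd layout is assumed from David's description. Matching the block with /s means Status=... may sit on the same line as QBlastInfoBegin or on a later one.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the Status value from
# a QBlastInfo block, or undef when the page carries no such block.
sub qblast_status {
    my ($page) = @_;
    # /s lets '.' cross newlines, so Status= may follow QBlastInfoBegin
    # on the same line or on a later line.
    if ( $page =~ /QBlastInfoBegin(.*?)QBlastInfoEnd/s ) {
        my $block = $1;
        return $1 if $block =~ /Status=(\w+)/;
    }
    return;
}

# Made-up sample pages covering the layouts mentioned in the thread.
my $separate  = "<!--\nQBlastInfoBegin\n\tStatus=READY\nQBlastInfoEnd\n-->\n";
my $same_line = "<!-- QBlastInfoBegin Status=WAITING QBlastInfoEnd -->\n";
my $no_block  = "BLASTP report text without any info block\n";

for my $page ( $separate, $same_line, $no_block ) {
    my $status = qblast_status($page);
    print defined $status ? "status: $status\n" : "no QBlastInfo block\n";
}
```

A page with no block at all (the situation David reported for TEXT output) yields undef here, which is why a separate check on the report text itself, such as the one Chris describes adding, is needed to leave the polling loop.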
chris On Mar 14, 2007, at 3:36 PM, dwaner at scitegic.com wrote: > Chris (and other interested parties): > > I just ran our tests again and NCBI has restored the > > > > block to the TEXT format blast results. BioPerl remote blast > should now > work, even without your changes. > > > David Waner > Bioinformatics Software Engineer > Accelrys/Scitegic > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From daniel.lang at biologie.uni-freiburg.de Fri Mar 16 14:37:26 2007 From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang) Date: Fri, 16 Mar 2007 19:37:26 +0100 Subject: [Bioperl-l] nasty space in Bio::DB::SeqFeature::Store prevents Gbrowse from running with bioperl-live Message-ID: <45FAE3E6.4000303@biologie.uni-freiburg.de> Hi, there's a tiny error in the pod doc of the alias get_feature_by_name that prevents it to be found at all (e.g. when using gbrowse). The attached patch should fix it. Daniel:-) -- Daniel Lang University of Freiburg, Plant Biotechnology Schaenzlestr. 1, D-79104 Freiburg fax: +49 761 203 6945 phone: +49 761 203 6974 homepage: http://www.plant-biotech.net/ e-mail: daniel.lang at biologie.uni-freiburg.de ################################################# My software never has bugs. It just develops random features. ################################################# -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: Store.pm.patch Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070316/e82d8f24/attachment.ksh From hlapp at gmx.net Sat Mar 17 10:26:04 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 17 Mar 2007 10:26:04 -0400 Subject: [Bioperl-l] [BioSQL-l] Bio::Annotation::StructuredValue In-Reply-To: <8C776F12-19CF-4511-91F2-ED9640FB995C@uiuc.edu> References: <8C776F12-19CF-4511-91F2-ED9640FB995C@uiuc.edu> Message-ID: <1F43323F-22D8-4A1D-A62E-46E60A59D97C@gmx.net> On Mar 16, 2007, at 3:44 PM, Chris Fields wrote: > Does bioperl-db store Bio::Annotation::StructuredValue (i.e. in > SwissProt)? It does b/c B::A::StructuredValue ISA B::A::SimpleValue and it handles the latter. This isn't ideal because if you're like me you'd want all the individual values to each translate to its own row. I was using a SeqProcessor to convert the StructuredValue objects into arrays of SimpleValue objects. Obviously, this will lose the structure between them (i.e., in reality it's not just a flat array), but for enabling indexed searches it works well. With Uniprot no longer collapsing per sequence, the thing that gets lost is the semantic context of each token, but as you found out correctly it gets lost at the bioperl level already. > I am thinking of using StructuredValue, Data::Stag, or > Class::Meta for some of my RNA structural data work but didn't know > if StructuredValues would persist via bioperl-db. At this point they are either flattened out (through the overridden value() method), or you convert them upfront into an array, using a SeqProcessor. BioSQL has no provision for storing the fact that a number of tag/ value associations (which is what B::A::SimpleValues are) comprise of a "bag" of annotation that belongs together. You could, however, persist that through embedding the tags in an ontology (tags are ontology terms) that captures that (through rel.ships). 
> > I also noticed there is an outstanding BioPerl bug (http:// > bugzilla.open-bio.org/show_bug.cgi?id=1825) where Hilmar suggested > reimplementing StructuredValueto use Data::Stag, so I thought I might > give it a try. Sounds good :-) I hope the above makes some sense. Let me know if not. -hilmar > > chris > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat Mar 17 16:15:01 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 17 Mar 2007 16:15:01 -0400 Subject: [Bioperl-l] Phyloinformatics Summer of Code Message-ID: (Apologies if you receive multiple copies of this. The message is being posted to multiple channels.) (Note that leading developers from both Biojava and BioPerl are amongst the mentors.) Phyloinformatics Summer of Code 2007 A collaborative Phyloinformatics Group, sponsored by the National Evolutionary Synthesis Center (NESCent: http://www.nescent.org/), is working to develop user-interfaces, improve software interoperability and support data exchange standards in evolutionary bioinformatics. The specific projects are diverse in nature and range from the development of AJAX components for web-based bioinformatics applications, managing workflows using approaches from functional and logic programming, and developing data exchange standards for phylogenetic substitution models. 
The Phyloinformatics group will be sponsoring student collaborators through the Google Summer of Code program (http://code.google.com/ soc), which provides undergraduate, masters and PhD students with a unique opportunity (over three summer months) to obtain hands-on experience writing and extending open-source software under the mentorship of experienced developers from around the world. We are particularly targeting students interested in both evolutionary biology and software development. Students will have one or more dedicated mentors with expertise in phylogenetic methods and open- source software development. Our project proposals are flexible and can be adjusted in scope to match the skills of students with less programming proficiency. If the program sounds interesting to you but you are unsure whether you have the necessary skills, please email the mentors at phylosoc {at} nescent {dot} org. We will work with those who are genuinely interested to find a project that fits your interest and skills. Students will receive a stipend from Google and will be invited to participate in future collaborative events such as the NESCent Phyloinformatics Hackathons (http://www.nescent.org/wg/ phyloinformatics). TO APPLY: Students must apply on-line at the Google Summer of Code website (http://code.google.com/soc). The application period for students is now open and ends on Saturday, March 24, 2007 (one week from now). The Phyloinformatics Summer of Code project and ideas page is at the following URL: http://phyloinformatics.net/Phyloinformatics_Summer_of_Code_2007 The above page also contains links to the GSoC program rules, eligibility requirements, and stipend payment mechanism. We encourage all interested students to email any questions, or self-proposed project ideas, to phylosoc {at} nescent {dot} org. This will reach all prospective mentors. 
Eligibility requirements for students: http://code.google.com/support/bin/answer.py?answer=60279&topic=10730 Stipend for students: http://code.google.com/support/bin/answer.py?answer=60322&topic=10731 Please disseminate this announcement to appropriate students at your institution. Hilmar Lapp Assistant Director for Informatics NESCent From lstein at cshl.edu Mon Mar 19 11:49:38 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 19 Mar 2007 11:49:38 -0400 Subject: [Bioperl-l] nasty space in Bio::DB::SeqFeature::Store prevents Gbrowse from running with bioperl-live In-Reply-To: <45FAE3E6.4000303@biologie.uni-freiburg.de> References: <45FAE3E6.4000303@biologie.uni-freiburg.de> Message-ID: <6dce9a0b0703190849q167e4fd9k4f21da7cc915ce4a@mail.gmail.com> Thanks! It's fixed. Lincoln On 3/16/07, Daniel Lang wrote: > > Hi, > > there's a tiny error in the pod doc of the alias get_feature_by_name > that prevents it to be found at all (e.g. when using gbrowse). > > The attached patch should fix it. > > Daniel:-) > -- > > Daniel Lang > University of Freiburg, Plant Biotechnology > Schaenzlestr. 1, D-79104 Freiburg > fax: +49 761 203 6945 > phone: +49 761 203 6974 > homepage: http://www.plant-biotech.net/ > e-mail: daniel.lang at biologie.uni-freiburg.de > > ################################################# > My software never has bugs. > It just develops random features. > ################################################# > > > > > --- bioperl-live/Bio/DB/SeqFeature/Store.pm 2007-03-16 19:22: > 21.000000000 +0100 > +++ Store.pm 2007-03-16 19:23:53.000000000 +0100 > @@ -705,7 +705,7 @@ > > This method is provided for backward compatibility with gbrowse. > > -= cut > +=cut > > sub get_feature_by_name { shift->get_features_by_name(@_) } > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lavende_dresden at hotmail.com Mon Mar 19 12:17:24 2007 From: lavende_dresden at hotmail.com (zhang zhang) Date: Mon, 19 Mar 2007 16:17:24 +0000 Subject: [Bioperl-l] about mirbase Message-ID: Dear all, Are there any modules in BioPerl that can connect to miRBase (the microRNA database)? If so, could you share a script showing how to access it? Thanks in advance. Regards, Jenny From DGroskreutz at twt.com Tue Mar 20 02:00:16 2007 From: DGroskreutz at twt.com (DGroskreutz at twt.com) Date: Tue, 20 Mar 2007 01:00:16 -0500 Subject: [Bioperl-l] CN=Deb Groskreutz/OU=MSN/O=TWT is out of the office. Message-ID: I will be out of the office starting 03/17/2007 and will not return until 03/26/2007. Thanks, Deb NOTICE OF CONFIDENTIALITY: The information contained in this communication, including attachments, is intended for the specific delivery to and use by the individual(s) to whom it is addressed. This email includes confidential information that may be attorney-client privileged. Any review, retransmission, dissemination, or unauthorized use of this communication is strictly prohibited and may be unlawful. If you have received this communication in error, please reply to the sender immediately and delete the original communication and any copy of it from your computer system, including all attachments.
From pmiguel at purdue.edu Tue Mar 20 15:19:09 2007 From: pmiguel at purdue.edu (Phillip San Miguel) Date: Tue, 20 Mar 2007 15:19:09 -0400 Subject: [Bioperl-l] [Possibly confidential informtion deleted] In-Reply-To: References: Message-ID: <460033AD.9050504@purdue.edu> (Note--parts of the message below have been redacted to comply with its (apparently mandatory) NOTICE OF CONFIDENTIALITY) xxxxxxxxxxx at xxx.com wrote: > NOTICE OF CONFIDENTIALITY: > The information contained in this communication, including attachments, is intended for the specific delivery to and use by the individual(s) to whom it is addressed. This email includes confidential information that may be attorney-client privileged. Any review, retransmission, dissemination, or unauthorized use of this communication is strictly prohibited and may be unlawful. If you have received this communication in error, please reply to the sender immediately and delete the original communication and any copy of it from your computer system, including all attachments. > > My apologies to everyone reading this for not possessing the self-control to ignore the message to which I'm responding. But, anyone else find themselves ill-disposed towards senders of unsolicited emails that contain "Notices of Confidentiality" like the one above? -- Phillip From emmanuel.quevillon at versailles.inra.fr Tue Mar 20 16:36:18 2007 From: emmanuel.quevillon at versailles.inra.fr (Emmanuel Quevillon) Date: Tue, 20 Mar 2007 21:36:18 +0100 Subject: [Bioperl-l] Bio::Index::Abstract improvement Message-ID: <460045C2.4030704@versailles.inra.fr> Dear list, I started playing with Bio::Index::* module and particularly with Bio::Index::Blast. But unfortunately, I have a problem with Bio::Index::Abstract::index_file method. The thing is that when it indexes the input file, it forces the indexed file to have a key with the absolute path of the original file to be indexed by using File::Spec::rel2abs. 
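To illustrate the path handling at issue, here is a minimal sketch using only File::Spec. It is not the Bio::Index::Abstract code, and the directory and file names are invented. rel2abs gives the absolute form that ties a key to the indexing machine, while abs2rel shows the kind of relative path that could be resolved against a different base directory once the files have been copied.

```perl
use strict;
use warnings;
use File::Spec::Unix;   # Unix flavour, so the result does not depend on the host OS

# Invented locations, for illustration only.
my $index_host_dir = '/fast/machine/blast';   # where the index is built
my $local_dir      = '/home/me/blast';        # where it is used later
my $data_file      = 'reports/run1.blast';

# What an absolute key looks like on the indexing machine.
my $abs = File::Spec::Unix->rel2abs( $data_file, $index_host_dir );

# A relative form that does not depend on the indexing machine ...
my $rel = File::Spec::Unix->abs2rel( $abs, $index_host_dir );

# ... and can be resolved against the local directory after the transfer.
my $local = File::Spec::Unix->rel2abs( $rel, $local_dir );

print "$abs\n$rel\n$local\n";
```

Storing the relative form and resolving it at lookup time is one possible shape for the option requested here; whether it fits the existing index layout is a question for the module maintainers.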
What I would like to do is index the file(s) elsewhere, on a machine faster than mine, then transfer the original and index files to my machine and use the index there to retrieve results. Unfortunately, that is not possible right now. So I wondered whether an option could be added to the new() method telling it not to store the absolute path of the data file and to use a relative one instead. I don't know whether my request would interest other people, but I think it could be useful anyway. Thanks in advance Regards Emmanuel From luciap at sas.upenn.edu Tue Mar 20 16:47:02 2007 From: luciap at sas.upenn.edu (Lucia Peixoto) Date: Tue, 20 Mar 2007 16:47:02 -0400 Subject: [Bioperl-l] bug in Bio:Tree:IO?? Message-ID: <1174423622.4600484632db4@webmail.sas.upenn.edu> hi, I've noticed that when using: $tree->remove_Node($node), the nodes are removed correctly but the extra parentheses are left, which creates problems if the trees are going to be used in further applications. Has anyone experienced a similar problem?
> > > Lucia Peixoto > Department of Biology,SAS > University of Pennsylvania > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Max.Kaufmann at lazard.com Tue Mar 20 19:03:34 2007 From: Max.Kaufmann at lazard.com (Max.Kaufmann at lazard.com) Date: Tue, 20 Mar 2007 19:03:34 -0400 Subject: [Bioperl-l] job opportunity Message-ID: I recognize the value of working with people who love what they do, that is why I am looking into the open source community. I am a statistician so I have no idea about how to find OSS programmers. Any suggestions are welcomed. A quantitative hedge fund within Lazard Asset Management in New York City is looking for a talented programmer to join a small team of researchers and portfolio managers. The job would intersect the fields of finance, applied statistics and computer science. The position involves applying intelligent design and development of R and SQL programs to accelerate the speed of research. The individual would work directly with the portfolio manager to find and exploit market inefficiencies in the global equity markets. The role * Data analysis on large financial and market databases * Designing data structures, classes and method to test new research ideas * Use a diversity of statistical methods to estimate stock picking models. * Design and implementation of algorithms, * Software development The ideal candidate will have the following qualifications: * Formal education in Computer science (preferable), mathematics or statistics * Experience programming any language * Knowledge of applied statistics or mathematics (not required but interested). * Excellent communication skills. * Experience of general software systems is an advantage (eg .NET, XML, C++, ?). * Willingness and strong interest to learn about finance and capital markets. 
This opening offers a unique opportunity to get into the highly lucrative hedge fund business within a well-established and reputable firm. Because the team is small the individual will learn quickly about portfolio management and have fun using his or her skills freely. For further information, please send a resume to the email below. lamrecruiting at lazard.com Subject line: "Quant-Analyst" From cjfields at uiuc.edu Tue Mar 20 23:38:46 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Mar 2007 22:38:46 -0500 Subject: [Bioperl-l] about mirbase In-Reply-To: References: Message-ID: Unfortunately there is no direct interface that I know of. I did notice you can download the miRNA sequences in EMBL format or FASTA format from here: http://microrna.sanger.ac.uk/sequences/ftp.shtml so you might be able to index them locally. chris On Mar 19, 2007, at 11:17 AM, zhang zhang wrote: > Dear all, > > Are there any modules in BioPerl that can connect to miRBase (the microRNA > database)? If so, could you share a script showing how to > access it? Thanks in advance. > > Regards, > Jenny > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From n.saunders at uq.edu.au Tue Mar 20 23:59:45 2007 From: n.saunders at uq.edu.au (Neil Saunders) Date: Wed, 21 Mar 2007 13:59:45 +1000 Subject: [Bioperl-l] Bio::Tools::Run::Signalp In-Reply-To: <9C84DB98-BC9B-48B9-8F73-B66D57AD1C96@bioperl.org> References: <45F6E998.7040109@uq.edu.au> <9C84DB98-BC9B-48B9-8F73-B66D57AD1C96@bioperl.org> Message-ID: <4600ADB1.4030708@uq.edu.au> hi guys, Thanks for your input on Bio::Tools::Signalp.
I got a colleague of mine, Ronald Schroeter (ronalds at itee.uq.edu.au) to look at the module and he has made some minor modifications so that it now parses signalp version 3.0 output correctly. I've added his module as an attachment at bugzilla: http://bugzilla.open-bio.org/show_bug.cgi?id=2203 This is not meant to replace Emmanuel's efforts with ExtendedSignalp.pm - rather, it functions as a replacement for the existing Signalp.pm (which doesn't handle version 3 output). I hope this doesn't muddy the waters too much. Neil -- School of Molecular and Microbial Sciences University of Queensland Brisbane 4072 Australia http://nsaunders.wordpress.com From cjfields at uiuc.edu Wed Mar 21 00:24:10 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Mar 2007 23:24:10 -0500 Subject: [Bioperl-l] Bio::Tools::Run::Signalp In-Reply-To: <4600ADB1.4030708@uq.edu.au> References: <45F6E998.7040109@uq.edu.au> <9C84DB98-BC9B-48B9-8F73-B66D57AD1C96@bioperl.org> <4600ADB1.4030708@uq.edu.au> Message-ID: <47F3C071-99A9-4129-9B17-F39B659BD403@uiuc.edu> On Mar 20, 2007, at 10:59 PM, Neil Saunders wrote: > hi guys, > > Thanks for your input on Bio::Tools::Signalp. I got a colleague of > mine, Ronald Schroeter (ronalds at itee.uq.edu.au) to look at the > module and he has made some minor modifications so that it now > parses signalp version 3.0 output correctly. > > I've added his module as an attachment at bugzilla: > http://bugzilla.open-bio.org/show_bug.cgi?id=2203 > > This is not meant to replace Emmanuel's efforts with > ExtendedSignalp.pm - rather, it functions as a replacement for the > existing Signalp.pm (which doesn't handle version 3 output). I > hope this doesn't muddy the waters too much. > > > Neil Thanks for the module Neil; the additional v3 Signalp tests now pass. I committed both to CVS. We probably should merge the two Signalp modules ala the recent Glimmer work (parses all Glimmer versions now). 
Unfortunately I don't know anyone who has time for it at this moment (I'm up to my neck in my own work!). chris From cjfields at uiuc.edu Wed Mar 21 00:34:22 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Mar 2007 23:34:22 -0500 Subject: [Bioperl-l] Two Bio::DB::Flat swiss adaptors Message-ID: <40F32EFC-E515-4B5C-9008-BFD90735C232@uiuc.edu> Just curious, but why are there two Bio::DB::Flat::BDB swissprot adaptors (Bio::DB::Flat::BDB::swiss, Bio::DB::Flat::BDB::swissprot)? The only significant difference between the two is the following line in seq_to_ids(): Bio::DB::Flat::BDB::swiss : $ids{VERSION} = "$accession.$version" if defined $accession && defined $version; Bio::DB::Flat::BDB::swiss : $ids{VERSION} = "$accession.version" if defined $accession && defined $version; The second is missing '$' in version, which I'm guessing is a bug? chris From betts at embl.de Wed Mar 21 09:39:28 2007 From: betts at embl.de (Matthew Betts) Date: Wed, 21 Mar 2007 14:39:28 +0100 (CET) Subject: [Bioperl-l] Bio::Graphics::Panel and ModPerl::Registry Message-ID: Hi, I'm using Bio::Graphics::Panel under modperl2. The script runs without errors the first time, but every subsequent time I get errors like: Illegal hexadecimal digit '-' ignored at /usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Panel.pm line 813, line 18. As far as I can find out this is because __END__ and __DATA__ tokens can not be used with mod_perl. http://perl.apache.org/docs/1.0/guide/porting.html Am I doing something wrong, and is there any way around this please? Thanks, Matthew From cjfields at uiuc.edu Wed Mar 21 11:06:46 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 21 Mar 2007 10:06:46 -0500 Subject: [Bioperl-l] Bio::Graphics::Panel and ModPerl::Registry In-Reply-To: References: Message-ID: I don't think the __END__/__DATA__ tag is the issue. 
The mod_perl docs section you refer to indicates the problems with using __END__/ __DATA__ lie in the actual Apache::Registry scripts (which are wrapped in a handler sub), not Perl modules. The error you report doesn't match up either; I would expect a missing brackets problem, as stated in the docs. What bioperl version are you using? The line numbering in your reported error doesn't match up with bioperl CVS or the last release. One aspect may be that this has been fixed already, so you could try updating bioperl to the latest release or from CVS to see if that fixes the problem. chris On Mar 21, 2007, at 8:39 AM, Matthew Betts wrote: > > Hi, > > I'm using Bio::Graphics::Panel under modperl2. The script runs without > errors the first time, but every subsequent time I get errors like: > > Illegal hexadecimal digit '-' ignored at > /usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Panel.pm line 813, > line > 18. > > As far as I can find out this is because __END__ and __DATA__ > tokens can > not be used with mod_perl. > http://perl.apache.org/docs/1.0/guide/porting.html > > Am I doing something wrong, and is there any way around this please? > > Thanks, > > Matthew > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From hlapp at gmx.net Wed Mar 21 11:59:37 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 21 Mar 2007 11:59:37 -0400 Subject: [Bioperl-l] Two Bio::DB::Flat swiss adaptors In-Reply-To: <40F32EFC-E515-4B5C-9008-BFD90735C232@uiuc.edu> References: <40F32EFC-E515-4B5C-9008-BFD90735C232@uiuc.edu> Message-ID: Maybe someone wanted to change the name and failed to remove the original? 
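For readers following the swiss/swissprot thread, the effect of the missing sigil is easy to reproduce in plain Perl. The sketch below uses made-up accession and version values (they are not taken from either adaptor): inside a double-quoted string, "$accession.version" interpolates only $accession and leaves ".version" as literal text, and Perl raises no warning about it.

```perl
use strict;
use warnings;

# Hypothetical values, for illustration only.
my $accession = 'P12345';
my $version   = 2;

# Intended form: both variables are interpolated.
my $good = "$accession.$version";

# Form with the missing '$': only $accession is interpolated,
# and ".version" stays in the string as literal text.
my $bad = "$accession.version";

print "good: $good\n";
print "bad:  $bad\n";
```

Because the second form is still valid Perl, this kind of slip only shows up when the stored VERSION key is actually looked up.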
On Mar 21, 2007, at 12:34 AM, Chris Fields wrote: > Just curious, but why are there two Bio::DB::Flat::BDB swissprot > adaptors (Bio::DB::Flat::BDB::swiss, Bio::DB::Flat::BDB::swissprot)? > The only significant difference between the two is the following line > in seq_to_ids(): > > Bio::DB::Flat::BDB::swiss : $ids{VERSION} = "$accession.$version" > if defined $accession && defined $version; > Bio::DB::Flat::BDB::swiss : $ids{VERSION} = "$accession.version" > if defined $accession && defined $version; > > The second is missing '$' in version, which I'm guessing is a bug? > > chris > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Wed Mar 21 12:19:20 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 21 Mar 2007 11:19:20 -0500 Subject: [Bioperl-l] Two Bio::DB::Flat swiss adaptors In-Reply-To: References: <40F32EFC-E515-4B5C-9008-BFD90735C232@uiuc.edu> Message-ID: That's what I am thinking as well. I may try removing the one with the version error to see what happens with tests... chris On Mar 21, 2007, at 10:59 AM, Hilmar Lapp wrote: > Maybe someone wanted to change the name and failed to remove the > original? > > On Mar 21, 2007, at 12:34 AM, Chris Fields wrote: > >> Just curious, but why are there two Bio::DB::Flat::BDB swissprot >> adaptors (Bio::DB::Flat::BDB::swiss, Bio::DB::Flat::BDB::swissprot)? 
>> The only significant difference between the two is the following line >> in seq_to_ids(): >> >> Bio::DB::Flat::BDB::swiss : $ids{VERSION} = "$accession.$version" >> if defined $accession && defined $version; >> Bio::DB::Flat::BDB::swiss : $ids{VERSION} = "$accession.version" >> if defined $accession && defined $version; >> >> The second is missing '$' in version, which I'm guessing is a bug? >> >> chris >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Wed Mar 21 12:34:42 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 21 Mar 2007 11:34:42 -0500 Subject: [Bioperl-l] Two Bio::DB::Flat swiss adaptors In-Reply-To: References: Message-ID: Yes, I agree. The potential error with $version is actually in the 'swissprot' version. flat.t tests only use 'swiss' so I'm presuming that's the keeper. chris On Mar 21, 2007, at 11:23 AM, Brian Osborne wrote: > Chris, > > "swiss" is presumably the preferred name since this is what is used > over in > SeqIO. > > Brian O. > > > On 3/21/07 12:19 PM, "Chris Fields" wrote: > >> That's what I am thinking as well. I may try removing the one with >> the version error to see what happens with tests... >> >> chris >> >> On Mar 21, 2007, at 10:59 AM, Hilmar Lapp wrote: >> >>> Maybe someone wanted to change the name and failed to remove the >>> original? >>> >>> On Mar 21, 2007, at 12:34 AM, Chris Fields wrote: >>> >>>> Just curious, but why are there two Bio::DB::Flat::BDB swissprot >>>> adaptors (Bio::DB::Flat::BDB::swiss, >>>> Bio::DB::Flat::BDB::swissprot)? 
>>>> The only significant difference between the two is the following >>>> line >>>> in seq_to_ids(): >>>> >>>> Bio::DB::Flat::BDB::swiss : $ids{VERSION} = "$accession.$version" >>>> if defined $accession && defined $version; >>>> Bio::DB::Flat::BDB::swiss : $ids{VERSION} = "$accession.version" >>>> if defined $accession && defined $version; >>>> >>>> The second is missing '$' in version, which I'm guessing is a bug? >>>> >>>> chris >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>> =========================================================== >>> >>> >>> >>> >>> >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bosborne11 at verizon.net Wed Mar 21 12:23:22 2007 From: bosborne11 at verizon.net (Brian Osborne) Date: Wed, 21 Mar 2007 12:23:22 -0400 Subject: [Bioperl-l] Two Bio::DB::Flat swiss adaptors In-Reply-To: Message-ID: Chris, "swiss" is presumably the preferred name since this is what is used over in SeqIO. Brian O. On 3/21/07 12:19 PM, "Chris Fields" wrote: > That's what I am thinking as well. I may try removing the one with > the version error to see what happens with tests... > > chris > > On Mar 21, 2007, at 10:59 AM, Hilmar Lapp wrote: > >> Maybe someone wanted to change the name and failed to remove the >> original? 
>> >> On Mar 21, 2007, at 12:34 AM, Chris Fields wrote: >> >>> Just curious, but why are there two Bio::DB::Flat::BDB swissprot >>> adaptors (Bio::DB::Flat::BDB::swiss, Bio::DB::Flat::BDB::swissprot)? >>> The only significant difference between the two is the following line >>> in seq_to_ids(): >>> >>> Bio::DB::Flat::BDB::swiss : $ids{VERSION} = "$accession.$version" >>> if defined $accession && defined $version; >>> Bio::DB::Flat::BDB::swiss : $ids{VERSION} = "$accession.version" >>> if defined $accession && defined $version; >>> >>> The second is missing '$' in version, which I'm guessing is a bug? >>> >>> chris >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> >> > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From swseo at uiuc.edu Wed Mar 21 18:02:13 2007 From: swseo at uiuc.edu (Seongwon "Terry" Seo) Date: Wed, 21 Mar 2007 17:02:13 -0500 Subject: [Bioperl-l] Two Bio::DB::Flat swiss adaptors In-Reply-To: <40F32EFC-E515-4B5C-9008-BFD90735C232@uiuc.edu> References: <40F32EFC-E515-4B5C-9008-BFD90735C232@uiuc.edu> Message-ID: <002001c76c04$a2fc4820$51f97e82@BosTaurus> I have a question regarding SwissProt. One ID can have multiple accessions in SwissProt, but I cannot find where they are saved in Bioperl::Seq. Is there anyone can help me on this? Terry Seo. 
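As background to the question: in a SwissProt flat-file entry the accessions sit on one or more AC lines, separated by semicolons, with the first one being the primary accession and the rest secondary. The following is a minimal pure-Perl sketch of that layout only. The record fragment and the accession values are invented for illustration, and this is not how BioPerl itself parses entries.

```perl
use strict;
use warnings;

# Invented fragment of a SwissProt-style entry (illustration only).
my @lines = (
    'ID   EXAMPLE_HUMAN   Reviewed;   100 AA.',
    'AC   P00001; Q00002; Q00003;',
    'AC   Q00004;',
    'DE   Example protein.',
);

# Collect every accession from all AC lines, in file order.
my @accessions;
for my $line (@lines) {
    next unless $line =~ /^AC\s+(.+)$/;
    push @accessions, grep { length } split /;\s*/, $1;
}

# The first accession is the primary one; the rest are secondary.
my ($primary, @secondary) = @accessions;

print "primary:   $primary\n";
print "secondary: @secondary\n";
```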
-----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields Sent: Tuesday, March 20, 2007 11:34 PM To: Bioperl-L list Subject: [Bioperl-l] Two Bio::DB::Flat swiss adaptors Just curious, but why are there two Bio::DB::Flat::BDB swissprot adaptors (Bio::DB::Flat::BDB::swiss, Bio::DB::Flat::BDB::swissprot)? The only significant difference between the two is the following line in seq_to_ids(): Bio::DB::Flat::BDB::swiss : $ids{VERSION} = "$accession.$version" if defined $accession && defined $version; Bio::DB::Flat::BDB::swiss : $ids{VERSION} = "$accession.version" if defined $accession && defined $version; The second is missing '$' in version, which I'm guessing is a bug? chris _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Wed Mar 21 18:29:55 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 21 Mar 2007 18:29:55 -0400 Subject: [Bioperl-l] Two Bio::DB::Flat swiss adaptors In-Reply-To: <002001c76c04$a2fc4820$51f97e82@BosTaurus> References: <40F32EFC-E515-4B5C-9008-BFD90735C232@uiuc.edu> <002001c76c04$a2fc4820$51f97e82@BosTaurus> Message-ID: <118F9FF6-D323-4483-B4B3-9CCE922AA376@gmx.net> get_secondary_accessions() in the Bio::Seq::RichSeqI interface On Mar 21, 2007, at 6:02 PM, Seongwon "Terry" Seo wrote: > I have a question regarding SwissProt. > One ID can have multiple accessions in SwissProt, but I cannot find > where > they are saved in Bioperl::Seq. Is there anyone can help me on this? > Terry Seo. 
> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris > Fields > Sent: Tuesday, March 20, 2007 11:34 PM > To: Bioperl-L list > Subject: [Bioperl-l] Two Bio::DB::Flat swiss adaptors > > Just curious, but why are there two Bio::DB::Flat::BDB swissprot > adaptors (Bio::DB::Flat::BDB::swiss, Bio::DB::Flat::BDB::swissprot)? > The only significant difference between the two is the following line > in seq_to_ids(): > > Bio::DB::Flat::BDB::swiss : $ids{VERSION} = "$accession.$version" > if defined $accession && defined $version; > Bio::DB::Flat::BDB::swiss : $ids{VERSION} = "$accession.version" > if defined $accession && defined $version; > > The second is missing '$' in version, which I'm guessing is a bug? > > chris > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Wed Mar 21 18:31:45 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 21 Mar 2007 17:31:45 -0500 Subject: [Bioperl-l] Two Bio::DB::Flat swiss adaptors In-Reply-To: <002001c76c04$a2fc4820$51f97e82@BosTaurus> References: <40F32EFC-E515-4B5C-9008-BFD90735C232@uiuc.edu> <002001c76c04$a2fc4820$51f97e82@BosTaurus> Message-ID: I think all secondary accessions are accessed via: my @sec_acc = $seq->get_secondary_accessions(); At least that's what's in the test suite. Go Illini! chris On Mar 21, 2007, at 5:02 PM, Seongwon "Terry" Seo wrote: > I have a question regarding SwissProt. 
> One ID can have multiple accessions in SwissProt, but I cannot find > where > they are saved in Bioperl::Seq. Is there anyone can help me on this? > Terry Seo. > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris > Fields > Sent: Tuesday, March 20, 2007 11:34 PM > To: Bioperl-L list > Subject: [Bioperl-l] Two Bio::DB::Flat swiss adaptors > > Just curious, but why are there two Bio::DB::Flat::BDB swissprot > adaptors (Bio::DB::Flat::BDB::swiss, Bio::DB::Flat::BDB::swissprot)? > The only significant difference between the two is the following line > in seq_to_ids(): > > Bio::DB::Flat::BDB::swiss : $ids{VERSION} = "$accession.$version" > if defined $accession && defined $version; > Bio::DB::Flat::BDB::swiss : $ids{VERSION} = "$accession.version" > if defined $accession && defined $version; > > The second is missing '$' in version, which I'm guessing is a bug? > > chris > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From betts at embl.de Thu Mar 22 05:16:33 2007 From: betts at embl.de (Matthew Betts) Date: Thu, 22 Mar 2007 10:16:33 +0100 (CET) Subject: [Bioperl-l] Bio::Graphics::Panel and ModPerl::Registry In-Reply-To: References:

Message-ID: Thanks Chris, I was using the 1.4.0 stable release before, and just upgraded to 1.5.2 and it works now. I searched bugzilla before sending my last email, but couldn't see anything relevant (but I was searching for __DATA__, mod_perl, etc, which as you say may not have been the problem). Thanks again for the quick reply, Matthew On Wed, 21 Mar 2007, Chris Fields wrote: > I don't think the __END__/__DATA__ tag is the issue. The mod_perl docs > section you refer to indicates the problems with using __END__/__DATA__ lie in > the actual Apache::Registry scripts (which are wrapped in a handler sub), not > Perl modules. The error you report doesn't match up either; I would expect a > missing brackets problem, as stated in the docs. > > What bioperl version are you using? The line numbering in your reported error > doesn't match up with bioperl CVS or the last release. One aspect may be that > this has been fixed already, so you could try updating bioperl to the latest > release or from CVS to see if that fixes the problem. > > chris > > On Mar 21, 2007, at 8:39 AM, Matthew Betts wrote: > > > > > Hi, > > > > I'm using Bio::Graphics::Panel under modperl2. The script runs without > > errors the first time, but every subsequent time I get errors like: > > > > Illegal hexadecimal digit '-' ignored at > > /usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Panel.pm line 813, > > line > > 18. > > > > As far as I can find out this is because __END__ and __DATA__ tokens can > > not be used with mod_perl. > > http://perl.apache.org/docs/1.0/guide/porting.html > > > > Am I doing something wrong, and is there any way around this please? > > > > Thanks, > > > > Matthew > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>

From prtome at crs4.it Thu Mar 22 10:11:44 2007
From: prtome at crs4.it (Patricia Rodriguez Tome)
Date: Thu, 22 Mar 2007 15:11:44 +0100
Subject: [Bioperl-l] strange problem
Message-ID: <46028EA0.7070901@crs4.it>

Hi

I have found a parsing problem in SearchIO.
I have this result:

                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

UniRef50_Q9X0H5 Cluster: Histidyl-tRNA synthetase; n=4; Thermoto...    23   650

I do a very simple parsing with SearchIO:

my $in = Bio::SearchIO->new(-format => 'blast',
                            -file   => $ARGV[0]);

# take the first result (one per query) from the report
my $result = $in->next_result;

while (my $hit = $result->next_hit()) {
    print "name\t", $hit->name, "\n";
    print "length\t", $hit->length, "\n";
    print "accession\t", $hit->accession, "\n";
    print "description\t", $hit->description, "\n";
    print "raw_score\t", $hit->raw_score, "\n";
    print "significance\t", $hit->significance, "\n";
    print "bits\t", $hit->bits, "\n";
}

And the result is:

name          UniRef50_Q9X0H5
length        420
accession     UniRef50_Q9X0H5
description   Cluster: Histidyl-tRNA synthetase; n=4; Thermotogaceae|Rep: Histidyl-tRNA synthetase - Thermotoga maritima
raw_score     ...
significance  23
bits          22.7

As you see, the three dots at the end of the description get into the raw score instead, and then the evalue gets the raw score.
I am using bioperl 1.5.1 and tried even with 1.5.2 but get the same result.
Where can I change it?

Grazie

Patricia

--
Dr. Patricia Rodriguez-Tomé, PhD
CRS4 - Bioinformatics
Loc. Pixina Manna Edificio 3
Pula 09010 (CA), Italy
http://www.bioinformatica.crs4.org

From cjfields at uiuc.edu Thu Mar 22 11:30:28 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 22 Mar 2007 10:30:28 -0500
Subject: [Bioperl-l] strange problem
In-Reply-To: <46028EA0.7070901@crs4.it>
References: <46028EA0.7070901@crs4.it>
Message-ID: <85EAEAA1-8574-4862-8647-D0D4CFDC8C6D@uiuc.edu>

This is a bug. Could you add this to bugzilla with a representative report and script?
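To illustrate the kind of pattern involved in reading the hit-table line from the report above, here is a minimal sketch in plain Perl that splits the line into name, description, bit score and E-value by anchoring the two numeric columns at the end of the line. The regex is an assumption made for illustration; it is not the pattern Bio::SearchIO::blast actually uses.

```perl
use strict;
use warnings;

# The hit-table line from the report, description truncated with "...".
my $line = 'UniRef50_Q9X0H5 Cluster: Histidyl-tRNA synthetase; n=4; Thermoto...    23   650';

# Anchor the two trailing columns at the end of the line so that the
# "..." stays part of the description (illustrative pattern only).
my ($name, $desc, $bits, $evalue) =
    $line =~ /^(\S+)\s+(.*?)\s+(\d+(?:\.\d+)?)\s+(\S+)\s*$/
    or die "line did not match\n";

print "name\t$name\n";
print "description\t$desc\n";
print "bits\t$bits\n";
print "significance\t$evalue\n";
```

The description is matched lazily, so it grows only until the remainder of the line looks like "score, then E-value"; a truncated description ending in dots is therefore never mistaken for a score.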
It should be easy to fix as it's likely just a regex problem.

http://www.bioperl.org/wiki/Bugs
http://bugzilla.open-bio.org/

chris

On Mar 22, 2007, at 9:11 AM, Patricia Rodriguez Tome wrote:

> Hi
>
> I have found a parsing problem in SearchIO.
> I have this result:
>
>                                                                  Score    E
> Sequences producing significant alignments:                      (bits) Value
>
> UniRef50_Q9X0H5 Cluster: Histidyl-tRNA synthetase; n=4; Thermoto...    23   650
>
> I do a very simple parsing with SearchIO:
>
> my $in = Bio::SearchIO->new(-format => 'blast',
>                             -file   => $ARGV[0]);
>
> # take the first result (one per query) from the report
> my $result = $in->next_result;
>
> while (my $hit = $result->next_hit()) {
>     print "name\t", $hit->name, "\n";
>     print "length\t", $hit->length, "\n";
>     print "accession\t", $hit->accession, "\n";
>     print "description\t", $hit->description, "\n";
>     print "raw_score\t", $hit->raw_score, "\n";
>     print "significance\t", $hit->significance, "\n";
>     print "bits\t", $hit->bits, "\n";
> }
>
> And the result is:
>
> name          UniRef50_Q9X0H5
> length        420
> accession     UniRef50_Q9X0H5
> description   Cluster: Histidyl-tRNA synthetase; n=4; Thermotogaceae|Rep: Histidyl-tRNA synthetase - Thermotoga maritima
> raw_score     ...
> significance  23
> bits          22.7
>
> As you see, the three dots at the end of the description get into the raw
> score instead, and then the evalue gets the raw score.
> I am using bioperl 1.5.1 and tried even with 1.5.2 but get the same result.
> Where can I change it?
>
> Grazie
>
> Patricia
>
> --
> Dr. Patricia Rodriguez-Tomé, PhD
> CRS4 - Bioinformatics
> Loc. Pixina Manna Edificio 3
> Pula 09010 (CA), Italy
> http://www.bioinformatica.crs4.org
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From dmessina at wustl.edu Thu Mar 22 11:12:54 2007
From: dmessina at wustl.edu (David Messina)
Date: Thu, 22 Mar 2007 10:12:54 -0500
Subject: [Bioperl-l] strange problem
In-Reply-To: <46028EA0.7070901@crs4.it>
References: <46028EA0.7070901@crs4.it>
Message-ID:

Ciao Patricia,

Could you tell us what version of Blast your report is from? Are you running Blast locally on your machine or remotely through a web interface?

Dave

--
Dave Messina
Senior Analyst, Assembly Group
Genome Sequencing Center
Washington University
St. Louis, MO

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 2504 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070322/5ce06bef/attachment.bin

From emeric.sevin at univ-rennes1.fr Thu Mar 22 11:07:57 2007
From: emeric.sevin at univ-rennes1.fr (Emeric Sevin)
Date: Thu, 22 Mar 2007 16:07:57 +0100
Subject: [Bioperl-l] rpsblast results unsupported by Bio::SearchIO::Writer
In-Reply-To: <46028EA0.7070901@crs4.it>
References: <46028EA0.7070901@crs4.it>
Message-ID: <8015924160e6b1f3af747fe2a906503a@univ-rennes1.fr>

Hello,

I am new to this community, and apologize if this subject has been posted before.

I want to print out only selected results from a multiple blast-alignments results file. Problem is, the algorithm used is rpsblast. The parsing (with Bio::SearchIO) goes fine, but the actual writing task yields "unclean" warnings. Although an output is actually written, the writer (Bio::SearchIO::Writer::TextResultWriter) seems to be disturbed by the fact that rpsblast DBs are not labeled with "protein"/"nucleic"/"translated".

Does anybody know of an easy fix to that bug, or of another way to come around it?

Thank you very much

Emeric SEVIN
Université de Rennes 1

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: text/enriched Size: 693 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070322/00bc6481/attachment.bin From Kevin.M.Brown at asu.edu Thu Mar 22 11:25:21 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 22 Mar 2007 08:25:21 -0700 Subject: [Bioperl-l] Blast parsers missing functions Message-ID: <1A4207F8295607498283FE9E93B775B402EDF92A@EX02.asurite.ad.asu.edu> I was looking through the deobfuscator after creating a blast file parser that failed on parsing a blast XML file due to a missing method and I see there are actually quite a few missing. Found in Bio::SearchIO::blast, but not Bio::SearchIO::blastxml or Bio::SearchIO::blasttable best_hit_only check_all_hits inclusion_threshold max_significance min_query_length min_score signif I use the min_score in my parser so that blastall can be run with a high E and later filtered, or refiltered for other things after the blast is done. It would be nice if all the blast parsers had most of these missing functions to at least keep them consistent with each other for parsing blast reports :) From prtome at crs4.it Thu Mar 22 12:03:00 2007 From: prtome at crs4.it (Patricia Rodriguez Tome) Date: Thu, 22 Mar 2007 17:03:00 +0100 Subject: [Bioperl-l] strange problem In-Reply-To: References: <46028EA0.7070901@crs4.it> Message-ID: <4602A8B4.2020507@crs4.it> hi running Blast locally, version: BLASTP 2.2.15 [Oct-15-2006] I will follow Chris advice and fill a bug report Pat David Messina wrote: > Ciao Patricia, > > Could you tell us what version of Blast your report is from? Are you > running Blast locally on your machine or remotely through a web > interface? > > Dave > > -- > Dave Messina > Senior Analyst, Assembly Group > Genome Sequencing Center > Washington University > St. 
Louis, MO
>
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Dr. Patricia Rodriguez-Tomé, PhD
CRS4 - Bioinformatics
Loc. Pixina Manna Edificio 3
Pula 09010 (CA), Italy
http://www.bioinformatica.crs4.org

From cjfields at uiuc.edu Thu Mar 22 13:56:16 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 22 Mar 2007 12:56:16 -0500
Subject: [Bioperl-l] Blast parsers missing functions
In-Reply-To: <1A4207F8295607498283FE9E93B775B402EDF92A@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B402EDF92A@EX02.asurite.ad.asu.edu>
Message-ID:

You can certainly add this as an enhancement request in Bugzilla, but unless someone codes for it we can't promise you when it would be added in.

chris

On Mar 22, 2007, at 10:25 AM, Kevin Brown wrote:

> I was looking through the deobfuscator after creating a blast file
> parser that failed on parsing a blast XML file due to a missing method
> and I see there are actually quite a few missing.
>
> Found in Bio::SearchIO::blast, but not Bio::SearchIO::blastxml or
> Bio::SearchIO::blasttable
> best_hit_only
> check_all_hits
> inclusion_threshold
> max_significance
> min_query_length
> min_score
> signif
>
> I use the min_score in my parser so that blastall can be run with a high
> E and later filtered, or refiltered for other things after the blast is
> done.
> > It would be nice if all the blast parsers had most of these missing > functions to at least keep them consistent with each other for parsing > blast reports :) From cjfields at uiuc.edu Thu Mar 22 18:29:15 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 22 Mar 2007 17:29:15 -0500 Subject: [Bioperl-l] [BioSQL-l] Bio::Annotation::StructuredValue In-Reply-To: <1F43323F-22D8-4A1D-A62E-46E60A59D97C@gmx.net> References: <8C776F12-19CF-4511-91F2-ED9640FB995C@uiuc.edu> <1F43323F-22D8-4A1D-A62E-46E60A59D97C@gmx.net> Message-ID: <70781A22-76D3-4B71-8364-1FAED30179CE@uiuc.edu> On Mar 17, 2007, at 9:26 AM, Hilmar Lapp wrote: > On Mar 16, 2007, at 3:44 PM, Chris Fields wrote: > >> Does bioperl-db store Bio::Annotation::StructuredValue (i.e. in >> SwissProt)? > > It does b/c B::A::StructuredValue ISA B::A::SimpleValue and it > handles the latter. > > This isn't ideal because if you're like me you'd want all the > individual values to each translate to its own row. I was using a > SeqProcessor to convert the StructuredValue objects into arrays of > SimpleValue objects. > > Obviously, this will lose the structure between them (i.e., in > reality it's not just a flat array), but for enabling indexed > searches it works well. > > With Uniprot no longer collapsing per sequence, the thing that gets > lost is the semantic context of each token, but as you found out > correctly it gets lost at the bioperl level already. Yes, unfortunately, though the use of an ontology would help as you suggest below. >> I am thinking of using StructuredValue, Data::Stag, or >> Class::Meta for some of my RNA structural data work but didn't know >> if StructuredValues would persist via bioperl-db. > > At this point they are either flattened out (through the overridden > value() method), or you convert them upfront into an array, using a > SeqProcessor. 
> > BioSQL has no provision for storing the fact that a number of tag/ > value associations (which is what B::A::SimpleValues are) comprise of > a "bag" of annotation that belongs together. > > You could, however, persist that through embedding the tags in an > ontology (tags are ontology terms) that captures that (through > rel.ships). I will likely use this approach, though there are no applicable SO/GO terms that I can use so I'll have to roll my own for now. I may use something similar to the RNAML tags for sec. structure. >> I also noticed there is an outstanding BioPerl bug (http:// >> bugzilla.open-bio.org/show_bug.cgi?id=1825) where Hilmar suggested >> reimplementing StructuredValueto use Data::Stag, so I thought I might >> give it a try. > > Sounds good :-) > > I hope the above makes some sense. Let me know if not. > > -hilmar Makes perfect sense! Just needed to run it by someone on the BioSQL end since I'll want to make my data a bit more persistent. I think I will go with Bio::StructuredValue implementing Data::Stag since it has pretty much everything I need. chris From brandon.barker at gmail.com Fri Mar 23 02:46:06 2007 From: brandon.barker at gmail.com (Brandon Barker) Date: Fri, 23 Mar 2007 02:46:06 -0400 Subject: [Bioperl-l] SearchIO Parse Errors Message-ID: <3e078a5e0703222346q5f6e5fc7ue09c43916500bf4b@mail.gmail.com> Hi, I'm using Bioperl to run BLAST and parse the results of many pairs of sequences. Some of these pairs work fine, but for some of the others my script outputs errors of the following form: Bio::SearchIO::blast=HASH(0x1a18040)parse error on F18G5.3 / CBG16182 Bio::SearchIO::blast=HASH(0x19ff2cc)parse error on Y73E7A.3 / CBG22266 Bio::SearchIO::blast=HASH(0x1a0cbe4)parse error on T23F1.6 / CBG24659 Bio::SearchIO::blast=HASH(0x19fcca8)parse error on Y71G10AR.4 / CBG13872 The last two words on the lines above are the sequences that are trying to be being aligned. 
I'm using the following code to construct the searchio objects: my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); $factory->outfile('blast.out'); my $report = $factory->bl2seq($input, $input2); try { my $result = $report->next_result; ..... } If it would be helpful, I would be more than happy to make an archive of the two sequence files and the bioperl script. I searched the bioperl core codebase briefly but didn't see anything that looked familiar to this problem. Thanks in advance, -- Brandon Barker Phone: (859) 948-5335 From cjfields at uiuc.edu Fri Mar 23 11:17:15 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 23 Mar 2007 10:17:15 -0500 Subject: [Bioperl-l] SearchIO Parse Errors In-Reply-To: <3e078a5e0703222346q5f6e5fc7ue09c43916500bf4b@mail.gmail.com> References: <3e078a5e0703222346q5f6e5fc7ue09c43916500bf4b@mail.gmail.com> Message-ID: <08730A53-3EAF-48FC-A85B-FB0B3BF67768@uiuc.edu> You can send the archive directly to me (not the group!) or file this as a bug in Bugzilla and attach the archive to the report. It's an odd bug but it's hard to say what's going on w/o looking at your code and BLAST reports. I did notice that (when going through the StandAloneBlast code) that bl2seq expects a Bio::Tools::BPbl2seq object which has been deprecated for a while now and is no longer supported. chris On Mar 23, 2007, at 1:46 AM, Brandon Barker wrote: > Hi, > > I'm using Bioperl to run BLAST and parse the results of many pairs of > sequences. Some of these pairs work fine, but for some of the others > my script outputs errors of the following form: > > > Bio::SearchIO::blast=HASH(0x1a18040)parse error on F18G5.3 / CBG16182 > Bio::SearchIO::blast=HASH(0x19ff2cc)parse error on Y73E7A.3 / CBG22266 > Bio::SearchIO::blast=HASH(0x1a0cbe4)parse error on T23F1.6 / CBG24659 > Bio::SearchIO::blast=HASH(0x19fcca8)parse error on Y71G10AR.4 / > CBG13872 > > > The last two words on the lines above are the sequences that are > trying to be being aligned. 
I'm using the following code to construct
> the searchio objects:
>
>
> my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
>
> $factory->outfile('blast.out');
> my $report = $factory->bl2seq($input, $input2);
> try {
>     my $result = $report->next_result;
>     .....
> }
>
> If it would be helpful, I would be more than happy to make an archive
> of the two sequence files and the bioperl script. I searched the
> bioperl core codebase briefly but didn't see anything that looked
> familiar to this problem.
>
> Thanks in advance,
>
>
> --
> Brandon Barker
> Phone: (859) 948-5335
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From steletch at jouy.inra.fr Tue Mar 27 11:12:55 2007
From: steletch at jouy.inra.fr (Stéphane Téletchéa)
Date: Tue, 27 Mar 2007 17:12:55 +0200
Subject: [Bioperl-l] Aligning two alignments ...
Message-ID: <46093477.5090807@jouy.inra.fr>

Dear all,

I'm trying to compare alignments results for small protein sequences to validate my result from another method. In the best option, I would do on two sequences

- run a clustalw alignment
- run a blast alignment
- run a T-Coffee alignment
- use my own

And align them all to see easily what is identical, what differs.

I've considered doing a simple match, but with more than two methods to compare, this may not be trivial. I'd like your opinion on this please.

I would consider sequence I as a reference and sequence II as the alignment target.

Thanks a lot in advance for your comments,

Stéphane
--
Stéphane Téletchéa, PhD.  http://www.steletch.org
Unité
Math?matique Informatique et G?nome http://migale.jouy.inra.fr/mig INRA, Domaine de Vilvert T?l : (33) 134 652 891 78352 Jouy-en-Josas cedex, France Fax : (33) 134 652 901 From jaudall at byu.edu Tue Mar 27 11:36:35 2007 From: jaudall at byu.edu (Joshua Udall) Date: Tue, 27 Mar 2007 08:36:35 -0700 Subject: [Bioperl-l] Aligning two alignments ... In-Reply-To: <46093477.5090807@jouy.inra.fr> References: <46093477.5090807@jouy.inra.fr> Message-ID: <7.0.1.0.2.20070327083437.031b8028@byu.edu> Stephane - Muscle will align two alignments when you use the profile option. I'm not sure if the 'profile' functionality is in the bioperl wrapper or not ... At 08:12 AM 3/27/2007, St?phane T?letch?a wrote: >Dear all, > >I'm trying to compare alignments results for small protein sequences to >validate my result from another method. In the best option, i would do >on two sequences > >- run a clustalw alignment >- run a blast alignment >- run a T-Coffee alignment >- use my own > >And align them all to see easily what is identical, what differs. > >I've considered doing a simple match, but with more than two methods to >compare, this may not be trivial. I'd like you opinion on this please. > >I would consider sequence I as a reference and sequence II as the >alignment target. > >Thanks a lot in advance for your comments, > >St?phane >-- >St?phane T?letch?a, PhD. http://www.steletch.org >Unit? 
Math?matique Informatique et G?nome http://migale.jouy.inra.fr/mig >INRA, Domaine de Vilvert T?l : (33) 134 652 891 >78352 Jouy-en-Josas cedex, France Fax : (33) 134 652 901 >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l Joshua Udall Assistant Professor 295 WIDB Department of Plant and Animal Science Brigham Young University Provo, UT 84602 Office: 801-422-9307 Fax: 801-422-0008 From steletch at jouy.inra.fr Tue Mar 27 12:51:28 2007 From: steletch at jouy.inra.fr (=?ISO-8859-1?Q?St=E9phane_T=E9letch=E9a?=) Date: Tue, 27 Mar 2007 18:51:28 +0200 Subject: [Bioperl-l] Aligning two alignments ... In-Reply-To: <7.0.1.0.2.20070327083437.031b8028@byu.edu> References: <46093477.5090807@jouy.inra.fr> <7.0.1.0.2.20070327083437.031b8028@byu.edu> Message-ID: <46094B90.1000702@jouy.inra.fr> Joshua Udall a ?crit : > Stephane - > > Muscle will align two alignments when you use the profile option. I'm > not sure if the 'profile' functionality is in the bioperl wrapper or not > ... Many thanks to Joshua and Iain for their rapid answer but i forgot to mention that i don't want to alter the alignments. In fact i want to align the sequences to the initial sequence. Cheers, St?phane -- St?phane T?letch?a, PhD. http://www.steletch.org Unit? Math?matique Informatique et G?nome http://migale.jouy.inra.fr/mig INRA, Domaine de Vilvert T?l : (33) 134 652 891 78352 Jouy-en-Josas cedex, France Fax : (33) 134 652 901 From bix at sendu.me.uk Tue Mar 27 13:24:03 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 27 Mar 2007 18:24:03 +0100 Subject: [Bioperl-l] Aligning two alignments ... In-Reply-To: <46093477.5090807@jouy.inra.fr> References: <46093477.5090807@jouy.inra.fr> Message-ID: <46095333.3050905@sendu.me.uk> St?phane T?letch?a wrote: > Dear all, > > I'm trying to compare alignments results for small protein sequences to > validate my result from another method. 
In the best option, i would do > on two sequences > > - run a clustalw alignment > - run a blast alignment > - run a T-Coffee alignment > - use my own > > And align them all to see easily what is identical, what differs. You don't want to align the alignments; that makes no sense and you can't achieve anything without altering the alignments. I suppose you actually just want to score the alignments against some 'ideal' alignment or against one of the others. In which case generate a match line/ cigar line/ simply count the number of identical bases across different alignment strings. From akarger at CGR.Harvard.edu Tue Mar 27 16:40:13 2007 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Tue, 27 Mar 2007 16:40:13 -0400 Subject: [Bioperl-l] dn/ds code from PAML howto not working Message-ID: A client of mine stole the pairwise Ka/Ks code from the "Running PAML from within Bioperl" section of the PAML HOWTO. It mostly seems to be running fine. However, when we say $result = $parser->next_result; we get an undef $result. 1) Is there something I can look for in the output files to show me whether it's a problem with Bioperl itself (Never!) or with CODEML? 2) Is there a previously known issue with Bioperl's incorrectly reading such files? Bio::Root::Version says it's 1.5, by the way. 3) Something else I haven't thought of? Thanks, Amir Karger Research Computing, Life Sciences Division Harvard University From cjfields at uiuc.edu Tue Mar 27 17:17:51 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 27 Mar 2007 16:17:51 -0500 Subject: [Bioperl-l] Question about BioPerl Bugzilla Bug 2213 In-Reply-To: <7600DB635DB2234CBBED9CDD990C4DB0023A28@exchmail.CMH.Internal> References: <7600DB635DB2234CBBED9CDD990C4DB0023A28@exchmail.CMH.Internal> Message-ID: <4E406CFC-A7B8-4F15-AE90-192791AA507F@uiuc.edu> First, you should direct this to the mail list in case anyone else can add to this. I may not be able to get to this anytime real soon. 
From the bug report: "The postprocessing in SpeciesAdaptor does mess things up in some cases. The issue is directly related to recent changes in Bio::Species and and could be taken care of by simply not running any postprocessing and foregoing the lineage checking altogether in Bio::Species::classification(), where the exception occurs. However, I believe doing so may break functionality with older bioperl-db/BioSQL installations since data is stored based on the older Bio::Species system (single-name genus and species). Maybe Hilmar can comment?" As noted in the bug report this is still considered a developer series; even though most of the core modules work well together there are still some interoperability issues present (as this bug demonstrates). Maybe having a BioSQL TaxonAdaptor module would be a workaround; Bio::Species is-now-a Bio::Taxon (whereas pre-1.5.2 versions aren't), so if we had a module that stored data in the newer context it might work around this. Hilmar? chris On Mar 27, 2007, at 3:42 PM, Carrel, Michael, G wrote: > Chris, > > > > I am trying to apply this patch to my BioPerl-DB 1.5.2 code and > don't understand what the changes are in the Bio/DB/BioSQL/ > SpeciesAdaptor.pm code. What does the "+=pod" text mean? Same for > "+=cut"? Are we commenting out lines 256 through 280? > > > > The text says that "massaging" code was commented out, but I don't > understand exactly what lines are commented out. Please explain in > more detail what the changes are in the SpeciesAdaptor.pm file. > > > > I believe I understand the changes in the Bio/DB/BioSQL/mysql/ > SpeciesAdaptorDriver.pm code file...commenting out the one line > > ( #$clf[0]->[0] = $obj->binomial(); ). > > > > Thank you, > > > > Mike Carrel > > Network Analyst > > 816-234-1571 > > mgcarrel at cmh.edu > > > > > > > Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From stefan.kirov at bms.com Tue Mar 27 17:13:03 2007 From: stefan.kirov at bms.com (Stefan Kirov) Date: Tue, 27 Mar 2007 16:13:03 -0500 Subject: [Bioperl-l] dn/ds code from PAML howto not working In-Reply-To: References: Message-ID: <460988DF.3000400@bms.com> Amir Karger wrote: > A client of mine stole the pairwise Ka/Ks code from the "Running PAML > from within Bioperl" section of the PAML HOWTO. It mostly seems to be > running fine. However, when we say > $result = $parser->next_result; > > we get an undef $result. > > 1) Is there something I can look for in the output files to show me > whether it's a problem with Bioperl itself (Never!) or with CODEML? > > 2) Is there a previously known issue with Bioperl's incorrectly reading > such files? Bio::Root::Version says it's 1.5, by the way. > > 3) Something else I haven't thought of? > > Thanks, > > Amir Karger > Research Computing, Life Sciences Division > Harvard University > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Do you get a message saying you have no sites? Have you looked at the alignments? Stefan From gilbertd at cricket.bio.indiana.edu Tue Mar 27 19:42:30 2007 From: gilbertd at cricket.bio.indiana.edu (Don Gilbert) Date: Tue, 27 Mar 2007 18:42:30 -0500 (EST) Subject: [Bioperl-l] Genbank2gff3 script update Message-ID: <200703272342.l2RNgUT19468@cricket.bio.indiana.edu> Dear Bioperl developers, Here is an improved bp_Genbank2gff3.pl script, with bug fixes and enhancements. The non-transparent changes in behavior are made via non-default command flags. I've updated these against current Bioperl CVS. Would one of you care to add this to your CVS repository? 
Thanks, Don Gilbert

Find at http://eugenes.org/gmod/genbank2chado/

=item Bioperl bp_genbank2gff3.pl

  bin/genbank2gff3.PLS (Bioperl CVS scripts/Bio-GFF-DB/genbank2gff3.PLS)
  lib/Bio-new/SeqFeature/Tools/TypeMapper.pm (required for genbank2gff3 update)
  lib/Bio-new/SeqFeature/Tools/Unflattener.pm (minor change suggested for genbank2gff3)
  (put into your Bioperl lib/Bio/... directories)

  There is also this unrelated patch
  lib/Bio-new/Graphnics/Glyph/processed_transcript.pm
  -- new flag to ignore excess subfeatures from Chado's
     gene-mrna-polypeptide-exon model.

=item Genbank2gff3 changes

 * Polypeptide alternate gene model added (--noCDS option)
   Standard gene model: gene > mRNA > (UTR,CDS,exon)
   G-R-P-E alternate model: gene > mRNA > polypeptide > exon
   Polypeptide contains all the important protein info (IDs, translation, GO terms)

 * IO pipes: curl ftp://ncbigenomes/... | genbank2gff3 --in stdin --out stdout | gff2chado ...

 * GenBank main record fields are added to source feature
   and the sourcetype, commonly chromosome for genomes, is used.

 * Gene model handling for ncRNA and pseudogenes is added.

 * GFF header is cleaner, more informative, and GFF_VERSION option

 * GFF ##FASTA inclusion is improved, and translation sequence stored there.

 * FT -> GFF attribute mapping is improved.

 * --format choice of SeqIO input formats (GenBank default).
   Uniprot/Swissprot and EMBL produce useful GFF.

 * SeqFeature::Tools::TypeMapper has a few FT -> SOFA additions, more flexible usage.
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405 -- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/ From bosborne11 at verizon.net Tue Mar 27 22:20:59 2007 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 27 Mar 2007 22:20:59 -0400 Subject: [Bioperl-l] Genbank2gff3 script update In-Reply-To: <200703272342.l2RNgUT19468@cricket.bio.indiana.edu> Message-ID: Don, I took the file http://eugenes.org/gmod/genbank2chado/bin/genbank2gff3.PLS and replaced the script of the same name with it, in scripts/Bio-DB-GFF. Brian O. On 3/27/07 7:42 PM, "Don Gilbert" wrote: > > Dear Bioperl developers, > > Here is an improved bp_Genbank2gff3.pl script, with bug fixes > and enhancements. The non-transparent changes in behavior are > made via non-default command flags. I've updated these against current > Bioperl CVS. Would one of you care to add this to your CVS repository? > > THanks, Don Gilbert > > Find at http://eugenes.org/gmod/genbank2chado/ > > =item Bioperl bp_genbank2gff3.pl > > bin/genbank2gff3.PLS (Bioperl CVS scripts/Bio-GFF-DB/genbank2gff3.PLS) > lib/Bio-new/SeqFeature/Tools/TypeMapper.pm (required for genbank2gff3 > update) > lib/Bio-new/SeqFeature/Tools/Unflattener.pm (minor change suggested for > genbank2gff3) > (put into your Bioperl lib/Bio/... directories) > > There are also this unrelated patch > lib/Bio-new/Graphnics/Glyph/processed_transcript.pm > -- new flag to ignore excess subfeatures from Chado's > gene-mrna-polypeptide-exon model. > > =item Genbank2gff3 changes > > * Polypeptide alternate gene model added (--noCDS option) > Standard gene model: gene > mRNA > (UTR,CDS,exon) > G-R-P-E alternate model: gene > mRNA > polypeptide > exon > Polypeptide contains all the important protein info (IDs, translation, GO > terms) > > * IO pipes: curl ftp://ncbigenomes/... | genbank2gff3 --in stdin --out > stdout | gff2chado ... 
> > * GenBank main record fields are added to source feature > and the sourcetype, commonly chromosome for genomes, is used. > > * Gene Model handling for ncRNA, pseudogenes are added. > > * GFF header is cleaner, more informative, and GFF_VERSION option > > * GFF ##FASTA inclusion is improved, and translation sequence stored there. > > * FT -> GFF attribute mapping is improved. > > * --format choice of SeqIO input formats (GenBank default). > Uniprot/Swissprot and EMBL produce useful GFF. > > * SeqFeature::Tools::TypeMapper has a few FT -> SOFA additions, more > flexible usage. > > -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405 > -- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Derek.Fairley at bll.n-i.nhs.uk Wed Mar 28 04:04:23 2007 From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek) Date: Wed, 28 Mar 2007 09:04:23 +0100 Subject: [Bioperl-l] Aligning two alignments ... In-Reply-To: <46093477.5090807@jouy.inra.fr> Message-ID: St?phane, You might also want to take a look at TuneClustalX (http://homepage.mac.com/barryghall/TuneClustalX.html). This utility compares the Q-score values generated by ClustalX when run with different gap penalties, to optimize your aa alignment in the first place. Derek. -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of St?phane T?letch?a Sent: 27 March 2007 16:13 To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Aligning two alignments ... Dear all, I'm trying to compare alignments results for small protein sequences to validate my result from another method. 
In the best option, i would do on two sequences - run a clustalw alignment - run a blast alignment - run a T-Coffee alignment - use my own And align them all to see easily what is identical, what differs. I've considered doing a simple match, but with more than two methods to compare, this may not be trivial. I'd like you opinion on this please. I would consider sequence I as a reference and sequence II as the alignment target. Thanks a lot in advance for your comments, St?phane -- St?phane T?letch?a, PhD. http://www.steletch.org Unit? Math?matique Informatique et G?nome http://migale.jouy.inra.fr/mig INRA, Domaine de Vilvert T?l : (33) 134 652 891 78352 Jouy-en-Josas cedex, France Fax : (33) 134 652 901 _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From lavende_dresden at hotmail.com Wed Mar 28 04:52:05 2007 From: lavende_dresden at hotmail.com (zhang zhang) Date: Wed, 28 Mar 2007 08:52:05 +0000 Subject: [Bioperl-l] from transcript to genes Message-ID: Dear all, I just begin to use Ensembl perl API. for example, now I have zebrafish transcrits like "ENSDART00000011751", how can I find its corresponding gene and further get gene's location such chromosome, start, end information? what kinds of ensembl database and adaptors or methods should I use? Thanks, Yanju Zhang _________________________________________________________________ ???????????????????????????? MSN Messenger: http://messenger.msn.com/cn From avilella at gmail.com Wed Mar 28 05:41:14 2007 From: avilella at gmail.com (Albert Vilella) Date: Wed, 28 Mar 2007 10:41:14 +0100 Subject: [Bioperl-l] from transcript to genes In-Reply-To: References: Message-ID: <358f4d650703280241v7cf9068emff5766354950c7a@mail.gmail.com> Hi Zhang, This question is more for the ensembl-dev mailing list: ensembl-dev at ebi.ac.uk than the bioperl ml. 
You can get the gene object from the transcript object with a script like
this:

----

use strict;
use Bio::EnsEMBL::Registry;

Bio::EnsEMBL::Registry->load_registry_from_db(-host=>"ensembldb.ensembl.org",
-user=>"anonymous", -verbose=>'0');

my $transcript_adaptor = Bio::EnsEMBL::Registry->get_adaptor("Danio rerio",
"core", "Transcript");
my $transcript =
$transcript_adaptor->fetch_by_stable_id("ENSDART00000011751");

my $gene_adaptor = Bio::EnsEMBL::Registry->get_adaptor("Danio rerio",
"core", "Gene");
my $gene_from_transcript =
$gene_adaptor->fetch_by_transcript_id($transcript->dbID);

print "transcript_stable_id: ", $transcript->stable_id, "\n";
print "gene_stable_id: ", $gene_from_transcript->stable_id, "\n";
print "Chromosome: ", $gene_from_transcript->seq_region_name, "\n";
print "Strand (1,-1): ", $gene_from_transcript->seq_region_strand, "\n";
print "Start: ", $gene_from_transcript->seq_region_start, "\n";
print "End: ", $gene_from_transcript->seq_region_end, "\n";

1;

----

Cheers,

    Albert.

On 3/28/07, zhang zhang wrote:
>
> Dear all,
>
> I just begin to use Ensembl perl API. for example, now I have zebrafish
> transcrits like "ENSDART00000011751", how can I find its corresponding
> gene
> and further get gene's location such chromosome, start, end information?
>
> what kinds of ensembl database and adaptors or methods should I use?
>
> Thanks,
> Yanju Zhang
>

From avilella at gmail.com Wed Mar 28 05:44:03 2007
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 28 Mar 2007 10:44:03 +0100
Subject: [Bioperl-l] Aligning two alignments ...
In-Reply-To: <7.0.1.0.2.20070327083437.031b8028@byu.edu>
References: <46093477.5090807@jouy.inra.fr> <7.0.1.0.2.20070327083437.031b8028@byu.edu>
Message-ID: <358f4d650703280244o54e8da6albe4892321450e2c9@mail.gmail.com>

You can call the Muscle wrapper with the profile option although when I
wrote it I was assuming 1 aln and a set of seqs as the input:

$alnfilename = 't/data/cysprot.msa';
$seqsfilename = 't/data/cysprot.fa';
$aln = $factory->profile($alnfilename,$seqsfilename);

one could easily add the code for having 2 alns as input, but I think it is
internally the same for muscle,

Cheers,

    Albert.

On 3/27/07, Joshua Udall wrote:
>
> Stephane -
>
> Muscle will align two alignments when you use the
> profile option. I'm not sure if the 'profile'
> functionality is in the bioperl wrapper or not ...
>
> At 08:12 AM 3/27/2007, Stéphane Téletchéa wrote:
> >Dear all,
> >
> >I'm trying to compare alignments results for small protein sequences to
> >validate my result from another method. In the best option, i would do
> >on two sequences
> >
> >- run a clustalw alignment
> >- run a blast alignment
> >- run a T-Coffee alignment
> >- use my own
> >
> >And align them all to see easily what is identical, what differs.
> >
> >I've considered doing a simple match, but with more than two methods to
> >compare, this may not be trivial. I'd like you opinion on this please.
> >
> >I would consider sequence I as a reference and sequence II as the
> >alignment target.
> >
> >Thanks a lot in advance for your comments,
> >
> >Stéphane
> >--
> >Stéphane Téletchéa, PhD. http://www.steletch.org
> >Unité
Math?matique Informatique et G?nome http://migale.jouy.inra.fr/mig > >INRA, Domaine de Vilvert T?l : (33) 134 652 891 > >78352 Jouy-en-Josas cedex, France Fax : (33) 134 652 901 > >_______________________________________________ > >Bioperl-l mailing list > >Bioperl-l at lists.open-bio.org > >http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Joshua Udall > Assistant Professor > 295 WIDB > Department of Plant and Animal Science > Brigham Young University > Provo, UT 84602 > Office: 801-422-9307 > Fax: 801-422-0008 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bix at sendu.me.uk Wed Mar 28 05:44:56 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 28 Mar 2007 10:44:56 +0100 Subject: [Bioperl-l] from transcript to genes In-Reply-To: References: Message-ID: <460A3918.7000001@sendu.me.uk> zhang zhang wrote: > Dear all, > > I just begin to use Ensembl perl API. for example, now I have zebrafish > transcrits like "ENSDART00000011751", how can I find its corresponding > gene and further get gene's location such chromosome, start, end > information? > > what kinds of ensembl database and adaptors or methods should I use? Ask on the ensembl-dev mailing list and/or read the ensembl API docs: http://www.ensembl.org/info/about/contact.html http://www.ensembl.org/info/software/core/core_tutorial.html http://www.ensembl.org/info/software/Pdoc/ensembl/index.html From lavende_dresden at hotmail.com Wed Mar 28 05:50:24 2007 From: lavende_dresden at hotmail.com (zhang zhang) Date: Wed, 28 Mar 2007 09:50:24 +0000 Subject: [Bioperl-l] from transcript to genes In-Reply-To: <358f4d650703280241v7cf9068emff5766354950c7a@mail.gmail.com> Message-ID: Thanks Albert, But now if I have a string of transcripts like ('ENSDART00000011751','ENSDART00000002250' and more). I want to find their corresponding genes how to do it? 
I have tried: $transcript_adaptor->fetch_by_stable_id("ENSDART00000011751",'ENSDART00000002250'); doesnot work! except using the loop, any easy idea? Cheers Yanju >From: "Albert Vilella" >To: "zhang zhang" , bioperl-l >Subject: Re: [Bioperl-l] from transcript to genes >Date: Wed, 28 Mar 2007 10:41:14 +0100 > >Hi Zhang, > >This question is more for the ensembl-dev mailing list: >ensembl-dev at ebi.ac.uk than the bioperl ml. > >You can get the gene object from the transcript object with a script >like >this: > >---- > >use strict; >use Bio::EnsEMBL::Registry; > >Bio::EnsEMBL::Registry->load_registry_from_db(-host=>"ensembldb.ensembl.org", >-user=>"anonymous", -verbose=>'0'); > >my $transcript_adaptor = Bio::EnsEMBL::Registry->get_adaptor("Danio >rerio", >"core", "Transcript"); >my $transcript = >$transcript_adaptor->fetch_by_stable_id("ENSDART00000011751"); > >my $gene_adaptor = Bio::EnsEMBL::Registry->get_adaptor("Danio >rerio", >"core", "Gene"); >my $gene_from_transcript = >$gene_adaptor->fetch_by_transcript_id($transcript->dbID); > >print "transcript_stable_id: ", $transcript->stable_id, "\n"; >print "gene_stable_id: ", $gene_from_transcript->stable_id, "\n"; >print "Chromosome: ", $gene_from_transcript->seq_region_name, "\n"; >print "Strand (1,-1): ", $gene_from_transcript->seq_region_strand, >"\n"; >print "Start: ", $gene_from_transcript->seq_region_start, "\n"; >print "End: ", $gene_from_transcript->seq_region_end, "\n"; > >1; > >---- > >Cheers, > > Albert. > > >On 3/28/07, zhang zhang wrote: >> >>Dear all, >> >>I just begin to use Ensembl perl API. for example, now I have >>zebrafish >>transcrits like "ENSDART00000011751", how can I find its >>corresponding >>gene >>and further get gene's location such chromosome, start, end >>information? >> >>what kinds of ensembl database and adaptors or methods should I >>use? >> >>Thanks, >>Yanju Zhang >> >>_________________________________________________________________ >>???????????????????????????? 
MSN Messenger: >>http://messenger.msn.com/cn >> >> >>_______________________________________________ >>Bioperl-l mailing list >>Bioperl-l at lists.open-bio.org >>http://lists.open-bio.org/mailman/listinfo/bioperl-l >> _________________________________________________________________ ?????????????????????????????? MSN Hotmail?? http://www.hotmail.com From cjfields at uiuc.edu Wed Mar 28 11:09:05 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 28 Mar 2007 10:09:05 -0500 Subject: [Bioperl-l] from transcript to genes In-Reply-To: References: Message-ID: <8FE7C788-599F-43B9-835A-3D1DE9A451FA@uiuc.edu> As both Albert and Sendu point out, this is best asked on the ensembl- dev mail list. In relation (and due) to this question I have added a few Ensembl- related questions to the FAQ. If anyone has corrections on those please feel free to make them. chris On Mar 28, 2007, at 4:50 AM, zhang zhang wrote: > Thanks Albert, > > But now if I have a string of transcripts like > ('ENSDART00000011751','ENSDART00000002250' and more). I want to > find their corresponding genes how to do it? > > I have tried: > $transcript_adaptor->fetch_by_stable_id > ("ENSDART00000011751",'ENSDART00000002250'); > > > doesnot work! except using the loop, any easy idea? > > Cheers Yanju > > >> From: "Albert Vilella" >> To: "zhang zhang" , bioperl-l > >> Subject: Re: [Bioperl-l] from transcript to genes >> Date: Wed, 28 Mar 2007 10:41:14 +0100 >> >> Hi Zhang, >> >> This question is more for the ensembl-dev mailing list: >> ensembl-dev at ebi.ac.uk than the bioperl ml. 
>> >> You can get the gene object from the transcript object with a >> script like >> this: >> >> ---- >> >> use strict; >> use Bio::EnsEMBL::Registry; >> >> Bio::EnsEMBL::Registry->load_registry_from_db(- >> host=>"ensembldb.ensembl.org", > >> -user=>"anonymous", -verbose=>'0'); >> >> my $transcript_adaptor = Bio::EnsEMBL::Registry->get_adaptor >> ("Danio rerio", >> "core", "Transcript"); >> my $transcript = >> $transcript_adaptor->fetch_by_stable_id("ENSDART00000011751"); >> >> my $gene_adaptor = Bio::EnsEMBL::Registry->get_adaptor("Danio rerio", >> "core", "Gene"); >> my $gene_from_transcript = >> $gene_adaptor->fetch_by_transcript_id($transcript->dbID); >> >> print "transcript_stable_id: ", $transcript->stable_id, "\n"; >> print "gene_stable_id: ", $gene_from_transcript->stable_id, "\n"; >> print "Chromosome: ", $gene_from_transcript->seq_region_name, "\n"; >> print "Strand (1,-1): ", $gene_from_transcript- >> >seq_region_strand, "\n"; >> print "Start: ", $gene_from_transcript->seq_region_start, "\n"; >> print "End: ", $gene_from_transcript->seq_region_end, "\n"; >> >> 1; >> >> ---- >> >> Cheers, >> >> Albert. >> >> >> On 3/28/07, zhang zhang wrote: >>> >>> Dear all, >>> >>> I just begin to use Ensembl perl API. for example, now I have >>> zebrafish >>> transcrits like "ENSDART00000011751", how can I find its >>> corresponding >>> gene >>> and further get gene's location such chromosome, start, end >>> information? >>> >>> what kinds of ensembl database and adaptors or methods should I use? >>> >>> Thanks, >>> Yanju Zhang >>> >>> _________________________________________________________________ >>> ?????????????? MSN Messenger: http:// >>> messenger.msn.com/cn >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> > > _________________________________________________________________ > ??????????????? MSN Hotmail? 
http:// > www.hotmail.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From johnsonm at gmail.com Wed Mar 28 14:27:39 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Wed, 28 Mar 2007 13:27:39 -0500 Subject: [Bioperl-l] Bio::Tools::Run::tRNAscanSE Message-ID: I'm going to need Bio::Tools::Run::tRNAscanSE. I'm going to fire up an editor in a little while and bang it out unless somebody already has something but hasn't committed to cvs yet... From cjfields at uiuc.edu Wed Mar 28 14:38:21 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 28 Mar 2007 13:38:21 -0500 Subject: [Bioperl-l] Bio::Tools::Run::tRNAscanSE In-Reply-To: References: Message-ID: Fire away! I have contributed a wrapper for Infernal as well if you need it... chris On Mar 28, 2007, at 1:27 PM, Mark Johnson wrote: > I'm going to need Bio::Tools::Run::tRNAscanSE. I'm going to fire > up an editor in a little while and bang it out unless somebody already > has something but hasn't committed to cvs yet... > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From johnsonm at gmail.com Wed Mar 28 14:43:46 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Wed, 28 Mar 2007 13:43:46 -0500 Subject: [Bioperl-l] Bio::Tools::Run::tRNAscanSE In-Reply-To: References:

Message-ID: I will be taking your Infernal wrapper for a spin. Thanks for writing it, I'm happy to have that checked off the list. 8) On 3/28/07, Chris Fields wrote: > Fire away! I have contributed a wrapper for Infernal as well if you > need it... > > chris > > On Mar 28, 2007, at 1:27 PM, Mark Johnson wrote: > > > I'm going to need Bio::Tools::Run::tRNAscanSE. I'm going to fire > > up an editor in a little while and bang it out unless somebody already > > has something but hasn't committed to cvs yet... > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > From aharry2001 at yahoo.com Thu Mar 29 04:21:12 2007 From: aharry2001 at yahoo.com (Ambrose) Date: Thu, 29 Mar 2007 01:21:12 -0700 (PDT) Subject: [Bioperl-l] problem while parsing UniProt(ltaxon.pm) In-Reply-To: Message-ID: <20070329082112.60238.qmail@web52010.mail.re2.yahoo.com> Dear all, I am parsing the information in UniProt using bioperl The parser runs successfull for thousand of records and suddenly I get an error message concerning the lineage.The error message suggested that the error comes up when bioperl tries to parse OC lines(taxonomy). I decided to parse this out using perl and not bioperl but I still get the same error message. I really wish to know whether the problem is with a change in the taxon.pm or has this been problem been reported by other users.I am waiting ansciously to read from you. 
best regards
ambrose

Here you are with the error message

Q0QAY1_9DIPT
Q0QAY7_9DIPT
Q0QB51_9DIPT
Q0QB52_9DIPT
Q0QB62_9DIPT
Q0QB63_9DIPT

------------- EXCEPTION -------------
MSG: The lineage 'Eukaryota, Metazoa, Mollusca, Bivalvia, Heteroconchia,
Veneroida, Veneroidea, Veneridae, Venerupis, Ruditapes, Venerupis' had two
non-consecutive nodes with the same name. Can't cope!
STACK Bio::DB::Taxonomy::list::add_lineage /usr/local/ActivePerl/site/lib/Bio/DB/Taxonomy/list.pm:157
STACK Bio::DB::Taxonomy::list::new /usr/local/ActivePerl/site/lib/Bio/DB/Taxonomy/list.pm:94
STACK Bio::DB::Taxonomy::new /usr/local/ActivePerl/site/lib/Bio/DB/Taxonomy.pm:103
STACK Bio::Species::classification /usr/local/ActivePerl/site/lib/Bio/Species.pm:179
STACK Bio::SeqIO::swiss::_read_swissprot_Species /usr/local/ActivePerl/site/lib/Bio/SeqIO/swiss.pm:1067
STACK Bio::SeqIO::swiss::next_seq /usr/local/ActivePerl/site/lib/Bio/SeqIO/swiss.pm:247
STACK toplevel upparser.pl:178
--------------------------------------

From bix at sendu.me.uk Thu Mar 29 05:30:07 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 29 Mar 2007 10:30:07 +0100
Subject: [Bioperl-l] problem while parsing UniProt(ltaxon.pm)
In-Reply-To: <20070329082112.60238.qmail@web52010.mail.re2.yahoo.com>
References: <20070329082112.60238.qmail@web52010.mail.re2.yahoo.com>
Message-ID: <460B871F.5040906@sendu.me.uk>

Ambrose wrote:
> Dear all,
> I am parsing the information in UniProt using bioperl
> The parser runs successfull for thousand of records and suddenly I get an error
> message concerning the lineage.The error message suggested that the error
> comes up when bioperl tries to parse OC lines(taxonomy).
> I decided to parse this out using perl and not bioperl but
> I still get the same error message.

Not sure what you mean by that.
The error message is generated by BioPerl, so of course you were still using BioPerl. > I really wish to know whether the problem is with a > change in the taxon.pm or has this been problem been reported > by other users.I am waiting ansciously to read from you. Taxon.pm is new (not 'changed'), and the error is generated by Bio::DB::Taxonomy::list, also new. > Here you are with the error message > > Q0QAY1_9DIPT > Q0QAY7_9DIPT > Q0QB51_9DIPT > Q0QB52_9DIPT > Q0QB62_9DIPT > Q0QB63_9DIPT > > ------------- EXCEPTION ------------- > MSG: The lineage 'Eukaryota, Metazoa, Mollusca, Bivalvia, > Heteroconchia, > Veneroida, Veneroidea, Veneridae, Venerupis, Ruditapes, Venerupis' had > two > non-consecutive nodes with the same name. Can't cope! > STACK Bio::DB::Taxonomy::list::add_lineage > /usr/local/ActivePerl/site/lib/Bio/DB/Taxonomy/list.pm:157 Please send me the actual record that causes the exception and I'll see what I can do about fixing the problem. From cjfields at uiuc.edu Thu Mar 29 08:12:43 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 29 Mar 2007 07:12:43 -0500 Subject: [Bioperl-l] problem while parsing UniProt(ltaxon.pm) In-Reply-To: <460B871F.5040906@sendu.me.uk> References: <20070329082112.60238.qmail@web52010.mail.re2.yahoo.com> <460B871F.5040906@sendu.me.uk> Message-ID: <96A15304-41A8-423A-AE46-B30C04EC22CF@uiuc.edu> >> Here you are with the error message >> >> Q0QAY1_9DIPT >> Q0QAY7_9DIPT >> Q0QB51_9DIPT >> Q0QB52_9DIPT >> Q0QB62_9DIPT >> Q0QB63_9DIPT >> >> ------------- EXCEPTION ------------- >> MSG: The lineage 'Eukaryota, Metazoa, Mollusca, Bivalvia, >> Heteroconchia, >> Veneroida, Veneroidea, Veneridae, Venerupis, Ruditapes, Venerupis' >> had >> two >> non-consecutive nodes with the same name. Can't cope! >> STACK Bio::DB::Taxonomy::list::add_lineage >> /usr/local/ActivePerl/site/lib/Bio/DB/Taxonomy/list.pm:157 > > Please send me the actual record that causes the exception and I'll > see > what I can do about fixing the problem. 
Sendu, Here's one accession which reproduces this: Q7Y720. There is an additional component to the error that I find: Use of uninitialized value in pattern match (m//) at /Users/cjfields/ src/bioperl-live/Bio/SeqIO/swiss.pm line 1060, line 13. ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: The lineage 'Eukaryota, Metazoa, Mollusca, Bivalvia, Heteroconchia, Veneroida, Veneroidea, Veneridae, Venerupis, Ruditapes, Venerupis' had two non-consecutive nodes with the same name. Can't cope! STACK: Error::throw STACK: Bio::Root::Root::throw /Users/cjfields/src/bioperl-live/Bio/ Root/Root.pm:359 STACK: Bio::DB::Taxonomy::list::add_lineage /Users/cjfields/src/ bioperl-live/Bio/DB/Taxonomy/list.pm:157 STACK: Bio::DB::Taxonomy::list::new /Users/cjfields/src/bioperl-live/ Bio/DB/Taxonomy/list.pm:94 STACK: Bio::DB::Taxonomy::new /Users/cjfields/src/bioperl-live/Bio/DB/ Taxonomy.pm:103 STACK: Bio::Species::classification /Users/cjfields/src/bioperl-live/ Bio/Species.pm:180 STACK: Bio::SeqIO::swiss::_read_swissprot_Species /Users/cjfields/src/ bioperl-live/Bio/SeqIO/swiss.pm:1073 STACK: Bio::SeqIO::swiss::next_seq /Users/cjfields/src/bioperl-live/ Bio/SeqIO/swiss.pm:247 STACK: tax.pl:11 ----------------------------------------------------------- The problem appears to be with the OS source organism line in swiss files, which looks like is being parsed incorrectly for these. Here is the relevant section: OS Venerupis (Ruditapes) philippinarum. OG Mitochondrion. A UniProt query limited to taxonomy using 'Venerupis' produces several more. This only affects swissprot; embl and genbank files with similar source lines do not have the same problem. 
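
For illustration only, here is a rough sketch of OS-line handling that would cope with a line like the one above. It is not the code in Bio::SeqIO::swiss; `parse_os_line` is a hypothetical helper, and treating only a trailing parenthesised phrase as a common name is an assumption about how OS lines are laid out.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a single Swiss-Prot
# OS line into a scientific name and an optional common name.
sub parse_os_line {
    my ($line) = @_;
    (my $text = $line) =~ s/^OS\s+//;   # drop the two-letter line code
    $text =~ s/\s+$//;                  # drop trailing whitespace
    $text =~ s/\.$//;                   # drop the terminating period

    my ($sci_name, $common) = ($text, undef);
    # Assumption: only a parenthesised phrase at the very end is a common
    # name. A parenthesised word in the middle (such as a subgenus) stays
    # part of the scientific name.
    if ($text =~ /^(.+?)\s+\(([^()]+)\)$/) {
        ($sci_name, $common) = ($1, $2);
    }
    return ($sci_name, $common);
}

for my $line ('OS   Venerupis (Ruditapes) philippinarum.',
              'OS   Homo sapiens (Human).') {
    my ($name, $common) = parse_os_line($line);
    printf "%s | %s\n", $name, defined $common ? $common : '-';
}
```

With this approach 'Ruditapes' never ends up as a separate name, and the trailing '.' is removed before the name is used anywhere else.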
chris From Simon.Williams at postgrad.manchester.ac.uk Thu Mar 29 09:13:23 2007 From: Simon.Williams at postgrad.manchester.ac.uk (Simon Williams) Date: Thu, 29 Mar 2007 14:13:23 +0100 Subject: [Bioperl-l] Blastall problems Message-ID: <20070329141323.wblbdqfcgoowgw0w@webmail.manchester.ac.uk> Dear all, I am having a few difficulties with standaloneblast. I am trying to implement a web tool which will blast a given sequence before it goes on to do various other things. I have a script to run the blast using the appropriate bioperl modules which runs ok from the command line. The problem comes when I try to run this through the web page. I get this output: [blastall] FATAL ERROR: BlastFormattingInfoNew returned non-zero status ------------- EXCEPTION ------------- MSG: blastall call crashed: 256 /usr/bin/blastall -p blastp -d "/fs/storage/data/db/blast/nr" -i /tmp/gldKFlbrJn -e 0.001 -o blast.out STACK Bio::Tools::Run::StandAloneBlast::_runblast /usr/share/perl5/Bio/Tools/Run/StandAloneBlast.pm:732 STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast /usr/share/perl5/Bio/Tools/Run/StandAloneBlast.pm:680 STACK Bio::Tools::Run::StandAloneBlast::blastall /usr/share/perl5/Bio/Tools/Run/StandAloneBlast.pm:536 STACK toplevel /fs/storage/home/williams/public_html/crescendo_cgi/pdb_extra/blastSeqGetter.pl:33 -------------------------------------- I'm not sure exactly what this means! Any ideas would be gratefully received. Simon From bix at sendu.me.uk Thu Mar 29 09:41:37 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 29 Mar 2007 14:41:37 +0100 Subject: [Bioperl-l] problem while parsing UniProt(ltaxon.pm) In-Reply-To: <96A15304-41A8-423A-AE46-B30C04EC22CF@uiuc.edu> References: <20070329082112.60238.qmail@web52010.mail.re2.yahoo.com> <460B871F.5040906@sendu.me.uk> <96A15304-41A8-423A-AE46-B30C04EC22CF@uiuc.edu> Message-ID: <460BC211.9000604@sendu.me.uk> Chris Fields wrote: > Here's one accession which reproduces this: Q7Y720. 
There is an > additional component to the error that I find: > > Use of uninitialized value in pattern match (m//) at > /Users/cjfields/src/bioperl-live/Bio/SeqIO/swiss.pm line 1060, > line 13. > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: The lineage 'Eukaryota, Metazoa, Mollusca, Bivalvia, Heteroconchia, > Veneroida, Veneroidea, Veneridae, Venerupis, Ruditapes, Venerupis' had > two non-consecutive nodes with the same name. Can't cope! > STACK: Error::throw > STACK: Bio::Root::Root::throw > /Users/cjfields/src/bioperl-live/Bio/Root/Root.pm:359 > STACK: Bio::DB::Taxonomy::list::add_lineage > /Users/cjfields/src/bioperl-live/Bio/DB/Taxonomy/list.pm:157 > STACK: Bio::DB::Taxonomy::list::new > /Users/cjfields/src/bioperl-live/Bio/DB/Taxonomy/list.pm:94 > STACK: Bio::DB::Taxonomy::new > /Users/cjfields/src/bioperl-live/Bio/DB/Taxonomy.pm:103 > STACK: Bio::Species::classification > /Users/cjfields/src/bioperl-live/Bio/Species.pm:180 > STACK: Bio::SeqIO::swiss::_read_swissprot_Species > /Users/cjfields/src/bioperl-live/Bio/SeqIO/swiss.pm:1073 > STACK: Bio::SeqIO::swiss::next_seq > /Users/cjfields/src/bioperl-live/Bio/SeqIO/swiss.pm:247 > STACK: tax.pl:11 > ----------------------------------------------------------- > > The problem appears to be with the OS source organism line in swiss > files, which looks like is being parsed incorrectly for these. Here is > the relevant section: > > OS Venerupis (Ruditapes) philippinarum. > OG Mitochondrion. > > A UniProt query limited to taxonomy using 'Venerupis' produces several > more. This only affects swissprot; embl and genbank files with similar > source lines do not have the same problem. Thanks. I've made a tentative fix to swiss.pm. The only problem might be common names/ descriptions don't get caught on some strange OS lines. I don't have enough experience of OS lines to know what they might look like. 
Still, at least there won't be thrown exceptions, which some users may prefer ;) I'll add tests later if and when Ambrose/ yourself confirm all is well. From cjfields at uiuc.edu Thu Mar 29 10:18:42 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 29 Mar 2007 09:18:42 -0500 Subject: [Bioperl-l] problem while parsing UniProt(ltaxon.pm) In-Reply-To: <460BC211.9000604@sendu.me.uk> References: <20070329082112.60238.qmail@web52010.mail.re2.yahoo.com> <460B871F.5040906@sendu.me.uk> <96A15304-41A8-423A-AE46-B30C04EC22CF@uiuc.edu> <460BC211.9000604@sendu.me.uk> Message-ID: <5FD505AB-9092-442A-8382-DA845182176B@uiuc.edu> On Mar 29, 2007, at 8:41 AM, Sendu Bala wrote: > Chris Fields wrote: >> Here's one accession which reproduces this: Q7Y720. There is an >> additional component to the error that I find: >> Use of uninitialized value in pattern match (m//) at /Users/ >> cjfields/src/bioperl-live/Bio/SeqIO/swiss.pm line 1060, >> line 13. >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: The lineage 'Eukaryota, Metazoa, Mollusca, Bivalvia, >> Heteroconchia, Veneroida, Veneroidea, Veneridae, Venerupis, >> Ruditapes, Venerupis' had two non-consecutive nodes with the same >> name. Can't cope! 
>> STACK: Error::throw >> STACK: Bio::Root::Root::throw /Users/cjfields/src/bioperl-live/Bio/ >> Root/Root.pm:359 >> STACK: Bio::DB::Taxonomy::list::add_lineage /Users/cjfields/src/ >> bioperl-live/Bio/DB/Taxonomy/list.pm:157 >> STACK: Bio::DB::Taxonomy::list::new /Users/cjfields/src/bioperl- >> live/Bio/DB/Taxonomy/list.pm:94 >> STACK: Bio::DB::Taxonomy::new /Users/cjfields/src/bioperl-live/Bio/ >> DB/Taxonomy.pm:103 >> STACK: Bio::Species::classification /Users/cjfields/src/bioperl- >> live/Bio/Species.pm:180 >> STACK: Bio::SeqIO::swiss::_read_swissprot_Species /Users/cjfields/ >> src/bioperl-live/Bio/SeqIO/swiss.pm:1073 >> STACK: Bio::SeqIO::swiss::next_seq /Users/cjfields/src/bioperl- >> live/Bio/SeqIO/swiss.pm:247 >> STACK: tax.pl:11 >> ----------------------------------------------------------- >> The problem appears to be with the OS source organism line in >> swiss files, which looks like is being parsed incorrectly for >> these. Here is the relevant section: >> OS Venerupis (Ruditapes) philippinarum. >> OG Mitochondrion. >> A UniProt query limited to taxonomy using 'Venerupis' produces >> several more. This only affects swissprot; embl and genbank files >> with similar source lines do not have the same problem. > > Thanks. I've made a tentative fix to swiss.pm. The only problem > might be common names/ descriptions don't get caught on some > strange OS lines. I don't have enough experience of OS lines to > know what they might look like. > > Still, at least there won't be thrown exceptions, which some users > may prefer ;) > > I'll add tests later if and when Ambrose/ yourself confirm all is > well. I'm getting it to parse but there is a '.' appended to the scientific_name(): Venerupis (Ruditapes) philippinarum. 
which appears in the classification: Venerupis (Ruditapes) philippinarum.; Ruditapes; Venerupis; Veneridae; Veneroidea; Veneroida; Heteroconchia; Bivalvia; Mollusca; Metazoa; Eukaryota; chris From bix at sendu.me.uk Thu Mar 29 10:06:51 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 29 Mar 2007 15:06:51 +0100 Subject: [Bioperl-l] Blastall problems In-Reply-To: <20070329141323.wblbdqfcgoowgw0w@webmail.manchester.ac.uk> References: <20070329141323.wblbdqfcgoowgw0w@webmail.manchester.ac.uk> Message-ID: <460BC7FB.6030908@sendu.me.uk> Simon Williams wrote: > Dear all, > > I am having a few difficulties with standaloneblast. I am trying to implement a > web tool which will blast a given sequence before it goes on to do various > other things. I have a script to run the blast using the appropriate bioperl > modules which runs ok from the command line. The problem comes when I try to > run this through the web page. I get this output: > > [blastall] FATAL ERROR: BlastFormattingInfoNew returned non-zero status > ------------- EXCEPTION ------------- > MSG: blastall call crashed: 256 /usr/bin/blastall -p blastp -d > "/fs/storage/data/db/blast/nr" -i /tmp/gldKFlbrJn -e 0.001 -o blast.out STACK > Bio::Tools::Run::StandAloneBlast::_runblast > /usr/share/perl5/Bio/Tools/Run/StandAloneBlast.pm:732 STACK > Bio::Tools::Run::StandAloneBlast::_generic_local_blast > /usr/share/perl5/Bio/Tools/Run/StandAloneBlast.pm:680 STACK > Bio::Tools::Run::StandAloneBlast::blastall > /usr/share/perl5/Bio/Tools/Run/StandAloneBlast.pm:536 STACK toplevel > /fs/storage/home/williams/public_html/crescendo_cgi/pdb_extra/blastSeqGetter.pl:33 > -------------------------------------- > > I'm not sure exactly what this means! As the message says, the blastall call crashed. Perhaps your http user doesn't have permission to run blastall or access the necessary files. 
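
A minimal sketch of how the permission theory above could be checked from inside the CGI script itself. It uses only core Perl. The two paths are copied from the error message, `check_blast_access` is a hypothetical helper name, and checking the working directory is an assumption prompted by the relative `-o blast.out` in the failing command.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);

# Paths copied from the error message; adjust for your own system.
my $blastall = '/usr/bin/blastall';
my $db       = '/fs/storage/data/db/blast/nr';

# Hypothetical helper: return a list of access problems as seen by the
# user the script is currently running as (empty list if none found).
sub check_blast_access {
    my ($exe, $db_prefix, $dir) = @_;
    my @problems;
    push @problems, "cannot execute $exe" unless -x $exe;
    my @db_files = glob("$db_prefix.*");          # all files of the database
    push @problems, "no database files match $db_prefix.*" unless @db_files;
    push @problems, map { "cannot read $_" } grep { !-r $_ } @db_files;
    push @problems, "cannot write to $dir" unless -w $dir;
    return @problems;
}

my $user = getpwuid($>) || $>;                     # effective user name
print "Content-type: text/plain\n\n";
print "Running as: $user\n";
print "$_\n" for check_blast_access($blastall, $db, getcwd());
```

Running this once from the shell and once through the web server should show whether the two environments see the binary, the database files and the output directory differently.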
From bix at sendu.me.uk Thu Mar 29 10:29:34 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 29 Mar 2007 15:29:34 +0100 Subject: [Bioperl-l] problem while parsing UniProt(ltaxon.pm) In-Reply-To: <5FD505AB-9092-442A-8382-DA845182176B@uiuc.edu> References: <20070329082112.60238.qmail@web52010.mail.re2.yahoo.com> <460B871F.5040906@sendu.me.uk> <96A15304-41A8-423A-AE46-B30C04EC22CF@uiuc.edu> <460BC211.9000604@sendu.me.uk> <5FD505AB-9092-442A-8382-DA845182176B@uiuc.edu> Message-ID: <460BCD4E.6010803@sendu.me.uk> Chris Fields wrote: > On Mar 29, 2007, at 8:41 AM, Sendu Bala wrote: >> Thanks. I've made a tentative fix to swiss.pm. The only problem might >> be common names/ descriptions don't get caught on some strange OS >> lines. I don't have enough experience of OS lines to know what they >> might look like. >> >> Still, at least there won't be thrown exceptions, which some users may >> prefer ;) >> >> I'll add tests later if and when Ambrose/ yourself confirm all is well. > > I'm getting it to parse but there is a '.' appended to the > scientific_name(): > > Venerupis (Ruditapes) philippinarum. Ok, that should be fixed as well now. How do/will these changes feed into your driver stuff? What is the status on that work? The intent? Are we switching over to using swissdriver.pm et al. at some point? 
From cjfields at uiuc.edu Thu Mar 29 11:39:55 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 29 Mar 2007 10:39:55 -0500 Subject: [Bioperl-l] problem while parsing UniProt(ltaxon.pm) In-Reply-To: <460BCD4E.6010803@sendu.me.uk> References: <20070329082112.60238.qmail@web52010.mail.re2.yahoo.com> <460B871F.5040906@sendu.me.uk> <96A15304-41A8-423A-AE46-B30C04EC22CF@uiuc.edu> <460BC211.9000604@sendu.me.uk> <5FD505AB-9092-442A-8382-DA845182176B@uiuc.edu> <460BCD4E.6010803@sendu.me.uk> Message-ID: <183D3757-48A0-43BC-B3D0-93F9517C4A4B@uiuc.edu> On Mar 29, 2007, at 9:29 AM, Sendu Bala wrote: > Chris Fields wrote: >> On Mar 29, 2007, at 8:41 AM, Sendu Bala wrote: >>> Thanks. I've made a tentative fix to swiss.pm. The only problem >>> might be common names/ descriptions don't get caught on some >>> strange OS lines. I don't have enough experience of OS lines to >>> know what they might look like. >>> >>> Still, at least there won't be thrown exceptions, which some >>> users may prefer ;) >>> >>> I'll add tests later if and when Ambrose/ yourself confirm all is >>> well. >> I'm getting it to parse but there is a '.' appended to the >> scientific_name(): >> Venerupis (Ruditapes) philippinarum. > > Ok, that should be fixed as well now. How do/will these changes > feed into your driver stuff? What is the status on that work? The > intent? Are we switching over to using swissdriver.pm et al. at > some point? It's fine now; I did notice that EMBL leaves it in as well so I fixed that. As for SeqIO::swissdriver, it does remove the '.' from the OS line but leaves it in the classification line. Doh! I'll try fixing that... In relation to that, the driver/handler-based SeqIO parsers are still being worked on when I have time (which there hasn't been much of lately). I don't see them immediately replacing SeqIO's genbank/embl/ swiss, though the next_seq() implementation works fine. 
It is very possible that these will replace the older parsers down the road, though (maybe post-1.6). They aren't intended for a stable release for now so may not be included in v 1.6, but they pass the current genbank/embl/swiss tests and can be included in any dev releases for testing. As for the Handler.t tests, I cheat a little since they don't have a write_seq() implemented yet; I may just delegate those to SeqIO::genbank/embl/swiss::write_seq() for the time being. The general idea of what I want to do is in the following link, though it's woefully incomplete at this stage. If you have any ideas let me know or add your own thoughts to the Talk page there! http://www.bioperl.org/wiki/Handler-based_SeqIO_parsers chris From cjfields at uiuc.edu Thu Mar 29 12:26:34 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 29 Mar 2007 11:26:34 -0500 Subject: [Bioperl-l] strange problem In-Reply-To: <460BE24F.5030704@crs4.it> References: <46028EA0.7070901@crs4.it> <85EAEAA1-8574-4862-8647-D0D4CFDC8C6D@uiuc.edu> <460BE24F.5030704@crs4.it> Message-ID: <6DCFEC52-62F3-4EBC-BECE-836D65574792@uiuc.edu> The issue with Build.PL may be that it is trying to update Scalar::Util to the latest (1.19) or install it if it isn't present but doesn't have the correct permissions. I myself have run into this problem; you must install the latest Scalar::Utils directly via CPAN with the correct permissions ('sudo cpan' in my case). I'm guessing that using something like 'sudo perl Build.PL' should work as well. I'm adding Sendu to this so he can answer. Sendu, I know it's now required for Bio::Species/Bio::Taxon stuff to weaken refs but was there any reason why Scalar::Util needs to be v1.19? I can't recall anything about it in past discussions... chris On Mar 29, 2007, at 10:59 AM, Patricia Rodriguez Tome wrote: > Hello Chris > > I took the bioperl from cvs (from the web site, did a tarball) but > now I am at lost to install it. 
> Now that it does not use simple perl Makefile.PL, it wants to > install plenty of things with CPAN, and even complaints that my > CPAN has serious problems (I can install other things) > So I am afraid that the test with the new cvs version will have to > wait another week until I come back and can solve all these problems > > In the meantime I am opening another bug with another very strange > case (I seem to be collecting them lately:) > cheers > > > > Chris Fields wrote: >> This is a bug. Could you add this to bugzilla with a >> representative report and script? It should be easy to fix as >> it's likely just a regex problem. >> >> http://www.bioperl.org/wiki/Bugs >> http://bugzilla.open-bio.org/ >> >> chris >> >> On Mar 22, 2007, at 9:11 AM, Patricia Rodriguez Tome wrote: >> >> >>> Hi >>> >>> I have found a parsing problem in SearchIO >>> I have this result: >>> >>> >>> Score E >>> Sequences producing significant alignments: >>> (bits) Value >>> >>> UniRef50_Q9X0H5 Cluster: Histidyl-tRNA synthetase; n=4; >>> Thermoto... 23 650 >>> >>> >>> >>> I do a very simple parsing with SearchIO >>> my $in = new Bio::SearchIO(-format => 'blast', >>> -file => $ARGV[0]); >>> >>> while (my $hit = $result->next_hit()) { >>> print "name\t", $hit->name, "\n"; >>> print "length\t", $hit->length, "\n"; >>> print "accession\t", $hit->accession, "\n"; >>> print "description\t", $hit->description, "\n"; >>> print "raw_score\t", $hit->raw_score, "\n"; >>> print "significance\t", $hit->significance, "\n"; >>> print "bits\t", $hit->bits, "\n"; >>> >>> And the result is: >>> name UniRef50_Q9X0H5 >>> length 420 >>> accession UniRef50_Q9X0H5 >>> description Cluster: Histidyl-tRNA synthetase; n=4; >>> Thermotogaceae|Rep: Histidyl-tRNA synthetase - Thermotoga maritima >>> raw_score ... 
>>> significance 23 >>> bits 22.7 >>> >>> As you see the three dots at the end of description get into the raw >>> score instead, then the evalue gets the raw score >>> I am using bioperl1.5.1 and tried even with 1.5.2 but get the >>> same result >>> Where can I change it ? >>> >>> Grazie >>> >>> Patricia >>> >>> -- >>> Dr. Patricia Rodriguez-Tom?, PhD >>> CRS4 - Bioinformatics >>> Loc. Pixina Manna Edificio 3 >>> Pula 09010 (CA), Italy >>> http://www.bioinformatica.crs4.org >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > -- > Dr. Patricia Rodriguez-Tom?, PhD > CRS4 - Bioinformatics > Loc. Pixina Manna Edificio 3 Pula 09010 (CA), Italy > http://www.bioinformatica.crs4.org > > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From staffa at niehs.nih.gov Thu Mar 29 15:06:22 2007 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS)) Date: Thu, 29 Mar 2007 15:06:22 -0400 Subject: [Bioperl-l] get_SeqFeatures doesn't like genbank CON files Message-ID: If I use the following code on the genbank flat files gbconN.seq (N=1..4), I bomb memory. So I wrote a flat Perl script and made oodles of files, one for each genbank CON entry for D.pseudoobscura. These entries have complete features tables, but do not have real sequence, just join statements referencing the WGS files AADExxxxxxxxxxx. When I run this code on them. 
the BioPerl modules don't seem to like the
join statements being where they are, and for some reason object to "gap".
I AM glad that BioPerl allowed the program to process all files.

The code:
$seqio_object = Bio::SeqIO->new(-file => "$filename" );
$seq_object = $seqio_object->next_seq;
$sequence_length = $seq_object->length();
my @features = $seq_object->get_SeqFeatures(); # just top level

The log:
-------------------- WARNING ---------------------
MSG: exception while parsing location line
[join(AADE01003924.1:1..5157,gap(128),complement(AADE01002963.1:1..8959),gap
(50),complement(AA
DE01002322.1:801..13635),AADE01008784.1:1..995,complement(AADE01002422.1:1..
12770),gap(105),complement(AADE01006425.1:1..1791),gap(940),c
omplement(AADE01002137.1:1..15323),gap(962),AADE01003112.1:1..8150,gap(194),
AADE01000989.1:1..38476,AADE01012537.1:1..1696,gap(243),AADE0
1012620.1:1..612,complement(AADE01002972.1:1..8912),gap(1646),complement(AAD
E01009428.1:602..2135),AADE01000086.1:1..143541,complement(AA
...
...
...
01003505.1:1..6496,gap(1445),AADE01004655.1:1..3580,gap(328),AADE01002622.1:
1..11193,gap(90),complement(AADE01006718.1:1..1606),gap(423),
complement(AADE01004351.1:1..4128))] in reading EMBL/GenBank/SwissProt,
ignoring feature CONTIG (seqid=CH379058):
------------- EXCEPTION -------------
MSG: operator "gap" unrecognized by parser
STACK Bio::Factory::FTLocationFactory::from_string /usr/lib/perl5/site_perl/5.8.5/Bio/Factory/FTLocationFactory.pm:179
STACK Bio::Factory::FTLocationFactory::from_string /usr/lib/perl5/site_perl/5.8.5/Bio/Factory/FTLocationFactory.pm:175
STACK (eval) /usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/FTHelper.pm:127
STACK Bio::SeqIO::FTHelper::_generic_seqfeature /usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/FTHelper.pm:126
STACK Bio::SeqIO::genbank::next_seq /usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/genbank.pm:514
STACK toplevel find_orthos.pl:24

This even occurs with the addition of -format => 'genbank'
Nick Staffa Telephone: 919-316-4569 (NIEHS: 6-4569) Scientific Computing Support Group NIEHS Information Technology Support Services Contract (Science Task Monitor: John D. Grovenstein (grovens1 at niehs.nih.gov) National Institute of Environmental Health Sciences National Institutes of Health Research Triangle Park, North Carolina From cjfields at uiuc.edu Thu Mar 29 16:00:09 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 29 Mar 2007 15:00:09 -0500 Subject: [Bioperl-l] get_SeqFeatures doesn't like genbank CON files In-Reply-To: References: Message-ID: <41C9F40D-55CF-4A2D-8A3C-8A034051E6F0@uiuc.edu> Nick, you may want to check your bioperl version as the SeqIO::genbank line number indicated in the error is not the same as in CVS (and I'm guessing from the last release either). If you aren't running a recent bioperl version I would suggest upgrading to 1.5.2; CONTIG parsing was something I added in last year sometime post 1.5.1. They must be preceeded by the GenBank-compliant CONTIG tagname to be parsed correctly (using the EMBL-like 'CON' doesn't work). The CONTIG line data is not supposed to be treated like a location; it's normally just stuffed into Annotation::SimpleValue objects to be spit back out in write_seq() if needed. As the error states there are no Bio::Location classes that handle gap data. Since it's trying to process this as a location it indicates something is definitely wrong; the only place this would occur is while parsing features as that's where FTLocationFactory comes into play (via FTHelper). If your seq records look like this (from CM000126) and you still have problems with the latest bioperl release you'll have to file a bug with an example file so we can look at it. ... 
FEATURES Location/Qualifiers source 1..47244934 /organism="Oryza sativa (indica cultivar-group)" /mol_type="genomic DNA" /cultivar="93-11" /db_xref="taxon:39946" /chromosome="1" CONTIG join(CH398081.1:1..22419,gap(unk100),CH398082.1:1..12525385, gap(unk100),CH398083.1:1..13518,gap (unk100),CH398084.1:1..2551194, gap(unk100),CH398085.1:1..3493222,gap(unk100), CH398086.1:1..5091462,gap(unk100),CH398087.1:1..26622,gap (unk100), CH398088.1:1..4860221,gap(unk100),CH398089.1:1..18660091) // chris On Mar 29, 2007, at 2:06 PM, Staffa, Nick (NIH/NIEHS) wrote: > If I use the following code on the genbank flat files gbconN.seq > (N=1..4), > I bomb memory. So I wrote a flat Perl script and made oodles of > files, > one for each genbank CON entry for D.pseudoobscura. > These entries have complete features tables, but do not have real > sequence, > just join statements referencing the WGS files AADExxxxxxxxxxx. > When I run this code on them. the BioPerl modules don't seem to > like the > join statements being where they are, and for some reason object to > "gap". > I AM glad that BioPerl allowed the program to process all files. > > The code: > $seqio_object = Bio::SeqIO->new(-file => "$filename" ); > $seq_object = $seqio_object->next_seq; > $sequence_length = $seq_object->length(); > my @features = $seq_object->get_SeqFeatures(); # just top level > > > The log: > -------------------- WARNING --------------------- > MSG: exception while parsing location line > [join(AADE01003924.1:1..5157,gap(128),complement > (AADE01002963.1:1..8959),gap > (50),complement(AA > DE01002322.1:801..13635),AADE01008784.1:1..995,complement > (AADE01002422.1:1.. 
> 12770),gap(105),complement(AADE01006425.1:1..1791),gap(940),c > omplement(AADE01002137.1:1..15323),gap > (962),AADE01003112.1:1..8150,gap(194), > AADE01000989.1:1..38476,AADE01012537.1:1..1696,gap(243),AADE0 > 1012620.1:1..612,complement(AADE01002972.1:1..8912),gap > (1646),complement(AAD > E01009428.1:602..2135),AADE01000086.1:1..143541,complement(AA > ... > ... > ... > 01003505.1:1..6496,gap(1445),AADE01004655.1:1..3580,gap > (328),AADE01002622.1: > 1..11193,gap(90),complement(AADE01006718.1:1..1606),gap(423), > complement(AADE01004351.1:1..4128))] in reading EMBL/GenBank/ > SwissProt, > ignoring feature CONTIG (seqid=CH379058): > ------------- EXCEPTION ------------- > MSG: operator "gap" unrecognized by parser > STACK Bio::Factory::FTLocationFactory::from_string > /usr/lib/perl5/site_perl/5.8.5/Bio/Factory/FTLocationFactory.pm:179 > STACK Bio::Factory::FTLocationFactory::from_string > /usr/lib/perl5/site_perl/5.8.5/Bio/Factory/FTLocationFactory.pm:175 > STACK (eval) /usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/FTHelper.pm:127 > STACK Bio::SeqIO::FTHelper::_generic_seqfeature > /usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/FTHelper.pm:126 > STACK Bio::SeqIO::genbank::next_seq > /usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/genbank.pm:514 > STACK toplevel find_orthos.pl:24 > > This even occurs with the addition of ?format => ?genbank? > > > > Nick Staffa > Telephone: 919-316-4569 (NIEHS: 6-4569) > Scientific Computing Support Group > NIEHS Information Technology Support Services Contract > (Science Task Monitor: John D. Grovenstein (grovens1 at niehs.nih.gov) > National Institute of Environmental Health Sciences > National Institutes of Health > Research Triangle Park, North Carolina > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From torsten.seemann at infotech.monash.edu.au Thu Mar 29 19:50:20 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 30 Mar 2007 09:50:20 +1000
Subject: [Bioperl-l] Blastall problems
In-Reply-To: <20070329141323.wblbdqfcgoowgw0w@webmail.manchester.ac.uk>
References: <20070329141323.wblbdqfcgoowgw0w@webmail.manchester.ac.uk>
Message-ID:

Simon,

> I am having a few difficulties with standaloneblast. I am trying to implement
> web tool which will blast a given sequence before it goes on to do various
> other things. I have a script to run the blast using the appropriate bioperl
> modules which runs ok from the command line. The problem comes when I
> try to run this through the web page. I get this output:

> I'm not sure exactly what this means!
> Any ideas would be gratefully received.

Try putting this BEGIN{} block at the top of your Perl script:

BEGIN {
  $ENV{BLASTDIR} = '/usr/bin/'; # where my blastall binary is
  $ENV{BLASTDATADIR} = '/fs/storage/data/db/blast/' ; # where my -d are
}

This is because the CGI script doesn't necessarily get the same
environment as your login account.

Are your CGI scripts running as user apache/httpd or are you using
suEXEC to run it as 'williams' (or other) user?
--Torsten From daniel.lang at biologie.uni-freiburg.de Fri Mar 30 05:20:53 2007 From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang) Date: Fri, 30 Mar 2007 11:20:53 +0200 Subject: [Bioperl-l] Cannot dump GFF with relative coords using Bio::DB::SeqFeature backend Message-ID: <460CD675.10202@biologie.uni-freiburg.de> Hi, when trying to dump GFF (2, 2.5 or 3) with coordinates relative to dumped segment from "Bio::DB::SeqFeature-backended" gbrowse, I get the following error: Can't locate object method "refseq" via package "Bio::DB::SeqFeature::Segment" at /var/www/gbrowse/conf//gbrowse.conf/plugins/GFFDumper.pm line 128., I'm using bioperl-live and gbrowse-stable (both cvs as of yesterday, but also looked in other branches for different versions of both modules). Looking into GFFDumper.pm and Segment.pm reveals, that Segment.pm indeed does not offer a method like Bio::DB::GFF::Segment::refseq, which seems to reset the seqments coords somehow. I didn't look deeper into what refseq really does... If I got it right... Currently the function would be handy for me - Can anyone think of a short-term work-around? I've seen there are functions in Bio::DB::SeqFeature::Segment to transform relative <-> absolute coordinates, but don't really know how to make them work in the context of GFFDumper.pm line 128. Thanks in advance! Daniel -- Daniel Lang University of Freiburg, Plant Biotechnology Schaenzlestr. 1, D-79104 Freiburg fax: +49 761 203 6945 phone: +49 761 203 6974 homepage: http://www.plant-biotech.net/ e-mail: daniel.lang at biologie.uni-freiburg.de ################################################# My software never has bugs. It just develops random features. 
################################################# From alexl at users.sourceforge.net Fri Mar 30 03:59:51 2007 From: alexl at users.sourceforge.net (Alex Lancaster) Date: Fri, 30 Mar 2007 00:59:51 -0700 Subject: [Bioperl-l] Packaging bioperl for Fedora Message-ID: Hello bioperl, I'm new to the bioperl world, having just started a research position in which I need to manage a large bioperl-based codebase. To this end, I'm working on packaging bioperl as an official Fedora Package (formerly "Fedora Extras") and I'm currently wading through and packaging the long laundry list of Perl dependencies (then I'm going to try and do the same for biopython). You can see my some of my progress (including links to the reviews) here: http://fedoraproject.org/wiki/AlexLancaster Several issues have arisen during the packaging that I hope the bioperl list could help clarify: 1) There is one dependent package: perl-SVG-Graph that has questionable licensing status that currently prevents it from being packaged, see: http://bugzilla.redhat.com/233848 There is no license in any of the documentation or within the .pm files, save for the following: "COPYRIGHT AND LICENCE Copyright (C) 2002-2003 Allen Day , Chris To " which doesn't really help clarify the situation. As the original upstream authors I have e-mailed both Allen and Chris, but I have yet to hear back from them. Most CPAN modules are usually contributed with the implicit "same licensing conditions as Perl", but Fedora is strict about stating this explictly. If there is anybody on this list who has access to this package and/or knows how to contact the upstream authors to clarify the license conditions? Please let me know, or better, if you can clarify/fix the license in the CPAN modules itself, then please do! 2) http://www.bioperl.org/wiki/Getting_BioPerl lists 1.4.0 as the current "stable" release, but that's getting pretty old, having been released back in 2003. 
Do you currently recommend people use 1.4.0 or one of the 1.5.x series? I'd rather package the more recent version even if it's not "officially" stable, because I know that if most bioperl people are using a more recent version then there'll be calls to also package the newest release, and it doesn't seem sensible to spend a lot of time packaging the older release. If so, is there a particular 1.5.x package you recommend, perhaps one which is somewhere between totally-stable and bleeding-edge? (I started on 1.5.2_102 since it was the latest). 3) Lastly I see that bioperl is now split into bioperl (core), bioperl-run, bioperl-db, bioperl-network etc. Do you recommend that each be packaged separately? For example, I notice that some scripts in bioperl (core) actually depend on Perl modules that are provided by bioperl-run, which means that they will both need to be installed simultaneously to make rpm happy. Regards, Alex -- Alex Lancaster, Ph.D. | Ecology & Evolutionary Biology, University of Arizona From bix at sendu.me.uk Fri Mar 30 07:03:29 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 30 Mar 2007 12:03:29 +0100 Subject: [Bioperl-l] Packaging bioperl for Fedora In-Reply-To: References: Message-ID: <460CEE81.3000401@sendu.me.uk> Alex Lancaster wrote: > Hello bioperl, > > I'm new to the bioperl world, having just started a research position > in which I need to manage a large bioperl-based codebase. To this > end, I'm working on packaging bioperl as an official Fedora Package [snip] Thanks, that would be appreciated I'm sure. > 1) There is one dependent package: perl-SVG-Graph that has > questionable licensing status that currently prevents it from being > packaged, see: > > http://bugzilla.redhat.com/233848 [snip] > If there is anybody on this list who has access to this package > and/or knows how to contact the upstream authors to clarify the > license conditions?
Please let me know, or better, if you can > clarify/fix the license in the CPAN modules itself, then please do! Allen should have answered; he is still around and posting to this list from time to time. > 2) http://www.bioperl.org/wiki/Getting_BioPerl lists 1.4.0 as the > current "stable" release, but that's getting pretty old, having > been released back in 2003. Do you currently recommend people use > 1.4.0 or one of the 1.5.x series? I'd rather package the more > recent version even if it's not "officially" stable because I know > if most bioperl people are using more recent then they'll be calls > to also package the newest release and doesn't seem sensible to > spend a lot of time packaging the older release. If so, is there a > particular 1.5.x package you recommend, perhaps one which is > somewhere between totally-stable and bleeding-edge? (I started on > 1.5.2_102 since it was the latest). The latest release is the one to go for, suitable for use by most users. So yes, 1.5.2_102. We don't do 'bleeding-edge' releases - for bleeding-edge we ask users to use CVS directly. 'Stable' in this context is in regard to the API, not bugs. > 3) Lastly I see that bioperl is now split into bioperl (core), > bioperl-run, bioperl-db, bioperl-network etc. Do you recommend > that each be packaged separately? For example, I notice that some > scripts in bioperl (core) actually depends on Perl modules that are > provided by bioperl-run, which means that they will both be needed > to installed simultaneously to make rpm happy. Ah, well that perhaps shouldn't be the case. We like to keep them separate since core is usable on its own and the other packages won't be needed by many users. Correspondingly separate rpms would be appropriate. Cheers, Sendu. 
From heikki at sanbi.ac.za Fri Mar 30 07:58:44 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Fri, 30 Mar 2007 13:58:44 +0200 Subject: [Bioperl-l] Bio::SeqEvolution Message-ID: <200703301358.45618.heikki@sanbi.ac.za> I've finally committed a few modules into Bio::SeqEvolution to mutate DNA sequences. This was discussed some time ago: http://bioperl.org/pipermail/bioperl-l/2006-February/020832.html At the moment only point mutations with a definable tr/tv (transition/transversion) ratio have been implemented. Some work has been done to provide abstractions for more complex models, which are not there yet. -Heikki -- Heikki Lehvaslaiho heikki at_sanbi _ac _za Associate Professor skype: heikki_lehvaslaiho SANBI, South African National Bioinformatics Institute University of Western Cape, South Africa Phone: +27 21 959 2096 FAX: +27 21 959 2512 From cain.cshl at gmail.com Fri Mar 30 08:48:17 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Fri, 30 Mar 2007 08:48:17 -0400 Subject: [Bioperl-l] Packaging bioperl for Fedora In-Reply-To: References: Message-ID: <1175258897.2668.21.camel@localhost.localdomain> Hello Alex, Please take a look at http://biopackages.net/ and talk to Allen Day and/or Brian O'Connor (that is, Dr. Brian O'Connor :-) as they have already done most of the work for this: they have FC2 packages for bioperl and prereqs, they would just need to be updated for more recent releases of Fedora and bioperl. Presumably, they would also want to host the packages at biopackages.net if you do make updated versions. Scott On Fri, 2007-03-30 at 00:59 -0700, Alex Lancaster wrote: > Hello bioperl, > > I'm new to the bioperl world, having just started a research position > in which I need to manage a large bioperl-based codebase.
To this > end, I'm working on packaging bioperl as an official Fedora Package > (formerly "Fedora Extras") and I'm currently wading through and > packaging the long laundry list of Perl dependencies (then I'm going > to try and do the same for biopython). You can see my some of my > progress (including links to the reviews) here: > > http://fedoraproject.org/wiki/AlexLancaster > > Several issues have arisen during the packaging that I hope the > bioperl list could help clarify: > > 1) There is one dependent package: perl-SVG-Graph that has > questionable licensing status that currently prevents it from being > packaged, see: > > http://bugzilla.redhat.com/233848 > > There is no license in any of the documentation or within the .pm > files, save for the following: > > "COPYRIGHT AND LICENCE > > Copyright (C) 2002-2003 Allen Day , > Chris To " > > which doesn't really help clarify the situation. As the original > upstream authors I have e-mailed both Allen and Chris, but I have > yet to hear back from them. Most CPAN modules are usually > contributed with the implicit "same licensing conditions as Perl", > but Fedora is strict about stating this explictly. > > If there is anybody on this list who has access to this package > and/or knows how to contact the upstream authors to clarify the > license conditions? Please let me know, or better, if you can > clarify/fix the license in the CPAN modules itself, then please do! > > 2) http://www.bioperl.org/wiki/Getting_BioPerl lists 1.4.0 as the > current "stable" release, but that's getting pretty old, having > been released back in 2003. Do you currently recommend people use > 1.4.0 or one of the 1.5.x series? I'd rather package the more > recent version even if it's not "officially" stable because I know > if most bioperl people are using more recent then they'll be calls > to also package the newest release and doesn't seem sensible to > spend a lot of time packaging the older release. 
If so, is there a > particular 1.5.x package you recommend, perhaps one which is > somewhere between totally-stable and bleeding-edge? (I started on > 1.5.2_102 since it was the latest). > > 3) Lastly I see that bioperl is now split into bioperl (core), > bioperl-run, bioperl-db, bioperl-network etc. Do you recommend > that each be packaged separately? For example, I notice that some > scripts in bioperl (core) actually depends on Perl modules that are > provided by bioperl-run, which means that they will both be needed > to installed simultaneously to make rpm happy. > > Regards, > Alex > -- > Alex Lancaster, Ph.D. | Ecology & Evolutionary Biology, University of Arizona > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070330/ecf0ba5e/attachment.bin From bix at sendu.me.uk Fri Mar 30 09:00:06 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 30 Mar 2007 14:00:06 +0100 Subject: [Bioperl-l] strange problem In-Reply-To: <6DCFEC52-62F3-4EBC-BECE-836D65574792@uiuc.edu> References: <46028EA0.7070901@crs4.it> <85EAEAA1-8574-4862-8647-D0D4CFDC8C6D@uiuc.edu> <460BE24F.5030704@crs4.it> <6DCFEC52-62F3-4EBC-BECE-836D65574792@uiuc.edu> Message-ID: <460D09D6.2060802@sendu.me.uk> Chris Fields wrote: > The issue with Build.PL may be that it is trying to update Scalar::Util > to the latest (1.19) or install it if it isn't present but doesn't have > the correct permissions. 
I myself have run into this problem; you must > install the latest Scalar::Util directly via CPAN with the correct > permissions ('sudo cpan' in my case). I'm guessing that using something > like 'sudo perl Build.PL' should work as well. > > I'm adding Sendu to this so he can answer. Sendu, I know it's now > required for Bio::Species/Bio::Taxon stuff to weaken refs but was there > any reason why Scalar::Util needs to be v1.19? I can't recall anything > about it in past discussions... I don't recall why I made it 1.19. I've just changed the requirement to 0 instead. From lstein at cshl.edu Fri Mar 30 10:01:31 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 30 Mar 2007 10:01:31 -0400 Subject: [Bioperl-l] [Gmod-gbrowse] Cannot dump GFF with relative coords using Bio::DB::SeqFeature backend In-Reply-To: <460CD675.10202@biologie.uni-freiburg.de> References: <460CD675.10202@biologie.uni-freiburg.de> Message-ID: <6dce9a0b0703300701t2735ae17x24643859f3da0ca6@mail.gmail.com> Hi Dan, This is an inadvertent omission. I will need to add the refseq() method to Bio::DB::SeqFeature::Segment. It simply stores the start and strand information from the reference sequence and then uses that information to offset the coordinates returned by start(), end() and strand(). Lincoln On 3/30/07, Daniel Lang wrote: > > Hi, > > when trying to dump GFF (2, 2.5 or 3) with coordinates relative to > dumped segment from "Bio::DB::SeqFeature-backended" gbrowse, I get the > following error: > > Can't locate object method "refseq" via package > "Bio::DB::SeqFeature::Segment" at > /var/www/gbrowse/conf//gbrowse.conf/plugins/GFFDumper.pm line 128., > > I'm using bioperl-live and gbrowse-stable (both cvs as of yesterday, but > also looked in other branches for different versions of both modules). > > Looking into GFFDumper.pm and Segment.pm reveals, that Segment.pm indeed > does not offer a method like Bio::DB::GFF::Segment::refseq, which seems > to reset the segment's coords somehow.
I didn't look deeper into what > refseq really does... > > If I got it right... > Currently the function would be handy for me - Can anyone think of a > short-term work-around? I've seen there are functions in > Bio::DB::SeqFeature::Segment to transform relative <-> absolute > coordinates, but don't really know how to make them work in the context > of GFFDumper.pm line 128. > > Thanks in advance! > > Daniel > > -- > > Daniel Lang > University of Freiburg, Plant Biotechnology > Schaenzlestr. 1, D-79104 Freiburg > fax: +49 761 203 6945 > phone: +49 761 203 6974 > homepage: http://www.plant-biotech.net/ > e-mail: daniel.lang at biologie.uni-freiburg.de > > ################################################# > My software never has bugs. > It just develops random features. > ################################################# > -- Lincoln D.
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Fri Mar 30 10:04:37 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 30 Mar 2007 09:04:37 -0500 Subject: [Bioperl-l] strange problem In-Reply-To: <460D09D6.2060802@sendu.me.uk> References: <46028EA0.7070901@crs4.it> <85EAEAA1-8574-4862-8647-D0D4CFDC8C6D@uiuc.edu> <460BE24F.5030704@crs4.it> <6DCFEC52-62F3-4EBC-BECE-836D65574792@uiuc.edu> <460D09D6.2060802@sendu.me.uk> Message-ID: <75E07F51-1F43-4FD6-827B-99366CCCC222@uiuc.edu> On Mar 30, 2007, at 8:00 AM, Sendu Bala wrote: > Chris Fields wrote: >> The issue with Build.PL may be that it is trying to update >> Scalar::Util >> to the latest (1.19) or install it if it isn't present but doesn't >> have >> the correct permissions. I myself have run into this problem; you >> must >> install the latest Scalar::Utils directly via CPAN with the correct >> permissions ('sudo cpan' in my case). I'm guessing that using >> something >> like 'sudo perl Build.PL' should work as well. >> >> I'm adding Sendu to this so he can answer. Sendu, I know it's now >> required for Bio::Species/Bio::Taxon stuff to weaken refs but was >> there >> any reason why Scalar::Util needs to be v1.19? I can't recall >> anything >> about it in past discussions... > > I don't recall why I made it 1.19. I've just changed the > requirement to > 0 instead. Sounds good. I'm sure if there are problems we'll hear about it! 
chris From alexl at users.sourceforge.net Fri Mar 30 10:25:32 2007 From: alexl at users.sourceforge.net (Alex Lancaster) Date: Fri, 30 Mar 2007 07:25:32 -0700 Subject: [Bioperl-l] Packaging bioperl for Fedora In-Reply-To: <1175258897.2668.21.camel@localhost.localdomain> (Scott Cain's message of "Fri\, 30 Mar 2007 08\:48\:17 -0400") References: <1175258897.2668.21.camel@localhost.localdomain> Message-ID: <6d648ierkz.fsf@delpy.biol.berkeley.edu> >>>>> "SC" == Scott Cain writes: SC> Hello Alex, Please take a look at http://biopackages.net/ and talk SC> to Allen Day and/or Brian O'Connor (that is, Dr. Brian O'Connor SC> :-) as they have already done most of the work for this: they have SC> FC2 packages for bioperl and prereqs, they would just need to be SC> updated for more recent releases of Fedora and bioperl. SC> Presumably, they would also want to host the packages at SC> biopackages.net if you do make updated versions. Scott and others, I did look at biopackages.net, which was helpful, however, I decided to contribute these as Fedora packages (see below for why), which means that the build system and infrastructure for the packages are hosted by the Fedora project itself (rather than by a third-party repository). Also the packages on biopackages.net were last updated for Fedora Core 5, which is now 1 year old, so I sort of assumed that the project was probably on ice at the moment. Having the packages in Fedora itself means that the infrastructure is there for new maintainers to pick up a package if an old maintainer "orphans" it. Another factor is that some of the Perl dependencies are required for other (non-Bioperl) packages in Fedora, (e.g. perl-XML-Writer is used by MythTV) so it makes sense for these Perl packages to be part of Fedora itself. 
Lastly, having Bioperl in Fedora itself means that it can be installed out-of-the-box without having to enable a new yum repository, which is why I think it makes sense to have as many bioinformatics packages in the base distribution and lessens the chance of unexpected interactions between third-party repositories, see: http://fedoraproject.org/wiki/Extras/RepositoryMixingProblems Of course, anybody is welcome to sign up to contribute to Fedora as a packager, and I'd be happy to either hand off maintainership of (or co-maintain) these packages to anybody who's interested. There's a small amount of hassle signing up and then submitting packages, but the package peer-review process (as well as a helpful community) generally helps the quality of packages all round. For some of the other packages on biopackages.net, it would be useful to take the spec files there as a starting point for Fedora packages (e.g maybe for R-Bioconductor). Alex -- Alex Lancaster, Ph.D. | Ecology & Evolutionary Biology, University of Arizona From wrp at virginia.edu Fri Mar 30 14:05:15 2007 From: wrp at virginia.edu (William R. Pearson) Date: Fri, 30 Mar 2007 14:05:15 -0400 Subject: [Bioperl-l] Changes in FASTA output format Message-ID: The next major revision of the FASTA program package will have some major improvements to the strategy for calculating statistical significance, particularly when a small library is being searched (high scoring sequences will be shuffled and used to estimate a second set of statistical parameters). As a result, I am considering some changes in FASTA output. (1) I would like to expand the line that shows the algorithm and scoring matrix parameters to multiple lines. 
Currently it looks like:

  Smith-Waterman (SSE2, Michael Farrar 2006) (6.0 Mar 2007) function [BL50 matrix (15:-5)], open/ext: -12/-2
  Scan time: 2.140

I would like to allow at least two lines here, one for the algorithm and version, a second for the scoring parameters:

  Smith-Waterman (SSE2, Michael Farrar 2006) (6.0 Mar 2007) function
  BL50 matrix (15:-5), open/ext: -12/-2
  Scan time: 2.140

I could even imagine tagging the lines:

  Algorithm: Smith-Waterman (SSE2, Michael Farrar 2006) (6.0 Mar 2007)
  Parameters: BL50 matrix (15:-5), open/ext: -12/-2
  Scan time: 2.140

I don't think this would break many FASTA parsers, but I wanted to check.

(2) I am also thinking about displaying multiple E()-values, depending on whether they are calculated from the similarity search or the shuffled high scores, e.g., going from:

  The best scores are:                                       s-w  bits E (231210)
  gi|121716|sp|P10649|GSTM1_MOUSE Glutathione S-tran ( 218) 1497 349.6 6.1e-96
  gi|121717|sp|P04905|GSTM1_RAT Glutathione S-transf ( 218) 1413 330.4 3.8e-90
  gi|399829|sp|Q00285|GSTMU_CRILO Glutathione S-tran ( 218) 1354 316.9 4.5e-86

To:

  The best scores are:                                       s-w  bits E (231210) ES()
  gi|121716|sp|P10649|GSTM1_MOUSE Glutathione S-tran ( 218) 1497 349.6 6.1e-96 5.5e-95
  gi|121717|sp|P04905|GSTM1_RAT Glutathione S-transf ( 218) 1413 330.4 3.8e-90 2.2e-89
  gi|399829|sp|Q00285|GSTMU_CRILO Glutathione S-tran ( 218) 1354 316.9 4.5e-86 8.3e-85

I think this output would break many more FASTA parsers, and one option would be (initially) to add it only to the alignment output. Naturally, initially it will be easy to revert to the classic format.

I would appreciate any comments on the problems these changes might cause.
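For anyone gauging the impact on their own parser, here is a minimal Python sketch of a reader that tolerates both proposals. It is only an illustration under stated assumptions: it is not taken from Bio::SearchIO or any other existing FASTA parser, the names `parse_header_line` and `parse_best_score` are hypothetical, and the line layouts are assumed to match the examples in this message exactly.

```python
import re

# Tagged header lines as proposed in (1), e.g. "Algorithm: Smith-Waterman ...".
# Assumption: a tag always starts the line and is followed by a colon.
TAGGED = re.compile(r"^\s*(Algorithm|Parameters|Scan time):\s*(.+?)\s*$")

# One row of "The best scores are:": description, "( length)", raw score,
# bit score, E(), and optionally a second E-value (the proposed ES() column).
SCORE_ROW = re.compile(
    r"^(?P<desc>.+?)\s*\(\s*(?P<length>\d+)\)\s+"
    r"(?P<score>\d+)\s+(?P<bits>[\d.]+)\s+(?P<evalue>\S+)"
    r"(?:\s+(?P<es>\S+))?\s*$"
)


def parse_header_line(line):
    """Return (tag, value) for a tagged header line, or None if untagged."""
    m = TAGGED.match(line)
    return (m.group(1), m.group(2)) if m else None


def parse_best_score(line):
    """Parse one best-scores row; 'es' is None in the classic format."""
    m = SCORE_ROW.match(line)
    if m is None:
        return None
    return {
        "desc": m.group("desc"),
        "length": int(m.group("length")),
        "score": int(m.group("score")),
        "bits": float(m.group("bits")),
        "evalue": float(m.group("evalue")),
        "es": float(m.group("es")) if m.group("es") else None,
    }
```

Treating the ES() column as an optional trailing field is what lets the same pattern read both the classic and the extended rows; a parser that splits on whitespace and takes the last field as E() would silently pick up ES() instead.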
Bill Pearson From allenday at ucla.edu Fri Mar 30 14:42:54 2007 From: allenday at ucla.edu (Allen Day) Date: Fri, 30 Mar 2007 11:42:54 -0700 Subject: [Bioperl-l] Packaging bioperl for Fedora In-Reply-To: <6d648ierkz.fsf@delpy.biol.berkeley.edu> References: <1175258897.2668.21.camel@localhost.localdomain> <6d648ierkz.fsf@delpy.biol.berkeley.edu> Message-ID: <5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com> Hi Alex, The Biopackages.net project is still active, we are regularly adding packages to it, mostly R packages lately. Most of the systems we use are running CentOS at this point, which is why you have not seen support for FC6 yet. There is nothing preventing building FC6 packages aside from lack of time to set up the FC6 build farm nodes. If you're interested in packaging BioPerl or other bioinformatics-related software, please join the Biopackages project on SourceForge. We object to the Fedora Extras FUD tactics used to discourage people from using 3rd party repositories, and suspect they may not want to host some of our data packages, such as the >2GB genome packages. Biopackages project is likely to partially merge with RPMForge. We are already discussing with them how best to do it. -Allen On 3/30/07, Alex Lancaster wrote: > >>>>> "SC" == Scott Cain writes: > > SC> Hello Alex, Please take a look at http://biopackages.net/ and talk > SC> to Allen Day and/or Brian O'Connor (that is, Dr. Brian O'Connor > SC> :-) as they have already done most of the work for this: they have > SC> FC2 packages for bioperl and prereqs, they would just need to be > SC> updated for more recent releases of Fedora and bioperl. > SC> Presumably, they would also want to host the packages at > SC> biopackages.net if you do make updated versions. 
> > Scott and others, > > I did look at biopackages.net, which was helpful, however, I decided > to contribute these as Fedora packages (see below for why), which > means that the build system and infrastructure for the packages are > hosted by the Fedora project itself (rather than by a third-party > repository). > > Also the packages on biopackages.net were last updated for Fedora Core > 5, which is now 1 year old, so I sort of assumed that the project was > probably on ice at the moment. Having the packages in Fedora itself > means that the infrastructure is there for new maintainers to pick up > a package if an old maintainer "orphans" it. Another factor is that > some of the Perl dependencies are required for other (non-Bioperl) > packages in Fedora, (e.g. perl-XML-Writer is used by MythTV) so it > makes sense for these Perl packages to be part of Fedora itself. > > Lastly, having Bioperl in Fedora itself means that it can be installed > out-of-the-box without having to enable a new yum repository, which is > why I think it makes sense to have as many bioinformatics packages in > the base distribution and lessens the chance of unexpected > interactions between third-party repositories, see: > > http://fedoraproject.org/wiki/Extras/RepositoryMixingProblems > > Of course, anybody is welcome to sign up to contribute to Fedora as a > packager, and I'd be happy to either hand off maintainership of (or > co-maintain) these packages to anybody who's interested. There's a > small amount of hassle signing up and then submitting packages, but > the package peer-review process (as well as a helpful community) > generally helps the quality of packages all round. For some of the > other packages on biopackages.net, it would be useful to take the spec > files there as a starting point for Fedora packages (e.g maybe for > R-Bioconductor). > > Alex > -- > Alex Lancaster, Ph.D. 
| Ecology & Evolutionary Biology, University of Arizona > > > From cjfields at uiuc.edu Fri Mar 30 15:36:26 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 30 Mar 2007 14:36:26 -0500 Subject: [Bioperl-l] Packaging bioperl for Fedora In-Reply-To: <5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com> References: <1175258897.2668.21.camel@localhost.localdomain> <6d648ierkz.fsf@delpy.biol.berkeley.edu> <5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com> Message-ID: <48F80131-2625-47B2-A441-E064A287B539@uiuc.edu> On Mar 30, 2007, at 1:42 PM, Allen Day wrote: > Hi Alex, > > The Biopackages.net project is still active, we are regularly adding > packages to it, mostly R packages lately. Most of the systems we use > are running CentOS at this point, which is why you have not seen > support for FC6 yet. There is nothing preventing building FC6 > packages aside from lack of time to set up the FC6 build farm nodes. > > If you're interested in packaging BioPerl or other > bioinformatics-related software, please join the Biopackages project > on SourceForge. We object to the Fedora Extras FUD tactics used to > discourage people from using 3rd party repositories, and suspect they > may not want to host some of our data packages, such as the >2GB > genome packages. Biopackages project is likely to partially merge > with RPMForge. We are already discussing with them how best to do it. > > -Allen My personal feeling about this is if anyone wants to contribute a package which makes it easier to install or update BioPerl, be it through Biopackages, Fink, fedora extras, or whatever, I'm all for it, 'FUD' or not. 
chris From dmessina at wustl.edu Fri Mar 30 16:24:26 2007 From: dmessina at wustl.edu (David Messina) Date: Fri, 30 Mar 2007 15:24:26 -0500 Subject: [Bioperl-l] Changes in FASTA output format In-Reply-To: References: Message-ID: > I could even imagine tagging the lines: > > Algorithm: Smith-Waterman (SSE2, Michael Farrar 2006) (6.0 Mar > 2007) > Parameters: BL50 matrix (15:-5), open/ext: -12/-2 > Scan time: 2.140 IMO, tagged lines would be great and make parsing very easy. > (2) I am also thinking about displaying multiple E()-values, > depending on whether they are calculated from the similarity search > or the shuffled high scores, e.g., going from: > > [...] > > I think this output would break many more FASTA parsers, and one > option would be (initially) to add it only to the alignment output. Agreed, but... > Naturally, initially it will be easy to revert to the classic format. I think the backwards compatibility you describe here would take care of those cases. My two cents (and thanks for asking :), Dave -- Dave Messina Senior Analyst, Assembly Group Genome Sequencing Center Washington University St. Louis, MO From alexl at users.sourceforge.net Fri Mar 30 19:52:08 2007 From: alexl at users.sourceforge.net (Alex Lancaster) Date: Fri, 30 Mar 2007 16:52:08 -0700 Subject: [Bioperl-l] Packaging bioperl for Fedora In-Reply-To: <5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com> (Allen Day's message of "Fri\, 30 Mar 2007 11\:42\:54 -0700") References: <1175258897.2668.21.camel@localhost.localdomain> <6d648ierkz.fsf@delpy.biol.berkeley.edu> <5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com> Message-ID: <1p8xdeb87r.fsf@delpy.biol.berkeley.edu> >>>>> "AD" == Allen Day writes: AD> Hi Alex, The Biopackages.net project is still active, we are AD> regularly adding packages to it, mostly R packages lately. Most AD> of the systems we use are running CentOS at this point, which is AD> why you have not seen support for FC6 yet. 
There is nothing AD> preventing building FC6 packages aside from lack of time to set up AD> the FC6 build farm nodes. Hi Allen and other, Great news to hear that Biopackages.net is still active! I would like to help out if possible. I don't believe in "FUD" either... ;) AD> If you're interested in packaging BioPerl or other AD> bioinformatics-related software, please join the Biopackages AD> project on SourceForge. We object to the Fedora Extras FUD AD> tactics used to discourage people from using 3rd party AD> repositories, and suspect they may not want to host some of our AD> data packages, such as the >2GB genome packages. Biopackages AD> project is likely to partially merge with RPMForge. We are AD> already discussing with them how best to do it. The packages that I created which are currently available in Fedora Packages are Perl dependencies which, as I said are useful for packages outside the bioinformatics purview. I do have a (base) bioperl package in review, but it is not yet released. As for third-party repos, I don't object to them at all, and for some kinds of projects they are indeed appropriate. (e.g. for non-free stuff like Livna or Freshrpms). However I do have practical concerns about repository mixing, but I think that it does need to be handled carefully but that co-operation between Fedora and third-party repos can make it work. For example, one practical concern is that as of the soon-to-be-released Fedora 7, Core+Extras will be merged, so there will be no distinction at the repository-level between formerly Extras packages and formerly Core packages (as of now there are only "Fedora Packages"), which means that it will not be possible for third-party repos to limit their dependencies to just those in a former base set (i.e. excluding Extras). 
I agree that a few years ago (circa 2003-2004) there was concern about the way some third party repositories were treated somewhat badly by the (then) Fedora Extras (with some people going so far as to say that third-party repos were bad in principle and should always be ignored which I disagree with too). But it seems to me that culture has shifted since, with some notable packagers such as Matthias Saou (of Freshrpms) and Axel Thimm (of Atrpms) now contributing packages to Fedora itself. The process of contributing has also become much simpler and reviews are conducted speedily and efficiently, I had packages in the repository in a matter of a few days from initial submission. Freshrpms itself now enables and depends on the (old) Extras. The real question for me, then is what packages it makes sense to go in Fedora, and what packages go in third party repositories. It seems to me that in the case of Perl packages which could be dependencies for other packages not specific to the third-party repo in question, it makes sense for them to go into Fedora itself, so I think I will continue to package them. This lessens the load on the third-party repo, while making them available for all other third-party repos. (This is approach that Freshrpms seems to be taking, Matthias has contributed most packages back to Fedora now other than the non-free ones). At the other end of the spectrum are packages like you mention, genome packages, which may be of concern because of their size and/or highly specialised nature, and, as you say, may make sense to go in a third-party repo like Biopackages.net. Also packages which can't be packaged by Fedora for legal reasons like Clustal could/should go in Biopackages.net. In the middle are packages like bioperl itself which are potentially useful to perhaps a wider group of people than the genome packages but may not necessarily be dependencies for other packages. 
I lean towards making them part of Fedora so that they will be available out of the box on the planned "Everything" DVD ISO, but I welcome a discussion on this. As I said, I'm glad to hear that Biopackages.net is alive and well and I welcome a discussion on how upstream Fedora can usefully interact with Biopackages.net (I guess perhaps on the Biopackages.net list). Regards, Alex PS. As the upstream author, if you could clarify the license on perl-SVG-Graph, on CPAN (or on the mailing list), that would be great. -- Alex Lancaster, Ph.D. | Ecology & Evolutionary Biology, University of Arizona From allenday at gmail.com Fri Mar 30 20:30:27 2007 From: allenday at gmail.com (Allen Day) Date: Fri, 30 Mar 2007 17:30:27 -0700 Subject: [Bioperl-l] Packaging bioperl for Fedora In-Reply-To: <1p8xdeb87r.fsf@delpy.biol.berkeley.edu> References: <1175258897.2668.21.camel@localhost.localdomain> <6d648ierkz.fsf@delpy.biol.berkeley.edu> <5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com> <1p8xdeb87r.fsf@delpy.biol.berkeley.edu> Message-ID: <5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com> Hi Alex, You've aptly noted that there are several classes of packages being discussed here, and that they should not be treated equally. From my point of view and of specific relevance to the Bioperl community we have at least: 1) "regular" CPAN dependencies and their occasional C/C++/Fortran dependencies. These should all be in Fedora Extras, as they are of general utility. Biopackages.net currently hosts about 200 packages (.spec files, specifically) that are like this. Maybe 80 of these are needed for Bioperl. 2) academic packages, such as BLAT, NCBI Toolkit, CLUSTAL, genscan, etc. From what I've seen, these typically have strange/custom licenses that may not be valid for some users. BLAT has a dual licensing scheme for academic and non-academic licensees, for instance. These packages are not of general utility.
For these two reasons, my stance is that they should not be included in Fedora Extras. 3) Bioperl packages. Several subsets here. The Bioperl-run libraries depend directly on type (2) packages, so aren't appropriate to include in Fedora Extras. Bioperl-live is not really that useful without type (2) packages. It is also sensible to keep all of the Bioperl-* packages in the same repository. For these reasons, my stance is that they should not be included in Fedora Extras. 4) Bioinformatics / Comp. Bio. data sets. These don't have licensing problems, but they tend to be large. Usually in the 10E7 - 10E10 byte range. RPM cannot even generate correct metadata for some of them if the files are too large (overflow problems). Probably not appropriate to put in Fedora Extras because they are too large and not generally useful. 5) Bioinformatics-specific System databases / daemons. These high-level packages depend on types (2), (3), and (4), and so are not appropriate to put into Fedora Extras. An example is a BLAT daemon, which relies on the BLAT server, as well as NIB-formatted genome sequence files. That said, there are a lot of type (1) packages in the Biopackages.net repository. If you're interested in migrating the spec files from our repository to the Fedora project, it would save us (the Biopackages.net maintainers) a ton of build and maintenance time, so please feel free to take them; just let us know. If we can reach some agreement on where the bioinformatics-specific packages should be maintained/built, we may be able to work together on these as well. -Allen On 3/30/07, Alex Lancaster wrote: > >>>>> "AD" == Allen Day writes: > > AD> Hi Alex, The Biopackages.net project is still active, we are > AD> regularly adding packages to it, mostly R packages lately. Most > AD> of the systems we use are running CentOS at this point, which is > AD> why you have not seen support for FC6 yet. 
From cjfields at uiuc.edu Fri Mar 30 20:51:38 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 30 Mar 2007 19:51:38 -0500 Subject: [Bioperl-l] Changes in FASTA output format In-Reply-To: References:

Message-ID: <7D42F947-F7F5-4DD1-B6FF-A951679F6BC8@uiuc.edu> On Mar 30, 2007, at 3:24 PM, David Messina wrote: >> I could even imagine tagging the lines: >> >> Algorithm: Smith-Waterman (SSE2, Michael Farrar 2006) (6.0 Mar >> 2007) >> Parameters: BL50 matrix (15:-5), open/ext: -12/-2 >> Scan time: 2.140 > > IMO, tagged lines would be great and make parsing very easy. > > >> (2) I am also thinking about displaying multiple E()-values, >> depending on whether they are calculated from the similarity search >> or the shuffled high scores, e.g., going from: >> >> [...] >> >> I think this output would break many more FASTA parsers, and one >> option would be (initially) to add it only to the alignment output. > > Agreed, but... > > >> Naturally, initially it will be easy to revert to the classic format. > > I think the backwards compatibility you describe here would take care > of those cases. > > > My two cents (and thanks for asking :), > Dave > > -- > Dave Messina > Senior Analyst, Assembly Group > Genome Sequencing Center > Washington University > St. Louis, MO If it ever becomes a problem, we can hand off parsing to format-specific parser methods (one for the old version, one for the new) or just try to evaluate them separately (à la SearchIO::blast). If there are tags that make the new format distinguishable from the old, such as the "Algorithm:" or "Parameters:" lines above, then that would be a good point to catch the difference and pass off to the appropriate method. We'll need to add this to the Project Priority List... 
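A minimal sketch of the dispatch idea described above, written in Python rather than Perl purely for illustration. It assumes the new format can be recognised by header lines that start with the "Algorithm:", "Parameters:" or "Scan time:" tags quoted earlier. The function names (detect_format, parse_tagged, parse_classic, parse_report) are hypothetical and are not part of BioPerl or of the FASTA package.

```python
import re

# The three tags come from the proposed header lines quoted above; all
# function names and return structures below are hypothetical illustrations.
TAGGED_LINE = re.compile(r"^\s*(Algorithm|Parameters|Scan time):\s*(.*)$")


def detect_format(lines):
    """Return 'tagged' if any line starts with a known tag, else 'classic'."""
    for line in lines:
        if TAGGED_LINE.match(line):
            return "tagged"
    return "classic"


def parse_tagged(lines):
    """Collect tag/value pairs from the tagged header lines."""
    header = {}
    for line in lines:
        match = TAGGED_LINE.match(line)
        if match:
            header[match.group(1)] = match.group(2).strip()
    return {"format": "tagged", "header": header}


def parse_classic(lines):
    """Stand-in for an existing parser: keep the raw header lines as they are."""
    return {"format": "classic", "header": [line.rstrip("\n") for line in lines]}


# One parser per format, selected by the result of detect_format().
PARSERS = {"tagged": parse_tagged, "classic": parse_classic}


def parse_report(lines):
    """Hand the report to whichever parser matches the detected format."""
    lines = list(lines)
    return PARSERS[detect_format(lines)](lines)
```

Calling parse_report() on lines that carry the tags would route them to parse_tagged(), while anything without a tag falls through to parse_classic(), so an existing parser could stay untouched while the new one is developed alongside it.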
chris From hlapp at gmx.net Fri Mar 30 22:06:32 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 30 Mar 2007 22:06:32 -0400 Subject: [Bioperl-l] Packaging bioperl for Fedora In-Reply-To: <5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com> References: <1175258897.2668.21.camel@localhost.localdomain> <6d648ierkz.fsf@delpy.biol.berkeley.edu> <5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com> <1p8xdeb87r.fsf@delpy.biol.berkeley.edu> <5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com> Message-ID: <16153593-5B2A-43B4-9366-282C654E40E7@gmx.net> On Mar 30, 2007, at 8:30 PM, Allen Day wrote: > 2) academic packages, such as BLAT, NCBI Toolkit, CLUSTAL, genscan, > etc. [...] > > 3) [...] Bioperl-live is not really that useful without type > (2) packages. ?? Why's that? -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From allenday at gmail.com Sat Mar 31 00:02:03 2007 From: allenday at gmail.com (Allen Day) Date: Fri, 30 Mar 2007 21:02:03 -0700 Subject: [Bioperl-l] Packaging bioperl for Fedora In-Reply-To: <16153593-5B2A-43B4-9366-282C654E40E7@gmx.net> References: <1175258897.2668.21.camel@localhost.localdomain> <6d648ierkz.fsf@delpy.biol.berkeley.edu> <5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com> <1p8xdeb87r.fsf@delpy.biol.berkeley.edu> <5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com> <16153593-5B2A-43B4-9366-282C654E40E7@gmx.net> Message-ID: <5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com> The majority of the Bioperl classes are file parsers, or manipulate data that comes from the file parsers. Yes, there are exceptions like the Eutils and Ensembl-interfacing classes, but they are the minority. 
The types of files that are worked with are generally either A) primary data sets such as genome data, or B) derivative data, such as sequence alignments that are derived from primary data using an algorithm. If we're in agreement that the primary data sets and libraries/applications for producing derivative data should not be present in Fedora Extras, then it follows that the Bioperl classes for manipulating these primary and derivative data should also not be present in Fedora Extras as they are of little use without data to manipulate. -Allen On 3/30/07, Hilmar Lapp wrote: > > On Mar 30, 2007, at 8:30 PM, Allen Day wrote: > > > 2) academic packages, such as BLAT, NCBI Toolkit, CLUSTAL, genscan, > > etc. [...] > > > > 3) [...] Bioperl-live is not really that useful without type > > (2) packages. > > ?? Why's that? > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > From cjfields at uiuc.edu Sat Mar 31 00:39:15 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 30 Mar 2007 23:39:15 -0500 Subject: [Bioperl-l] Packaging bioperl for Fedora In-Reply-To: <5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com> References: <1175258897.2668.21.camel@localhost.localdomain> <6d648ierkz.fsf@delpy.biol.berkeley.edu> <5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com> <1p8xdeb87r.fsf@delpy.biol.berkeley.edu> <5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com> <16153593-5B2A-43B4-9366-282C654E40E7@gmx.net> <5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com> Message-ID: On Mar 30, 2007, at 11:02 PM, Allen Day wrote: > The majority of the Bioperl classes are file parsers, or manipulate > data that comes from the file parsers. Yes there are exceptions like > the Eutils and Ensembl-intefacing classes, but they are the minority. 
> The types of files that are worked with are generally either A) > primary data sets such as genome data, or B) derivative data, such as > sequence alignments that are derived from primary data using an > algorithm. > > If we're in agreement that the primary data sets and > libraries/applications for producing derivative data should not be > present in Fedora Extras, then it follows that the Bioperl classes for > manipulating these primary and derivative data should also not be > present in Fedora Extras as they are of little use without data to > manipulate. > > -Allen I respectfully disagree. BioPerl, to me, is a toolkit which helps accomplish certain tasks. As with any toolkit, not all parts are required to do what one needs. A good number of end-users use BioPerl for remote database queries (Bio::DB::GenBank/Taxonomy/etc), remote BLAST, seq analysis, alignment analysis, phylogenetic tree manipulation, etc, none of which require outside apps be installed. For many a remote db is their primary source of data; not everybody sets up BioPerl for accessing local db records, running programs, etc (just the smart ones!). As for outside apps, the docs are pretty explicit where certain outside resources (libxml2, expat, libgd) are needed for functionality. When we package up a new release we generally have ActiveState PPM archives available for Win32 users who want an easy way to install BioPerl. I wouldn't have a problem if ActiveState wanted to post these to their repository. Why would allowing someone to do the same for fedora extras be any different? chris > On 3/30/07, Hilmar Lapp wrote: >> >> On Mar 30, 2007, at 8:30 PM, Allen Day wrote: >> >>> 2) academic packages, such as BLAT, NCBI Toolkit, CLUSTAL, genscan, >>> etc. [...] >>> >>> 3) [...] Bioperl-live is not really that useful without type >>> (2) packages. >> >> ?? Why's that? 
>> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== From alexl at users.sourceforge.net Sat Mar 31 00:55:31 2007 From: alexl at users.sourceforge.net (Alex Lancaster) Date: Fri, 30 Mar 2007 21:55:31 -0700 Subject: [Bioperl-l] Packaging bioperl for Fedora In-Reply-To: <5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com> (Allen Day's message of "Fri\, 30 Mar 2007 21\:02\:03 -0700") References: <1175258897.2668.21.camel@localhost.localdomain> <6d648ierkz.fsf@delpy.biol.berkeley.edu> <5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com> <1p8xdeb87r.fsf@delpy.biol.berkeley.edu> <5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com> <16153593-5B2A-43B4-9366-282C654E40E7@gmx.net> <5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com> Message-ID: >>>>> "AD" == Allen Day writes: AD> The majority of the Bioperl classes are file parsers, or AD> manipulate data that comes from the file parsers. Yes there are AD> exceptions like the Eutils and Ensembl-intefacing classes, but AD> they are the minority. The types of files that are worked with AD> are generally either A) primary data sets such as genome data, or AD> B) derivative data, such as sequence alignments that are derived AD> from primary data using an algorithm. AD> If we're in agreement that the primary data sets and AD> libraries/applications for producing derivative data should not be AD> present in Fedora Extras, then it follows that the Bioperl classes AD> for manipulating these primary and derivative data should also not AD> be present in Fedora Extras as they are of little use without data AD> to manipulate. That's not entirely true: I'm using Bioperl (just the "live" package alone) to do some data analysis since I'm not using any of these genomes. 
Some data sets won't come from these large genome databases, but may be local population datasets produced in-house; for these kinds of analyses it's sufficient to just have bioperl-live. I think it's still workable if a subset of bioperl lives in Fedora space ("Extras" will be no more as of F7). As an example of a split that works, look at how Fedora/Livna handles the media player xine: xine-lib lives in Fedora but various non-free plugins live in Livna. I agree that this may not work in our case, but it's worth thinking about. My sense is that if the license of a package is OK with Fedora and it doesn't otherwise break up a group of packages too much (e.g. breaking up Bioperl may or may not be a good idea), then it could go into Fedora. (For example, I believe that the NCBI C++ toolkit's license should be just fine: being a work of the US government, it is in the public domain[1], so it could go in Fedora. The same goes for BLAST[2], which has an implementation within the toolkit.) Is anybody in Biopackages.net interested in packaging biopython there? Given that whether or not Bioperl lives (partially) in Fedora is still under discussion, it might make sense for me to suspend work on the Bioperl stuff (although not the CPAN Perl deps) and work on a biopython package for Fedora for the moment until we decide how to proceed with Bioperl. Alex [1] http://www.ncbi.nlm.nih.gov/IEB/ToolBox/C_DOC/lxr/source/corelib/README [2] http://www.ncbi.nlm.nih.gov/blast/developer.shtml -- Alex Lancaster, Ph.D. 
| Ecology & Evolutionary Biology, University of Arizona From cjfields at uiuc.edu Sat Mar 31 11:50:59 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 31 Mar 2007 10:50:59 -0500 Subject: [Bioperl-l] Packaging bioperl for Fedora In-Reply-To: References: <1175258897.2668.21.camel@localhost.localdomain> <6d648ierkz.fsf@delpy.biol.berkeley.edu> <5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com> <1p8xdeb87r.fsf@delpy.biol.berkeley.edu> <5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com> <16153593-5B2A-43B4-9366-282C654E40E7@gmx.net> <5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com> Message-ID: > ... > My sense is that if the license of a package is OK with Fedora and it > doesn't otherwise break up a group of packages too much (e.g. breaking > up Bioperl may or may not be a good idea), then it could go into > Fedora. (For example I believe that the NCBI C++ toolkit's license > should be just fine being a work of the US government is in the public > domain[1], so it could go in Fedora, same for BLAST[2], which is has > an implementation within the toolkit). BioPerl is issued under the Perl Artistic License so there shouldn't be any problems with making a distribution. A copy of the license is included in CVS and should be included in the distribution. http://www.bioperl.org/wiki/Licensing_BioPerl As I stated in another post, we already make bioperl core, bioperl-run, bioperl-network, and bioperl-db PPMs for ActivePerl, so I don't see a problem with having a Fedora RPM available for core, bioperl-run, bioperl-db, etc., though I would stick with releases and not CVS. chris > Is anybody in Biopackages.net interested in packaging biopython there? 
> Given that whether or not Bioperl lives (partially) in Fedora is still > under discussion, it might make sense for me to suspend work on the > Bioperl stuff (although not the CPAN Perl deps) and work on a > biopython package for Fedora for the moment until we decide how to > proceed with Bioperl. > > Alex > > [1] http://www.ncbi.nlm.nih.gov/IEB/ToolBox/C_DOC/lxr/source/ > corelib/README > [2] http://www.ncbi.nlm.nih.gov/blast/developer.shtml > -- > Alex Lancaster, Ph.D. | Ecology & Evolutionary Biology, University > of Arizona From bosborne11 at verizon.net Sat Mar 31 13:08:42 2007 From: bosborne11 at verizon.net (Brian Osborne) Date: Sat, 31 Mar 2007 13:08:42 -0400 Subject: [Bioperl-l] Packaging bioperl for Fedora In-Reply-To: <5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com> Message-ID: Allen et al., What happened to the "GMOD" package or packages? I've had some conversations in the past few months with you-all suggesting that a GMOD package, or packages, would be useful. Brian O. On 3/30/07 8:30 PM, "Allen Day" wrote: > Hi Alex, > > You've aptly noted that there are several classes of packages being > discussed here, and that they should not be treated equally. From my > point of view and of specific relevance to the Bioperl community we > have at least: > > 1) "regular" CPAN dependencies and their occassional C/C++/Fortran > dependencies. These should all be in Fedora Extras, as they are of > general utility. Biopackages.net currently hosts about 200 packages > (.spec files, specifically) that are like this. Maybe 80 of these are > needed for Bioperl. > > 2) academic packages, such as BLAT, NCBI Toolkit, CLUSTAL, genscan, > etc. From what I've seen, these typically have strange/custom > licenses that may not be valid for some users. BLAT has a dual > licensing scheme for academic and non-academic licensees, for > instance. These packages are not of general utility. 
From allenday at gmail.com Sat Mar 31 15:25:24 2007 From: allenday at gmail.com (Allen Day) Date: Sat, 31 Mar 2007 12:25:24 -0700 Subject: [Bioperl-l] Packaging bioperl for Fedora In-Reply-To: References: <5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com> Message-ID: <5c24dcc30703311225t6021e455yf83548ba3ef86101@mail.gmail.com> Most of the GMOD software is packaged and available from Biopackages.net . 
Gbrowse, Textpresso, AmiGO, Chado for various organisms, BLAST graphic, GMOD web, DAS2 server, and a few others are there. If there's something missing just let us know and we'll package it up. -Allen On 3/31/07, Brian Osborne wrote: > Allen et al., > > What happened to the "GMOD" package or packages? I've had some conversations > in the past few months with you-all suggesting that a GMOD package, or > packages, would be useful. > > Brian O. > > > > > On 3/30/07 8:30 PM, "Allen Day" wrote: > > > Hi Alex, > > > > You've aptly noted that there are several classes of packages being > > discussed here, and that they should not be treated equally. From my > > point of view and of specific relevance to the Bioperl community we > > have at least: > > > > 1) "regular" CPAN dependencies and their occassional C/C++/Fortran > > dependencies. These should all be in Fedora Extras, as they are of > > general utility. Biopackages.net currently hosts about 200 packages > > (.spec files, specifically) that are like this. Maybe 80 of these are > > needed for Bioperl. > > > > 2) academic packages, such as BLAT, NCBI Toolkit, CLUSTAL, genscan, > > etc. From what I've seen, these typically have strange/custom > > licenses that may not be valid for some users. BLAT has a dual > > licensing scheme for academic and non-academic licensees, for > > instance. These packages are not of general utility. For these two > > reasons, my stance is that they should not be included in Fedora > > Extras. > > > > 3) Bioperl packages. Several subsets here. The Bioperl-run libraries > > depend directly on type (2) packages, so aren't appropriate to include > > in Fedora Extras. Bioperl-live is not really that useful without type > > (2) packages. It is also sensible to all of the keep the Bioperl-* > > packages in the same repository. For these reasons, my stance is that > > they should not be included in Fedora Extras. > > > > 4) Bioinformatics / Comp. Bio. data sets. 
These don't have licensing > > problems, but they tend to be large. Usually in the 10E7 - 10E10 byte > > range. RPM can not even generate correct metadata for some of them > > correctly if the files are too large (overflow problems). Probably > > not appropriate to put in Fedora Extras because they are too large and > > not generally useful. > > > > 5) Bioinformatics-specific System databases / daemons. These > > high-level packages depend on types (2), (3), and (4), and so are not > > appropriate to put into Fedora Extras. An example is a BLAT daemon, > > which relies on the BLAT server, as well as NIB-formatted genome > > sequence files. > > > > That said, there are a lot of type (1) packages in the Biopackages.net > > repository. If you're interested in migrating the spec files from our > > repository to the Fedora project it would save us (the Biopackages.net > > maintainers) a ton of build and maintenance time, so please feel free > > to take them, just let us know. If we can reach some agreement on > > where the bioinformatics-specific packages should be maintained/built > > we may be able to work together on these as well. > > > > -Allen > > > > > > On 3/30/07, Alex Lancaster wrote: > >>>>>>> "AD" == Allen Day writes: > >> > >> AD> Hi Alex, The Biopackages.net project is still active, we are > >> AD> regularly adding packages to it, mostly R packages lately. Most > >> AD> of the systems we use are running CentOS at this point, which is > >> AD> why you have not seen support for FC6 yet. There is nothing > >> AD> preventing building FC6 packages aside from lack of time to set up > >> AD> the FC6 build farm nodes. > >> > >> Hi Allen and other, > >> > >> Great news to hear that Biopackages.net is still active! I would like > >> to help out if possible. I don't believe in "FUD" either... ;) > >> > >> AD> If you're interested in packaging BioPerl or other > >> AD> bioinformatics-related software, please join the Biopackages > >> AD> project on SourceForge. 
We object to the Fedora Extras FUD > >> AD> tactics used to discourage people from using 3rd party > >> AD> repositories, and suspect they may not want to host some of our > >> AD> data packages, such as the >2GB genome packages. Biopackages > >> AD> project is likely to partially merge with RPMForge. We are > >> AD> already discussing with them how best to do it. > >> > >> The packages that I created which are currently available in Fedora > >> Packages are Perl dependencies which, as I said are useful for > >> packages outside the bioinformatics purview. I do have a (base) > >> bioperl package in review, but it is not yet released. > >> > >> As for third-party repos, I don't object to them at all, and for some > >> kinds of projects they are indeed appropriate. (e.g. for non-free > >> stuff like Livna or Freshrpms). However I do have practical concerns > >> about repository mixing, but I think that it does need to be handled > >> carefully but that co-operation between Fedora and third-party repos > >> can make it work. > >> > >> For example, one practical concern is that as of the > >> soon-to-be-released Fedora 7, Core+Extras will be merged, so there > >> will be no distinction at the repository-level between formerly Extras > >> packages and formerly Core packages (as of now there are only "Fedora > >> Packages"), which means that it will not be possible for third-party > >> repos to limit their dependencies to just those in a former base set > >> (i.e. excluding Extras). > >> > >> I agree that a few years ago (circa 2003-2004) there was concern about > >> the way some third party repositories were treated somewhat badly by > >> the (then) Fedora Extras (with some people going so far as to say that > >> third-party repos were bad in principle and should always be ignored > >> which I disagree with too). 
But it seems to me that culture has > >> shifted since, with some notable packagers such as Matthias Saou (of > >> Freshrpms) and Axel Thimm (of Atrpms) now contributing packages to > >> Fedora itself. The process of contributing has also become much > >> simpler and reviews are conducted speedily and efficiently, I had > >> packages in the repository in a matter of a few days from initial > >> submission. Freshrpms itself now enables and depends on the (old) > >> Extras. > >> > >> The real question for me, then is what packages it makes sense to go > >> in Fedora, and what packages go in third party repositories. It seems > >> to me that in the case of Perl packages which could be dependencies > >> for other packages not specific to the third-party repo in question, > >> it makes sense for them to go into Fedora itself, so I think I will > >> continue to package them. This lessens the load on the third-party > >> repo, while making them available for all other third-party repos. > >> (This is approach that Freshrpms seems to be taking, Matthias has > >> contributed most packages back to Fedora now other than the non-free > >> ones). > >> > >> At the other end of the spectrum are packages like you mention, genome > >> packages, which may be of concern because of their size and/or highly > >> specialised nature, and, as you say, may make sense to go in a > >> third-party repo like Biopackages.net. Also packages which can't be > >> packaged by Fedora for legal reasons like Clustal could/should go in > >> Biopackages.net. > >> > >> In the middle are packages like bioperl itself which are potentially > >> useful to perhaps a wider group of people than the genome packages but > >> may not necessarily be dependencies for other packages. I lean > >> towards making them part of Fedora so that they will be available of > >> out the box on the planned "Everything" DVD ISO, but I welcome a > >> discussion on this. 
> >> > >> As I said, I'm glad to hear that Biopackages.net is alive and well and > >> I welcome a discussion on how upstream Fedora can usefully interact > >> with Biopackages.net (I guess perhaps on the Biopackages.net list). > >> > >> Regards, > >> Alex > >> > >> PS. As the upstream author If you could clarify the license on > >> perl-SVG-Graph, on CPAN (or on the mailing list) that would be great. > >> -- > >> Alex Lancaster, Ph.D. | Ecology & Evolutionary Biology, University of Arizona > >> > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From fahmiderbali at gmail.com Fri Mar 30 18:34:02 2007 From: fahmiderbali at gmail.com (fahmi derbali) Date: Sat, 31 Mar 2007 00:34:02 +0200 Subject: [Bioperl-l] installation bioperl Message-ID: please i can't install bioperl. I have not an internet connection, is it beacause this. please can you give me an easy m?thod to do it ? From granjeau at tagc.univ-mrs.fr Thu Mar 1 02:36:43 2007 From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137) Date: Thu, 01 Mar 2007 08:36:43 +0100 Subject: [Bioperl-l] retrieven ids In-Reply-To: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com> References: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com> Message-ID: <45E6828B.4080808@tagc.univ-mrs.fr> Hi, I am not sure it's the key answer but the FAQ may help you http://www.bioperl.org/wiki/FAQ#How_do_I_retrieve_a_nucleotide_coding_sequence_when_I_have_a_protein_gi_number.3F Cheers, --Samuel Luba Pardo wrote: > Hi everyone, > I wonder if someone could give an advice of the following: > I want to retrieve the DNA coding sequence of a RefSeq protein id. 
I do not > want to translate the protein back to DNA, but rather get the DNA coding > sequence ID and then retrieve the DNA sequence from Gen Bank. Is there any > module that allow to get all possible ids for a sequence given a gi protein > ? > > Thank you very much in advance, > L. Pardo > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From lubapardo at gmail.com Thu Mar 1 02:48:27 2007 From: lubapardo at gmail.com (Luba Pardo) Date: Thu, 1 Mar 2007 08:48:27 +0100 Subject: [Bioperl-l] retrieven ids In-Reply-To: <57419.10.0.7.57.1172691089.squirrel@gscmail.wustl.edu> References: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com> <57419.10.0.7.57.1172691089.squirrel@gscmail.wustl.edu> Message-ID: <58ff33550702282348w7263f9c1o8a1d4bd6270c4fd0@mail.gmail.com> Thank you very much. L. Pardo On 28/02/07, Dave Messina wrote: > > Whenever I'm unsure of how to do something, I first look to see if one of > the HOWTOs on bioperl.org covers it. In this case, the Features HOWTO has > example code which I think will do what you want. > > Genbank records typically have the coding sequence of a protein as a > feature, so I would do something like: > > - use the RefSeq protein IDs to query Entrez and get back the Genbank > records. > > - read the Features HOWTO to refresh my memory on the syntax for grabbing > features. > > That HOWTO is at: > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation > > - whip up a little script to loop through the Genbank records one at a > time with SeqIO and pull out the cDNA sequence features. 
> > > Dave > > > From granjeau at tagc.univ-mrs.fr Thu Mar 1 05:09:11 2007 From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137) Date: Thu, 01 Mar 2007 11:09:11 +0100 Subject: [Bioperl-l] _rearrange In-Reply-To: References: Message-ID: <45E6A647.4060605@tagc.univ-mrs.fr> Hi, May be you will find information in http://www.bioperl.org/wiki/Advanced_BioPerl#rearrange.28.29 http://www.bioperl.org/wiki/Bioperl_Best_Practices Cheers, --Samuel Caroline Johnston wrote: > hi, > > Is there a discussion of the rationale behind the _rearrange method > somewhere? I'm probably just being gormless, but I think I'm missing the > point a bit. > > Is it okay for a method just to expect named params like > ->foo(arg1=>'stuff', arg2=>'things'); ? > > Cxx > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From michael.watson at bbsrc.ac.uk Thu Mar 1 05:58:16 2007 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu, 1 Mar 2007 10:58:16 -0000 Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna In-Reply-To: <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com> References: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk> <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com> Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2EBE3@iahce2ksrv1.iah.bbsrc.ac.uk> In fact, those pad_left and pad_right arguments have no effect whatsoever (using bioperl 1.5.2_100) my $panel = Bio::Graphics::Panel->new(-key_style => between, -offset => $start, -length => $stop - $start + 1, -width => 800 -pad_left =>5000, -pad_right =>5000 ); Even if I set them to 5000, the image looks exactly as if I had not set them. The only way I can get around this is to edit Glyph/dna.pm lines 184 and 186 such that $x2+3 is changed to $x2-20 - then the axes are drawn on the image instead of outside of it. 
This is obviously a hack, which upsets my karma. Mick ________________________________ From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein Sent: 15 February 2007 18:53 To: michael watson (IAH-C) Cc: BioPerl-List Subject: Re: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna Hi Michael, When you set up the panel, do this: Bio::Graphics::Panel->new(-blah -blah, -pad_left => 20, -pad_right => 20); This will leave enough room on the left and right for you to see the Y axis. Otherwise it runs off the edge of the image (ok, this is a mis-design, but it was the only way to solve a chicken-and-egg problem about who gets to say how wide the panel is) Lincoln On 2/15/07, michael watson (IAH-C) wrote: Hi OK I have some great images out of this glyph, but I can't see the axis, and nor is it labelled (ie does it go from 0 - 100%?) so isn't great for publication. The docs say: "NOTE: -gc_window=>'auto' gives nice results and is recommended for drawing GC content. The GC content axes draw slightly outside the panel, so you may wish to add some extra padding on the right and left. " Any idea how to do this? Basically, I want a nice GC graph with the axis quite clearly labelled, and a nice "%GC" title next to it :) Thanks Mick The information contained in this message may be confidential or legally privileged and is intended solely for the addressee. If you have received this message in error please delete it & notify the originator immediately. Unauthorised use, disclosure, copying or alteration of this message is forbidden & may be unlawful. The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Institute. This email and associated attachments has been checked locally for viruses but we can accept no responsibility once it has left our systems. Communications on Institute computers are monitored to secure the effective operation of the systems and for other lawful purposes. 
_______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From michael.watson at bbsrc.ac.uk Thu Mar 1 06:01:39 2007 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu, 1 Mar 2007 11:01:39 -0000 Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9503E2EBE3@iahce2ksrv1.iah.bbsrc.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk> <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com> <8975119BCD0AC5419D61A9CF1A923E9503E2EBE3@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2EBE4@iahce2ksrv1.iah.bbsrc.ac.uk> On further inspection, the lack of a comma was causing my karma upset - apologies. Mick ________________________________ From: michael watson (IAH-C) Sent: 01 March 2007 10:58 To: 'lincoln.stein at gmail.com' Cc: BioPerl-List Subject: RE: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna In fact, those pad_left and pad_right arguments have no effect whatsoever (using bioperl 1.5.2_100) my $panel = Bio::Graphics::Panel->new(-key_style => between, -offset => $start, -length => $stop - $start + 1, -width => 800 -pad_left =>5000, -pad_right =>5000 ); Even if I set them to 5000, the image looks exactly as if I had not set them. The only way I can get around this is to edit Glyph/dna.pm lines 184 and 186 such that $x2+3 is changed to $x2-20 - then the axes are drawn on the image instead of outside of it. This is obviously a hack, which upsets my karma. 
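For anyone who hits the same symptom, here is a minimal plain-Perl sketch of why the missing comma makes both padding options disappear. It does not load Bio::Graphics; it only builds the same argument list and inspects it, and the names `broken_args` and `fixed_args` are made up for the illustration. Without the comma, Perl reads `800 -pad_left` as a subtraction (the bareword is quoted by `=>` and counts as 0 numerically), so the option name is lost and every following key/value pair shifts by one position.

```perl
use strict;
use warnings;

# The option list as written in the original post: the comma after 800 is
# missing. Plain Perl only; Bio::Graphics is not needed for this illustration.
sub broken_args {
    no warnings 'numeric';    # would otherwise warn: Argument "pad_left" isn't numeric
    return (-width => 800 -pad_left => 5000, -pad_right => 5000);
}

# The same list with the comma restored.
sub fixed_args {
    return (-width => 800, -pad_left => 5000, -pad_right => 5000);
}

my @broken = broken_args();
my %fixed  = fixed_args();

# The broken list has five elements and no '-pad_left' key at all.
print "broken list: @broken\n";
print "fixed keys:  ", join(' ', sort keys %fixed), "\n";
```

With an odd-length list like the broken one, a constructor that reads its arguments as key/value pairs never sees `-pad_left` or `-pad_right` as keys, which would be consistent with the padding appearing to have no effect.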
Mick ________________________________ From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein Sent: 15 February 2007 18:53 To: michael watson (IAH-C) Cc: BioPerl-List Subject: Re: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna Hi Michael, When you set up the panel, do this: Bio::Graphics::Panel->new(-blah -blah, -pad_left => 20, -pad_right => 20); This will leave enough room on the left and right for you to see the Y axis. Otherwise it runs off the edge of the image (ok, this is a mis-design, but it was the only way to solve a chicken-and-egg problem about who gets to say how wide the panel is) Lincoln On 2/15/07, michael watson (IAH-C) wrote: Hi OK I have some great images out of this glyph, but I can't see the axis, and nor is it labelled (ie does it go from 0 - 100%?) so isn't great for publication. The docs say: "NOTE: -gc_window=>'auto' gives nice results and is recommended for drawing GC content. The GC content axes draw slightly outside the panel, so you may wish to add some extra padding on the right and left. " Any idea how to do this? Basically, I want a nice GC graph with the axis quite clearly labelled, and a nice "%GC" title next to it :) Thanks Mick The information contained in this message may be confidential or legally privileged and is intended solely for the addressee. If you have received this message in error please delete it & notify the originator immediately. Unauthorised use, disclosure, copying or alteration of this message is forbidden & may be unlawful. The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Institute. This email and associated attachments has been checked locally for viruses but we can accept no responsibility once it has left our systems. Communications on Institute computers are monitored to secure the effective operation of the systems and for other lawful purposes. 
_______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From heikki at sanbi.ac.za Thu Mar 1 06:02:30 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Thu, 1 Mar 2007 13:02:30 +0200 Subject: [Bioperl-l] Bio::SeqIO::FTHelper In-Reply-To: References: Message-ID: <200703011302.30855.heikki@sanbi.ac.za> Chris, It was meant to collect code that was common to all three main databases using similar feature tables. Now might be the time to optimise the parsing speed by removing it. Do you have a plan how to do it? -Heikki On Tuesday 27 February 2007 22:57:40 Chris Fields wrote: > Could anyone tell me what FTHelper is used for? From what I gather > it rolls up seqfeature data into a lightweight object but then > creates a SeqFeature::Generic anyway (at least for GenBank/EMBL/ > Swiss), which seems to be a waste of memory and time. Is there > something I'm missing (besides my sanity of course)? 
> > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From lubapardo at gmail.com Thu Mar 1 09:47:23 2007 From: lubapardo at gmail.com (Luba Pardo) Date: Thu, 1 Mar 2007 15:47:23 +0100 Subject: [Bioperl-l] (no subject) Message-ID: <58ff33550703010647j1e7908f3sf3c01a74eeceaca4@mail.gmail.com> Dear all, Sorry if the questions is too basic but I am trying to learn BioPerl modules. So I am trying to get the CDS sequence from a gi identification protein using the "features" method. I started to run the example of the FAQ doc (How do I retrieve a nucleotide coding sequence when I have a protein gi number?) , but I can not get the script to run. 
The script is:

use Bio::Factory::FTLocationFactory;
use Bio::DB::GenPept;
use Bio::DB::GenBank;

my $gp = Bio::DB::GenPept->new;
my $gb = Bio::DB::GenBank->new;

# factory to turn strings into Bio::Location objects
my $loc_factory = Bio::Factory::FTLocationFactory->new;

my $protein_gi = '405830';
my $prot_obj = $gp->get_Seq_by_id($protein_gi);;
foreach my $feat ( $prot_obj->top_SeqFeatures ) {
  if ( $feat->primary_tag eq 'CDS' ) {
    # example: 'coded_by="U05729.1:1..122"'
    my @coded_by = $feat->each_tag_value('coded_by');
    my ($nuc_acc,$loc_str) = split /\:/, $coded_by[0];
    my $nuc_obj = $gb->get_Seq_by_acc($nuc_acc);
    # create Bio::Location object from a string
    my $loc_object = $loc_factory->from_string($loc_str);
    # create a Feature object by using a Location
    my $feat_obj = Bio::SeqFeature::Generic->new(-location =>$loc_object);
    # associate the Feature object with the nucleotide Seq object
    $nuc_obj->add_SeqFeature($feat_obj);
    my $cds_obj = $feat_obj->spliced_seq;
    print "CDS sequence is ",$cds_obj->seq,"\n";
  }
}

The error I got is:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Must specify a query or list of uids to fetch
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359
STACK: Bio::DB::NCBIHelper::get_request /usr/lib/perl5/site_perl/5.8.1/Bio/DB/NCBIHelper.pm:192
STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/lib/perl5/site_perl/5.8.1/Bio/DB/WebDBSeqI.pm:432
STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/lib/perl5/site_perl/5.8.1/Bio/DB/NCBIHelper.pm:361
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc /usr/lib/perl5/site_perl/5.8.1/Bio/DB/WebDBSeqI.pm:172
STACK: feature1.pl:16

But I cannot see which part of the script requires me to specify a list of GIs. That is very odd. Am I interpreting the script wrong?
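A `coded_by` value is not guaranteed to look like `ACC:start..end`. It can be wrapped in `complement(...)` or `join(...)`, and in that case splitting on the first colon hands `get_Seq_by_acc` something that is not an accession. Below is a minimal plain-Perl sketch of a more defensive parse. `parse_coded_by` is a hypothetical helper, not part of BioPerl, and the coordinates in the second example string are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): pull the accession(s) out of
# a GenPept coded_by string and return the location with the "ACC:" prefixes
# removed, so the two parts can be handled separately.
sub parse_coded_by {
    my ($coded_by) = @_;
    return unless defined $coded_by && length $coded_by;

    # Accessions are the tokens sitting directly in front of a colon,
    # e.g. "U05729.1:" or "AL593843.9:". Keep each one only once.
    my %seen;
    my @accs = grep { !$seen{$_}++ } $coded_by =~ /([A-Za-z][\w.]*):/g;
    return unless @accs;

    # Strip the "ACC:" prefixes, leaving a location relative to the record.
    (my $loc_str = $coded_by) =~ s/[A-Za-z][\w.]*://g;
    return (\@accs, $loc_str);
}

for my $value ('U05729.1:1..122',
               'complement(join(AL593843.9:100..200,AL593843.9:300..400))') {
    my ($accs, $loc_str) = parse_coded_by($value);
    print "accessions: @$accs  location: $loc_str\n";
}
```

The accession could then go to `get_Seq_by_acc` and the location string to `$loc_factory->from_string`, as in the FAQ script. If more than one distinct accession comes back, the CDS spans several records and would need separate handling; checking that the helper returned anything at all before fetching also avoids calling `get_Seq_by_acc` with an empty value.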
I also tried: get_Seq_by_acc

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: acc complement(join(AL593843.9 does not exist
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc /usr/lib/perl5/site_perl/5.8.1/Bio/DB/WebDBSeqI.pm:181
STACK: feature1.pl:16

Can anyone let me know what I am doing wrong?

Thank you very much in advance
L. Pardo

From jay at jays.net Thu Mar 1 10:51:38 2007
From: jay at jays.net (Jay Hannah)
Date: Thu, 1 Mar 2007 09:51:38 -0600 (CST)
Subject: [Bioperl-l] Enhance Bio::PrimarySeqI::trunc() for Bio::Location::Split ?
Message-ID:

In my GenBank files when I'm sitting on a CDS usually I can just call

   $feature->seq->seq;

and out pops the exact nucleotide sequence which codes my protein. Very
cool.

Unfortunately, I have a crazy GenBank file which contains a CDS with a
split range like this: CDS join(1959..2355,1..92)

When I try to use $feature->seq->seq I don't end up with just the properly
pieced together coding region, I end up with the *entire* nucleotide
sequence.

This seems to be happening because

   Bio::SeqFeature::Generic::seq
   506: my $seq = $self->{'_gsf_seq'}->trunc($self->start(), $self->end());
   (which is calling Bio::PrimarySeqI::trunc)

works fine when Bio::SeqFeature::Generic is using

   '_location' => Bio::Location::Simple=HASH(0x1804344)
      '_end' => 2842
      '_location_type' => 'EXACT'
      '_root_verbose' => 0
      '_seqid' => 'AE015930'
      '_start' => 1601
      '_strand' => 1

but when things get complicated and Bio::SeqFeature::Generic is using

   '_location' => Bio::Location::Split=HASH(0x1d1f130)
      '_seqid' => 'PNECG'
      '_splittype' => 'JOIN'
      '_sublocations' => ARRAY(0x1d1e654)
         0  Bio::Location::Simple=HASH(0x1d1f290)
            '_end' => 2355
            '_location_type' => 'EXACT'
            '_root_verbose' => 0
            '_seqid' => 'PNECG'
            '_start' => 1959
            '_strand' => 1
         1  Bio::Location::Simple=HASH(0x1d1f338)
            '_end' => 92
            '_location_type' => 'EXACT'
            '_root_verbose' => 0
            '_seqid' => 'PNECG'
            '_start' => 1
            '_strand' => 1

Simply passing $self->start and $self->end into trunc() will not pull
off the appropriate magic.

Question 1: Perhaps my data was bad and I should refuse to process
join(1959..2355,1..92)? My accession is M12730, and if I download that
from NCBI now it looks like they've changed it so my problem no longer
exists in that sequence anyway. There are already 71 examples of CDS join
in various files in t/data, and *none* of those examples jump backwards.
Should I write this off as bad data or try to enhance BioPerl? I'm happy
to throw my painful M12730 on the end of t/data/test.genbank and write
tests for it if anyone thinks it is important.

Question 2: Even if we can just ignore my M12730, though, I think there's
still a problem afoot. Below I demo L26462 (already sitting in
t/data/test.genbank) which has a

   CDS join(866..957,1088..1310,2161..2289)

In this case (as my tests below demonstrate), $feature->seq->seq is
pulling the right range of nucleotide, but it's also pulling the gaps
(introns). Isn't that wrong? Shouldn't it skip the introns?

So...
is the appropriate approach to try to enhance
Bio::PrimarySeqI::trunc() for Bio::Location::Split? Or should trunc() be
left alone and Bio::SeqFeature::Generic::seq() needs to get smarter?

Or...?

Thanks, oh mighty BioWizards! :)

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah

-----------------
Tack this on the end of t/genbank.t and the length test at the end fails:
-----------------
# Enhance Bio::PrimarySeqI::trunc() for Bio::Location::Split ?
my $stream = Bio::SeqIO->new(-file => Bio::Root::IO->catfile
                                        ("t","data","test.genbank"),
                             -verbose => $verbose,
                             -format => 'genbank');
my $seq = $stream->next_seq;
while ($seq->accession ne "M37762") {
   $seq = $stream->next_seq;
}
# M37762 has a CDS 76..819, which should work fine.
ok(my @features = $seq->get_SeqFeatures(), "get_SeqFeatures()");
my $feat;
foreach my $feat2 ( @features ) {
   next unless ($feat2->primary_tag eq "CDS");
   my @db_xrefs = $feat2->annotation->get_Annotations("db_xref");
   if (grep { $_ eq "GI:179403" } @db_xrefs) {
      $feat = $feat2;
      last;
   }
}
my ($protein_seq) = $feat->annotation->get_Annotations("translation");
ok($protein_seq =~ /^MTILFLTMVISYFGCMKA.*GWRFIRIDTSCVCTLTIKRGR$/, "protein sequence");
my ($nucleotide_seq) = $feat->seq->seq;
ok($nucleotide_seq =~ /^ATGACCATCCTTTTCCTT.*ACCATTAAAAGGGGAAGATAG$/, "nucleotide sequence");
is(length($nucleotide_seq), 744, "nucleotide length");

# Jump down to L26462 which has a CDS join(866..957,1088..1310,2161..2289), which is broken?
while ($seq->accession ne "L26462") {
   $seq = $stream->next_seq;
}
ok(my @features = $seq->get_SeqFeatures(), "get_SeqFeatures()");
my $feat;
foreach my $feat2 ( @features ) {
   next unless ($feat2->primary_tag eq "CDS");
   my @db_xrefs = $feat2->annotation->get_Annotations("db_xref");
   if (grep { $_ eq "GI:532506" } @db_xrefs) {
      $feat = $feat2;
      last;
   }
}
my ($protein_seq) = $feat->annotation->get_Annotations("translation");
ok($protein_seq =~ /^MVHLTPEEKSAVTALWGK.*VQAAYQKVVAGVANALAHKYH$/, "protein sequence");
my ($nucleotide_seq) = $feat->seq->seq;
ok($nucleotide_seq =~ /^ATGGTGCATCTGACTCCT.*CTGGCCCACAAGTATCACTAA$/, "nucleotide sequence - correct CDS range");
#print "[$nucleotide_seq]\n";
ok($nucleotide_seq !~ /^ACCTCCTATTTGACACCA.*TGCTAGTCTCCCGGAACTATC$/, "nucleotide sequence - full nucleotide should not match");
is(length($nucleotide_seq), 444, "nucleotide length");

# I have an old(?) version of M12730 which lists
#    CDS join(1959..2355,1..92)
#    /db_xref="GI:150830"
# Crazy ranges like that don't work at all, you end up with the full nucleotide sequence...
# But NCBI doesn't list M12730 that way any more, so now I would be OK?

# ------------------

From cjfields at uiuc.edu Thu Mar 1 10:24:03 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Mar 2007 09:24:03 -0600
Subject: [Bioperl-l] Bio::SeqIO::FTHelper
In-Reply-To: <200703011302.30855.heikki@sanbi.ac.za>
References: <200703011302.30855.heikki@sanbi.ac.za>
Message-ID: <8AA2B586-4C2B-4379-BFF0-0CEB15FAE68E@uiuc.edu>

I do have a rough outline of what I think could be done:

http://www.bioperl.org/wiki/Handler-based_SeqIO_parsers

where you could switch out handlers to deal with incoming data chunks.
Any suggestions there are welcome.

I'll probably commit examples of the above in the next week or two
(GenBank, EMBL, Swiss parsers using the same handlers) which don't use
FTHelper. So far I have all three passing tests based on
genbank/embl/swiss.t but they need a few more tweaks before I commit.
chris On Mar 1, 2007, at 5:02 AM, Heikki Lehvaslaiho wrote: > Chris, > > It was meant to collect code that was common to all three main > databases using > similar feature tables. > > Now might be the time to optimise the parsing speed by removing it. > Do you > have a plan how to do it? > > -Heikki > > On Tuesday 27 February 2007 22:57:40 Chris Fields wrote: >> Could anyone tell me what FTHelper is used for? From what I gather >> it rolls up seqfeature data into a lightweight object but then >> creates a SeqFeature::Generic anyway (at least for GenBank/EMBL/ >> Swiss), which seems to be a waste of memory and time. Is there >> something I'm missing (besides my sanity of course)? >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Thu Mar 1 10:57:02 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 1 Mar 2007 09:57:02 -0600 Subject: [Bioperl-l] Enhance Bio::PrimarySeqI::trunc() for Bio::Location::Split ? In-Reply-To: References: Message-ID: Jay, Have you tried using $feature->spliced_seq() instead of seq()? 
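The difference between the two calls can be sketched in plain Perl, without BioPerl, on a toy sequence. One function takes everything from the lowest start to the highest end of a split location, which is the behaviour Jay describes for seq(); the other concatenates only the sub-ranges in the order listed, which is the idea behind spliced_seq(). The names `span_of` and `splice_of` are hypothetical, the sequence is made up, and this is an illustration of the arithmetic rather than BioPerl's actual implementation.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Everything between the smallest start and the largest end
# (coordinates are 1-based and inclusive, as in a feature table).
sub span_of {
    my ($seq, @ranges) = @_;
    my $start = min(map { $_->[0] } @ranges);
    my $end   = max(map { $_->[1] } @ranges);
    return substr($seq, $start - 1, $end - $start + 1);
}

# Only the listed sub-ranges, concatenated in the order given.
sub splice_of {
    my ($seq, @ranges) = @_;
    return join '', map { substr($seq, $_->[0] - 1, $_->[1] - $_->[0] + 1) } @ranges;
}

my $seq = 'AAAACCCCGGGGTTTT';    # toy sequence, 16 bases

# Shaped like join(1..4,9..12): two exons with an intron in between.
my @forward = ([1, 4], [9, 12]);
print span_of($seq, @forward), "\n";      # AAAACCCCGGGG (intron included)
print splice_of($seq, @forward), "\n";    # AAAAGGGG

# Shaped like join(13..16,1..4): the second piece jumps back to the start.
my @wrapped = ([13, 16], [1, 4]);
print length(span_of($seq, @wrapped)), "\n";    # 16, the whole sequence
print splice_of($seq, @wrapped), "\n";          # TTTTAAAA
```

Applied to join(866..957,1088..1310,2161..2289), the same arithmetic gives 92 + 223 + 129 = 444 bases when the pieces are spliced, against 2289 - 866 + 1 = 1424 bases for the overall span; 444 is the length the test expects. For join(1959..2355,1..92) the span runs from 1 to 2355, which would explain getting the entire sequence back.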
Using seq() retrieves the full sequence for the split location (from start of first sublocation to end of last), while spliced_seq() splices the sublocation sequences together, which is what I think you want. chris On Mar 1, 2007, at 9:51 AM, Jay Hannah wrote: > In my GenBank files when I'm sitting on a CDS usually I can just call > > $feature->seq->seq; > > and out pops the exact nucleotide sequence which codes my protein. > Very > cool. > > Unfortunately, I have a crazy GenBank file which contains a CDS with a > split range like this: CDS join(1959..2355,1..92) > > When I try to use $feature->seq->seq I don't end up with just the > properly > pieced together coding region, I end up with the *entire* nucleotide > sequence. > > This seems to be happening because > > Bio::SeqFeature::Generic::seq > 506: my $seq = $self->{'_gsf_seq'}->trunc($self->start(), $self- > >end()); > (which is calling Bio::PrimarySeqI::trunc) > > works fine when Bio::SeqFeature::Generic is using > > '_location' => Bio::Location::Simple=HASH(0x1804344) > '_end' => 2842 > '_location_type' => 'EXACT' > '_root_verbose' => 0 > '_seqid' => 'AE015930' > '_start' => 1601 > '_strand' => 1 > > but when things get complicated and Bio::SeqFeature::Generic is using > > '_location' => Bio::Location::Split=HASH(0x1d1f130) > '_seqid' => 'PNECG' > '_splittype' => 'JOIN' > '_sublocations' => ARRAY(0x1d1e654) > 0 Bio::Location::Simple=HASH(0x1d1f290) > '_end' => 2355 > '_location_type' => 'EXACT' > '_root_verbose' => 0 > '_seqid' => 'PNECG' > '_start' => 1959 > '_strand' => 1 > 1 Bio::Location::Simple=HASH(0x1d1f338) > '_end' => 92 > '_location_type' => 'EXACT' > '_root_verbose' => 0 > '_seqid' => 'PNECG' > '_start' => 1 > '_strand' => 1 > > Simply passing $self->start and $self->end into trunc() will not pull > off the appropriate magic. > > Question 1: Perhaps my data was bad and I should refuse to process > join(1959..2355,1..92)? 
My accession is M12730, and if I download that > from NCBI now it looks like they've changed it so my problem no longer > exists in that sequence anyway. There are already 71 examples of > CDS join > in various files in t/data, and *none* of those examples jump > backwards. > Should I write this off as bad data or try to enhance BioPerl? I'm > happy > to throw my painful M12730 on the end of t/data/test.genbank and write > tests for it if anyone thinks it is important. > > Question 2: Even if we can just ignore my M12730, though, I think > there's > still a problem afoot. Below I demo L26462 (already siting in > t/data/test.genbank) which has a > > CDS join(866..957,1088..1310,2161..2289) > > In this case (as my tests below demonstrate), $feature->seq->seq is > pulling the right range of nucleotide, but it's also pulling the gaps > (introns). Isn't that wrong? Shouldn't it skip the introns? > > So... is the appropriate approach to try to enhance > Bio::PrimarySeqI::trunc() for Bio::Location::Split? Or should trunc > () be > left alone and Bio::SeqFeature::Generic::seq() needs to get smarter? > > Or...? > > Thanks, oh mighty BioWizards! :) > > j > seqlab.net > http://www.bioperl.org/wiki/User:Jhannah > > > > ----------------- > Tack this on the end of t/genbank.t and the length test at the end > fails: > ----------------- > # Enhance Bio::PrimarySeqI::trunc() for Bio::Location::Split ? > my $stream = Bio::SeqIO->new(-file => Bio::Root::IO->catfile > ("t","data","test.genbank"), > -verbose => $verbose, > -format => 'genbank'); > my $seq = $stream->next_seq; > while ($seq->accession ne "M37762") { > $seq = $stream->next_seq; > } > # M37762 has a CDS 76..819, which should work fine. 
> ok(my @features = $seq->get_SeqFeatures(), "get_SeqFeatures()"); > my $feat; > foreach my $feat2 ( @features ) { > next unless ($feat2->primary_tag eq "CDS"); > my @db_xrefs = $feat2->annotation->get_Annotations("db_xref"); > if (grep { $_ eq "GI:179403" } @db_xrefs) { > $feat = $feat2; > last; > } > } > my ($protein_seq) = $feat->annotation->get_Annotations("translation"); > ok($protein_seq =~ /^MTILFLTMVISYFGCMKA.*GWRFIRIDTSCVCTLTIKRGR > $/, "protein sequence"); > my ($nucleotide_seq) = $feat->seq->seq; > ok($nucleotide_seq =~ /^ATGACCATCCTTTTCCTT.*ACCATTAAAAGGGGAAGATAG > $/, "nucleotide sequence"); > is(length($nucleotide_seq), > 744, "nucleotide length"); > > # Jump down to L26462 which has a CDS join > (866..957,1088..1310,2161..2289), which is broken? > while ($seq->accession ne "L26462") { > $seq = $stream->next_seq; > } > ok(my @features = $seq->get_SeqFeatures(), "get_SeqFeatures()"); > my $feat; > foreach my $feat2 ( @features ) { > next unless ($feat2->primary_tag eq "CDS"); > my @db_xrefs = $feat2->annotation->get_Annotations("db_xref"); > if (grep { $_ eq "GI:532506" } @db_xrefs) { > $feat = $feat2; > last; > } > } > my ($protein_seq) = $feat->annotation->get_Annotations("translation"); > ok($protein_seq =~ /^MVHLTPEEKSAVTALWGK.*VQAAYQKVVAGVANALAHKYH > $/, "protein sequence"); > my ($nucleotide_seq) = $feat->seq->seq; > ok($nucleotide_seq =~ /^ATGGTGCATCTGACTCCT.*CTGGCCCACAAGTATCACTAA > $/, "nucleotide sequence - correct CDS range"); > #print "[$nucleotide_seq]\n"; > ok($nucleotide_seq !~ /^ACCTCCTATTTGACACCA.*TGCTAGTCTCCCGGAACTATC > $/, "nucleotide sequence - full nucleotide should not match"); > is(length($nucleotide_seq), > 444, "nucleotide length"); > > # I have an old(?) version of M12730 which lists > # CDS join(1959..2355,1..92) > # /db_xref="GI:150830" > # Crazy ranges like that don't work at all, you end up with the > full nucleotide sequence... > # But NCBI doesn't list M12730 that way any more, so now I would be > OK? 
>
> # ------------------

From sac at bioperl.org Thu Mar 1 11:30:59 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Thu, 1 Mar 2007 09:30:59 -0700
Subject: [Bioperl-l] [Bioperl-announce-l] BioPerl leadership additions
In-Reply-To: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
References: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
Message-ID: <000101c75c1e$fecb7770$6400a8c0@CodonSolutions.local>

Welcome to the club, Chris & Sendu. Always good to have an infusion of
new blood and capable, motivated hands.

Steve

On 2/26/07, Jason Stajich wrote:
>
> Dear BioPerl Users and Developers,
>
> I want to announce an addition to the leadership of BioPerl.
> Christopher Fields and Sendu Bala are now members of the BioPerl
> Core developer group to recognize their ongoing leadership in the
> project. Chris and Sendu were instrumental in the 1.5.2 Developer
> release and have made a significant commitment and contribution to
> the quality of the code and the documentation of the project. We
> have invited them to be part of the core to recognize their work and
> so that we feel comfortable asking them to do more. ;-)
>
> The Core group was established to ensure that someone was responsible
> for making code releases, vetting new developers for CVS write
> accounts, and generally dealing with things that might otherwise slip
> through the cracks. We are very excited to have more people
> contributing to and maintaining the toolkit. We look forward to
> their help along with all the other developers, as we work towards a
> 1.6 release this year.
>
> As always, while there is a need for some individuals to lead the
> project, we encourage contributions from all levels of expertise to
> improve the code, documentation, and tutorials of the project.
>
> We plan to discuss the progress of the toolkit at this year's
> Bioinformatics Open Source Conference held in Vienna, Austria in
> conjunction with the SIG meetings at ISMB. We are trying to use
> BOSC 2007 as a chance for the developers of Open Bioinformatics
> Foundation sponsored and related projects to coordinate future
> development and release cycles.
>
> Jason Stajich on behalf of the Core developers
>
> _______________________________________________
> Bioperl-announce-l mailing list
> Bioperl-announce-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l

From arareko at campus.iztacala.unam.mx Thu Mar 1 11:30:59 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 1 Mar 2007 09:30:59 -0700
Subject: [Bioperl-l] [Bioperl-announce-l] BioPerl leadership additions
In-Reply-To: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
References: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
Message-ID: <000001c75c1e$fec90670$6400a8c0@CodonSolutions.local>

Congrats Chris & Sendu! Very well-deserved. Keep up the great work.

Cheers!
Mauricio.

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From johnsonm at gmail.com Thu Mar 1 11:49:20 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Thu, 1 Mar 2007 10:49:20 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
In-Reply-To:
References: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>