From granjeau at tagc.univ-mrs.fr Thu Mar 1 02:36:43 2007 From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137) Date: Thu, 01 Mar 2007 08:36:43 +0100 Subject: [Bioperl-l] retrieven ids In-Reply-To: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com> References: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com> Message-ID: <45E6828B.4080808@tagc.univ-mrs.fr> Hi, I am not sure it's the key answer but the FAQ may help you http://www.bioperl.org/wiki/FAQ#How_do_I_retrieve_a_nucleotide_coding_sequence_when_I_have_a_protein_gi_number.3F Cheers, --Samuel Luba Pardo wrote: > Hi everyone, > I wonder if someone could give an advice of the following: > I want to retrieve the DNA coding sequence of a RefSeq protein id. I do not > want to translate the protein back to DNA, but rather get the DNA coding > sequence ID and then retrieve the DNA sequence from Gen Bank. Is there any > module that allow to get all possible ids for a sequence given a gi protein > ? > > Thank you very much in advance, > L. Pardo > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From lubapardo at gmail.com Thu Mar 1 02:48:27 2007 From: lubapardo at gmail.com (Luba Pardo) Date: Thu, 1 Mar 2007 08:48:27 +0100 Subject: [Bioperl-l] retrieven ids In-Reply-To: <57419.10.0.7.57.1172691089.squirrel@gscmail.wustl.edu> References: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com> <57419.10.0.7.57.1172691089.squirrel@gscmail.wustl.edu> Message-ID: <58ff33550702282348w7263f9c1o8a1d4bd6270c4fd0@mail.gmail.com> Thank you very much. L. Pardo On 28/02/07, Dave Messina wrote: > > Whenever I'm unsure of how to do something, I first look to see if one of > the HOWTOs on bioperl.org covers it. In this case, the Features HOWTO has > example code which I think will do what you want. 
> > Genbank records typically have the coding sequence of a protein as a > feature, so I would do something like: > > - use the RefSeq protein IDs to query Entrez and get back the Genbank > records. > > - read the Features HOWTO to refresh my memory on the syntax for grabbing > features. > > That HOWTO is at: > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation > > - whip up a little script to loop through the Genbank records one at a > time with SeqIO and pull out the cDNA sequence features. > > > Dave > > > From granjeau at tagc.univ-mrs.fr Thu Mar 1 05:09:11 2007 From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137) Date: Thu, 01 Mar 2007 11:09:11 +0100 Subject: [Bioperl-l] _rearrange In-Reply-To: References: Message-ID: <45E6A647.4060605@tagc.univ-mrs.fr> Hi, May be you will find information in http://www.bioperl.org/wiki/Advanced_BioPerl#rearrange.28.29 http://www.bioperl.org/wiki/Bioperl_Best_Practices Cheers, --Samuel Caroline Johnston wrote: > hi, > > Is there a discussion of the rationale behind the _rearrange method > somewhere? I'm probably just being gormless, but I think I'm missing the > point a bit. > > Is it okay for a method just to expect named params like > ->foo(arg1=>'stuff', arg2=>'things'); ? 
> > Cxx > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From michael.watson at bbsrc.ac.uk Thu Mar 1 05:58:16 2007 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu, 1 Mar 2007 10:58:16 -0000 Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna In-Reply-To: <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com> References: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk> <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com> Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2EBE3@iahce2ksrv1.iah.bbsrc.ac.uk> In fact, those pad_left and pad_right arguments have no effect whatsoever (using bioperl 1.5.2_100) my $panel = Bio::Graphics::Panel->new(-key_style => between, -offset => $start, -length => $stop - $start + 1, -width => 800 -pad_left =>5000, -pad_right =>5000 ); Even if I set them to 5000, the image looks exactly as if I had not set them. The only way I can get around this is to edit Glyph/dna.pm lines 184 and 186 such that $x2+3 is changed to $x2-20 - then the axes are drawn on the image instead of outside of it. This is obviously a hack, which upsets my karma. Mick ________________________________ From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein Sent: 15 February 2007 18:53 To: michael watson (IAH-C) Cc: BioPerl-List Subject: Re: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna Hi Michael, When you set up the panel, do this: Bio::Graphics::Panel->new(-blah -blah, -pad_left => 20, -pad_right => 20); This will leave enough room on the left and right for you to see the Y axis. 
Otherwise it runs off the edge of the image (ok, this is a mis-design, but it was the only way to solve a chicken-and-egg problem about who gets to say how wide the panel is) Lincoln On 2/15/07, michael watson (IAH-C) wrote: Hi OK I have some great images out of this glyph, but I can't see the axis, and nor is it labelled (ie does it go from 0 - 100%?) so isn't great for publication. The docs say: "NOTE: -gc_window=>'auto' gives nice results and is recommended for drawing GC content. The GC content axes draw slightly outside the panel, so you may wish to add some extra padding on the right and left. " Any idea how to do this? Basically, I want a nice GC graph with the axis quite clearly labelled, and a nice "%GC" title next to it :) Thanks Mick The information contained in this message may be confidential or legally privileged and is intended solely for the addressee. If you have received this message in error please delete it & notify the originator immediately. Unauthorised use, disclosure, copying or alteration of this message is forbidden & may be unlawful. The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Institute. This email and associated attachments has been checked locally for viruses but we can accept no responsibility once it has left our systems. Communications on Institute computers are monitored to secure the effective operation of the systems and for other lawful purposes. _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From michael.watson at bbsrc.ac.uk Thu Mar 1 06:01:39 2007 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu, 1 Mar 2007 11:01:39 -0000 Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9503E2EBE3@iahce2ksrv1.iah.bbsrc.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk> <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com> <8975119BCD0AC5419D61A9CF1A923E9503E2EBE3@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2EBE4@iahce2ksrv1.iah.bbsrc.ac.uk> On further inspection, the lack of a comma was causing my karma upset - apologies. Mick ________________________________ From: michael watson (IAH-C) Sent: 01 March 2007 10:58 To: 'lincoln.stein at gmail.com' Cc: BioPerl-List Subject: RE: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna In fact, those pad_left and pad_right arguments have no effect whatsoever (using bioperl 1.5.2_100) my $panel = Bio::Graphics::Panel->new(-key_style => between, -offset => $start, -length => $stop - $start + 1, -width => 800 -pad_left =>5000, -pad_right =>5000 ); Even if I set them to 5000, the image looks exactly as if I had not set them. The only way I can get around this is to edit Glyph/dna.pm lines 184 and 186 such that $x2+3 is changed to $x2-20 - then the axes are drawn on the image instead of outside of it. This is obviously a hack, which upsets my karma. 
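The missing comma mentioned in the follow-up above can be seen without Bio::Graphics at all. The snippet below is only a plain-Perl sketch: it builds the tail of the argument list as posted and with the comma restored, and the variable names are illustrative. Without the comma, Perl reads `800 -pad_left` as the subtraction `800 - 'pad_left'`, so the `-pad_left` key disappears from the list.

```perl
use strict;
use warnings;

# Plain-Perl sketch; no Bio::Graphics needed. It only builds the argument
# lists, to show what a constructor would actually be handed.

# As posted: no comma between "-width => 800" and "-pad_left".
# Perl parses "800 -pad_left" as the subtraction 800 - 'pad_left'.
my @as_posted = do {
    no warnings 'numeric';    # silence the "isn't numeric" warning for this demo
    ( -width => 800 -pad_left => 5000, -pad_right => 5000 );
};

# With the comma restored, the list is three clean key/value pairs.
my @fixed = ( -width => 800, -pad_left => 5000, -pad_right => 5000 );

print "as posted: @as_posted\n";
print "fixed:     @fixed\n";
```

Assuming the constructor consumes its arguments as key/value pairs, the posted form leaves `5000` in key position, so neither padding option arrives under its own name. Running the original script with `use warnings` would have flagged the subtraction.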
Mick

________________________________
From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
Sent: 15 February 2007 18:53
To: michael watson (IAH-C)
Cc: BioPerl-List
Subject: Re: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna

Hi Michael,

When you set up the panel, do this:

Bio::Graphics::Panel->new(-blah -blah, -pad_left => 20, -pad_right => 20);

This will leave enough room on the left and right for you to see the Y axis. Otherwise it runs off the edge of the image (ok, this is a mis-design, but it was the only way to solve a chicken-and-egg problem about who gets to say how wide the panel is)

Lincoln

On 2/15/07, michael watson (IAH-C) wrote:

Hi

OK I have some great images out of this glyph, but I can't see the axis, and nor is it labelled (ie does it go from 0 - 100%?) so isn't great for publication.

The docs say:

"NOTE: -gc_window=>'auto' gives nice results and is recommended for drawing GC content. The GC content axes draw slightly outside the panel, so you may wish to add some extra padding on the right and left. "

Any idea how to do this? Basically, I want a nice GC graph with the axis quite clearly labelled, and a nice "%GC" title next to it :)

Thanks
Mick

_______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From heikki at sanbi.ac.za Thu Mar 1 06:02:30 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Thu, 1 Mar 2007 13:02:30 +0200 Subject: [Bioperl-l] Bio::SeqIO::FTHelper In-Reply-To: References: Message-ID: <200703011302.30855.heikki@sanbi.ac.za> Chris, It was meant to collect code that was common to all three main databases using similar feature tables. Now might be the time to optimise the parsing speed by removing it. Do you have a plan how to do it? -Heikki On Tuesday 27 February 2007 22:57:40 Chris Fields wrote: > Could anyone tell me what FTHelper is used for? From what I gather > it rolls up seqfeature data into a lightweight object but then > creates a SeqFeature::Generic anyway (at least for GenBank/EMBL/ > Swiss), which seems to be a waste of memory and time. Is there > something I'm missing (besides my sanity of course)? 
>
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
______ _/ _/_____________________________________________________
_/ _/
_/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za
_/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho
_/ _/ _/ SANBI, South African National Bioinformatics Institute
_/ _/ _/ University of Western Cape, South Africa
_/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From lubapardo at gmail.com Thu Mar 1 09:47:23 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Thu, 1 Mar 2007 15:47:23 +0100
Subject: [Bioperl-l] (no subject)
Message-ID: <58ff33550703010647j1e7908f3sf3c01a74eeceaca4@mail.gmail.com>

Dear all,

Sorry if the question is too basic, but I am trying to learn the BioPerl modules. I am trying to get the CDS sequence for a protein gi number using the "features" approach. I started with the example from the FAQ (How do I retrieve a nucleotide coding sequence when I have a protein gi number?), but I cannot get the script to run.
The script is:

use Bio::Factory::FTLocationFactory;
use Bio::DB::GenPept;
use Bio::DB::GenBank;

my $gp = Bio::DB::GenPept->new;
my $gb = Bio::DB::GenBank->new;

# factory to turn strings into Bio::Location objects
my $loc_factory = Bio::Factory::FTLocationFactory->new;

my $protein_gi = '405830';
my $prot_obj = $gp->get_Seq_by_id($protein_gi);

foreach my $feat ( $prot_obj->top_SeqFeatures ) {
    if ( $feat->primary_tag eq 'CDS' ) {
        # example: 'coded_by="U05729.1:1..122"'
        my @coded_by = $feat->each_tag_value('coded_by');
        my ($nuc_acc,$loc_str) = split /\:/, $coded_by[0];
        my $nuc_obj = $gb->get_Seq_by_acc($nuc_acc);
        # create Bio::Location object from a string
        my $loc_object = $loc_factory->from_string($loc_str);
        # create a Feature object by using a Location
        my $feat_obj = Bio::SeqFeature::Generic->new(-location =>$loc_object);
        # associate the Feature object with the nucleotide Seq object
        $nuc_obj->add_SeqFeature($feat_obj);
        my $cds_obj = $feat_obj->spliced_seq;
        print "CDS sequence is ",$cds_obj->seq,"\n";
    }
}

The error I got is

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Must specify a query or list of uids to fetch
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359
STACK: Bio::DB::NCBIHelper::get_request /usr/lib/perl5/site_perl/5.8.1/Bio/DB/NCBIHelper.pm:192
STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/lib/perl5/site_perl/5.8.1/Bio/DB/WebDBSeqI.pm:432
STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/lib/perl5/site_perl/5.8.1/Bio/DB/NCBIHelper.pm:361
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc /usr/lib/perl5/site_perl/5.8.1/Bio/DB/WebDBSeqI.pm:172
STACK: feature1.pl:16

But I cannot see which part of the script requires me to specify a list of gi numbers. That is very odd. Am I interpreting the script wrong?
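One fragile spot in the script is `split /\:/, $coded_by[0]`, which assumes the simple `ACCESSION:start..end` form given in its own comment. A coded_by value can also be wrapped in `complement(...)` or `join(...)`, and then the first field returned by split is no longer a bare accession. The following is a minimal plain-Perl sketch of a more tolerant parse; `parse_coded_by` is a hypothetical helper, not a BioPerl function, and the coordinates in the compound example are made up.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): pull the nucleotide accession
# and a bare location string out of a coded_by value.
sub parse_coded_by {
    my ($coded_by) = @_;
    return unless defined $coded_by && length $coded_by;

    # First thing that looks like "ACCESSION.version:" anywhere in the string.
    my ($acc) = $coded_by =~ /([A-Za-z]+_?\d+(?:\.\d+)?):/;
    return unless defined $acc;

    # Drop every "ACCESSION:" prefix so only coordinates and operators remain.
    ( my $loc = $coded_by ) =~ s/\Q$acc\E://g;
    return ( $acc, $loc );
}

# The simple form from the script's own comment.
my ( $acc1, $loc1 ) = parse_coded_by('U05729.1:1..122');

# A made-up compound value; the coordinates are illustrative only.
my $compound = 'complement(join(AL593843.9:100..200,AL593843.9:300..400))';
my ($naive) = split /:/, $compound;    # what the script's split would return
my ( $acc2, $loc2 ) = parse_coded_by($compound);

print "simple:   $acc1 / $loc1\n";
print "naive:    $naive\n";
print "compound: $acc2 / $loc2\n";
```

The accession could then go to `get_Seq_by_acc` and the cleaned location string to `from_string`, as in the script. The sketch assumes a single accession per coded_by value; a join that spans several records would need each piece fetched separately.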
I also tried: get_Seq_by_acc

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: acc complement(join(AL593843.9 does not exist
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc /usr/lib/perl5/site_perl/5.8.1/Bio/DB/WebDBSeqI.pm:181
STACK: feature1.pl:16

Can anyone let me know what I am doing wrong?

Thank you very much in advance
L. Pardo

From jay at jays.net Thu Mar 1 10:51:38 2007
From: jay at jays.net (Jay Hannah)
Date: Thu, 1 Mar 2007 09:51:38 -0600 (CST)
Subject: [Bioperl-l] Enhance Bio::PrimarySeqI::trunc() for Bio::Location::Split ?
Message-ID:

In my GenBank files when I'm sitting on a CDS usually I can just call

$feature->seq->seq;

and out pops the exact nucleotide sequence which codes my protein. Very cool.

Unfortunately, I have a crazy GenBank file which contains a CDS with a split range like this: CDS join(1959..2355,1..92)

When I try to use $feature->seq->seq I don't end up with just the properly pieced together coding region, I end up with the *entire* nucleotide sequence.
This seems to be happening because Bio::SeqFeature::Generic::seq 506: my $seq = $self->{'_gsf_seq'}->trunc($self->start(), $self->end()); (which is calling Bio::PrimarySeqI::trunc) works fine when Bio::SeqFeature::Generic is using '_location' => Bio::Location::Simple=HASH(0x1804344) '_end' => 2842 '_location_type' => 'EXACT' '_root_verbose' => 0 '_seqid' => 'AE015930' '_start' => 1601 '_strand' => 1 but when things get complicated and Bio::SeqFeature::Generic is using '_location' => Bio::Location::Split=HASH(0x1d1f130) '_seqid' => 'PNECG' '_splittype' => 'JOIN' '_sublocations' => ARRAY(0x1d1e654) 0 Bio::Location::Simple=HASH(0x1d1f290) '_end' => 2355 '_location_type' => 'EXACT' '_root_verbose' => 0 '_seqid' => 'PNECG' '_start' => 1959 '_strand' => 1 1 Bio::Location::Simple=HASH(0x1d1f338) '_end' => 92 '_location_type' => 'EXACT' '_root_verbose' => 0 '_seqid' => 'PNECG' '_start' => 1 '_strand' => 1 Simply passing $self->start and $self->end into trunc() will not pull off the appropriate magic. Question 1: Perhaps my data was bad and I should refuse to process join(1959..2355,1..92)? My accession is M12730, and if I download that from NCBI now it looks like they've changed it so my problem no longer exists in that sequence anyway. There are already 71 examples of CDS join in various files in t/data, and *none* of those examples jump backwards. Should I write this off as bad data or try to enhance BioPerl? I'm happy to throw my painful M12730 on the end of t/data/test.genbank and write tests for it if anyone thinks it is important. Question 2: Even if we can just ignore my M12730, though, I think there's still a problem afoot. Below I demo L26462 (already siting in t/data/test.genbank) which has a CDS join(866..957,1088..1310,2161..2289) In this case (as my tests below demonstrate), $feature->seq->seq is pulling the right range of nucleotide, but it's also pulling the gaps (introns). Isn't that wrong? Shouldn't it skip the introns? So... 
is the appropriate approach to try to enhance Bio::PrimarySeqI::trunc() for Bio::Location::Split? Or should trunc() be left alone and Bio::SeqFeature::Generic::seq() needs to get smarter?

Or...?

Thanks, oh mighty BioWizards! :)

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah

-----------------
Tack this on the end of t/genbank.t and the length test at the end fails:
-----------------
# Enhance Bio::PrimarySeqI::trunc() for Bio::Location::Split ?
my $stream = Bio::SeqIO->new(-file    => Bio::Root::IO->catfile("t","data","test.genbank"),
                             -verbose => $verbose,
                             -format  => 'genbank');
my $seq = $stream->next_seq;
while ($seq->accession ne "M37762") {
    $seq = $stream->next_seq;
}
# M37762 has a CDS 76..819, which should work fine.
ok(my @features = $seq->get_SeqFeatures(), "get_SeqFeatures()");
my $feat;
foreach my $feat2 ( @features ) {
    next unless ($feat2->primary_tag eq "CDS");
    my @db_xrefs = $feat2->annotation->get_Annotations("db_xref");
    if (grep { $_ eq "GI:179403" } @db_xrefs) {
        $feat = $feat2;
        last;
    }
}
my ($protein_seq) = $feat->annotation->get_Annotations("translation");
ok($protein_seq =~ /^MTILFLTMVISYFGCMKA.*GWRFIRIDTSCVCTLTIKRGR$/, "protein sequence");
my ($nucleotide_seq) = $feat->seq->seq;
ok($nucleotide_seq =~ /^ATGACCATCCTTTTCCTT.*ACCATTAAAAGGGGAAGATAG$/, "nucleotide sequence");
is(length($nucleotide_seq), 744, "nucleotide length");

# Jump down to L26462 which has a CDS join(866..957,1088..1310,2161..2289), which is broken?
while ($seq->accession ne "L26462") {
    $seq = $stream->next_seq;
}
ok(my @features = $seq->get_SeqFeatures(), "get_SeqFeatures()");
my $feat;
foreach my $feat2 ( @features ) {
    next unless ($feat2->primary_tag eq "CDS");
    my @db_xrefs = $feat2->annotation->get_Annotations("db_xref");
    if (grep { $_ eq "GI:532506" } @db_xrefs) {
        $feat = $feat2;
        last;
    }
}
my ($protein_seq) = $feat->annotation->get_Annotations("translation");
ok($protein_seq =~ /^MVHLTPEEKSAVTALWGK.*VQAAYQKVVAGVANALAHKYH$/, "protein sequence");
my ($nucleotide_seq) = $feat->seq->seq;
ok($nucleotide_seq =~ /^ATGGTGCATCTGACTCCT.*CTGGCCCACAAGTATCACTAA$/, "nucleotide sequence - correct CDS range");
#print "[$nucleotide_seq]\n";
ok($nucleotide_seq !~ /^ACCTCCTATTTGACACCA.*TGCTAGTCTCCCGGAACTATC$/, "nucleotide sequence - full nucleotide should not match");
is(length($nucleotide_seq), 444, "nucleotide length");

# I have an old(?) version of M12730 which lists
#    CDS join(1959..2355,1..92)
#    /db_xref="GI:150830"
# Crazy ranges like that don't work at all, you end up with the full nucleotide sequence...
# But NCBI doesn't list M12730 that way any more, so now I would be OK?

# ------------------

From cjfields at uiuc.edu Thu Mar 1 10:24:03 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Mar 2007 09:24:03 -0600
Subject: [Bioperl-l] Bio::SeqIO::FTHelper
In-Reply-To: <200703011302.30855.heikki@sanbi.ac.za>
References: <200703011302.30855.heikki@sanbi.ac.za>
Message-ID: <8AA2B586-4C2B-4379-BFF0-0CEB15FAE68E@uiuc.edu>

I do have a rough outline of what I think could be done:

http://www.bioperl.org/wiki/Handler-based_SeqIO_parsers

where you could switch out handlers to deal with incoming data chunks. Any suggestions there are welcome.

I'll probably commit examples of the above in the next week or two (GenBank, EMBL, Swiss parsers using the same handlers) which don't use FTHelper. So far I have all three passing tests based on genbank/embl/swiss.t but they need a few more tweaks before I commit.
chris On Mar 1, 2007, at 5:02 AM, Heikki Lehvaslaiho wrote: > Chris, > > It was meant to collect code that was common to all three main > databases using > similar feature tables. > > Now might be the time to optimise the parsing speed by removing it. > Do you > have a plan how to do it? > > -Heikki > > On Tuesday 27 February 2007 22:57:40 Chris Fields wrote: >> Could anyone tell me what FTHelper is used for? From what I gather >> it rolls up seqfeature data into a lightweight object but then >> creates a SeqFeature::Generic anyway (at least for GenBank/EMBL/ >> Swiss), which seems to be a waste of memory and time. Is there >> something I'm missing (besides my sanity of course)? >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Thu Mar 1 10:57:02 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 1 Mar 2007 09:57:02 -0600 Subject: [Bioperl-l] Enhance Bio::PrimarySeqI::trunc() for Bio::Location::Split ? In-Reply-To: References: Message-ID: Jay, Have you tried using $feature->spliced_seq() instead of seq()? 
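To make the seq() versus spliced_seq() distinction concrete, here is a plain-Perl sketch that mimics the two behaviours on the join(866..957,1088..1310,2161..2289) location from the original post. It is not BioPerl's implementation: the sequence is a stand-in string, and the sketch assumes forward-strand sublocations listed in ascending order.

```perl
use strict;
use warnings;

# Stand-in sequence, long enough to cover the coordinates below.
my $genomic = 'ACGT' x 600;    # 2400 bases

# Sublocations of join(866..957,1088..1310,2161..2289), 1-based inclusive.
my @exons = ( [ 866, 957 ], [ 1088, 1310 ], [ 2161, 2289 ] );

# seq()-like behaviour: one slice from the first start to the last end,
# so the bases between the sublocations come along too.
my ( $start, $end ) = ( $exons[0][0], $exons[-1][1] );
my $span = substr( $genomic, $start - 1, $end - $start + 1 );

# spliced_seq()-like behaviour: slice each sublocation and concatenate.
my $spliced = join '',
    map { substr( $genomic, $_->[0] - 1, $_->[1] - $_->[0] + 1 ) } @exons;

printf "span:    %d bases\n", length $span;       # 1424
printf "spliced: %d bases\n", length $spliced;    # 444
```

The spliced length of 444 is the value the length test in the original post expects, while the single-slice length of 1424 includes the two gaps. With a real feature object the equivalent call would be `$feature->spliced_seq->seq`.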
Using seq() retrieves the full sequence for the split location (from start of first sublocation to end of last), while spliced_seq() splices the sublocation sequences together, which is what I think you want. chris On Mar 1, 2007, at 9:51 AM, Jay Hannah wrote: > In my GenBank files when I'm sitting on a CDS usually I can just call > > $feature->seq->seq; > > and out pops the exact nucleotide sequence which codes my protein. > Very > cool. > > Unfortunately, I have a crazy GenBank file which contains a CDS with a > split range like this: CDS join(1959..2355,1..92) > > When I try to use $feature->seq->seq I don't end up with just the > properly > pieced together coding region, I end up with the *entire* nucleotide > sequence. > > This seems to be happening because > > Bio::SeqFeature::Generic::seq > 506: my $seq = $self->{'_gsf_seq'}->trunc($self->start(), $self- > >end()); > (which is calling Bio::PrimarySeqI::trunc) > > works fine when Bio::SeqFeature::Generic is using > > '_location' => Bio::Location::Simple=HASH(0x1804344) > '_end' => 2842 > '_location_type' => 'EXACT' > '_root_verbose' => 0 > '_seqid' => 'AE015930' > '_start' => 1601 > '_strand' => 1 > > but when things get complicated and Bio::SeqFeature::Generic is using > > '_location' => Bio::Location::Split=HASH(0x1d1f130) > '_seqid' => 'PNECG' > '_splittype' => 'JOIN' > '_sublocations' => ARRAY(0x1d1e654) > 0 Bio::Location::Simple=HASH(0x1d1f290) > '_end' => 2355 > '_location_type' => 'EXACT' > '_root_verbose' => 0 > '_seqid' => 'PNECG' > '_start' => 1959 > '_strand' => 1 > 1 Bio::Location::Simple=HASH(0x1d1f338) > '_end' => 92 > '_location_type' => 'EXACT' > '_root_verbose' => 0 > '_seqid' => 'PNECG' > '_start' => 1 > '_strand' => 1 > > Simply passing $self->start and $self->end into trunc() will not pull > off the appropriate magic. > > Question 1: Perhaps my data was bad and I should refuse to process > join(1959..2355,1..92)? 
My accession is M12730, and if I download that > from NCBI now it looks like they've changed it so my problem no longer > exists in that sequence anyway. There are already 71 examples of > CDS join > in various files in t/data, and *none* of those examples jump > backwards. > Should I write this off as bad data or try to enhance BioPerl? I'm > happy > to throw my painful M12730 on the end of t/data/test.genbank and write > tests for it if anyone thinks it is important. > > Question 2: Even if we can just ignore my M12730, though, I think > there's > still a problem afoot. Below I demo L26462 (already siting in > t/data/test.genbank) which has a > > CDS join(866..957,1088..1310,2161..2289) > > In this case (as my tests below demonstrate), $feature->seq->seq is > pulling the right range of nucleotide, but it's also pulling the gaps > (introns). Isn't that wrong? Shouldn't it skip the introns? > > So... is the appropriate approach to try to enhance > Bio::PrimarySeqI::trunc() for Bio::Location::Split? Or should trunc > () be > left alone and Bio::SeqFeature::Generic::seq() needs to get smarter? > > Or...? > > Thanks, oh mighty BioWizards! :) > > j > seqlab.net > http://www.bioperl.org/wiki/User:Jhannah > > > > ----------------- > Tack this on the end of t/genbank.t and the length test at the end > fails: > ----------------- > # Enhance Bio::PrimarySeqI::trunc() for Bio::Location::Split ? > my $stream = Bio::SeqIO->new(-file => Bio::Root::IO->catfile > ("t","data","test.genbank"), > -verbose => $verbose, > -format => 'genbank'); > my $seq = $stream->next_seq; > while ($seq->accession ne "M37762") { > $seq = $stream->next_seq; > } > # M37762 has a CDS 76..819, which should work fine. 
> ok(my @features = $seq->get_SeqFeatures(), "get_SeqFeatures()"); > my $feat; > foreach my $feat2 ( @features ) { > next unless ($feat2->primary_tag eq "CDS"); > my @db_xrefs = $feat2->annotation->get_Annotations("db_xref"); > if (grep { $_ eq "GI:179403" } @db_xrefs) { > $feat = $feat2; > last; > } > } > my ($protein_seq) = $feat->annotation->get_Annotations("translation"); > ok($protein_seq =~ /^MTILFLTMVISYFGCMKA.*GWRFIRIDTSCVCTLTIKRGR > $/, "protein sequence"); > my ($nucleotide_seq) = $feat->seq->seq; > ok($nucleotide_seq =~ /^ATGACCATCCTTTTCCTT.*ACCATTAAAAGGGGAAGATAG > $/, "nucleotide sequence"); > is(length($nucleotide_seq), > 744, "nucleotide length"); > > # Jump down to L26462 which has a CDS join > (866..957,1088..1310,2161..2289), which is broken? > while ($seq->accession ne "L26462") { > $seq = $stream->next_seq; > } > ok(my @features = $seq->get_SeqFeatures(), "get_SeqFeatures()"); > my $feat; > foreach my $feat2 ( @features ) { > next unless ($feat2->primary_tag eq "CDS"); > my @db_xrefs = $feat2->annotation->get_Annotations("db_xref"); > if (grep { $_ eq "GI:532506" } @db_xrefs) { > $feat = $feat2; > last; > } > } > my ($protein_seq) = $feat->annotation->get_Annotations("translation"); > ok($protein_seq =~ /^MVHLTPEEKSAVTALWGK.*VQAAYQKVVAGVANALAHKYH > $/, "protein sequence"); > my ($nucleotide_seq) = $feat->seq->seq; > ok($nucleotide_seq =~ /^ATGGTGCATCTGACTCCT.*CTGGCCCACAAGTATCACTAA > $/, "nucleotide sequence - correct CDS range"); > #print "[$nucleotide_seq]\n"; > ok($nucleotide_seq !~ /^ACCTCCTATTTGACACCA.*TGCTAGTCTCCCGGAACTATC > $/, "nucleotide sequence - full nucleotide should not match"); > is(length($nucleotide_seq), > 444, "nucleotide length"); > > # I have an old(?) version of M12730 which lists > # CDS join(1959..2355,1..92) > # /db_xref="GI:150830" > # Crazy ranges like that don't work at all, you end up with the > full nucleotide sequence... > # But NCBI doesn't list M12730 that way any more, so now I would be > OK? 
>
> # ------------------

From sac at bioperl.org Thu Mar 1 11:30:59 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Thu, 1 Mar 2007 09:30:59 -0700
Subject: [Bioperl-l] [Bioperl-announce-l] BioPerl leadership additions
In-Reply-To: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
References: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
Message-ID: <000101c75c1e$fecb7770$6400a8c0@CodonSolutions.local>

Welcome to the club, Chris & Sendu. Always good to have an infusion of new
blood and capable, motivated hands.

Steve

On 2/26/07, Jason Stajich wrote:
>
> Dear BioPerl Users and Developers,
>
> I want to announce an addition to the leadership of BioPerl.
> Christopher Fields and Sendu Bala are now members of the BioPerl
> Core developer group to recognize their ongoing leadership in the
> project. Chris and Sendu were instrumental in the 1.5.2 Developer
> release and have made a significant commitment and contribution to
> the quality of the code and the documentation of the project. We
> have invited them to be part of the core to recognize their work and
> to feel comfortable to ask them to do more. ;-)
>
> The Core group was established to ensure that someone was responsible
> for making code releases, vetting new developers for CVS write
> accounts, and generally dealing with things that might otherwise slip
> through the cracks. We are very excited to have more people
> contributing to and maintaining the toolkit. We look forward to
> their help along with all the other developers, as we work towards a
> 1.6 release this year.
>
> As always, while there is a need for some individuals to lead the
> project, we encourage contributions from all levels of expertise to
> improve the code, documentation, and tutorials of the project.
>
> We plan to discuss the progress of the toolkit at this year's
> Bioinformatics Open Source Conference held in Vienna, Austria in
> conjunction with the SIG meetings at ISMB. We are trying to use
> BOSC 2007 as a chance for the developers of Open Bioinformatics
> Foundation sponsored and related projects to coordinate future
> development and release cycles.
>
> Jason Stajich on behalf of the Core developers
>
> _______________________________________________
> Bioperl-announce-l mailing list
> Bioperl-announce-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l

From arareko at campus.iztacala.unam.mx Thu Mar 1 11:30:59 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 1 Mar 2007 09:30:59 -0700
Subject: [Bioperl-l] [Bioperl-announce-l] BioPerl leadership additions
In-Reply-To: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
References: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
Message-ID: <000001c75c1e$fec90670$6400a8c0@CodonSolutions.local>

Congrats Chris & Sendu! Very well-deserved. Keep up the great work.

Cheers!
Mauricio.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From johnsonm at gmail.com Thu Mar 1 11:49:20 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Thu, 1 Mar 2007 10:49:20 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
In-Reply-To:
References: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>
Message-ID:

On 2/28/07, Hilmar Lapp wrote:
>
> I'm not sure how the user would be able to take out the child hitting
> ctrl-c if you run it through system() (except if the parent
> terminates first - but maybe then terminating a run-away child is in
> good order).
Quoting the perlfunc docs on system:

    Since "SIGINT" and "SIGQUIT" are ignored during the execution of
    "system", if you expect your program to terminate on receipt of
    these signals you will need to arrange to do so yourself based on
    the return value.

        @args = ("command", "arg1", "arg2");
        system(@args) == 0
            or die "system @args failed: $?"

    You can check all the failure possibilities by inspecting $? like
    this:

        if ($? == -1) {
            print "failed to execute: $!\n";
        }
        elsif ($? & 127) {
            printf "child died with signal %d, %s coredump\n",
                ($? & 127), ($? & 128) ? 'with' : 'without';
        }
        else {
            printf "child exited with value %d\n", $? >> 8;
        }

    or more portably by using the W*() calls of the POSIX extension;
    see perlport for more information.

    When the arguments get executed via the system shell, results and
    return codes will be subject to its quirks and capabilities. See
    "'STRING'" in perlop and "exec" for details.

So, during a call to system(), a CTRL-C (SIGINT) won't take out the
parent, but it will take out the child, unless the child has caught it
and handled it. If you don't care why the child failed, just that it
did, I suppose the distinction is a subtle one.

> I haven't read the IPC::Run POD in full detail but you will want to
> make sure that if the parent gets killed the child does get killed
> too, or otherwise you'll have a run-away process that novices will
> have trouble with understanding or terminating.

I'll double check.

> Other than that though IPC::Run seems like a useful module, so
> incurring this as a dependency should be fine.
>

From thiago.venancio at gmail.com Thu Mar 1 13:02:14 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Thu, 1 Mar 2007 15:02:14 -0300
Subject: [Bioperl-l] frac_aligned_query returning results >1.
Message-ID: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com>

Hi all,

I have read a lot of threads regarding my issue, but have not found a
useful answer yet.

I am having problems with frac_aligned_query(). It is returning "> 1"
results. I have just updated my SearchUtils.pm from:
http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Search/SearchUtils.pm

The problem persists and, additionally, I get several warnings like:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Undefined sub-sequence (1507,1507) . Valid range = 1444 - 1507
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:328
STACK: Bio::Search::HSP::HSPI::matches /usr/share/perl5/Bio/Search/HSP/HSPI.pm:711
STACK: Bio::Search::SearchUtils::_adjust_contigs /usr/share/perl5/Bio/Search/SearchUtils.pm:489
STACK: Bio::Search::SearchUtils::tile_hsps /usr/share/perl5/Bio/Search/SearchUtils.pm:200
STACK: Bio::Search::Hit::GenericHit::frac_aligned_query /usr/share/perl5/Bio/Search/Hit/GenericHit.pm:1145
STACK: ./geraStatGenome.pl:17

My code is pretty clean:

while( my $hit = $result->next_hit ) {
    print $result->query_name . "\t"
        . $hit->frac_aligned_query('query') . "\t"
        . $hit->frac_identical('query') . "\n";
    last;
}

Thanks.

Thiago

-- 
"The way to get started is to quit talking and begin doing."
Walt Disney

========================
Thiago Motta Venancio, MSc
PhD student in Bioinformatics
University of Sao Paulo
========================

From cjfields at uiuc.edu Thu Mar 1 13:27:10 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Mar 2007 12:27:10 -0600
Subject: [Bioperl-l] frac_aligned_query returning results >1.
In-Reply-To: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com>
References: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com>
Message-ID: <8D7B0767-46A6-4083-B94E-D6490B241B84@uiuc.edu>

This is related to a reported bug:

http://bugzilla.open-bio.org/show_bug.cgi?id=2193

The relevant code used to tile HSPs is a bit brittle and sometimes
leads to errors like this. The error (which is actually a thrown
exception) is wrapped in an eval block and converted to a warn for
that reason.
I'm not familiar with the tiling algorithm used, maybe Steve can add
some input?

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From Kevin.M.Brown at asu.edu Thu Mar 1 13:28:22 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 1 Mar 2007 11:28:22 -0700
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <45E5F43C.9080902@sendu.me.uk>
References: <45E5F43C.9080902@sendu.me.uk>
Message-ID: <1A4207F8295607498283FE9E93B775B402CC9D60@EX02.asurite.ad.asu.edu>

Are you certain that GD has SVG enabled in it? Sounds like this error
is from outside the bioperl panel and is instead from GD and the GD
perl module.

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Wednesday, February 28, 2007 2:30 PM
> To: bioperl list
> Cc: Lincoln Stein
> Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
>
> I have GD 2.35 and GD::SVG 2.33 installed.
>
> I have a working script in which a Bio::Graphics::Panel
> object is made and output with:
>
> print $panel->png;
>
> This is fine. Changing it to:
>
> print $panel->svg;
>
> Gives the error:
>
> Can't locate object method "svg" via package "GD:Image" at
> /.../Bio/Graphics/Panel.pm line 971, line 192.
>
>
> Am I supposed to do something else to get this to work?
>
>
> Cheers,
> Sendu.

From cjfields at uiuc.edu Thu Mar 1 14:51:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Mar 2007 13:51:19 -0600
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <1A4207F8295607498283FE9E93B775B402CC9D60@EX02.asurite.ad.asu.edu>
References: <45E5F43C.9080902@sendu.me.uk> <1A4207F8295607498283FE9E93B775B402CC9D60@EX02.asurite.ad.asu.edu>
Message-ID:

Does SVG output via GD still require GD::SVG (or SVG::GD, I can't
remember which)?
chris

From stefan.kirov at bms.com Thu Mar 1 15:11:11 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 01 Mar 2007 15:11:11 -0500
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To:
References: <45E5F43C.9080902@sendu.me.uk> <1A4207F8295607498283FE9E93B775B402CC9D60@EX02.asurite.ad.asu.edu>
Message-ID: <45E7335F.8070102@bms.com>

Chris Fields wrote:
> Does SVG output via GD still require GD::SVG (or SVG::GD, I can't
> remember which)?
>
> chris

Guys, I think you missed parts of the discussion yesterday, it was the
object constructor, which decides if it should use GD or GD::SVG...

Stefan

From cjfields at uiuc.edu Thu Mar 1 15:14:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Mar 2007 14:14:41 -0600
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <45E7335F.8070102@bms.com>
References: <45E5F43C.9080902@sendu.me.uk> <1A4207F8295607498283FE9E93B775B402CC9D60@EX02.asurite.ad.asu.edu> <45E7335F.8070102@bms.com>
Message-ID: <5D75FAFC-F71A-4528-8650-818C2CFC85FF@uiuc.edu>

Nope, I saw that. I was just curious; I hadn't used GD in a while but
will be soon...

chris

From stefan.kirov at bms.com Thu Mar 1 15:20:46 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 01 Mar 2007 15:20:46 -0500
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <45E7335F.8070102@bms.com>
References: <45E5F43C.9080902@sendu.me.uk> <1A4207F8295607498283FE9E93B775B402CC9D60@EX02.asurite.ad.asu.edu> <45E7335F.8070102@bms.com>
Message-ID: <45E7359E.5030104@bms.com>

Stefan Kirov wrote:
> Guys, I think you missed parts of the discussion yesterday, it was the
> object constructor, which decides if it should use GD or GD::SVG...
> Stefan

OK, sorry..
In any case yes, it requires GD::SVG, since the constructor instantiates
a GD::SVG object if you pass -image_class=~'svg'

Stefan

From jay at jays.net Thu Mar 1 16:15:03 2007
From: jay at jays.net (Jay Hannah)
Date: Thu, 1 Mar 2007 15:15:03 -0600 (CST)
Subject: [Bioperl-l] Enhance Bio::PrimarySeqI::trunc() for Bio::Location::Split ?
In-Reply-To:
References:
Message-ID:

On Thu, 1 Mar 2007, Chris Fields wrote:
> Have you tried using $feature->spliced_seq() instead of seq()? Using
> seq() retrieves the full sequence for the split location (from start
> of first sublocation to end of last), while spliced_seq() splices the
> sublocation sequences together, which is what I think you want.

Genius. No wonder they promoted you into the core developer group. :)

Using this:

    my ($nucleotide_seq) = $feat->spliced_seq(-nosort => 1)->seq;

Gives me what I expected against these:

    # M37762 CDS 76..819
    # L26462 CDS join(866..957,1088..1310,2161..2289)
    # M12730 CDS join(1959..2355,1..92)

I'm happy to submit my patches for t/genbank.t and t/data/test.genbank if
that would make the universe a slightly better place. (...or
t/SeqFeature.t or t/splicedseq.t, which appear to be the tests that have
spliced_seq calls in them so far...)

Thanks!

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah

From lstein at cshl.edu Thu Mar 1 15:39:12 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 1 Mar 2007 15:39:12 -0500
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <45E7359E.5030104@bms.com>
References: <45E5F43C.9080902@sendu.me.uk> <1A4207F8295607498283FE9E93B775B402CC9D60@EX02.asurite.ad.asu.edu> <45E7335F.8070102@bms.com> <45E7359E.5030104@bms.com>
Message-ID: <6dce9a0b0703011239m7065a570l6d19e6c7065fca45@mail.gmail.com>

You need to have GD::SVG installed and then instantiate the panel with:

    -image_class=>'GD::SVG'

Lincoln

-- 
Lincoln D.
Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From stefan.kirov at bms.com Thu Mar 1 16:03:03 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 01 Mar 2007 16:03:03 -0500
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <6dce9a0b0703011239m7065a570l6d19e6c7065fca45@mail.gmail.com>
References: <45E5F43C.9080902@sendu.me.uk> <1A4207F8295607498283FE9E93B775B402CC9D60@EX02.asurite.ad.asu.edu> <45E7335F.8070102@bms.com> <45E7359E.5030104@bms.com> <6dce9a0b0703011239m7065a570l6d19e6c7065fca45@mail.gmail.com>
Message-ID: <45E73F87.1090104@bms.com>

Lincoln Stein wrote:
> You need to have GD::SVG installed and then instantiate the panel with:
> -image_class=>'GD::SVG'
>

Yes, silly me: I was looking at the code and did not realize that
=~/svg/ is only a check, and the actual class name is taken from the
arg list. Sorry Chris.

Stefan

From lstein at cshl.edu Thu Mar 1 16:10:52 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 1 Mar 2007 16:10:52 -0500
Subject: [Bioperl-l] [Bioperl-announce-l] BioPerl leadership additions
In-Reply-To: <8f200b4c0702270959n7521f693l915bfabe7ccb7ef7@mail.gmail.com>
References: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org> <8f200b4c0702270959n7521f693l915bfabe7ccb7ef7@mail.gmail.com>
Message-ID: <6dce9a0b0703011310n6c4ec150hfb7835ed576461d4@mail.gmail.com>

Hear hear!

Lincoln

-- 
Lincoln D.
Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From lstein at cshl.edu Thu Mar 1 16:23:49 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 1 Mar 2007 16:23:49 -0500
Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9503E2EBE3@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk> <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com> <8975119BCD0AC5419D61A9CF1A923E9503E2EBE3@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <6dce9a0b0703011323i44dea645ha5aba6361dbc964@mail.gmail.com>

I'm glad you picked that up. I would have never noticed the missing comma.

NB: if you set "use warnings" at the top of your script, then you would
have gotten an error message about subtraction with an undefined variable.

Lincoln

On 3/1/07, michael watson (IAH-C) wrote:
>
> In fact, those pad_left and pad_right arguments have no effect whatsoever
> (using bioperl 1.5.2_100)
>
> my $panel = Bio::Graphics::Panel->new(-key_style => between,
>                                       -offset => $start,
>                                       -length => $stop - $start + 1,
>                                       -width => 800
>                                       -pad_left =>5000,
>                                       -pad_right =>5000
>                                       );
>
> Even if I set them to 5000, the image looks exactly as if I had not set
> them.
>
> The only way I can get around this is to edit Glyph/dna.pm lines 184 and
> 186 such that $x2+3 is changed to $x2-20 - then the axes are drawn on the
> image instead of outside of it. This is obviously a hack, which upsets my
> karma.
>
> Mick
>
> ------------------------------
> *From:* lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] *On
> Behalf Of *Lincoln Stein
> *Sent:* 15 February 2007 18:53
> *To:* michael watson (IAH-C)
> *Cc:* BioPerl-List
> *Subject:* Re: [Bioperl-l] The axis of GC content in
> Bio::Graphics::glyph:dna
>
> Hi Michael,
>
> When you set up the panel, do this:
>
> Bio::Graphics::Panel->new(-blah -blah,
>                           -pad_left => 20,
>                           -pad_right => 20);
>
> This will leave enough room on the left and right for you to see the Y
> axis. Otherwise it runs off the edge of the image (ok, this is a mis-design,
> but it was the only way to solve a chicken-and-egg problem about who gets to
> say how wide the panel is)
>
> Lincoln
>
> On 2/15/07, michael watson (IAH-C) wrote:
> >
> > Hi
> >
> > OK I have some great images out of this glyph, but I can't see the axis,
> > and nor is it labelled (ie does it go from 0 - 100%?) so isn't great for
> > publication. The docs say:
> >
> > "NOTE: -gc_window=>'auto' gives nice results and is recommended for
> > drawing GC content. The GC content axes draw slightly outside the
> > panel, so you may wish to add some extra padding on the right and
> > left. "
> >
> > Any idea how to do this?
> >
> > Basically, I want a nice GC graph with the axis quite clearly labelled,
> > and a nice "%GC" title next to it :)
> >
> > Thanks
> >
> > Mick

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From cjfields at uiuc.edu Thu Mar 1 16:25:09 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Mar 2007 15:25:09 -0600
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <45E73F87.1090104@bms.com>
References: <45E5F43C.9080902@sendu.me.uk> <1A4207F8295607498283FE9E93B775B402CC9D60@EX02.asurite.ad.asu.edu> <45E7335F.8070102@bms.com> <45E7359E.5030104@bms.com> <6dce9a0b0703011239m7065a570l6d19e6c7065fca45@mail.gmail.com> <45E73F87.1090104@bms.com>
Message-ID: <6727E8E1-F2D9-4C4E-843F-FC6D53ADAAA7@uiuc.edu>

No problemo.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Thu Mar 1 16:29:18 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Mar 2007 15:29:18 -0600
Subject: [Bioperl-l] Enhance Bio::PrimarySeqI::trunc() for Bio::Location::Split ?
In-Reply-To:
References:
Message-ID:

On Mar 1, 2007, at 3:15 PM, Jay Hannah wrote:

> On Thu, 1 Mar 2007, Chris Fields wrote:
>> Have you tried using $feature->spliced_seq() instead of seq()? Using
>> seq() retrieves the full sequence for the split location (from start
>> of first sublocation to end of last), while spliced_seq() splices the
>> sublocation sequences together, which is what I think you want.
>
> Genius. No wonder they promoted you into the core developer group. :)
>
> Using this:
> my ($nucleotide_seq) = $feat->spliced_seq(-nosort => 1)->seq;
>
> Gives me what I expected against these:
>
> # M37762 CDS 76..819
> # L26462 CDS join(866..957,1088..1310,2161..2289)
> # M12730 CDS join(1959..2355,1..92)
>
> I'm happy to submit my patches for t/genbank.t and t/data/test.genbank if
> that would make the universe a slightly better place. (...or
> t/SeqFeature.t or t/splicedseq.t, which appear to be the tests that have
> spliced_seq calls in them so far...)
>
> Thanks!
>
> j
> seqlab.net
> http://www.bioperl.org/wiki/User:Jhannah

The more tests the better, I say. I would only put in one example,
though (maybe the last one, M12730, since it's from a gene in a
circular sequence split across the start).

I'm still planning on testing out some variations of
Bio::Location::SplitLocationI (which impacts spliced_seq()) and have
started a page on it, so any added tests would be great.

chris

From harris at cshl.edu Thu Mar 1 15:09:16 2007
From: harris at cshl.edu (Todd Harris)
Date: Thu, 1 Mar 2007 13:09:16 -0700
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To:
References: <45E5F43C.9080902@sendu.me.uk>
Message-ID: <1172779760.7B32453@fd9.dngr.org>

Hi Chris -

I don't believe that GD or gd for that matter can generate SVG but I
could be wrong. SVG output can be generated from GD using either
GD::SVG or SVG::GD, two modules that accomplish the same task through
a similar strategy.

Todd

From johnsonm at gmail.com Thu Mar 1 17:46:12 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Thu, 1 Mar 2007 16:46:12 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
In-Reply-To: <45E61AA9.9030906@sendu.me.uk>
References: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu> <45E61AA9.9030906@sendu.me.uk>
Message-ID:

Now that I'm using _set_from_args() and trying to get all the options
and switches working that I never use, it occurs to me that a 4-in-1
module for Glimmer2/Glimmer3/GlimmerM/GlimmerHMM is not going to fly
due to the options and switches being different. At this point, I think
I'm going to end up with a Genemark module, a Glimmer2 module, and a
Glimmer3 module. Feh.

On 2/28/07, Sendu Bala wrote:
>
> Mark Johnson wrote:
> > I'm using _rearrange() now. I'll look at _set_from_args(). Is either
> > one preferred to the other?
>
> _set_from_args() is implemented using _rearrange() iirc. In any case,
> they do different things but _set_from_args() just makes creating
> wrapper modules a lot simpler. Another example: compare revisions 1.15
> and 1.16 of Bio::Tools::Run::Alignment::Lagan where I reimplemented it
> to use _set_from_args() and _setparams().
>
> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-run/Bio/Tools/Run/Alignment/Lagan.pm.diff?r1=text&tr1=1.15&r2=text&tr2=1.16&diff_format=h
>
> So, it's new, but I'd recommend new modules, especially wrappers, make
> use of it.
> From bix at sendu.me.uk Thu Mar 1 18:06:54 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 01 Mar 2007 23:06:54 +0000 Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer In-Reply-To: References: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu> <45E61AA9.9030906@sendu.me.uk> Message-ID: <45E75C8E.7010809@sendu.me.uk> Mark Johnson wrote: > Now that I'm using _set_from_args() and trying to get all the options > and switches working that I never use, it occurs to me that a 4-in-1 > module for Glimmer2/Glimmer3/GlimmerM/GlimmerHMM is not going to fly > due to the options and switches being different. At this point, I think > I'm going to end up with a Genemark module, a Glimmer2 module, and a > Glimmer3 module. Feh. I think a 4-in-1 would still be possible. Presumably at some point you know which one you will run, so let the user set everything in the single new() even if it doesn't make sense, but then form argument strings with sub _setparams { ... if ($glimmer2) { my $param_string = $self->SUPER::_setparams( -params => [@glim2params], -dash => 1); } elsif ($glimmer3) { ... Or if you want to be stricter in new(), do something like: sub new { my($class, @args) = @_; my $self = $class->SUPER::new(@args); my ($type) = $self->_rearrange([qw(TYPE)], @args); if ($type eq 'glimmer2') { $self->_set_from_args(\@args, -methods => [@glim2params], -create => 1); } elsif ($type eq ... You'll have to figure out something yourself if you want to warn about the user supplying args that their requested type doesn't use. All that said, if these Glimmer things are different programs with different uses (and not simply different versions of the same thing with the same function), by all means make separate modules.
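For what it's worth, the "warn about args the requested type doesn't use" part can be sketched in plain Perl, independent of BioPerl. This is only an illustration of the per-program lookup idea discussed above, not the actual wrapper code: the %ALLOWED table, the build_param_string() helper and the switch names in it are all made up for the example, and the real Glimmer2/Glimmer3 options would have to come from each program's documentation.

```perl
use strict;
use warnings;

# Hypothetical switch lists, for illustration only; these are NOT the
# real Glimmer2/Glimmer3 command-line options.
my %ALLOWED = (
    glimmer2 => [qw(g o p t)],
    glimmer3 => [qw(g o t max_olap)],
);

# Build a command-line fragment for one program type and report any
# supplied arguments that this type does not understand.
sub build_param_string {
    my ($type, %args) = @_;
    my $allowed = $ALLOWED{$type}
        or die "Unknown program type '$type'\n";
    my %is_allowed = map { $_ => 1 } @$allowed;

    my @parts;
    for my $name (@$allowed) {    # emit switches in the declared order
        next unless defined $args{$name};
        push @parts, "-$name $args{$name}";
    }

    # Anything the caller passed that is not in the list for this type.
    my @ignored = sort grep { !$is_allowed{$_} } keys %args;
    warn "Ignoring args not used by $type: @ignored\n" if @ignored;

    return (join(' ', @parts), \@ignored);
}

my ($string, $ignored) = build_param_string('glimmer3', g => 110, p => 5, t => 30);
print "$string\n";
```

Here the made-up 'p' switch is dropped with a warning because it is not in the glimmer3 list. Keeping one table per program means a single module can still hold all four variants, at the cost of maintaining the tables by hand.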
From cjfields at uiuc.edu Thu Mar 1 18:08:46 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 1 Mar 2007 17:08:46 -0600 Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer In-Reply-To: References: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu> <45E61AA9.9030906@sendu.me.uk> Message-ID: <9CE93EC0-A9DB-4A5B-8CE7-F15795375587@uiuc.edu> I have been working on an Infernal wrapper (not finished yet but getting there) which does this: # when run() is called, cmsearch is the program run... my $factory = Bio::Tools::Run::Infernal->new('-program' =>'cmsearch', @params); in Infernal.pm: # for each program... my %INFERNAL_PROGRAM = ( ... cmsearch => [qw(h W informat toponly local noalign dumptrees thresh X inside null2 learninserts hmmfb hmmweinberg hmmpad hmmonly hthresh beta noqdb qdbfile hbanded hbandp banddump sums scan2bands)], ... ); then set in new() based on only the parameters listed for the program; I'm still toying with whether the program needs to be specified in the constructor prior to a run. There are prob. other variations on this using AUTOLOAD and _set_from_args() etc. chris On Mar 1, 2007, at 4:46 PM, Mark Johnson wrote: > Now that I'm using _set_from_args() and trying to get all the > options and switches working that I never use, it occurs to me that > a 4-in-1 module for Glimmer2/Glimmer3/GlimmerM/GllimmerHMM is not > going to fly due to the options and switches being different. At > this point, I think I'm going to end up with a Genemark module, a > Glimmer2 module, and a Glimmer3 module. Feh. From lstein at cshl.edu Thu Mar 1 18:12:29 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 1 Mar 2007 18:12:29 -0500 Subject: [Bioperl-l] Bio::Factory::EMBOSS, CGI and taint In-Reply-To: <45E55884.9010908@uq.edu.au> References: <45E55884.9010908@uq.edu.au> Message-ID: <6dce9a0b0703011512k360bd94dv82e143d4477ebcea@mail.gmail.com> You'll need to set the %ENV hash to a known safe state. 
e.g.: $ENV{PATH}="/bin:/usr/bin:/usr/local/bin"; Lincoln On 2/28/07, Neil Saunders wrote: > > Dear Bioperlers, > > I'm trying to understand an error that occurs when Bio::Factory::EMBOSS is > used > in a CGI script. Using BioPerl 1.5.2 on Ubuntu Dapper, Apache 2.0.55, > Perl 5.8.7. > > If I load this test CGI script (cgi.pl) in a browser: > > BEGIN CODE > ---------- > #!/usr/bin/perl -Tw > use strict; > use CGI; > use Bio::Factory::EMBOSS; > > my $cgi = new CGI; > my $f = new Bio::Factory::EMBOSS; > > print $cgi->header, > $cgi->start_html, > $cgi->end_html; > -------- > END CODE > > I get a 500 server error and the Apache error log reads: > [error] [client 192.168.0.3] Premature end of script headers: cgi.pl > > I can fix this in 2 ways: > > (1) Move the "my $f = new Bio::Factory::EMBOSS" line to the end of the > script, > which isn't a very useful fix. > (2) Remove the -T switch from the shebang line > > There seem to be a few old posts on the list regarding "taint-safe" > modules. It > seems that the new Bio::Factory::EMBOSS object is interfering with the > headers > in some way, but I'm no CGI.pm guru and wondered if anyone could shed > light on this. > > thanks, > Neil > -- > School of Molecular and Microbial Sciences > University of Queensland > Brisbane 4072 Australia > > http://nsaunders.wordpress.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From arareko at campus.iztacala.unam.mx Thu Mar 1 21:52:49 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Thu, 01 Mar 2007 20:52:49 -0600 Subject: [Bioperl-l] BioPerl in MiniCPAN Message-ID: <45E79181.9090404@campus.iztacala.unam.mx> Folks, Just found this post by Brian D Foy at the O'Reilly ONLamp Blog, BioPerl takes a reasonable part in the picture: http://www.oreillynet.com/onlamp/blog/2007/02/minicpan_and_grandperspective.html It would be interesting to see the same graphic for the whole CPAN repository... :) Cheers, Mauricio. -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Gen?tica Unidad de Morfofisiolog?a y Funci?n Facultad de Estudios Superiores Iztacala, UNAM From heikki at sanbi.ac.za Fri Mar 2 01:08:15 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Fri, 2 Mar 2007 08:08:15 +0200 Subject: [Bioperl-l] Bio::SeqIO::FTHelper In-Reply-To: <8AA2B586-4C2B-4379-BFF0-0CEB15FAE68E@uiuc.edu> References: <200703011302.30855.heikki@sanbi.ac.za> <8AA2B586-4C2B-4379-BFF0-0CEB15FAE68E@uiuc.edu> Message-ID: <200703020808.15664.heikki@sanbi.ac.za> This sounds great. Is the speed increase noticeable? -Heikki On Thursday 01 March 2007 17:24:03 Chris Fields wrote: > I do have a rough outline of what I think could be done: > > http://www.bioperl.org/wiki/Handler-based_SeqIO_parsers > > where you could switch out handlers to deal with incoming data > chunks. Any suggestions there are welcome. > > I'll probably commit examples of the above in the next week or two > (GenBank, EMBL, Swiss parsers using the same handlers) which don't > use FTHelper. So far I have all three passing tests based on genbank/ > embl/swiss.t but they need a few more tweaks before I commit. 
> > chris > > On Mar 1, 2007, at 5:02 AM, Heikki Lehvaslaiho wrote: > > Chris, > > > > It was meant to collect code that was common to all three main > > databases using > > similar feature tables. > > > > Now might be the time to optimise the parsing speed by removing it. > > Do you > > have a plan how to do it? > > > > -Heikki > > > > On Tuesday 27 February 2007 22:57:40 Chris Fields wrote: > >> Could anyone tell me what FTHelper is used for? From what I gather > >> it rolls up seqfeature data into a lightweight object but then > >> creates a SeqFeature::Generic anyway (at least for GenBank/EMBL/ > >> Swiss), which seems to be a waste of memory and time. Is there > >> something I'm missing (besides my sanity of course)? > >> > >> chris > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > ______ _/ _/_____________________________________________________ > > _/ _/ > > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > > _/ _/ _/ SANBI, South African National Bioinformatics Institute > > _/ _/ _/ University of Western Cape, South Africa > > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > > ___ _/_/_/_/_/________________________________________________________ > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From bix at sendu.me.uk Fri Mar 2 06:23:45 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 02 Mar 2007 11:23:45 +0000 Subject: [Bioperl-l] frac_aligned_query returning results >1. In-Reply-To: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com> References: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com> Message-ID: <45E80941.6020406@sendu.me.uk> Thiago Venancio wrote: > Hi all, > > I have read a lot of threads regarding my issue, but still didn't get any > efficient answer yet. > > I am with problems with frac_aligned_query(). It is returning "> 1" > results. > I have just updated my SearhUtils.pm from: > http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Search/SearchUtils.pm It shouldn't return results greater than 1. Please send me a minimal blast report that gives such results. Make sure you only have one copy of SearchUtils.pm and that is the latest version (or that you are definitely using that latest version). > The problem persists and, additionally, I get several warnings like: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Undefined sub-sequence (1507,1507) . Valid range = 1444 - 1507 > STACK: Error::throw I don't know about that problem. See Chris's reply. From bix at sendu.me.uk Fri Mar 2 06:25:25 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 02 Mar 2007 11:25:25 +0000 Subject: [Bioperl-l] frac_aligned_query returning results >1. 
In-Reply-To: <8D7B0767-46A6-4083-B94E-D6490B241B84@uiuc.edu> References: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com> <8D7B0767-46A6-4083-B94E-D6490B241B84@uiuc.edu> Message-ID: <45E809A5.9060407@sendu.me.uk> Chris Fields wrote: > This is related to a reported bug: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2193 > > The relevant code used to tile HSPs is a bit brittle and sometimes > leads to errors like this. The error (which is actually a thrown > exception) is wrapped in an eval block and converted to a warn for > that reason. I'm not familiar with the tiling algorithm used, maybe > Steve can add some input? Depending on what exactly you're talking about here, I may have re-written that algorithm. Nice to know the bug survived ;) From bix at sendu.me.uk Fri Mar 2 06:42:26 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 02 Mar 2007 11:42:26 +0000 Subject: [Bioperl-l] Problem of Installing Bioperl In-Reply-To: <405300.74546.qm@web60523.mail.yahoo.com> References: <405300.74546.qm@web60523.mail.yahoo.com> Message-ID: <45E80DA2.6050303@sendu.me.uk> Chan Kuang Lim wrote: > Thank you for your reply, but still cant solve my problem. The folder > 'ppm-VzM4DH' do not exist in my system. so, there is no > bioperl-1.5.2_100.tgz. How i can get it? You should post back to the mailing list; I don't have a Windows machine to test things out on. Others with more Windows experience may be able to help you. I can suggest making sure you have the latest version of (ActiveState) perl and the GUI PPM installer. As a last resort you can ensure you have nmake installed and try installing with CPAN on the command-line. It will no doubt be helpful if you supply complete, unedited details of what you do and the errors you receive so we can diagnose your problem successfully. From n.haigh at sheffield.ac.uk Fri Mar 2 07:43:04 2007 From: n.haigh at sheffield.ac.uk (Nathan S. 
Haigh) Date: Fri, 02 Mar 2007 12:43:04 +0000 Subject: [Bioperl-l] Problem of Installing Bioperl In-Reply-To: <459942.77644.qm@web60518.mail.yahoo.com> References: <459942.77644.qm@web60518.mail.yahoo.com> Message-ID: <45E81BD8.3030304@sheffield.ac.uk> Chan Kuang Lim wrote: > I have problem of installing bioperl in windows using command-line installation. > In the cmd windows, after > ppm-shell > search bioperl > install 2 > > many downloading had done, but the next line is: > Unpacking bioperl-1.5.2_100...ppm install failed: Can't extract files from C:.............../Bioperl-1.5.2_100.tgz > > > Hope you can answer my question. Thank you. > > Regards, > Chan Kuang Lim > Malaysia > > I should be able to help out, but I'm a little busy at the moment. If you are still having problems, let us know the details of your system, e.g. what version of windows, if you are logged in as an administrator, what version of activeperl and what version of Perl. Cheers Nath From cjfields at uiuc.edu Fri Mar 2 08:33:08 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 2 Mar 2007 07:33:08 -0600 Subject: [Bioperl-l] frac_aligned_query returning results >1. In-Reply-To: <45E809A5.9060407@sendu.me.uk> References: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com> <8D7B0767-46A6-4083-B94E-D6490B241B84@uiuc.edu> <45E809A5.9060407@sendu.me.uk> Message-ID: On Mar 2, 2007, at 5:25 AM, Sendu Bala wrote: > Chris Fields wrote: >> This is related to a reported bug: >> http://bugzilla.open-bio.org/show_bug.cgi?id=2193 >> The relevant code used to tile HSPs is a bit brittle and >> sometimes leads to errors like this. The error (which is >> actually a thrown exception) is wrapped in an eval block and >> converted to a warn for that reason. I'm not familiar with the >> tiling algorithm used, maybe Steve can add some input? > > Depending on what exactly you're talking about here, I may have re- > written that algorithm. Nice to know the bug survived ;) Yep, I saw your commits (revs. 
1.16, 1.17, 1.19, and 1.20). I can check code prior to that to see if it changes anything for better or worse or gets rid of the bug (prob. later today or tomorrow), though I can't see why your revisions would make it worse. If anything they're now more accurate. Thiago can also try; just pull up a revision prior to the ones listed above and see if it helps: http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Search/SearchUtils.pm?cvsroot=bioperl Jason had previously indicated problems with tiling (i.e. similar exceptions were thrown) prior to your commits so I don't think your changes are related, but one never knows. Chris From bix at sendu.me.uk Fri Mar 2 08:35:34 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 02 Mar 2007 13:35:34 +0000 Subject: [Bioperl-l] (no subject) In-Reply-To: <58ff33550703010647j1e7908f3sf3c01a74eeceaca4@mail.gmail.com> References: <58ff33550703010647j1e7908f3sf3c01a74eeceaca4@mail.gmail.com> Message-ID: <45E82826.30007@sendu.me.uk> Luba Pardo wrote: > Dear all, > Sorry if the question is too basic but I am trying to learn BioPerl > modules. So I am trying to get the CDS sequence from a gi identification > protein using the "features" method. I started to run the example of the FAQ > doc (How do I retrieve a nucleotide coding sequence when I have a protein gi > number?)
[ http://www.bioperl.org/wiki/FAQ#How_do_I_retrieve_a_nucleotide_coding_sequence_when_I_have_a_protein_gi_number.3F ] [snip] > my $protein_gi = '405830'; > my $prot_obj = $gp->get_Seq_by_id($protein_gi); > foreach my $feat ( $prot_obj->top_SeqFeatures ) { > if ( $feat->primary_tag eq 'CDS' ) { > # example: 'coded_by="U05729.1:1..122"' > my @coded_by = $feat->each_tag_value('coded_by'); > my ($nuc_acc, $loc_str) = split /\:/, $coded_by[0]; [snip] > The error I got is > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Must specify a query or list of uids to fetch [snip] > But I cannot see in which part of the script I have to specify a list > of gi. That is very odd. Am I interpreting the script wrong? If you use warnings you'd have seen a problem on the line with the split: @coded_by is empty. This is because you aren't supplying a protein GI. In this case it would be 405831, not 405830. 405830 is already the nucleotide GI so you don't need to do this stuff with coded_by. Use the code in the next section of the FAQ instead: http://www.bioperl.org/wiki/FAQ#How_do_I_get_the_complete_spliced_nucleotide_sequence_from_the_CDS_section.3F From lubapardo at gmail.com Fri Mar 2 08:47:26 2007 From: lubapardo at gmail.com (Luba Pardo) Date: Fri, 2 Mar 2007 14:47:26 +0100 Subject: [Bioperl-l] (no subject) In-Reply-To: <45E82826.30007@sendu.me.uk> References: <58ff33550703010647j1e7908f3sf3c01a74eeceaca4@mail.gmail.com> <45E82826.30007@sendu.me.uk> Message-ID: <58ff33550703020547r26edb40pb9af8dc6556e27d1@mail.gmail.com> Thank you all for your advice. It certainly made my weekend! Indeed, I could run the example using the RefSeq accession number. As suggested earlier by Samuel, I ran the command using the RefSeq get_Seq_by_id method and it worked even without removing the trailing version number.
I am attaching the modified script I ran (I also checked the translated protein to verify that I got the correct CDS):

use strict;
use warnings;
use Bio::Factory::FTLocationFactory;
use Bio::SeqFeature::Generic;
use Bio::DB::RefSeq;
use Bio::DB::GenBank;

my $gp = Bio::DB::RefSeq->new;
my $gb = Bio::DB::GenBank->new;

# factory to turn strings into Bio::Location objects
my $loc_factory = Bio::Factory::FTLocationFactory->new;

open(IN, '<', 'refids.txt') or die "\n I can't open the file\n";
open(OUT, '>>', 'refseqfast.txt') or die "\n I can't write it\n";

# one RefSeq protein accession per input line
while (<IN>) {
    chomp;
    my $protein_acc = $_;
    # print "checking $protein_acc\n";
    my $prot_obj = $gp->get_Seq_by_id($protein_acc);
    foreach my $feat ( $prot_obj->top_SeqFeatures ) {
        if ( $feat->primary_tag eq 'CDS' ) {
            # skip CDS features that carry no coded_by tag
            next unless $feat->has_tag('coded_by');
            # example: 'coded_by="U05729.1:1..122"'
            my @coded_by = $feat->each_tag_value('coded_by');
            my ($nuc_acc, $loc_str) = split /:/, $coded_by[0], 2;
            print " $nuc_acc\n";
            # $nuc_acc =~ s/\.\d+$//;   # stripping the version was not needed
            my $nuc_obj = $gb->get_Seq_by_id($nuc_acc);
            # create Bio::Location object from a string
            my $loc_object = $loc_factory->from_string($loc_str);
            # create a Feature object by using a Location
            my $feat_obj = Bio::SeqFeature::Generic->new(-location => $loc_object);
            # associate the Feature object with the nucleotide Seq object
            $nuc_obj->add_SeqFeature($feat_obj);
            my $cds_obj = $feat_obj->spliced_seq;
            print OUT ">", $nuc_acc, "\n", $cds_obj->seq, "\n";
        }
    }
}
close(IN);
close(OUT);

On 02/03/07, Sendu Bala wrote: > > Luba Pardo wrote: > > Dear all, > > Sorry if the questions is too basic but I am trying to learn BioPerl > > modules. So I am trying to get the CDS sequence from a gi identification > > protein using the "features" method. I started to run the example of the > FAQ > > doc (How do I retrieve a nucleotide coding sequence when I have a > protein gi > > number?)
> > [ > > http://www.bioperl.org/wiki/FAQ#How_do_I_retrieve_a_nucleotide_coding_sequence_when_I_have_a_protein_gi_number.3F > ] > > [snip] > > my $protein_gi = '405830'; > > my $prot_obj = $gp->get_Seq_by_id($protein_gi);; > > foreach my $feat ( $prot_obj->top_SeqFeatures ) { > > if ( $feat->primary_tag eq 'CDS' ) { > > # example: 'coded_by="U05729.1:1..122"' > > my @coded_by = $feat->each_tag_value('coded_by'); > > my ($nuc_acc, $loc_str) = split /\:/, $coded_by[0]; > [snip] > > The error I got is > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: Must specify a query or list of uids to fetch > [snip] > > But I can not see where part of the script is that I have to specify a > list > > of gi. That very odd. Am I interpreting the script wrong? > > If you use warnings you'd have seen a problem on the line with the > split: @coded_by is empty. This is because you aren't supplying a > protein GI. In this case it would be 405831, not 405830. 405830 is > already the nucleotide GI so you don't need to do this stuff with > coded_by. Use the code in the next section of the FAQ instead: > > > http://www.bioperl.org/wiki/FAQ#How_do_I_get_the_complete_spliced_nucleotide_sequence_from_the_CDS_section.3F > > From thiago.venancio at gmail.com Fri Mar 2 06:38:48 2007 From: thiago.venancio at gmail.com (Thiago Venancio) Date: Fri, 2 Mar 2007 08:38:48 -0300 Subject: [Bioperl-l] Bioperl-l] frac_aligned_query returning results >1. Message-ID: <44255ea80703020338t608c1d71k810baf92ede1180e@mail.gmail.com> Hi Sendu and Chris, Thanks for the help. As I mentioned, I have updated my SearchUtils file from: http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Search/SearchUtils.pm I am also using the lates BioPerl version, installed from CPAN. Please find a buggy blast report attached. In this case, the frac_aligned_query() outputs "1.04", but I have others with " 1.57" for example. 
Just for a quantitative aspect, I got ">1" values in only 61 / 53,377. The line where I call the function is : print $result->query_name."\t".$hit->frac_aligned_query()."\t".$hit->frac_identical( 'query' )."\n"; Thiago -- "The way to get started is to quit talking and begin doing." Walt Disney ======================== Thiago Motta Venancio, MSc PhD student in Bioinformatics University of Sao Paulo ======================== -------------- next part -------------- BLASTN 2.2.6 [Apr-09-2003] Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Query= AEDES_02704.C (1069 letters) Database: aaegypti.SCAFFOLDS-MASKED.AEDES1.fa 4758 sequences; 1,383,971,543 total letters Searching..........done Score E Sequences producing significant alignments: (bits) Value supercontig:1:supercont1.157:1:2064756:1 supercontig supercont1.157 858 0.0 >supercontig:1:supercont1.157:1:2064756:1 supercontig supercont1.157 Length = 2064756 Score = 858 bits (433), Expect = 0.0 Identities = 448/453 (98%) Strand = Plus / Plus Query: 12 cattttaaatgcatatattgggtgccatcatgactgcctgactcctaaacttgacctcga 71 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 759400 cattttaaatgcatatattgggtgccatcatgactgcctgactcctaaacttgacctcga 759459 Query: 72 ggcctatattctatcccttcttacatgtagtggcttaatcctagattgctggtactcacg 131 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 759460 ggcctatattctatcccttcttacatgtagtggcttaatcctagattgctggtactcacg 759519 Query: 132 gcaccggctattatgctcccgccgcctctagctccaccagcctctgcatcctttctgacg 191 |||||||| |||||||||||||||||| |||||||||||||||| ||||||||||||||| Sbjct: 759520 gcaccggccattatgctcccgccgcctttagctccaccagcctccgcatcctttctgacg 759579 Query: 192 gtctgcgctcgccgtcgtcgcacgctctgtgctcctgcaccacctcagctgaagcggcta 251 
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 759580 gtctgcgctcgccgtcgtcgcacgctctgtgctcctgcaccacctcagctgaagcggctg 759639 Query: 252 gctctcgacggtcattcaccgccgacgaccttgcactactgtgttgggaacccctggtcg 311 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 759640 gctctcgacggtcattcaccgccgacgaccttgcactactgtgttgggaacccctggtcg 759699 Query: 312 tgagtcacagtccgctcttcctccgatgtgccaaatgtcaaacgctgatatggctacgga 371 |||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||| Sbjct: 759700 tgagtcacagtccgctcttcctccgatgtgtcaaatgtcaaacgctgatatggctacgga 759759 Query: 372 cgatcctccaccgcggtgttgggaacccctgagtgtagactcgctgggccttccttcgat 431 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 759760 cgatcctccaccgcggtgttgggaacccctgagtgtagactcgctgggccttccttcgat 759819 Query: 432 gagccaaagaacgaaactgcaacgaaaaaaccc 464 ||||||||||||||||||||||||||||||||| Sbjct: 759820 gagccaaagaacgaaactgcaacgaaaaaaccc 759852 Score = 803 bits (405), Expect = 0.0 Identities = 441/453 (97%) Strand = Plus / Plus Query: 12 cattttaaatgcatatattgggtgccatcatgactgcctgactcctaaacttgacctcga 71 ||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||| Sbjct: 768455 cattttaaatgcatatattgggtgccatcatgactacctgactcctaaacttgacctcga 768514 Query: 72 ggcctatattctatcccttcttacatgtagtggcttaatcctagattgctggtactcacg 131 ||||||||||||||| ||||||||||||||||||||||||||||||| |||||||||||| Sbjct: 768515 ggcctatattctatctcttcttacatgtagtggcttaatcctagatttctggtactcacg 768574 Query: 132 gcaccggctattatgctcccgccgcctctagctccaccagcctctgcatcctttctgacg 191 |||||||| |||||||||||||||||| |||||||||||||||| |||| |||||||||| Sbjct: 768575 gcaccggccattatgctcccgccgcctttagctccaccagcctccgcattctttctgacg 768634 Query: 192 gtctgcgctcgccgtcgtcgcacgctctgtgctcctgcaccacctcagctgaagcggcta 251 ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||| Sbjct: 768635 gtctgcgctcgccgtcgtcgcacgctctgtgctcctgcaccccctcagctgaagcggctg 768694 Query: 252 gctctcgacggtcattcaccgccgacgaccttgcactactgtgttgggaacccctggtcg 311 
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 768695 gctctcgacggtcattcaccgccgacgaccttgcactactgtgttgggaacccctggtcg 768754 Query: 312 tgagtcacagtccgctcttcctccgatgtgccaaatgtcaaacgctgatatggctacgga 371 ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||| Sbjct: 768755 tgagtcacagtccgctcttcctccgatgtgccaaatgtcaactgctgatatggctacgga 768814 Query: 372 cgatcctccaccgcggtgttgggaacccctgagtgtagactcgctgggccttccttcgat 431 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 768815 cgatcctccaccgcggtgttgggaacccctgagtgtagactcgctgggccttccttcgat 768874 Query: 432 gagccaaagaacgaaactgcaacgaaaaaaccc 464 |||||||||||| |||||||||||||||||||| Sbjct: 768875 gagccaaagaacaaaactgcaacgaaaaaaccc 768907 Score = 317 bits (160), Expect = 3e-84 Identities = 170/172 (98%), Gaps = 1/172 (0%) Strand = Plus / Plus Query: 899 ttcctttcaccgtatggccacgtgacgatgatgagctctggcgcctttcgcgtccggatg 958 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 769407 ttcctttcaccgtatggccacgtgacgatgatgagctctggcgcctttcgcgtccggatg 769466 Query: 959 ttttccgccacggccggtaacaactatgtaacctttcactatggaaaactgcaaaagttt 1018 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 769467 ttttccgccacggccggtaacaactatgtaacctttcactatggaaaactgcaaaagttc 769526 Query: 1019 tctttcgccggcttt-aacggcactaactttttgtggcaaaatccgcttatt 1069 ||||||||||||||| |||||||||||||||||||||||||||||||||||| Sbjct: 769527 tctttcgccggctttaaacggcactaactttttgtggcaaaatccgcttatt 769578 Score = 311 bits (157), Expect = 2e-82 Identities = 167/169 (98%), Gaps = 1/169 (0%) Strand = Plus / Plus Query: 902 ctttcaccgtatggccacgtgacgatgatgagctctggcgcctttcgcgtccggatgttt 961 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 760355 ctttcaccgtatggccacgtgacgatgatgagctctggcgcctttcgcgtccggatgttt 760414 Query: 962 tccgccacggccggtaacaactatgtaacctttcactatggaaaactgcaaaagttttct 1021 |||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||| Sbjct: 760415 tccgccacggccggtaacaactatgtaacctttcactatggaaaactgcaaaagttctct 
760474 Query: 1022 ttcgccggcttt-aacggcactaactttttgtggcaaaatccgcttatt 1069 |||||||||||| |||||||||||||||||||||||||||||||||||| Sbjct: 760475 ttcgccggctttaaacggcactaactttttgtggcaaaatccgcttatt 760523 Score = 293 bits (148), Expect = 5e-77 Identities = 151/152 (99%) Strand = Plus / Plus Query: 587 ccccctttctttccgtcggaatcacgatccgagcagtagcgtcagcagacaatggatttg 646 ||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||| Sbjct: 769138 ccccctttctttccgtcggaatcgcgatccgagcagtagcgtcagcagacaatggatttg 769197 Query: 647 cggcgttcttattgttgaacaatgcgtgcccgtcctgatgccagcgatgatctcgtgatc 706 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 769198 cggcgttcttattgttgaacaatgcgtgcccgtcctgatgccagcgatgatctcgtgatc 769257 Query: 707 ggacaatcacgtcggtttcgaagcggttggcc 738 |||||||||||||||||||||||||||||||| Sbjct: 769258 ggacaatcacgtcggtttcgaagcggttggcc 769289 Score = 293 bits (148), Expect = 5e-77 Identities = 151/152 (99%) Strand = Plus / Plus Query: 587 ccccctttctttccgtcggaatcacgatccgagcagtagcgtcagcagacaatggatttg 646 ||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||| Sbjct: 760083 ccccctttctttccgtcggaatcgcgatccgagcagtagcgtcagcagacaatggatttg 760142 Query: 647 cggcgttcttattgttgaacaatgcgtgcccgtcctgatgccagcgatgatctcgtgatc 706 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 760143 cggcgttcttattgttgaacaatgcgtgcccgtcctgatgccagcgatgatctcgtgatc 760202 Query: 707 ggacaatcacgtcggtttcgaagcggttggcc 738 |||||||||||||||||||||||||||||||| Sbjct: 760203 ggacaatcacgtcggtttcgaagcggttggcc 760234 Score = 242 bits (122), Expect = 2e-61 Identities = 125/126 (99%) Strand = Plus / Plus Query: 463 cccaataaggggtttctcgtcgtgtgtgggtgcttcgagggaagcgacagtcaacctggc 522 |||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||| Sbjct: 768959 cccaataaggggtttctcgtcgtgtgtgggtgcttcgagggaagcgacagccaacctggc 769018 Query: 523 tcgcttcgaggtcctcctacggctcttgcagtggctatgcgctattggtggatgcaaaat 582 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 769019 
tcgcttcgaggtcctcctacggctcttgcagtggctatgcgctattggtggatgcaaaat 769078 Query: 583 cgtccc 588 |||||| Sbjct: 769079 cgtccc 769084 Score = 242 bits (122), Expect = 2e-61 Identities = 125/126 (99%) Strand = Plus / Plus Query: 463 cccaataaggggtttctcgtcgtgtgtgggtgcttcgagggaagcgacagtcaacctggc 522 |||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||| Sbjct: 759904 cccaataaggggtttctcgtcgtgtgtgggtgcttcgagggaagcgacagccaacctggc 759963 Query: 523 tcgcttcgaggtcctcctacggctcttgcagtggctatgcgctattggtggatgcaaaat 582 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 759964 tcgcttcgaggtcctcctacggctcttgcagtggctatgcgctattggtggatgcaaaat 760023 Query: 583 cgtccc 588 |||||| Sbjct: 760024 cgtccc 760029 Score = 123 bits (62), Expect = 1e-25 Identities = 65/66 (98%) Strand = Plus / Plus Query: 737 ccactctaaaatgcttcttcgtgccttctaggtcgtcgatatttgccgcgaaaaccgtga 796 |||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||| Sbjct: 769344 ccactctaaaatgcttcttcgtgccttctaggtcgttgatatttgccgcgaaaaccgtga 769403 Query: 797 tccttc 802 |||||| Sbjct: 769404 tccttc 769409 Score = 121 bits (61), Expect = 4e-25 Identities = 64/65 (98%) Strand = Plus / Plus Query: 737 ccactctaaaatgcttcttcgtgccttctaggtcgtcgatatttgccgcgaaaaccgtga 796 |||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||| Sbjct: 760289 ccactctaaaatgcttcttcgtgccttctaggtcgttgatatttgccgcgaaaaccgtga 760348 Query: 797 tcctt 801 ||||| Sbjct: 760349 tcctt 760353 Score = 105 bits (53), Expect = 2e-20 Identities = 68/73 (93%) Query: 806 gttaaaaataatgaagattacacgtcatctaaactccatttatgcgatgtaaacatgacg 865 ||||||||||||||||||||||| |||| |||||| ||||||||| ||| |||||||||| Sbjct: 1251522 gttaaaaataatgaagattacacatcatgtaaacttcatttatgcaatgcaaacatgacg 1251463 Query: 866 tcatgtaaattta 878 ||||||||||||| Sbjct: 1251462 tcatgtaaattta 1251450 Score = 97.6 bits (49), Expect = 6e-18 Identities = 70/77 (90%) Strand = Plus / Plus Query: 802 cacagttaaaaataatgaagattacacgtcatctaaactccatttatgcgatgtaaacat 861 ||||||||||| |||||||||||| |||| ||||||||| || 
|||||||||||||||| Sbjct: 1251086 cacagttaaaactaatgaagattaaacgttatctaaactttatatatgcgatgtaaacat 1251145 Query: 862 gacgtcatgtaaattta 878 || |||||||||||||| Sbjct: 1251146 gaagtcatgtaaattta 1251162 Score = 61.9 bits (31), Expect = 3e-07 Identities = 37/39 (94%) Strand = Plus / Minus Query: 802 cacagttaaaaataatgaagattacacgtcatctaaact 840 |||||| ||||||||||||||||||||||||| |||||| Sbjct: 1601368 cacagtaaaaaataatgaagattacacgtcatgtaaact 1601330 From thiago.venancio at gmail.com Fri Mar 2 07:29:43 2007 From: thiago.venancio at gmail.com (Thiago Venancio) Date: Fri, 2 Mar 2007 09:29:43 -0300 Subject: [Bioperl-l] frac aligned query Message-ID: <44255ea80703020429m53a2eb7ek6588011bd8400a0a@mail.gmail.com> Hi Sendu and Chris, Sorry for mailing again, my previous email was blocked by the list (suspicious header). Thanks for the help. As I mentioned, I have updated my SearchUtils file from: http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Search/SearchUtils.pm I am also using the lates BioPerl version, installed from CPAN. Please find a buggy blast report attached. In this case, the frac_aligned_query() outputs "1.04", but I have others with " 1.57" for example. Just for a quantitative aspect, I got ">1" values in only 61 / 53,377. The line where I call the function is : print $result->query_name."\t".$hit->frac_aligned_query()."\t".$hit->frac_identical( 'query' )."\n"; Thiago On 3/2/07, Sendu Bala wrote: > > Chris Fields wrote: > > This is related to a reported bug: > > > > http://bugzilla.open-bio.org/show_bug.cgi?id=2193 > > > > The relevant code used to tile HSPs is a bit brittle and sometimes > > leads to errors like this. The error (which is actually a thrown > > exception) is wrapped in an eval block and converted to a warn for > > that reason. I'm not familiar with the tiling algorithm used, maybe > > Steve can add some input? > > Depending on what exactly you're talking about here, I may have > re-written that algorithm. 
Nice to know the bug survived ;) > -- "The way to get started is to quit talking and begin doing." Walt Disney ======================== Thiago Motta Venancio, MSc PhD student in Bioinformatics University of Sao Paulo ======================== -------------- next part -------------- BLASTN 2.2.6 [Apr-09-2003] Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Query= AEDES_02704.C (1069 letters) Database: aaegypti.SCAFFOLDS-MASKED.AEDES1.fa 4758 sequences; 1,383,971,543 total letters Searching..........done Score E Sequences producing significant alignments: (bits) Value supercontig:1:supercont1.157:1:2064756:1 supercontig supercont1.157 858 0.0 >supercontig:1:supercont1.157:1:2064756:1 supercontig supercont1.157 Length = 2064756 Score = 858 bits (433), Expect = 0.0 Identities = 448/453 (98%) Strand = Plus / Plus Query: 12 cattttaaatgcatatattgggtgccatcatgactgcctgactcctaaacttgacctcga 71 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 759400 cattttaaatgcatatattgggtgccatcatgactgcctgactcctaaacttgacctcga 759459 Query: 72 ggcctatattctatcccttcttacatgtagtggcttaatcctagattgctggtactcacg 131 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 759460 ggcctatattctatcccttcttacatgtagtggcttaatcctagattgctggtactcacg 759519 Query: 132 gcaccggctattatgctcccgccgcctctagctccaccagcctctgcatcctttctgacg 191 |||||||| |||||||||||||||||| |||||||||||||||| ||||||||||||||| Sbjct: 759520 gcaccggccattatgctcccgccgcctttagctccaccagcctccgcatcctttctgacg 759579 Query: 192 gtctgcgctcgccgtcgtcgcacgctctgtgctcctgcaccacctcagctgaagcggcta 251 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 759580 gtctgcgctcgccgtcgtcgcacgctctgtgctcctgcaccacctcagctgaagcggctg 759639 Query: 252 gctctcgacggtcattcaccgccgacgaccttgcactactgtgttgggaacccctggtcg 311 
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 759640 gctctcgacggtcattcaccgccgacgaccttgcactactgtgttgggaacccctggtcg 759699 Query: 312 tgagtcacagtccgctcttcctccgatgtgccaaatgtcaaacgctgatatggctacgga 371 |||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||| Sbjct: 759700 tgagtcacagtccgctcttcctccgatgtgtcaaatgtcaaacgctgatatggctacgga 759759 Query: 372 cgatcctccaccgcggtgttgggaacccctgagtgtagactcgctgggccttccttcgat 431 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 759760 cgatcctccaccgcggtgttgggaacccctgagtgtagactcgctgggccttccttcgat 759819 Query: 432 gagccaaagaacgaaactgcaacgaaaaaaccc 464 ||||||||||||||||||||||||||||||||| Sbjct: 759820 gagccaaagaacgaaactgcaacgaaaaaaccc 759852 Score = 803 bits (405), Expect = 0.0 Identities = 441/453 (97%) Strand = Plus / Plus Query: 12 cattttaaatgcatatattgggtgccatcatgactgcctgactcctaaacttgacctcga 71 ||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||| Sbjct: 768455 cattttaaatgcatatattgggtgccatcatgactacctgactcctaaacttgacctcga 768514 Query: 72 ggcctatattctatcccttcttacatgtagtggcttaatcctagattgctggtactcacg 131 ||||||||||||||| ||||||||||||||||||||||||||||||| |||||||||||| Sbjct: 768515 ggcctatattctatctcttcttacatgtagtggcttaatcctagatttctggtactcacg 768574 Query: 132 gcaccggctattatgctcccgccgcctctagctccaccagcctctgcatcctttctgacg 191 |||||||| |||||||||||||||||| |||||||||||||||| |||| |||||||||| Sbjct: 768575 gcaccggccattatgctcccgccgcctttagctccaccagcctccgcattctttctgacg 768634 Query: 192 gtctgcgctcgccgtcgtcgcacgctctgtgctcctgcaccacctcagctgaagcggcta 251 ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||| Sbjct: 768635 gtctgcgctcgccgtcgtcgcacgctctgtgctcctgcaccccctcagctgaagcggctg 768694 Query: 252 gctctcgacggtcattcaccgccgacgaccttgcactactgtgttgggaacccctggtcg 311 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 768695 gctctcgacggtcattcaccgccgacgaccttgcactactgtgttgggaacccctggtcg 768754 Query: 312 tgagtcacagtccgctcttcctccgatgtgccaaatgtcaaacgctgatatggctacgga 371 
||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||| Sbjct: 768755 tgagtcacagtccgctcttcctccgatgtgccaaatgtcaactgctgatatggctacgga 768814 Query: 372 cgatcctccaccgcggtgttgggaacccctgagtgtagactcgctgggccttccttcgat 431 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 768815 cgatcctccaccgcggtgttgggaacccctgagtgtagactcgctgggccttccttcgat 768874 Query: 432 gagccaaagaacgaaactgcaacgaaaaaaccc 464 |||||||||||| |||||||||||||||||||| Sbjct: 768875 gagccaaagaacaaaactgcaacgaaaaaaccc 768907 Score = 317 bits (160), Expect = 3e-84 Identities = 170/172 (98%), Gaps = 1/172 (0%) Strand = Plus / Plus Query: 899 ttcctttcaccgtatggccacgtgacgatgatgagctctggcgcctttcgcgtccggatg 958 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 769407 ttcctttcaccgtatggccacgtgacgatgatgagctctggcgcctttcgcgtccggatg 769466 Query: 959 ttttccgccacggccggtaacaactatgtaacctttcactatggaaaactgcaaaagttt 1018 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 769467 ttttccgccacggccggtaacaactatgtaacctttcactatggaaaactgcaaaagttc 769526 Query: 1019 tctttcgccggcttt-aacggcactaactttttgtggcaaaatccgcttatt 1069 ||||||||||||||| |||||||||||||||||||||||||||||||||||| Sbjct: 769527 tctttcgccggctttaaacggcactaactttttgtggcaaaatccgcttatt 769578 Score = 311 bits (157), Expect = 2e-82 Identities = 167/169 (98%), Gaps = 1/169 (0%) Strand = Plus / Plus Query: 902 ctttcaccgtatggccacgtgacgatgatgagctctggcgcctttcgcgtccggatgttt 961 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 760355 ctttcaccgtatggccacgtgacgatgatgagctctggcgcctttcgcgtccggatgttt 760414 Query: 962 tccgccacggccggtaacaactatgtaacctttcactatggaaaactgcaaaagttttct 1021 |||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||| Sbjct: 760415 tccgccacggccggtaacaactatgtaacctttcactatggaaaactgcaaaagttctct 760474 Query: 1022 ttcgccggcttt-aacggcactaactttttgtggcaaaatccgcttatt 1069 |||||||||||| |||||||||||||||||||||||||||||||||||| Sbjct: 760475 ttcgccggctttaaacggcactaactttttgtggcaaaatccgcttatt 760523 Score = 293 bits (148), 
Expect = 5e-77 Identities = 151/152 (99%) Strand = Plus / Plus Query: 587 ccccctttctttccgtcggaatcacgatccgagcagtagcgtcagcagacaatggatttg 646 ||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||| Sbjct: 769138 ccccctttctttccgtcggaatcgcgatccgagcagtagcgtcagcagacaatggatttg 769197 Query: 647 cggcgttcttattgttgaacaatgcgtgcccgtcctgatgccagcgatgatctcgtgatc 706 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 769198 cggcgttcttattgttgaacaatgcgtgcccgtcctgatgccagcgatgatctcgtgatc 769257 Query: 707 ggacaatcacgtcggtttcgaagcggttggcc 738 |||||||||||||||||||||||||||||||| Sbjct: 769258 ggacaatcacgtcggtttcgaagcggttggcc 769289 Score = 293 bits (148), Expect = 5e-77 Identities = 151/152 (99%) Strand = Plus / Plus Query: 587 ccccctttctttccgtcggaatcacgatccgagcagtagcgtcagcagacaatggatttg 646 ||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||| Sbjct: 760083 ccccctttctttccgtcggaatcgcgatccgagcagtagcgtcagcagacaatggatttg 760142 Query: 647 cggcgttcttattgttgaacaatgcgtgcccgtcctgatgccagcgatgatctcgtgatc 706 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 760143 cggcgttcttattgttgaacaatgcgtgcccgtcctgatgccagcgatgatctcgtgatc 760202 Query: 707 ggacaatcacgtcggtttcgaagcggttggcc 738 |||||||||||||||||||||||||||||||| Sbjct: 760203 ggacaatcacgtcggtttcgaagcggttggcc 760234 Score = 242 bits (122), Expect = 2e-61 Identities = 125/126 (99%) Strand = Plus / Plus Query: 463 cccaataaggggtttctcgtcgtgtgtgggtgcttcgagggaagcgacagtcaacctggc 522 |||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||| Sbjct: 768959 cccaataaggggtttctcgtcgtgtgtgggtgcttcgagggaagcgacagccaacctggc 769018 Query: 523 tcgcttcgaggtcctcctacggctcttgcagtggctatgcgctattggtggatgcaaaat 582 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 769019 tcgcttcgaggtcctcctacggctcttgcagtggctatgcgctattggtggatgcaaaat 769078 Query: 583 cgtccc 588 |||||| Sbjct: 769079 cgtccc 769084 Score = 242 bits (122), Expect = 2e-61 Identities = 125/126 (99%) Strand = Plus / Plus Query: 463 
cccaataaggggtttctcgtcgtgtgtgggtgcttcgagggaagcgacagtcaacctggc 522 |||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||| Sbjct: 759904 cccaataaggggtttctcgtcgtgtgtgggtgcttcgagggaagcgacagccaacctggc 759963 Query: 523 tcgcttcgaggtcctcctacggctcttgcagtggctatgcgctattggtggatgcaaaat 582 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 759964 tcgcttcgaggtcctcctacggctcttgcagtggctatgcgctattggtggatgcaaaat 760023 Query: 583 cgtccc 588 |||||| Sbjct: 760024 cgtccc 760029 Score = 123 bits (62), Expect = 1e-25 Identities = 65/66 (98%) Strand = Plus / Plus Query: 737 ccactctaaaatgcttcttcgtgccttctaggtcgtcgatatttgccgcgaaaaccgtga 796 |||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||| Sbjct: 769344 ccactctaaaatgcttcttcgtgccttctaggtcgttgatatttgccgcgaaaaccgtga 769403 Query: 797 tccttc 802 |||||| Sbjct: 769404 tccttc 769409 Score = 121 bits (61), Expect = 4e-25 Identities = 64/65 (98%) Strand = Plus / Plus Query: 737 ccactctaaaatgcttcttcgtgccttctaggtcgtcgatatttgccgcgaaaaccgtga 796 |||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||| Sbjct: 760289 ccactctaaaatgcttcttcgtgccttctaggtcgttgatatttgccgcgaaaaccgtga 760348 Query: 797 tcctt 801 ||||| Sbjct: 760349 tcctt 760353 Score = 105 bits (53), Expect = 2e-20 Identities = 68/73 (93%) Query: 806 gttaaaaataatgaagattacacgtcatctaaactccatttatgcgatgtaaacatgacg 865 ||||||||||||||||||||||| |||| |||||| ||||||||| ||| |||||||||| Sbjct: 1251522 gttaaaaataatgaagattacacatcatgtaaacttcatttatgcaatgcaaacatgacg 1251463 Query: 866 tcatgtaaattta 878 ||||||||||||| Sbjct: 1251462 tcatgtaaattta 1251450 Score = 97.6 bits (49), Expect = 6e-18 Identities = 70/77 (90%) Strand = Plus / Plus Query: 802 cacagttaaaaataatgaagattacacgtcatctaaactccatttatgcgatgtaaacat 861 ||||||||||| |||||||||||| |||| ||||||||| || |||||||||||||||| Sbjct: 1251086 cacagttaaaactaatgaagattaaacgttatctaaactttatatatgcgatgtaaacat 1251145 Query: 862 gacgtcatgtaaattta 878 || |||||||||||||| Sbjct: 1251146 gaagtcatgtaaattta 1251162 Score = 61.9 bits (31), Expect = 
3e-07
 Identities = 37/39 (94%)
 Strand = Plus / Minus

Query: 802     cacagttaaaaataatgaagattacacgtcatctaaact 840
               |||||| ||||||||||||||||||||||||| ||||||
Sbjct: 1601368 cacagtaaaaaataatgaagattacacgtcatgtaaact 1601330

From cjfields at uiuc.edu Fri Mar 2 09:35:34 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Mar 2007 08:35:34 -0600
Subject: [Bioperl-l] Bio::SeqIO::FTHelper
In-Reply-To: <200703020808.15664.heikki@sanbi.ac.za>
References: <200703011302.30855.heikki@sanbi.ac.za> <8AA2B586-4C2B-4379-BFF0-0CEB15FAE68E@uiuc.edu> <200703020808.15664.heikki@sanbi.ac.za>
Message-ID: <7EC38884-D3E1-470D-9FAC-548797433B9D@uiuc.edu>

The current parsers are slightly faster, but not enough to make a huge
difference unless you're parsing thousands of sequences. However, it
does demonstrate that a good deal of the performance issues stem from
object creation and not parsing, an issue that is already known. For
instance, if you do everything up to (but skip) instantiation of an
object, like a SeqFeature/Annotation/Species, the parsing speeds up
dramatically dependent on the number of objects created. I also saw
significant increases in speed when using FTHelper (instead of
SeqFeatures) or Bio::Taxon (instead of Bio::Species), so lighter
objects definitely help.

I basically just separate the two key steps into two distinct tasks
(driver and handler); I haven't thought much about validation though I
would probably separate that into a third task. Regardless, the current
drivers are flexible enough to deal with the occasional oddity and not
die.

It's much easier to maintain and extend; for instance if you wanted to
develop lightweight objects it's now easier to accomplish (i.e.
rewrite/overload a handler vs. rewrite next_seq()), and you can
separately develop a faster driver via next_seq() as long as it threw
the same data structure. Multiple parsers can also use the same handler.
I currently have GenBank/EMBL/SwissProt all sharing the same handler and
passing all tests.
chris

On Mar 2, 2007, at 12:08 AM, Heikki Lehvaslaiho wrote:

> This sounds great. Is the speed increase noticeable?
>
> -Heikki
>
>
> On Thursday 01 March 2007 17:24:03 Chris Fields wrote:
>> I do have a rough outline of what I think could be done:
>>
>> http://www.bioperl.org/wiki/Handler-based_SeqIO_parsers
>>
>> where you could switch out handlers to deal with incoming data
>> chunks. Any suggestions there are welcome.
>>
>> I'll probably commit examples of the above in the next week or two
>> (GenBank, EMBL, Swiss parsers using the same handlers) which don't
>> use FTHelper. So far I have all three passing tests based on
>> genbank/embl/swiss.t but they need a few more tweaks before I commit.
>>
>> chris
>>
>> On Mar 1, 2007, at 5:02 AM, Heikki Lehvaslaiho wrote:
>>> Chris,
>>>
>>> It was meant to collect code that was common to all three main
>>> databases using similar feature tables.
>>>
>>> Now might be the time to optimise the parsing speed by removing it.
>>> Do you have a plan how to do it?
>>>
>>> -Heikki
>>>
>>> On Tuesday 27 February 2007 22:57:40 Chris Fields wrote:
>>>> Could anyone tell me what FTHelper is used for? From what I gather
>>>> it rolls up seqfeature data into a lightweight object but then
>>>> creates a SeqFeature::Generic anyway (at least for GenBank/EMBL/
>>>> Swiss), which seems to be a waste of memory and time. Is there
>>>> something I'm missing (besides my sanity of course)?
>>>>
>>>> chris
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> --
>>> ______ _/ _/_____________________________________________________
>>> _/ _/
>>> _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za
>>> _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho
>>> _/ _/ _/ SANBI, South African National Bioinformatics Institute
>>> _/ _/ _/ University of Western Cape, South Africa
>>> _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
>>> ___ _/_/_/_/_/________________________________________________________
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>
>
>
> --
> ______ _/ _/_____________________________________________________
> _/ _/
> _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za
> _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho
> _/ _/ _/ SANBI, South African National Bioinformatics Institute
> _/ _/ _/ University of Western Cape, South Africa
> _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From Kevin.M.Brown at asu.edu Fri Mar 2 10:21:16 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Fri, 2 Mar 2007 08:21:16 -0700
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <6dce9a0b0703011239m7065a570l6d19e6c7065fca45@mail.gmail.com>
References: <45E5F43C.9080902@sendu.me.uk><1A4207F8295607498283FE9E93B775B402CC9D60@EX02.asurite.ad.asu.edu><45E7335F.8070102@bms.com> <45E7359E.5030104@bms.com> <6dce9a0b0703011239m7065a570l6d19e6c7065fca45@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B402CC9E79@EX02.asurite.ad.asu.edu>

> You need to have GD::SVG installed and then instantiate the
> panel with:
> -image_class=>'GD::SVG'

If this is the case, then why have an SVG method in Bio::Graphics::Panel
if it doesn't do this for you? Either the method should be removed and
the normal $panel->gd method should be called to get an image out, or
calling that method should set up and create the SVG for the user.
Either way, I don't see anything in the documentation or wiki that
points out this "gotcha".

From stefan.kirov at bms.com Fri Mar 2 10:41:34 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Fri, 02 Mar 2007 10:41:34 -0500
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <1A4207F8295607498283FE9E93B775B402CC9E79@EX02.asurite.ad.asu.edu>
References: <45E5F43C.9080902@sendu.me.uk> <1A4207F8295607498283FE9E93B775B402CC9D60@EX02.asurite.ad.asu.edu> <45E7335F.8070102@bms.com> <45E7359E.5030104@bms.com> <6dce9a0b0703011239m7065a570l6d19e6c7065fca45@mail.gmail.com> <1A4207F8295607498283FE9E93B775B402CC9E79@EX02.asurite.ad.asu.edu>
Message-ID: <45E845AE.6060400@bms.com>

Kevin Brown wrote:
>> You need to have GD::SVG installed and then instantiate the
>> panel with:
>> -image_class=>'GD::SVG'
>>
>
> If this is the case, then why have an SVG method in Bio::Graphics::Panel
> if it doesn't do this for you.
> Either the method should be removed and
> the normal $panel->gd method should be called to get an image out or
> calling that method should setup and create the SVG for the user.
> Either way I don't see anything in the documentation or wiki that points
> out this "gotcha".
>
>
I don't think it is that easy, since you cannot simply switch between
graphics libraries, but perhaps the svg method should check the class
that was used and throw an error if it is not GD::SVG.

Stefan

From bix at sendu.me.uk Fri Mar 2 11:05:16 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 02 Mar 2007 16:05:16 +0000
Subject: [Bioperl-l] frac_aligned_query returning results >1.
In-Reply-To: <44255ea80703020336y7e423b94rd07acd380fe4b8fd@mail.gmail.com>
References: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com> <8D7B0767-46A6-4083-B94E-D6490B241B84@uiuc.edu> <45E809A5.9060407@sendu.me.uk> <44255ea80703020336y7e423b94rd07acd380fe4b8fd@mail.gmail.com>
Message-ID: <45E84B3C.5000402@sendu.me.uk>

Thiago Venancio wrote:
> Hi Sendu and Chris,
>
> Thanks for the help.
> As I mentioned, I have updated my SearchUtils file from:
> http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Search/SearchUtils.pm
>
>
> I am also using the latest BioPerl version, installed from CPAN.
>
> Please find a buggy blast report attached.
> In this case, the frac_aligned_query() outputs "1.04", but I have others
> with "1.57" for example.
>
> Just for a quantitative aspect, I got ">1" values in only 61 / 53,377.

Many thanks for that.

I've committed another fix for SearchUtils so please get revision 1.23
and try again. Hopefully all 61 will no longer be >1, but if any are,
please send me sample blast files again.

For anyone interested, the bug was due to a completely unbelievable
oversight on my part in the contig merging algorithm: I forgot to deal
with contigs that were fully contained by others. Wow!
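
The kind of oversight described in the message above (ranges that lie
entirely inside another range) can be illustrated with a few lines of
plain Perl. This is only a minimal sketch and not the SearchUtils.pm
code: it assumes HSP query ranges are 1-based, inclusive [start, end]
pairs, and the helper names merge_ranges, frac_covered and frac_naive
are hypothetical. The ranges are query coordinates copied from some of
the HSPs in the BLAST report posted earlier in this thread (query
length 1069).

```perl
use strict;
use warnings;

# Merge [start, end] ranges (1-based, inclusive) into non-overlapping blocks.
# A range that overlaps the previous block, or lies entirely inside it,
# only extends the block when it reaches further to the right.
sub merge_ranges {
    my @sorted = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my @merged;
    for my $range (@sorted) {
        my ($start, $end) = @$range;
        if (@merged && $start <= $merged[-1][1]) {
            $merged[-1][1] = $end if $end > $merged[-1][1];
        }
        else {
            push @merged, [$start, $end];
        }
    }
    return @merged;
}

# Fraction of a sequence of the given length covered by the merged ranges.
sub frac_covered {
    my ($length, @ranges) = @_;
    my $covered = 0;
    $covered += $_->[1] - $_->[0] + 1 for merge_ranges(@ranges);
    return $covered / $length;
}

# Naive version for comparison: sums range lengths without merging,
# so overlapping or contained ranges are counted more than once.
sub frac_naive {
    my ($length, @ranges) = @_;
    my $total = 0;
    $total += $_->[1] - $_->[0] + 1 for @ranges;
    return $total / $length;
}

# Query coordinates of a subset of the HSPs in the posted report.
my @hsps = (
    [12, 464],  [463, 588], [587, 738], [737, 802], [737, 801],
    [802, 878], [806, 878], [899, 1069], [902, 1069],
);

printf "naive:  %.2f\n", frac_naive(1069, @hsps);
printf "merged: %.2f\n", frac_covered(1069, @hsps);
```

With merging, a contained range such as [737, 801] inside [737, 802]
adds nothing to the total, so the fraction cannot exceed 1; the naive
sum counts it again and can.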
From johnsonm at gmail.com Fri Mar 2 11:10:55 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Fri, 2 Mar 2007 10:10:55 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
In-Reply-To: <45E75C8E.7010809@sendu.me.uk>
References: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu> <45E61AA9.9030906@sendu.me.uk> <45E75C8E.7010809@sendu.me.uk>
Message-ID:

> I think a 4in1 would still be possible. Presumably at some point you
> know which one you will run, so let the user set everything in the
> single new() even if it doesn't make sense, but then form argument
> strings with

Something like that occurred to me while driving home last night. That
ought to separate things cleanly enough, especially if I validate the
options against the selected program. I wasn't really thrilled with the
idea of code duplication between multiple modules, either.

> All that said, if these Glimmer things are different programs with
> different uses (and not simply different versions of the same thing with
> the same function), by all means make separate modules.
>

It's a 'family' of gene predictors, two eukaryotic, two prokaryotic.
They're just similar enough to need similar solutions, and just
different enough to be slightly annoying. 8)

From thiago.venancio at gmail.com Fri Mar 2 11:14:20 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Fri, 2 Mar 2007 13:14:20 -0300
Subject: [Bioperl-l] frac_aligned_query returning results >1.
In-Reply-To: <45E84B3C.5000402@sendu.me.uk>
References: <44255ea80703011002l245e9576s66319ee695d3bd5b@mail.gmail.com> <8D7B0767-46A6-4083-B94E-D6490B241B84@uiuc.edu> <45E809A5.9060407@sendu.me.uk> <44255ea80703020336y7e423b94rd07acd380fe4b8fd@mail.gmail.com> <45E84B3C.5000402@sendu.me.uk>
Message-ID: <44255ea80703020814v31495221i72f65db532c0dd9b@mail.gmail.com>

Hi Sendu,

Great to know you fixed the problem.
I have updated SearchUtils and it seems to be correct now.

Best!
Thiago

On 3/2/07, Sendu Bala wrote:
>
> Thiago Venancio wrote:
> > Hi Sendu and Chris,
> >
> > Thanks for the help.
> > As I mentioned, I have updated my SearchUtils file from:
> > http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Search/SearchUtils.pm
> >
> >
> > I am also using the latest BioPerl version, installed from CPAN.
> >
> > Please find a buggy blast report attached.
> > In this case, the frac_aligned_query() outputs "1.04", but I have others
> > with "1.57" for example.
> >
> > Just for a quantitative aspect, I got ">1" values in only 61 / 53,377.
>
> Many thanks for that.
>
> I've committed another fix for SearchUtils so please get revision 1.23
> and try again. Hopefully all 61 will no longer be >1, but if any are
> please send me sample blast files again.
>
> For anyone interested, the bug was due to a completely unbelievable
> oversight on my part in the contig merging algorithm: I forgot to deal
> with contigs that were fully contained by others. Wow!
>

--
"The way to get started is to quit talking and begin doing."
Walt Disney
========================
Thiago Motta Venancio, MSc
PhD student in Bioinformatics
University of Sao Paulo
========================

From johnsonm at gmail.com Fri Mar 2 11:15:34 2007
From: johnsonm at gmail.com (Mark Johnso