From bioperlanand at yahoo.com Mon May 1 14:36:20 2006
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Mon, 1 May 2006 11:36:20 -0700 (PDT)
Subject: [Bioperl-l] how to obtain GIs from clone_ids
Message-ID: <20060501183620.85791.qmail@web37901.mail.mud.yahoo.com>

Hi everybody,

I have a file containing clone_ids (from the Features annotation
section of a GenBank entry)

------------------------------------------------------------
FEATURES             Location/Qualifiers
     source          1..707
                     /clone="C0005918b04"
------------------------------------------------------------

Is there a way in Bioperl to send a query over the internet (one
clone_id at a time) and get out just the GI number for that clone_id?

Any suggestions..

Thanks in advance.
Anand

From cuiw at mail.nih.gov Mon May 1 15:39:01 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Mon, 1 May 2006 15:39:01 -0400
Subject: [Bioperl-l] how to obtain GIs from clone_ids
In-Reply-To: <20060501183620.85791.qmail@web37901.mail.mud.yahoo.com>
Message-ID:

use strict;
use Bio::DB::Query::GenBank;

# Search the nucleotide database for the clone id
my $query_string = 'EST["C0005918b04"]';
my $query = Bio::DB::Query::GenBank->new(-db    => 'nucleotide',
                                         -query => $query_string,
                                        );
my $count = $query->count;
my @ids   = $query->ids;

# Print one GI per line
for (@ids) {
    print "$_\n";
}

-----Original Message-----
From: Anand Venkatraman [mailto:bioperlanand at yahoo.com]
Sent: Monday, May 01, 2006 2:36 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] how to obtain GIs from clone_ids

Hi everybody,

I have a file containing clone_ids (from the Features annotation
section of a GenBank entry)

------------------------------------------------------------
FEATURES             Location/Qualifiers
     source          1..707
                     /clone="C0005918b04"
------------------------------------------------------------

Is there a way in Bioperl to send a query over the internet (one
clone_id at a time) and get out just the
GI number for that clone_id?

Any suggestions..

Thanks in advance.
Anand

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From s.ryazansky at gmail.com Mon May 1 17:55:13 2006
From: s.ryazansky at gmail.com (Sergei Ryazansky)
Date: Mon, 1 May 2006 21:55:13 +0000 (UTC)
Subject: [Bioperl-l] blast program to run locally on windows
References: <007c01c66883$61f29490$15327e82@pyrimidine>
    <20060425215433.35436.qmail@web36613.mail.mud.yahoo.com>
Message-ID:

Hi,
Can you post your formatdb.log file here?

From cjfields at uiuc.edu Tue May 2 00:15:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 1 May 2006 23:15:19 -0500
Subject: [Bioperl-l] blast program to run locally on windows
In-Reply-To:
References: <007c01c66883$61f29490$15327e82@pyrimidine>
    <20060425215433.35436.qmail@web36613.mail.mud.yahoo.com>
Message-ID:

We managed to work our way through it. He hadn't set ncbi.ini to the
correct directories; the database was formatted correctly.

Chris

On May 1, 2006, at 4:55 PM, Sergei Ryazansky wrote:

> Hi,
> Can you post your formatdb.log file here?

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Tue May 2 12:19:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 2 May 2006 11:19:34 -0500
Subject: [Bioperl-l] Bio::DB::GenBank and complexity
Message-ID: <000901c66e04$33e07370$15327e82@pyrimidine>

I ran into some wonkiness when using extra parameters ('seq_start',
'seq_stop', 'strand', and 'complexity') with Bio::DB::GenBank, which I have
gone through, fixed, and committed. I have also added a few tests to DB.t
for everything (all changes were in Bio::DB::WebDBSeqI and
Bio::DB::NCBIHelper). The 'complexity' tag is the strangest, though I did
manage to get it added as well (with tests). This is how NCBI defines
complexity:

complexity regulates the display:
  0 - get the whole blob
  1 - get the bioseq for gi of interest (default in Entrez)
  2 - get the minimal bioseq-set containing the gi of interest
  3 - get the minimal nuc-prot containing the gi of interest
  4 - get the minimal pub-set containing the gi of interest

Here's my quandary: when setting complexity to '0', you get a glob back (the
main sequence as well as any subsequences, such as CDS); this is in essence
a sequence stream with multiple alphabet types. So, I now have it set up to
do this:

my $factory = Bio::DB::GenBank->new(-format     => 'fasta',
                                    -complexity => 0
                                   );

my $seqin = $factory->get_Seq_by_acc($acc);

while (my $seq = $seqin->next_seq) {
    $seqout->write_seq($seq);
}

since I thought returning an array would be horrendously expensive on
memory, esp. with larger sequences. Currently this is only set up for
sequences which are retrieved when complexity is set to '0', so it's a
pretty unique case. Regardless, I'm worried that, since users expect a
Bio::Seq object instead of a Bio::SeqIO object here, it will cause a lot of
confusion with the API. Any suggestions/gripes?

Chris

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept.
of Biochemistry
University of Illinois Urbana-Champaign

From mamillerpa at yahoo.com Tue May 2 07:41:01 2006
From: mamillerpa at yahoo.com (Mark A. Miller)
Date: Tue, 2 May 2006 04:41:01 -0700 (PDT)
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines
Message-ID: <20060502114101.29745.qmail@web50409.mail.yahoo.com>

Hello all.

I have a recently downloaded UniProt/TrEMBL flat file. I am trying to
make FASTA subset files for some bacterial strains. I haven't been
able to parse out the strain information from the OS or RC lines.
These lines typically look like:

OS   Somegenus somespecies subsp. somesubspecies strain ABC123.
RC   STRAIN=ABC123.

I'm not especially good with Perl, and I'm definitely weak when it
comes to OOP.

I have included some code I pasted together from various pages on the
bioperl wiki. In addition to the wiki, I have been making use of
www.pasteur.fr/recherche/unites/sis/formation/bioperl/ch02s02.html

The code I have so far reports the species but not the subspecies or
variant. I have also tried to walk through all of the feature,
annotation and reference objects, but I still can't seem to parse out
the information I need. (For brevity, the example I'm including below
only lists the code I used for the annotation objects.) Also, this
code only prints the information... I know that I'll have to write a
FASTA sequence object separately.

Any suggestions?
Thanks,
Mark

--- --- ---

#!/usr/bin/perl

use Bio::SeqIO;

my $usage = "getaccs.pl file format\n";
my $file = shift or die $usage;
my $format = shift or die $usage;

my $inseq = Bio::SeqIO->new(-file   => "<$file",
                            -format => $format );

while (my $seq = $inseq->next_seq) {

    my $species_object = $seq->species;
    my $species_string = $species_object->species;
    my $variant_string = $species_object->variant;
    my $common_string = $species_object->common_name;
    my $sub_string = $species_object->sub_species;
    my $binomial = $species_object->binomial('FULL');

    print "display ",$seq->display_id,"\n";
    print "accession ",$seq->accession_number,"\n";
    print "desc ",$seq->desc,"\n";

    print "species ",$species_string,"\n";
    print "variant ",$variant_string,"\n";
    print "common ",$common_string,"\n";
    print "sub ",$sub_string,"\n";
    print "binomial ",$binomial,"\n";

    print $seq->seq,"\n";

    my $anno_collection = $seq->annotation;
    for my $key ( $anno_collection->get_all_annotation_keys ) {
        my @annotations = $anno_collection->get_Annotations($key);
        for my $value ( @annotations ) {
            print "tagname : ", $value->tagname, "\n";
            # $value is an Bio::Annotation, and has an "as_text" method
            print " annotation value: ", $value->as_text, "\n";

            if ($value->tagname eq "reference") {
                my $hash_ref = $value->hash_tree;
                for my $key (keys %{$hash_ref}) {
                    print $key,": ",$hash_ref->{$key},"\n";
                }
            }
        }
    }
    print "\n";
}
exit;

--- --- --- --- --- --- --- ---

Mark A. Miller

From cjfields at uiuc.edu Tue May 2 14:01:58 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 2 May 2006 13:01:58 -0500
Subject: [Bioperl-l] Bio::DB::GenBank and complexity
In-Reply-To: <000901c66e04$33e07370$15327e82@pyrimidine>
Message-ID: <000a01c66e12$8131a960$15327e82@pyrimidine>

I hate responding to my own post!
Just wanted to add that I'm adding a warning to the get_Seq* methods that
tells users to use the appropriate get_Stream* method when complexity == 0;
the warning is issued before the Bio::SeqIO object is returned.

CJF

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Tuesday, May 02, 2006 11:20 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::DB::GenBank and complexity
>
> I ran into some wonkiness with using extra parameters ('seq_start',
> 'seq_stop', 'strand', and 'complexity') with Bio::DB::GenBank that I have
> gone through, fixed, and committed. I also have added a few tests to DB.t
> for everything (all changes were in Bio::DB::WebDBSeqI and
> Bio::DB::NCBIHelper). The 'complexity' tag is the strangest, though I did
> manage to get it added as well (with tests). This is how NCBI defines
> complexity:
>
> complexity regulates the display:
>   0 - get the whole blob
>   1 - get the bioseq for gi of interest (default in Entrez)
>   2 - get the minimal bioseq-set containing the gi of interest
>   3 - get the minimal nuc-prot containing the gi of interest
>   4 - get the minimal pub-set containing the gi of interest
>
> Here's my quandary; when setting complexity to '0', you get a glob back
> (the main sequence as well as any subsequences, such as CDS); this is in
> essence a sequence stream with multiple alphabet types. So, I now have it
> set up to do this:
>
> my $factory = Bio::DB::GenBank->new(-format     => 'fasta',
>                                     -complexity => 0
>                                    );
>
> my $seqin = $factory->get_Seq_by_acc($acc);
>
> while (my $seq = $seqin->next_seq) {
>     $seqout->write_seq($seq);
> }
>
> since I thought returning an array would be horrendously expensive on
> memory, esp. with larger sequences. Currently this is only set up for
> sequences which are retrieved when complexity is set to '0' so it's a
> pretty unique case.
> Regardless, I'm worried that, since users expect a Bio::Seq
> object instead of a Bio::SeqIO object here, it will cause a lot of
> confusion with the API. Any suggestions/gripes?
>
> Chris
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign

From jason.stajich at duke.edu Tue May 2 14:36:08 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 2 May 2006 14:36:08 -0400
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines
In-Reply-To: <20060502114101.29745.qmail@web50409.mail.yahoo.com>
References: <20060502114101.29745.qmail@web50409.mail.yahoo.com>
Message-ID: <7B49D031-9F74-43C3-AA4F-2AE115BB843D@duke.edu>

This is really a limitation of the EMBL/GenBank format.

See this thread:
http://lists.open-bio.org/pipermail/bioperl-l/2006-March/021068.html
or on GMANE:
http://comments.gmane.org/gmane.comp.lang.perl.bio.general/10557

I don't know if any of this has really been resolved, so hopefully James
will speak up if he's implemented anything.

-jason

On May 2, 2006, at 7:41 AM, Mark A. Miller wrote:

> Hello all.
>
> I have a recently downloaded UniProt/TrEMBL flat file. I am trying to
> make FASTA subset files for some bacterial strains. I haven't been
> able to parse out the strain information from the OS or RC lines.
> These lines typically look like:
>
> OS   Somegenus somespecies subsp. somesubspecies strain ABC123.
> RC   STRAIN=ABC123.
>
> I'm not especially good with Perl, and I'm definitely weak when it
> comes to OOP.
>
> I have included some code I pasted together from various pages on the
> bioperl wiki.
> In addition to the wiki, I have been making use of
> www.pasteur.fr/recherche/unites/sis/formation/bioperl/ch02s02.html
>
> The code I have so far reports the species but not the subspecies or
> variant. I have also tried to walk through all of the feature,
> annotation and reference objects but I still can't seem to parse out
> the information I need. (For brevity, the example I'm including below
> only lists the code I used for the annotation objects.) Also, this
> code only prints the information... I know that I'll have to write a
> FASTA sequence object separately.
>
> Any suggestions?
>
> Thanks,
> Mark
>
> --- --- ---
>
> > #!/usr/bin/perl
> >
> > use Bio::SeqIO;
> >
> > my $usage = "getaccs.pl file format\n";
> > my $file = shift or die $usage;
> > my $format = shift or die $usage;
> >
> > my $inseq = Bio::SeqIO->new(-file   => "<$file",
> >                             -format => $format );
> >
> > while (my $seq = $inseq->next_seq) {
> >
> >     my $species_object = $seq->species;
> >     my $species_string = $species_object->species;
> >     my $variant_string = $species_object->variant;
> >     my $common_string = $species_object->common_name;
> >     my $sub_string = $species_object->sub_species;
> >     my $binomial = $species_object->binomial('FULL');
> >
> >     print "display ",$seq->display_id,"\n";
> >     print "accession ",$seq->accession_number,"\n";
> >     print "desc ",$seq->desc,"\n";
> >
> >     print "species ",$species_string,"\n";
> >     print "variant ",$variant_string,"\n";
> >     print "common ",$common_string,"\n";
> >     print "sub ",$sub_string,"\n";
> >     print "binomial ",$binomial,"\n";
> >
> >     print $seq->seq,"\n";
> >
> >     my $anno_collection = $seq->annotation;
> >     for my $key ( $anno_collection->get_all_annotation_keys ) {
> >         my @annotations = $anno_collection->get_Annotations($key);
> >         for my $value ( @annotations ) {
> >             print "tagname : ", $value->tagname, "\n";
> >             # $value is an Bio::Annotation, and has an "as_text" method
> >             print " annotation value: ", $value->as_text, "\n";
> >
> >             if
> >                 ($value->tagname eq "reference") {
> >                 my $hash_ref = $value->hash_tree;
> >                 for my $key (keys %{$hash_ref}) {
> >                     print $key,": ",$hash_ref->{$key},"\n";
> >                 }
> >             }
> >         }
> >     }
> >     print "\n";
> > }
> > exit;
> >
> > --- --- --- --- --- --- --- ---
>
> Mark A. Miller

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From mblanche at berkeley.edu Tue May 2 15:30:49 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Tue, 02 May 2006 12:30:49 -0700
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
Message-ID:

Dear all--

I have been trying to use the intersection function to extract the
overlapping region from alternatively spliced exons, as in the following
script. The object returned from 'my $overlap = $exon1->intersection($exon2);'
is actually losing the strand of $exon1 if $exon1 is from the negative
strand. Is this behavior expected? Should I check the strand of $exon1 before
working on the object returned by any Bio::RangeI function?
Many thanks

#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GFF;

MAIN:{

    my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
                                -dsn     => 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
                                -user    => 'guest');
    my $test_db = $db->segment('4');

    # Load up the exons into $exons_p
    for my $gene ($test_db->features(-types => 'gene')){

        my $exons_p = extractExons($gene);

        cluster($exons_p) unless ($#{$exons_p} == -1);

    }
}

sub extractExons {
    my $gene = shift;
    my %ex_list;
    my @tcs = $gene->features( -type       => 'processed_transcript',
                               -attributes => {Gene => $gene->group});

    for my $tc (@tcs){
        my @exons = $tc->features (-type       => 'exon',
                                   -attributes => {Parent => $tc->group}
                                  );

        for (@exons){
            my $ex_id = $_->id;
            $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});

        }

    }
    my @values = values %ex_list;
    return(\@values);
}

sub cluster {
    my $exons_p = shift;

    for (my $s = 0; $s <= $#{$exons_p}; $s++){
        for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
            my $exon1 = $exons_p->[$s];
            my $exon2 = $exons_p->[$t];

            if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){

                my $overlap = $exon1->intersection($exon2);

                print "===\n";;
                print "ex1\n", $exon1->seq, "\n";
                print "ex2\n", $exon2->seq, "\n";
                print "overlap\n", $overlap->seq, "\n";
            }
        }
    }
}
______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062

--

From osborne1 at optonline.net Tue May 2 16:17:29 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 02 May 2006 16:17:29 -0400
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To:
Message-ID:

Marco,

Yes, this is how intersection() is supposed to work. If both of the Range
objects have the same strand then the strand information is returned as part
of the result but if they aren't on the same strand then no strand
information is returned.

Brian O.
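
[The strand rule described above can be sketched in plain Perl. This is
only an illustration under stated assumptions, not the Bio::RangeI code:
ranges are plain hashes with made-up keys (start, end, strand),
intersect_ranges is a hypothetical helper, and the coordinates are invented.]

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Plain-Perl illustration of the strand rule; NOT the BioPerl implementation.
# A range is a hash ref with hypothetical keys start/end/strand,
# where strand is 1, -1 or 0 (0 meaning "no strand information").
sub intersect_ranges {
    my ($r1, $r2) = @_;

    # The overlap runs from the larger start to the smaller end.
    my $start = $r1->{start} > $r2->{start} ? $r1->{start} : $r2->{start};
    my $end   = $r1->{end}   < $r2->{end}   ? $r1->{end}   : $r2->{end};
    return if $start > $end;    # the ranges do not overlap

    # Keep the strand only when both inputs agree; otherwise report 0.
    my $strand = $r1->{strand} == $r2->{strand} ? $r1->{strand} : 0;

    return { start => $start, end => $end, strand => $strand };
}

my $minus_a = { start => 100, end => 200, strand => -1 };
my $minus_b = { start => 150, end => 260, strand => -1 };
my $plus_c  = { start => 180, end => 300, strand =>  1 };

my $same  = intersect_ranges($minus_a, $minus_b);
my $mixed = intersect_ranges($minus_a, $plus_c);

printf "same strand:  %d..%d strand %d\n", @{$same}{qw(start end strand)};
printf "mixed strand: %d..%d strand %d\n", @{$mixed}{qw(start end strand)};
```

[With Bioperl objects, a comparable check would be to compare
$overlap->strand with $exon1->strand before relying on $overlap->seq.]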

On 5/2/06 3:30 PM, "Marco Blanchette" wrote:

> Dear all--
>
> I have been trying to use the intersection function to extract overlapping
> region from alternatively spliced exons as in the following script. The
> returned object from the 'my $overlap = $exon1->intersection($exon2);' is
> actually losing the strand of $exon1 if $exon1 is from the negative strand.
> Is this behavior expected? Should I check the strand of $exon1 before
> working on the object returned by any Bio::RangeI function?
>
> Many thanks
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Bio::DB::GFF;
>
> MAIN:{
>
>     my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>                                 -dsn => 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
>                                 -user => 'guest');
>     my $test_db = $db->segment('4');
>
>     # Load up the exons into $exons_p
>     for my $gene ($test_db->features(-types => 'gene')){
>
>         my $exons_p = extractExons($gene);
>
>         cluster($exons_p) unless ($#{$exons_p} == -1);
>
>     }
> }
>
> sub extractExons {
>     my $gene = shift;
>     my %ex_list;
>     my @tcs = $gene->features( -type =>'processed_transcript',
>                                -attributes =>{Gene => $gene->group});
>
>     for my $tc (@tcs){
>         my @exons = $tc->features (-type => 'exon',
>                                    -attributes => {Parent => $tc->group}
>                                   );
>
>         for (@exons){
>             my $ex_id = $_->id;
>             $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
>
>         }
>
>     }
>     my @values = values %ex_list;
>     return(\@values);
> }
>
> sub cluster {
>     my $exons_p = shift;
>
>     for (my $s = 0; $s <= $#{$exons_p}; $s++){
>         for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
>             my $exon1 = $exons_p->[$s];
>             my $exon2 = $exons_p->[$t];
>
>             if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
>
>                 my $overlap = $exon1->intersection($exon2);
>
>                 print "===\n";;
>                 print "ex1\n", $exon1->seq, "\n";
>                 print "ex2\n", $exon2->seq, "\n";
>                 print "overlap\n", $overlap->seq, "\n";
>             }
>         }
>     }
> }
> ______________________________
> Marco Blanchette, Ph.D.
>
> mblanche at uclink.berkeley.edu
>
> Donald C.
> Rio's lab
> Department of Molecular and Cell Biology
> 16 Barker Hall
> University of California
> Berkeley, CA 94720-3204
>
> Tel: (510) 642-1084
> Cell: (510) 847-0996
> Fax: (510) 642-6062

From mblanche at berkeley.edu Tue May 2 16:32:58 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Tue, 02 May 2006 13:32:58 -0700
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To:
Message-ID:

Brian--

Even when both arguments to intersection() are from the negative strand, the
returned object is on the positive strand, and $overlap is actually the
reverse complement of the intersection between the 2 exons. Here is part
of the output from the script below:

===
ex1     Strand: -1
CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTG
ex2     Strand: -1
CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTGGTACGATGTCAAAGCTCCGAATATGTTTCAAACCCGT
CAAATCG
overlap Strand: 1
CAGTCCTTGCGAGAAAACGGGTCCACCACCTTCTTCTTACCGCCCTTCTTACCACCCTTGGAAAGACCTTTATTTT
TGCCGACTGCCATGTTCAACTAATAAACCGG
AAAAGGTCGAATCACGTTGACGACGTATGTGGAAAAAAG
...

If both are from the positive strand, the returned object is positive, as in:

===
ex1     Strand: 1
CAACGCAGACGTGGTACGGCGTTTTAAATCTGATAACATTTTGAACCGGGAATTATTTTAGAGTACCATTCTTTGT
TTTGTGCCTGTTTCAGTATAAATTAATTATG
CGCCTGATTTAAAGTACAAAATGTGTAAATATATCACCTTACCGTCGCGGGTGCACCCAATTGTGCTTTGATGAAT
AAATATACATATATGCAACATATATAACTTC
CTGTGTTAGTATAAGTGTATGTCAGCCAAAAACAAATATATATATGAGTGTTTATCGGCATTCGTGTGCTGGCAGA
GCAGCGATCAAAGCTGCGTTCGGTACTCGTT
GACTGGCCCAAGAATGAATTCTCGTGCAAGTGTGTTGATAAAAAGTATACGTATGTAT
ex2     Strand: 1
ATCGACAGTTGCCATCGTCGTTATTCCAGCACTAATTTAAAAAAAATTCGATCAACGCAGACGTG
overlap Strand: 1
CAACGCAGACGTG

Is there something I am missing? Here is the script generating the output.

Many thanks all...
Marco

use strict;
use warnings;
use Bio::DB::GFF;

MAIN:{

    my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
                                -dsn     => 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
                                -user    => 'guest');
    my $test_db = $db->segment('4');

    # Load up the exons into $exons_p
    for my $gene ($test_db->features(-types => 'gene')){

        my $exons_p = extractExons($gene);

        cluster($exons_p) unless ($#{$exons_p} == -1);

    }
}

sub extractExons {
    my $gene = shift;
    my %ex_list;
    my @tcs = $gene->features( -type       => 'processed_transcript',
                               -attributes => {Gene => $gene->group});

    for my $tc (@tcs){
        my @exons = $tc->features (-type       => 'exon',
                                   -attributes => {Parent => $tc->group}
                                  );

        for (@exons){
            my $ex_id = $_->id;
            $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});

        }

    }
    my @values = values %ex_list;
    return(\@values);
}

sub cluster {
    my $exons_p = shift;

    for (my $s = 0; $s <= $#{$exons_p}; $s++){
        for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
            my $exon1 = $exons_p->[$s];
            my $exon2 = $exons_p->[$t];

            if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){

                my $overlap = $exon1->intersection($exon2);

                print "===\n";;
                print "ex1\tStrand: ", $exon1->strand, "\n", $exon1->seq, "\n";
                print "ex2\tStrand: ", $exon2->strand, "\n", $exon2->seq, "\n";
                print "overlap\tStrand: ", $overlap->strand, "\n", $overlap->seq, "\n";
            }
        }
    }
}

On 5/2/06 13:17, "Brian Osborne" wrote:

> Marco,
>
> Yes, this is how intersection() is supposed to work. If both of the Range
> objects have the same strand then the strand information is returned as part
> of the result but if they aren't on the same strand then no strand
> information is returned.
>
> Brian O.
>
>
> On 5/2/06 3:30 PM, "Marco Blanchette" wrote:
>
>> Dear all--
>>
>> I have been trying to use the intersection function to extract overlapping
>> region from alternatively spliced exons as in the following script. The
>> returned object from the 'my $overlap = $exon1->intersection($exon2);' is
>> actually losing the strand of $exon1 if $exon1 is from the negative strand.
>> Is this behavior expected? Should I check the strand of $exon1 before
>> working on the object returned by any Bio::RangeI function?
>>
>> Many thanks
>>
>> #!/usr/bin/perl
>> use strict;
>> use warnings;
>> use Bio::DB::GFF;
>>
>> MAIN:{
>>
>>     my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>>                                 -dsn => 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
>>                                 -user => 'guest');
>>     my $test_db = $db->segment('4');
>>
>>     # Load up the exons into $exons_p
>>     for my $gene ($test_db->features(-types => 'gene')){
>>
>>         my $exons_p = extractExons($gene);
>>
>>         cluster($exons_p) unless ($#{$exons_p} == -1);
>>
>>     }
>> }
>>
>> sub extractExons {
>>     my $gene = shift;
>>     my %ex_list;
>>     my @tcs = $gene->features( -type =>'processed_transcript',
>>                                -attributes =>{Gene => $gene->group});
>>
>>     for my $tc (@tcs){
>>         my @exons = $tc->features (-type => 'exon',
>>                                    -attributes => {Parent => $tc->group}
>>                                   );
>>
>>         for (@exons){
>>             my $ex_id = $_->id;
>>             $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
>>
>>         }
>>
>>     }
>>     my @values = values %ex_list;
>>     return(\@values);
>> }
>>
>> sub cluster {
>>     my $exons_p = shift;
>>
>>     for (my $s = 0; $s <= $#{$exons_p}; $s++){
>>         for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
>>             my $exon1 = $exons_p->[$s];
>>             my $exon2 = $exons_p->[$t];
>>
>>             if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
>>
>>                 my $overlap = $exon1->intersection($exon2);
>>
>>                 print "===\n";;
>>                 print "ex1\n", $exon1->seq, "\n";
>>                 print "ex2\n", $exon2->seq, "\n";
>>                 print "overlap\n", $overlap->seq, "\n";
>>             }
>>         }
>>     }
>> }
>> ______________________________
>> Marco Blanchette, Ph.D.
>>
>> mblanche at uclink.berkeley.edu
>>
>> Donald C. Rio's lab
>> Department of Molecular and Cell Biology
>> 16 Barker Hall
>> University of California
>> Berkeley, CA 94720-3204
>>
>> Tel: (510) 642-1084
>> Cell: (510) 847-0996
>> Fax: (510) 642-6062
>
>

______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C.
Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062

--

From osborne1 at optonline.net Tue May 2 17:49:49 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 02 May 2006 17:49:49 -0400
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To:
Message-ID:

Marco,

Odd, because the intersection() code is quite simple and it's clear how it
should behave. What version of Bioperl are you using? I'm looking at the
latest, in bioperl-live...

Brian O.


On 5/2/06 4:32 PM, "Marco Blanchette" wrote:

> Brian--
>
> Even when both elements of intersection() are from the negative strand, the
> return object is from the positive strand and $overlap is actually the
> reverse complement of the intersection between the 2 exons. Here is part
> of the output from the script below:
>
> ===
> ex1     Strand: -1
> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
> AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTG
> ex2     Strand: -1
> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
> AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTGGTACGATGTCAAAGCTCCGAATATGTTTCAAACCCGT
> CAAATCG
> overlap Strand: 1
> CAGTCCTTGCGAGAAAACGGGTCCACCACCTTCTTCTTACCGCCCTTCTTACCACCCTTGGAAAGACCTTTATTTT
> TGCCGACTGCCATGTTCAACTAATAAACCGG
> AAAAGGTCGAATCACGTTGACGACGTATGTGGAAAAAAG
> ...
>
> If both are from the positive strand, the return object is positive as in:
>
> ===
> ex1     Strand: 1
> CAACGCAGACGTGGTACGGCGTTTTAAATCTGATAACATTTTGAACCGGGAATTATTTTAGAGTACCATTCTTTGT
> TTTGTGCCTGTTTCAGTATAAATTAATTATG
> CGCCTGATTTAAAGTACAAAATGTGTAAATATATCACCTTACCGTCGCGGGTGCACCCAATTGTGCTTTGATGAAT
> AAATATACATATATGCAACATATATAACTTC
> CTGTGTTAGTATAAGTGTATGTCAGCCAAAAACAAATATATATATGAGTGTTTATCGGCATTCGTGTGCTGGCAGA
> GCAGCGATCAAAGCTGCGTTCGGTACTCGTT
> GACTGGCCCAAGAATGAATTCTCGTGCAAGTGTGTTGATAAAAAGTATACGTATGTAT
> ex2     Strand: 1
> ATCGACAGTTGCCATCGTCGTTATTCCAGCACTAATTTAAAAAAAATTCGATCAACGCAGACGTG
> overlap Strand: 1
> CAACGCAGACGTG
>
> Is there something I am missing? Here is the script generating the output
>
> Many thanks all...
>
> Marco
>
>
> use strict;
> use warnings;
> use Bio::DB::GFF;
>
> MAIN:{
>
>     my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>                                 -dsn => 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
>                                 -user => 'guest');
>     my $test_db = $db->segment('4');
>
>     # Load up the exons into $exons_p
>     for my $gene ($test_db->features(-types => 'gene')){
>
>         my $exons_p = extractExons($gene);
>
>         cluster($exons_p) unless ($#{$exons_p} == -1);
>
>     }
> }
>
> sub extractExons {
>     my $gene = shift;
>     my %ex_list;
>     my @tcs = $gene->features( -type =>'processed_transcript',
>                                -attributes =>{Gene => $gene->group});
>
>     for my $tc (@tcs){
>         my @exons = $tc->features (-type => 'exon',
>                                    -attributes => {Parent => $tc->group}
>                                   );
>
>         for (@exons){
>             my $ex_id = $_->id;
>             $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
>
>         }
>
>     }
>     my @values = values %ex_list;
>     return(\@values);
> }
>
> sub cluster {
>     my $exons_p = shift;
>
>     for (my $s = 0; $s <= $#{$exons_p}; $s++){
>         for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
>             my $exon1 = $exons_p->[$s];
>             my $exon2 = $exons_p->[$t];
>
>             if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
>
>                 my $overlap = $exon1->intersection($exon2);
>
>                 print "===\n";;
>                 print "ex1\tStrand: ", $exon1->strand, "\n",
>                       $exon1->seq,
"\n"; > print "ex2\tStrand: ", $exon2->strand, "\n", > $exon2->seq, "\n"; > print "overlap\tStrand: ", $overlap->strand, "\n", > $overlap->seq, "\n"; > } > } > } > } > > On 5/2/06 13:17, "Brian Osborne" wrote: > >> Marco, >> >> Yes, this is how intersection() is supposed to work. If both of the Range >> objects have the same strand then the strand information is returned as part >> of the result but if they aren't on the same strand then no strand >> information is returned. >> >> Brian O. >> >> >> On 5/2/06 3:30 PM, "Marco Blanchette" wrote: >> >>> Dear all-- >>> >>> I have been trying to use the intersection function to extract overlapping >>> region from alternatively spliced exons as in the following script. The >>> returned object from the 'my $overlap = $exon1->intersection($exon2);' is >>> actually loosing the strand of $exon1 if $exon1 is from the negative strand. >>> Is this behavior expected? Should I check the strand of $exon1 before >>> working on the object return by any Bio::RangeI function? 
>>>
>>> Many thanks
>>>
>>> #!/usr/bin/perl
>>> use strict;
>>> use warnings;
>>> use Bio::DB::GFF;
>>>
>>> MAIN:{
>>>
>>>     my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>>>                                 -dsn => 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
>>>                                 -user => 'guest');
>>>     my $test_db = $db->segment('4');
>>>
>>>     # Load up the exons into $exons_p
>>>     for my $gene ($test_db->features(-types => 'gene')){
>>>
>>>         my $exons_p = extractExons($gene);
>>>
>>>         cluster($exons_p) unless ($#{$exons_p} == -1);
>>>
>>>     }
>>> }
>>>
>>> sub extractExons {
>>>     my $gene = shift;
>>>     my %ex_list;
>>>     my @tcs = $gene->features( -type =>'processed_transcript',
>>>                                -attributes =>{Gene => $gene->group});
>>>
>>>     for my $tc (@tcs){
>>>         my @exons = $tc->features (-type => 'exon',
>>>                                    -attributes => {Parent => $tc->group}
>>>                                   );
>>>
>>>         for (@exons){
>>>             my $ex_id = $_->id;
>>>             $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
>>>
>>>         }
>>>
>>>     }
>>>     my @values = values %ex_list;
>>>     return(\@values);
>>> }
>>>
>>> sub cluster {
>>>     my $exons_p = shift;
>>>
>>>     for (my $s = 0; $s <= $#{$exons_p}; $s++){
>>>         for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
>>>             my $exon1 = $exons_p->[$s];
>>>             my $exon2 = $exons_p->[$t];
>>>
>>>             if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
>>>
>>>                 my $overlap = $exon1->intersection($exon2);
>>>
>>>                 print "===\n";;
>>>                 print "ex1\n", $exon1->seq, "\n";
>>>                 print "ex2\n", $exon2->seq, "\n";
>>>                 print "overlap\n", $overlap->seq, "\n";
>>>             }
>>>         }
>>>     }
>>> }
>>> ______________________________
>>> Marco Blanchette, Ph.D.
>>>
>>> mblanche at uclink.berkeley.edu
>>>
>>> Donald C. Rio's lab
>>> Department of Molecular and Cell Biology
>>> 16 Barker Hall
>>> University of California
>>> Berkeley, CA 94720-3204
>>>
>>> Tel: (510) 642-1084
>>> Cell: (510) 847-0996
>>> Fax: (510) 642-6062
>>
>>
>
> ______________________________
> Marco Blanchette, Ph.D.
>
> mblanche at uclink.berkeley.edu
>
> Donald C.
Rio's lab > Department of Molecular and Cell Biology > 16 Barker Hall > University of California > Berkeley, CA 94720-3204 > > Tel: (510) 642-1084 > Cell: (510) 847-0996 > Fax: (510) 642-6062 From mblanche at berkeley.edu Tue May 2 18:31:44 2006 From: mblanche at berkeley.edu (Marco Blanchette) Date: Tue, 02 May 2006 15:31:44 -0700 Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF In-Reply-To: Message-ID: Brian-- I checked out last week version from the CVS. Silly question: How do I get the version of BioPerl I am using... Never had to check a module/bundle version number before... Marco On 5/2/06 14:49, "Brian Osborne" wrote: > Marco, > > Odd, because the intersection() code is quite simple and it's clear how it > should behave. What version of Bioperl are you using? I'm looking at the > latest, in bioperl-live... > > Brian O. > > > On 5/2/06 4:32 PM, "Marco Blanchette" wrote: > >> Brian-- >> >> Even when both elements of intersection() are from the negative strand, the >> return object is from the positive strand and $overlap is actually the >> revervese complement of the intersection between the 2 exons. Here is part >> of the output from the script below: >> >> === >> ex1 Strand: -1 >> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA >> AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG >> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTG >> ex2 Strand: -1 >> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA >> AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG >> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTGGTACGATGTCAAAGCTCCGAATATGTTTCAAACCCGT >> CAAATCG >> overlap Strand: 1 >> CAGTCCTTGCGAGAAAACGGGTCCACCACCTTCTTCTTACCGCCCTTCTTACCACCCTTGGAAAGACCTTTATTTT >> TGCCGACTGCCATGTTCAACTAATAAACCGG >> AAAAGGTCGAATCACGTTGACGACGTATGTGGAAAAAAG >> ... 
>> >> If both are from the positive strand, the return object is positive as in: >> >> === >> ex1 Strand: 1 >> CAACGCAGACGTGGTACGGCGTTTTAAATCTGATAACATTTTGAACCGGGAATTATTTTAGAGTACCATTCTTTGT >> TTTGTGCCTGTTTCAGTATAAATTAATTATG >> CGCCTGATTTAAAGTACAAAATGTGTAAATATATCACCTTACCGTCGCGGGTGCACCCAATTGTGCTTTGATGAAT >> AAATATACATATATGCAACATATATAACTTC >> CTGTGTTAGTATAAGTGTATGTCAGCCAAAAACAAATATATATATGAGTGTTTATCGGCATTCGTGTGCTGGCAGA >> GCAGCGATCAAAGCTGCGTTCGGTACTCGTT >> GACTGGCCCAAGAATGAATTCTCGTGCAAGTGTGTTGATAAAAAGTATACGTATGTAT >> ex2 Strand: 1 >> ATCGACAGTTGCCATCGTCGTTATTCCAGCACTAATTTAAAAAAAATTCGATCAACGCAGACGTG >> overlap Strand: 1 >> CAACGCAGACGTG >> >> Is there something I am missing? Here is the script generating the output >> >> Many thanks all... >> >> Marco >> >> >> use strict; >> use warnings; >> use Bio::DB::GFF; >> >> MAIN:{ >> >> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql', >> -dsn => >> 'dbi:mysql:database=dmel_43_LS;host=riolab.net', >> -user => 'guest'); >> my $test_db = $db->segment('4'); >> >> # Load up the exons into $exons_p >> for my $gene ($test_db->features(-types => 'gene')){ >> >> my $exons_p = extractExons($gene); >> >> cluster($exons_p) unless ($#{$exons_p} == -1); >> >> } >> } >> >> sub extractExons { >> my $gene = shift; >> my %ex_list; >> my @tcs = $gene->features( -type =>'processed_transcript', >> -attributes =>{Gene => $gene->group}); >> >> for my $tc (@tcs){ >> my @exons = $tc->features (-type => 'exon', >> -attributes => {Parent => $tc->group} >> ); >> >> for (@exons){ >> my $ex_id = $_->id; >> $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id}); >> >> } >> >> } >> my @values = values %ex_list; >> return(\@values); >> } >> >> sub cluster { >> my $exons_p = shift; >> >> for (my $s = 0; $s <= $#{$exons_p}; $s++){ >> for (my $t = $s+1; $t <= $#{$exons_p}; $t++){ >> my $exon1 = $exons_p->[$s]; >> my $exon2 = $exons_p->[$t]; >> >> if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){ >> >> my $overlap = $exon1->intersection($exon2); >> >> 
print "===\n";; >> print "ex1\tStrand: ", $exon1->strand, "\n", >> $exon1->seq, "\n"; >> print "ex2\tStrand: ", $exon2->strand, "\n", >> $exon2->seq, "\n"; >> print "overlap\tStrand: ", $overlap->strand, "\n", >> $overlap->seq, "\n"; >> } >> } >> } >> } >> >> On 5/2/06 13:17, "Brian Osborne" wrote: >> >>> Marco, >>> >>> Yes, this is how intersection() is supposed to work. If both of the Range >>> objects have the same strand then the strand information is returned as part >>> of the result but if they aren't on the same strand then no strand >>> information is returned. >>> >>> Brian O. >>> >>> >>> On 5/2/06 3:30 PM, "Marco Blanchette" wrote: >>> >>>> Dear all-- >>>> >>>> I have been trying to use the intersection function to extract overlapping >>>> region from alternatively spliced exons as in the following script. The >>>> returned object from the 'my $overlap = $exon1->intersection($exon2);' is >>>> actually loosing the strand of $exon1 if $exon1 is from the negative >>>> strand. >>>> Is this behavior expected? Should I check the strand of $exon1 before >>>> working on the object return by any Bio::RangeI function? 
>>>> >>>> Many thanks >>>> >>>> #!/usr/bin/perl >>>> use strict; >>>> use warnings; >>>> use Bio::DB::GFF; >>>> >>>> MAIN:{ >>>> >>>> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql', >>>> -dsn => >>>> 'dbi:mysql:database=dmel_43_LS;host=riolab.net', >>>> -user => 'guest'); >>>> my $test_db = $db->segment('4'); >>>> >>>> # Load up the exons into $exons_p >>>> for my $gene ($test_db->features(-types => 'gene')){ >>>> >>>> my $exons_p = extractExons($gene); >>>> >>>> cluster($exons_p) unless ($#{$exons_p} == -1); >>>> >>>> } >>>> } >>>> >>>> sub extractExons { >>>> my $gene = shift; >>>> my %ex_list; >>>> my @tcs = $gene->features( -type =>'processed_transcript', >>>> -attributes =>{Gene => $gene->group}); >>>> >>>> for my $tc (@tcs){ >>>> my @exons = $tc->features (-type => 'exon', >>>> -attributes => {Parent => $tc->group} >>>> ); >>>> >>>> for (@exons){ >>>> my $ex_id = $_->id; >>>> $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id}); >>>> >>>> } >>>> >>>> } >>>> my @values = values %ex_list; >>>> return(\@values); >>>> } >>>> >>>> sub cluster { >>>> my $exons_p = shift; >>>> >>>> for (my $s = 0; $s <= $#{$exons_p}; $s++){ >>>> for (my $t = $s+1; $t <= $#{$exons_p}; $t++){ >>>> my $exon1 = $exons_p->[$s]; >>>> my $exon2 = $exons_p->[$t]; >>>> >>>> if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){ >>>> >>>> my $overlap = $exon1->intersection($exon2); >>>> >>>> print "===\n";; >>>> print "ex1\n", $exon1->seq, "\n"; >>>> print "ex2\n", $exon2->seq, "\n"; >>>> print "overlap\n", $overlap->seq, "\n"; >>>> } >>>> } >>>> } >>>> } >>>> ______________________________ >>>> Marco Blanchette, Ph.D. >>>> >>>> mblanche at uclink.berkeley.edu >>>> >>>> Donald C. Rio's lab >>>> Department of Molecular and Cell Biology >>>> 16 Barker Hall >>>> University of California >>>> Berkeley, CA 94720-3204 >>>> >>>> Tel: (510) 642-1084 >>>> Cell: (510) 847-0996 >>>> Fax: (510) 642-6062 >>> >>> >> >> ______________________________ >> Marco Blanchette, Ph.D. 
>>
>> mblanche at uclink.berkeley.edu
>>
>> Donald C. Rio's lab
>> Department of Molecular and Cell Biology
>> 16 Barker Hall
>> University of California
>> Berkeley, CA 94720-3204
>>
>> Tel: (510) 642-1084
>> Cell: (510) 847-0996
>> Fax: (510) 642-6062
>
>

______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
--

From arareko at campus.iztacala.unam.mx Tue May 2 18:32:24 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Tue, 02 May 2006 17:32:24 -0500
Subject: [Bioperl-l] BioPerl-run in FreeBSD
Message-ID: <4457DDF8.4050005@campus.iztacala.unam.mx>

It's my great pleasure to announce the availability of the BioPerl-run
packages (stable & developer releases) for the FreeBSD operating system.

For instructions on how to install BioPerl ports in FreeBSD, please take
a look into the Getting Bioperl section of the BioPerl Wiki.

Regards,
Mauricio.

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From heikki at sanbi.ac.za Wed May 3 02:51:12 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 3 May 2006 08:51:12 +0200
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To:
References:
Message-ID: <200605030851.13007.heikki@sanbi.ac.za>

On Wednesday 03 May 2006 00:31, Marco Blanchette wrote:
> Brian--
>
> I checked out last week version from the CVS.
>
> Silly question: How do I get the version of BioPerl I am using... Never had
> to check a module/bundle version number before...

It is not that silly. The syntax is not too easy:

    perl -MBio::Perl -le 'print Bio::Perl->VERSION;'

You can use any module in bioperl, of course.
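For anyone who wants the same check inside a script rather than on the command line, here is a minimal sketch in plain Perl. The helper name module_version is made up for this illustration and is not part of BioPerl; it only relies on the standard VERSION method that every Perl package inherits, which is what the one-liner above calls.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): load a module by name at
# run time and return its version, or undef if the name is not a plain
# package name, the module cannot be loaded, or it declares no version.
sub module_version {
    my ($module) = @_;
    # Accept only plain package names before handing the string to eval.
    return unless $module =~ /\A\w+(?:::\w+)*\z/;
    return unless eval "require $module; 1";
    return $module->VERSION;
}

# Report the modules named on the command line, defaulting to Bio::Perl.
for my $module (@ARGV ? @ARGV : ('Bio::Perl')) {
    my $version = module_version($module);
    print "$module\t",
        defined $version ? $version : 'not installed or no version', "\n";
}
```

Saved under a name of your choice, it could be run as "perl script.pl Bio::Perl Bio::SeqIO" to list several modules at once.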
-Heikki > Marco > > On 5/2/06 14:49, "Brian Osborne" wrote: > > Marco, > > > > Odd, because the intersection() code is quite simple and it's clear how > > it should behave. What version of Bioperl are you using? I'm looking at > > the latest, in bioperl-live... > > > > Brian O. > > > > On 5/2/06 4:32 PM, "Marco Blanchette" wrote: > >> Brian-- > >> > >> Even when both elements of intersection() are from the negative strand, > >> the return object is from the positive strand and $overlap is actually > >> the revervese complement of the intersection between the 2 exons. Here > >> is part of the output from the script below: > >> > >> === > >> ex1 Strand: -1 > >> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAA > >>AATA AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG > >> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTG > >> ex2 Strand: -1 > >> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAA > >>AATA AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG > >> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTGGTACGATGTCAAAGCTCCGAATATGTTTCAAAC > >>CCGT CAAATCG > >> overlap Strand: 1 > >> CAGTCCTTGCGAGAAAACGGGTCCACCACCTTCTTCTTACCGCCCTTCTTACCACCCTTGGAAAGACCTTTA > >>TTTT TGCCGACTGCCATGTTCAACTAATAAACCGG > >> AAAAGGTCGAATCACGTTGACGACGTATGTGGAAAAAAG > >> ... > >> > >> If both are from the positive strand, the return object is positive as > >> in: > >> > >> === > >> ex1 Strand: 1 > >> CAACGCAGACGTGGTACGGCGTTTTAAATCTGATAACATTTTGAACCGGGAATTATTTTAGAGTACCATTCT > >>TTGT TTTGTGCCTGTTTCAGTATAAATTAATTATG > >> CGCCTGATTTAAAGTACAAAATGTGTAAATATATCACCTTACCGTCGCGGGTGCACCCAATTGTGCTTTGAT > >>GAAT AAATATACATATATGCAACATATATAACTTC > >> CTGTGTTAGTATAAGTGTATGTCAGCCAAAAACAAATATATATATGAGTGTTTATCGGCATTCGTGTGCTGG > >>CAGA GCAGCGATCAAAGCTGCGTTCGGTACTCGTT > >> GACTGGCCCAAGAATGAATTCTCGTGCAAGTGTGTTGATAAAAAGTATACGTATGTAT > >> ex2 Strand: 1 > >> ATCGACAGTTGCCATCGTCGTTATTCCAGCACTAATTTAAAAAAAATTCGATCAACGCAGACGTG > >> overlap Strand: 1 > >> CAACGCAGACGTG > >> > >> Is there something I am missing? 
Here is the script generating the > >> output > >> > >> Many thanks all... > >> > >> Marco > >> > >> > >> use strict; > >> use warnings; > >> use Bio::DB::GFF; > >> > >> MAIN:{ > >> > >> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql', > >> -dsn => > >> 'dbi:mysql:database=dmel_43_LS;host=riolab.net', > >> -user => 'guest'); > >> my $test_db = $db->segment('4'); > >> > >> # Load up the exons into $exons_p > >> for my $gene ($test_db->features(-types => 'gene')){ > >> > >> my $exons_p = extractExons($gene); > >> > >> cluster($exons_p) unless ($#{$exons_p} == -1); > >> > >> } > >> } > >> > >> sub extractExons { > >> my $gene = shift; > >> my %ex_list; > >> my @tcs = $gene->features( -type =>'processed_transcript', > >> -attributes =>{Gene => > >> $gene->group}); > >> > >> for my $tc (@tcs){ > >> my @exons = $tc->features (-type => 'exon', > >> -attributes => {Parent => > >> $tc->group} ); > >> > >> for (@exons){ > >> my $ex_id = $_->id; > >> $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id}); > >> > >> } > >> > >> } > >> my @values = values %ex_list; > >> return(\@values); > >> } > >> > >> sub cluster { > >> my $exons_p = shift; > >> > >> for (my $s = 0; $s <= $#{$exons_p}; $s++){ > >> for (my $t = $s+1; $t <= $#{$exons_p}; $t++){ > >> my $exon1 = $exons_p->[$s]; > >> my $exon2 = $exons_p->[$t]; > >> > >> if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){ > >> > >> my $overlap = $exon1->intersection($exon2); > >> > >> print "===\n";; > >> print "ex1\tStrand: ", $exon1->strand, "\n", > >> $exon1->seq, "\n"; > >> print "ex2\tStrand: ", $exon2->strand, "\n", > >> $exon2->seq, "\n"; > >> print "overlap\tStrand: ", $overlap->strand, "\n", > >> $overlap->seq, "\n"; > >> } > >> } > >> } > >> } > >> > >> On 5/2/06 13:17, "Brian Osborne" wrote: > >>> Marco, > >>> > >>> Yes, this is how intersection() is supposed to work. 
If both of the > >>> Range objects have the same strand then the strand information is > >>> returned as part of the result but if they aren't on the same strand > >>> then no strand information is returned. > >>> > >>> Brian O. > >>> > >>> On 5/2/06 3:30 PM, "Marco Blanchette" wrote: > >>>> Dear all-- > >>>> > >>>> I have been trying to use the intersection function to extract > >>>> overlapping region from alternatively spliced exons as in the > >>>> following script. The returned object from the 'my $overlap = > >>>> $exon1->intersection($exon2);' is actually loosing the strand of > >>>> $exon1 if $exon1 is from the negative strand. > >>>> Is this behavior expected? Should I check the strand of $exon1 before > >>>> working on the object return by any Bio::RangeI function? > >>>> > >>>> Many thanks > >>>> > >>>> #!/usr/bin/perl > >>>> use strict; > >>>> use warnings; > >>>> use Bio::DB::GFF; > >>>> > >>>> MAIN:{ > >>>> > >>>> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql', > >>>> -dsn => > >>>> 'dbi:mysql:database=dmel_43_LS;host=riolab.net', > >>>> -user => 'guest'); > >>>> my $test_db = $db->segment('4'); > >>>> > >>>> # Load up the exons into $exons_p > >>>> for my $gene ($test_db->features(-types => 'gene')){ > >>>> > >>>> my $exons_p = extractExons($gene); > >>>> > >>>> cluster($exons_p) unless ($#{$exons_p} == -1); > >>>> > >>>> } > >>>> } > >>>> > >>>> sub extractExons { > >>>> my $gene = shift; > >>>> my %ex_list; > >>>> my @tcs = $gene->features( -type =>'processed_transcript', > >>>> -attributes =>{Gene => > >>>> $gene->group}); > >>>> > >>>> for my $tc (@tcs){ > >>>> my @exons = $tc->features (-type => 'exon', > >>>> -attributes => {Parent => > >>>> $tc->group} ); > >>>> > >>>> for (@exons){ > >>>> my $ex_id = $_->id; > >>>> $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id}); > >>>> > >>>> } > >>>> > >>>> } > >>>> my @values = values %ex_list; > >>>> return(\@values); > >>>> } > >>>> > >>>> sub cluster { > >>>> my $exons_p = shift; > >>>> > 
>>>> for (my $s = 0; $s <= $#{$exons_p}; $s++){ > >>>> for (my $t = $s+1; $t <= $#{$exons_p}; $t++){ > >>>> my $exon1 = $exons_p->[$s]; > >>>> my $exon2 = $exons_p->[$t]; > >>>> > >>>> if (!($exon1->equals($exon2)) && > >>>> $exon1->overlaps($exon2)){ > >>>> > >>>> my $overlap = $exon1->intersection($exon2); > >>>> > >>>> print "===\n";; > >>>> print "ex1\n", $exon1->seq, "\n"; > >>>> print "ex2\n", $exon2->seq, "\n"; > >>>> print "overlap\n", $overlap->seq, "\n"; > >>>> } > >>>> } > >>>> } > >>>> } > >>>> ______________________________ > >>>> Marco Blanchette, Ph.D. > >>>> > >>>> mblanche at uclink.berkeley.edu > >>>> > >>>> Donald C. Rio's lab > >>>> Department of Molecular and Cell Biology > >>>> 16 Barker Hall > >>>> University of California > >>>> Berkeley, CA 94720-3204 > >>>> > >>>> Tel: (510) 642-1084 > >>>> Cell: (510) 847-0996 > >>>> Fax: (510) 642-6062 > >> > >> ______________________________ > >> Marco Blanchette, Ph.D. > >> > >> mblanche at uclink.berkeley.edu > >> > >> Donald C. Rio's lab > >> Department of Molecular and Cell Biology > >> 16 Barker Hall > >> University of California > >> Berkeley, CA 94720-3204 > >> > >> Tel: (510) 642-1084 > >> Cell: (510) 847-0996 > >> Fax: (510) 642-6062 > > ______________________________ > Marco Blanchette, Ph.D. > > mblanche at uclink.berkeley.edu > > Donald C. 
Rio's lab
> Department of Molecular and Cell Biology
> 16 Barker Hall
> University of California
> Berkeley, CA 94720-3204
>
> Tel: (510) 642-1084
> Cell: (510) 847-0996
> Fax: (510) 642-6062

--
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096 FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From nuclearn at gmail.com Wed May 3 02:05:42 2006
From: nuclearn at gmail.com (Li Xiao)
Date: Wed, 3 May 2006 14:05:42 +0800
Subject: [Bioperl-l] about the frame and strand of a blastx report
Message-ID: <150864390605022305p5a04e743l24938386af12edf3@mail.gmail.com>

Hi, anybody,

I am working to parse a blastx report by using BioPerl modules
(Bio::SearchIO). The blastx result was created by NCBI-BLAST. How can I
obtain the strand (+ or -) of the query sequence against the hit protein?
I tried to use the strand function, but nothing was reported. I also used
the frame function; the result is usually 0, 1 or 2, so it gives no
information about the query strand (+ or -).
How do I obtain the strand of a query sequence?
--
*********************************************************************
Li Xiao
Sichuan Key Laboratory of Molecular Biology and Biotechnology
College of Life Science, Sichuan University
Chengdu, SiChuan, P.R.China
TEL:86-28-85470083 FAX:86-28-85412738
E-MAIL: nuclearn at gmail.com
URL: http://scbi.scu.edu.cn
**********************************************************************

From cjfields at uiuc.edu Wed May 3 09:38:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 3 May 2006 08:38:17 -0500
Subject: [Bioperl-l] about the frame and strand of a blastx report
In-Reply-To: <150864390605022305p5a04e743l24938386af12edf3@mail.gmail.com>
Message-ID: <000601c66eb6$d5d5f530$15327e82@pyrimidine>

$hsp->strand():

    use Bio::SearchIO;

    # Read the BLAST report named on the command line
    my $parser = Bio::SearchIO->new (-file   => shift @ARGV,
                                     -format => 'blast');
    while (my $result = $parser->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                # strand is reported per HSP
                print $hsp->strand,"\n";
            }
        }
    }

This will give 1 or -1.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Li Xiao
> Sent: Wednesday, May 03, 2006 1:06 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] about the frame and strand of a blastx report
>
> Hi, anybody,
>
> I am working to parse a blastx report by using BioPerl modules
> (Bio::SearchIO).
> The blastx result was created by NCBI-BLAST. How can I obtain the strand
> (+ or -) of the query sequence against the hit protein? I tried to use
> the strand function, but nothing was reported. I also used the frame
> function; the result is usually 0, 1 or 2, so it gives no information
> about the query strand (+ or -).
> How do I obtain the strand of a query sequence?
> --
> *********************************************************************
> Li Xiao
> Sichuan Key Laboratory of Molecular Biology and Biotechnology
> College of Life Science, Sichuan University
> Chengdu, SiChuan, P.R.China
> TEL:86-28-85470083 FAX:86-28-85412738
> E-MAIL: nuclearn at gmail.com
> URL: http://scbi.scu.edu.cn
> **********************************************************************
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From osborne1 at optonline.net Wed May 3 11:22:27 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 03 May 2006 11:22:27 -0400
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines
In-Reply-To: <20060502114101.29745.qmail@web50409.mail.yahoo.com>
Message-ID:

Mark,

So you're trying to get the information in the RC line from a Swissprot
format file?

Brian O.


On 5/2/06 7:41 AM, "Mark A. Miller" wrote:

> Hello all.
>
> I have a recently downloaded UniProt/TrEMBL flat file. I am trying to
> make FASTA subset files for some bacterial strains. I haven't been
> able to parse out the strain information from the OS or RC lines.
> These lines typically look like:
>
> OS Somegenus somespecies subsp. somesubspecies strain ABC123.
> RC STRAIN=ABC123.
>
> I'm not especially good with Perl, and I'm definitely weak when it
> comes to OOP.
>
> I have included some code I pasted together from various pages on the
> bioperl wiki. In addition to the wiki, I have been making use of
> www.pasteur.fr/recherche/unites/sis/formation/bioperl/ch02s02.html
>
> The code I have so far reports the species but not the subspecies or
> variant. I have also tried to walk through all of the feature,
> annotation and reference objects but I still can't seem to parse out
> the information I need. (For brevity, the example I'm including below
> only lists the code I used for the annotation objects.)
Also, this > code only prints the information... I know that I'll have to write a > FASTA sequence object seperately. > > Any suggestions? > > Thanks, > Mark > > --- --- --- > > > #!/usr/bin/perl > > > > use Bio::SeqIO; > > > > my $usage = "getaccs.pl file format\n"; > > my $file = shift or die $usage; > > my $format = shift or die $usage; > > > > my $inseq = Bio::SeqIO->new(-file => "<$file", > > -format => $format ); > > > > while (my $seq = $inseq->next_seq) { > > > > my $species_object = $seq->species; > > my $species_string = $species_object->species; > > my $variant_string = $species_object->variant; > > my $common_string = $species_object->common_name; > > my $sub_string = $species_object->sub_species; > > my $binomial = $species_object->binomial('FULL'); > > > > print "display ",$seq->display_id,"\n"; > > print "accession ",$seq->accession_number,"\n"; > > print "desc ",$seq->desc,"\n"; > > > > print "species ",$species_string,"\n"; > > print "variant ",$variant_string,"\n"; > > print "common ",$common_string,"\n"; > > print "sub ",$sub_string,"\n"; > > print "binomial ",$binomial,"\n"; > > > > print $seq->seq,"\n"; > > > > my $anno_collection = $seq->annotation; > > for my $key ( $anno_collection->get_all_annotation_keys ) { > > my @annotations = $anno_collection->get_Annotations($key); > > for my $value ( @annotations ) { > > print "tagname : ", $value->tagname, "\n"; > > # $value is an Bio::Annotation, and has an "as_text" method > > print " annotation value: ", $value->as_text, "\n"; > > > > if ($value->tagname eq "reference") { > > my $hash_ref = $value->hash_tree; > > for my $key (keys %{$hash_ref}) { > > print $key,": ",$hash_ref->{$key},"\n"; > > } > > } > > } > > } > > print "\n"; > > } > > exit; > > > > > > --- --- --- --- --- --- --- --- > > Mark A. Miller > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! 
Mail has the best spam protection around
> http://mail.yahoo.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From MEC at stowers-institute.org Wed May 3 11:09:04 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Wed, 3 May 2006 10:09:04 -0500
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
Message-ID:

Marco,

It appears that your code assumes that the exons as returned from the call
to Bio::DB::GFF features() are sorted by start; I don't think that is
guaranteed (at least not in the documentation I'm reading). Also I
think your code will not report overlap between two exons that have an
intervening overlapping exon. Depending on what your application is,
you may care. For example, e1, e2, e3 all intersect pairwise, but your
code won't report on e1's overlap with e3.

e1 ---*******-------
e2 -----******------
e3 ------***--------

Out of curiosity, what is your application? Designing primers for gene
resequencing?

Cheers,

Malcolm Cook
Database Applications Manager, Bioinformatics
Stowers Institute for Medical Research


>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>Marco Blanchette
>Sent: Tuesday, May 02, 2006 2:31 PM
>To: bioperl-l at lists.open-bio.org
>Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
>
>Dear all--
>
>I have been trying to use the intersection function to extract
>overlapping
>region from alternatively spliced exons as in the following script. The
>returned object from the 'my $overlap =
>$exon1->intersection($exon2);' is
>actually losing the strand of $exon1 if $exon1 is from the
>negative strand.
>Is this behavior expected? Should I check the strand of $exon1 before
>working on the object returned by any Bio::RangeI function?
> >Many thanks > >#!/usr/bin/perl >use strict; >use warnings; >use Bio::DB::GFF; > >MAIN:{ > > my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql', > -dsn => >'dbi:mysql:database=dmel_43_LS;host=riolab.net', > -user => 'guest'); > my $test_db = $db->segment('4'); > > # Load up the exons into $exons_p > for my $gene ($test_db->features(-types => 'gene')){ > > my $exons_p = extractExons($gene); > > cluster($exons_p) unless ($#{$exons_p} == -1); > > } >} > >sub extractExons { > my $gene = shift; > my %ex_list; > my @tcs = $gene->features( -type =>'processed_transcript', > -attributes =>{Gene => >$gene->group}); > > for my $tc (@tcs){ > my @exons = $tc->features (-type => 'exon', > -attributes => {Parent => >$tc->group} >); > > for (@exons){ > my $ex_id = $_->id; > $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id}); > > } > > } > my @values = values %ex_list; > return(\@values); >} > >sub cluster { > my $exons_p = shift; > > for (my $s = 0; $s <= $#{$exons_p}; $s++){ > for (my $t = $s+1; $t <= $#{$exons_p}; $t++){ > my $exon1 = $exons_p->[$s]; > my $exon2 = $exons_p->[$t]; > > if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){ > > my $overlap = $exon1->intersection($exon2); > > print "===\n";; > print "ex1\n", $exon1->seq, "\n"; > print "ex2\n", $exon2->seq, "\n"; > print "overlap\n", $overlap->seq, "\n"; > } > } > } >} >______________________________ >Marco Blanchette, Ph.D. > >mblanche at uclink.berkeley.edu > >Donald C. 
Rio's lab
>Department of Molecular and Cell Biology
>16 Barker Hall
>University of California
>Berkeley, CA 94720-3204
>
>Tel: (510) 642-1084
>Cell: (510) 847-0996
>Fax: (510) 642-6062
>--
>
>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From sdavis2 at mail.nih.gov Wed May 3 12:18:48 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 03 May 2006 12:18:48 -0400
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To:
Message-ID:

On 5/3/06 11:09 AM, "Cook, Malcolm" wrote:

> Marco,
>
> It appears that your code assumes that the exons as returned from the call
> to Bio::DB::GFF features() are sorted by start; I don't think that is
> guaranteed (at least not in the documentation I'm reading). Also I
> think your code will not report overlap between two exons that have an
> intervening overlapping exon. Depending on what your application is,
> you may care. For example, e1, e2, e3 all intersect pairwise, but your
> code won't report on e1's overlap with e3.
>
> e1 ---*******-------
> e2 -----******------
> e3 ------***--------

I think this can be done (looking for "superexons") via the UCSC table
browser or via Penn State University's Galaxy server (written in Python and
downloadable) in case you want a quick solution to what I think is your
problem....

Sean

From osborne1 at optonline.net Wed May 3 16:22:57 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 03 May 2006 16:22:57 -0400
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines
In-Reply-To: <20060503193446.92476.qmail@web50412.mail.yahoo.com>
Message-ID:

Mark,

The RC line is part of the description of a reference; I'm guessing 'RC'
stands for Reference Comment.
In order to get the attributes of a reference you'll
first do something like:

    my $anno_collection = $seq->annotation;
    my @references = $anno_collection->get_Annotations('reference');

To get the comment field for a specific reference you can do:

    $references[0]->comment;

See the Feature-Annotation HOWTO for more information on Annotations; the
Reference object is a kind of Annotation object.

Brian O.


On 5/3/06 3:34 PM, "Mark A. Miller" wrote:

> Yeah. Do you have any experience with that?
>
> Mark
>
> --- Brian Osborne wrote:
>
>> Mark,
>>
>> So you're trying to get the information in the RC line from a
>> Swissprot
>> format file?
>>
>> Brian O.
>
>
> --- --- --- --- --- --- --- ---
>
> Mark A. Miller
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam? Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com

From cjfields at uiuc.edu Wed May 3 17:09:36 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 3 May 2006 16:09:36 -0500
Subject: [Bioperl-l] Batch retrieval partially implemented in Bio::DB::GenBank/GenPept
Message-ID: <000601c66ef5$e3066d90$15327e82@pyrimidine>

Just wanted to let you guys know I have added a few bits and pieces to
Bio::DB::Gen* and BioLLDB::NCBIHelper for batch retrieval using
epost/efetch. I didn't want to break anything too severely so you can only
use this at the moment using get_seq_stream (i.e. NOT through get_Stream*
methods yet). I also added tests to DB.t, a few each for protein and
nucleotide retrieval using batch mode and so far they all pass fine.

I haven't tested the upper sequence limit for this yet to see if it's at all
comparable to just using efetch but it seems a bit faster. The eutils
coursebook states that one should only post ~500 at a time (I think you can
get a bit higher though).

Also, at the moment it only works for GIs (NOT accessions,
which apparently epost does not accept).
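
The two constraints just mentioned (roughly 500 IDs per post, GI numbers only) can be sketched in plain Perl as below. This is only an illustration under stated assumptions: split_gis and batches are made-up helper names, not BioPerl or eutils calls, the rule "all digits means GI" is an assumption of this sketch, and the IDs in the demo (including ABC00000) are invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split identifiers into all-digit GI numbers and
# everything else (treated here as accessions that cannot be posted).
sub split_gis {
    my @ids = @_;
    my (@gis, @other);
    for my $id (@ids) {
        if ($id =~ /\A\d+\z/) { push @gis, $id }
        else                  { push @other, $id }
    }
    return (\@gis, \@other);
}

# Hypothetical helper: cut a list into batches of at most $size items.
sub batches {
    my ($size, @ids) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @batches;
    push @batches, [ splice @ids, 0, $size ] while @ids;
    return @batches;
}

# Demo with invented IDs and a tiny batch size so the grouping is visible;
# a real run would use something close to 500.
my ($gis, $other) = split_gis(qw(2 4 8 ABC00000 16 32));
my @groups = batches(2, @$gis);
printf "batch %d: %s\n", $_ + 1, join(',', @{ $groups[$_] }) for 0 .. $#groups;
print "not posted (not a GI): @$other\n";
```

Each batch would then be handed to whatever retrieval call is in use; that step is deliberately left out here because it depends on the code described in this message.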
If we want to continue using this
method for retrieval then we may need a workaround for accs.

CJF

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From torsten.seemann at infotech.monash.edu.au Wed May 3 17:44:48 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 04 May 2006 07:44:48 +1000
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To:
References:
Message-ID: <1146692688.12571.1.camel@chauvel.csse.monash.edu.au>

Marco,

> Silly question: How do I get the version of BioPerl I am using... Never had
> to check a module/bundle version number before...

http://bioperl.org/wiki/FAQ#How_can_I_tell_what_version_of_BioPerl_is_installed.3F

--
Torsten Seemann
Victorian Bioinformatics Consortium

From cjfields at uiuc.edu Wed May 3 18:08:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 3 May 2006 17:08:37 -0500
Subject: [Bioperl-l] Batch retrieval partially implemented inBio::DB::GenBank/GenPept
In-Reply-To: <000601c66ef5$e3066d90$15327e82@pyrimidine>
Message-ID: <000001c66efe$21dbcf80$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Wednesday, May 03, 2006 4:10 PM
> To: 'Jason Stajich'; 'Brian Osborne'; bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Batch retrieval partially implemented
> inBio::DB::GenBank/GenPept
>
> Just wanted to let you guys know I have added a few bits and pieces to
> Bio::DB::Gen* and BioLLDB::NCBIHelper for batch retrieval using
                    ^^^^^^^^^^^^^^^^^^^
                    Bio::DB::NCBIHelper
Fat fingers!

> epost/efetch. I didn't want to break anything too severely so you can
> only
> use this at the moment using get_seq_stream (i.e. NOT through get_Stream*
> methods yet). I also added tests to DB.t, a few each for protein and
> nucleotide retrieval using batch mode and so far they all pass fine.
> > I haven't tested the upper sequence limit for this yet to see if it's at > all > comparable to just using efetch but it seems a bit faster. The eutils > coursebook states that one should only post ~500 at a time (I think you > can > get a bit higher though). > > Also, at the moment it only works at the moment for GI's (NOT accessions, > which apparently epost does not accept). If we want to continue using > this > method for retrieval then we may need a workaround for accs. > > CJF > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From arareko at campus.iztacala.unam.mx Wed May 3 18:24:23 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Wed, 03 May 2006 17:24:23 -0500 Subject: [Bioperl-l] Batch retrieval partially implemented inBio::DB::GenBank/GenPept In-Reply-To: <000001c66efe$21dbcf80$15327e82@pyrimidine> References: <000001c66efe$21dbcf80$15327e82@pyrimidine> Message-ID: <44592D97.6090906@campus.iztacala.unam.mx> hehehe :) Chris Fields wrote: > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Chris Fields >> Sent: Wednesday, May 03, 2006 4:10 PM >> To: 'Jason Stajich'; 'Brian Osborne'; bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] Batch retrieval partially implemented >> inBio::DB::GenBank/GenPept >> >> Just wanted to let you guys know I have added a few bits and pieces to >> Bio::DB::Gen* and BioLLDB::NCBIHelper for batch retrieval using > ^^^^^^^^^^^^^^^^^^^ > Bio::DB::NCBIHelper > Fat fingers! > >> epost/efetch. I didn't want to break anything too severely so you can >> only >> use this at the moment using get_seq_stream (i.e. NOT through get_Stream* >> methods yet). 
I also added tests to DB.t, a few each for protein and
>> nucleotide retrieval using batch mode and so far they all pass fine.
>>
>> I haven't tested the upper sequence limit for this yet to see if it's at
>> all
>> comparable to just using efetch but it seems a bit faster.  The eutils
>> coursebook states that one should only post ~500 at a time (I think you
>> can
>> get a bit higher though).
>>
>> Also, at the moment it only works at the moment for GI's (NOT accessions,
>> which apparently epost does not accept).  If we want to continue using
>> this
>> method for retrieval then we may need a workaround for accs.
>>
>> CJF
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From fernan at iib.unsam.edu.ar Wed May 3 20:38:07 2006
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Wed, 3 May 2006 21:38:07 -0300
Subject: [Bioperl-l] BioPerl-run in FreeBSD
In-Reply-To: <4457DDF8.4050005@campus.iztacala.unam.mx>
References: <4457DDF8.4050005@campus.iztacala.unam.mx>
Message-ID: <20060504003807.GA86447@iib.unsam.edu.ar>

+----[ Mauricio Herrera Cuadra (02.May.2006 19:49):
|
| It's my great pleasure to announce the availability of the BioPerl-run
| packages (stable & developer releases) for the FreeBSD operating system.
|
| For instructions on how to install BioPerl ports in FreeBSD, please take
| a look into the Getting Bioperl section of the BioPerl Wiki.
| +----] Great job Mauricio, thanks for contributing this! Fernan From miker at biotiquesystems.com Tue May 2 23:31:59 2006 From: miker at biotiquesystems.com (Michael Rogoff) Date: Tue, 2 May 2006 20:31:59 -0700 Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps Message-ID: <007b01c66e62$23161d20$c100a8c0@mike> I've encountered a pretty serious bug in Bio::SeqIO when parsing certain genbank files that contain CONTIG entries with gaps. One such record is NW_925173. When I try to parse this file using Bio::SeqIO::genbank, it will enter an infinite loop and spin until it runs out of memory. I'm pretty certain it relates to this bug: http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to indicate that genbank records with CONTIG gaps are not valid and can't be parsed. But this bug actually claims to be fixed, which is strange, since looking at the code for FTLocationFactory (where the loop is) it's still right there. I assume that this may be fixed in other contexts but is still not fixed in Bio::SeqIO::genbank? Or am I doing something wrong? I think that this should probably be filed as an open bug. I would think that even if bioperl isn't interested in parsing this type of file via SeqIO, certainly you'd want to ensure that no finite input file would send the parser into an infinite loop. Have others encountered this problem? Is there any plan to address it? Thanks very much for any information or help! -Mike P.S. I've played around with my version of FTLocationFactory and it seems to actually work and parse the gaps. I'm not sure if I've created other bugs or if it works in all cases, but at least the parser doesn't die. I also don't know that my hacky code is appropriate for putting back in to BioPerl, but I'm happy to provide it if someone wants to check it out and/or consider it for checkin. 
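
A hedged aside on the point that no finite input file should be able to hang
the parser: until the loop itself is fixed, a caller can at least bound how
long a single record is allowed to take. The sketch below is plain Perl and
is not part of BioPerl. run_with_timeout is a hypothetical helper name, the
commented usage assumes a Bio::SeqIO stream called $in, and the approach
relies on alarm(), so it assumes a Unix-like perl. It limits run time only,
not memory use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): run $code, but give up
# after $seconds. Returns the code's scalar result, or dies with a message
# starting with "timeout" if the alarm fires first.
sub run_with_timeout {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout after ${seconds}s\n" };
        alarm $seconds;
        $result = $code->();
        alarm 0;
        1;
    };
    alarm 0;    # make sure no alarm is left pending if $code died
    die $@ unless $ok;
    return $result;
}

# Intended use (assumes BioPerl is installed and $in is a Bio::SeqIO stream):
#   my $seq = eval { run_with_timeout(60, sub { $in->next_seq }) };
#   warn "skipping record: $@" if $@;

# Self-contained demonstration with stand-in code blocks.
my $fast = run_with_timeout(5, sub { return 'parsed' });

my $slow = eval {
    run_with_timeout(1, sub {
        my $n = 0;
        while (1) { $n++; }    # stands in for a parser that never returns
    });
};
my $slow_error = $@;

print "fast: $fast\n";
print "slow: $slow_error";
```

Dying from the ALRM handler inside an eval is the usual core-Perl way to
abandon a call that does not return; whether the parser object is still
usable afterwards is a separate question that this sketch does not answer.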
From ULNJUJERYDIX at spammotel.com Wed May 3 04:20:38 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Wed, 3 May 2006 16:20:38 +0800
Subject: [Bioperl-l] Bio::Graphics::Panel imagemap making with Bio::Graphics::Panel
Message-ID: <5b6410e0605030120q31d1f554mbc4bf104deca48bf@mail.gmail.com>

Help! I can't figure out the instructions in the docs.

I want to create an image of short sequence matches against a longer one,
with clickable imagemap areas for the short sequences. I figure I can do
this easily enough using the example script for parsing blast output, but I
need an example script to understand how to produce the HTML code for the
imagemap. I can find only rather cryptic references about how this can be
done (see below).

$boxes = $panel->boxes
@boxes = $panel->boxes
    The boxes() method returns a list of arrayrefs containing the
    coordinates of each glyph. The method is useful for constructing an
    image map. In a scalar context, boxes() returns an arrayref. In a list
    context, the method returns the list directly. Each member of the list
    is an arrayref of the following format:

      [ $feature, $x1, $y1, $x2, $y2, $track ]

    The first element is the feature object; either an
    Ace::Sequence::Feature, a Das::Segment::Feature, or another Bioperl
    Bio::SeqFeatureI object. The coordinates are the topleft and
    bottomright corners of the glyph, including any space allocated for
    labels. The track is the Bio::Graphics::Glyph object corresponding to
    the track that the feature is rendered inside.

$position = $panel->track_position($track)
    After calling gd() or boxes(), you can learn the resulting Y
    coordinate of a track by calling track_position() with the value
    returned by add_track() or unshift_track(). This will return undef if
    called before gd() or boxes() or with an invalid track.

@pixel_coords = $panel->location2pixel(@feature_coords)
    Public routine to map feature coordinates (in base pairs) into pixel
    coordinates relative to the left-hand edge of the picture. If you
    define a -background callback, the callback may wish to invoke this
    routine in order to translate base coordinates into pixel coordinates.

$left = $panel->left
$right = $panel->right
$top = $panel->top
$bottom = $panel->bottom
    Return the pixel coordinates of the *drawing area* of the panel, that
    is, exclusive of the padding.

got it from http://docs.bioperl.org/bioperl-live/Bio/Graphics/Panel.html

From s.johri at imperial.ac.uk Thu May 4 08:50:34 2006
From: s.johri at imperial.ac.uk (Johri, Saurabh)
Date: Thu, 4 May 2006 13:50:34 +0100
Subject: [Bioperl-l] Fu and Li's D statistic - calculate
Message-ID: <4A98ACB8EC146149872BAC9A132A582C277AB3@icex5.ic.ac.uk>

Hi all,

I'm trying to calculate Fu and Li's D summary statistic for a group of
sequences.

The function fu_and_li_D(@ingroup,$extmutations) takes 2 args, the first
being the ingroup (population) and the second being the number of external
mutations, which is calculated from an outgroup sequence.

My question is: which function do I use to calculate the number of external
mutations? Would this be the singleton_count() function?

The singleton_count() function takes a PopGen object, which represents a
clustal alignment file. Would I include the outgroup in a multiple fasta
file for alignment with clustal?

Any suggestions as to how to calculate the number of external mutations
would be much appreciated.

Thanks for your help!

Saurabh Johri
Centre for Molecular Microbiology & Infection
Imperial College
London SW7 2AZ

From hlapp at gmx.net Thu May 4 12:30:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 4 May 2006 12:30:05 -0400
Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
In-Reply-To: <007b01c66e62$23161d20$c100a8c0@mike>
References: <007b01c66e62$23161d20$c100a8c0@mike>
Message-ID:

Infinite loop on a file you can download (i.e., as opposed to a file you
tinkered with) is never ok. Could you file this as a bug report? And
ideally attach your patch?
Thanks, -hilmar On May 2, 2006, at 11:31 PM, Michael Rogoff wrote: > > I've encountered a pretty serious bug in Bio::SeqIO when parsing > certain genbank > files that contain CONTIG entries with gaps. One such record is > NW_925173. > > When I try to parse this file using Bio::SeqIO::genbank, it will > enter an > infinite loop and spin until it runs out of memory. > > I'm pretty certain it relates to this bug: > http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to > indicate that > genbank records with CONTIG gaps are not valid and can't be > parsed. But this > bug actually claims to be fixed, which is strange, since looking at > the code for > FTLocationFactory (where the loop is) it's still right there. I > assume that > this may be fixed in other contexts but is still not fixed in > Bio::SeqIO::genbank? Or am I doing something wrong? > > I think that this should probably be filed as an open bug. I would > think that > even if bioperl isn't interested in parsing this type of file via > SeqIO, > certainly you'd want to ensure that no finite input file would send > the parser > into an infinite loop. Have others encountered this problem? Is > there any plan > to address it? > > Thanks very much for any information or help! > > -Mike > > P.S. I've played around with my version of FTLocationFactory and > it seems to > actually work and parse the gaps. I'm not sure if I've created > other bugs or if > it works in all cases, but at least the parser doesn't die. I also > don't know > that my hacky code is appropriate for putting back in to BioPerl, > but I'm happy > to provide it if someone wants to check it out and/or consider it > for checkin. 
> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From saldroubi at yahoo.com Thu May 4 13:03:00 2006 From: saldroubi at yahoo.com (Sam Al-Droubi) Date: Thu, 4 May 2006 10:03:00 -0700 (PDT) Subject: [Bioperl-l] Is webiste down? Message-ID: <20060504170300.12178.qmail@web34301.mail.mud.yahoo.com> All, Is the bioperl website down? I can't get to http://www.bioperl.org Thank you. Sincerely, Sam Al-Droubi, M.S. saldroubi at yahoo.com From arareko at campus.iztacala.unam.mx Thu May 4 14:22:52 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Thu, 04 May 2006 13:22:52 -0500 Subject: [Bioperl-l] Is webiste down? In-Reply-To: <20060504170300.12178.qmail@web34301.mail.mud.yahoo.com> References: <20060504170300.12178.qmail@web34301.mail.mud.yahoo.com> Message-ID: <445A467C.4070700@campus.iztacala.unam.mx> Website is ok, maybe your gateway can't lookup the bioperl server at the moment. Regards, Mauricio. Sam Al-Droubi wrote: > All, > > Is the bioperl website down? I can't get to http://www.bioperl.org > > > Thank you. > > > > Sincerely, > Sam Al-Droubi, M.S. 
> saldroubi at yahoo.com

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From cjfields at uiuc.edu Thu May 4 14:40:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 4 May 2006 13:40:32 -0500
Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
In-Reply-To: <007b01c66e62$23161d20$c100a8c0@mike>
Message-ID: <000001c66faa$3a25b130$15327e82@pyrimidine>

Are you using the CONTIG record or the full GenBank file?  I see
problems with both (using bioperl-live) which seem unrelated to one another.
The full file seems to be running a bit slow b/c the full GenBank record is
huge (~55 MB) but the CONTIG file does exactly what you said (runs out of
memory).

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Michael Rogoff
> Sent: Tuesday, May 02, 2006 10:32 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
>
>
> I've encountered a pretty serious bug in Bio::SeqIO when parsing certain
> genbank
> files that contain CONTIG entries with gaps.  One such record is
> NW_925173.
>
> When I try to parse this file using Bio::SeqIO::genbank, it will enter an
> infinite loop and spin until it runs out of memory.
>
> I'm pretty certain it relates to this bug:
> http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to indicate
> that
> genbank records with CONTIG gaps are not valid and can't be parsed.  But
> this
> bug actually claims to be fixed, which is strange, since looking at the
> code for
> FTLocationFactory (where the loop is) it's still right there.
I assume > that > this may be fixed in other contexts but is still not fixed in > Bio::SeqIO::genbank? Or am I doing something wrong? > > I think that this should probably be filed as an open bug. I would think > that > even if bioperl isn't interested in parsing this type of file via SeqIO, > certainly you'd want to ensure that no finite input file would send the > parser > into an infinite loop. Have others encountered this problem? Is there > any plan > to address it? > > Thanks very much for any information or help! > > -Mike > > P.S. I've played around with my version of FTLocationFactory and it seems > to > actually work and parse the gaps. I'm not sure if I've created other bugs > or if > it works in all cases, but at least the parser doesn't die. I also don't > know > that my hacky code is appropriate for putting back in to BioPerl, but I'm > happy > to provide it if someone wants to check it out and/or consider it for > checkin. > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From j.abbott at imperial.ac.uk Thu May 4 11:44:44 2006 From: j.abbott at imperial.ac.uk (James Abbott) Date: Thu, 04 May 2006 16:44:44 +0100 Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines In-Reply-To: <7B49D031-9F74-43C3-AA4F-2AE115BB843D@duke.edu> References: <20060502114101.29745.qmail@web50409.mail.yahoo.com> <7B49D031-9F74-43C3-AA4F-2AE115BB843D@duke.edu> Message-ID: <445A216C.7090108@imperial.ac.uk> Jason Stajich wrote: > I don't know if any of this has been resolved really so hopefully > James will speak up if he's implemented anything. Not as yet, I'm afraid - $job is keeping me overly busy at the moment, but it's on my todo list.... Cheers, James -- Dr. 
James Abbott Bioinformatics Software Developer, Bioinformatics Support Service Imperial College, London From hubert.prielinger at gmx.at Thu May 4 15:35:42 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Thu, 04 May 2006 13:35:42 -0600 Subject: [Bioperl-l] can't parse blast file anymore Message-ID: <445A578E.8050207@gmx.at> Hi, the following perl script worked fine until a few days ago.... ============================================================== #!/usr/bin/perl -w use Bio::SearchIO; use strict; use DBI; use Net::MySQL; #use lib qw(/usr/local/lib/perl5/site_perl/5.8.6/i686-linux); print "trying to connect to database \n"; my $database = 'antimicro_peptides'; my $host = 'ppc7.bio.ucalgary.ca'; my $user = 'Hubert'; my $password = 'Col00eng30'; my $mysql = Net::MySQL->new( hostname => $host, database => $database, user => $user, password => $password, ); print "Connection established \n"; my $selectID = 0; my $count = 0; ##output database results #while (my @row = $sth->fetchrow_array) # { print "@row\n" } print "start program\n"; my $directory = '/home/Hubert/test'; opendir(DIR, $directory) || die("Cannot open directory"); print "opened directory\n"; foreach my $file (readdir(DIR)) { if ($file =~ /txt$/) { $count++; print "read file $file \n"; $file = $directory . '/' . $file; my $search = new Bio::SearchIO (-format => 'blast', -file => $file); print "bioperl seems to work....\n"; my $cutoff_len = 10; #iterate over each query sequence print "try to enter while loop\n"; while (my $result = $search->next_result) { print "entered 1st while loop\n"; #iterate over each hit on the query sequence while (my $hit = $result->next_hit) { print "entered 2nd while loop\n"; #iterate over each HSP in the hit while (my $hsp = $hit->next_hsp) { print "entered 3rd while loop\n"; if ($hsp->length('sbjct') <= $cutoff_len) { #print $hsp->hit_string, "\n"; for ($hsp->hit_string) { #$hsp->hit_string print "count files....., $count ,\n"; ................. 
===================================================================

Output:

[Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl
trying to connect to database
Connection established
start program
opened directory
read file 40026.txt
bioperl seems to work....
try to enter while loop

But it doesn't enter the first while loop; it gets stuck there. At first I
thought it was a Linux problem, because I updated from FC4 to FC5, but it
isn't: Perl is working fine, and it seems BioPerl is working fine too, but
it cannot parse the file anymore.

regards
Hubert

From barry.moore at genetics.utah.edu Thu May 4 17:22:51 2006
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 4 May 2006 15:22:51 -0600
Subject: [Bioperl-l] [BULK] can't parse blast file anymore
In-Reply-To: <445A578E.8050207@gmx.at>
References: <445A578E.8050207@gmx.at>
Message-ID:

Hubert,

My first suggestion would be to log onto your Calgary server and change
your password real quick (unless you intended to post your password to the
world).

Well, this isn't an answer, but it may help you find one.  Use
perl -d your_script.pl to run your script under the debugger.  Type 'n' to
step forward to the line where you start the while loop.  Type 'x $result'
to see that an object exists (it should or you'd have gotten an error).
Type 's' to step into the next_result call, and then continue to type 'n'
and 's' as needed to burrow down to see if you can find where you're
hanging.

Barry

On May 4, 2006, at 1:35 PM, Hubert Prielinger wrote:

> Hi,
> the following perl script worked fine until a few days ago....
> > ============================================================== > #!/usr/bin/perl -w > > use Bio::SearchIO; > use strict; > use DBI; > use Net::MySQL; > > #use lib qw(/usr/local/lib/perl5/site_perl/5.8.6/i686-linux); > > print "trying to connect to database \n"; > my $database = 'antimicro_peptides'; > my $host = 'ppc7.bio.ucalgary.ca'; > my $user = 'Hubert'; > my $password = 'Col00eng30'; > > my $mysql = Net::MySQL->new( > hostname => $host, > database => $database, > user => $user, > password => $password, > ); > > > print "Connection established \n"; > > my $selectID = 0; > my $count = 0; > > > > ##output database results > #while (my @row = $sth->fetchrow_array) > # { print "@row\n" } > > > > print "start program\n"; > my $directory = '/home/Hubert/test'; > opendir(DIR, $directory) || die("Cannot open directory"); > print "opened directory\n"; > > foreach my $file (readdir(DIR)) { > if ($file =~ /txt$/) { > $count++; > print "read file $file \n"; > > > $file = $directory . '/' . $file; > > my $search = new Bio::SearchIO (-format => 'blast', > -file => $file); > print "bioperl seems to work....\n"; > my $cutoff_len = 10; > > #iterate over each query sequence > print "try to enter while loop\n"; > while (my $result = $search->next_result) { > print "entered 1st while loop\n"; > > #iterate over each hit on the query sequence > while (my $hit = $result->next_hit) { > print "entered 2nd while loop\n"; > > #iterate over each HSP in the hit > while (my $hsp = $hit->next_hsp) { > print "entered 3rd while loop\n"; > > if ($hsp->length('sbjct') <= $cutoff_len) { > #print $hsp->hit_string, "\n"; > > for ($hsp->hit_string) { #$hsp->hit_string > print "count files....., $count ,\n"; > ................. 
>
> ===================================================================
>
> Output:
>
> [Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl
> trying to connect to database
> Connection established
> start program
> opened directory
> read file 40026.txt
> bioperl seems to work....
> try to enter while loop
>
>
> but it doesn't enter the first while loop, it stuck there, first I
> thought it is a linux problem, because I updated from FC4 to FC5,
> but it
> isn't because perl is working fine, and it seems bioperl is working
> fine
> too, but it cannot parse the file anymore.....
>
> regards
> Hubert

From cjfields at uiuc.edu Thu May 4 18:27:57 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 4 May 2006 17:27:57 -0500
Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
In-Reply-To: <000001c66faa$3a25b130$15327e82@pyrimidine>
Message-ID: <000001c66fc9$fe7e5680$15327e82@pyrimidine>

Here's another odd bit.  This is what I get for the CONTIG line when I
passed a simple contig file (NW_925062, with one join) through Bio::SeqIO:

-----------------------------------
....
FEATURES             Location/Qualifiers
     source          1..8541
                     /db_xref="taxon:9606"
                     /mol_type="genomic DNA"
                     /chromosome="11"
                     /organism="Homo sapiens"
CONTIG      AADB02014027.1:1..8541

//
-----------------------------------
Here's the original:
-----------------------------------
FEATURES             Location/Qualifiers
     source          1..8541
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /chromosome="11"
CONTIG      join(AADB02014027.1:1..8541)
//
-----------------------------------

Looks like it lopped out the 'join' here as well.
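
To check mechanically whether two CONTIG lines like the ones above differ in
anything other than the join() wrapper, both can be normalised before they
are compared. The sketch below is plain Perl, not a BioPerl routine.
normalize_contig is a hypothetical helper name, and treating join() around a
single location as carrying no extra meaning is an assumption made for this
example, not something verified against the GenBank format documentation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: strip whitespace and, if the whole string is a
# join() wrapping exactly one location (no comma, no nested parentheses),
# drop the wrapper. Anything else is returned with only whitespace removed.
sub normalize_contig {
    my ($loc) = @_;
    $loc =~ s/\s+//g;
    if ($loc =~ /^join\(([^(),]+)\)$/) {
        return $1;
    }
    return $loc;
}

my $original  = 'join(AADB02014027.1:1..8541)';
my $roundtrip = 'AADB02014027.1:1..8541';

my $same = normalize_contig($original) eq normalize_contig($roundtrip);
print $same ? "equivalent\n" : "different\n";

# A join with more than one element is left alone. The gap(100) element
# here is made up for illustration only.
my $multi = normalize_contig('join(AADB02014027.1:1..8541, gap(100))');
print "$multi\n";
```

A comparison like this only says whether the location text survived the
round trip; it says nothing about how the parser represents the location
internally.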
Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Chris Fields > Sent: Thursday, May 04, 2006 1:41 PM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > Are you using the CONTIG record or the full GenBank file? I see > problems with both (using bioperl-live) which seem unrelated to one > another. > The full file seems to be running a bit slow b/c the full GenBank record > is > huge (~55 MB) but the CONTIG file does exactly what you said (runs out of > memory). > > Chris > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Michael Rogoff > > Sent: Tuesday, May 02, 2006 10:32 PM > > To: bioperl-l at lists.open-bio.org > > Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > > > > > I've encountered a pretty serious bug in Bio::SeqIO when parsing certain > > genbank > > files that contain CONTIG entries with gaps. One such record is > > NW_925173. > > > > When I try to parse this file using Bio::SeqIO::genbank, it will enter > an > > infinite loop and spin until it runs out of memory. > > > > I'm pretty certain it relates to this bug: > > http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to indicate > > that > > genbank records with CONTIG gaps are not valid and can't be parsed. But > > this > > bug actually claims to be fixed, which is strange, since looking at the > > code for > > FTLocationFactory (where the loop is) it's still right there. I assume > > that > > this may be fixed in other contexts but is still not fixed in > > Bio::SeqIO::genbank? Or am I doing something wrong? > > > > I think that this should probably be filed as an open bug. 
I would > think > > that > > even if bioperl isn't interested in parsing this type of file via SeqIO, > > certainly you'd want to ensure that no finite input file would send the > > parser > > into an infinite loop. Have others encountered this problem? Is there > > any plan > > to address it? > > > > Thanks very much for any information or help! > > > > -Mike > > > > P.S. I've played around with my version of FTLocationFactory and it > seems > > to > > actually work and parse the gaps. I'm not sure if I've created other > bugs > > or if > > it works in all cases, but at least the parser doesn't die. I also > don't > > know > > that my hacky code is appropriate for putting back in to BioPerl, but > I'm > > happy > > to provide it if someone wants to check it out and/or consider it for > > checkin. > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Thu May 4 18:39:05 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 4 May 2006 18:39:05 -0400 Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps In-Reply-To: <000001c66fc9$fe7e5680$15327e82@pyrimidine> References: <000001c66fc9$fe7e5680$15327e82@pyrimidine> Message-ID: <2E0D7723-FA6E-4812-8DBB-30FCD11FA85C@gmx.net> The two notations are equivalent and syntactically correct, or so I believe ... I don't think 100% verbatim preservation should be the goal. Or am I missing the point? On May 4, 2006, at 6:27 PM, Chris Fields wrote: > Here's another odd bit. This is what I get for the CONTIG line when I > passed a simple contig file (NW_925062, with one join) through > Bio::SeqIO: > > ----------------------------------- > .... 
> FEATURES Location/Qualifiers > source 1..8541 > /db_xref="taxon:9606" > /mol_type="genomic DNA" > /chromosome="11" > /organism="Homo sapiens" > CONTIG AADB02014027.1:1..8541 > > // > ----------------------------------- > Here's the original: > ----------------------------------- > FEATURES Location/Qualifiers > source 1..8541 > /organism="Homo sapiens" > /mol_type="genomic DNA" > /db_xref="taxon:9606" > /chromosome="11" > CONTIG join(AADB02014027.1:1..8541) > // > ----------------------------------- > > Looks like it lopped out the 'join' here as well. > > Chris > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Chris Fields >> Sent: Thursday, May 04, 2006 1:41 PM >> To: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps >> >> Are you using the CONTIG record or the full GenBank file? I see >> problems with both (using bioperl-live) which seem unrelated to one >> another. >> The full file seems to be running a bit slow b/c the full GenBank >> record >> is >> huge (~55 MB) but the CONTIG file does exactly what you said (runs >> out of >> memory). >> >> Chris >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Michael Rogoff >>> Sent: Tuesday, May 02, 2006 10:32 PM >>> To: bioperl-l at lists.open-bio.org >>> Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps >>> >>> >>> I've encountered a pretty serious bug in Bio::SeqIO when parsing >>> certain >>> genbank >>> files that contain CONTIG entries with gaps. One such record is >>> NW_925173. >>> >>> When I try to parse this file using Bio::SeqIO::genbank, it will >>> enter >> an >>> infinite loop and spin until it runs out of memory. 
>>> >>> I'm pretty certain it relates to this bug: >>> http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to >>> indicate >>> that >>> genbank records with CONTIG gaps are not valid and can't be >>> parsed. But >>> this >>> bug actually claims to be fixed, which is strange, since looking >>> at the >>> code for >>> FTLocationFactory (where the loop is) it's still right there. I >>> assume >>> that >>> this may be fixed in other contexts but is still not fixed in >>> Bio::SeqIO::genbank? Or am I doing something wrong? >>> >>> I think that this should probably be filed as an open bug. I would >> think >>> that >>> even if bioperl isn't interested in parsing this type of file via >>> SeqIO, >>> certainly you'd want to ensure that no finite input file would >>> send the >>> parser >>> into an infinite loop. Have others encountered this problem? Is >>> there >>> any plan >>> to address it? >>> >>> Thanks very much for any information or help! >>> >>> -Mike >>> >>> P.S. I've played around with my version of FTLocationFactory and it >> seems >>> to >>> actually work and parse the gaps. I'm not sure if I've created >>> other >> bugs >>> or if >>> it works in all cases, but at least the parser doesn't die. I also >> don't >>> know >>> that my hacky code is appropriate for putting back in to BioPerl, >>> but >> I'm >>> happy >>> to provide it if someone wants to check it out and/or consider it >>> for >>> checkin. 
>>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hubert.prielinger at gmx.at Thu May 4 19:57:44 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Thu, 04 May 2006 17:57:44 -0600 Subject: [Bioperl-l] can't parse blast file anymore In-Reply-To: <445A7449.1080607@infotech.monash.edu.au> References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au> Message-ID: <445A94F8.9000903@gmx.at> Torsten Seemann wrote: > Hubert > >> the following perl script worked fine until a few days ago.... >> >> #iterate over each query sequence >> print "try to enter while loop\n"; >> >> > die "Bad BLAST report" if not defined $search; > >> while (my $result = $search->next_result) { >> print "entered 1st while loop\n"; >> >> Output: >> >> [Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl >> try to enter while loop >> >> but it doesn't enter the first while loop, it stuck there, first I >> > What is the value of $search before you start the WHILE loop ? 
>
>
Hi,
$search is defined, like:

my $search = new Bio::SearchIO (-format => 'blast',
                                -file => $file)

If I try it with the debugger, as Barry suggested, then I get the following:

  DB<1> n
main::(Blast.pl:24):    print "Connection established \n";
  DB<1> n
Connection established
main::(Blast.pl:26):    my $selectID = 0;
  DB<1> n
main::(Blast.pl:27):    my $count = 0;
  DB<1> n
main::(Blast.pl:37):    print "start program\n";
  DB<1> n
start program
main::(Blast.pl:38):    my $directory = '/home/Hubert/test';
  DB<1> n
main::(Blast.pl:39):    opendir(DIR, $directory) || die("Cannot open directory");
  DB<1> n
main::(Blast.pl:40):    print "opened directory\n";
  DB<1> n
opened directory
main::(Blast.pl:42):    foreach my $file (readdir(DIR)) {
  DB<1> n
main::(Blast.pl:43):    if ($file =~ /txt$/) {
  DB<1> n
main::(Blast.pl:44):    $count++;
  DB<1> n
main::(Blast.pl:45):    print "read file $file \n";
  DB<1> n
read file 40026.txt
main::(Blast.pl:48):    $file = $directory . '/' . $file;
  DB<1> n
main::(Blast.pl:50):    my $search = new Bio::SearchIO (-format => 'blast',
main::(Blast.pl:51):    -file => $file);
  DB<1> n
main::(Blast.pl:52):    print "bioperl seems to work....\n";
  DB<1> s $search
main::((eval 14)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3):
3:      $search;
  DB<<2>> n
  DB<2> n
bioperl seems to work....
main::(Blast.pl:53):    my $cutoff_len = 10;
  DB<2> n
main::(Blast.pl:56):    print "try to enter while loop\n";
  DB<2> n
try to enter while loop
main::(Blast.pl:57):    while (my $result = $search->next_result) {
  DB<2> s $result
main::((eval 15)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3):
3:      $result;
  DB<<3>>

From torsten.seemann at infotech.monash.edu.au Thu May 4 17:38:17 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 05 May 2006 07:38:17 +1000
Subject: [Bioperl-l] can't parse blast file anymore
In-Reply-To: <445A578E.8050207@gmx.at>
References: <445A578E.8050207@gmx.at>
Message-ID: <445A7449.1080607@infotech.monash.edu.au>

Hubert

>the following perl script worked fine until a few days ago....
>
> #iterate over each query sequence
> print "try to enter while loop\n";
>
>
die "Bad BLAST report" if not defined $search;

> while (my $result = $search->next_result) {
>     print "entered 1st while loop\n";
>
>Output:
>
>[Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl
>try to enter while loop
>
>but it doesn't enter the first while loop, it stuck there, first I
>
>
What is the value of $search before you start the WHILE loop ?

From barry.moore at genetics.utah.edu Thu May 4 20:39:57 2006
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 4 May 2006 18:39:57 -0600
Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore
In-Reply-To: <445A94F8.9000903@gmx.at>
References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au> <445A94F8.9000903@gmx.at> <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>
Message-ID: <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>

That should be 'x $result', and you should see the object dumped to the screen.

Or use just 's' by itself on the while line, which will step you into the next_result sub, where you can look around and watch what's happening.
B > DB<2> s $result > main::((eval 15)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3): > 3: $result; > DB<<3>> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hubert.prielinger at gmx.at Thu May 4 22:04:20 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Thu, 04 May 2006 20:04:20 -0600 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu> References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au> <445A94F8.9000903@gmx.at> <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu> Message-ID: <445AB2A4.7020405@gmx.at> if I do so it returns: 0 undef Barry Moore wrote: > That should be 'x $resust' and you should see the object dumped to > the screen. > > or just 's' by itself which will step you into the sub on the while > line will step you into the next_result sub, and you can look around > and watch what's happening. 
>
> B
>
>
>> DB<2> s $result
>> main::((eval 15)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3):
>> 3: $result;
>> DB<<3>>

From torsten.seemann at infotech.monash.edu.au Fri May 5 00:40:34 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 05 May 2006 14:40:34 +1000
Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore
In-Reply-To: <445AB2A4.7020405@gmx.at>
References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au> <445A94F8.9000903@gmx.at> <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu> <445AB2A4.7020405@gmx.at>
Message-ID: <445AD742.4070408@infotech.monash.edu.au>

Hubert Prielinger wrote:
> if I do so it returns:
> 0 undef

That means the value of $search was undef.
That means that it could not parse or open the BLAST report.
I repeat the line that I put in my earlier email which you ignored.

# your line
my $search = Bio::SearchIO->new( ..... );

# then check if it was successful!
die "could not open blast report" if not defined $search;

--Torsten

From jason.stajich at duke.edu Fri May 5 09:21:38 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 5 May 2006 09:21:38 -0400
Subject: [Bioperl-l] bioperl-AlignIO problems parsing fasta files
In-Reply-To: <5d35ac4d.b35ce863.8198d00@expms1.cites.uiuc.edu>
References: <5d35ac4d.b35ce863.8198d00@expms1.cites.uiuc.edu>
Message-ID:

The space after the '>' is causing the problem, since we infer the ID as everything after the '>' and BEFORE the first whitespace. Get rid of the space.
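
A minimal plain-Perl sketch of the ID rule described above, applied to the header line from the report in this thread. It does not call BioPerl; inferred_id and fix_header are hypothetical helper names made up for illustration, and the regex is an assumption about how the rule behaves, not the parser's actual code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the ID that would be
# inferred under the rule above, i.e. the text between '>' and the first
# whitespace character.
sub inferred_id {
    my ($header) = @_;
    my ($id) = $header =~ /^>(\S*)/;
    return $id;
}

# Hypothetical helper: drop any whitespace directly after the '>',
# the same substitution as s/^>\s+/>/.
sub fix_header {
    my ($header) = @_;
    $header =~ s/^>\s+/>/;
    return $header;
}

my $bad  = '> gi|90108701|pdb|2AHZ|B Chain B, K+ Complex Of The Nak Channel';
my $good = fix_header($bad);

printf "before: [%s]\n", inferred_id($bad);     # before: []
printf "after:  [%s]\n", inferred_id($good);    # after:  [gi|90108701|pdb|2AHZ|B]
```

With the leading space in place the inferred ID is empty, which is consistent with the empty name in the "No sequence with name []" exception.
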
$ perl -i.backup -p -e 's/^>\s+/>/' YOURFASALNFILE On May 4, 2006, at 7:00 PM, Gloria Rendon wrote: > contents of the input file has a single sequence: > >> gi|90108701|pdb|2AHZ|B Chain B, K+ Complex Of The Nak Channel > MLSFLLTLKRMLRACLRAWKDKEFQVLFVLTILTLISGTIFYSTVEGLRPIDALYFSVVTLTTVGDGNFS > PQTDFGKIFTILYIFIGIGLVFGFIHKLAVNVQLPSILSN > ------------------------------------------ > this is the script that tries to parse it: > > use Bio::AlignIO; > my $inseq = Bio::AlignIO->new(-format => 'fasta', > -file => 'test.fasta'); > while( my $aln = $inseq->next_aln ) { > print "name: ", $aln->displayname; > print "length: ", $aln->length; > print "\n"; > } > > ------------------------------------------ > and this is the result of running that script on winxp > > D:\msa\NAK MUTANTS>perl parseFasta.pl > > > ------------- EXCEPTION ------------- > MSG: No sequence with name [] > STACK Bio::SimpleAlign::displayname > C:/Perl/site/lib/Bio/SimpleAlign.pm:2047 > STACK toplevel parseFasta.pl:11 > > -------------------------------------- > D:\msa\NAK MUTANTS> -- Jason Stajich Duke University http://www.duke.edu/~jes12/ From thoufek at pngg.org Thu May 4 12:50:44 2006 From: thoufek at pngg.org (T.D. Houfek) Date: Thu, 04 May 2006 12:50:44 -0400 Subject: [Bioperl-l] Bio::Seq::Quality description line problem In-Reply-To: References: Message-ID: <445A30E4.6070103@pngg.org> Using Bioperl 1.5, having trouble with writing FASTA-style quality files using Bio::Seq::Quality. I create the Bio::Seq::Quality object, giving its constructor an ID, a description, a nucleotide sequence, and a quality sequence. I then write the sequence FASTA and the quality FASTA. The description string will appear in the header line of the sequence FASTA, but not in the header line of the quality FASTA. Can anybody help me figure out how to fix this? I've attached a sample script and output. -T.D. 
------------------- sample script follows ---------------------------------------

#!/usr/bin/perl
use strict;
use Bio::Seq::Quality;
use Bio::SeqIO;

my $id = "bogus_id";
my $desc = "bogus description";
my $seq = "ATTATTATTATTATT";
my $qual = "10 20 30 10 20 30 10 20 30 10 20 30 10 20 30";

my $sequal_obj = Bio::Seq::Quality->new(
    -display_id => $id,
    -desc => $desc,
    -seq => $seq,
    -qual => $qual
);

my $qualout = Bio::SeqIO->new(
    -file => ">myfile.qual",
    -format => 'qual'
);
my $seqout = Bio::SeqIO->new(
    -file => ">myfile.seq",
    -format => 'Fasta'
);

$seqout->write_seq($sequal_obj);
$qualout->write_seq($sequal_obj);


------------------ sample output follows ---------------------------------------

tdhoufek@aether:~$ cat myfile.seq
>bogus_id bogus description
ATTATTATTATTATT
tdhoufek@aether:~$ cat myfile.qual
>bogus_id
10 20 30 10 20 30 10 20 30 10 20 30 10 20 30

--------------------------------------------------------------------------------------------------

--
T.D. Houfek
senior bioinformatics developer
plant nematode genetics group
north carolina state university
Email: thoufek at pngg.org
----------------------------------------------------------
use Bio::Seq; @a =qw/NNN CCT GAG CAT GCG TGT AAG AAC TAG/;
$u=seq;$r=Bio::Seq;sub c{$c=$r->new(-$u=>"@_[0]")->revcom;
$t=$c->$u;}map{m/\d/?$g=c($a[$_]):tr/a-i/1-9/&&($g=$a[$_])
;$x[$i++]=$g;} split //,"dgh5cb40ab120cdefb4";$z=$r->new(-
$u=>(join"",@x))->translate()->$u;$z =~s/X/ /g;print"$z\n"

From jason.stajich at duke.edu Fri May 5 09:27:51 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 5 May 2006 09:27:51 -0400
Subject: [Bioperl-l] bioperl-AlignIO problems parsing fasta files
In-Reply-To:
References: <5d35ac4d.b35ce863.8198d00@expms1.cites.uiuc.edu>
Message-ID: <0F79C9AD-DE36-4424-9E59-37ABE8B62A5E@duke.edu>

[replying to myself]
although if you are trying to just read a sequence not an alignment then you want to use Bio::SeqIO.
See the copious help on the HOWTO page at bioperl website including a sequence and feature howto and beginner's guide. http://bioperl.org/wiki/HOWTOs -jason On May 5, 2006, at 9:21 AM, Jason Stajich wrote: > Space after the > is causing the problem since we infer the ID as the > everything after the '>' BEFORE the first whitespace. Get rid of the > space. > $ perl -i.backup -p -e 's/^>\s+/>/' YOURFASALNFILE > > On May 4, 2006, at 7:00 PM, Gloria Rendon wrote: > >> contents of the input file has a single sequence: >> >>> gi|90108701|pdb|2AHZ|B Chain B, K+ Complex Of The Nak Channel >> MLSFLLTLKRMLRACLRAWKDKEFQVLFVLTILTLISGTIFYSTVEGLRPIDALYFSVVTLTTVGDGNF >> S >> PQTDFGKIFTILYIFIGIGLVFGFIHKLAVNVQLPSILSN >> ------------------------------------------ >> this is the script that tries to parse it: >> >> use Bio::AlignIO; >> my $inseq = Bio::AlignIO->new(-format => 'fasta', >> -file => 'test.fasta'); >> while( my $aln = $inseq->next_aln ) { >> print "name: ", $aln->displayname; >> print "length: ", $aln->length; >> print "\n"; >> } >> >> ------------------------------------------ >> and this is the result of running that script on winxp >> >> D:\msa\NAK MUTANTS>perl parseFasta.pl >> >> >> ------------- EXCEPTION ------------- >> MSG: No sequence with name [] >> STACK Bio::SimpleAlign::displayname >> C:/Perl/site/lib/Bio/SimpleAlign.pm:2047 >> STACK toplevel parseFasta.pl:11 >> >> -------------------------------------- >> D:\msa\NAK MUTANTS> > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12/ From osborne1 at optonline.net Fri May 5 10:04:02 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Fri, 05 May 2006 10:04:02 -0400 Subject: [Bioperl-l] Bio::Seq::Quality description line problem In-Reply-To: 
<445A30E4.6070103@pngg.org> Message-ID: T.D., According to the documentation, http://www.bioperl.org/wiki/Qual_sequence_format, your *qual file looks right. What are you trying to create? Brian O. On 5/4/06 12:50 PM, "T.D. Houfek" wrote: > Using Bioperl 1.5, having trouble with writing FASTA-style quality files > using Bio::Seq::Quality. > > I create the Bio::Seq::Quality object, giving its constructor an ID, a > description, a nucleotide sequence, and a quality sequence. I then write > the sequence FASTA and the quality FASTA. The description string will > appear in the header line of the sequence FASTA, but not in the header > line of the quality FASTA. > > Can anybody help me figure out how to fix this? I've attached a sample > script and output. > > -T.D. > > ------------------- sample script follows > --------------------------------------- > > #!/usr/bin/perl > use strict; > use Bio::Seq::Quality; > use Bio::SeqIO; > > my $id = "bogus_id"; > my $desc = "bogus description"; > my $seq = "ATTATTATTATTATT"; > my $qual = "10 20 30 10 20 30 10 20 30 10 20 30 10 20 30"; > > my $sequal_obj = Bio::Seq::Quality->new( > -display_id => $id, > -desc => $desc, > -seq => $seq, > -qual => $qual > ); > > my $qualout = Bio::SeqIO->new( > -file => ">myfile.qual", > -format => 'qual' > ); > my $seqout = Bio::SeqIO->new( > -file => ">myfile.seq", > -format => 'Fasta' > ); > > $seqout->write_seq($sequal_obj); > $qualout->write_seq($sequal_obj); > > > ------------------ sample output follows > --------------------------------------- > > tdhoufek at aether:~$ cat myfile.seq >> bogus_id bogus description > ATTATTATTATTATT > tdhoufek at aether:~$ cat myfile.qual >> bogus_id > 10 20 30 10 20 30 10 20 30 10 20 30 10 20 30 > > ------------------------------------------------------------------------------ > -------------------- > > > From cjfields at uiuc.edu Fri May 5 10:24:05 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 5 May 2006 09:24:05 -0500 Subject: [Bioperl-l] Bug in 
genbank parsing: CONTIG gaps In-Reply-To: <2E0D7723-FA6E-4812-8DBB-30FCD11FA85C@gmx.net> Message-ID: <001701c6704f$90dbd090$15327e82@pyrimidine> I'm not sure it's a valid CONTIG file w/o the join(...). This is a chunk from the longer file Michael used as an example here (NW_925173). I believe the CONTIG line is currently handled like a feature so I think it goes through Bio::SeqIO::FTHelper, which is where Michael mentions his bugfix is; I think it's getting beaten up in there somehow. I may see what happens if it's treated like a WGS line (like a Bio::Annotation::SimpleValue object) and just glob the whole mess together as is. Chris ... FEATURES Location/Qualifiers source 1..44976370 /organism="Homo sapiens" /mol_type="genomic DNA" /db_xref="taxon:9606" /chromosome="11" CONTIG join(AADB02014316.1:1..1482320,gap(67),AADB02014317.1:1..577321, gap(441),AADB02014318.1:1..173584,gap(676), AADB02014319.1:1..377558,gap(20), complement(AADB02014320.1:1..431263),gap(20), AADB02014321.1:1..794957,gap(1241),AADB02014322.1:1..1366198, gap(6446),AADB02014323.1:1..3366,gap(20),AADB02014324.1:1..4771, gap(4611),AADB02014325.1:1..383881,gap(20), complement(AADB02014326.1:1..381633),gap(1930), complement(AADB02014327.1:1..460053),gap(20), AADB02014328.1:1..4186,gap(1587), ... > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > Sent: Thursday, May 04, 2006 5:39 PM > To: Chris Fields > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > The two notations are equivalent and syntactically correct, or so I > believe ... I don't think 100% verbatim preservation should be the > goal. Or am I missing the point? > > On May 4, 2006, at 6:27 PM, Chris Fields wrote: > > > Here's another odd bit. 
This is what I get for the CONTIG line when I > > passed a simple contig file (NW_925062, with one join) through > > Bio::SeqIO: > > > > ----------------------------------- > > .... > > FEATURES Location/Qualifiers > > source 1..8541 > > /db_xref="taxon:9606" > > /mol_type="genomic DNA" > > /chromosome="11" > > /organism="Homo sapiens" > > CONTIG AADB02014027.1:1..8541 > > > > // > > ----------------------------------- > > Here's the original: > > ----------------------------------- > > FEATURES Location/Qualifiers > > source 1..8541 > > /organism="Homo sapiens" > > /mol_type="genomic DNA" > > /db_xref="taxon:9606" > > /chromosome="11" > > CONTIG join(AADB02014027.1:1..8541) > > // > > ----------------------------------- > > > > Looks like it lopped out the 'join' here as well. > > > > Chris > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of Chris Fields > >> Sent: Thursday, May 04, 2006 1:41 PM > >> To: bioperl-l at lists.open-bio.org > >> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > >> > >> Are you using the CONTIG record or the full GenBank file? I see > >> problems with both (using bioperl-live) which seem unrelated to one > >> another. > >> The full file seems to be running a bit slow b/c the full GenBank > >> record > >> is > >> huge (~55 MB) but the CONTIG file does exactly what you said (runs > >> out of > >> memory). > >> > >> Chris > >> > >>> -----Original Message----- > >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>> bounces at lists.open-bio.org] On Behalf Of Michael Rogoff > >>> Sent: Tuesday, May 02, 2006 10:32 PM > >>> To: bioperl-l at lists.open-bio.org > >>> Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > >>> > >>> > >>> I've encountered a pretty serious bug in Bio::SeqIO when parsing > >>> certain > >>> genbank > >>> files that contain CONTIG entries with gaps. 
One such record is > >>> NW_925173. > >>> > >>> When I try to parse this file using Bio::SeqIO::genbank, it will > >>> enter > >> an > >>> infinite loop and spin until it runs out of memory. > >>> > >>> I'm pretty certain it relates to this bug: > >>> http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to > >>> indicate > >>> that > >>> genbank records with CONTIG gaps are not valid and can't be > >>> parsed. But > >>> this > >>> bug actually claims to be fixed, which is strange, since looking > >>> at the > >>> code for > >>> FTLocationFactory (where the loop is) it's still right there. I > >>> assume > >>> that > >>> this may be fixed in other contexts but is still not fixed in > >>> Bio::SeqIO::genbank? Or am I doing something wrong? > >>> > >>> I think that this should probably be filed as an open bug. I would > >> think > >>> that > >>> even if bioperl isn't interested in parsing this type of file via > >>> SeqIO, > >>> certainly you'd want to ensure that no finite input file would > >>> send the > >>> parser > >>> into an infinite loop. Have others encountered this problem? Is > >>> there > >>> any plan > >>> to address it? > >>> > >>> Thanks very much for any information or help! > >>> > >>> -Mike > >>> > >>> P.S. I've played around with my version of FTLocationFactory and it > >> seems > >>> to > >>> actually work and parse the gaps. I'm not sure if I've created > >>> other > >> bugs > >>> or if > >>> it works in all cases, but at least the parser doesn't die. I also > >> don't > >>> know > >>> that my hacky code is appropriate for putting back in to BioPerl, > >>> but > >> I'm > >>> happy > >>> to provide it if someone wants to check it out and/or consider it > >>> for > >>> checkin. 
> >>> > >>> > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Fri May 5 10:47:50 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 5 May 2006 10:47:50 -0400 Subject: [Bioperl-l] Bio::Seq::Quality description line problem In-Reply-To: References: Message-ID: <2E1683FE-57E4-4D97-A958-1B529973E89E@gmx.net> He wants the description on the description line, like for the sequence file. Thomas, my guess is the code doesn't print the description to the line although I haven't made sure. Do you want to volunteer and check, add that print statement and post the patch? -hilmar On May 5, 2006, at 10:04 AM, Brian Osborne wrote: > T.D., > > According to the documentation, > http://www.bioperl.org/wiki/Qual_sequence_format, your *qual file > looks > right. What are you trying to create? > > Brian O. > > > On 5/4/06 12:50 PM, "T.D. Houfek" wrote: > >> Using Bioperl 1.5, having trouble with writing FASTA-style quality >> files >> using Bio::Seq::Quality. >> >> I create the Bio::Seq::Quality object, giving its constructor an >> ID, a >> description, a nucleotide sequence, and a quality sequence. 
I then >> write >> the sequence FASTA and the quality FASTA. The description string will >> appear in the header line of the sequence FASTA, but not in the >> header >> line of the quality FASTA. >> >> Can anybody help me figure out how to fix this? I've attached a >> sample >> script and output. >> >> -T.D. >> >> ------------------- sample script follows >> --------------------------------------- >> >> #!/usr/bin/perl >> use strict; >> use Bio::Seq::Quality; >> use Bio::SeqIO; >> >> my $id = "bogus_id"; >> my $desc = "bogus description"; >> my $seq = "ATTATTATTATTATT"; >> my $qual = "10 20 30 10 20 30 10 20 30 10 20 30 10 20 30"; >> >> my $sequal_obj = Bio::Seq::Quality->new( >> -display_id => $id, >> -desc => $desc, >> -seq => $seq, >> -qual => $qual >> ); >> >> my $qualout = Bio::SeqIO->new( >> -file => ">myfile.qual", >> -format => 'qual' >> ); >> my $seqout = Bio::SeqIO->new( >> -file => ">myfile.seq", >> -format => 'Fasta' >> ); >> >> $seqout->write_seq($sequal_obj); >> $qualout->write_seq($sequal_obj); >> >> >> ------------------ sample output follows >> --------------------------------------- >> >> tdhoufek at aether:~$ cat myfile.seq >>> bogus_id bogus description >> ATTATTATTATTATT >> tdhoufek at aether:~$ cat myfile.qual >>> bogus_id >> 10 20 30 10 20 30 10 20 30 10 20 30 10 20 30 >> >> --------------------------------------------------------------------- >> --------- >> -------------------- >> >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From dmessina at wustl.edu Fri May 5 11:24:47 2006 From: dmessina at wustl.edu (David Messina) Date: Fri, 5 May 2006 10:24:47 -0500 Subject: [Bioperl-l] Bio::Seq::Quality description line problem In-Reply-To: 
<445A30E4.6070103@pngg.org>
References: <445A30E4.6070103@pngg.org>
Message-ID: <5A549C57-A310-4623-BC44-787AC8BFD6C2@wustl.edu>

Apologies if this is a repost -- mail troubles this morning.

Hilmar is correct. From a cursory walk through the code in a debugger, it looks like Bio::SeqIO::qual's write_seq method doesn't read the 'desc' out of the Bio::Seq::Quality object.

I think there should be something like this:

    if ($source->can('desc') and my $desc = $source->desc()) {
        $desc =~ s/\n//g;
        $header .= " $desc";
    }

before line 218 in Bio::SeqIO::qual (where the header is printed):

    $self->_print (">$header \n");

Dave

From dmessina at wustl.edu Fri May 5 10:53:15 2006
From: dmessina at wustl.edu (David Messina)
Date: Fri, 5 May 2006 09:53:15 -0500
Subject: [Bioperl-l] Bio::Seq::Quality description line problem
In-Reply-To: <445A30E4.6070103@pngg.org>
References: <445A30E4.6070103@pngg.org>
Message-ID:

T.D.,

From a cursory walk through your code in a debugger, it looks like Bio::SeqIO::qual's write_seq method doesn't read the 'desc' out of the Bio::Seq::Quality object.

I think there should be something like this:

    if ($source->can('desc') and my $desc = $source->desc()) {
        $desc =~ s/\n//g;
        $header .= " $desc";
    }

before line 218 in Bio::SeqIO::qual (where the header is printed):

    $self->_print (">$header \n");

Dave

From hubert.prielinger at gmx.at Fri May 5 14:30:24 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 05 May 2006 12:30:24 -0600
Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore
In-Reply-To: <445AD742.4070408@infotech.monash.edu.au>
References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au> <445A94F8.9000903@gmx.at> <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu> <445AB2A4.7020405@gmx.at> <445AD742.4070408@infotech.monash.edu.au>
Message-ID: <445B99C0.6050407@gmx.at>

Hi,
I did as you suggested and got the error message:

Can't call method "next_result" on an undefined value at....

Then I searched the internet and found a thread suggesting that adding "use strict" solves the problem, but I'm already using "use strict".

Thanks

Torsten Seemann wrote:
> Hubert Prielinger wrote:
>
>> if I do so it returns:
>> 0 undef
>>
>
> That means the value of $search was undef.
> That means that it could not parse or open the BLAST report.
> I repeat the line that I put in my earlier email which you ignored.
>
> # your line
> my $search = Bio::SearchIO->new( ..... );
>
> # then check if it was successful!
> die "could not open blast report" if not defined $search; > > --Torsten > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From cjfields at uiuc.edu Fri May 5 15:18:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 5 May 2006 14:18:16 -0500 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <445B99C0.6050407@gmx.at> Message-ID: <000001c67078$a9a7ca10$15327e82@pyrimidine> What happens if you add the verbose flag? my $search = new Bio::SearchIO (-verbose => 1, -format => 'blast', -file => $file); Added thought : you might want to look at File::Find for stepping through your files and performing a task on each one, such as parsing output. It changes into the working directory each time; you should be able to do something like this: use File::Find; use Bio::SearchIO; Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger > Sent: Friday, May 05, 2006 1:30 PM > To: Torsten Seemann; bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore > > hi, > I have done, as you suggested and I got the error message: > > Can't call method "next_result" on an undefined value at.... > > then I looked up at the internet and found a thread which suggested to > use strict and then the problem is solved.... > but I'm already using use strict.. > > thanks > > Torsten Seemann wrote: > > Hubert Prielinger wrote: > > > >> if I do so it returns: > >> 0 undef > >> > > > > That means the value of $search was undef. > > That means that it could not parse or open the BLAST report. > > I repeat the line that I put in my earlier email which you ignored. > > > > # your line > > my $search = Bio::SearchIO->new( ..... ); > > > > # then check if it was successful! 
> > die "could not open blast report" if not defined $search; > > > > --Torsten > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri May 5 15:27:12 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 5 May 2006 14:27:12 -0500 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <445B99C0.6050407@gmx.at> Message-ID: <000101c67079$e8c86a00$15327e82@pyrimidine> Sorry, mail got sent before I finished it! Here I go again... What happens if you add the verbose flag? my $search = new Bio::SearchIO (-verbose => 1, -format => 'blast', -file => $file); Added thought : you might want to look at File::Find for stepping through your files and performing a task on each one, such as parsing output. It changes into the working directory each time; you should be able to do something like this: use File::Find; use Bio::SearchIO; my @dirlist = ("/home/Hubert/test"); find (\&dir, @dirlist); sub printdir { return unless /txt$/; return if (-d); my $parser = Bio::SearchIO->new(-file => $_, -format => 'blast'); while (my $result = $parser->next_result) { while (my $hit = $result->next_hit) { while (my $hsp = $hit->next_hsp) { # do stuff here } } } } Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger > Sent: Friday, May 05, 2006 1:30 PM > To: Torsten Seemann; bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore > > hi, > I have done, as you suggested and I got the error message: > > Can't call method "next_result" on an undefined value at.... 
> > then I looked up at the internet and found a thread which suggested to > use strict and then the problem is solved.... > but I'm already using use strict.. > > thanks > > Torsten Seemann wrote: > > Hubert Prielinger wrote: > > > >> if I do so it returns: > >> 0 undef > >> > > > > That means the value of $search was undef. > > That means that it could not parse or open the BLAST report. > > I repeat the line that I put in my earlier email which you ignored. > > > > # your line > > my $search = Bio::SearchIO->new( ..... ); > > > > # then check if it was successful! > > die "could not open blast report" if not defined $search; > > > > --Torsten > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From barry.moore at genetics.utah.edu Fri May 5 15:39:37 2006 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Fri, 5 May 2006 13:39:37 -0600 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <445B99C0.6050407@gmx.at> References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au> <445A94F8.9000903@gmx.at> <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu> <445AB2A4.7020405@gmx.at> <445AD742.4070408@infotech.monash.edu.au> <445B99C0.6050407@gmx.at> Message-ID: <7F3D73A6-392E-4728-ACB9-FD3BEDFD3C18@genetics.utah.edu> Hubert- If you want to send me your script and input file I'll try to have a look at it. Barry On May 5, 2006, at 12:30 PM, Hubert Prielinger wrote: > hi, > I have done, as you suggested and I got the error message: > > Can't call method "next_result" on an undefined value at.... > > then I looked up at the internet and found a thread which suggested to > use strict and then the problem is solved.... 
> but I'm already using use strict.. > > thanks > > Torsten Seemann wrote: >> Hubert Prielinger wrote: >> >>> if I do so it returns: >>> 0 undef >>> >> >> That means the value of $search was undef. >> That means that it could not parse or open the BLAST report. >> I repeat the line that I put in my earlier email which you ignored. >> >> # your line >> my $search = Bio::SearchIO->new( ..... ); >> >> # then check if it was successful! >> die "could not open blast report" if not defined $search; >> >> --Torsten >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri May 5 16:07:53 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 5 May 2006 15:07:53 -0500 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <000101c67079$e8c86a00$15327e82@pyrimidine> Message-ID: <000201c6707f$97aaaba0$15327e82@pyrimidine> Oops! This is what happens when I copy and paste in a hurry. > use File::Find; > use Bio::SearchIO; > > my @dirlist = ("/home/Hubert/test"); > > find (\&dir, @dirlist); > > sub printdir { ^^^^^^^^^^^ Should be: sub dir { > return unless /txt$/; > return if (-d); > my $parser = Bio::SearchIO->new(-file => $_, > -format => 'blast'); > while (my $result = $parser->next_result) { > while (my $hit = $result->next_hit) { > while (my $hsp = $hit->next_hsp) { > # do stuff here > } > } > } > } Hubert, if the file you are parsing looks fine (i.e. valid BLAST output), post it and your script on Bugzilla and let us take a look. 
Leave out your password though ;> Chris From golharam at umdnj.edu Fri May 5 15:58:03 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Fri, 05 May 2006 15:58:03 -0400 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <000001c67078$a9a7ca10$15327e82@pyrimidine> Message-ID: <02f101c6707e$39a03a30$2f01a8c0@GOLHARMOBILE1> I'm not sure how applicable this is, but I've seen a problem with Perl if the LANG environment variable contains UTF8 (e.g. LANG=en_US.UTF8). I've changed mine to en_US and lots of Perl string parsing problems went away. Also, what about running the bioperl tests on your installation (make test)? What happens? -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields Sent: Friday, May 05, 2006 3:18 PM To: 'Hubert Prielinger'; 'Torsten Seemann'; bioperl-l at bioperl.org Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore What happens if you add the verbose flag? my $search = new Bio::SearchIO (-verbose => 1, -format => 'blast', -file => $file); Added thought : you might want to look at File::Find for stepping through your files and performing a task on each one, such as parsing output. It changes into the working directory each time; you should be able to do something like this: use File::Find; use Bio::SearchIO; Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger > Sent: Friday, May 05, 2006 1:30 PM > To: Torsten Seemann; bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore > > hi, > I have done, as you suggested and I got the error message: > > Can't call method "next_result" on an undefined value at.... > > then I looked up at the internet and found a thread which suggested to > use strict and then the problem is solved.... but I'm already using > use strict..
> > thanks > > Torsten Seemann wrote: > > Hubert Prielinger wrote: > > > >> if I do so it returns: > >> 0 undef > >> > > > > That means the value of $search was undef. > > That means that it could not parse or open the BLAST report. I > > repeat the line that I put in my earlier email which you ignored. > > > > # your line > > my $search = Bio::SearchIO->new( ..... ); > > > > # then check if it was successful! > > die "could not open blast report" if not defined $search; > > > > --Torsten > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri May 5 17:56:29 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 5 May 2006 16:56:29 -0500 Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps In-Reply-To: <001701c6704f$90dbd090$15327e82@pyrimidine> Message-ID: <000901c6708e$c77442b0$15327e82@pyrimidine> Okay, I have changed the way the CONTIG line is handled in Bio::SeqIO::genbank. It was handling it as a feature; I just changed it over to handling it as a Bio::Annotation::SimpleValue object with the value being the entire contig section. It seems to pass tests fine but I'm operating off Windows and my wife's IBook went to the great desktop in the sky (motherboard), so I can't test it there. Pulling the file off using Bio::DB::GenBank (using the no-redirect flag) works w/o crashing out. 
Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Chris Fields > Sent: Friday, May 05, 2006 9:24 AM > To: 'Hilmar Lapp' > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > I'm not sure it's a valid CONTIG file w/o the join(...). This is a chunk > from the longer file Michael used as an example here (NW_925173). I > believe > the CONTIG line is currently handled like a feature so I think it goes > through Bio::SeqIO::FTHelper, which is where Michael mentions his bugfix > is; > I think it's getting beaten up in there somehow. I may see what happens if > it's treated like a WGS line (like a Bio::Annotation::SimpleValue object) > and just glob the whole mess together as is. > > > Chris > > ... > FEATURES Location/Qualifiers > source 1..44976370 > /organism="Homo sapiens" > /mol_type="genomic DNA" > /db_xref="taxon:9606" > /chromosome="11" > CONTIG > join(AADB02014316.1:1..1482320,gap(67),AADB02014317.1:1..577321, > gap(441),AADB02014318.1:1..173584,gap(676), > AADB02014319.1:1..377558,gap(20), > complement(AADB02014320.1:1..431263),gap(20), > AADB02014321.1:1..794957,gap(1241),AADB02014322.1:1..1366198, > > gap(6446),AADB02014323.1:1..3366,gap(20),AADB02014324.1:1..4771, > gap(4611),AADB02014325.1:1..383881,gap(20), > complement(AADB02014326.1:1..381633),gap(1930), > complement(AADB02014327.1:1..460053),gap(20), > AADB02014328.1:1..4186,gap(1587), > ... > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > > Sent: Thursday, May 04, 2006 5:39 PM > > To: Chris Fields > > Cc: bioperl-l at lists.open-bio.org > > Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > > > The two notations are equivalent and syntactically correct, or so I > > believe ... 
I don't think 100% verbatim preservation should be the > > goal. Or am I missing the point? > > > > On May 4, 2006, at 6:27 PM, Chris Fields wrote: > > > > > Here's another odd bit. This is what I get for the CONTIG line when I > > > passed a simple contig file (NW_925062, with one join) through > > > Bio::SeqIO: > > > > > > ----------------------------------- > > > .... > > > FEATURES Location/Qualifiers > > > source 1..8541 > > > /db_xref="taxon:9606" > > > /mol_type="genomic DNA" > > > /chromosome="11" > > > /organism="Homo sapiens" > > > CONTIG AADB02014027.1:1..8541 > > > > > > // > > > ----------------------------------- > > > Here's the original: > > > ----------------------------------- > > > FEATURES Location/Qualifiers > > > source 1..8541 > > > /organism="Homo sapiens" > > > /mol_type="genomic DNA" > > > /db_xref="taxon:9606" > > > /chromosome="11" > > > CONTIG join(AADB02014027.1:1..8541) > > > // > > > ----------------------------------- > > > > > > Looks like it lopped out the 'join' here as well. > > > > > > Chris > > > > > >> -----Original Message----- > > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >> bounces at lists.open-bio.org] On Behalf Of Chris Fields > > >> Sent: Thursday, May 04, 2006 1:41 PM > > >> To: bioperl-l at lists.open-bio.org > > >> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > >> > > >> Are you using the CONTIG record or the full GenBank file? I see > > >> problems with both (using bioperl-live) which seem unrelated to one > > >> another. > > >> The full file seems to be running a bit slow b/c the full GenBank > > >> record > > >> is > > >> huge (~55 MB) but the CONTIG file does exactly what you said (runs > > >> out of > > >> memory). 
> > >> > > >> Chris > > >> > > >>> -----Original Message----- > > >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >>> bounces at lists.open-bio.org] On Behalf Of Michael Rogoff > > >>> Sent: Tuesday, May 02, 2006 10:32 PM > > >>> To: bioperl-l at lists.open-bio.org > > >>> Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > >>> > > >>> > > >>> I've encountered a pretty serious bug in Bio::SeqIO when parsing > > >>> certain > > >>> genbank > > >>> files that contain CONTIG entries with gaps. One such record is > > >>> NW_925173. > > >>> > > >>> When I try to parse this file using Bio::SeqIO::genbank, it will > > >>> enter > > >> an > > >>> infinite loop and spin until it runs out of memory. > > >>> > > >>> I'm pretty certain it relates to this bug: > > >>> http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to > > >>> indicate > > >>> that > > >>> genbank records with CONTIG gaps are not valid and can't be > > >>> parsed. But > > >>> this > > >>> bug actually claims to be fixed, which is strange, since looking > > >>> at the > > >>> code for > > >>> FTLocationFactory (where the loop is) it's still right there. I > > >>> assume > > >>> that > > >>> this may be fixed in other contexts but is still not fixed in > > >>> Bio::SeqIO::genbank? Or am I doing something wrong? > > >>> > > >>> I think that this should probably be filed as an open bug. I would > > >> think > > >>> that > > >>> even if bioperl isn't interested in parsing this type of file via > > >>> SeqIO, > > >>> certainly you'd want to ensure that no finite input file would > > >>> send the > > >>> parser > > >>> into an infinite loop. Have others encountered this problem? Is > > >>> there > > >>> any plan > > >>> to address it? > > >>> > > >>> Thanks very much for any information or help! > > >>> > > >>> -Mike > > >>> > > >>> P.S. I've played around with my version of FTLocationFactory and it > > >> seems > > >>> to > > >>> actually work and parse the gaps. 
I'm not sure if I've created > > >>> other > > >> bugs > > >>> or if > > >>> it works in all cases, but at least the parser doesn't die. I also > > >> don't > > >>> know > > >>> that my hacky code is appropriate for putting back in to BioPerl, > > >>> but > > >> I'm > > >>> happy > > >>> to provide it if someone wants to check it out and/or consider it > > >>> for > > >>> checkin. > > >>> > > >>> > > >>> > > >>> _______________________________________________ > > >>> Bioperl-l mailing list > > >>> Bioperl-l at lists.open-bio.org > > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > >> > > >> _______________________________________________ > > >> Bioperl-l mailing list > > >> Bioperl-l at lists.open-bio.org > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > -- > > =========================================================== > > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > > =========================================================== > > > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hubert.prielinger at gmx.at Fri May 5 19:54:55 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Fri, 05 May 2006 17:54:55 -0600 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <02f101c6707e$39a03a30$2f01a8c0@GOLHARMOBILE1> References: <02f101c6707e$39a03a30$2f01a8c0@GOLHARMOBILE1> Message-ID: <445BE5CF.2000007@gmx.at> hi ryan, nothing happend if I add the verbose flag and how can I test my bioperl 
installation..... Ryan Golhar wrote: > I'm not sure how applicable this is, but I've seen a problem with Perl > if the LANG environment variable contain UTF8 (ex LANG=en_US.UTF8). > I've changed mine to en_US and lots of perl string parsing problems went > away. > > Also, what about running the bioperl tests on your installation (make > test). What happens? > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields > Sent: Friday, May 05, 2006 3:18 PM > To: 'Hubert Prielinger'; 'Torsten Seemann'; bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore > > > What happens if you add the verbose flag? > > my $search = new Bio::SearchIO (-verbose => 1, > -format => 'blast', > -file => $file); > > Added thought : you might want to look at File::Find for stepping > through your files and performing a task on each one, such as parsing > output. It changes into the working directory each time; you should be > able to do something like this: > > use File::Find; > use Bio::SearchIO; > > > > > Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger >> Sent: Friday, May 05, 2006 1:30 PM >> To: Torsten Seemann; bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore >> >> hi, >> I have done, as you suggested and I got the error message: >> >> Can't call method "next_result" on an undefined value at.... >> >> then I looked up at the internet and found a thread which suggested to >> > > >> use strict and then the problem is solved.... but I'm already using >> use strict.. >> >> thanks >> >> Torsten Seemann wrote: >> >>> Hubert Prielinger wrote: >>> >>> >>>> if I do so it returns: >>>> 0 undef >>>> >>>> >>> That means the value of $search was undef. 
>>> That means that it could not parse or open the BLAST report. I >>> repeat the line that I put in my earlier email which you ignored. >>> >>> # your line >>> my $search = Bio::SearchIO->new( ..... ); >>> >>> # then check if it was successful! >>> die "could not open blast report" if not defined $search; >>> >>> --Torsten >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From hubert.prielinger at gmx.at Fri May 5 20:01:11 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Fri, 05 May 2006 18:01:11 -0600 Subject: [Bioperl-l] [BULK] can't parse blast file anymore Message-ID: <445BE747.5020202@gmx.at> hi I have posted my script and the blast file to bugzilla...... From hubert.prielinger at gmx.at Fri May 5 21:21:33 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Fri, 05 May 2006 19:21:33 -0600 Subject: [Bioperl-l] [BULK] can't parse blast file anymore In-Reply-To: <445BE747.5020202@gmx.at> References: <445BE747.5020202@gmx.at> Message-ID: <445BFA1D.5060008@gmx.at> the bugzilla posting didn't work; what is the exact email address for bugzilla? Hubert Prielinger wrote: > hi > I have posted my script and the blast file to bugzilla......
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From cjfields at uiuc.edu Fri May 5 21:38:47 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 5 May 2006 20:38:47 -0500 Subject: [Bioperl-l] [BULK] can't parse blast file anymore In-Reply-To: <445BFA1D.5060008@gmx.at> Message-ID: <000d01c670ad$d209f980$15327e82@pyrimidine> Hubert, Calm down. Breathe in, breathe out. Relax....... Okay, here is the place to start. Read the instructions there first. http://www.bioperl.org/wiki/Bugs Bugs are reported at this site: http://bugzilla.bioperl.org/ Again, follow the instructions. You will have to create a user name and password to submit. Once that is set up, click the "Submit a new bug" link on the main bugzilla page. On that page, fill out all the information, including a description of the error, and hit 'commit'. Add the BLAST report and a sample script by clicking on the "Create a New Attachment" link (you'll have to do this for each file). Once you go back to the bug page you should see two attachments and the bug report. Any commits get sent through the bioperl-guts-l mail list which most developers subscribe to, so they'll know there's a new bug out there. I will not be able to get to it personally; our home computer died a slow painful death today (RIP 2002-2006) but I can get to it next week. If you post the bug, somebody might be able to get to it sooner! Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger > Sent: Friday, May 05, 2006 8:22 PM > To: bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] [BULK] can't parse blast file anymore > > they bugzilla posting didn't work, what is the exact email address for > bugzilla > > Hubert Prielinger wrote: > > hi > > I have posted my script and the blast file to bugzilla......
> > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri May 5 22:26:35 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 5 May 2006 21:26:35 -0500 Subject: [Bioperl-l] Changes to NCBIHelper (RE: CONTIG, genome files) Message-ID: <000f01c670b4$7f22f760$15327e82@pyrimidine> I committed a change to NCBIHelper that permits the downloading of CON (contig) files and corrects an issue where no sequence features were saved when rebuilding those files. If you use Bio::DB::GenBank regularly to download genome files, this likely will NOT affect your code unless you explicitly set the format type to 'genbank', like so: $factory = Bio::DB::GenBank->new(-format => 'gb'); # or 'genbank' I believe most will not have that setting since the default was already 'gb'. Now, the default is 'gbwithparts', which returns the full sequence regardless. If it is a file with a CONTIG line, the sequence is built on NCBI's end (and will include seq features if they are present). As Brian said, we'll let NCBI do the work for us! If you need the actual file w/o sequence, then you can set the format to 'genbank' (like above) and it will grab it for you. There was an unrelated problem with CONTIG line parsing that I also fixed, where I changed the format over to a Bio::Annotation::SimpleValue as a workaround for now; for some reason some CON files were misparsed and resulted in infinite loops or missing 'join' statements. Chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept.
of Biochemistry University of Illinois Urbana-Champaign From hubert.prielinger at gmx.at Sat May 6 18:22:05 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Sat, 06 May 2006 16:22:05 -0600 Subject: [Bioperl-l] [BULK] can't parse blast file anymore In-Reply-To: <000d01c670ad$d209f980$15327e82@pyrimidine> References: <000d01c670ad$d209f980$15327e82@pyrimidine> Message-ID: <445D218D.2030504@gmx.at> ok, thanks I have submitted the bug bug #1994 Chris Fields wrote: > Hubert, > > Calm down. Breathe in, breath out. Relax....... > > Okay, here is the place to start. Read the instructions there first. > > http://www.bioperl.org/wiki/Bugs > > Bugs are reported at this site: > > http://bugzilla.bioperl.org/ > > Again, follow the instructions. You will have to create a user name and > password to submit. Once that is set up, click the "Submit a new bug" link > on the main bugzilla page. On that page, fill out all information first and > a description of the error and hit 'commit'. Add the BLAST report and some > sample script by clicking on the "Create a New Attachment" link (you'll have > to do this for each file). Once you go back to the bug page you should see > two attachments and the bug report. Any commits get sent through the > bioperl-guts-l mail list which most developers subscribe to, so they'll know > there's a new bug out there. > > I will not be able to get to it personally; our home computer died a slow > painful death today (RIP 2002-2006) but I can get to it next week. If you > post the bug, somebody might be able to get to it sooner! 
> > Chris > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger >> Sent: Friday, May 05, 2006 8:22 PM >> To: bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] [BULK] can't parse blast file anymore >> >> they bugzilla posting didn't work, what is the exact email address for >> bugzilla >> >> Hubert Prielinger wrote: >> >>> hi >>> I have posted my script and the blast file to bugzilla...... >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From torsten.seemann at infotech.monash.edu.au Sat May 6 20:57:14 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Sun, 07 May 2006 10:57:14 +1000 Subject: [Bioperl-l] [BULK] can't parse blast file anymore In-Reply-To: <445D218D.2030504@gmx.at> References: <000d01c670ad$d209f980$15327e82@pyrimidine> <445D218D.2030504@gmx.at> Message-ID: <445D45EA.8020804@infotech.monash.edu.au> Hubert Prielinger wrote: > ok, thanks > I have submitted the bug > bug #1994 This is a line from the script you sent to Bugzilla: my $search = new Bio::SearchIO ( -verbose => 1,-format => 'blast', -file => $file) or die "could not open blast report" if not defined my $search; Although syntactically correct, I don't think it is doing what you want.
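To see why, here is a minimal sketch, assuming a hypothetical stand-in class Foo in place of Bio::SearchIO so that it runs without BioPerl (Foo, buggy_form and two_step_form are made-up names for illustration only). As far as I can tell, the "my $search" inside the "if not defined" modifier declares a second, empty variable that masks the one being assigned:

```perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::SearchIO, so the sketch runs without BioPerl.
package Foo;
sub new { return bless {}, shift }

package main;

# Returns 1 if the variable visible after the one-line form is defined, else 0.
sub buggy_form {
    no warnings;    # silence the '"my" variable $x masks earlier declaration' warning
    # The "my $x" in the modifier declares a fresh, undefined $x, so the
    # condition is always true. The object is assigned to the first $x,
    # but every later mention of $x refers to the second, empty one.
    my $x = Foo->new() or die "could not create Foo" if not defined my $x;
    return defined $x ? 1 : 0;
}

# Same check written as two separate statements, with a single declaration.
sub two_step_form {
    my $x = Foo->new();
    die "could not create Foo" unless defined $x;
    return defined $x ? 1 : 0;
}

print "one-line form: ", (buggy_form()    ? "defined" : "undef"), "\n";
print "two-step form: ", (two_step_form() ? "defined" : "undef"), "\n";
```

If this reading is right, the one-line form leaves the visible variable undefined even though the constructor succeeded, which would match the "0 undef" output reported earlier in the thread.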
Please change it to this: my $search = new Bio::SearchIO(-format => 'blast', -file => $file) or die "could not open blast report"; or alternatively, this: my $search = new Bio::SearchIO(-format => 'blast', -file => $file); if (not defined $search) { die "could not open blast report"; } and let us know what happens. All the example output you have supplied still suggests that Bio::SearchIO cannot load or parse your blast report. -- Torsten Seemann Victorian Bioinformatics Consortium, Monash University, Australia From mamillerpa at yahoo.com Sat May 6 19:07:30 2006 From: mamillerpa at yahoo.com (Mark A. Miller) Date: Sat, 6 May 2006 16:07:30 -0700 (PDT) Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines In-Reply-To: Message-ID: <20060506230730.56480.qmail@web50410.mail.yahoo.com> Thanks for your responses, Jason and Brian. Brian, your suggestion works great. I had really hoped that by parsing the OS line as well, I could be sure I wasn't missing any sequences from my organisms. Well, I gave up on that and just obtained the NCBI taxonomy values. I find it pretty easy to work with them in bioperl. Unfortunately, walking through all of Trembl takes a while, and I'm getting this error: Can't call method "ncbi_taxid" on an undefined value at ./ga2.pl line 55, line 3253682. This happens when I try to extract annotations, etc., from entries like: DHE4_UNKP with: my $species_object = $seq->species; my $taxid_string = $species_object->ncbi_taxid; I guess I have to write an error handler for incomplete taxonomy values. Bye for now, Mark --- Brian Osborne wrote: > Mark, > > The RC line is part of the description of a reference, I'm guessing > 'RC' > stands for Reference Comment.
In order to get the attributes of a > reference > you'll first do something like: > > my $anno_collection = $seq->annotation; > my @references = $anno_collection->get_Annotations('reference'); > > To get the comment field for a specific reference you can do: > > $references[0]->comment; > > See the Feature-Annotation HOWTO for more information on Annotations, > the > Reference object is a kind of Annotation object. > > Brian O. > > > On 5/3/06 3:34 PM, "Mark A. Miller" wrote: > > > Yeah. Do you have any experience with that? > > > > Mark > > > > --- Brian Osborne wrote: > > > >> Mark, > >> > >> So you're trying to get the information in the RC line from a > >> Swissprot > >> format file? > >> > >> Brian O. > > > > > > --- --- --- --- --- --- --- --- > > > > Mark A. Miller > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! Mail has the best spam protection around > > http://mail.yahoo.com > > > --- --- --- --- --- --- --- --- Mark A. Miller __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From cjfields at uiuc.edu Sat May 6 23:33:40 2006 From: cjfields at uiuc.edu (Christopher Fields) Date: Sat, 6 May 2006 22:33:40 -0500 Subject: [Bioperl-l] [BULK] can't parse blast file anymore Message-ID: <65109dc1.b47d779e.81acb00@expms6.cites.uiuc.edu> The -verbose flag was my suggestion; it should output a ton of debugging info from SearchIO::blast; if you see anything there, then it means that it's at least attempting to parse the report. Of course I can't test this myself at the moment since my wife's computer died (along with the bioperl setup); I'm using a loaner computer at the moment. 
Chris ---- Original message ---- >Date: Sun, 07 May 2006 10:57:14 +1000 >From: Torsten Seemann >Subject: Re: [Bioperl-l] [BULK] ?can't parse blast file anymore >To: Hubert Prielinger >Cc: bioperl-l at bioperl.org > >Hubert Prielinger wrote: >> ok, thanks >> I have submitted the bug >> bug #1994 > >This is a line from the script you sent to Bugzilla: > >my $search = new Bio::SearchIO ( >-verbose => 1,-format => 'blast', -file => $file) >or die "could not open blast report" if not defined my $search; > >Althoygh syntactically correct, I don't think it is doing what you want. >Please change it to this: > >my $search = new Bio::SearchIO(-format => 'blast', -file => $file) or die >"could not open blast report"; > >or alternatively, this: > >my $search = new Bio::SearchIO(-format => 'blast', -file => $file); >if (not defined $search) { > die "could not open blast report"; >} > >and let us know what happens. > >all the example output you have supplied still suggests that Bio::SearchIO can >not load or parse your blast report. > >-- >Torsten Seemann >Victorian Bioinformatics Consortium, Monash University, Australia >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l From chen_li3 at yahoo.com Sun May 7 03:34:55 2006 From: chen_li3 at yahoo.com (chen li) Date: Sun, 7 May 2006 00:34:55 -0700 (PDT) Subject: [Bioperl-l] primer parameters using primer3 Message-ID: <20060507073455.11849.qmail@web36815.mail.mud.yahoo.com> Hi all, I use Bio::Tools::Run::Primer3 to design PCR primers. I want to change some default values, for example, to increase the PCR product size to 490-510 bp instead of using the default value of 100-300 bp. What should I do ? Thanks, Li __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! 
Mail has the best spam protection around http://mail.yahoo.com From jason.stajich at duke.edu Sun May 7 16:49:29 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Sun, 7 May 2006 16:49:29 -0400 Subject: [Bioperl-l] [BULK] can't parse blast file anymore In-Reply-To: <65109dc1.b47d779e.81acb00@expms6.cites.uiuc.edu> References: <65109dc1.b47d779e.81acb00@expms6.cites.uiuc.edu> Message-ID: The problem is in how SearchIO was being initialized; the code basically looked like this: my $x = new Foo() or die if not defined my $x; which is problematic for two reasons. 1) if not defined my $x; declares a second, brand-new $x, so 'defined my $x' will ALWAYS be false (the test always passes), and that empty $x masks the one being assigned, which is why $x is undef afterwards. 2) my $x = new Foo() or die ; Will cast the new object as a boolean. Whenever things aren't working, take a look at the code and try and walk through any shortcuts. For clarity make it a two-step process my $x = new Foo(); die "no valid $x" unless defined $x; Please note that currently BioPerl WILL die (via throw) if you try and ask for an invalid file when you initialize a new IO object -- this is handled by code in Bio::Root::IO (line 313 in Bio/Root/IO.pm) which all the IO objects use, so you don't really need to do a test on the object after all. --jason On May 6, 2006, at 11:33 PM, Christopher Fields wrote: > The -verbose flag was my suggestion; it should output a ton of > debugging info > from SearchIO::blast; if you see anything there, then it means that > it's at least > attempting to parse the report. > > Of course I can't test this myself at the moment since my wife's > computer died > (along with the bioperl setup); I'm using a loaner computer at the > moment.
> > Chris > > ---- Original message ---- >> Date: Sun, 07 May 2006 10:57:14 +1000 >> From: Torsten Seemann >> Subject: Re: [Bioperl-l] [BULK] can't parse blast file anymore >> To: Hubert Prielinger >> Cc: bioperl-l at bioperl.org >> >> Hubert Prielinger wrote: >>> ok, thanks >>> I have submitted the bug >>> bug #1994 >> >> This is a line from the script you sent to Bugzilla: >> >> my $search = new Bio::SearchIO ( >> -verbose => 1,-format => 'blast', -file => $file) >> or die "could not open blast report" if not defined my $search; >> >> Althoygh syntactically correct, I don't think it is doing what you >> want. >> Please change it to this: >> >> my $search = new Bio::SearchIO(-format => 'blast', -file => $file) >> or die >> "could not open blast report"; >> >> or alternatively, this: >> >> my $search = new Bio::SearchIO(-format => 'blast', -file => $file); >> if (not defined $search) { >> die "could not open blast report"; >> } >> >> and let us know what happens. >> >> all the example output you have supplied still suggests that >> Bio::SearchIO can >> not load or parse your blast report. 
>> >> -- >> Torsten Seemann >> Victorian Bioinformatics Consortium, Monash University, Australia >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From jason.stajich at duke.edu Sun May 7 17:01:29 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Sun, 7 May 2006 17:01:29 -0400 Subject: [Bioperl-l] primer parameters using primer3 In-Reply-To: <20060507073455.11849.qmail@web36815.mail.mud.yahoo.com> References: <20060507073455.11849.qmail@web36815.mail.mud.yahoo.com> Message-ID: I put up some info on the wiki (and I encourage other people to do the same!) http://bioperl.org/wiki/Module:Bio::Tools::Run::Primer3 Set the command line parameters by just calling a function of the name of the parameter. To get a list of the available options, this perl code will report it to you: # what are the arguments, and what do they mean? my $args = $primer3->arguments; print "ARGUMENT\tMEANING\n"; foreach my $key (keys %{$args}) {print "$key\t", $$args{$key}, "\n"} The info for PRODUCT_SIZE_RANGE is: (size range list, default 100-300) space separated list of product sizes eg - - I believe you can set the PCR product size with $primer3->primer_product_size_range("490-510"); -jason On May 7, 2006, at 3:34 AM, chen li wrote: > Hi all, > > I use Bio::Tools::Run::Primer3 to design PCR primers. > I want to change some default values, for example, to > increase the PCR product size to 490-510 bp instead of > using the default value of 100-300 bp. What should I > do ? > > > Thanks, > > Li > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! 
Mail has the best spam protection around > http://mail.yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From chen_li3 at yahoo.com Sun May 7 21:18:17 2006 From: chen_li3 at yahoo.com (chen li) Date: Sun, 7 May 2006 18:18:17 -0700 (PDT) Subject: [Bioperl-l] primer parameters using primer3 In-Reply-To: Message-ID: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com> Hi Jason, I add the line code $primer3->primer_product_size_range("490-510"); to my script. But it doesn't work nor primer3 complains it. Li --- Jason Stajich wrote: > I put up some info on the wiki (and I encourage > other people to do > the same!) > http://bioperl.org/wiki/Module:Bio::Tools::Run::Primer3 > > Set the command line parameters by just calling a > function of the > name of the parameter. To get a list of the > available options, this > perl code will report it to you: > > # what are the arguments, and what do they mean? > my $args = $primer3->arguments; > > print "ARGUMENT\tMEANING\n"; > foreach my $key (keys %{$args}) {print "$key\t", > $$args{$key}, "\n"} > > The info for PRODUCT_SIZE_RANGE is: > (size range list, default 100-300) space > separated list of product > sizes eg - - > > I believe you can set the PCR product size with > $primer3->primer_product_size_range("490-510"); > > -jason > On May 7, 2006, at 3:34 AM, chen li wrote: > > > Hi all, > > > > I use Bio::Tools::Run::Primer3 to design PCR > primers. > > I want to change some default values, for example, > to > > increase the PCR product size to 490-510 bp > instead of > > using the default value of 100-300 bp. What should > I > > do ? > > > > > > Thanks, > > > > Li > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! 
Mail has the best spam > protection around > > http://mail.yahoo.com > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From hubert.prielinger at gmx.at Sun May 7 21:41:14 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Sun, 07 May 2006 19:41:14 -0600 Subject: [Bioperl-l] [BULK] can't parse blast file anymore In-Reply-To: <445D45EA.8020804@infotech.monash.edu.au> References: <000d01c670ad$d209f980$15327e82@pyrimidine> <445D218D.2030504@gmx.at> <445D45EA.8020804@infotech.monash.edu.au> Message-ID: <445EA1BA.9050301@gmx.at> hi, I have corrected that and now I finally I got a few error messages: blast.pm: unrecognized line Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Sch?ffer, blast.pm: unrecognized line Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman blast.pm: unrecognized line (1997), "Gapped BLAST and PSI-BLAST: a new generation of blast.pm: unrecognized line protein database search programs", Nucleic Acids Res. 25:3389-3402. blast.pm: unrecognized line RID: 1137529800-24476-151611170370.BLASTQ1 after that line it stops without terminating.... Torsten Seemann wrote: > Hubert Prielinger wrote: >> ok, thanks >> I have submitted the bug >> bug #1994 > > This is a line from the script you sent to Bugzilla: > > my $search = new Bio::SearchIO ( > -verbose => 1,-format => 'blast', -file => $file) > or die "could not open blast report" if not defined my $search; > > Althoygh syntactically correct, I don't think it is doing what you want. 
> Please change it to this: > > my $search = new Bio::SearchIO(-format => 'blast', -file => $file) or > die "could not open blast report"; > > or alternatively, this: > > my $search = new Bio::SearchIO(-format => 'blast', -file => $file); > if (not defined $search) { > die "could not open blast report"; > } > > and let us know what happens. > > all the example output you have supplied still suggests that > Bio::SearchIO can not load or parse your blast report. > From cjfields at uiuc.edu Sun May 7 22:04:13 2006 From: cjfields at uiuc.edu (Christopher Fields) Date: Sun, 7 May 2006 21:04:13 -0500 Subject: [Bioperl-l] [BULK] can't parse blast file anymore Message-ID: <42d52830.b4f91bfc.81e4600@expms6.cites.uiuc.edu> These are debugging lines (not errors); you still have the -verbose flag set. Did you follow Jason's advice? I believe he's right on the money about the issue at hand... Chris ---- Original message ---- >Date: Sun, 07 May 2006 19:41:14 -0600 >From: Hubert Prielinger >Subject: Re: [Bioperl-l] [BULK] ?can't parse blast file anymore >To: Torsten Seemann , bioperl- l at bioperl.org, Chris Fields , Jason Stajich > >hi, >I have corrected that and now I finally I got a few error messages: > >blast.pm: unrecognized line Reference: Altschul, Stephen F., Thomas L. >Madden, Alejandro A. Sch?ffer, >blast.pm: unrecognized line Jinghui Zhang, Zheng Zhang, Webb Miller, and >David J. Lipman >blast.pm: unrecognized line (1997), "Gapped BLAST and PSI-BLAST: a new >generation of >blast.pm: unrecognized line protein database search programs", Nucleic >Acids Res. 25:3389-3402. >blast.pm: unrecognized line RID: 1137529800-24476-151611170370.BLASTQ1 > >after that line it stops without terminating.... 
> > >Torsten Seemann wrote: >> Hubert Prielinger wrote: >>> ok, thanks >>> I have submitted the bug >>> bug #1994 >> >> This is a line from the script you sent to Bugzilla: >> >> my $search = new Bio::SearchIO ( >> -verbose => 1,-format => 'blast', -file => $file) >> or die "could not open blast report" if not defined my $search; >> >> Althoygh syntactically correct, I don't think it is doing what you want. >> Please change it to this: >> >> my $search = new Bio::SearchIO(-format => 'blast', -file => $file) or >> die "could not open blast report"; >> >> or alternatively, this: >> >> my $search = new Bio::SearchIO(-format => 'blast', -file => $file); >> if (not defined $search) { >> die "could not open blast report"; >> } >> >> and let us know what happens. >> >> all the example output you have supplied still suggests that >> Bio::SearchIO can not load or parse your blast report. >> > From jason.stajich at duke.edu Sun May 7 22:47:00 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Sun, 7 May 2006 22:47:00 -0400 Subject: [Bioperl-l] primer parameters using primer3 In-Reply-To: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com> References: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com> Message-ID: <430DE892-8EE8-4FC9-8BAC-7D344C876B72@duke.edu> I'm not really familiar with the module more than what the documentation says so did you try and use the add_targets method to add arguments instead? I had thought the AUTOLOAD method took care of access to the cmd line arguments as it does for the other Run modules but I am not really sure. Perhaps folks on the list who use this module can provide better advice. -jason On May 7, 2006, at 9:18 PM, chen li wrote: > Hi Jason, > > I add the line code > $primer3->primer_product_size_range("490-510"); > to my script. But it doesn't work nor primer3 > complains it. > > Li > > --- Jason Stajich wrote: > >> I put up some info on the wiki (and I encourage >> other people to do >> the same!) 
>> > http://bioperl.org/wiki/Module:Bio::Tools::Run::Primer3 >> >> Set the command line parameters by just calling a >> function of the >> name of the parameter. To get a list of the >> available options, this >> perl code will report it to you: >> >> # what are the arguments, and what do they mean? >> my $args = $primer3->arguments; >> >> print "ARGUMENT\tMEANING\n"; >> foreach my $key (keys %{$args}) {print "$key\t", >> $$args{$key}, "\n"} >> >> The info for PRODUCT_SIZE_RANGE is: >> (size range list, default 100-300) space >> separated list of product >> sizes eg - - >> >> I believe you can set the PCR product size with >> $primer3->primer_product_size_range("490-510"); >> >> -jason >> On May 7, 2006, at 3:34 AM, chen li wrote: >> >>> Hi all, >>> >>> I use Bio::Tools::Run::Primer3 to design PCR >> primers. >>> I want to change some default values, for example, >> to >>> increase the PCR product size to 490-510 bp >> instead of >>> using the default value of 100-300 bp. What should >> I >>> do ? >>> >>> >>> Thanks, >>> >>> Li >>> >>> __________________________________________________ >>> Do You Yahoo!? >>> Tired of spam? Yahoo! Mail has the best spam >> protection around >>> http://mail.yahoo.com >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! 
Mail has the best spam protection around > http://mail.yahoo.com -- Jason Stajich Duke University http://www.duke.edu/~jes12 From osborne1 at optonline.net Mon May 8 10:49:22 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Mon, 08 May 2006 10:49:22 -0400 Subject: [Bioperl-l] primer parameters using primer3 In-Reply-To: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com> Message-ID: Li, Read the documentation, Bio::Tools::Run::Primer3. It shows examples of the correct syntax. Also look at bioperl-run/t/Primer3.t. Brian O. On 5/7/06 9:18 PM, "chen li" wrote: > Hi Jason, > > I add the line code > $primer3->primer_product_size_range("490-510"); > to my script. But it doesn't work nor primer3 > complains it. > > Li > > --- Jason Stajich wrote: > >> I put up some info on the wiki (and I encourage >> other people to do >> the same!) >> > http://bioperl.org/wiki/Module:Bio::Tools::Run::Primer3 >> >> Set the command line parameters by just calling a >> function of the >> name of the parameter. To get a list of the >> available options, this >> perl code will report it to you: >> >> # what are the arguments, and what do they mean? >> my $args = $primer3->arguments; >> >> print "ARGUMENT\tMEANING\n"; >> foreach my $key (keys %{$args}) {print "$key\t", >> $$args{$key}, "\n"} >> >> The info for PRODUCT_SIZE_RANGE is: >> (size range list, default 100-300) space >> separated list of product >> sizes eg - - >> >> I believe you can set the PCR product size with >> $primer3->primer_product_size_range("490-510"); >> >> -jason >> On May 7, 2006, at 3:34 AM, chen li wrote: >> >>> Hi all, >>> >>> I use Bio::Tools::Run::Primer3 to design PCR >> primers. >>> I want to change some default values, for example, >> to >>> increase the PCR product size to 490-510 bp >> instead of >>> using the default value of 100-300 bp. What should >> I >>> do ? >>> >>> >>> Thanks, >>> >>> Li >>> >>> __________________________________________________ >>> Do You Yahoo!? >>> Tired of spam? 
Yahoo! Mail has the best spam >> protection around >>> http://mail.yahoo.com >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From roy at colibase.bham.ac.uk Mon May 8 07:12:49 2006 From: roy at colibase.bham.ac.uk (Roy Chaudhuri) Date: Mon, 08 May 2006 12:12:49 +0100 Subject: [Bioperl-l] primer parameters using primer3 In-Reply-To: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com> References: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com> Message-ID: <445F27B1.40501@colibase.bham.ac.uk> Hi Li, I think the syntax you need is: $primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510'); I guess you may also need to change the parameter PRIMER_PRODUCT_OPT_SIZE. Incidentally, such a restricted product size range may mean that Primer3 is unable to design any suitable primers. If I recall correctly, this doesn't cause an error, you just get a Bio::Tools::Primer3 object with no primers in it. I have had some success with testing for this, and if necessary relaxing some constraints on primer design and re-running Primer3. Hope this helps. Roy. -- Dr. Roy Chaudhuri Bioinformatics Research Fellow Division of Immunity and Infection University of Birmingham, U.K. http://xbase.bham.ac.uk > Hi Jason, > > I add the line code > $primer3->primer_product_size_range("490-510"); > to my script. But it doesn't work nor primer3 > complains it. 
> > Li > > --- Jason Stajich wrote: > >> > I put up some info on the wiki (and I encourage >> > other people to do >> > the same!) >> > > http://bioperl.org/wiki/Module:Bio::Tools::Run::Primer3 >> > >> > Set the command line parameters by just calling a >> > function of the >> > name of the parameter. To get a list of the >> > available options, this >> > perl code will report it to you: >> > >> > # what are the arguments, and what do they mean? >> > my $args = $primer3->arguments; >> > >> > print "ARGUMENT\tMEANING\n"; >> > foreach my $key (keys %{$args}) {print "$key\t", >> > $$args{$key}, "\n"} >> > >> > The info for PRODUCT_SIZE_RANGE is: >> > (size range list, default 100-300) space >> > separated list of product >> > sizes eg - - >> > >> > I believe you can set the PCR product size with >> > $primer3->primer_product_size_range("490-510"); >> > >> > -jason >> > On May 7, 2006, at 3:34 AM, chen li wrote: >> > >>> > > Hi all, >>> > > >>> > > I use Bio::Tools::Run::Primer3 to design PCR >> > primers. >>> > > I want to change some default values, for example, >> > to >>> > > increase the PCR product size to 490-510 bp >> > instead of >>> > > using the default value of 100-300 bp. What should >> > I >>> > > do ? >>> > > >>> > > >>> > > Thanks, >>> > > >>> > > Li >>> > > >>> > > __________________________________________________ >>> > > Do You Yahoo!? >>> > > Tired of spam? Yahoo! Mail has the best spam >> > protection around >>> > > http://mail.yahoo.com >>> > > _______________________________________________ >>> > > Bioperl-l mailing list >>> > > Bioperl-l at lists.open-bio.org >>> > > >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> > -- >> > Jason Stajich >> > Duke University >> > http://www.duke.edu/~jes12 >> > >> > >> > > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! 
Mail has the best spam protection around > http://mail.yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From chen_li3 at yahoo.com Mon May 8 09:21:54 2006 From: chen_li3 at yahoo.com (chen li) Date: Mon, 8 May 2006 06:21:54 -0700 (PDT) Subject: [Bioperl-l] primer parameters using primer3 In-Reply-To: <445F27B1.40501@colibase.bham.ac.uk> Message-ID: <20060508132154.71440.qmail@web36802.mail.mud.yahoo.com> I think Dr. Chaudhuri is correct. I added the following lines to my script (actually copied from the documentation) $primer3->add_targets( PRIMER_PRODUCT_SIZE_RANGE=>'490-510'); $primer3->add_targets('PRIMER_MIN_TM'=>60, 'PRIMER_MAX_TM'=>64); to design primers with a product size of 490-510 bp and a primer annealing Tm of 60 to 64 C. Here is part of the output in the file called temp.out: .......... original sequence..... GTGGGCTGGTGTTGCTTGGAAAATTTCAAAATCCCAAAGTTTCAGGCTTCCCAAAGTTGGCTTGGAAAAATGTGATAGTCTCACCTGAGTCTAGACATGT ................. PRIMER_PRODUCT_SIZE_RANGE=490-510 PRIMER_MIN_TM=60 PRIMER_MAX_TM=64 PRIMER_PAIR_PENALTY=0.1544 PRIMER_LEFT_PENALTY=0.081468 PRIMER_RIGHT_PENALTY=0.072951 PRIMER_LEFT_SEQUENCE=CCAAAGTTGGCTTGGAAAAA ............................... PRIMER_PRODUCT_SIZE=501 .............. This is what I want. If you don't set special parameters such as the annealing Tm, the program will use the default ones. If you set your own parameters, they will show up after the sequence (see this output example). If you need to set more parameters and want to know which ones are available, just browse the BEGIN section of the module's code. Now I have another question: the program always prints out the original sequence at the beginning. Is it possible not to do that? 
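One way to hide the sequence, sketched here in plain Perl, is to post-process the output file rather than change what primer3 writes. The sketch assumes the output consists of TAG=VALUE lines like those quoted above and that the original sequence is stored under a tag named SEQUENCE; both the tag name and the helper `filter_tags` are assumptions made for illustration, not part of Bio::Tools::Run::Primer3.

```perl
use strict;
use warnings;

# Split primer3-style "TAG=VALUE" lines into [tag, value] pairs,
# dropping any tag listed in %$skip. A lone "=" (record terminator)
# and lines without "=" are ignored.
sub filter_tags {
    my ($lines, $skip) = @_;
    my @kept;
    for my $line (@$lines) {
        chomp(my $copy = $line);
        my ($tag, $value) = split /=/, $copy, 2;
        next unless defined $tag && length $tag && defined $value;
        next if $skip->{$tag};
        push @kept, [$tag, $value];
    }
    return \@kept;
}

# Sample lines modelled on the output quoted above; the SEQUENCE tag
# name is an assumption about how the original sequence is labelled.
my @sample = (
    "SEQUENCE=GTGGGCTGGTGTTGCTTGG\n",
    "PRIMER_PRODUCT_SIZE_RANGE=490-510\n",
    "PRIMER_MIN_TM=60\n",
    "PRIMER_MAX_TM=64\n",
    "PRIMER_LEFT_SEQUENCE=CCAAAGTTGGCTTGGAAAAA\n",
    "PRIMER_PRODUCT_SIZE=501\n",
    "=\n",
);

my $kept = filter_tags(\@sample, { SEQUENCE => 1 });
print "$_->[0]=$_->[1]\n" for @$kept;
```

To use this on a real file, read the lines of temp.out into an array and pass a reference to it in place of the sample data.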
Thanks all for join this topic, Li --- Roy Chaudhuri wrote: > Hi Li, > > I think the syntax you need is: > > $primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510'); > > I guess you may also need to change the parameter > PRIMER_PRODUCT_OPT_SIZE. > > Incidentally, such a restricted product size range > may mean that Primer3 > is unable to design any suitable primers. If I > recall correctly, this > doesn't cause an error, you just get a > Bio::Tools::Primer3 object with > no primers in it. I have had some success with > testing for this, and if > necessary relaxing some constraints on primer design > and re-running > Primer3. > > Hope this helps. > Roy. > > -- > Dr. Roy Chaudhuri > Bioinformatics Research Fellow > Division of Immunity and Infection > University of Birmingham, U.K. > > http://xbase.bham.ac.uk > > > Hi Jason, > > > > I add the line code > > $primer3->primer_product_size_range("490-510"); > > to my script. But it doesn't work nor primer3 > > complains it. > > > > Li > > > > --- Jason Stajich wrote: > > > >> > I put up some info on the wiki (and I encourage > >> > other people to do > >> > the same!) > >> > > > > http://bioperl.org/wiki/Module:Bio::Tools::Run::Primer3 > >> > > >> > Set the command line parameters by just calling > a > >> > function of the > >> > name of the parameter. To get a list of the > >> > available options, this > >> > perl code will report it to you: > >> > > >> > # what are the arguments, and what do they > mean? 
> >> > my $args = $primer3->arguments; > >> > > >> > print "ARGUMENT\tMEANING\n"; > >> > foreach my $key (keys %{$args}) {print > "$key\t", > >> > $$args{$key}, "\n"} > >> > > >> > The info for PRODUCT_SIZE_RANGE is: > >> > (size range list, default 100-300) space > >> > separated list of product > >> > sizes eg - - > >> > > >> > I believe you can set the PCR product size with > >> > > $primer3->primer_product_size_range("490-510"); > >> > > >> > -jason > >> > On May 7, 2006, at 3:34 AM, chen li wrote: > >> > > >>> > > Hi all, > >>> > > > >>> > > I use Bio::Tools::Run::Primer3 to design PCR > >> > primers. > >>> > > I want to change some default values, for > example, > >> > to > >>> > > increase the PCR product size to 490-510 bp > >> > instead of > >>> > > using the default value of 100-300 bp. What > should > >> > I > >>> > > do ? > >>> > > > >>> > > > >>> > > Thanks, > >>> > > > >>> > > Li > >>> > > > >>> > > > __________________________________________________ > >>> > > Do You Yahoo!? > >>> > > Tired of spam? Yahoo! Mail has the best > spam > >> > protection around > >>> > > http://mail.yahoo.com > >>> > > > _______________________________________________ > >>> > > Bioperl-l mailing list > >>> > > Bioperl-l at lists.open-bio.org > >>> > > > >> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > >> > -- > >> > Jason Stajich > >> > Duke University > >> > http://www.duke.edu/~jes12 > >> > > >> > > >> > > > > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! Mail has the best spam > protection around > > http://mail.yahoo.com > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! 
Mail has the best spam protection around http://mail.yahoo.com From hubert.prielinger at gmx.at Mon May 8 15:09:29 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Mon, 08 May 2006 13:09:29 -0600 Subject: [Bioperl-l] [BULK] can't parse blast file anymore In-Reply-To: <42d52830.b4f91bfc.81e4600@expms6.cites.uiuc.edu> References: <42d52830.b4f91bfc.81e4600@expms6.cites.uiuc.edu> Message-ID: <445F9769.70500@gmx.at> hi all together, i have solved the problem, because I'm parsing blast 2.2.13 and I have installed an early bioperl 1.5.1 and there it occurred that bug 1934 wasn't fixed yet, so I had to exchange the blast.pm file and now it works properly. thank you very much Hubert Christopher Fields wrote: > These are debugging lines (not errors); you still have the -verbose flag set. > > Did you follow Jason's advice? I believe he's right on the money about the issue > at hand... > > Chris > > ---- Original message ---- > >> Date: Sun, 07 May 2006 19:41:14 -0600 >> From: Hubert Prielinger >> Subject: Re: [Bioperl-l] [BULK] can't parse blast file anymore >> To: Torsten Seemann , bioperl- >> > l at bioperl.org, Chris Fields , Jason Stajich > > >> hi, >> I have corrected that and now I finally I got a few error messages: >> >> blast.pm: unrecognized line Reference: Altschul, Stephen F., Thomas L. >> Madden, Alejandro A. Sch?ffer, >> blast.pm: unrecognized line Jinghui Zhang, Zheng Zhang, Webb Miller, and >> David J. Lipman >> blast.pm: unrecognized line (1997), "Gapped BLAST and PSI-BLAST: a new >> generation of >> blast.pm: unrecognized line protein database search programs", Nucleic >> Acids Res. 25:3389-3402. >> blast.pm: unrecognized line RID: >> > 1137529800-24476-151611170370.BLASTQ1 > >> after that line it stops without terminating.... 
>> >> >> Torsten Seemann wrote: >> >>> Hubert Prielinger wrote: >>> >>>> ok, thanks >>>> I have submitted the bug >>>> bug #1994 >>>> >>> This is a line from the script you sent to Bugzilla: >>> >>> my $search = new Bio::SearchIO ( >>> -verbose => 1,-format => 'blast', -file => $file) >>> or die "could not open blast report" if not defined my $search; >>> >>> Althoygh syntactically correct, I don't think it is doing what you want. >>> Please change it to this: >>> >>> my $search = new Bio::SearchIO(-format => 'blast', -file => $file) or >>> die "could not open blast report"; >>> >>> or alternatively, this: >>> >>> my $search = new Bio::SearchIO(-format => 'blast', -file => $file); >>> if (not defined $search) { >>> die "could not open blast report"; >>> } >>> >>> and let us know what happens. >>> >>> all the example output you have supplied still suggests that >>> Bio::SearchIO can not load or parse your blast report. >>> >>> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From s.johri at imperial.ac.uk Mon May 8 11:38:13 2006 From: s.johri at imperial.ac.uk (Johri, Saurabh) Date: Mon, 8 May 2006 16:38:13 +0100 Subject: [Bioperl-l] PAML + Codeml problem.. Message-ID: <4A98ACB8EC146149872BAC9A132A582C277AC4@icex5.ic.ac.uk> Hi all, I'm trying to use codeml from PAML to estimate Ka, Ks values from sequences within a multi fasta file: i'm using the code which has been posted on the bioperl wiki... However, when I run the code, i get the following errors: I did a google search to see if anyone had come across similar problems.... in which case the problem seems to have been due to the sequences not being a multiple of 3, In my code I check if the sequence is a multiple of 3 and if not, i alter the sequences until this is the case, although I still have the same error messages, Any suggestions as to why this could be happening? Thanks!!! 
Saurabh Johri Tuberculosis Research Group Centre for Molecular Microbiology & Infection Imperial College London SW7 2AZ -------------------- WARNING --------------------- MSG: There was an error - see error_string for the program output --------------------------------------------------- ------------- EXCEPTION Bio::Root::NotImplemented ------------- MSG: Unknown format of PAML output STACK Bio::Tools::Phylo::PAML::_parse_summary /sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:359 STACK Bio::Tools::Phylo::PAML::next_result /sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:224 ------------------------------------ >Rv3923c caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc aaataagcccggtgttgcaatcaa >Rv3923c_mtb_cdc1551 caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcac >Rv3923c_mtb_f11 caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc aaataagcccggtgttgcaatcaa >Rv3923c_mtb_c1 caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac 
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc aaataagcccggtgttgcaatcaa >Rv3923c_mtb_210 caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc aaataagcccggtgttgcaatcaa >Rv3923c_mbovis caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc aaataagcccggtgttgcaatcaa ------------------------------------ From chen_li3 at yahoo.com Mon May 8 20:21:42 2006 From: chen_li3 at yahoo.com (chen li) Date: Mon, 8 May 2006 17:21:42 -0700 (PDT) Subject: [Bioperl-l] use primer3 to design primers with multiple sequences Message-ID: <20060509002142.94880.qmail@web36806.mail.mud.yahoo.com> Dear all, The following is the script I use to design primers for one sequence: #!/cygdrive/c/Perl/bin/perl.exe use warnings; use strict; use Bio::Tools::Run::Primer3; use Bio::SeqIO; my $file_in='piwil2.fa'; my $file_out='temp.out'; my $seqio=Bio::SeqIO->new(-file=>$file_in) my $seq=$seqio->next_seq; my $primer3=Bio::Tools::Run::Primer3->new( -seq=>$seq, -outfile=>$file_out, - path=>"c:/Perl/local/primer3_1.0.0/src/primer3.exe" ); unless ($primer3->executable){ print "primer3 can not be found. 
Is it installed?\n"; exit(-1); } $primer3->add_targets( # set your own parameters for the primers or product 'PRIMER_OPT_GC_PERCENT'=>' 50 ', 'PRIMER_OPT_SIZE'=> '24 ', 'PRIMER_OPT_TM'=> ' 60 '); my $result=$primer3->run; exit; I tried to modify it for multiple sequences by using a while loop as follows: while ($seq=$seqio->next_seq){ my $primer3=Bio::Tools::Run::Primer3->new() # design the primer} ....} I get primers only for the last sequence. It seems the earlier ones are overwritten. Any ideas would be highly appreciated. Li From jason.stajich at duke.edu Mon May 8 20:59:26 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon, 8 May 2006 20:59:26 -0400 Subject: [Bioperl-l] PAML + Codeml problem.. In-Reply-To: <4A98ACB8EC146149872BAC9A132A582C277AC4@icex5.ic.ac.uk> References: <4A98ACB8EC146149872BAC9A132A582C277AC4@icex5.ic.ac.uk> Message-ID: <4796FE3D-9D14-4D93-B455-69EDFE2B2B62@duke.edu> Saurabh - a) These sequences are identical except for a difference in length, so there aren't going to be any interesting values from PAML, but maybe you are just providing an example? b) I think you are missing the trailing gaps in the alignment of the Rv3923c_mtb_cdc1551 sequence, as it is shorter; PAML requires aligned sequences as input. c) The sequences, in the reading frame you have provided (and using the standard translation table), have stop codons in them; this will cause failure as well. Which code from the wiki are you running, the 'running PAML' part of the HOWTO? Try looking at the actual output from PAML to figure out what is wrong. Add this when initializing the Run object: -save_tempfiles => 1, -verbose => 1, then open up the tempdir that is reported and look at the output files (mlc file). 
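Points (b) and (c) can be checked before codeml is ever run. The following is a minimal plain-Perl sketch, not what PAML or BioPerl do internally: `check_codon_alignment` is a hypothetical helper, it only looks at reading frame 1, it uses the stop codons of the standard genetic code (TAA, TAG, TGA), and the toy sequences are invented for illustration.

```perl
use strict;
use warnings;

# Stop codons in the standard genetic code.
my %STOP = map { $_ => 1 } qw(TAA TAG TGA);

# Return a list of problem descriptions for sequences given as
# { id => sequence }. Gap characters ("-") count towards the length.
sub check_codon_alignment {
    my ($seqs) = @_;
    my @problems;

    # Aligned input must have one common length.
    my %lengths = map { length($seqs->{$_}) => 1 } keys %$seqs;
    push @problems, "sequences differ in length (not aligned?)"
        if scalar(keys %lengths) > 1;

    for my $id (sort keys %$seqs) {
        my $seq = uc $seqs->{$id};
        my $len = length $seq;
        push @problems, "$id: length $len is not a multiple of 3"
            if $len % 3;

        # Walk codon by codon in frame 1, skipping the final codon,
        # which may legitimately be a stop.
        for (my $i = 0; $i + 3 < $len; $i += 3) {
            my $codon = substr $seq, $i, 3;
            if ($STOP{$codon}) {
                push @problems,
                    "$id: internal stop codon $codon at position " . ($i + 1);
                last;
            }
        }
    }
    return @problems;
}

# Invented toy input: "b" is longer and has an internal TAA.
my %toy = (
    a => 'ATGAAATAG',
    b => 'ATGTAAAAATAG',
);
print "$_\n" for check_codon_alignment(\%toy);
```

An empty result only means these three checks passed; the mlc file in the saved tempdir remains the place to look for what codeml itself reported.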
-jason On May 8, 2006, at 11:38 AM, Johri, Saurabh wrote: > Hi all, > > I'm trying to use codeml from PAML to estimate Ka, Ks values from > sequences within a multi fasta file: > i'm using the code which has been posted on the bioperl wiki... > > However, when I run the code, i get the following errors: > > I did a google search to see if anyone had come across similar > problems.... in which case the problem seems to have been due to the > sequences not being a multiple of 3, > In my code I check if the sequence is a multiple of 3 and if not, i > alter the sequences until this is the case, although I still have the > same error messages, > > Any suggestions as to why this could be happening? > > Thanks!!! > > Saurabh Johri > Tuberculosis Research Group > Centre for Molecular Microbiology & Infection > Imperial College London > SW7 2AZ > > > > > -------------------- WARNING --------------------- > MSG: There was an error - see error_string for the program output > --------------------------------------------------- > > ------------- EXCEPTION Bio::Root::NotImplemented ------------- > MSG: Unknown format of PAML output > STACK Bio::Tools::Phylo::PAML::_parse_summary > /sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:359 > STACK Bio::Tools::Phylo::PAML::next_result > /sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:224 > ------------------------------------ > >> Rv3923c > caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg > ag > gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg > ac > ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc > gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg > gt > acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc > gc > aaataagcccggtgttgcaatcaa >> Rv3923c_mtb_cdc1551 > caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg > ag > gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg > ac > 
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc > gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg > gt > acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcac >> Rv3923c_mtb_f11 > caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg > ag > gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg > ac > ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc > gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg > gt > acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc > gc > aaataagcccggtgttgcaatcaa >> Rv3923c_mtb_c1 > caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg > ag > gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg > ac > ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc > gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg > gt > acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc > gc > aaataagcccggtgttgcaatcaa >> Rv3923c_mtb_210 > caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg > ag > gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg > ac > ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc > gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg > gt > acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc > gc > aaataagcccggtgttgcaatcaa >> Rv3923c_mbovis > caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg > ag > gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg > ac > ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc > gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg > gt > acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc > gc > aaataagcccggtgttgcaatcaa > > ------------------------------------ > > 
--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From osborne1 at optonline.net Mon May 8 21:17:22 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 08 May 2006 21:17:22 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
In-Reply-To: <20060509002142.94880.qmail@web36806.mail.mud.yahoo.com>
Message-ID:

Li,

If you're analyzing multiple input sequences you're going to have to create multiple output sequences.

Brian O.

On 5/8/06 8:21 PM, "chen li" wrote:

> I get primers only for the last sequence. It seems the
> earlier ones are overwritten.

From WiersmaP at AGR.GC.CA Mon May 8 21:28:27 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Mon, 8 May 2006 21:28:27 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C41@onncrxms5.agr.gc.ca>

Hi Li,

When you execute $primer3->run with a Bio::Tools::Run::Primer3 object it opens -outfile=>"filename" for writing and then closes. That's why putting it in a loop will overwrite your output file each time so you only see the last one. I suppose you could read in each output file before looping to the next seq and append it to another file.

If you're doing a fair bit of work with this module it would be worth looking at the Bio::Tools::Primer3 module. The statement $result = $primer3->run produces a Bio::Tools::Primer3 object which has all the methods you need for customizing your output.

Paul

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca

From simon_sask at yahoo.com Tue May 9 04:06:04 2006
From: simon_sask at yahoo.com (Simon K. Chan)
Date: Tue, 9 May 2006 01:06:04 -0700 (PDT)
Subject: [Bioperl-l] Raw Blast Alignment
Message-ID: <20060509080604.53621.qmail@web54104.mail.yahoo.com>

Hi Fellow Bioperl-ers,

bioperl-live/examples/searchio/rawwriter.pl is supposed to show the raw alignments using Bio::SearchIO. The script is written to parse a PSI-BLAST report. I found an old email in the archive from Jason stating that this should parse other flavors of blast reports as well.

What do I need to do to make this script parse non-PSI blast reports? I tried to just specify a file and that the -format is 'blast', but I get an error stating that the object method 'raw_hit_data' is not defined in Bio::Search::Hit::BlastHit.

Basically, I want to obtain the raw alignment because I'd like to get the size of the gaps, not just the number.

Any help will be much appreciated.
Many thanks

From cjfields at uiuc.edu Tue May 9 08:21:02 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Tue, 9 May 2006 07:21:02 -0500
Subject: [Bioperl-l] Raw Blast Alignment
Message-ID:

You need to read the SearchIO HOWTO, which gives several examples:

http://www.bioperl.org/wiki/HOWTO:SearchIO

Chris

From peterm at bioinf.uni-leipzig.de Tue May 9 08:44:25 2006
From: peterm at bioinf.uni-leipzig.de (Peter Menzel)
Date: Tue, 09 May 2006 14:44:25 +0200
Subject: [Bioperl-l] colorize features
Message-ID: <44608EA9.1030808@bioinf.uni-leipzig.de>

Hi all,
I am using the Bio::Graphics module to draw sequences and their features with Bio::SeqFeature::Generic.
The features I want to highlight are occurrences of transcription binding factors. Therefore I want to give every factor its own color, but I didn't see how to manage it. I can only colorize complete tracks.
Is there a known workaround?

Thanks,
Peter

From Marc.Logghe at DEVGEN.com Tue May 9 10:13:24 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Tue, 9 May 2006 16:13:24 +0200
Subject: [Bioperl-l] colorize features
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746D88@ANTARESIA.be.devgen.com>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Peter Menzel
> Sent: Tuesday, May 09, 2006 2:44 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] colorize features
>
> Hi all,
> I am using the Bio::Graphics module to draw sequences and
> their features with Bio::SeqFeature::Generic.
> The features I want to highlight are occurrences of
> transcription binding factors. Therefore I want to give every
> factor its own color, but i didn't see how to manage it. I
> only can colorize complete tracks.
> Is there a known workaround?

Yes, instead of giving a hardcoded color value you can pass a subroutine to the option.

-bgcolor => sub {
    my $feat = shift;
    # get your attribute on which you want to base your color
    my ($attr) = $feat->get_tag_values('my_attribute');
    return $attr > 10 ? 'red' : 'green'
}

Not sure about the method calls I am making here (could as well be get_attributes()) but you get the idea.
Cheers,
Marc

From Marc.Logghe at DEVGEN.com Tue May 9 10:47:06 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Tue, 9 May 2006 16:47:06 +0200
Subject: [Bioperl-l] colorize features
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746D89@ANTARESIA.be.devgen.com>

Hi Peter,
Actually it is explained much better in this howto:
http://bioperl.org/wiki/HOWTO:Graphics
The examples show the principle I mentioned in my previous post (e.g. Example 4), but then for the -label or -description options. But as said, you can apply this to (most of?) the other options as well.
Regards,
ML

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Marc Logghe
> Sent: Tuesday, May 09, 2006 4:13 PM
> To: Peter Menzel; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] colorize features
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Peter
> > Menzel
> > Sent: Tuesday, May 09, 2006 2:44 PM
> > To: bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] colorize features
> >
> > Hi all,
> > I am using the Bio::Graphics module to draw sequences and their
> > features with Bio::SeqFeature::Generic.
> > The features I want to highlight are occurrences of transcription
> > binding factors. Therefore I want to give every factor its own color,
> > but i didn't see how to manage it. I only can colorize complete
> > tracks.
> > Is there a known workaround?
>
> Yes, instead of giving a hardcoded color value you can pass a
> subroutine to the option.
> -bgcolor => sub {
>     my $feat = shift;
>     # get your attribute on which you want to base your color
>     my ($attr) = $feat->get_tag_values('my_attribute');
>
>     return $attr > 10 ? 'red' : 'green'
> }
>
> Not sure about the method calls I am making here (could as well be
> get_attributes()) but you get the idea.
> Cheers,
> Marc

From WiersmaP at AGR.GC.CA Tue May 9 11:49:33 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Tue, 9 May 2006 11:49:33 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C42@onncrxms5.agr.gc.ca>

Hi Li,

The line "my $result = $primer3->run" is already in the code you submitted. In the Bio::Tools::Primer3 module the author uses "$p3" for the object. If you change your line to "my $p3 = $primer3->run" you should be able to run the examples below. Process the results for each sequence and output the results before looping to the next sequence.

From Bio::Tools::Primer3.pm:

# how many results were there?
my $num=$p3->number_of_results;
print "There were $num results\n";

# get all the results
my $all_results=$p3->all_results;
print "ALL the results\n";
foreach my $key (keys %{$all_results}) {print "$key\t${$all_results}{$key}\n"}

# get specific results
my $result1=$p3->primer_results(1);
print "The first primer is\n";
foreach my $key (keys %{$result1}) {print "$key\t${$result1}{$key}\n"}

Paul

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca

-----Original Message-----
From: chen li [mailto:chen_li3 at yahoo.com]
Sent: Monday, May 08, 2006 8:32 PM
To: Wiersma, Paul
Subject: Re: [Bioperl-l] use primer3 to design primers with multiple sequences

Hi Paul,

I read both documents. What I understand is that Bio::Tools::Run::Primer3 is for designing primers and Bio::Tools::Primer3 is for parsing the results. When I read the documents I do not see this line $result = $primer3->run in Bio::Tools::Primer3. I wonder how you got this information.

Thanks,

Li

From chen_li3 at yahoo.com Tue May 9 13:32:32 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 9 May 2006 10:32:32 -0700 (PDT)
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C42@onncrxms5.agr.gc.ca>
Message-ID: <20060509173232.18843.qmail@web36802.mail.mud.yahoo.com>

Thanks Paul it REALLY works. I have other questions:

1) When I run the script I use this line on the command prompt

perl primer.pl >test

When I check the default output file (temp.out) used by the script I only see the information about the last sequence, which is different from what is in the test file. In the test file I can get all the information for all the sequences.

2) Is it possible to use Bio::Tools::Primer3 directly to print out selective information such as the primer sequence and the size of the PCR product? Or do I have to parse the file by myself?

After I get all this information I would like to post the script for batch-designing PCR primers.

Thanks,

Li

From WiersmaP at AGR.GC.CA Tue May 9 13:59:20 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Tue, 9 May 2006 13:59:20 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C43@onncrxms5.agr.gc.ca>

Hi Li,

I've attached some code I used to explore basic functionality of Primer3.pm modules. Hopefully you can see how I've picked out parts of the results for printing. You can modify it as you need to output only some results.

>>>>>>>>
# design the primers. This runs primer3 and returns a
# Bio::Tools::Primer3 object with the results
my $results=$primer3->run;

# see the Bio::Tools::Primer3 pod for
# things that you can get from this. For example:

print "There were ", $results->number_of_results+1, " primers\n";

my @out_keys_part = qw(
    START LENGTH TM GC_PERCENT SELF_ANY SELF_END SEQUENCE );

for (my $i=0;$i <= $results->number_of_results;$i++){
    # get specific results
    my $result1=$results->primer_results($i);
    print "\n",$i+1;
    # the parentheses around qw() are required by newer perls
    for my $key (qw(PRIMER_LEFT PRIMER_RIGHT)){
        my ($start, $length) = split /,/, ${$result1}{$key};
        ${$result1}{$key."_START"} = $start;
        ${$result1}{$key."_LENGTH"} = $length;
        foreach my $partkey (@out_keys_part) {
            print "\t", ${$result1}{$key."_".$partkey};
        }
        print "\n";
    }
    print "\tPRODUCT SIZE: ", ${$result1}{'PRIMER_PRODUCT_SIZE'},
          ", PAIR ANY COMPL: ", ${$result1}{'PRIMER_PAIR_COMPL_ANY'};
    print ", PAIR 3\' COMPL: ", ${$result1}{'PRIMER_PAIR_COMPL_END'}, "\n";
}
>>>>>>>>>>>>>>>

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Telephone/Téléphone: 250-494-6388
Facsimile/Télécopieur: 250-494-0755
Box 5000, 4200 Hwy 97
Summerland, BC V0H 1Z0
wiersmap at agr.gc.ca

From cjfields at uiuc.edu Tue May 9 17:13:43 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 9 May 2006 16:13:43 -0500
Subject: [Bioperl-l] Oddness in Bio::SeqIO
Message-ID: <000601c673ad$74601c30$15327e82@pyrimidine>

I noticed an odd thing with SeqIO parsing of species lines (those problematic bacterial tax names again). I have a simple script that runs output to STDOUT to generate a list of hits. Here's what I get:

Bacterium: Corynebacterium glutamicum ATCC 13032
hits: 4
Bacterium: Corynebacterium jeikeium K411 K411 <--
hits: 1
Bacterium: Frankia sp. CcI3 CcI3 <--
hits: 1
Bacterium: Frankia sp. EAN1pec EAN1pec <--
hits: 1
Bacterium: Janibacter sp. HTCC2649 HTCC2649 <--
hits: 1
Bacterium: Kineococcus radiotolerans SRS30216 SRS30216 <--
hits: 1
Bacterium: Leifsonia xyli subsp. xyli str. CTCB07 xyli str. CTCB07 <--
hits: 1
Bacterium: Mycobacterium avium subsp. paratuberculosis K-10 paratuberculosis K-10 <--
...

Most (but not all) of the strain numbers get repeated (marked with arrows). This is actually in the GenBank file itself, downloaded via Bio::DB::GenBank (and thus passed through Bio::SeqIO). Anyone seen this before?
Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From torsten.seemann at infotech.monash.edu.au Tue May 9 19:42:29 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 10 May 2006 09:42:29 +1000
Subject: [Bioperl-l] Oddness in Bio::SeqIO
In-Reply-To: <000601c673ad$74601c30$15327e82@pyrimidine>
References: <000601c673ad$74601c30$15327e82@pyrimidine>
Message-ID: <446128E5.1000908@infotech.monash.edu.au>

Chris,

> I noticed an odd thing with SeqIO parsing of species lines (those
> problematic bacterial tax names again). I have a simple script that runs
> output to STDOUT to generate a list of hits. Here's what I get:
> Bacterium: Mycobacterium avium subsp. paratuberculosis K-10 paratuberculosis
> K-10 <--

In this case,

Genus = Mycobacterium
Species = avium
Subspecies = paratuberculosis
Strain = K-10

which suggests that BioPerl is trying to handle something special, because the 'subsp.' is gone?

Here are the pertinent parts of the Genbank file (apologies for the wrapping):

LOCUS       NC_002944    4829781 bp    DNA    circular    BCT    18-JAN-2006
DEFINITION  Mycobacterium avium subsp. paratuberculosis K-10, complete genome.
SOURCE      Mycobacterium avium subsp. paratuberculosis K-10
  ORGANISM  Mycobacterium avium subsp. paratuberculosis K-10
            Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
            Corynebacterineae; Mycobacteriaceae; Mycobacterium;
            Mycobacterium avium complex (MAC).
            /organism="Mycobacterium avium subsp. paratuberculosis K-10"
            /strain="K-10"
            /sub_species="paratuberculosis"

> Most (but not all) of the strain numbers get repeated (marked with arrows).
> This is actually in the GenBank file itself, downloaded via Bio::DB::GenBank
> (and thus passed through Bio::SeqIO). Anyone seen this before?

The problem is mentioned in the wiki so it must have come up before?

http://bioperl.org/wiki/Project_priority_list#Taxonomy_.2F_Species_data

I also deal with Bacteria mainly, and should also look into this. I haven't been using the genbank headers directly, only the features, so I never came across this.

Another thing which may crop up is when no Species has been allocated yet but the genus is known (or something like that). In that case the name is written as "Genus spp." eg. Gallibacterium spp.

--Torsten

From chen_li3 at yahoo.com Tue May 9 21:04:08 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 9 May 2006 18:04:08 -0700 (PDT)
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C47@onncrxms5.agr.gc.ca>
Message-ID: <20060510010408.24494.qmail@web36804.mail.mud.yahoo.com>

Hi Paul,

Thank you very much. Just as you point out in your latest email, I now see that the line "my $result1=$results->primer_results(1);" returns a hash reference containing all the information for the first pair of primers. 1) Since it is a hash I should be able to get the specific value for its corresponding key by telling Perl which key is the entry for the value. 2) Since it is also a reference I should dereference it to get the so-called true value. I don't know too much OO and Perl and your code looks a little bit complicated to me. But I get the job done by adding the following lines directly:

###############################################
#from Primer3 module to get all the information
#foreach my $key (sort keys %{$result1}) {
#print "$key\t${$result1}{$key}\n"}
##################################################
#get the value for the key in the hash reference
my $key_PRIMER_LEFT_SEQUENCE='PRIMER_LEFT_SEQUENCE';
print "$key_PRIMER_LEFT_SEQUENCE\t${$result1}{$key_PRIMER_LEFT_SEQUENCE}\n";

There is one point I don't understand: when I add these two lines into my code (line 49 in my code)

my $key_PRIMER_SEQUENCE_ID='PRIMER_SEQUENCE_ID';
print "$key_PRIMER_SEQUENCE_ID\t${$result1}{$key_PRIMER_SEQUENCE_ID}\n";

I don't get the PRIMER_SEQUENCE_ID. Perl complains about it and says "Use of uninitialized value in concatenation (.) or string at primer3-3 line 49."

Li

--- "Wiersma, Paul" wrote:

> Hi Li,
>
> Just a bit of clarification of the code that I sent earlier.
> The line "my $result1=$results->primer_results($i);" gives you a
> reference to a hash that contains all of the information for a primer
> pair.
> To access the entries you dereference the hash, i.e. the hash is
> %{$result1} and ${$result1}{'PRIMER_PRODUCT_SIZE'} gives you the entry
> for product size. The following are the available entries. All are
> single values or strings except PRIMER_RIGHT and PRIMER_LEFT which are
> start,length pairs (e.g. PRIMER_LEFT => '60,20') which can be pulled out
> with split.
> my ($start, $length) = split /,/, ${$result1}{'PRIMER_LEFT'};
> my $right_Tm = ${$result1}{'PRIMER_RIGHT_TM'}
>
> PRIMER_PRODUCT_SIZE
> PRIMER_PAIR_COMPL_ANY
> PRIMER_PAIR_COMPL_END
> PRIMER_PAIR_PENALTY
>
> PRIMER_LEFT
> PRIMER_LEFT_END_STABILITY
> PRIMER_LEFT_PENALTY
> PRIMER_LEFT_TM
> PRIMER_LEFT_GC_PERCENT
> PRIMER_LEFT_SELF_ANY
> PRIMER_LEFT_SELF_END
> PRIMER_LEFT_SEQUENCE
>
> PRIMER_RIGHT
> PRIMER_RIGHT_END_STABILITY
> PRIMER_RIGHT_PENALTY
> PRIMER_RIGHT_TM
> PRIMER_RIGHT_GC_PERCENT
> PRIMER_RIGHT_SELF_ANY
> PRIMER_RIGHT_SELF_END
> PRIMER_RIGHT_SEQUENCE
>
> Paul A. Wiersma
> Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
> Summerland, BC
> wiersmap at agr.gc.ca

From zhouyubio at gmail.com Tue May 9 21:35:01 2006
From: zhouyubio at gmail.com (Yu ZHOU)
Date: Wed, 10 May 2006 01:35:01 +0000 (UTC)
Subject: [Bioperl-l] pubmed
References: <6.1.2.0.2.20050331171052.03830ba8@qfdong.mail.iastate.edu>
Message-ID:

Qunfeng iastate.edu> writes:

> Hi there,
>
> http://bioperl.org/HOWTOs/Feature-Annotation/anno_from_genbank.html
>
> I am not very familiar with BioPerl. I tried to follow the example showing
> in the above page to retrieve pubmed ID under each Reference tag , i.e.,
> $value->pubmed(), but it doesn't work for me for the seq gi#56961711. The
> authors() works for me. Appreciate any suggestions.
>
> Qunfeng

Hi,

I have the same problem as you. Here is what I have done, using a regular expression to match the value of the 'location' tag, if there is one.
#------------------ my $ann = $seqobj->annotation(); # annotation object foreach my $ref ( $ann->get_Annotations('reference') ) { print "Title: ", $ref->title,"\n"; print "Location: ", $ref->location, "\n"; if ($ref->location =~ /PUBMED\s+(\d+)/) { my $pmid = $1; print "PMID: ", $pmid, "\n"; } print "Authors: ", $ref->authors, "\n"; } #------------------ From osborne1 at optonline.net Tue May 9 23:01:49 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Tue, 09 May 2006 23:01:49 -0400 Subject: [Bioperl-l] pubmed In-Reply-To: Message-ID: Qunfeng, I'm using bioperl-live, I'm able retrieve the single PubMed id found in the 56961711 entry using the pubmed() method. Note that there are 4 references, only one of which has a Pubmed id. Also, the authors() method prints out the authors, not the Pubmed id. If you have a problem please show your code and tell us which version of Bioperl you're using. Brian O. use strict; use lib "/Users/bosborne/bioperl-live"; use Bio::DB::GenBank; my $db = Bio::DB::GenBank->new; my $seq = $db->get_Seq_by_id(56961711); my $ann_coll = $seq->annotation; foreach my $ann ($ann_coll->get_Annotations('reference')) { print "Author: ", $ann->authors, "\nPubmed id: ", $ann->pubmed, "\n"; } On 5/9/06 9:35 PM, "Yu ZHOU" wrote: > Qunfeng iastate.edu> writes: > >> >> Hi there, >> >> http://bioperl.org/HOWTOs/Feature-Annotation/anno_from_genbank.html >> >> I am not very familiar with BioPerl. I tried to follow the example showing >> in the above page to retrieve pubmed ID under each Reference tag , i.e., >> $value->pubmed(), but it doesn't work for me for the seq gi#56961711. The >> authors() works for me. Appreciate any suggestions. >> >> Qunfeng >> > > > Hi, > > I have the same problem with you. Here is what I have done, by using regular > expression to match the value of 'location' tag, if there is. 
>
> #------------------
> my $ann = $seqobj->annotation(); # annotation object
> foreach my $ref ( $ann->get_Annotations('reference') ) {
>     print "Title: ", $ref->title,"\n";
>     print "Location: ", $ref->location, "\n";
>     if ($ref->location =~ /PUBMED\s+(\d+)/) {
>         my $pmid = $1;
>         print "PMID: ", $pmid, "\n";
>     }
>     print "Authors: ", $ref->authors, "\n";
> }
> #------------------

From sb at mrc-dunn.cam.ac.uk Wed May 10 05:30:59 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 10 May 2006 10:30:59 +0100
Subject: [Bioperl-l] Bio::Taxonomy confusion
Message-ID: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>

Hi,
I'm a little confused as to how names are supposed to work in Bio::Taxonomy::Node.

In the bioperl versions that I've looked at a Node doesn't seem to store the most important information about itself - its scientific name - in an obvious place. bioperl 1.5.1 puts it at the start of the classification list. I'd have thought sticking it in -name would make more sense, but this is used only for the GenBank common name.

The Bio::Taxonomy docs still suggest:

my $node_species_sapiens = Bio::Taxonomy::Node->new(
    -object_id => 9606, # or -ncbi_taxid. Required tag
    -names => {
        'scientific' => ['sapiens'],
        'common_name' => ['human']
    },
    -rank => 'species' # Required tag
);

and whilst Bio::Taxonomy::Node does not accept -names, it does have a 'name' method which claims to work like:

$obj->name('scientific', 'sapiens');

This kind of thing would be really nice, but afaics Bio::Taxonomy::Node->new takes the -name value and makes a common name out of it, whilst the name() method passes any 'scientific' name to the scientific_name() method which is unable to set any value (and warns about this), only get.

It seems like the need to have this classification array work the same way as Bio::Species is causing some unnecessary restrictions. Can't the more sensible idea of having a dedicated storage spot for the ScientificName and other parameters be used, with the classification array either being generated just-in-time from the hash-stored data, or indeed being generated from the Lineage field?

Also, why does a node store the complete hierarchy on itself in the classification array? If we're going that far, why don't the Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a get_taxonomy() method instead of a get_Taxonomy_Node() method. get_taxonomy() could, from a single efetch.fcgi lookup, create a complete Bio::Taxonomy with all the nodes. Whilst most nodes would only have a minimum of information, if you could simply ask a node what its rank and scientific name was you could easily build a classification array, or ask what Kingdom your species was in etc.

Are there good reasons for Taxonomy working the way it does in 1.5.1, or would I not be wasting my time re-writing things to make more sense (to me)?

Cheers,
Sendu.

From osborne1 at optonline.net Wed May 10 10:33:18 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 10 May 2006 10:33:18 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C43@onncrxms5.agr.gc.ca>
Message-ID:

Paul,

I took your code, added some "run" code and made it into a script and added this to CVS, examples/tools/run_primer3.pl. I hope this is OK with you.

Brian O.

On 5/9/06 1:59 PM, "Wiersma, Paul" wrote:

> $results->number_of_results

From stoltzfu at umbi.umd.edu Tue May 9 16:22:43 2006
From: stoltzfu at umbi.umd.edu (Arlin Stoltzfus)
Date: Tue, 09 May 2006 16:22:43 -0400
Subject: [Bioperl-l] proposal: CDAT (character data and trees) integrative object
Message-ID:

Dear developers--

We propose a Bio::CDAT (Character Data And Trees) module to facilitate comparative analysis using evolutionary methods by 1) managing evolutionary relationships (by linking data to trees) and 2) allowing coordinated analysis of different types of data (by implementing a generic concept of "character-state" data). Bio::CDAT would take advantage of existing BioPerl objects and would include the functionality of Rutger Vos's Bio::Phylo. It would provide the framework to develop interfaces to analysis tools (phylogeny inference, evolutionary rate models, functional shift inference, etc), as well as to file formats and visualization methods appropriate for such analyses.

A proposal is attached. We would like to hear your thoughts (e.g., see the section on "Questions to consider")!

Thanks

Arlin Stoltzfus
WeiGang Qiu
Rutger Vos
(with thanks to Justin Reese and Aaron Mackey)

------------------
Arlin Stoltzfus (stoltzfu at umbi.umd.edu)
CARB, 9600 Gudelsky Drive, Rockville, Maryland 20850
tel 240 314 6208, fax 240 314 6255, www.molevol.org/camel
---------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CDAT-proposal.pdf
Type: application/pdf
Size: 193701 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060509/48aeca4b/attachment-0001.pdf

From zhouyubio at gmail.com Wed May 10 04:55:46 2006
From: zhouyubio at gmail.com (Yu Zhou)
Date: Wed, 10 May 2006 16:55:46 +0800
Subject: [Bioperl-l] pubmed
In-Reply-To:
References:
Message-ID: <613ffb490605100155w43a9ea4sca23818bc7fa4e33@mail.gmail.com>

Thanks! I am using Bioperl-1.4, not bioperl-live.
That may be the reason why it does not work!

On 5/10/06, Brian Osborne wrote:
> Qunfeng,
>
> I'm using bioperl-live, I'm able retrieve the single PubMed id found in the
> 56961711 entry using the pubmed() method. Note that there are 4 references,
> only one of which has a Pubmed id. Also, the authors() method prints out the
> authors, not the Pubmed id. If you have a problem please show your code and
> tell us which version of Bioperl you're using.
>
> Brian O.
>
> use strict;
> use lib "/Users/bosborne/bioperl-live";
> use Bio::DB::GenBank;
>
> my $db = Bio::DB::GenBank->new;
> my $seq = $db->get_Seq_by_id(56961711);
> my $ann_coll = $seq->annotation;
>
> foreach my $ann ($ann_coll->get_Annotations('reference')) {
>     print "Author: ", $ann->authors, "\nPubmed id: ", $ann->pubmed, "\n";
> }
>
> On 5/9/06 9:35 PM, "Yu ZHOU" wrote:
>
> > Qunfeng iastate.edu> writes:
> >
> >>
> >> Hi there,
> >>
> >> http://bioperl.org/HOWTOs/Feature-Annotation/anno_from_genbank.html
> >>
> >> I am not very familiar with BioPerl. I tried to follow the example showing
> >> in the above page to retrieve pubmed ID under each Reference tag , i.e.,
> >> $value->pubmed(), but it doesn't work for me for the seq gi#56961711. The
> >> authors() works for me. Appreciate any suggestions.
> >>
> >> Qunfeng
> >>
> >
> > Hi,
> >
> > I have the same problem with you. Here is what I have done, by using regular
> > expression to match the value of 'location' tag, if there is.
> >
> > #------------------
> > my $ann = $seqobj->annotation(); # annotation object
> > foreach my $ref ( $ann->get_Annotations('reference') ) {
> >     print "Title: ", $ref->title,"\n";
> >     print "Location: ", $ref->location, "\n";
> >     if ($ref->location =~ /PUBMED\s+(\d+)/) {
> >         my $pmid = $1;
> >         print "PMID: ", $pmid, "\n";
> >     }
> >     print "Authors: ", $ref->authors, "\n";
> > }
> > #------------------

--
Best Wishes!
Yu

From cjfields at uiuc.edu Wed May 10 11:46:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 May 2006 10:46:27 -0500
Subject: [Bioperl-l] Oddness in Bio::SeqIO
In-Reply-To: <446128E5.1000908@infotech.monash.edu.au>
Message-ID: <000f01c67448$e63973b0$15327e82@pyrimidine>

This actually pops up when using $seq->species->common_name; using $seq->species->binomial chops some of the strain designations off, so really neither one works optimally for bacterial genus-species-strain taxonomy. Hilmar made the suggestion that it's probably best to grab the NCBI TaxID and parse it out that way by looking it up in the taxonomy database (using Bio::DB::Taxonomy), but at the moment that's not what Bio::SeqIO::genbank does.

I wonder if we should be trying to shove most of this stuff into species objects directly from the beginning; in other words, maybe we should try to get the information in Bio::Annotation objects and then, after the parsing/IO is finished, have a method to get the information into Bio::Species objects when wanted/needed; a check could be added against the NCBI Taxonomy database there.

Anyway, I really haven't looked at how they are parsed out and don't have the time at the moment. I may look into this as well but not until I get back from conference (end of May).
Jason and Brian have been calling for a refactoring of Bio::SeqIO::genbank for a while; maybe it's getting time to do something about it...

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Torsten Seemann
> Sent: Tuesday, May 09, 2006 6:42 PM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Oddness in Bio::SeqIO
>
> Chris,
>
> > I noticed an odd thing with SeqIO parsing of species lines (those
> > problematic bacterial tax names again). I have a simple script that runs
> > output to STDOUT to generate a list of hits. Here's what I get:
>
> > Bacterium: Mycobacterium avium subsp. paratuberculosis K-10 paratuberculosis K-10 <--
>
> In this case,
>
> Genus = Mycobacterium
> Species = avium
> Subspecies = paratuberculosis
> Strain = K-10
>
> which suggests that BioPerl is trying to handle something special,
> because the 'subsp.' is gone?
>
> Here's the pertinent parts of the Genbank file
> (apologies for the wrapping):
>
> LOCUS       NC_002944   4829781 bp   DNA   circular   BCT   18-JAN-2006
> DEFINITION  Mycobacterium avium subsp. paratuberculosis K-10, complete genome.
> SOURCE      Mycobacterium avium subsp. paratuberculosis K-10
>   ORGANISM  Mycobacterium avium subsp. paratuberculosis K-10
>             Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
>             Corynebacterineae; Mycobacteriaceae; Mycobacterium;
>             Mycobacterium avium complex (MAC).
>
>             /organism="Mycobacterium avium subsp. paratuberculosis K-10"
>             /strain="K-10"
>             /sub_species="paratuberculosis"
>
> > Most (but not all) of the strain numbers get repeated (marked with arrows).
> > This is actually in the GenBank file itself, downloaded via Bio::DB::GenBank
> > (and thus passed through Bio::SeqIO). Anyone seen this before?
>
> The problem is mentioned in the wiki so it must have come up before?
> http://bioperl.org/wiki/Project_priority_list#Taxonomy_.2F_Species_data
>
> I also deal with Bacteria mainly, and should also look into this. I
> haven't been using the genbank headers directly, only the features, so i
> never came across this.
>
> Another thing which may crop up is when no Species has been allocated
> yet but the genus is known (or something like that). In that case the
> name is written as "Genus spp." eg. Gallibacterium spp.
>
> --Torsten

From cuiw at mail.nih.gov Wed May 10 12:02:55 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Wed, 10 May 2006 12:02:55 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
In-Reply-To: <20060510010408.24494.qmail@web36804.mail.mud.yahoo.com>
Message-ID:

'PRIMER_SEQUENCE_ID' is not a key in the Bio::Tools::Primer3 output hash. You can find all legal keys by "print keys %{$result1};"

There is one point I don't understand: When I add these two lines into my code (line 49 in my code)

my $key_PRIMER_SEQUENCE_ID='PRIMER_SEQUENCE_ID';
print "$key_PRIMER_SEQUENCE_ID\t${$result1}{$key_PRIMER_SEQUENCE_ID}\n";

I don't get the PRIMER_SEQUENCE_ID. Perl complains it and says "Use of uninitialized value in concatenation (.) or string at primer3-3 line 49."

Li

From WiersmaP at AGR.GC.CA Wed May 10 12:08:37 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Wed, 10 May 2006 12:08:37 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C48@onncrxms5.agr.gc.ca>

Brian, no problem with the code, thanks for asking.

Li, PRIMER_SEQUENCE_ID and SEQUENCE are not part of the individual results but only end up by default with $results->primer_results(0).
If you try to access them using $results->primer_results(1) (or anything but 0) you will get an error.

Paul

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca

-----Original Message-----
From: chen li [mailto:chen_li3 at yahoo.com]
Sent: Tuesday, May 09, 2006 6:04 PM
To: Wiersma, Paul
Cc: bioperl-l at bioperl.org
Subject: RE: [Bioperl-l] use primer3 to design primers with multiple sequences

Hi Paul,

Thank you very much. Just like you point out in your latest email I now figure out the line "my $result1=$results->primer_results(1);" returns a hash reference containing all the information for the first pair of primer. 1) Since it is a hash I should be able to get the specific value for its corresponding key by telling Perl which key is the entry for the value. 2) Also it is a reference I should dereference it to get the so-called true value.

I don't know too much OO and Perl and your code looks a little bit complicated to me. But I get the job done by adding the following lines directly:

###############################################
#from Primer3 module to get all the information
#foreach my $key (sort keys %{$result1}) {
#print "$key\t${$result1}{$key}\n"}
##################################################
#get the value for the key in the hash reference
my $key_PRIMER_LEFT_SEQUENCE='PRIMER_LEFT_SEQUENCE';
print "$key_PRIMER_LEFT_SEQUENCE\t${$result1}{$key_PRIMER_LEFT_SEQUENCE}\n";

There is one point I don't understand: When I add these two lines into my code (line 49 in my code)

my $key_PRIMER_SEQUENCE_ID='PRIMER_SEQUENCE_ID';
print "$key_PRIMER_SEQUENCE_ID\t${$result1}{$key_PRIMER_SEQUENCE_ID}\n";

I don't get the PRIMER_SEQUENCE_ID. Perl complains it and says "Use of uninitialized value in concatenation (.) or string at primer3-3 line 49."

Li

--- "Wiersma, Paul" wrote:

> Hi Li,
>
> Just a bit of clarification of the code that I sent earlier.
> The line "my $result1=$results->primer_results($i);" gives you a
> reference to a hash that contains all of the information for a primer
> pair.
> To access the entries you dereference the hash, i.e. the hash is
> %{$result1} and ${$result1}{'PRIMER_PRODUCT_SIZE'} gives you the entry
> for product size. The following are the available entries. All are
> single values or strings except PRIMER_RIGHT and PRIMER_LEFT which are
> start,length pairs (e.g. PRIMER_LEFT => '60,20') which can be pulled out
> with split.
> my ($start, $length) = split /,/, ${$result1}{'PRIMER_LEFT'};
> my $right_Tm = ${$result1}{'PRIMER_RIGHT_TM'}
> PRIMER_PRODUCT_SIZE
> PRIMER_PAIR_COMPL_ANY
> PRIMER_PAIR_COMPL_END
> PRIMER_PAIR_PENALTY
>
> PRIMER_LEFT
> PRIMER_LEFT_END_STABILITY
> PRIMER_LEFT_PENALTY
> PRIMER_LEFT_TM
> PRIMER_LEFT_GC_PERCENT
> PRIMER_LEFT_SELF_ANY
> PRIMER_LEFT_SELF_END
> PRIMER_LEFT_SEQUENCE
>
> PRIMER_RIGHT
> PRIMER_RIGHT_END_STABILITY
> PRIMER_RIGHT_PENALTY
> PRIMER_RIGHT_TM
> PRIMER_RIGHT_GC_PERCENT
> PRIMER_RIGHT_SELF_ANY
> PRIMER_RIGHT_SELF_END
> PRIMER_RIGHT_SEQUENCE
>
> Paul A. Wiersma
> Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
> Summerland, BC
> wiersmap at agr.gc.ca

From cuiw at mail.nih.gov Wed May 10 14:42:36 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Wed, 10 May 2006 14:42:36 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences: bug in code!
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C48@onncrxms5.agr.gc.ca>
Message-ID:

Hope this works!

Bio::Tools::Primer3 line 264 should be:

$self->{seqobject}=Bio::Seq->new(-seq=>$value, -id=>$id);

Then you should be able to display PRIMER_SEQUENCE_ID by

####read primer3 output file############
my $p3=Bio::Tools::Primer3->new(-file=>"data/primer3_output.txt");
######## print id###############
print $p3->seqobject->id;

Wenwu Cui, PhD
NIH/NCI

-----Original Message-----
From: Wiersma, Paul [mailto:WiersmaP at agr.gc.ca]
Sent: Wednesday, May 10, 2006 12:09 PM
To: chen li
Cc: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] use primer3 to design primers with multiple sequences

Brian, no problem with the code, thanks for asking.

Li, PRIMER_SEQUENCE_ID and SEQUENCE are not part of the individual results but only end up by default with $results->primer_results(0). If you try to access them using $results->primer_results(1) (or anything but 0) you will get an error.

Paul

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca

-----Original Message-----
From: chen li [mailto:chen_li3 at yahoo.com]
Sent: Tuesday, May 09, 2006 6:04 PM
To: Wiersma, Paul
Cc: bioperl-l at bioperl.org
Subject: RE: [Bioperl-l] use primer3 to design primers with multiple sequences

Hi Paul,

Thank you very much. Just like you point out in your latest email I now figure out the line "my $result1=$results->primer_results(1);" returns a hash reference containing all the information for the first pair of primer. 1) Since it is a hash I should be able to get the specific value for its corresponding key by telling Perl which key is the entry for the value. 2) Also it is a reference I should dereference it to get the so-called true value.

I don't know too much OO and Perl and your code looks a little bit complicated to me.
But I get the job done by adding the following lines directly:

###############################################
#from Primer3 module to get all the information
#foreach my $key (sort keys %{$result1}) {
#print "$key\t${$result1}{$key}\n"}
##################################################
#get the value for the key in the hash reference
my $key_PRIMER_LEFT_SEQUENCE='PRIMER_LEFT_SEQUENCE';
print "$key_PRIMER_LEFT_SEQUENCE\t${$result1}{$key_PRIMER_LEFT_SEQUENCE}\n";

There is one point I don't understand: When I add these two lines into my code (line 49 in my code)

my $key_PRIMER_SEQUENCE_ID='PRIMER_SEQUENCE_ID';
print "$key_PRIMER_SEQUENCE_ID\t${$result1}{$key_PRIMER_SEQUENCE_ID}\n";

I don't get the PRIMER_SEQUENCE_ID. Perl complains it and says "Use of uninitialized value in concatenation (.) or string at primer3-3 line 49."

Li

--- "Wiersma, Paul" wrote:

> Hi Li,
>
> Just a bit of clarification of the code that I sent earlier.
> The line "my $result1=$results->primer_results($i);" gives you a
> reference to a hash that contains all of the information for a primer
> pair.
> To access the entries you dereference the hash, i.e. the hash is
> %{$result1} and ${$result1}{'PRIMER_PRODUCT_SIZE'} gives you the entry
> for product size. The following are the available entries. All are
> single values or strings except PRIMER_RIGHT and PRIMER_LEFT which are
> start,length pairs (e.g. PRIMER_LEFT => '60,20') which can be pulled out
> with split.
> my ($start, $length) = split /,/, ${$result1}{'PRIMER_LEFT'};
> my $right_Tm = ${$result1}{'PRIMER_RIGHT_TM'}
> PRIMER_PRODUCT_SIZE
> PRIMER_PAIR_COMPL_ANY
> PRIMER_PAIR_COMPL_END
> PRIMER_PAIR_PENALTY
>
> PRIMER_LEFT
> PRIMER_LEFT_END_STABILITY
> PRIMER_LEFT_PENALTY
> PRIMER_LEFT_TM
> PRIMER_LEFT_GC_PERCENT
> PRIMER_LEFT_SELF_ANY
> PRIMER_LEFT_SELF_END
> PRIMER_LEFT_SEQUENCE
>
> PRIMER_RIGHT
> PRIMER_RIGHT_END_STABILITY
> PRIMER_RIGHT_PENALTY
> PRIMER_RIGHT_TM
> PRIMER_RIGHT_GC_PERCENT
> PRIMER_RIGHT_SELF_ANY
> PRIMER_RIGHT_SELF_END
> PRIMER_RIGHT_SEQUENCE
>
> Paul A. Wiersma
> Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
> Summerland, BC
> wiersmap at agr.gc.ca

From cjfields at uiuc.edu Wed May 10 14:58:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 May 2006 13:58:19 -0500
Subject: [Bioperl-l] ListSummaries for April 26-May 9
Message-ID: <001801c67463$b3c0a910$15327e82@pyrimidine>

ListSummaries for April 26-May 9 are up at the usual place:

http://www.bioperl.org/wiki/Mailing_list_summaries

Direct link:

http://www.bioperl.org/wiki/ListSummary:April_26-May_9%2C2006

It's a bit of a hurried one so don't be surprised to find a few spelling errors here and there. I'm getting ready for a conference in a couple weeks so I may be off the radar a bit here and there. The next ListSummary won't be posted until May 26.

Enjoy!

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From chen_li3 at yahoo.com Wed May 10 20:27:34 2006
From: chen_li3 at yahoo.com (chen li)
Date: Wed, 10 May 2006 17:27:34 -0700 (PDT)
Subject: [Bioperl-l] What is the relationship between primer3 module and run-primer3 module?
Message-ID: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>

First thank you all for replying my previous post about primer3.

But now I am a little confused even after I read the documents: What is the relationship between these two modules? What is correct/standard way to use them to do the batch-primer design? What I do is that I use Bio::Tools::Run::Primer3 to design primers. Based on Dr. Roy Chaudhuri's information I can set the parameters using the following syntax:

$primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');

Based on Paul A. Wiersma's explanation I can also print out part of the primer results (because I don't need all the information). But there is a little trouble: PRIMER_SEQUENCE_ID can't be accessed using this method. And Paul points out that "PRIMER_SEQUENCE_ID and SEQUENCE are not part of the individual results but only end up by default with $results->primer_results(0)". So it seems there is no way to get around this problem using Bio::Tools::Run::Primer3. And others suggest using Bio::Tools::Primer3 to parse the results. So is true that Bio::Tools::Run::Primer3 is for primer design and Bio::Tools::Primer3 is for parsing the results from Bio::Tools::Run::Primer3? But what I find is that I get almost all the results (except PRIMER_SEQUENCE_ID and SEQUENCE) without providing a line code

use Bio::Tools::Primer3

in the script. How to explain this? Is it because the following line code?

my $result=$primer3->run;

The last question: which line code is used to invoke program primer3.exe? How does Perl script call the primer3.exe?

Once again thank you all very much,

Li

From jason.stajich at duke.edu Wed May 10 20:41:31 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 10 May 2006 20:41:31 -0400
Subject: [Bioperl-l] What is the relationship between primer3 module and run-primer3 module?
In-Reply-To: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>
References: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>
Message-ID:

Bio::Tools::Run::XXX modules are for running applications...

On May 10, 2006, at 8:27 PM, chen li wrote:

> First thank you all for replying my previous post
> about primer3.
>
> But now I am a little confused even after I read the
> documents: What is the relationship between these two
> modules? What is correct/standard way to use them to
> do the batch-primer design? What I do is that I use
> Bio::Tools::Run::Primer3 to design primers. Based on
> Dr. Roy Chaudhuri's information I can set the
> parameters using the following syntax:
>
> $primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');
>
> Based on Paul A. Wiersma's explanation I can also
> print out part of the primer results(because I don't
> need all the information). But there is a little
> trouble: PRIMER_SEQUENCE_ID can't be accessed using
> this method. And Paul points out that
> "PRIMER_SEQUENCE_ID and SEQUENCE are not part of the
> individual
> results but only end up by default with
> $results->primer_results(0)". So it seems there is no
> way to get around this problem using
> Bio::Tools::Run::Primer3. And others suggest using
> Bio::Tools::Primer3 to parse the results. So is true
> that Bio::Tools::Run::Primer3 is for primer design and
> Bio::Tools::Primer3 is for parsing the results from
> Bio::Tools::Run::Primer3? But what I find is that I
> get almost all the results (except PRIMER_SEQUENCE_ID
> and SEQUENCE ) without providing a line code
>
> use Bio::Tools::Primer3
>
> in the script. How to explain this? Is it because the
> following line code?
>
> my $result=$primer3->run;
>
> The last question: which line code is used to invoke
> program primer3.exe? How does Perl script call the
> primer3.exe?
>
> Once again thank you all very much,
>
> Li

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From jason.stajich at duke.edu Wed May 10 20:53:43 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 10 May 2006 20:53:43 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>
References: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>
Message-ID: <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu>

I would use the implementation that talks to the flatfile db as the standard here. Nodes are defined by the data from the taxonomy dump dbs from NCBI. The eutils is pretty worthless except for taxid->name or reverse; you can't get the full taxonomy (or couldn't when that implementation was written).

The "name" method refers to the name of the node - each level in the taxonomy can have a "name".

The bits of hackiness relate to wrapping the node object as a Bio::Species and/or being able to read a genbank file and the organism taxonomy data as a list and instantiating. If we could rely on everything being in a DB of course this would be simpler.

Another problem is that the depth of the taxonomy is not constant for every node, so assuming that a fixed number of slots will be filled in to generate the taxonomy leads to problems.
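
A minimal sketch of the variable-depth point, in plain Perl with no BioPerl calls: each node records only its own name, rank and parent id, and the lineage is built by walking upwards until no parent is left, so no fixed number of rank slots is assumed. The %nodes table, its field names and the lineage_of helper are hypothetical and are not part of any Bio::Taxonomy module; the sample ids, names and ranks are those of NCBI taxid 106370 (Frankia sp. CcI3), with parent links assumed to follow the lineage order.

```perl
use strict;
use warnings;

# Hypothetical node table: each node knows only itself and its parent id.
my %nodes = (
    106370 => { name => 'Frankia sp. CcI3',       rank => 'species',      parent => 1854   },
    1854   => { name => 'Frankia',                rank => 'genus',        parent => 74712  },
    74712  => { name => 'Frankiaceae',            rank => 'family',       parent => 85013  },
    85013  => { name => 'Frankineae',             rank => 'suborder',     parent => 2037   },
    2037   => { name => 'Actinomycetales',        rank => 'order',        parent => 85003  },
    85003  => { name => 'Actinobacteridae',       rank => 'subclass',     parent => 1760   },
    1760   => { name => 'Actinobacteria (class)', rank => 'class',        parent => 201174 },
    201174 => { name => 'Actinobacteria',         rank => 'phylum',       parent => 2      },
    2      => { name => 'Bacteria',               rank => 'superkingdom', parent => 131567 },
    131567 => { name => 'cellular organisms',     rank => 'no rank',      parent => undef  },
);

# Walk from a node up to the root, collecting nodes from most to least specific.
sub lineage_of {
    my ($taxid) = @_;
    my @lineage;
    while (defined $taxid && exists $nodes{$taxid}) {
        push @lineage, $nodes{$taxid};
        $taxid = $nodes{$taxid}{parent};
    }
    return @lineage;
}

my @lineage = lineage_of(106370);
printf "%-14s %s\n", $_->{rank}, $_->{name} for @lineage;

# Ask for a rank by name instead of by position in the list.
my ($top) = grep { $_->{rank} eq 'superkingdom' } @lineage;
print "Superkingdom: $top->{name}\n";
```

Because ranks are looked up by name rather than by index, a lineage with extra levels (subclass, suborder) or missing ones is handled the same way.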

Use the flatfile implementation (Bio::DB::Taxonomy::flatfile) as the best example of working code as this is how I really wanted it to work; the Bio::Species hacks are only there to shoehorn data retrieved from genbank files in. With the flatfile implementation you have to walk all the way up the db hierarchy to get the kingdom for a node so you do have to build up the classification hierarchy as each node only stores data about itself.

I'm not exactly sure what you are proposing to do, but would definitely enjoy another pair of hands, I don't really have time to mess with it any time soon.

-jason

On May 10, 2006, at 5:30 AM, Sendu Bala wrote:

> Hi,
> I'm a little confused as to how names are supposed to work in
> Bio::Taxonomy::Node.
>
> In the bioperl versions that I've looked at a Node doesn't seem to store
> the most important information about itself - it's scientific name - in
> an obvious place. bioperl 1.5.1 puts it at the start of the
> classification list. I'd have thought sticking it in -name would make
> more sense, but this is used only for the GenBank common name.
>
> The Bio::Taxonomy docs still suggests:
>
> my $node_species_sapiens = Bio::Taxonomy::Node->new(
>     -object_id => 9606, # or -ncbi_taxid. Requird tag
>     -names => {
>         'scientific' => ['sapiens'],
>         'common_name' => ['human']
>     },
>     -rank => 'species' # Required tag
> );
>
> and whilst Bio::Taxonomy::Node does not accept -names, it does have a
> 'name' method which claims to work like:
>
> $obj->name('scientific', 'sapiens');
>
> This kind of thing would be really nice, but afaics
> Bio::Taxonomy::Node->new takes the -name value and makes a common name
> out of it, whilst the name() method passes any 'scientific' name to the
> scientific_name() method which is unable to set any value (and warns
> about this), only get.
>
> It seems like the need to have this classification array work the same
> way as Bio::Species is causing some unnecessary restrictions.
> Can't the
> more sensible idea of having a dedicated storage spot for the
> ScientificName and other parameters be used, with the classification
> array either being generated just-in-time from the hash-stored data, or
> indeed being generated from the Lineage field?
>
>
> Also, why does a node store the complete hierarchy on itself in the
> classification array? If we're going that far, why don't the
> Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a
> get_taxonomy() method instead of a get_Taxonomy_Node() method.
> get_taxonomy() could, from a single efetch.fcgi lookup, create a
> complete Bio::Taxonomy with all the nodes. Whilst most nodes would only
> have a minimum of information, if you could simply ask a node what its
> rank and scientific name was you could easily build a classification
> array, or ask what Kingdom your species was in etc.
>
> Are there good reasons for Taxonomy working the way it does in 1.5.1, or
> would I not be wasting my time re-writing things to make more sense
> (to me)?
>
>
> Cheers,
> Sendu.

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From cuiw at mail.nih.gov Wed May 10 21:46:00 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Wed, 10 May 2006 21:46:00 -0400
Subject: [Bioperl-l] What is the relationship between primer3 module and run-primer3 module?
References: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>
Message-ID:

1. Bio::Tools::Primer3 is already included in the Bio::Tools::Run::Primer3 module so that you can parse the result file.
2. There is a bug in Bio::Tools::Primer3 (Primer3.pm line 264), as I mentioned. Once fixed, it can output PRIMER_SEQUENCE_ID.
3. primer3.exe is called in the Bio::Tools::Run::Primer3 "run" function; please read the function definition.
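
As a rough illustration of the parsing half of that division of labour, the sketch below reads a primer3-style Boulder-IO record (TAG=VALUE lines ended by a lone "=") into a hash reference and pulls the start,length pair apart with split. It is plain Perl; the sample record and the parse_boulder helper are hypothetical and are not the code in Bio::Tools::Primer3. Only the tag names are taken from this thread.

```perl
use strict;
use warnings;

# Hypothetical sample record in primer3's Boulder-IO style.
my $boulder = <<'END';
PRIMER_SEQUENCE_ID=example_seq
PRIMER_LEFT=60,20
PRIMER_RIGHT=559,20
PRIMER_LEFT_TM=59.8
PRIMER_RIGHT_TM=60.1
PRIMER_PRODUCT_SIZE=500
=
END

# Illustrative helper (not a BioPerl function): one record -> hash reference.
sub parse_boulder {
    my ($text) = @_;
    my %record;
    for my $line (split /\n/, $text) {
        last if $line eq '=';                     # lone "=" ends the record
        my ($tag, $value) = split /=/, $line, 2;  # split on the first "=" only
        next unless defined $value;
        $record{$tag} = $value;
    }
    return \%record;
}

my $result = parse_boulder($boulder);

# PRIMER_LEFT is a "start,length" pair, so split it on the comma.
my ($start, $length) = split /,/, $result->{PRIMER_LEFT};

print "ID: $result->{PRIMER_SEQUENCE_ID}\n";
print "Left primer: start $start, length $length\n";
print "Product size: $result->{PRIMER_PRODUCT_SIZE}\n";
```

In this toy record the sequence id sits alongside the primer tags in one hash; in the real module it is held separately, which is why it is missing from the per-pair results discussed above.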

________________________________
From: chen li [mailto:chen_li3 at yahoo.com]
Sent: Wed 5/10/2006 8:27 PM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] What is the relationship between primer3 module and run-primer3 module?

First thank you all for replying my previous post about primer3.

But now I am a little confused even after I read the documents: What is the relationship between these two modules? What is correct/standard way to use them to do the batch-primer design? What I do is that I use Bio::Tools::Run::Primer3 to design primers. Based on Dr. Roy Chaudhuri's information I can set the parameters using the following syntax:

$primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');

Based on Paul A. Wiersma's explanation I can also print out part of the primer results (because I don't need all the information). But there is a little trouble: PRIMER_SEQUENCE_ID can't be accessed using this method. And Paul points out that "PRIMER_SEQUENCE_ID and SEQUENCE are not part of the individual results but only end up by default with $results->primer_results(0)". So it seems there is no way to get around this problem using Bio::Tools::Run::Primer3. And others suggest using Bio::Tools::Primer3 to parse the results. So is true that Bio::Tools::Run::Primer3 is for primer design and Bio::Tools::Primer3 is for parsing the results from Bio::Tools::Run::Primer3? But what I find is that I get almost all the results (except PRIMER_SEQUENCE_ID and SEQUENCE) without providing a line code

use Bio::Tools::Primer3

in the script. How to explain this? Is it because the following line code?

my $result=$primer3->run;

The last question: which line code is used to invoke program primer3.exe? How does Perl script call the primer3.exe?

Once again thank you all very much,

Li

From cjfields at uiuc.edu Wed May 10 23:36:39 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 May 2006 22:36:39 -0500
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu>
Message-ID: <000301c674ac$1d40f0f0$15327e82@pyrimidine>

I think you can get pretty much everything now, though I can definitely see the use of a local database. I ran a few tests, really unrelated to this, using the powerscripting test page at NCBI for eutils (for the curious, at http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/eu.cgi) and was able to retrieve XML-formatted taxonomic information; here's the bacterium Frankia sp. CcI3 TaxID info, which looks like they have everything set up by rank. It gives quite a bit of information.

106370
Frankia sp. CcI3
1854
species
Bacteria
11      Bacterial and Plant Plastid
0       Unspecified
cellular organisms; Bacteria; Actinobacteria; Actinobacteria (class); Actinobacteridae; Actinomycetales; Frankineae; Frankiaceae; Frankia
131567  cellular organisms      no rank
2       Bacteria                superkingdom
201174  Actinobacteria          phylum
1760    Actinobacteria (class)  class
85003   Actinobacteridae        subclass
2037    Actinomycetales         order
85013   Frankineae              suborder
74712   Frankiaceae             family
1854    Frankia                 genus
1999/10/22
2005/01/19
2000/02/02

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jason Stajich
> Sent: Wednesday, May 10, 2006 7:54 PM
> To: Sendu Bala
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::Taxonomy confusion
>
> I would use the implementation that talks to the flatfile db as the
> standard here. nodes are defined by the data in from taxonomy dump
> dbs from ncbi.
> the eutils is pretty worthless except for taxid->name or reverse, you > can't get the full taxonomy (or couldn't when that implementation was > written). > > The "name" method refers to the name of the node - each level in the > taxonomy can have a "name". > > The bits of hackiness relate to wrapping the node object as a > Bio::Species and/or being able to read a genbank file and the > organism taxonomy data as a list and instantiating. If we could rely > on everything being in a DB of course this would be simpler. > > Another problem is the depth of the taxonomy is not constant for > every node so assuming that a fixed number of slots will be filled in > to generate the taxonomy leads to problems. > > Use the flatfile implementation (Bio::DB::Taxonomy::flatfile) as the > best example of working code as this is how I really wanted it to > work, the Bio::Species hacks are only there to shoehorn data > retrieved from genbank files in. With the flatfile implementation > you have to walk all the way up the db hierarchy to get the kingdom > for a node so you do have to build up the classification hierarchy as > each node only stores data about itsself. > > I'm not exactly sure what you are proposing to do, but would > definitely enjoy another pair of hands, I don't really have time to > mess with it any time soon. > > -jason > On May 10, 2006, at 5:30 AM, Sendu Bala wrote: > > > Hi, > > I'm a little confused as to how names are supposed to work in > > Bio::Taxonomy::Node. > > > > In the bioperl versions that I've looked at a Node doesn't seem to > > store > > the most important information about itself - it's scientific name > > - in > > an obvious place. bioperl 1.5.1 puts it at the start of the > > classification list. I'd have thought sticking it in -name would make > > more sense, but this is used only for the GenBank common name. 
> > > > The Bio::Taxonomy docs still suggests: > > > > my $node_species_sapiens = Bio::Taxonomy::Node->new( > > -object_id => 9606, # or -ncbi_taxid. Requird tag > > -names => { > > 'scientific' => ['sapiens'], > > 'common_name' => ['human'] > > }, > > -rank => 'species' # Required tag > > ); > > > > and whilst Bio::Taxonomy::Node does not accept -names, it does have a > > 'name' method which claims to work like: > > > > $obj->name('scientific', 'sapiens'); > > > > This kind of thing would be really nice, but afaics > > Bio::Taxonomy::Node->new takes the -name value and makes a common name > > out of it, whilst the name() method passes any 'scientific' name to > > the > > scientific_name() method which is unable to set any value (and warns > > about this), only get. > > > > It seems like the need to have this classification array work the same > > way as Bio::Species is causing some unnecessary restrictions. Can't > > the > > more sensible idea of having a dedicated storage spot for the > > ScientificName and other parameters be used, with the classification > > array either being generated just-in-time from the hash-stored > > data, or > > indeed being generated from the Lineage field? > > > > > > Also, why does a node store the complete hierarchy on itself in the > > classification array? If we're going that far, why don't the > > Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a > > get_taxonomy() method instead of a get_Taxonomy_Node() method. > > get_taxonomy() could, from a single efetch.fcgi lookup, create a > > complete Bio::Taxonomy with all the nodes. Whilst most nodes would > > only > > have a minimum of information, if you could simply ask a node what its > > rank and scientific name was you could easily build a classification > > array, or ask what Kingdom your species was in etc. 
> > > > Are there good reasons for Taxonomy working the way it does in > > 1.5.1, or > > would I not be wasting my time re-writing things to make more sense > > (to me)? > > > > > > Cheers, > > Sendu. > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Thu May 11 08:04:54 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu, 11 May 2006 08:04:54 -0400 Subject: [Bioperl-l] Bio::Taxonomy confusion In-Reply-To: <000301c674ac$1d40f0f0$15327e82@pyrimidine> References: <000301c674ac$1d40f0f0$15327e82@pyrimidine> Message-ID: Great - now we just need someone to volunteer to actually work on this. The current code grabs most of this but I believe expects a different XML On May 10, 2006, at 11:36 PM, Chris Fields wrote: > I think you can get pretty much everything now, though I can > definitely see > the use of a local database. I ran a few tests, really unrelated > to this, > using the powerscripting test page at NCBI for eutils (for the > curious, at > http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/eu.cgi) and was > able to > retrieve XML-formatted taxonomic information; here's the bacterium > Frankia > sp. CcI3 TaxID info, which looks like they have everything set up > by rank. > It gives quite a bit of information. > > > "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/taxon.dtd"> > > > > 106370 > Frankia sp. 
CcI3 > 1854 > species > Bacteria > > 11 > Bacterial and Plant Plastid > > > 0 > Unspecified > > cellular organisms; Bacteria; Actinobacteria; > Actinobacteria > (class); Actinobacteridae; Actinomycetales; Frankineae; Frankiaceae; > Frankia > > > 131567 > cellular organisms > no rank > > > 2 > Bacteria > superkingdom > > > 201174 > Actinobacteria > phylum > > > 1760 > Actinobacteria (class) > class > > > 85003 > Actinobacteridae > subclass > > > 2037 > Actinomycetales > order > > > 85013 > Frankineae > suborder > > > 74712 > Frankiaceae > family > > > 1854 > Frankia > genus > > > 1999/10/22 > 2005/01/19 > 2000/02/02 > > > > Chris > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Jason Stajich >> Sent: Wednesday, May 10, 2006 7:54 PM >> To: Sendu Bala >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Bio::Taxonomy confusion >> >> I would use the implementation that talks to the flatfile db as the >> standard here. nodes are defined by the data in from taxonomy dump >> dbs from ncbi. >> the eutils is pretty worthless except for taxid->name or reverse, you >> can't get the full taxonomy (or couldn't when that implementation was >> written). >> >> The "name" method refers to the name of the node - each level in the >> taxonomy can have a "name". >> >> The bits of hackiness relate to wrapping the node object as a >> Bio::Species and/or being able to read a genbank file and the >> organism taxonomy data as a list and instantiating. If we could rely >> on everything being in a DB of course this would be simpler. >> >> Another problem is the depth of the taxonomy is not constant for >> every node so assuming that a fixed number of slots will be filled in >> to generate the taxonomy leads to problems. 
>> >> Use the flatfile implementation (Bio::DB::Taxonomy::flatfile) as the >> best example of working code as this is how I really wanted it to >> work, the Bio::Species hacks are only there to shoehorn data >> retrieved from genbank files in. With the flatfile implementation >> you have to walk all the way up the db hierarchy to get the kingdom >> for a node so you do have to build up the classification hierarchy as >> each node only stores data about itsself. >> >> I'm not exactly sure what you are proposing to do, but would >> definitely enjoy another pair of hands, I don't really have time to >> mess with it any time soon. >> >> -jason >> On May 10, 2006, at 5:30 AM, Sendu Bala wrote: >> >>> Hi, >>> I'm a little confused as to how names are supposed to work in >>> Bio::Taxonomy::Node. >>> >>> In the bioperl versions that I've looked at a Node doesn't seem to >>> store >>> the most important information about itself - it's scientific name >>> - in >>> an obvious place. bioperl 1.5.1 puts it at the start of the >>> classification list. I'd have thought sticking it in -name would >>> make >>> more sense, but this is used only for the GenBank common name. >>> >>> The Bio::Taxonomy docs still suggests: >>> >>> my $node_species_sapiens = Bio::Taxonomy::Node->new( >>> -object_id => 9606, # or -ncbi_taxid. Requird tag >>> -names => { >>> 'scientific' => ['sapiens'], >>> 'common_name' => ['human'] >>> }, >>> -rank => 'species' # Required tag >>> ); >>> >>> and whilst Bio::Taxonomy::Node does not accept -names, it does >>> have a >>> 'name' method which claims to work like: >>> >>> $obj->name('scientific', 'sapiens'); >>> >>> This kind of thing would be really nice, but afaics >>> Bio::Taxonomy::Node->new takes the -name value and makes a common >>> name >>> out of it, whilst the name() method passes any 'scientific' name to >>> the >>> scientific_name() method which is unable to set any value (and warns >>> about this), only get. 
>>> >>> It seems like the need to have this classification array work the >>> same >>> way as Bio::Species is causing some unnecessary restrictions. Can't >>> the >>> more sensible idea of having a dedicated storage spot for the >>> ScientificName and other parameters be used, with the classification >>> array either being generated just-in-time from the hash-stored >>> data, or >>> indeed being generated from the Lineage field? >>> >>> >>> Also, why does a node store the complete hierarchy on itself in the >>> classification array? If we're going that far, why don't the >>> Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a >>> get_taxonomy() method instead of a get_Taxonomy_Node() method. >>> get_taxonomy() could, from a single efetch.fcgi lookup, create a >>> complete Bio::Taxonomy with all the nodes. Whilst most nodes would >>> only >>> have a minimum of information, if you could simply ask a node >>> what its >>> rank and scientific name was you could easily build a classification >>> array, or ask what Kingdom your species was in etc. >>> >>> Are there good reasons for Taxonomy working the way it does in >>> 1.5.1, or >>> would I not be wasting my time re-writing things to make more sense >>> (to me)? >>> >>> >>> Cheers, >>> Sendu. 
>>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From sb at mrc-dunn.cam.ac.uk Thu May 11 07:51:44 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Thu, 11 May 2006 12:51:44 +0100 Subject: [Bioperl-l] Bio::Taxonomy confusion In-Reply-To: <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu> References: <4461B2D3.7010603@mrc-dunn.cam.ac.uk> <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu> Message-ID: <44632550.3040603@mrc-dunn.cam.ac.uk> Jason Stajich wrote: > I would use the implementation that talks to the flatfile db as the > standard here. nodes are defined by the data in from taxonomy dump > dbs from ncbi. the eutils is pretty worthless except for taxid->name > or reverse, you can't get the full taxonomy (or couldn't when that > implementation was written). I'm not sure what you mean. In 1.5.1 you have access to the full taxonomy because you're using efetch.fcgi. Indeed, you parse the full taxonomy already to get the classification. > The "name" method refers to the name of the node - each level in the > taxonomy can have a "name". Yes, and to me the 'name of the node' is its scientific name (something like 'sapiens'), not a 'common' name. So why is it stored as a 'common' name in the object? Why don't the DB::Taxonomy modules store the actual common names (something like 'human')? > The bits of hackiness relate to wrapping the node object as a > Bio::Species and/or being able to read a genbank file and the > organism taxonomy data as a list and instantiating. 
> If we could rely on everything being in a DB of course this would be simpler.

I think that the Taxonomy stuff could be done in a 'pure' way, with a new Bio::Species made as a wrapper around an appropriate Taxonomy module (or modules) that cheated and made fake nodes from a genbank list and then made a proper Bio::Taxonomy.

> With the flatfile implementation you have to walk all the way up the
> db hierarchy to get the kingdom for a node so you do have to build up
> the classification hierarchy as each node only stores data about
> itsself.

I'm still actually using bioperl 1.4, but I'm looking at 1.5.1, assuming it is the latest available, and I see that the flatfile implementation works the same way as the entrez one. The requested node is fetched, but then internally it walks the hierarchy purely so it can build a classification list which is then stored on the object. If you're already retrieving every node above the requested node, why not just return every node? Why not just return a whole Bio::Taxonomy?

> I'm not exactly sure what you are proposing to do, but would
> definitely enjoy another pair of hands, I don't really have time to
> mess with it any time soon.

I shouldn't really be spending any time on it either, but I knocked up a quick implementation for myself yesterday/today. I'm working on a bunch of modules that inherit from bioperl and then add/alter to suit my needs. In this regard they're a bit limited and kind of hard-coded to my way of thinking, but hopefully you can see my intent and perhaps use some of my implementation.

In my implementation:

# DB::Taxonomy::* return a Bio::Taxonomy equivalent with a single database lookup.
# The Taxonomy is implicitly a tree.
# The Taxonomy can have branches of different length from root to the same rank level.
# The Taxonomy isn't told what ranks it has (isn't limited by some supplied rank list); it has the ranks that its Nodes have and knows (without being told) what order those ranks should be in.
# The Taxonomy is made of Nodes that truly only contain information about themselves and have no classification array or anything like that.
# A Node can still be classified.
# We can have Nodes of rank 'no rank' that will be correctly ordered in the classification.
# Nodes have a scientific name and common names
# You get parent and all children nodes without database lookups.
# There is a Bio::Species like thing that wraps around this and gives easy access to what I really want to do:
my $human = TFBS::Species->new(-common_name => 'human');
my @classification = $human->classification; # returns the array you'd expect from a normally created, fully classified Bio::Species
my $kingdom = $human->kingdom; # returns 'Metazoa'
# For genbank, we can still supply TFBS::Species a classification array

http://bix.sendu.me.uk/files/taxonomy_the_tfbs_way.tar.gz
(only tested inheriting from bioperl 1.4, but ideally that shouldn't make any difference!)

Is there any scope for bioperl Taxonomy becoming more like this? Or are there problems with my design (quite likely!)? Or are there good reasons for maintaining the current way of working? Please feel free to shoot me down/discuss.

Cheers,
Sendu.

From sb at mrc-dunn.cam.ac.uk Thu May 11 08:22:53 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 11 May 2006 13:22:53 +0100
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To:
References: <000301c674ac$1d40f0f0$15327e82@pyrimidine>
Message-ID: <44632C9D.4010408@mrc-dunn.cam.ac.uk>

Jason Stajich wrote:
> Great - now we just need someone to volunteer to actually work on this.

Now I'm really confused...

> The current code grabs most of this but I believe expects a different XML

No, I think the code in bioperl 1.5.1 Bio::DB::Taxonomy::entrez expects that XML, and parses it as fully as flatfile.pm does. Nothing more to do. Weren't you the person that wrote that parser?
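To make the two approaches being compared concrete, here is a minimal plain-Perl sketch, not taken from BioPerl or from the TFBS modules. It turns each entry of the Frankia sp. CcI3 lineage posted earlier in the thread into its own node and derives the classification only when asked. The helper names (build_nodes, classification, ancestor_with_rank) are hypothetical, and the sketch assumes that each lineage entry is the parent of the one that follows it.

```perl
use strict;
use warnings;

# Lineage of Frankia sp. CcI3 from the record posted in this thread,
# root first: [taxid, scientific name, rank].
my @lineage = (
    [131567, 'cellular organisms',     'no rank'],
    [2,      'Bacteria',               'superkingdom'],
    [201174, 'Actinobacteria',         'phylum'],
    [1760,   'Actinobacteria (class)', 'class'],
    [85003,  'Actinobacteridae',       'subclass'],
    [2037,   'Actinomycetales',        'order'],
    [85013,  'Frankineae',             'suborder'],
    [74712,  'Frankiaceae',            'family'],
    [1854,   'Frankia',                'genus'],
    [106370, 'Frankia sp. CcI3',       'species'],
);

# One node per taxon; a node knows only itself and its parent's id.
# Assumption: each entry's parent is the entry listed just before it.
sub build_nodes {
    my @records = @_;
    my (%nodes, $parent_id);
    for my $record (@records) {
        my ($id, $name, $rank) = @$record;
        $nodes{$id} = { id => $id, name => $name, rank => $rank, parent => $parent_id };
        $parent_id = $id;
    }
    return \%nodes;
}

# Derive the classification just in time by walking up to the root.
sub classification {
    my ($nodes, $id) = @_;
    my @names;
    while (defined $id) {
        my $node = $nodes->{$id} or die "unknown taxid $id";
        push @names, $node->{name};
        $id = $node->{parent};
    }
    return @names;    # most specific name first
}

# Return the name of the node, or nearest ancestor, that has the given rank.
sub ancestor_with_rank {
    my ($nodes, $id, $rank) = @_;
    while (defined $id) {
        my $node = $nodes->{$id} or die "unknown taxid $id";
        return $node->{name} if $node->{rank} eq $rank;
        $id = $node->{parent};
    }
    return;
}

my $nodes = build_nodes(@lineage);
print join('; ', classification($nodes, 106370)), "\n";
print ancestor_with_rank($nodes, 106370, 'superkingdom'), "\n";
```

Because no node stores a classification array, a 'no rank' node or a branch with extra intermediate ranks needs no special handling; the cost is a walk up the parent chain on every request.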
I parse the same XML in my version of entrez.pm (see my previous email); the main difference is that I make Nodes out of each Taxon instead of just adding each Taxon's ScientificName to the classification array.

From jason.stajich at duke.edu Thu May 11 09:53:56 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 11 May 2006 09:53:56 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <44632C9D.4010408@mrc-dunn.cam.ac.uk>
References: <000301c674ac$1d40f0f0$15327e82@pyrimidine> <44632C9D.4010408@mrc-dunn.cam.ac.uk>
Message-ID:

I guess so - I've long since forgotten what it supports, since I don't regularly use it. Sorry.

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From cjfields at uiuc.edu Thu May 11 10:57:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 11 May 2006 09:57:20 -0500
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To:
Message-ID: <000b01c6750b$33e95ea0$15327e82@pyrimidine>

Heh...
To tell the truth, I haven't looked at Bio::DB::Taxonomy in any depth yet, but I myself have seen issues with the way Bio::Species treats bacterial strains (I guess this also involves Bio::Taxonomy::Node, since that's what Bio::Species delegates to). It seems to repeat some strain names when using $seq->species->common_name. Not a killer problem, but annoying, since the correct name is in the source tag in the feature table! I 'could' take a look at it but I can't guarantee quick results.

Jason, I could add Taxonomy to the EUtilities overhaul I mentioned to you previously, but it'll take a while to get going. I'm really more interested in getting epost-esearch-efetch sequence retrieval up and running first, with the same API as Bio::DB::GenBank/Genpept and Bio::DB::Query::GenBank, and donating the code (late summer/fall???) after working out namespace issues so it doesn't conflict with the current Bio::DB::WebDBSeqI inheritance. I suppose I could also look at Bio::DB::Taxonomy to see what's up in the next couple of weeks (after conference), unless someone gets to it sooner.

Chris

From jason.stajich at duke.edu Thu May 11 11:42:07 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 11 May 2006 11:42:07 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <000b01c6750b$33e95ea0$15327e82@pyrimidine>
References: <000b01c6750b$33e95ea0$15327e82@pyrimidine>
Message-ID: <0C1C2DAC-F388-465E-B6C2-7217A3B4CC6C@duke.edu>

I think you'll see it is different and mostly a limitation of the genbank format and the Bio::Species objects that you get from a genbank parse do represent the full capabilities of a Taxonomy::Node.

I am happy for someone to overhaul things, but it all boils down to inferring which part of a list of names is the species versus sub-species versus strain when none of the members of the list are labeled. These are some of the same problems we have for swissprot as well.

I just don't think we can do it right from the genbank file data alone, so I don't see a lot of point in expecting Bio::Species to provide more than a representation of what is in the file and just return that array. It has seemed like we need to special-case things pretty heavily or do a lookup in the taxonomydb for something.

Can you guess what value is the strain versus sub-species?
What happens when there is a two part strain name (space separated) and a sub-species or variety designation?

SOURCE      Staphylococcus haemolyticus JCSC1435
  ORGANISM  Staphylococcus haemolyticus JCSC1435
            Bacteria; Firmicutes; Bacillales; Staphylococcus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=279808
strain is JCSC1435

versus

SOURCE      Muntiacus muntjak vaginalis
  ORGANISM  Muntiacus muntjak vaginalis
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
            Pecora; Cervidae; Muntiacinae; Muntiacus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9887
species is muntjak, sub-species vaginalis ?

versus

SOURCE      Aspergillus nidulans FGSC A4
  ORGANISM  Aspergillus nidulans FGSC A4
            Eukaryota; Fungi; Ascomycota; Pezizomycotina; Eurotiomycetes;
            Eurotiales; Trichocomaceae; Emericella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=227321
Genus should be Aspergillus or Emericella ?

Strain and subspecies/variety in the same entry

SOURCE      Cryptococcus neoformans var. grubii H99
  ORGANISM  Cryptococcus neoformans var. grubii H99
            Eukaryota; Fungi; Basidiomycota; Hymenomycetes;
            Heterobasidiomycetes; Tremellomycetidae; Tremellales;
            Tremellaceae; Filobasidiella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=235443

On May 11, 2006, at 10:57 AM, Chris Fields wrote: > Heh... > > To tell the truth, I haven't looked at Bio::DB::Taxonomy in any > depth yet, > but I myself have seen issues with the way Bio::Species treats > bacterial > strains (I guess this also involves Bio::Taxonomy::Node since > that's what > Bio::Species delegates to). Seems it likes to repeat some strain > names when > using $seq->species->common_name. Not a killer problem but > annoying since > the correct name is in the source tag in the feature table! I > 'could' take > a look at it but I can't guarantee quick results.
> > Jason, I could add Taxonomy to the EUtilities overhaul I mentioned > to you > previously but it'll take awhile to get going. I'm really more > interested > in getting epost-esearch-efetch sequence retrieval up and running > first with > the same API as Bio::DB::GenBank/Genpept and > Bio::DB::Query::GenBank, donate > the code (late summer/fall???) after working out namespace issues > so it > doesn't conflict with current Bio::DB::WebDBSeqI inheritance. I > suppose I > could also look at Bio::DB:Taxonomy to see what's up in the next > couple of > weeks (after conference), unless someone gets to it sooner. > > Chris > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Jason Stajich >> Sent: Thursday, May 11, 2006 7:05 AM >> To: Chris Fields >> Cc: bioperl-l at lists.open-bio.org; 'Sendu Bala' >> Subject: Re: [Bioperl-l] Bio::Taxonomy confusion >> >> Great - now we just need someone to volunteer to actually work on >> this. >> >> The current code grabs most of this but I believe expects a different >> XML >> >> >> On May 10, 2006, at 11:36 PM, Chris Fields wrote: >> >>> I think you can get pretty much everything now, though I can >>> definitely see >>> the use of a local database. I ran a few tests, really unrelated >>> to this, >>> using the powerscripting test page at NCBI for eutils (for the >>> curious, at >>> http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/eu.cgi) and was >>> able to >>> retrieve XML-formatted taxonomic information; here's the bacterium >>> Frankia >>> sp. CcI3 TaxID info, which looks like they have everything set up >>> by rank. >>> It gives quite a bit of information. >>> >>> >>> >> "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/taxon.dtd"> >>> >>> >>> >>> 106370 >>> Frankia sp. 
CcI3 >>> 1854 >>> species >>> Bacteria >>> >>> 11 >>> Bacterial and Plant Plastid >>> >>> >>> 0 >>> Unspecified >>> >>> cellular organisms; Bacteria; Actinobacteria; >>> Actinobacteria >>> (class); Actinobacteridae; Actinomycetales; Frankineae; Frankiaceae; >>> Frankia >>> >>> >>> 131567 >>> cellular organisms >>> no rank >>> >>> >>> 2 >>> Bacteria >>> superkingdom >>> >>> >>> 201174 >>> Actinobacteria >>> phylum >>> >>> >>> 1760 >>> Actinobacteria (class) >>> class >>> >>> >>> 85003 >>> Actinobacteridae >>> subclass >>> >>> >>> 2037 >>> Actinomycetales >>> order >>> >>> >>> 85013 >>> Frankineae >>> suborder >>> >>> >>> 74712 >>> Frankiaceae >>> family >>> >>> >>> 1854 >>> Frankia >>> genus >>> >>> >>> 1999/10/22 >>> 2005/01/19 >>> 2000/02/02 >>> >>> >>> >>> Chris >>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>> bounces at lists.open-bio.org] On Behalf Of Jason Stajich >>>> Sent: Wednesday, May 10, 2006 7:54 PM >>>> To: Sendu Bala >>>> Cc: bioperl-l at lists.open-bio.org >>>> Subject: Re: [Bioperl-l] Bio::Taxonomy confusion >>>> >>>> I would use the implementation that talks to the flatfile db as the >>>> standard here. nodes are defined by the data in from taxonomy dump >>>> dbs from ncbi. >>>> the eutils is pretty worthless except for taxid->name or >>>> reverse, you >>>> can't get the full taxonomy (or couldn't when that >>>> implementation was >>>> written). >>>> >>>> The "name" method refers to the name of the node - each level in >>>> the >>>> taxonomy can have a "name". >>>> >>>> The bits of hackiness relate to wrapping the node object as a >>>> Bio::Species and/or being able to read a genbank file and the >>>> organism taxonomy data as a list and instantiating. If we could >>>> rely >>>> on everything being in a DB of course this would be simpler. 
>>>> >>>> Another problem is the depth of the taxonomy is not constant for >>>> every node so assuming that a fixed number of slots will be >>>> filled in >>>> to generate the taxonomy leads to problems. >>>> >>>> Use the flatfile implementation (Bio::DB::Taxonomy::flatfile) as >>>> the >>>> best example of working code as this is how I really wanted it to >>>> work, the Bio::Species hacks are only there to shoehorn data >>>> retrieved from genbank files in. With the flatfile implementation >>>> you have to walk all the way up the db hierarchy to get the kingdom >>>> for a node so you do have to build up the classification >>>> hierarchy as >>>> each node only stores data about itsself. >>>> >>>> I'm not exactly sure what you are proposing to do, but would >>>> definitely enjoy another pair of hands, I don't really have time to >>>> mess with it any time soon. >>>> >>>> -jason >>>> On May 10, 2006, at 5:30 AM, Sendu Bala wrote: >>>> >>>>> Hi, >>>>> I'm a little confused as to how names are supposed to work in >>>>> Bio::Taxonomy::Node. >>>>> >>>>> In the bioperl versions that I've looked at a Node doesn't seem to >>>>> store >>>>> the most important information about itself - it's scientific name >>>>> - in >>>>> an obvious place. bioperl 1.5.1 puts it at the start of the >>>>> classification list. I'd have thought sticking it in -name would >>>>> make >>>>> more sense, but this is used only for the GenBank common name. >>>>> >>>>> The Bio::Taxonomy docs still suggests: >>>>> >>>>> my $node_species_sapiens = Bio::Taxonomy::Node->new( >>>>> -object_id => 9606, # or -ncbi_taxid. 
Requird tag >>>>> -names => { >>>>> 'scientific' => ['sapiens'], >>>>> 'common_name' => ['human'] >>>>> }, >>>>> -rank => 'species' # Required tag >>>>> ); >>>>> >>>>> and whilst Bio::Taxonomy::Node does not accept -names, it does >>>>> have a >>>>> 'name' method which claims to work like: >>>>> >>>>> $obj->name('scientific', 'sapiens'); >>>>> >>>>> This kind of thing would be really nice, but afaics >>>>> Bio::Taxonomy::Node->new takes the -name value and makes a common >>>>> name >>>>> out of it, whilst the name() method passes any 'scientific' >>>>> name to >>>>> the >>>>> scientific_name() method which is unable to set any value (and >>>>> warns >>>>> about this), only get. >>>>> >>>>> It seems like the need to have this classification array work the >>>>> same >>>>> way as Bio::Species is causing some unnecessary restrictions. >>>>> Can't >>>>> the >>>>> more sensible idea of having a dedicated storage spot for the >>>>> ScientificName and other parameters be used, with the >>>>> classification >>>>> array either being generated just-in-time from the hash-stored >>>>> data, or >>>>> indeed being generated from the Lineage field? >>>>> >>>>> >>>>> Also, why does a node store the complete hierarchy on itself in >>>>> the >>>>> classification array? If we're going that far, why don't the >>>>> Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just >>>>> have a >>>>> get_taxonomy() method instead of a get_Taxonomy_Node() method. >>>>> get_taxonomy() could, from a single efetch.fcgi lookup, create a >>>>> complete Bio::Taxonomy with all the nodes. Whilst most nodes would >>>>> only >>>>> have a minimum of information, if you could simply ask a node >>>>> what its >>>>> rank and scientific name was you could easily build a >>>>> classification >>>>> array, or ask what Kingdom your species was in etc. 
>>>>> >>>>> Are there good reasons for Taxonomy working the way it does in >>>>> 1.5.1, or >>>>> would I not be wasting my time re-writing things to make more >>>>> sense >>>>> (to me)? >>>>> >>>>> >>>>> Cheers, >>>>> Sendu. >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> -- >>>> Jason Stajich >>>> Duke University >>>> http://www.duke.edu/~jes12 >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From WiersmaP at AGR.GC.CA Thu May 11 13:04:01 2006 From: WiersmaP at AGR.GC.CA (Wiersma, Paul) Date: Thu, 11 May 2006 13:04:01 -0400 Subject: [Bioperl-l] What is the relationship between primer3 moduleandrun-primer3 module? Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C4D@onncrxms5.agr.gc.ca> The bug that Wenwu referred should only occur when reading a Primer3 output file; the Bio::Tools::Run::Primer3->run method takes the results and directly transfers them to a Bio::Tools::Primer3 object without an intermediate file. A Data::Dumper look at the Bio::Tools::Primer3 object shows the keys and results for PRIMER_SEQUENCE_ID and SEQUENCE in 'results' and then again in the 'results_by_number' hash but only in the '0' hash. All of this doesn't really matter for Li's original concern. If you want to include the id of sequence along with the primer3 results just take it from the seq object (i.e. $seq->display_id() ). 
Since you are in a loop taking one sequence at a time this $seq will be the one that was sent to primer3. PAW Paul A. Wiersma Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada Summerland, BC wiersmap at agr.gc.ca -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Cui, Wenwu (NIH/NCI) [F] Sent: Wednesday, May 10, 2006 6:46 PM To: chen li; bioperl-l at bioperl.org Subject: Re: [Bioperl-l] What is the relationship between primer3 moduleandrun-primer3 module? 1. Bio::Tools::Primer3 is already included in the Bio::Tools::Run::Primer3 module so that you can parse the result file. 2. There is a bug in Bio::Tools::Primer3.pm line 264 as I mentioned. Once fixed, it can output 3. primer3.exe is called in the Bio::Tools::Run::Primer3 "run" function, please read the function definition. From cjfields at uiuc.edu Thu May 11 13:16:19 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 11 May 2006 12:16:19 -0500 Subject: [Bioperl-l] Bio::Taxonomy confusion In-Reply-To: <0C1C2DAC-F388-465E-B6C2-7217A3B4CC6C@duke.edu> Message-ID: <000f01c6751e$9e89d6a0$15327e82@pyrimidine> > I think you'll see it is different and mostly a limitation of the > genbank format and the Bio::Species objects that you get from a > genbank parse do represent the full capabilities of a Taxonomy::Node. I definitely see the rationale for using a TaxID lookup (I think Hilmar said so as well), especially for local databases. I wonder, though, if there is a way that RichSeqs like GenBank, when passed through SeqIO, can just be 'short-circuited' using the sequence builder to accept what's on the SOURCE or ORGANISM line of a file as is, without forcing it into Bio::Species/Bio::Taxonomy::Node.
Or maybe diminish the role of the SOURCE/ORGANISM lines altogether to just simple Annotation objects and place much greater emphasis on the TaxID itself, in effect decoupling the TaxID (taxonomic information) from SOURCE/ORGANISM (annotation information). In other words, have GenBank/EMBL classification lines and organism lines essentially stay as they are in the input file (use simple objects). Then, if one were really intent on getting the full name, classification, etc., or one wanted to store their sequences in bioperl-db, they would be required to either have a local db of NCBI Taxonomy or remote access to a similar database (NCBI or something else) so a lookup could be accomplished using the TaxID. If they use BioSQL, then require them to preload their BioSQL database with NCBI's taxonomy, something Hilmar already strongly suggests. If anyone isn't interested in the taxonomic information or doesn't want to bother grabbing the database or setting up remote access, tough luck; just grab the Bio::Annotation/Bio::Species object and use that. As the saying goes, "you can't be all things to all people." At some point you have to throw your arms in the air, do the best you can, but give up trying to please everyone. > I am happy for someone to overhaul things, but it all boils down to > inferring which part of a list of names is the species versus sub- > species versus strain when none of the members of the list are > labeled. This is some of the same problems we have for swissprot as > well. I just don't think we can do it right only from the genbank > file data so I don't see a lot of point of expecting Bio::Species to > provide more than a representation of what is in the file and just > return that array. > > > It has seemed like we need to special case things pretty heavily or > do a lookup in the taxonomydb for something. > > Can you guess what value is the strain versus sub-species?
What > happens when there is a two part strain name (space separated) and a > sub-species or variety designation? > > SOURCE Staphylococcus haemolyticus JCSC1435 > ORGANISM Staphylococcus haemolyticus JCSC1435 > Bacteria; Firmicutes; Bacillales; Staphylococcus. > http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=279808 > strain is JCSC1435 > > versus > SOURCE Muntiacus muntjak vaginalis > ORGANISM Muntiacus muntjak vaginalis > Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; > Euteleostomi; > Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; > Ruminantia; > Pecora; Cervidae; Muntiacinae; Muntiacus. > http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9887 > species is muntjak, sub-species vaginalis ? > > versus > SOURCE Aspergillus nidulans FGSC A4 > ORGANISM Aspergillus nidulans FGSC A4 > Eukaryota; Fungi; Ascomycota; Pezizomycotina; > Eurotiomycetes; > Eurotiales; Trichocomaceae; Emericella. > http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=227321 > > Genus should be Aspergillus or Emericella ? > > Strain and subspecies/variety in the same entry > SOURCE Cryptococcus neoformans var. grubii H99 > ORGANISM Cryptococcus neoformans var. grubii H99 > Eukaryota; Fungi; Basidiomycota; Hymenomycetes; > Heterobasidiomycetes; Tremellomycetidae; Tremellales; > Tremellaceae; > Filobasidiella. > http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=235443 Definitely tricky! This really points out the problem here. It used to be a problem for only a few cases but with so many bacterial and fungal genomes that's changed. The Frankia XML example has the scientific name set to "Frankia sp. CcI3", which matches the SOURCE/ORGANISM line in NCBI's GenBank files and the OS line in EMBL files. It looks like the lines are parsed into and then built from the ground-up in Bio::SeqIO::genbank using Bio::Species objects, which, in my case with the strain designation, is where the problem lies. 
They could be placed in annotation objects with (-tagname => 'SOURCE', -value => 'Frankia sp. CcI3') or similar settings. Or simplify Bio::Species to only represent the information in the GenBank SOURCE/ORGANISM/CLASSIFICATION or EMBL OS/OC lines and nothing more complex than that (no complex taxonomy; for that you use the TaxID and local database). Okay, I need to lay off the coffee now... Chris > On May 11, 2006, at 10:57 AM, Chris Fields wrote: > > > Heh... > > > > To tell the truth, I haven't looked at Bio::DB::Taxonomy in any > > depth yet, > > but I myself have seen issues with the way Bio::Species treats > > bacterial > > strains (I guess this also involves Bio::Taxonomy::Node since > > that's what > > Bio::Species delegates to). Seems it likes to repeat some strain > > names when > > using $seq->species->common_name. Not a killer problem but > > annoying since > > the correct name is in the source tag in the feature table! I > > 'could' take > > a look at it but I can't guarantee quick results. > > > > Jason, I could add Taxonomy to the EUtilities overhaul I mentioned > > to you > > previously but it'll take awhile to get going. I'm really more > > interested > > in getting epost-esearch-efetch sequence retrieval up and running > > first with > > the same API as Bio::DB::GenBank/Genpept and > > Bio::DB::Query::GenBank, donate > > the code (late summer/fall???) after working out namespace issues > > so it > > doesn't conflict with current Bio::DB::WebDBSeqI inheritance. I > > suppose I > > could also look at Bio::DB:Taxonomy to see what's up in the next > > couple of > > weeks (after conference), unless someone gets to it sooner.
> > > > Chris > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of Jason Stajich > >> Sent: Thursday, May 11, 2006 7:05 AM > >> To: Chris Fields > >> Cc: bioperl-l at lists.open-bio.org; 'Sendu Bala' > >> Subject: Re: [Bioperl-l] Bio::Taxonomy confusion > >> > >> Great - now we just need someone to volunteer to actually work on > >> this. > >> > >> The current code grabs most of this but I believe expects a different > >> XML > >> > >> > >> On May 10, 2006, at 11:36 PM, Chris Fields wrote: > >> > >>> I think you can get pretty much everything now, though I can > >>> definitely see > >>> the use of a local database. I ran a few tests, really unrelated > >>> to this, > >>> using the powerscripting test page at NCBI for eutils (for the > >>> curious, at > >>> http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/eu.cgi) and was > >>> able to > >>> retrieve XML-formatted taxonomic information; here's the bacterium > >>> Frankia > >>> sp. CcI3 TaxID info, which looks like they have everything set up > >>> by rank. > >>> It gives quite a bit of information. > >>> > >>> > >>> >>> "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/taxon.dtd"> > >>> > >>> > >>> > >>> 106370 > >>> Frankia sp. 
CcI3 > >>> 1854 > >>> species > >>> Bacteria > >>> > >>> 11 > >>> Bacterial and Plant Plastid > >>> > >>> > >>> 0 > >>> Unspecified > >>> > >>> cellular organisms; Bacteria; Actinobacteria; > >>> Actinobacteria > >>> (class); Actinobacteridae; Actinomycetales; Frankineae; Frankiaceae; > >>> Frankia > >>> > >>> > >>> 131567 > >>> cellular organisms > >>> no rank > >>> > >>> > >>> 2 > >>> Bacteria > >>> superkingdom > >>> > >>> > >>> 201174 > >>> Actinobacteria > >>> phylum > >>> > >>> > >>> 1760 > >>> Actinobacteria (class) > >>> class > >>> > >>> > >>> 85003 > >>> Actinobacteridae > >>> subclass > >>> > >>> > >>> 2037 > >>> Actinomycetales > >>> order > >>> > >>> > >>> 85013 > >>> Frankineae > >>> suborder > >>> > >>> > >>> 74712 > >>> Frankiaceae > >>> family > >>> > >>> > >>> 1854 > >>> Frankia > >>> genus > >>> > >>> > >>> 1999/10/22 > >>> 2005/01/19 > >>> 2000/02/02 > >>> > >>> > >>> > >>> Chris > >>> > >>>> -----Original Message----- > >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>>> bounces at lists.open-bio.org] On Behalf Of Jason Stajich > >>>> Sent: Wednesday, May 10, 2006 7:54 PM > >>>> To: Sendu Bala > >>>> Cc: bioperl-l at lists.open-bio.org > >>>> Subject: Re: [Bioperl-l] Bio::Taxonomy confusion > >>>> > >>>> I would use the implementation that talks to the flatfile db as the > >>>> standard here. nodes are defined by the data in from taxonomy dump > >>>> dbs from ncbi. > >>>> the eutils is pretty worthless except for taxid->name or > >>>> reverse, you > >>>> can't get the full taxonomy (or couldn't when that > >>>> implementation was > >>>> written). > >>>> > >>>> The "name" method refers to the name of the node - each level in > >>>> the > >>>> taxonomy can have a "name". > >>>> > >>>> The bits of hackiness relate to wrapping the node object as a > >>>> Bio::Species and/or being able to read a genbank file and the > >>>> organism taxonomy data as a list and instantiating. 
If we could > >>>> rely > >>>> on everything being in a DB of course this would be simpler. > >>>> > >>>> Another problem is the depth of the taxonomy is not constant for > >>>> every node so assuming that a fixed number of slots will be > >>>> filled in > >>>> to generate the taxonomy leads to problems. > >>>> > >>>> Use the flatfile implementation (Bio::DB::Taxonomy::flatfile) as > >>>> the > >>>> best example of working code as this is how I really wanted it to > >>>> work, the Bio::Species hacks are only there to shoehorn data > >>>> retrieved from genbank files in. With the flatfile implementation > >>>> you have to walk all the way up the db hierarchy to get the kingdom > >>>> for a node so you do have to build up the classification > >>>> hierarchy as > >>>> each node only stores data about itsself. > >>>> > >>>> I'm not exactly sure what you are proposing to do, but would > >>>> definitely enjoy another pair of hands, I don't really have time to > >>>> mess with it any time soon. > >>>> > >>>> -jason > >>>> On May 10, 2006, at 5:30 AM, Sendu Bala wrote: > >>>> > >>>>> Hi, > >>>>> I'm a little confused as to how names are supposed to work in > >>>>> Bio::Taxonomy::Node. > >>>>> > >>>>> In the bioperl versions that I've looked at a Node doesn't seem to > >>>>> store > >>>>> the most important information about itself - it's scientific name > >>>>> - in > >>>>> an obvious place. bioperl 1.5.1 puts it at the start of the > >>>>> classification list. I'd have thought sticking it in -name would > >>>>> make > >>>>> more sense, but this is used only for the GenBank common name. > >>>>> > >>>>> The Bio::Taxonomy docs still suggests: > >>>>> > >>>>> my $node_species_sapiens = Bio::Taxonomy::Node->new( > >>>>> -object_id => 9606, # or -ncbi_taxid. 
Requird tag > >>>>> -names => { > >>>>> 'scientific' => ['sapiens'], > >>>>> 'common_name' => ['human'] > >>>>> }, > >>>>> -rank => 'species' # Required tag > >>>>> ); > >>>>> > >>>>> and whilst Bio::Taxonomy::Node does not accept -names, it does > >>>>> have a > >>>>> 'name' method which claims to work like: > >>>>> > >>>>> $obj->name('scientific', 'sapiens'); > >>>>> > >>>>> This kind of thing would be really nice, but afaics > >>>>> Bio::Taxonomy::Node->new takes the -name value and makes a common > >>>>> name > >>>>> out of it, whilst the name() method passes any 'scientific' > >>>>> name to > >>>>> the > >>>>> scientific_name() method which is unable to set any value (and > >>>>> warns > >>>>> about this), only get. > >>>>> > >>>>> It seems like the need to have this classification array work the > >>>>> same > >>>>> way as Bio::Species is causing some unnecessary restrictions. > >>>>> Can't > >>>>> the > >>>>> more sensible idea of having a dedicated storage spot for the > >>>>> ScientificName and other parameters be used, with the > >>>>> classification > >>>>> array either being generated just-in-time from the hash-stored > >>>>> data, or > >>>>> indeed being generated from the Lineage field? > >>>>> > >>>>> > >>>>> Also, why does a node store the complete hierarchy on itself in > >>>>> the > >>>>> classification array? If we're going that far, why don't the > >>>>> Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just > >>>>> have a > >>>>> get_taxonomy() method instead of a get_Taxonomy_Node() method. > >>>>> get_taxonomy() could, from a single efetch.fcgi lookup, create a > >>>>> complete Bio::Taxonomy with all the nodes. Whilst most nodes would > >>>>> only > >>>>> have a minimum of information, if you could simply ask a node > >>>>> what its > >>>>> rank and scientific name was you could easily build a > >>>>> classification > >>>>> array, or ask what Kingdom your species was in etc. 
> >>>>> > >>>>> Are there good reasons for Taxonomy working the way it does in > >>>>> 1.5.1, or > >>>>> would I not be wasting my time re-writing things to make more > >>>>> sense > >>>>> (to me)? > >>>>> > >>>>> > >>>>> Cheers, > >>>>> Sendu. > >>>>> _______________________________________________ > >>>>> Bioperl-l mailing list > >>>>> Bioperl-l at lists.open-bio.org > >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>> > >>>> -- > >>>> Jason Stajich > >>>> Duke University > >>>> http://www.duke.edu/~jes12 > >>>> > >>>> > >>>> _______________________________________________ > >>>> Bioperl-l mailing list > >>>> Bioperl-l at lists.open-bio.org > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >> > >> -- > >> Jason Stajich > >> Duke University > >> http://www.duke.edu/~jes12 > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 From WiersmaP at AGR.GC.CA Thu May 11 20:13:12 2006 From: WiersmaP at AGR.GC.CA (Wiersma, Paul) Date: Thu, 11 May 2006 20:13:12 -0400 Subject: [Bioperl-l] What is the relationship between primer3 module andrun-primer3 module? Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C52@onncrxms5.agr.gc.ca> Li, If you are only "a little confused" by the OO concepts in the primer3 modules than you are doing well. To expand a little on Wenwu's explanations. A Bio::Tools::Run:Primer3 object is a "wrapper" around the Primer3 program. All the commands and parameters that Primer3 needs for it to run are collected inside the object. This includes a sequence (which you must supply as a sequence object) and parameters (most of which are already supplied by default but can be changed using the $primer3_object->add_targets method). 
Then, when everything is set the way you want it you 'run' the Primer3 program by using $primer3_object->run. The "wrapper" collects all the run parameters and sends them off to the Primer3 executable. Primer3 does the analysis and outputs the results to "stdout" in boulder-io format. By redirecting the output (i.e. perl p3run_script.pl > out.txt) you will get the Primer3 output directly in the boulder-io format ('tag'='value') stored in out.txt. Because out.txt is not being closed between each sequence called in the script you get all of the results concatenated in out.txt. However, if you supplied an output filename (-outfile=>$file_out) in the "wrapper", each line of output from Primer3 will be written to $file_out and at the end of Primer3 output the file will be closed. Now if your script loops to another sequence it will open the same outfile again and overwrite. One last important detail for the "wrapper" object. When Primer3 is executed the $primer3_object is designed to return a Bio::Tools::Primer3 object (the code is: my $results_object = $primer3_object->run). $results_object is a Bio::Tools::Primer3 object and contains the results of your Primer3 run as well as having methods for getting at that information. This includes finding out how many primer sets were found and the means to access the primer set results one at a time. It does work as advertised. Because all of the primer sets are based on the same sequence, Primer3 only outputs the SEQUENCE and PRIMER_SEQUENCE_ID one time instead of for each primer set. That is why they only show up in $results_object as if they belonged with the first primer set (set '0') and they are not available for the other primer sets. PAW Paul A. Wiersma Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada Summerland, BC wiersmap at agr.gc.ca ? 
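[Archive note] The boulder-io output described above (one 'tag'='value' pair per line, with the results for successive sequences concatenated in out.txt) can be read back with a few lines of plain Perl. What follows is only a minimal sketch, not the parser that Bio::Tools::Primer3 uses: the helper name parse_boulder and the sample data are invented for illustration, and it assumes that each record ends with a line holding a single "=", which may differ between Primer3 versions.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split boulder-io text into
# records. Each line is TAG=VALUE; a line holding only "=" ends a record.
sub parse_boulder {
    my ($text) = @_;
    my (@records, %current);
    for my $line (split /\n/, $text) {
        $line =~ s/\r$//;                      # tolerate DOS line endings
        if ($line eq '=') {
            push @records, {%current} if %current;
            %current = ();
            next;
        }
        next unless $line =~ /^([^=]+)=(.*)$/; # skip anything that is not TAG=VALUE
        $current{$1} = $2;
    }
    push @records, {%current} if %current;     # tolerate a missing final "="
    return @records;
}

# Invented sample data standing in for a concatenated out.txt.
my $sample = join "\n",
    'PRIMER_SEQUENCE_ID=seq1',
    'SEQUENCE=ACGTACGTACGT',
    'PRIMER_LEFT_SEQUENCE=ACGTAC',
    'PRIMER_LEFT_1_SEQUENCE=CGTACG',
    '=',
    'PRIMER_SEQUENCE_ID=seq2',
    'SEQUENCE=TTGGCCAATTGG',
    'PRIMER_LEFT_SEQUENCE=TTGGCC',
    '=',
    '';

my @records = parse_boulder($sample);
for my $rec (@records) {
    # The sequence id is stored once per record, not once per primer set.
    my @left = grep { /^PRIMER_LEFT(?:_\d+)?_SEQUENCE$/ } sort keys %$rec;
    printf "%s: %d left primer(s)\n", $rec->{PRIMER_SEQUENCE_ID}, scalar @left;
}
```

Keeping one hash per record mirrors the point made in the message: PRIMER_SEQUENCE_ID and SEQUENCE belong to the record as a whole, so they are looked up once and reused for every primer set in it.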
-----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chen li Sent: Wednesday, May 10, 2006 5:28 PM To: bioperl-l at bioperl.org Subject: [Bioperl-l] What is the relationship between primer3 module andrun-primer3 module? First thank you all for replying my previous post about primer3. But now I am a little confused even after I read the documents: What is the relationship between these two modules? What is correct/standard way to use them to do the batch-primer design? What I do is that I use Bio::Tools::Run::Primer3 to design primers. Based on Dr. Roy Chaudhuri's information I can set the parameters using the following syntax: $primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510'); Based on Paul A. Wiersma's explanation I can also print out part of the primer results(because I don't need all the information). But there is a little trouble: PRIMER_SEQUENCE_ID can't be accessed using this method. And Paul points out that "PRIMER_SEQUENCE_ID and SEQUENCE are not part of the individual results but only end up by default with $results->primer_results(0)". So it seems there is no way to get around this problem using Bio::Tools::Run::Primer3. And others suggest using Bio::Tools::Primer3 to parse the results. So is true that Bio::Tools::Run::Primer3 is for primer design and Bio::Tools::Primer3 is for parsing the results from Bio::Tools::Run::Primer3? But what I find is that I get almost all the results (except PRIMER_SEQUENCE_ID and SEQUENCE ) without providing a line code use Bio::Tools::Primer3 in the script. How to explain this? Is it because the following line code? my $result=$primer3->run; The last question: which line code is used to invoke program primer3.exe? How does Perl script call the primer3.exe? Once again thank you all very much, Li __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! 
Mail has the best spam protection around http://mail.yahoo.com _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From torsten.seemann at infotech.monash.edu.au Fri May 12 00:29:37 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 12 May 2006 14:29:37 +1000 Subject: [Bioperl-l] Using bioperl to convert gene predictions to gff In-Reply-To: <000301c6698f$b17a4d20$0202a8c0@GosinkFranklin> References: <000301c6698f$b17a4d20$0202a8c0@GosinkFranklin> Message-ID: <44640F31.6090702@infotech.monash.edu.au> Mark, > I'd like to reformat gene predictions from several different programs > (genscan, glimmerhmm, fgenesh) to gff format. I know bioperl can parse the > output from these and other predictors and that it can export into GFF. But > I'm not clear on how to string the two together. > Can anyone point me at any example code? The parser modules for the gene predictions generally allow you to iterate through the predicted genes. Each prediction is usually returned as a Bio::SeqFeatureI-derived object. Those objects have a gff_string() method that returns them as a line of GFF. So something as simple as this *may* work:

  use Bio::Tools::Glimmer;
  my $parser = Bio::Tools::Glimmer->new(-file => 'glimmer.out');
  while (my $gene = $parser->next_prediction) {
      print $gene->gff_string, "\n";
  }

If you want separate GFF lines for each exon, you'll have to do another loop over $gene->exons() etc., each of which is luckily also a Bio::SeqFeature! Or if you want to modify some of the GFF columns first, e.g. the source tag, just do $gene->source_tag('mynewtag') before printing it.
Hope this helps, -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From torsten.seemann at infotech.monash.edu.au Fri May 12 00:36:46 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 12 May 2006 14:36:46 +1000 Subject: [Bioperl-l] Bio::Graphics::Panel imagemap making with Bio::Graphics::Panel In-Reply-To: <5b6410e0605030120q31d1f554mbc4bf104deca48bf@mail.gmail.com> References: <5b6410e0605030120q31d1f554mbc4bf104deca48bf@mail.gmail.com> Message-ID: <446410DE.7070305@infotech.monash.edu.au> Kevin, > I want to create an imagemap of short sequence matches with a longer one > with clickable imagemaps for the short sequences. I figure I can do this > easily enough using the example script for parsing blast output but I need > an example script to understand how to produce the html code for the > imagemap. I can find only rather cryptic references about how this can be > done (see below). The "blastGraphic" project probably has Perl code that could help you. http://www.gmod.org/blastGraphic.shtml It is/was part of the GMOD project. It produces pretty clickable image maps from BLAST reports. Hope it helps, -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From brianjgilmartin at hotmail.com Fri May 12 05:29:15 2006 From: brianjgilmartin at hotmail.com (brian gilmartin) Date: Fri, 12 May 2006 10:29:15 +0100 Subject: [Bioperl-l] (no subject) Message-ID: please remove me from the list _________________________________________________________________ Be the first to hear what's new at MSN - sign up to our free newsletters! 
http://www.msn.co.uk/newsletters From sb at mrc-dunn.cam.ac.uk Fri May 12 06:24:39 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Fri, 12 May 2006 11:24:39 +0100 Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names Message-ID: <44646267.2000802@mrc-dunn.cam.ac.uk> In bioperl up to at least 1.5.1, when one of the database modules comes across a species rank it does: if ($rank eq 'species') { # get rid of genus from species name (undef,$taxon_name) = split(/\s+/,$taxon_name,2); } However even though true scientific name is usually 'Genus species' in the database, note the 'usually' - sometimes the species is a multiword item that does not include the Genus, so we can't do some simple split and take the second word. The same applies to levels below species, eg. 'Avian erythroblastosis virus' is a variant of the species 'Avian leukosis virus' but 'Avian erythroblastosis virus (strain ES4)' is a variant of that variant... My solution is to just remove whatever is the same between the current rank and the previous rank. Maybe even that's not so perfect, but it must be a lot better than turning the species 'Avian leukosis virus' into the species 'virus' (especially given that the genus here is 'Alpharetrovirus')! # we need to be going root(kingdom) -> leaf (species or lower) order # # we need to be storing untouched versions of the scientific name of # the previous rank ($self->{_last_raw}) # # probably only bother start doing this when we get to genus my $last_raw = $self->{_last_raw} || undef; $self->{_last_raw} = $sci_name; if ($last_raw) { $sci_name =~ s/$last_raw//; $sci_name =~ s/^\s+//; } Are there even more strange species (and lower) names that would still not work well with the above solution? Cheers, Sendu. 
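[Archive note] The idea in the message above (remove from each name whatever it shares with the previous rank) can be tried out without BioPerl. The sketch below is an illustration only, not the code in Bio::DB::Taxonomy: strip_parent_name is a hypothetical helper, and it makes two assumptions that go beyond the snippet in the message. The previous name is quoted with \Q...\E, because names such as 'Avian erythroblastosis virus (strain ES4)' contain parentheses that would otherwise be read as regex grouping, and the match is anchored to the start of the name. The 'isolate X' name is invented for the test.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the previous rank's raw name from the front of
# the current name. Returns the name untouched when nothing would be left.
sub strip_parent_name {
    my ($name, $parent) = @_;
    return $name unless defined $parent && length $parent;
    my $short = $name;
    $short =~ s/^\Q$parent\E\s*//;   # \Q..\E: treat "(strain ES4)" literally
    return length($short) ? $short : $name;
}

# Lineage from the message, in root -> leaf order.
my @lineage = (
    'Alpharetrovirus',
    'Avian leukosis virus',
    'Avian erythroblastosis virus',
    'Avian erythroblastosis virus (strain ES4)',
);

my $last_raw;
my @short_names;
for my $raw (@lineage) {
    push @short_names, strip_parent_name($raw, $last_raw);
    $last_raw = $raw;                # always remember the untouched name
}
print "$_\n" for @short_names;

# Invented example: a parent name that itself contains parentheses.
my $below_strain = strip_parent_name(
    'Avian erythroblastosis virus (strain ES4) isolate X',
    'Avian erythroblastosis virus (strain ES4)',
);
print "$below_strain\n";
```

With this approach 'Avian leukosis virus' stays whole because it does not begin with its genus, while 'Homo sapiens' under 'Homo' would still reduce to 'sapiens'. Whether anchoring to the start of the name is the right choice for every entry in the NCBI dump is an open question, as the message itself asks.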
From s_maheshwari84 at rediffmail.com Fri May 12 09:55:49 2006 From: s_maheshwari84 at rediffmail.com (saurabh maheshwari) Date: 12 May 2006 13:55:49 -0000 Subject: [Bioperl-l] problem help me...........please Message-ID: <20060512135549.27106.qmail@webmail9.rediffmail.com> hello I am a studnt at Center for DNA Finger Printing and Diagnostics(CDFD). I am working on protein protein interaction but I am unable to use the protein interaction module i.e. ProteinGraph.pm.. Actially I am facing lots of problem in the programme I have written Please help me since last four months I am not able to solve the same problem.. I am pasting my programe here also I am attaching it also. ...... #!usr/bin/perl use lib "/usr/local/bioxapps/bioperl/library/"; use strict; use Bio::Graph::SimpleGraph; use Bio::Graph::IO; our @ISA=qw( Bio::SeqI); use Bio::Graph::Edge; use Bio::Graph::IO::dip; use Bio::Graph::IO::psi_xml; use Clone qw(clone); use vars qw(@ISA); use Bio::AnnotatableI; use Bio::IdentifiableI; our @ISA = qw(Bio::Graph::SimpleGraph); @ISA = qw(Bio::Graph::IO); our @ISA=qw(Expoerter); use Bio::Graph::ProteinGraph; use Class::AutoClass; use Bio::Graph::SimpleGraph::Traversal; my $graphio = Bio::Graph::IO->new(-file => '/users/saurabh/perl_program/sample1.txt',-format => 'dip'); print "$graphio"; my $graph = $graphio->next_network(); print "$graph->nodes\t"; $graph->remove_dup_edges(); my @un=$graph->unconnected_nodes(); print "\nthe unconnected nodes are =@un"; my @n=$graph->subgraph(); print "\subgraph=@n\n"; #print "Please the protein-id whose clusering coefficient is to be detemined\n"; #my $v=; my $density = $graph->density(); print "\ngraph density=$density\n"; my @graphs = $graph->components(); print "\nno of Connected components=$#graphs\n"; print "\nplease enter the protein-id whom you want to remove from the network\n"; my $no=; $graph->remove_nodes($graph->nodes_by_id($no)); my $count = $graph->edge_count(); print "\nno of edges=$count\n "; my $ncount = 
$graph->node_count(); print "\nno of nodes=$ncount\n "; print"\nenter the protein whose interactions is to be find "; my $x=; my $node = $graph->nodes_by_id($x); #print " this is $node\n"; my @neighbors = $graph->neighbors($node); print "to check"; print join",",map{$_->object_id()} @neighbors; my @nodes = $graph->nodes(); print "\nno of nodes = @nodes\t\n"; my @hubs; foreach my $nodi (@nodes) { if ($graph->neighbor_count($node) > 10) { push @hubs, $nodi; } } foreach my $r(@hubs) { my @y=@$r; print "the following proteins have > 10 interactors=@y\n"; } #siblingual protein my @edgeref = $graph->articulation_points(); print "no of articulation points=$#edgeref\n"; print "please enter the protein whom you want to check for articulation point \n "; my $nod=; # make pathgen graph my $grap = Bio::Graph::IO->new(-file => 'org.txt',-format => 'dip'); my $gra = $grap->next_network(); $graph->remove_dup_edges(); $graph->union($gra); my @duplicates = $graph->dup_edges(); print "these interactions exist in cere and c.elegan\n=@duplicates"; print "please enter the first protein for identifiaction of shortest path\n"; my $p1=; print "please enter the second protein for identifiaction of shortest path\n"; my $p2=; my @a=$graph->shortest_paths(); print "shortest path=@a\t\n"; with Regards SAURABH MAHESHWARI M.Sc. (BIOINFORMATICS) JAMIA MILLIA ISLAMIA NEW DELHI -------------- next part -------------- A non-text attachment was scrubbed... 
Name: from.pl Type: application/octet-stream Size: 2723 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060512/fe287972/attachment.obj From chen_li3 at yahoo.com Thu May 11 13:47:33 2006 From: chen_li3 at yahoo.com (chen li) Date: Thu, 11 May 2006 10:47:33 -0700 (PDT) Subject: [Bioperl-l] script for batch-primer design using primer3 module In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C4D@onncrxms5.agr.gc.ca> Message-ID: <20060511174733.68836.qmail@web36812.mail.mud.yahoo.com> Hi all, With the valuable input from many of you I finally come out a script for my personal need: 1)bacth-primer design 2)set some of the parameters instead of using all the default values 3)output only part of the information for the first pair of primers but not all of them(but you can choose) 4)the reults can be exported into excel for my convience. Enclosed are the script and the results tested. I also include some lines about how I figure out which keys/entries are vailable for change.If you don't want the sequence part just add # to comment it. Any comments are welcome. BTW the solution suggested by Dr. Cui and Paul doesn't work for me. Once again thank you very much, Li __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: primer3-5 Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060511/2358c5b7/attachment.pl -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: result1.txt Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060511/2358c5b7/attachment.txt From Marc.Logghe at DEVGEN.com Fri May 12 11:28:55 2006 From: Marc.Logghe at DEVGEN.com (Marc Logghe) Date: Fri, 12 May 2006 17:28:55 +0200 Subject: [Bioperl-l] problem help me...........please Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746DAB@ANTARESIA.be.devgen.com> Hi, What is actually the problem ? Do you have errors ? Is the script not behaving as you expect ? You also might attach the input file sample1.txt so that people can try it. Regards, Marc > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > saurabh maheshwari > Sent: Friday, May 12, 2006 3:56 PM > To: bioperl-l at bioperl.org; s_maheshwari84 > Subject: [Bioperl-l] problem help me...........please > > > hello > I am a studnt at Center for DNA Finger Printing and Diagnostics(CDFD). > I am working on protein protein interaction but I am unable > to use the protein interaction module i.e. ProteinGraph.pm.. > Actially I am facing lots of problem in the programme I have > written Please help me since last four months I am not able > to solve the same problem.. > I am pasting my programe here also I am attaching it also. ...... 
> > #!usr/bin/perl > use lib "/usr/local/bioxapps/bioperl/library/"; > use strict; > use Bio::Graph::SimpleGraph; > use Bio::Graph::IO; > our @ISA=qw( Bio::SeqI); > use Bio::Graph::Edge; > use Bio::Graph::IO::dip; > use Bio::Graph::IO::psi_xml; > use Clone qw(clone); > use vars qw(@ISA); > use Bio::AnnotatableI; > use Bio::IdentifiableI; > our @ISA = qw(Bio::Graph::SimpleGraph); > @ISA = qw(Bio::Graph::IO); > our @ISA=qw(Expoerter); > use Bio::Graph::ProteinGraph; > use Class::AutoClass; > use Bio::Graph::SimpleGraph::Traversal; > > my $graphio = Bio::Graph::IO->new(-file => > '/users/saurabh/perl_program/sample1.txt',-format => 'dip'); > print "$graphio"; > my $graph = $graphio->next_network(); > print "$graph->nodes\t"; > $graph->remove_dup_edges(); > my @un=$graph->unconnected_nodes(); > print "\nthe unconnected nodes are =@un"; my > @n=$graph->subgraph(); print "\subgraph=@n\n"; #print "Please > the protein-id whose clusering coefficient is to be > detemined\n"; #my $v=; my $density = > $graph->density(); print "\ngraph density=$density\n"; my > @graphs = $graph->components(); print "\nno of Connected > components=$#graphs\n"; print "\nplease enter the protein-id > whom you want to remove from the network\n"; my $no=; > $graph->remove_nodes($graph->nodes_by_id($no)); > my $count = $graph->edge_count(); > print "\nno of edges=$count\n "; > my $ncount = $graph->node_count(); > print "\nno of nodes=$ncount\n "; > > print"\nenter the protein whose interactions is to be find > "; my $x=; my $node = $graph->nodes_by_id($x); #print > " this is $node\n"; my @neighbors = $graph->neighbors($node); > print "to check"; print join",",map{$_->object_id()} > @neighbors; my @nodes = $graph->nodes(); print "\nno of nodes > = @nodes\t\n"; my @hubs; foreach my $nodi (@nodes) { > if ($graph->neighbor_count($node) > 10) > { > push @hubs, $nodi; > } > } > > foreach my $r(@hubs) > { > my @y=@$r; > print "the following proteins have > 10 interactors=@y\n"; > } > #siblingual protein > 
> my @edgeref = $graph->articulation_points(); print "no of > articulation points=$#edgeref\n"; print "please enter the > protein whom you want to check for articulation point \n "; > my $nod=; > # make pathgen graph > my $grap = Bio::Graph::IO->new(-file => 'org.txt',-format > => 'dip'); > my $gra = $grap->next_network(); > $graph->remove_dup_edges(); > $graph->union($gra); > my @duplicates = $graph->dup_edges(); > print "these interactions exist in cere and c.elegan\n=@duplicates"; > print "please enter the first protein for identifiaction of > shortest path\n"; > my $p1=; > print "please enter the second protein for identifiaction > of shortest path\n"; > my $p2=; > > my @a=$graph->shortest_paths(); > print "shortest path=@a\t\n"; > > > > with Regards > > SAURABH MAHESHWARI > > M.Sc. (BIOINFORMATICS) > > JAMIA MILLIA ISLAMIA > > NEW DELHI > From stoltzfu at umbi.umd.edu Fri May 12 11:56:06 2006 From: stoltzfu at umbi.umd.edu (Arlin Stoltzfus) Date: Fri, 12 May 2006 11:56:06 -0400 Subject: [Bioperl-l] proposal: Bio::CDAT (character data and trees) Message-ID: Dear developers-- We propose a Bio::CDAT (Character Data And Trees) module to facilitate comparative analysis using evolutionary methods by 1) managing evolutionary relationships (by linking data to trees) and 2) allowing coordinated analysis of different types of data (by implementing a generic concept of ?character-state? data). Bio::CDAT would leverage existing BioPerl objects and include the functionality of Rutger Vos's Bio::Phylo. It would provide the framework to develop interfaces to analysis tools (phylogeny inference, evolutionary rate models, functional shift inference, etc), as well as to file formats and visualization methods appropriate for such analyses. A proposal is available at http://www.molevol.org/camel/projects/CDAT-proposal.pdf We would like to hear your thoughts (e.g., see the section on "Questions to consider")! 
Thanks Arlin Stoltzfus WeiGang Qiu Rutger Vos (with thanks to Justin Reese and Aaron Mackey) ------------------ Arlin Stoltzfus (stoltzfu at umbi.umd.edu) CARB, 9600 Gudelsky Drive, Rockville, Maryland 20850 tel 240 314 6208, fax 240 314 6255, www.molevol.org/camel From sdavis2 at mail.nih.gov Fri May 12 11:54:57 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Fri, 12 May 2006 11:54:57 -0400 Subject: [Bioperl-l] problem help me...........please In-Reply-To: <20060512135549.27106.qmail@webmail9.rediffmail.com> Message-ID: On 5/12/06 9:55 AM, "saurabh maheshwari" wrote: > > hello > I am a studnt at Center for DNA Finger Printing and Diagnostics(CDFD). > I am working on protein protein interaction but I am unable to use the protein > interaction module i.e. ProteinGraph.pm.. > Actially I am facing lots of problem in the programme I have written Please > help me since last four months I am not able to solve the same problem.. > I am pasting my programe here also I am attaching it also. ...... You haven't really told us what you are trying to do or what problems you are having. 
Sean

From cjfields at uiuc.edu Fri May 12 13:08:11 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 12 May 2006 12:08:11 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names
In-Reply-To: <44646267.2000802@mrc-dunn.cam.ac.uk>
Message-ID: <000f01c675e6$a61bde90$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Friday, May 12, 2006 5:25 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles
> species,subspecies/variant names
>
> In bioperl up to at least 1.5.1, when one of the database modules comes
> across a species rank it does:
>
> if ($rank eq 'species') {
>     # get rid of genus from species name
>     (undef,$taxon_name) = split(/\s+/,$taxon_name,2);
> }

The XML example from NCBI Taxonomy I mentioned previously seems to have
everything in the classification, from superkingdom down to species (no
strain unfortunately, and I'm not sure about subspecies); if it's missing
the rank then the designation doesn't exist or is tagged as 'no rank'.

Like I mentioned before I'm not intimately familiar with Bio::Taxonomy,
Bio::DB::Taxonomy, or Bio::Species, so I don't have a clue as to how
everything is parsed and plugged in to Bio::Taxonomy objects. I do know
that XML::Twig is used for parsing through the data so it shouldn't be too
hard to change what you want.

I haven't tried using Bio::DB::Taxonomy directly yet, but I would have
thought that the binomial is just built from the XML twig 'LineageEx'
Rank=Genus + Rank=Species, that the genus comes from the tag 'Genus' and
species from 'Species', and that the scientific name is from the tag
'ScientificName'. Guess not.
> However even though true scientific name is usually 'Genus species' in
> the database, note the 'usually' - sometimes the species is a multiword
> item that does not include the Genus, so we can't do some simple split
> and take the second word.
> The same applies to levels below species, eg. 'Avian erythroblastosis
> virus' is a variant of the species 'Avian leukosis virus' but 'Avian
> erythroblastosis virus (strain ES4)' is a variant of that variant...
>
> My solution is to just remove whatever is the same between the current
> rank and the previous rank. Maybe even that's not so perfect, but it
> must be a lot better than turning the species 'Avian leukosis virus'
> into the species 'virus' (especially given that the genus here is
> 'Alpharetrovirus')!
>
> # we need to be going root(kingdom) -> leaf (species or lower) order
> #
> # we need to be storing untouched versions of the scientific name of
> # the previous rank ($self->{_last_raw})
> #
> # probably only bother start doing this when we get to genus
> my $last_raw = $self->{_last_raw} || undef;
> $self->{_last_raw} = $sci_name;
> if ($last_raw) {
>     $sci_name =~ s/$last_raw//;
>     $sci_name =~ s/^\s+//;
> }
>
> Are there even more strange species (and lower) names that would still
> not work well with the above solution?

I don't think taking Genus/Species directly from the scientific name
(normally what is in the SOURCE or ORGANISM annotation for GenBank or OS
for EMBL) is the best way to go about it since it's really a best guess
using regex; Jason pointed out several examples where this falls apart,
and being a bacterial man I have found many examples myself. I'm also not
sure that forcing a lookup for every TaxID in every sequence every time
it's passed through SeqIO is the best way to go either, though I think it
should be required for storing sequences. It's a tricky balance.
I still think that maybe we should absolve ourselves from using SOURCE/ORGANISM or OS/OC information in GenBank files as anything more than strictly annotation, or reconstruct Bio::Species to maybe a Bio::Annotation::Species object to handle that annotation and either deprecate Bio::Species or separate it completely from any Bio::Taxonomy objects. It would really simplify things. Then, if anyone is interested in taxonomy, either install a local database or use Entrez efetch, and then use Bio::DB::Taxonomy (fixed of course) to grab the TaxID info. Seems like we're running more and more into exceptions to the rule as more genomes are made available. Anyway, using Bio::Species for GenBank is really screwy for bacterial names, so currently I get around BioPerl issues with bacterial names by grabbing the 'source' seqfeature and pulling the 'organism' tag out. But it really shouldn't be that obfuscated, right? Chris > Cheers, > Sendu. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sdavis2 at mail.nih.gov Sat May 13 08:19:21 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Sat, 13 May 2006 08:19:21 -0400 Subject: [Bioperl-l] problem help me...........please In-Reply-To: <20060513041853.16091.qmail@webmail31.rediffmail.com> References: <20060513041853.16091.qmail@webmail31.rediffmail.com> Message-ID: <4465CEC9.2010909@mail.nih.gov> saurabh maheshwari wrote: > > hello > Thanks for your prompt reply. > Actaully I am trying to make a protein interaction graph from a dip > file.But I am not able to do so.In my last mail I have already attached > my program which is giving some error and I am not able troble shot > them.Please help > Thanks I meant that since we don't know what error(s) you are getting, it is really not possible to determine what the problem is. 
Also, someone else on the list offered to look at your code if you were to privide the input file. I find it helpful to look at this webpage every now and then to remind myself what constitutes a useful question to email lists: http://www.catb.org/~esr/faqs/smart-questions.html Sean > On Fri, 12 May 2006 Sean Davis wrote : > > > > > > > >On 5/12/06 9:55 AM, "saurabh maheshwari" > >wrote: > > > > > > > > hello > > > I am a studnt at Center for DNA Finger Printing and Diagnostics(CDFD). > > > I am working on protein protein interaction but I am unable to use > the protein > > > interaction module i.e. ProteinGraph.pm.. > > > Actially I am facing lots of problem in the programme I have > written Please > > > help me since last four months I am not able to solve the same > problem.. > > > I am pasting my programe here also I am attaching it also. ...... > > > >You haven't really told us what you are trying to do or what problems you > >are having. > > > >Sean > > > >_______________________________________________ > >Bioperl-l mailing list > >Bioperl-l at lists.open-bio.org > >http://lists.open-bio.org/mailman/listinfo/bioperl-l > > with Regards > SAURABH MAHESHWARI > M.Sc. (BIOINFORMATICS) > JAMIA MILLIA ISLAMIA > NEW DELHI > > > From s_maheshwari84 at rediffmail.com Sat May 13 01:17:58 2006 From: s_maheshwari84 at rediffmail.com (saurabh maheshwari) Date: 13 May 2006 05:17:58 -0000 Subject: [Bioperl-l] problem help me...........please Message-ID: <20060513051758.4610.qmail@webmail31.rediffmail.com> hello I am very happy to see the prompt reply from the group members.. As you all suggested to attach the required files .. So I have attached all the three file first the input file,secod I have saved the error I was getting into a error file and third the programme file.. Actully in error file I want to know some thing . 
I am putting here one error line, ## no of nodes = Bio::Seq::RichSeq=HASH(0x11aa700) ## what this stand for Second thing I want to get the connected graph as I have. which type of connected grph I explain you by example.. Let there are five object in such a way. A connected to B A connected to C B connected to C D connected to C E connected to A I want to create a whole link in betwwen all five. Please help me I am not getting the result with Regards SAURABH MAHESHWARI M.Sc. (BIOINFORMATICS) JAMIA MILLIA ISLAMIA NEW DELHI -------------- next part -------------- A non-text attachment was scrubbed... Name: sample.dip Type: application/octet-stream Size: 5794 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060513/77476ca5/attachment.obj -------------- next part -------------- bash-2.05b$ perl from.pl Bio::Graph::ProteinGraph=HASH(0x1182e70) Bio::Graph::ProteinGraph=HASH(0x1182e70)->nodes the unconnected nodes are =subgraph=Bio::Graph::SimpleGraph=HASH(0x11e2160) graph density=0.00826446280991736 no of Connected components=60 please enter the protein-id whom you want to remove from the network XMECF2 no of edges=61 no of nodes=122 enter the protein whose interactions is to be find XMECF2 XMECF2 interacts with map{->object_id()} no of nodes = Bio::Seq::RichSeq=HASH(0x11aa700) Bio::Seq::RichSeq=HASH(0x11d1850 ) Bio::Seq::RichSeq=HASH(0x11bd4c0) Bio::Seq::RichSeq=HASH(0x11c2fd0) Bio::Seq:: RichSeq=HASH(0x11aa7f0) Bio::Seq::RichSeq=HASH(0x1198340) Bio::Seq::RichSeq=HASH (0x11d81a0) Bio::Seq::RichSeq=HASH(0x11ca320) Bio::Seq::RichSeq=HASH(0x11b5e40) Bio::Seq::RichSeq=HASH(0x1190e00) Bio::Seq::RichSeq=HASH(0x11c1350) Bio::Seq::Ri chSeq=HASH(0x11b2e20) Bio::Seq::RichSeq=HASH(0x11cb360) Bio::Seq::RichSeq=HASH(0 x1198250) Bio::Seq::RichSeq=HASH(0x11d0240) Bio::Seq::RichSeq=HASH(0x11c8f20) Bi o::Seq::RichSeq=HASH(0x11b4ef0) Bio::Seq::RichSeq=HASH(0x119f7a0) Bio::Seq::Rich Seq=HASH(0x11c2ee0) Bio::Seq::RichSeq=HASH(0x11dba20) 
Bio::Seq::RichSeq=HASH(0x1 1e2300) Bio::Seq::RichSeq=HASH(0x11b2f10) Bio::Seq::RichSeq=HASH(0x11b4b90) Bio: :Seq::RichSeq=HASH(0x11d4df0) Bio::Seq::RichSeq=HASH(0x11d4b80) Bio::Seq::RichSe q=HASH(0x11d8e70) Bio::Seq::RichSeq=HASH(0x11a1270) Bio::Seq::RichSeq=HASH(0x11c b5d0) Bio::Seq::RichSeq=HASH(0x11d5cc0) Bio::Seq::RichSeq=HASH(0x11d32a0) Bio::S eq::RichSeq=HASH(0x11b4c80) Bio::Seq::RichSeq=HASH(0x119e0c0) Bio::Seq::RichSeq= HASH(0x11b7ed0) Bio::Seq::RichSeq=HASH(0x11ad490) Bio::Seq::RichSeq=HASH(0x1196e 60) Bio::Seq::RichSeq=HASH(0x119b7f0) Bio::Seq::RichSeq=HASH(0x11cef60) Bio::Seq ::RichSeq=HASH(0x11b7b70) Bio::Seq::RichSeq=HASH(0x11dd330) Bio::Seq::RichSeq=HA SH(0x11da8c0) Bio::Seq::RichSeq=HASH(0x11a9f70) Bio::Seq::RichSeq=HASH(0x119b700 ) Bio::Seq::RichSeq=HASH(0x119a550) Bio::Seq::RichSeq=HASH(0x11ba910) Bio::Seq:: RichSeq=HASH(0x11e0b30) Bio::Seq::RichSeq=HASH(0x11d3030) Bio::Seq::RichSeq=HASH (0x11c62d0) Bio::Seq::RichSeq=HASH(0x11abb20) Bio::Seq::RichSeq=HASH(0x11d5bd0) Bio::Seq::RichSeq=HASH(0x11b03c0) Bio::Seq::RichSeq=HASH(0x119e1b0) Bio::Seq::Ri chSeq=HASH(0x11aa060) Bio::Seq::RichSeq=HASH(0x11a5700) Bio::Seq::RichSeq=HASH(0 x11a81e0) Bio::Seq::RichSeq=HASH(0x1196b00) Bio::Seq::RichSeq=HASH(0x11c1260) Bi o::Seq::RichSeq=HASH(0x11a2800) Bio::Seq::RichSeq=HASH(0x11c63c0) Bio::Seq::Rich Seq=HASH(0x11b60b0) Bio::Seq::RichSeq=HASH(0x11b93b0) Bio::Seq::RichSeq=HASH(0x1 1a4490) Bio::Seq::RichSeq=HASH(0x11ded50) Bio::Seq::RichSeq=HASH(0x11bbcd0) Bio: :Seq::RichSeq=HASH(0x1194780) Bio::Seq::RichSeq=HASH(0x11aedd0) Bio::Seq::RichSe q=HASH(0x11cd300) Bio::Seq::RichSeq=HASH(0x11a14e0) Bio::Seq::RichSeq=HASH(0x11c 4630) Bio::Seq::RichSeq=HASH(0x11a43a0) Bio::Seq::RichSeq=HASH(0x11a80f0) Bio::S eq::RichSeq=HASH(0x11bbbe0) Bio::Seq::RichSeq=HASH(0x11d5960) Bio::Seq::RichSeq= HASH(0x11c8e30) Bio::Seq::RichSeq=HASH(0x11cd3f0) Bio::Seq::RichSeq=HASH(0x11dd4 20) Bio::Seq::RichSeq=HASH(0x11cee70) Bio::Seq::RichSeq=HASH(0x11dbb10) Bio::Seq ::RichSeq=HASH(0x119a460) 
Bio::Seq::RichSeq=HASH(0x11aaa60) Bio::Seq::RichSeq=HA SH(0x11d1760) Bio::Seq::RichSeq=HASH(0x11cb6c0) Bio::Seq::RichSeq=HASH(0x11c7530 ) Bio::Seq::RichSeq=HASH(0x11deae0) Bio::Seq::RichSeq=HASH(0x11c4720) Bio::Seq:: RichSeq=HASH(0x119f890) Bio::Seq::RichSeq=HASH(0x11a6c40) Bio::Seq::RichSeq=HASH (0x11ad130) Bio::Seq::RichSeq=HASH(0x11e23f0) Bio::Seq::RichSeq=HASH(0x11d2f40) Bio::Seq::RichSeq=HASH(0x1194640) Bio::Seq::RichSeq=HASH(0x11d8f60) Bio::Seq::Ri chSeq=HASH(0x11d0150) Bio::Seq::RichSeq=HASH(0x119d070) Bio::Seq::RichSeq=HASH(0 x11a5610) Bio::Seq::RichSeq=HASH(0x11aa2d0) Bio::Seq::RichSeq=HASH(0x11b94a0) Bi o::Seq::RichSeq=HASH(0x11bd5b0) Bio::Seq::RichSeq=HASH(0x11c0ff0) Bio::Seq::Rich Seq=HASH(0x11a6b50) Bio::Seq::RichSeq=HASH(0x119cf80) Bio::Seq::RichSeq=HASH(0x1 1baa00) Bio::Seq::RichSeq=HASH(0x11c7620) Bio::Seq::RichSeq=HASH(0x119fb00) Bio: :Seq::RichSeq=HASH(0x11a2a70) Bio::Seq::RichSeq=HASH(0x11b1960) Bio::Seq::RichSe q=HASH(0x11ab8b0) Bio::Seq::RichSeq=HASH(0x11e0c20) Bio::Seq::RichSeq=HASH(0x11a d3a0) Bio::Seq::RichSeq=HASH(0x1197fe0) Bio::Seq::RichSeq=HASH(0x11b1870) Bio::S eq::RichSeq=HASH(0x11a2b60) Bio::Seq::RichSeq=HASH(0x1192750) Bio::Seq::RichSeq= HASH(0x11c9190) Bio::Seq::RichSeq=HASH(0x11e08c0) Bio::Seq::RichSeq=HASH(0x11dd6 90) Bio::Seq::RichSeq=HASH(0x11da7d0) Bio::Seq::RichSeq=HASH(0x11aece0) Bio::Seq ::RichSeq=HASH(0x11d80b0) Bio::Seq::RichSeq=HASH(0x11ca0b0) Bio::Seq::RichSeq=HA SH(0x1196bf0) Bio::Seq::RichSeq=HASH(0x11b7de0) Bio::Seq::RichSeq=HASH(0x11b02d0 ) Can't call method "isa" on an undefined value at /usr/local/bioxapps/bioperl/lib rary//Bio/Graph/ProteinGraph.pm line 477, line 2. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: from.pl Type: application/octet-stream Size: 2723 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060513/77476ca5/attachment-0001.obj From cjfields at uiuc.edu Sat May 13 14:18:53 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 13 May 2006 13:18:53 -0500 Subject: [Bioperl-l] problem help me...........please In-Reply-To: <20060513051758.4610.qmail@webmail31.rediffmail.com> Message-ID: <000901c676b9$b14479c0$15327e82@pyrimidine> I really hate to break the bad news here, but I'm going to be brutally honest. I have not looked at any of the Bio::Graph modules and have no idea how they are implemented, and I haven't looked at your input file, but I can tell right off the bat your script has major logic problems. I can also pretty much tell that you don't understand the object model we use here, at all. This is why I say that (from your last response): > ## no of nodes = Bio::Seq::RichSeq=HASH(0x11aa700) ## > what this stand for Did you cut and paste from several other scripts hoping that it would work? I say that b/c you mix styles quite frequently here, using objects correctly (deref'ing with '->') and incorrectly (print "$object"). You also declare (and redeclare) @ISA four times for a script (not needed unless you're declaring a class and inheriting methods from other modules). You also use @ISA once with a misspelled module name (I don't think there is a module named 'Expoerter'). So, I'm actually stunned that the script doesn't crash at all. Yikes! Okay, brutal honesty time over. Any time you see something like this: Bio::Graph::ProteinGraph=HASH(0x1182e70) means that what you are printing out is an reference to an object (it refers to the object class and the location in memory) and is NOT what you want. You should be doing something along the lines of $object->method, not 'print $object', to get at the object data and methods. 
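
A minimal sketch of the difference, using a small stand-in class so that it runs without BioPerl installed. Demo::Node is hypothetical and exists only for this illustration; the method name object_id and the ID 'XMECF2' are borrowed from the script and output earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in class for illustration only; not a BioPerl module.
package Demo::Node;

sub new {
    my ($class, %args) = @_;
    return bless { id => $args{id} }, $class;
}

sub object_id {
    my ($self) = @_;
    return $self->{id};
}

package main;

my $node = Demo::Node->new(id => 'XMECF2');

# Stringifying the object gives 'Class=HASH(0x...)': the class name plus
# a memory address, not the data inside the object.
my $as_string = "$node";

# Calling a method returns the data held by the object.
my $id = $node->object_id;

# Method calls are not interpolated inside double quotes: only $node is
# interpolated, and '->object_id' stays as literal text.
my $wrong = "$node->object_id";

# Call the method outside the quotes and join the pieces instead.
my $right = "id=" . $node->object_id;

print "$as_string\n$id\n$wrong\n$right\n";
```

The first and third lines printed have the Class=HASH(0x...) form seen in the posted output; the second and fourth contain the ID itself.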
You use this several times in your script already; that should be a big
hint as the areas where it doesn't work do not use this syntax. Read the
documentation for the many varied modules you use in your script. Look at
script examples. Start simply, then work your way up.

Also, using the '->' dereferencing operator inside double quotes doesn't
work; you have to do something like:

print $graph->nodes,"\t";

not

print "$graph->nodes\t";

That's why you get this in your output:

Bio::Graph::ProteinGraph=HASH(0x1182e70)->nodes

Which just prints the object reference with the string '->nodes'.

If any of what I just said doesn't make any sense, you really need to pick
up 'Learning Perl' and 'Intermediate Perl' by Schwartz et al and
'Programming Perl' by Wall et al. I don't know if anyone can really help
at this point w/o completely writing the script for you. We will fix
problems to a point but we, for the most part, will not do your work for
you.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of saurabh maheshwari
> Sent: Saturday, May 13, 2006 12:18 AM
> To: bioperl_l
> Subject: [Bioperl-l] problem help me...........please
>
>
> hello
> I am very happy to see the prompt reply from the group members..
> As you all suggested to attach the required files ..
> So I have attached all the three file first the input file,secod I have
> saved the error I was getting into a error file and third the programme
> file..
> Actully in error file I want to know some thing .
> I am putting here one error line,
> ## no of nodes = Bio::Seq::RichSeq=HASH(0x11aa700) ##
> what this stand for
> Second thing I want to get the connected graph as I have.
> which type of connected grph I explain you by example..
> Let there are five object in such a way.
> A connected to B
> A connected to C
> B connected to C
> D connected to C
> E connected to A
> I want to create a whole link in betwwen all five.
>
>
> Please help me I am not getting the result
>
>
> with Regards
>
> SAURABH MAHESHWARI
>
> M.Sc. (BIOINFORMATICS)
>
> JAMIA MILLIA ISLAMIA
>
> NEW DELHI

From hubert.prielinger at gmx.at Sat May 13 23:45:58 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Sat, 13 May 2006 21:45:58 -0600
Subject: [Bioperl-l] parsing output files from other tools
Message-ID: <4466A7F6.30204@gmx.at>

hi,
Is it possible to parse text output files other than blast output files,
like the text output files from the search tool mpSrch that is offered by
EBI? The WU Blast output files are possible to parse with bioperl.

thanks
Hubert

From arareko at campus.iztacala.unam.mx Sun May 14 00:09:35 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Sat, 13 May 2006 23:09:35 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
Message-ID: <4466AD7F.6050700@campus.iztacala.unam.mx>

I'm glad to announce the availability of the Deobfuscator interface at
the BioPerl website. You can use it at the following URL:

http://bioperl.org/cgi-bin/deob_interface.cgi

Many thanks to Laura Kavanaugh and David Messina for this great
contribution to the BioPerl project!

Mauricio.

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From cjfields at uiuc.edu Sun May 14 12:18:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 14 May 2006 11:18:10 -0500
Subject: [Bioperl-l] parsing output files from other tools
In-Reply-To: <4466A7F6.30204@gmx.at>
Message-ID: <000301c67772$00b4e4f0$15327e82@pyrimidine>

These are the current report types parsed through SearchIO:

http://www.bioperl.org/wiki/Module:Bio::SearchIO

I don't see mpsrch among them.
If you want you could create a new plugin module to parse those reports; the SearchIO HOWTO gives some pointers: http://www.bioperl.org/wiki/HOWTO:SearchIO You can always look at some of the current modules like blast, blastxml, or fasta to get an idea of how it works. Judging by the mpsrch output I'm pretty sure you would have to build a custom plugin for it. A viable alternative: looking through the mail list it looks like mpsrch is a multiprocessor implementation of ssearch, itself an implementation of the Smith-Waterman algorithm for local alignments in the FASTA package of programs: http://www.bioperl.org/wiki/SSEARCH You might be able to use SearchIO::fasta there... Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger > Sent: Saturday, May 13, 2006 10:46 PM > To: bioperl-l at bioperl.org > Subject: [Bioperl-l] parsing output files from other tools > > hi, > Is it possible to parse text outputfiles rather than blast output files, > like the text outputfiles form the search tool mpSrch that is offered by > EBI, because the WU Blast output files are possible to parse with bioperl. > > thanks > Hubert > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From chen_li3 at yahoo.com Sun May 14 13:14:30 2006 From: chen_li3 at yahoo.com (chen li) Date: Sun, 14 May 2006 10:14:30 -0700 (PDT) Subject: [Bioperl-l] no revcom method in Bio::Seq module? Message-ID: <20060514171430.74846.qmail@web36802.mail.mud.yahoo.com> Hi all, I need to get a reverse-complemenary sequence out of a fasta sequence file. And the Synopsis of Bio::Seq points out I can do like this way: $revcom=$seqobj->revcom(); I use the following script trying to get the job done but it doesn't work. Then I read documentation of Bio::Seq and it looks like it doesn't contain revcom method. 
Any idea will be appreciated.

Li

###############################
Here is the code:

#!c:/perl/bin/perl.exe
use strict;
use warnings;

use Bio::Seq;
use Bio::SeqIO;

my $file='c:/perl/local/primer3_1.0.0/src/est.txt';

my $seqIO=Bio::SeqIO->new(-file=>"<$file",
                          -format=>'fasta' );

my $seqobj=$seqIO->next_seq();#create object

print "what attributes/keys are available:\n";
for my $key (sort keys %$seqobj){
    my $value=$seqobj->{$key};
    print "$key\t=>\t$value\n"
}
# These are the output on the screen
#primary_id => gi|54093|emb|X61809.1|
#primary_seq => Bio::PrimarySeq=HASH(0x10492848)

#based on these results primary_id can get
#access right away
# as to primary_seq it is an object in
#Bio::Primaryseq and it provides the following
#methods after reading the documentation:
#new
#seq
#validate_seq
#subseq
#length
#display_id
#accession_number
#primary_id
#alphabet
#desc
#can_call_new
#id
#is_circular
#object_id
#version
#authority
#namespace
#display_name
#description

print "primary_id=",$seqobj->primary_id, "\n\n";
print "id=",$seqobj->id, "\n\n";
print "revcom=",$seqobj->revcom,"\n\n";

my $now_time=localtime;
print $now_time, "\n\n";
exit;

#These are the output on the screen
#primary_id=gi|54093|emb|X61809.1|
#id=gi|54093|emb|X61809.1
#revcom=Bio::Seq=HASH(0x10493304)
#Sun May 14 12:45:20 2006

From cjfields at uiuc.edu Sun May 14 13:39:50 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 14 May 2006 12:39:50 -0500
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
In-Reply-To: <20060514171430.74846.qmail@web36802.mail.mud.yahoo.com>
Message-ID: <000401c6777d$66ddb120$15327e82@pyrimidine>

This line should give you the hint:

#revcom=Bio::Seq=HASH(0x10493304)

You're getting an object ref here. The actual way to get the rev. comp on the wiki states '$seq->revcom->seq', not '$seq->revcom'.
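A minimal sketch of the difference, assuming BioPerl is installed. The sequence string and the id 'test' are made up for illustration and do not come from the thread. revcom hands back a new sequence object, and calling seq on that object gives the plain string:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;

# Hypothetical test sequence; any DNA string would do.
my $seqobj = Bio::Seq->new(-id       => 'test',
                           -seq      => 'AAGGTC',
                           -alphabet => 'dna');

# revcom() returns another sequence object, not a string ...
my $rc_obj = $seqobj->revcom;
print "revcom returns an object of class: ", ref($rc_obj), "\n";

# ... so ask that object for its sequence to get the text.
my $rc_string = $rc_obj->seq;
print "revcom string: $rc_string\n";
```

For this six-base example the string should come out as GACCTT (the reverse of AAGGTC, complemented). Printing $rc_obj directly is what produces the 'Bio::Seq=HASH(0x...)' text seen in the original script.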
When I ran your script and changed your line to the wiki version, I got (using my test seq):

what attributes/keys are available:
primary_id      =>      test,
primary_seq     =>      Bio::PrimarySeq=HASH(0x1d47fe0)
primary_id=test,

id=test,

revcom=GGAACGAGATCTCCATGCCGCGCACCATCGGCCCGGGATGCAGCACGATCGCGCGGTCCGGCAGCATCG
CCTGGCGCTTCTCGGACAATCCGTAGCGCACCGAGTACTCACGCGCGGA
CGGGAAGAAACTGCCGTTCATGCGTTCGGCCTGCACGCGCAGCATGAGCACCGCGTCGGCCGCGGGCAGTTCGGCG
TCCAGGTCATAGGACACGGTCACCGGCCAGTTCTCGACGCCCCTGGGGA
GCAGCGTCGGTGGGGACACCAGCACCACCTCGGCCCCGAGGGTGTGCAGCAGCGTCACGTTGGAGCGGGCCACGCG
GCTGTGCAGCACGTCGCCGACGATCACCACGCGCTTGCCCTCGACGCTG

Sun May 14 17:34:45 2006

Chris

From chen_li3 at yahoo.com Sun May 14 14:08:49 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sun, 14 May 2006 11:08:49 -0700 (PDT)
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
In-Reply-To: <000401c6777d$66ddb120$15327e82@pyrimidine>
Message-ID: <20060514180849.55423.qmail@web36808.mail.mud.yahoo.com>

Hi Chris,

Thank you very much. But could you please give me the link for this syntax: $seq->revcom->seq?

Li

From cjfields at uiuc.edu Sun May 14 14:28:14 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Sun, 14 May 2006 13:28:14 -0500
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
Message-ID:

I think the confusion lies in what revcom returns. This page

http://www.bioperl.org/wiki/Getting_Started

shows a quick way of using revcom (which I mentioned previously), while this page

http://www.bioperl.org/wiki/HOWTO:Beginners

explains what is returned when you use revcom. '$seq_obj->revcom' returns a sequence object (not a sequence string):

http://www.bioperl.org/wiki/HOWTO:Beginners#The_Sequence_Object

which is why you need to use the 'seq' method to get the string. Hence, '$seq_obj->revcom->seq'.

Chris

From Marc.Logghe at DEVGEN.com Sun May 14 16:28:34 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Sun, 14 May 2006 22:28:34 +0200
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746DAC@ANTARESIA.be.devgen.com>

Hi Li,

> doesn't work. Then I read documentation of Bio::Seq and it
> looks like it doesn't contain revcom method.

Here, the Deobfuscator interface that Mauricio announced earlier comes in handy.

http://bioperl.org/cgi-bin/deob_interface.cgi?Search=Search&module=Bio%3A%3ASeq&sort_order=by+method&search_string=

If you look in the methods table, you will find out that the revcom method is inherited from, and implemented by, Bio::PrimarySeqI.
HTH, Marc From sb at mrc-dunn.cam.ac.uk Mon May 15 04:18:11 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Mon, 15 May 2006 09:18:11 +0100 Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names In-Reply-To: <000f01c675e6$a61bde90$15327e82@pyrimidine> References: <000f01c675e6$a61bde90$15327e82@pyrimidine> Message-ID: <44683943.5020307@mrc-dunn.cam.ac.uk> Chris Fields wrote: > Sendu Bala wrote: >> In bioperl up to at least 1.5.1, when one of the database modules >> comes across a species rank it does: >> >> if ($rank eq 'species') { # get rid of genus from species name >> (undef,$taxon_name) = split(/\s+/,$taxon_name,2); } > > The XML example from NCBI Taxonomy I mentioned previously seems to > have everything in the classification, from superkingdom down to > species (no strain unfortunately, and I'm nit sure about subspecies); > if it's missing the rank then the designation doesn't exist or is > tagged as 'no rank'. Like I mentioned before I'm not intimately > familiar Bio::Taxonomy, Bio::DB::Taxonomy, or Bio::Species, so I > don't have a clue as to how everything is parsed and plugged in to > Bio::Taxonomy objects. I do know that XML::Twig is used for parsing > through the data so it shouldn't be too hard to change what you > want. Yes, that's all true, but I'm not sure what it has to do with what I was saying. FYI, you do get a 'subspecies' rank but no 'variant' rank. In my own implementation I change the rank of all 'no rank' Nodes below species to 'variant'. > I haven't tried using Bio::DB::Taxonomy directly yet, but I would > have thought that the binomial is just built from the XML twig > 'LineageEx' Rank=Genus + Rank=Species, that the genus comes from the > tag 'Genus' and species from 'Species', and that the scientific name > is from the tag 'ScientificName'. Guess not. No. See above for what it actually does. That is a copy/paste from the code (there, $taxon_name == ScientificName). 
When it finds a species rank it does that split because in the ncbi taxonomy database the 'genus' rank for a human has a ScientificName of 'Homo', whilst the 'species' rank has a ScientificName of 'Homo sapiens', and the bioperl model (quite rightly, I think) wants the 'species' node to not have information of other nodes (well, except for the classification array). So it removes the 'Homo' from 'Homo sapiens' giving a species name of 'sapiens'. This then allows the binomial method to return 'Homo sapiens' instead of 'Homo Homo sapiens'. (though in a bizarre twist, and this is one of my problems with how names are currently represented in the Taxonomy modules, 'Scientific Name' and 'binomial' are synonymous) [snip] >> My solution is to just remove whatever is the same between the >> current rank and the previous rank. Maybe even that's not so >> perfect, but it must be a lot better than turning the species >> 'Avian leukosis virus' into the species 'virus' (especially given >> that the genus here is 'Alpharetrovirus')! > > I'm don't think taking Genus/Species directly from the scientific > name (normally what is in the SOURCE or ORGANISM annotation for > GenBank or OS for EMBL) is the best way to go about it [snip] Perhaps, but again I'm not sure what this has to do with what I was saying. If you don't want your species name to contain your genus name you have to do some kind of parsing. My post merely pointed out that the parsing currently in bioperl does not work for viruses and possibly other species. I'd like to think that someone cares about this error and would do the simple fix I offered, or that they already know about the problem and have done their own fix. > I'm also not sure that forcing a lookup for every TaxID in every > sequence every time it's passed through SeqIO is the best way to go > either, though I think it should be required for storing sequences. > It's a tricky balance. 
In my own implementation any database lookups are cached, and you have the option of not doing any database lookup at all and 'faking' a taxonomy from the supplied list of names (so it works just like normal Bio::Seq). > I still think that maybe we should absolve ourselves from using > SOURCE/ORGANISM or OS/OC information in GenBank files as anything > more than strictly annotation, or reconstruct Bio::Species to maybe a > Bio::Annotation::Species object to handle that annotation and either > deprecate Bio::Species or separate it completely from any > Bio::Taxonomy objects. It would really simplify things. Then, if > anyone is interested in taxonomy, either install a local database or > use Entrez efetch, and then use Bio::DB::Taxonomy (fixed of course) > to grab the TaxID info. My personal view is that having it as an annotation would serve no real purpose. For me the whole point of any kind of species representation in bioperl is to allow you to compare species in a biologically meaningful way. If it's just some annotation then that means it's basically free-form text and you have no guarantee that two sequences from the same species are annotated exactly the same - no guarantee that your code would identify that those sequences are from the same species. The only other useful thing that a species object needs to do it let you know how related two different species are - you need to be able to ask what a species' class, kingdom etc. are. Again, not viable with an annotation - you need something strict like a properly constructed Taxonomy. I guess it comes down to the philosophy of parsing a file. Do you try and reflect exactly what the file contains, letter for letter, so that your resulting object can recreate that file letter for letter, or do you parse the file and extract the correct /meaning/ in order to be more useful? 
I think there can be a choice by the user, and this is best done by making Bio::Species a clever wrapper around an improved Bio::Taxonomy, as in my own implementation.

From s_maheshwari84 at rediffmail.com Mon May 15 04:15:26 2006
From: s_maheshwari84 at rediffmail.com (saurabh maheshwari)
Date: 15 May 2006 08:15:26 -0000
Subject: [Bioperl-l] please help
Message-ID: <20060515081526.27270.qmail@webmail7.rediffmail.com>

Hello All

I sent this problem to the list earlier, but it is still unsolved, so I have restated it in another way. Can anybody please give me code to make a graph between some items which are in a text file in the following format:

Example: item1 interacts with item2, and I want to make the graph by giving any item as input and asking for all interactions of that item.

item 1    item 2
A         B
A         C
C         B
D         B
D         E
A         F
G         A

with Regards
SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI

From sdavis2 at mail.nih.gov Mon May 15 06:26:53 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 15 May 2006 06:26:53 -0400
Subject: [Bioperl-l] please help
In-Reply-To: <20060515081526.27270.qmail@webmail7.rediffmail.com>
Message-ID:

Not a bioperl answer, but in your case, I would suggest looking at using cytoscape to do this. Look here for details:

http://www.cytoscape.org/

Sean

From sdavis2 at mail.nih.gov Mon May 15 07:03:28 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 15 May 2006 07:03:28 -0400
Subject: [Bioperl-l] please help
In-Reply-To:
Message-ID:

I forgot to mention, if you are looking for a perl solution, I would look at the Graph module.

http://search.cpan.org/~jhi/Graph-0.69/lib/Graph.pod

You can create the graph according to the docs and then use the neighbors() method (if I remember correctly) to get the nodes connected to the query node.

Sean

From akarger at CGR.Harvard.edu Mon May 15 08:20:11 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 15 May 2006 08:20:11 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
Message-ID:

This tool is quite nice, and may save me a lot of perldoc'ing.

A couple of minor interface thoughts.

1) There are quite a lot of methods for many of the classes. As such, I think I'll often want to browse through what's available in a class. But 60% or so of the screen real estate is used for "Enter a search string... OR select a class from the list". IMO, it would be better to have two pages, a search page and a result page.
It only takes a click on Back (or a "new search" button) to get to a new search, and now you can use your whole screen for reading your results. 2) Please sort the "select a class from the list" alphabetically. I guess I can enter a search term to get the right classes, but it would be nice to be able to browse. 2a) if you want to be really fancy, make a javascript nested menu with expandable submenus. OK, maybe not. 3) Minimalist is nice, but documentation is even nicer. It wasn't clear to me that the search searches within class names rather than function names. What I really want to know sometimes is which module has, say, the revcom method in it. So, if it's not easy to include that within this search, then at least tell me what my search space is. 4) When I search for something that's not found, I get a screen that looks pretty familiar, with the extra text "No match to string found" down at the bottom. It took me a while to even notice it. (Studies show that most users don't read most of the text on a page.) Bold might be nice here. Or put the error at the top of the screen. Or both. 5) I'll save my stupidest comment for last - please make the page title "Bioperl Deobfuscator", so that when I bookmark it I'll know what the bookmark stands for. Thanks, Laura Kavanaugh and David Messina, for a neat AND useful tool. - Amir Karger Computational Biology Group Bauer Center for Genomics Research Harvard University 617-496-0626 From sb at mrc-dunn.cam.ac.uk Mon May 15 09:08:32 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Mon, 15 May 2006 14:08:32 +0100 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: References: Message-ID: <44687D50.6080306@mrc-dunn.cam.ac.uk> Amir Karger wrote: > This tool is quite nice, and may save me a lot of perdoc'ing. Yes, many thanks to everyone involved. > A couple of minor interface thoughts. > > 1)There's quite a lot of methods for many of the classes. 
As such, I > think I'll often want to browse through what's available in a class. But > 60% or so of the screen real estate is used for "Enter a search > string... OR select a class from the list". IMO, it would be better to > have two pages, a search page and a result page. It only takes a click > on Back (or a "new search" button) to get to a new search, and now you > can use your whole screen for reading your results. As the compromise it must be, I like the way it behaves. I don't like lots of windows. I especially don't like pop up windows. Right now when I'm using the bioperl docs I tend to have a whole bunch of tabs open to different class pages at once, so being able to see an overview all on one page in Deobfuscator is very nice. Further to that, I'd love it if clicking on a method name caused an in-place css(&|javascript) reveal (similar to how a well implemented drop down menu works in a website) rather than a new window opened. Alternatively, just have more columns in the results table, ie. usage, function, returns, args columns. I feel that opening a window for each method you want to understand is far too slow. I'd also really like a link to the code for the method as well. The bioperl docs are rarely complete enough that you can really understand what every method is supposed to do without looking at the code. > 3) Minimalist is nice, but documentation is even nicer. It wasn't clear > to me that the search searches within class names rather than function > names. What I really want to know sometimes is which module has, say, > the revcom method in it. This would be a great feature to add. Another minor interface thought: 6) Have a little more cell padding in all the tables. Things are just a little too cramped and things start to look messy/ run into each other. 
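
On the question raised above of which module actually implements a method such as revcom: until the interface can search by method name, the inheritance chain can be walked by hand. The following is only a minimal sketch and is not part of the Deobfuscator. The My::* packages and the helper name defining_package are hypothetical stand-ins, chosen so that the example runs without BioPerl installed:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use mro;    # core since Perl 5.10; provides mro::get_linear_isa

# Hypothetical stand-in for an interface class that implements revcom.
package My::PrimarySeqI;
sub revcom { return 'revcom from ' . __PACKAGE__ }

# Hypothetical stand-in for a concrete class that only inherits it.
package My::Seq;
our @ISA = ('My::PrimarySeqI');
sub new { return bless {}, shift }

package main;

# Return the first package in the method resolution order of $class
# that defines $method itself, or undef if none of them does.
sub defining_package {
    my ($class, $method) = @_;
    for my $pkg (@{ mro::get_linear_isa($class) }) {
        no strict 'refs';
        return $pkg if defined &{"${pkg}::${method}"};
    }
    return;
}

my $where = defining_package('My::Seq', 'revcom');
print "revcom is defined in: ", (defined $where ? $where : 'nowhere'), "\n";
```

With BioPerl loaded (use Bio::Seq;), the same call with 'Bio::Seq' and 'revcom' would be expected, going by Marc Logghe's reply earlier in the thread, to report Bio::PrimarySeqI. Methods supplied only through AUTOLOAD will not be found this way.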
From cjfields at uiuc.edu Mon May 15 09:59:57 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 15 May 2006 08:59:57 -0500 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: <44687D50.6080306@mrc-dunn.cam.ac.uk> Message-ID: <000901c67827$d99eabb0$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Sendu Bala > Sent: Monday, May 15, 2006 8:09 AM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Deobfuscator interface now available > > Amir Karger wrote: > > This tool is quite nice, and may save me a lot of perdoc'ing. > > Yes, many thanks to everyone involved. The Deobfuscator currently indexes bioperl-1.4, so it's not completely up-to-date. I believe Mauricio and Dave may be working on updating to the newer versions and maybe bioperl-live, as well as getting the other bioperl packages up and running. For modules added after v1.4 I use the script in the FAQ question mentioned on the Deobfuscator wiki page to get up-to-date methods, then grab the that ActiveState HTML'd perldocs pumped out when installing using PPM (I make a custom PPM/PPD file and install myself every once in a while): #!/usr/bin/perl -w use Class::Inspector; $class = shift || die "Usage: methods perl_class_name\n"; eval "require $class"; print join ("\n", sort @{Class::Inspector- > > A couple of minor interface thoughts. > > > > 1)There's quite a lot of methods for many of the classes. As such, I > > think I'll often want to browse through what's available in a class. But > > 60% or so of the screen real estate is used for "Enter a search > > string... OR select a class from the list". IMO, it would be better to > > have two pages, a search page and a result page. It only takes a click > > on Back (or a "new search" button) to get to a new search, and now you > > can use your whole screen for reading your results. 
> > As the compromise it must be, I like the way it behaves. I don't like > lots of windows. I especially don't like pop up windows. Right now when > I'm using the bioperl docs I tend to have a whole bunch of tabs open to > different class pages at once, so being able to see an overview all on > one page in Deobfuscator is very nice. > > Further to that, I'd love it if clicking on a method name caused an > in-place css(&|javascript) reveal (similar to how a well implemented > drop down menu works in a website) rather than a new window opened. > Alternatively, just have more columns in the results table, ie. usage, > function, returns, args columns. I feel that opening a window for each > method you want to understand is far too slow. Agreed. > I'd also really like a link to the code for the method as well. The > bioperl docs are rarely complete enough that you can really understand > what every method is supposed to do without looking at the code. The methods that pop up are in columns along with the class module that implements the method. If you click on that link you get PDOC documentation for the module which includes most of the code (strangely, though Deobfuscator indexes bioperl 1.4, the PDOC corresponds to bioperl-live). Is that what you meant, or something a bit more detailed? > > 3) Minimalist is nice, but documentation is even nicer. It wasn't clear > > to me that the search searches within class names rather than function > > names. What I really want to know sometimes is which module has, say, > > the revcom method in it. That's listed in the method results table (the next column has the module with a link to the module's online docs). Chris > This would be a great feature to add. > > > Another minor interface thought: > 6) Have a little more cell padding in all the tables. Things are just a > little too cramped and things start to look messy/ run into each other. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Mon May 15 12:08:30 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 15 May 2006 11:08:30 -0500 Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names In-Reply-To: <44683943.5020307@mrc-dunn.cam.ac.uk> Message-ID: <001601c67839$cf289490$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Sendu Bala > Sent: Monday, May 15, 2006 3:18 AM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, > subspecies/variant names > > Chris Fields wrote: > > Sendu Bala wrote: > >> In bioperl up to at least 1.5.1, when one of the database modules > >> comes across a species rank it does: > >> > >> if ($rank eq 'species') { # get rid of genus from species name > >> (undef,$taxon_name) = split(/\s+/,$taxon_name,2); } > > > > The XML example from NCBI Taxonomy I mentioned previously seems to > > have everything in the classification, from superkingdom down to > > species (no strain unfortunately, and I'm nit sure about subspecies); > > if it's missing the rank then the designation doesn't exist or is > > tagged as 'no rank'. Like I mentioned before I'm not intimately > > familiar Bio::Taxonomy, Bio::DB::Taxonomy, or Bio::Species, so I > > don't have a clue as to how everything is parsed and plugged in to > > Bio::Taxonomy objects. I do know that XML::Twig is used for parsing > > through the data so it shouldn't be too hard to change what you > > want. > > Yes, that's all true, but I'm not sure what it has to do with what I was > saying. FYI, you do get a 'subspecies' rank but no 'variant' rank. In my > own implementation I change the rank of all 'no rank' Nodes below > species to 'variant'. 
Sorry; wandered a bit off topic there.

> > I haven't tried using Bio::DB::Taxonomy directly yet, but I would
> > have thought that the binomial is just built from the XML twig
> > 'LineageEx' Rank=Genus + Rank=Species, that the genus comes from the
> > tag 'Genus' and species from 'Species', and that the scientific name
> > is from the tag 'ScientificName'. Guess not.
>
> No. See above for what it actually does. That is a copy/paste from the
> code (there, $taxon_name == ScientificName). When it finds a species
> rank it does that split because in the
> ncbi taxonomy database the 'genus' rank for a human has a ScientificName
> of 'Homo', whilst the 'species' rank has a ScientificName of 'Homo
> sapiens', and the bioperl model (quite rightly, I think) wants the
> 'species' node to not have information of other nodes (well, except for
> the classification array). So it removes the 'Homo' from 'Homo sapiens'
> giving a species name of 'sapiens'. This then allows the binomial method
> to return 'Homo sapiens' instead of 'Homo Homo sapiens'.
>
> (though in a bizarre twist, and this is one of my problems with how
> names are currently represented in the Taxonomy modules, 'Scientific
> Name' and 'binomial' are synonymous)

Ah, now I see. That's a bit screwy, but it's not on our end so we have to
deal with it. I also noticed that the subspecies node contains the entire
string:

<TaxId>135461</TaxId>
<ScientificName>Bacillus subtilis subsp. subtilis</ScientificName>
<Rank>subspecies</Rank>

As for the 'scientific_name' method when accessed through Bio::DB::Taxonomy,
I almost never get the actual scientific name for the node (as given on the
GenBank ORGANISM line); instead I get the name with the strain chopped off,
and a number of times the names get mangled. The regexes below only grab from
the topmost tags:

Script:
---------------------------------
#! perl
use strict;
use warnings;
use Bio::DB::Taxonomy;

my $file = shift @ARGV;

print "\nNCBI XML output ScientificName tag for each node:\n";

my @taxid = ();
open (TAXFILE, "<$file") or die "Cannot open $file: $!";
while (<TAXFILE>){
    # Only the topmost (two-space indented) tags of each record match.
    if (/^\s{2}<TaxId>(\d+)<\/TaxId>/) {
        print "$1\t";
        push @taxid, $1;
    }
    print "$1\n" if /^\s{2}<ScientificName>(.*)<\/ScientificName>/;
}
close TAXFILE;

print "\nBio::DB::Taxonomy scientific_name:\n";

for my $id (@taxid){
    my $factory = Bio::DB::Taxonomy->new(-source => 'entrez');
    my $node = $factory->get_Taxonomy_Node(-taxonid => $id);
    print $node->ncbi_taxid,"\t",$node->scientific_name,"\n";
}
---------------------------------

Output:
---------------------------------
NCBI XML output ScientificName tag for each node:
191218	Bacillus anthracis str. A2012
198094	Bacillus anthracis str. Ames
222523	Bacillus cereus ATCC 10987
224308	Bacillus subtilis subsp. subtilis str. 168
226186	Bacteroides thetaiotaomicron VPI-5482
226900	Bacillus cereus ATCC 14579
246194	Carboxydothermus hydrogenoformans Z-2901
260799	Bacillus anthracis str. Sterne
261594	Bacillus anthracis str. 'Ames Ancestor'
264462	Bdellovibrio bacteriovorus HD100
272558	Bacillus halodurans C-125
272559	Bacteroides fragilis NCTC 9343
279010	Bacillus licheniformis ATCC 14580
281309	Bacillus thuringiensis serovar konkukian str. 97-27
288681	Bacillus cereus E33L
295405	Bacteroides fragilis YCH46
66692	Bacillus clausii KSM-K16
76114	Azoarcus sp. EbN1

Bio::DB::Taxonomy scientific_name:
191218	Bacillus cereus group anthracis
198094	Bacillus cereus group anthracis
222523	Bacillus cereus group cereus
224308	subtilis Bacillus subtilis subsp. subtilis
226186	Bacteroides thetaiotaomicron
226900	Bacillus cereus group cereus
246194	Carboxydothermus hydrogenoformans
260799	Bacillus cereus group anthracis
261594	Bacillus cereus group anthracis
264462	Bdellovibrio bacteriovorus
272558	Bacillus halodurans
272559	Bacteroides fragilis
279010	Bacillus licheniformis
281309	Bacillus cereus group thuringiensis
288681	Bacillus cereus group cereus
295405	Bacteroides fragilis
66692	Bacillus clausii
76114	Azoarcus sp.
---------------------------------

Note Bacillus subtilis in the Bio::Tax output above. Not one of those is the
scientific name as defined by NCBI (and most taxonomists for that matter).

So, in a nutshell, there's a problem here. I don't know if your fix works
for that, but I definitely think the 'scientific name' should not be
assembled ad hoc; it should be taken directly from the tag for that node. I
am currently reduced to grabbing the feature with primary tag 'source' and
reading its 'organism' tag. I cannot stress enough that it should NOT be
that way.

As for 'binomial' == 'scientific_name', I agree; I see it as well and that
should be fixed.

...

> Perhaps, but again I'm not sure what this has to do with what I was
> saying. If you don't want your species name to contain your genus name
> you have to do some kind of parsing. My post merely pointed out that the
> parsing currently in bioperl does not work for viruses and possibly
> other species. I'd like to think that someone cares about this error and
> would do the simple fix I offered, or that they already know about the
> problem and have done their own fix.

Again me going off-topic, so my apologies; it's more to do with my
frustrations with Bio::Species (not Bio::DB::Taxonomy). My point here was
that, since there is no real way to surmise from a GenBank flatfile what the
taxonomic ranks are without guessing (which seems to break more often than
not when dealing with complex names), there shouldn't be any tie to Bio::Tax
objects, at least directly.
I guess methods could be incorporated into Bio::Species for those who want to give it a try, but I would like to get a GenBank file, for once, in which the scientific name/binomial name isn't mangled by Bio::Species. Back to Bio::DB::Taxonomy; I don't have a problem with implementing your methods here; on the contrary, if they fix my problem above then I'll be more than glad to. I can't get to it immediately but maybe later today/tomorrow. > > I'm also not sure that forcing a lookup for every TaxID in every > > sequence every time it's passed through SeqIO is the best way to go > > either, though I think it should be required for storing sequences. > > It's a tricky balance. > > In my own implementation any database lookups are cached, and you have > the option of not doing any database lookup at all and 'faking' a > taxonomy from the supplied list of names (so it works just like normal > Bio::Seq). > > > > I still think that maybe we should absolve ourselves from using > > SOURCE/ORGANISM or OS/OC information in GenBank files as anything > > more than strictly annotation, or reconstruct Bio::Species to maybe a > > Bio::Annotation::Species object to handle that annotation and either > > deprecate Bio::Species or separate it completely from any > > Bio::Taxonomy objects. It would really simplify things. Then, if > > anyone is interested in taxonomy, either install a local database or > > use Entrez efetch, and then use Bio::DB::Taxonomy (fixed of course) > > to grab the TaxID info. > > My personal view is that having it as an annotation would serve no real > purpose. For me the whole point of any kind of species representation in > bioperl is to allow you to compare species in a biologically meaningful > way. 
If it's just some annotation then that means it's basically > free-form text and you have no guarantee that two sequences from the > same species are annotated exactly the same - no guarantee that your > code would identify that those sequences are from the same species. > The only other useful thing that a species object needs to do it let you > know how related two different species are - you need to be able to ask > what a species' class, kingdom etc. are. Again, not viable with an > annotation - you need something strict like a properly constructed > Taxonomy. My point is, a large number of users do NOT use, nor care about, taxonomic information to the degree they need to know the entire classification of the organism; many are just as happy about getting the scientific name only, which is in the GenBank/EMBL file itself. To take one extreme, it is not productive to force every user to download the NCBI tax database and use lookups just to convert sequences from EMBL format to GenBank format. It's not productive to allow users to spam the NCBI tax database remotely either, so hardcoding lookups is, IMHO, a big mistake. > I guess it comes down to the philosophy of parsing a file. Do you try > and reflect exactly what the file contains, letter for letter, so that > your resulting object can recreate that file letter for letter, or do > you parse the file and extract the correct /meaning/ in order to be more > useful? > I think there can be a choice by the user, and this is best done by > making Bio::Species a clever wrapper around an improved Bio::Taxonomy, > as in my own implementation. I understand both philosophies, but the latter implies that you know the intention of the ones submitting the sequence. 99.9% of the time that's fine, something I can live with. 
However, when we mess up something as simple as getting the scientific name for an organism when the information is directly in the flat file (ORGANISM line) by trying to 'imply' what the classification is, yes, I get frustrated. Even more frustrating to me is that Bio::DB::Taxonomy, which should return accurate information directly from the Taxonomy database, still manages to screw up the scientific name. The NCBI definition in the sample record: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html state that the ORGANISM line contains the formal scientific name and it's lineage (no ranking). If the lineage is very long it is abbreviated so you don't get the same thing as you would through using TaxID. So, in essence, I believe you are correct, that Bio::Species can be used as a 'wrapper' for Bio::Taxonomy objects, but only up to a certain degree with caveats or warnings for possible inaccuracies. I also believe that lookups should be allowed but optional, not required (i.e. left up to the user, as you state). I just feel that it's somewhat misleading to imply, by delegating to Bio::Taxonomy, that Bio::Species contains accurate taxonomic information when NCBI themselves state that the GenBank flatfile classification can be incomplete and does not supply rankings (genus, species) in the file. It's our best guess in most cases, and a best guess by definition is not very accurate. If you want taxonomic accuracy, use the TaxID and a local tax database. I feel that we shouldn't punish those who don't worry/care about taxonomy by implementing Bio::Species with methods that mangle data that's directly in the flat file they're parsing. Okay, not to cut short this discussion, but I have to get back to $job. I'll try adding your fixes in a bit later today/tomorrow; if they pass tests I'll commit them in. 
Chris From hlapp at gmx.net Mon May 15 12:59:06 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 15 May 2006 12:59:06 -0400 Subject: [Bioperl-l] error loading uniprot release 49.6 into mysql In-Reply-To: <051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net> References: <051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net> Message-ID: You found the right instance. Unfortunately with the way the bioperl swissprot parser works the group (RG) isn't promoted to author if there is no author in addition (in fact you may debate whether that would even be the best way of doing things), so it doesn't find it on second occurrence by unique key. If you can live without this entry, or any other entry that causes a hiccup, just supply the flag --safe and it will gracefully move on to the next entry. Fixing the issue would require either to fix the bioperl swissprot parser (or Bio::Annotation::Reference) to stick the RG group into the author slot if there is no author, or to fix Bioperl Bio::Annotation::Reference to also feature a group and biosql to use it in place of a missing author. Actually there is $reference->rg. Maybe Bioperl-db (and hence Biosql) should just use that in place of a missing author? The downside is that upon round-tripping an entry, the RG annotation line will become an RA annotation line. How bad would that be? Any thoughts from anyone? -hilmar On May 15, 2006, at 8:34 AM, s.rayner at att.net wrote: > I found where the script is hiccuping.... > > The Uniprot release contains lines with identical annotation for > the RL keyword for two different sequences. > > ___________________ > > First occurence... > ___________________ > > ID 1433T_PONPY STANDARD; PRT; 245 AA. > AC Q5RFJ2; Q5RDK2; > DT 05-JUL-2005, integrated into UniProtKB/Swiss-Prot. > DT 05-JUL-2005, sequence version 2. > DT 18-APR-2006, entry version 13. > DE 14-3-3 protein theta. > GN Name=YWHAQ; > OS Pongo pygmaeus (Orangutan). 
> OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; > OC Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; > OC Catarrhini; Hominidae; Pongo. > OX NCBI_TaxID=9600; > RN [1] > RP NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA]. > RC TISSUE=Brain cortex, and Kidney; > RG The German cDNA consortium; > RL Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases. > <====== Not Unique > > > ___________________ > > Second occurence... > ___________________ > > > ID 1433G_PONPY STANDARD; PRT; 246 AA. > AC Q5RC20; > DT 05-JUL-2005, integrated into UniProtKB/Swiss-Prot. > DT 05-JUL-2005, sequence version 2. > DT 18-APR-2006, entry version 13. > DE 14-3-3 protein gamma. > GN Name=YWHAG; > OS Pongo pygmaeus (Orangutan). > OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; > OC Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; > OC Catarrhini; Hominidae; Pongo. > OX NCBI_TaxID=9600; > RN [1] > RP NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA]. > RC TISSUE=Heart; > RG The German cDNA consortium; > RL Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases. > <====== Not Unique > > > > in these two cases the generated CRC key is identical and so MySQL > throws a wobbly. > > if i look at the MySQL entry in the REFERENCE table for the first > sequence > ------+-------+---------+----------------------+ > | 139 | NULL | Submitted (NOV-2004) to the EMBL/ > GenBank/DDBJ databases. | NULL | NULL | CRC-E7973FEA4B5611DC | > +--------------+----------- > +---------------------------------------------------- > > and the error when the script choked was > > MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, > values were > ("","","Submitted (NOV-2004) to the EMBL/GenBank/DDBJ > databases.","CRC-E7973FEA4B5611DC","","","") FKs ( Duplicate entry 'CRC-E7973FEA4B5611DC' for key 3 > > hence the problem. > > I'm guessing i'm not the first person to encounter this, but dont > see any hints for an easy way around this. > > any suggestions....? 
> > ta > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Mon May 15 13:01:14 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 15 May 2006 13:01:14 -0400 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: <4466AD7F.6050700@campus.iztacala.unam.mx> References: <4466AD7F.6050700@campus.iztacala.unam.mx> Message-ID: <068E49BD-2DE4-47BA-BD7C-D6FD487DF095@gmx.net> Hey, thanks to Laura & David for this interface. Any idea why most of the Bio::Ontology::* modules show up without their leading Bio::Ontology? And clicking on those hyperlinks doesn't go anywhere either ... Anything different with those modules that I can fix? -hilmar On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote: > I'm glad to announce the availability of the Deobfuscator interface at > the BioPerl website. You can use it at the following URL: > > http://bioperl.org/cgi-bin/deob_interface.cgi > > Many thanks to Laura Kavanaugh and David Messina for this great > contribution to the BioPerl project! > > Mauricio. 
>
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at uiuc.edu Mon May 15 13:22:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 12:22:13 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <068E49BD-2DE4-47BA-BD7C-D6FD487DF095@gmx.net>
Message-ID: <000301c67844$1b506280$15327e82@pyrimidine>

That's strange. Clicking on the list gives me the results for that module.
When I click on the hyperlinks in the results section they open fine; the
method column links open a new page containing usage-function-returns-args
and the class column links open pdoc (same page) for bioperl-live. I'm
using Firefox 1.5 on WinXP.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> Sent: Monday, May 15, 2006 12:01 PM
> To: Mauricio Herrera Cuadra
> Cc: bioperl-l
> Subject: Re: [Bioperl-l] Deobfuscator interface now available
>
> Hey, thanks to Laura & David for this interface.
>
> Any idea why most of the Bio::Ontology::* modules show up without
> their leading Bio::Ontology? And clicking on those hyperlinks doesn't
> go anywhere either ... Anything different with those modules that I
> can fix?
>
> -hilmar
>
> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote:
>
> > I'm glad to announce the availability of the Deobfuscator interface at
> > the BioPerl website.
You can use it at the following URL: > > > > http://bioperl.org/cgi-bin/deob_interface.cgi > > > > Many thanks to Laura Kavanaugh and David Messina for this great > > contribution to the BioPerl project! > > > > Mauricio. > > > > -- > > MAURICIO HERRERA CUADRA > > arareko at campus.iztacala.unam.mx > > Laboratorio de Gen?tica > > Unidad de Morfofisiolog?a y Funci?n > > Facultad de Estudios Superiores Iztacala, UNAM > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sb at mrc-dunn.cam.ac.uk Mon May 15 14:00:15 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Mon, 15 May 2006 19:00:15 +0100 Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names In-Reply-To: <001601c67839$cf289490$15327e82@pyrimidine> References: <001601c67839$cf289490$15327e82@pyrimidine> Message-ID: <4468C1AF.9080400@mrc-dunn.cam.ac.uk> Chris Fields wrote: > > Ah, now I see. That's a bit screwy, but it's not on our end so we have to > deal with it. I also noticed that subspecies also contains the entire > string: > > > 135461 > Bacillus subtilis subsp. subtilis > subspecies > Yes, this is one of the problems I mentioned in the first post to this thread. > As for the 'scientific_name' method when accessed through Bio::DB::Taxonomy, > I don't get the actual scientific name for the node (from the GenBank > ORGANISM line) almost every time; I get the name with the strain chopped off > instead and a number of times the names get mangled. 
[snip, should be:] > 224308 Bacillus subtilis subsp. subtilis str. 168 > 281309 Bacillus thuringiensis serovar konkukian str. 97-27 [snip, but Bio::DB::Taxonomy gives:] > 224308 subtilis Bacillus subtilis subsp. subtilis > 281309 Bacillus cereus group thuringiensis [snip] > So, in a nutshell, there's a problem here. I don't know if your fix works > for that, but I definitely don't think the 'scientific name' should be > assembled ad hoc but should be taken from the tagname for that node. Yes, my implementation will get you the correct answer, but not quite as you say. My solution was to munge the actual ScientificName but 'ensure' that the binomial would give you back the actual binomial name you wanted - which is the intent of current Bio::DB::Taxonomy code. my $species0 = TFBS::Species->new(-ncbi_taxid => 224308); my $leaf_node = $species0->taxonomy->get_leaves(); print "sci_name of Node = '", $leaf_node->scientific_name, "'\n"; print "Species0 subspecies = '", $species0->subspecies, "'\n"; print "Species0 variants = '", scalar($species0->variant), "'\n"; print "Species0 binomial = '", $species0->binomial('FULL'), "'\n"; gives: sci_name of Node = 'str. 168' Species0 subspecies = 'subsp. subtilis' Species0 variants = 'str. 168' Species0 binomial = 'Bacillus subtilis subsp. subtilis str. 168' and the same again for id 281309: sci_name of Node = 'str. 97-27' Species0 subspecies = '' Species0 variants = 'serovar konkukian str. 97-27' Species0 binomial = 'Bacillus thuringiensis serovar konkukian str. 97-27' I've done it this way because even though strictly speaking the ScientificName for 224308 (a 'no rank') is 'Bacillus subtilis subsp. subtilis str. 168', when I ask for the variant I don't want that whole string. I just want the bit that will be different when comparing other strains of this subspecies of this species of Bacillus. I want 'str. 168'. 
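The idea described above, where each node keeps only the part of its name that its parent does not already supply, can be sketched in plain Perl. This is an illustration under stated assumptions, not the TFBS::Species implementation: `name_parts` is a hypothetical helper, and the lineage is a simplified, hand-typed list of ScientificName strings rather than one fetched from NCBI.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl or TFBS: given full
# ScientificName strings ordered from genus down to the leaf node,
# return for each node the part its parent's name does not cover.
sub name_parts {
    my @full_names = @_;
    my @parts;
    my $parent = '';
    for my $name (@full_names) {
        my $part = $name;
        # Strip the parent's name only when it is a whole-word prefix.
        if (length $parent && index($name, "$parent ") == 0) {
            $part = substr($name, length($parent) + 1);
        }
        push @parts, $part;
        $parent = $name;
    }
    return @parts;
}

# Simplified lineage for taxon 224308 (assumed for illustration).
my @lineage = (
    'Bacillus',
    'Bacillus subtilis',
    'Bacillus subtilis subsp. subtilis',
    'Bacillus subtilis subsp. subtilis str. 168',
);

my @parts = name_parts(@lineage);
print join(' | ', @parts), "\n";
# Bacillus | subtilis | subsp. subtilis | str. 168
print join(' ', @parts), "\n";
# Bacillus subtilis subsp. subtilis str. 168
```

Joining the parts reproduces the full leaf name, which is the property attributed above to the binomial('FULL') call. A node whose name does not begin with its parent's name is left whole by this sketch, so a real implementation would need extra handling for such lineages.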
Note that my objects never store the original ScientificName; it is due to 'luck' (or as I like to think, a good implementation) that the binomial method is able to reconstruct a string that is identical to what the original ScientificName was. If you'd like to see my code let me know. You can't just drop the code snippet I posted in this thread into existing bioperl modules; quite a bit else has to change as well. I'll have to make an updated taxonomy_the_tfbs_way.tar.gz file available if you want an example implementation; the current version of that file is now out of date - it doesn't do any of what I describe above. From hlapp at gmx.net Mon May 15 14:08:49 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 15 May 2006 14:08:49 -0400 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: <000301c67844$1b506280$15327e82@pyrimidine> References: <000301c67844$1b506280$15327e82@pyrimidine> Message-ID: Safari or Firefox on MacOSX don't do this. Note that the appearance in the browsable list is already different (the prefix is missing), and the JavaScript link also lacks the prefix in the module name in contrast to others, e.g., Bio::Ontology::Ontology (which is one of the few Bio::Ontology exceptions that do work and do display correctly). I suppose there is something peculiar about the code formatting of those modules? Some of the modules under Bio::OntologyIO are also affected BTW. What happens is after you click on the link the page apppears to reload (i.e., gets submitted) but the second table that is supposed open underneath the first doesn't appear. However, the sort-by drop down selector does appear. -hilmar On May 15, 2006, at 1:22 PM, Chris Fields wrote: > That's strange. Clicking on the list gives me the results for that > module. 
> When I click on the hyperlinks in the results section they open > fine; the > method column links opens a new page containing usage-function- > returns-args > and the class column links opens pdoc (same page) for bioperl- > live. I'm > using Firefox 1.5 on WinXP. > > Chris > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp >> Sent: Monday, May 15, 2006 12:01 PM >> To: Mauricio Herrera Cuadra >> Cc: bioperl-l >> Subject: Re: [Bioperl-l] Deobfuscator interface now available >> >> Hey, thanks to Laura & David for this interface. >> >> Any idea why most of the Bio::Ontology::* modules show up without >> their leading Bio::Ontology? And clicking on those hyperlinks doesn't >> go anywhere either ... Anything different with those modules that I >> can fix? >> >> -hilmar >> >> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote: >> >>> I'm glad to announce the availability of the Deobfuscator >>> interface at >>> the BioPerl website. You can use it at the following URL: >>> >>> http://bioperl.org/cgi-bin/deob_interface.cgi >>> >>> Many thanks to Laura Kavanaugh and David Messina for this great >>> contribution to the BioPerl project! >>> >>> Mauricio. 
>>> >>> -- >>> MAURICIO HERRERA CUADRA >>> arareko at campus.iztacala.unam.mx >>> Laboratorio de Gen?tica >>> Unidad de Morfofisiolog?a y Funci?n >>> Facultad de Estudios Superiores Iztacala, UNAM >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Mon May 15 15:07:59 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 15 May 2006 14:07:59 -0500 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: Message-ID: <000501c67852$e1bb55c0$15327e82@pyrimidine> I'll have to give it a try on Mac OS X (we have an ancient G4 in the lab which I can try it on). I'll let you know what I find. 
This is what I get when I do a search for 'Bio::Ont*' using Firefox on WinXP and this Deobfuscator link (http://bioperl.org/cgi-bin/deob_interface.cgi?); all the classes have links that work (I added newline and tab to make it a bit more readable) : Bio::OntologyIO Parser factory for Ontology formats Bio::OntologyIO::Handlers::BaseSAXHandler no short description available Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler no short description available Bio::Ontology::OntologyI Interface for an ontology implementation Bio::Ontology::TermFactory Instantiates a new Bio::Ontology::TermI (or derived class) through a factory Bio::Ontology::OntologyStore A repository of ontologies Bio::Ontology::RelationshipFactory Instantiates a new Bio::Ontology::RelationshipI (or derived class) through a factory Bio::Ontology::Ontology standard implementation of an Ontology So the names seem fine here. When I click on a class (Bio::Ontology::Ontology) I get in the results section: Method Class Returns Usage add_relationship Bio::Ontology::Ontology Its argument. add_relationship(RelationshipI relationship): RelationshipI add_relationship_type Bio::Ontology::OntologyEngineI not documented not documented add_term Bio::Ontology::Ontology its argument. add_term(TermI term): TermI ....and so on Where each method is clickable and opens a new page containing a table: Bio::Ontology::Ontology::add_relationship Usage add_relationship(RelationshipI relationship): RelationshipI Function Adds a relationship object to the ontology engine. Returns Its argument. Args A RelationshipI object. Each class is also linked to the bioperl-live PDOC. 
Clicking on class Bio::Ontology::Ontology in the results table gets me this page (no new page): http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html Chris > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: Monday, May 15, 2006 1:09 PM > To: Chris Fields > Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' > Subject: Re: [Bioperl-l] Deobfuscator interface now available > > Safari or Firefox on MacOSX don't do this. Note that the appearance > in the browsable list is already different (the prefix is missing), > and the JavaScript link also lacks the prefix in the module name in > contrast to others, e.g., Bio::Ontology::Ontology (which is one of > the few Bio::Ontology exceptions that do work and do display correctly). > > I suppose there is something peculiar about the code formatting of > those modules? Some of the modules under Bio::OntologyIO are also > affected BTW. > > What happens is after you click on the link the page apppears to > reload (i.e., gets submitted) but the second table that is supposed > open underneath the first doesn't appear. However, the sort-by drop > down selector does appear. > > -hilmar > > On May 15, 2006, at 1:22 PM, Chris Fields wrote: > > > That's strange. Clicking on the list gives me the results for that > > module. > > When I click on the hyperlinks in the results section they open > > fine; the > > method column links opens a new page containing usage-function- > > returns-args > > and the class column links opens pdoc (same page) for bioperl- > > live. I'm > > using Firefox 1.5 on WinXP. > > > > Chris > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > >> Sent: Monday, May 15, 2006 12:01 PM > >> To: Mauricio Herrera Cuadra > >> Cc: bioperl-l > >> Subject: Re: [Bioperl-l] Deobfuscator interface now available > >> > >> Hey, thanks to Laura & David for this interface. 
> >> > >> Any idea why most of the Bio::Ontology::* modules show up without > >> their leading Bio::Ontology? And clicking on those hyperlinks doesn't > >> go anywhere either ... Anything different with those modules that I > >> can fix? > >> > >> -hilmar > >> > >> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote: > >> > >>> I'm glad to announce the availability of the Deobfuscator > >>> interface at > >>> the BioPerl website. You can use it at the following URL: > >>> > >>> http://bioperl.org/cgi-bin/deob_interface.cgi > >>> > >>> Many thanks to Laura Kavanaugh and David Messina for this great > >>> contribution to the BioPerl project! > >>> > >>> Mauricio. > >>> > >>> -- > >>> MAURICIO HERRERA CUADRA > >>> arareko at campus.iztacala.unam.mx > >>> Laboratorio de Gen?tica > >>> Unidad de Morfofisiolog?a y Funci?n > >>> Facultad de Estudios Superiores Iztacala, UNAM > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >> > >> -- > >> =========================================================== > >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > >> =========================================================== > >> > >> > >> > >> > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From cjfields at uiuc.edu Mon May 15 15:12:34 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 15 May 2006 14:12:34 -0500 Subject: [Bioperl-l] Deobfuscator interface now available Message-ID: <000601c67853$85d49cc0$15327e82@pyrimidine> I just tried the same thing (links, search, etc) with Mac OS X v 
10.3.9 and Safari (no Firefox sorry) and it worked fine as well (all links, no missing Bio::Ontology, etc). Not sure what it could be... Chris > -----Original Message----- > From: Chris Fields [mailto:cjfields at uiuc.edu] > Sent: Monday, May 15, 2006 2:08 PM > To: 'Hilmar Lapp' > Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' > Subject: RE: [Bioperl-l] Deobfuscator interface now available > > I'll have to give it a try on Mac OS X (we have an ancient G4 in the lab > which I can try it on). I'll let you know what I find. > > This is what I get when I do a search for 'Bio::Ont*' using Firefox on > WinXP and this Deobfuscator link (http://bioperl.org/cgi- > bin/deob_interface.cgi?); all the classes have links that work (I added > newline and tab to make it a bit more readable) : > > Bio::OntologyIO > Parser factory for Ontology formats > Bio::OntologyIO::Handlers::BaseSAXHandler > no short description available > Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler > no short description available > Bio::Ontology::OntologyI > Interface for an ontology implementation > Bio::Ontology::TermFactory > Instantiates a new Bio::Ontology::TermI (or derived class) through a > factory > Bio::Ontology::OntologyStore > A repository of ontologies > Bio::Ontology::RelationshipFactory > Instantiates a new Bio::Ontology::RelationshipI (or derived class) > through a factory > Bio::Ontology::Ontology > standard implementation of an Ontology > > So the names seem fine here. > > When I click on a class (Bio::Ontology::Ontology) I get in the results > section: > > Method Class Returns > Usage > add_relationship Bio::Ontology::Ontology Its > argument. add_relationship(RelationshipI relationship): RelationshipI > add_relationship_type Bio::Ontology::OntologyEngineI not > documented not documented > add_term Bio::Ontology::Ontology its > argument. 
add_term(TermI term): TermI > > ....and so on > > Where each method is clickable and opens a new page containing a table: > > Bio::Ontology::Ontology::add_relationship > Usage add_relationship(RelationshipI relationship): RelationshipI > Function Adds a relationship object to the ontology engine. > Returns Its argument. > Args A RelationshipI object. > > > Each class is also linked to the bioperl-live PDOC. Clicking on class > Bio::Ontology::Ontology in the results table gets me this page (no new > page): > > http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html > > > Chris > > > -----Original Message----- > > From: Hilmar Lapp [mailto:hlapp at gmx.net] > > Sent: Monday, May 15, 2006 1:09 PM > > To: Chris Fields > > Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' > > Subject: Re: [Bioperl-l] Deobfuscator interface now available > > > > Safari or Firefox on MacOSX don't do this. Note that the appearance > > in the browsable list is already different (the prefix is missing), > > and the JavaScript link also lacks the prefix in the module name in > > contrast to others, e.g., Bio::Ontology::Ontology (which is one of > > the few Bio::Ontology exceptions that do work and do display correctly). > > > > I suppose there is something peculiar about the code formatting of > > those modules? Some of the modules under Bio::OntologyIO are also > > affected BTW. > > > > What happens is after you click on the link the page apppears to > > reload (i.e., gets submitted) but the second table that is supposed > > open underneath the first doesn't appear. However, the sort-by drop > > down selector does appear. > > > > -hilmar > > > > On May 15, 2006, at 1:22 PM, Chris Fields wrote: > > > > > That's strange. Clicking on the list gives me the results for that > > > module. 
> > > When I click on the hyperlinks in the results section they open > > > fine; the > > > method column links opens a new page containing usage-function- > > > returns-args > > > and the class column links opens pdoc (same page) for bioperl- > > > live. I'm > > > using Firefox 1.5 on WinXP. > > > > > > Chris > > > > > >> -----Original Message----- > > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > > >> Sent: Monday, May 15, 2006 12:01 PM > > >> To: Mauricio Herrera Cuadra > > >> Cc: bioperl-l > > >> Subject: Re: [Bioperl-l] Deobfuscator interface now available > > >> > > >> Hey, thanks to Laura & David for this interface. > > >> > > >> Any idea why most of the Bio::Ontology::* modules show up without > > >> their leading Bio::Ontology? And clicking on those hyperlinks doesn't > > >> go anywhere either ... Anything different with those modules that I > > >> can fix? > > >> > > >> -hilmar > > >> > > >> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote: > > >> > > >>> I'm glad to announce the availability of the Deobfuscator > > >>> interface at > > >>> the BioPerl website. You can use it at the following URL: > > >>> > > >>> http://bioperl.org/cgi-bin/deob_interface.cgi > > >>> > > >>> Many thanks to Laura Kavanaugh and David Messina for this great > > >>> contribution to the BioPerl project! > > >>> > > >>> Mauricio. 
From arareko at campus.iztacala.unam.mx Mon May 15 15:20:10 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Mon, 15 May 2006 14:20:10 -0500 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: <000901c67827$d99eabb0$15327e82@pyrimidine> References: <000901c67827$d99eabb0$15327e82@pyrimidine> Message-ID: <4468D46A.8070203@campus.iztacala.unam.mx> Laura and Dave would be very happy to see all of your comments/suggestions/enhancements/complaints summarized in the appropriate wiki page. Just be sure to sign them properly with your name and date: http://bioperl.org/wiki/Deobfuscator I think they'll have to discuss which features will be nice to implement and which won't, depending on the direction they want their project to go. But don't worry, they're extremely nice people who are open to all kinds of ideas.
Best of all: the Deobfuscator is open-source, so everyone is invited to contribute to it; just ask them for the code :) On my side, I'm working on tweaking the code so that it can browse different BioPerl packages (core, run, ext) and their respective releases (stable, developer, cvs). Regards, Mauricio. Chris Fields wrote: >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Sendu Bala >> Sent: Monday, May 15, 2006 8:09 AM >> To: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Deobfuscator interface now available >> >> Amir Karger wrote: >>> This tool is quite nice, and may save me a lot of perldoc'ing. >> Yes, many thanks to everyone involved. > > The Deobfuscator currently indexes bioperl-1.4, so it's not completely > up-to-date. I believe Mauricio and Dave may be working on updating to the > newer versions and maybe bioperl-live, as well as getting the other bioperl > packages up and running. > > For modules added after v1.4 I use the script in the FAQ question mentioned > on the Deobfuscator wiki page to get up-to-date methods, then grab the > ActiveState HTML'd perldocs pumped out when installing using PPM (I make a > custom PPM/PPD file and install myself every once in a while): > > #!/usr/bin/perl -w > use Class::Inspector; > $class = shift || die "Usage: methods perl_class_name\n"; > eval "require $class"; > print join ("\n", sort @{Class::Inspector- > >>> A couple of minor interface thoughts. >>> >>> 1)There's quite a lot of methods for many of the classes. As such, I >>> think I'll often want to browse through what's available in a class. But >>> 60% or so of the screen real estate is used for "Enter a search >>> string... OR select a class from the list". IMO, it would be better to >>> have two pages, a search page and a result page.
It only takes a click >>> on Back (or a "new search" button) to get to a new search, and now you >>> can use your whole screen for reading your results. >> As the compromise it must be, I like the way it behaves. I don't like >> lots of windows. I especially don't like pop up windows. Right now when >> I'm using the bioperl docs I tend to have a whole bunch of tabs open to >> different class pages at once, so being able to see an overview all on >> one page in Deobfuscator is very nice. >> >> Further to that, I'd love it if clicking on a method name caused an >> in-place css(&|javascript) reveal (similar to how a well implemented >> drop down menu works in a website) rather than a new window opened. >> Alternatively, just have more columns in the results table, ie. usage, >> function, returns, args columns. I feel that opening a window for each >> method you want to understand is far too slow. > > Agreed. > >> I'd also really like a link to the code for the method as well. The >> bioperl docs are rarely complete enough that you can really understand >> what every method is supposed to do without looking at the code. > > The methods that pop up are in columns along with the class module that > implements the method. > > > If you click on that link you get PDOC documentation for the module which > includes most of the code (strangely, though Deobfuscator indexes bioperl > 1.4, the PDOC corresponds to bioperl-live). Is that what you meant, or > something a bit more detailed? > >>> 3) Minimalist is nice, but documentation is even nicer. It wasn't clear >>> to me that the search searches within class names rather than function >>> names. What I really want to know sometimes is which module has, say, >>> the revcom method in it. > > That's listed in the method results table (the next column has the module > with a link to the module's online docs). > > > Chris > > >> This would be a great feature to add. 
>> >> >> Another minor interface thought: >> 6) Have a little more cell padding in all the tables. Things are just a >> little too cramped and things start to look messy/ run into each other.
-- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM
From hlapp at gmx.net Mon May 15 15:23:55 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 15 May 2006 15:23:55 -0400 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: <000501c67852$e1bb55c0$15327e82@pyrimidine> References: <000501c67852$e1bb55c0$15327e82@pyrimidine> Message-ID: <57326DCD-D72B-4CED-801D-9E25609BF57C@gmx.net> I wasn't using the search. It's in the scrollable table for browsing. -hilmar On May 15, 2006, at 3:07 PM, Chris Fields wrote: > I'll have to give it a try on Mac OS X (we have an ancient G4 in > the lab > which I can try it on). I'll let you know what I find.
> > This is what I get when I do a search for 'Bio::Ont*' using Firefox > on WinXP > and this Deobfuscator link (http://bioperl.org/cgi-bin/ > deob_interface.cgi?); > all the classes have links that work (I added newline and tab to > make it a > bit more readable) : > > Bio::OntologyIO > Parser factory for Ontology formats > Bio::OntologyIO::Handlers::BaseSAXHandler > no short description available > Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler > no short description available > Bio::Ontology::OntologyI > Interface for an ontology implementation > Bio::Ontology::TermFactory > Instantiates a new Bio::Ontology::TermI (or derived class) through a > factory > Bio::Ontology::OntologyStore > A repository of ontologies > Bio::Ontology::RelationshipFactory > Instantiates a new Bio::Ontology::RelationshipI (or derived class) > through a factory > Bio::Ontology::Ontology > standard implementation of an Ontology > > So the names seem fine here. > > When I click on a class (Bio::Ontology::Ontology) I get in the results > section: > > Method Class > Returns > Usage > add_relationship Bio::Ontology::Ontology Its > argument. add_relationship(RelationshipI relationship): > RelationshipI > add_relationship_type Bio::Ontology::OntologyEngineI not > documented not documented > add_term Bio::Ontology::Ontology its > argument. add_term(TermI term): TermI > > ....and so on > > Where each method is clickable and opens a new page containing a > table: > > Bio::Ontology::Ontology::add_relationship > Usage add_relationship(RelationshipI relationship): RelationshipI > Function Adds a relationship object to the ontology engine. > Returns Its argument. > Args A RelationshipI object. > > > Each class is also linked to the bioperl-live PDOC. 
Clicking on class > Bio::Ontology::Ontology in the results table gets me this page (no new > page): > > http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html > > > Chris > >> -----Original Message----- >> From: Hilmar Lapp [mailto:hlapp at gmx.net] >> Sent: Monday, May 15, 2006 1:09 PM >> To: Chris Fields >> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' >> Subject: Re: [Bioperl-l] Deobfuscator interface now available >> >> Safari or Firefox on MacOSX don't do this. Note that the appearance >> in the browsable list is already different (the prefix is missing), >> and the JavaScript link also lacks the prefix in the module name in >> contrast to others, e.g., Bio::Ontology::Ontology (which is one of >> the few Bio::Ontology exceptions that do work and do display >> correctly). >> >> I suppose there is something peculiar about the code formatting of >> those modules? Some of the modules under Bio::OntologyIO are also >> affected BTW. >> >> What happens is after you click on the link the page apppears to >> reload (i.e., gets submitted) but the second table that is supposed >> open underneath the first doesn't appear. However, the sort-by drop >> down selector does appear. >> >> -hilmar >> >> On May 15, 2006, at 1:22 PM, Chris Fields wrote: >> >>> That's strange. Clicking on the list gives me the results for that >>> module. >>> When I click on the hyperlinks in the results section they open >>> fine; the >>> method column links opens a new page containing usage-function- >>> returns-args >>> and the class column links opens pdoc (same page) for bioperl- >>> live. I'm >>> using Firefox 1.5 on WinXP. 
>>> >>> Chris >>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp >>>> Sent: Monday, May 15, 2006 12:01 PM >>>> To: Mauricio Herrera Cuadra >>>> Cc: bioperl-l >>>> Subject: Re: [Bioperl-l] Deobfuscator interface now available >>>> >>>> Hey, thanks to Laura & David for this interface. >>>> >>>> Any idea why most of the Bio::Ontology::* modules show up without >>>> their leading Bio::Ontology? And clicking on those hyperlinks >>>> doesn't >>>> go anywhere either ... Anything different with those modules that I >>>> can fix? >>>> >>>> -hilmar >>>> >>>> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote: >>>> >>>>> I'm glad to announce the availability of the Deobfuscator >>>>> interface at >>>>> the BioPerl website. You can use it at the following URL: >>>>> >>>>> http://bioperl.org/cgi-bin/deob_interface.cgi >>>>> >>>>> Many thanks to Laura Kavanaugh and David Messina for this great >>>>> contribution to the BioPerl project! >>>>> >>>>> Mauricio. 
-- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : ===========================================================
From ClarkeW at AGR.GC.CA Mon May 15 15:40:15 2006 From: ClarkeW at AGR.GC.CA (Clarke, Wayne) Date: Mon, 15 May 2006 15:40:15 -0400 Subject: [Bioperl-l] Memory Leak in Bio::SearchIO Message-ID: <320530F83FA47047823E57F110DDEAADB159EC@onncrxms4.agr.gc.ca> Hey everyone, I have been developing some code to download and parse BLAST reports from a remote server using SOAP::Lite, as well as insert the results into a MySQL database. The problem I am having is that my program seems to be taking up a huge amount of RAM. For a single job of 10000 queries it can consume as much as a couple hundred MB inside an hour.
I realize that a lot of work is being done but this seems like way too much. This leads me to the subject of my post. I think I may have traced the source of the memory leak to Bio::SearchIO. I have used Devel::Size to track the size of my variables and done other debugging steps and have had no luck with resolving this very frustrating problem. My code is as follows:

    use Bio::SearchIO;

    # $connector, $query_id and $bdbi are set up elsewhere in the program
    my $result = $connector->getQueryResult($query_id);

    # read the BLAST report from the in-memory string
    open my $FH, "<", \$result or die "Cannot open report string: $!";
    my $searchio = Bio::SearchIO->new(-format => "blast", -fh => $FH);

    while (my $o_blast = $searchio->next_result()) {
        my $clone_id  = $o_blast->query_name();
        my $statement = $bdbi->form_push_SQL($o_blast, $clone_id, 5);
        # ... rest of the loop body not shown ...
    }

This is just the leading and trailing code surrounding the use of Bio::SearchIO, since there is quite a lot. I am mostly just wondering if anyone has ever had problems with SearchIO and its memory usage. I looked at the source code for it but am afraid it is out of my league. Any help/suggestions/questions would be great. Thanks

From dmessina at wustl.edu Mon May 15 15:34:10 2006 From: dmessina at wustl.edu (David Messina) Date: Mon, 15 May 2006 14:34:10 -0500 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: <000901c67827$d99eabb0$15327e82@pyrimidine> References: <000901c67827$d99eabb0$15327e82@pyrimidine> Message-ID: Responding to: >>> Amir Karger >> Sendu Bala > Chris Fields > The Deobfuscator currently indexes bioperl-1.4, so it's not completely > up-to-date. I believe Mauricio and Dave may be working on updating > to the > newer versions and maybe bioperl-live, as well as getting the other > bioperl > packages up and running. That's correct -- Mauricio is currently working on a version that will allow you to search 1.4, 1.5.1, or bioperl-live. The Deobfuscator indexes will be updated (daily?) to keep them in sync with the CVS repository. >>> A couple of minor interface thoughts. >>> >>> 1)There's quite a lot of methods for many of the classes.
As such, I >>> think I'll often want to browse through what's available in a >>> class. But >>> 60% or so of the screen real estate is used for "Enter a search >>> string... OR select a class from the list". IMO, it would be >>> better to >>> have two pages, a search page and a result page. It only takes >>> a click >>> on Back (or a "new search" button) to get to a new search, and >>> now you >>> can use your whole screen for reading your results. >> >> As the compromise it must be, I like the way it behaves. I don't like >> lots of windows. I especially don't like pop up windows. Right now >> when >> I'm using the bioperl docs I tend to have a whole bunch of tabs >> open to >> different class pages at once, so being able to see an overview >> all on >> one page in Deobfuscator is very nice. I think the current behavior makes sense as the default, but I like the idea of being able to view the search results in a separate window for easier browsing. Thanks for the suggestion; I'll add it to the list. >> Further to that, I'd love it if clicking on a method name caused an >> in-place css(&|javascript) reveal (similar to how a well implemented >> drop down menu works in a website) rather than a new window opened. >> Alternatively, just have more columns in the results table, ie. >> usage, >> function, returns, args columns. I feel that opening a window for >> each >> method you want to understand is far too slow. > > Agreed. Yeah, the way it currently works is admittedly lame, and was done as a placeholder until we figured out a better way to do it. An in-place reveal sounds like a good solution. >>> 2) Please sort the "select a class from the list" alphabetically. I >>> guess I can enter a search term to get the right classes, but it >>> would >>> be nice to be able to browse. Agreed. I think we were doing this in an earlier test version, but I must have left it out of the release I handed off to Mauricio. >>> 3) Minimalist is nice, but documentation is even nicer. 
It wasn't >>> clear >>> to me that the search searches within class names rather than >>> function >>> names. What I really want to know sometimes is which module has, >>> say, >>> the revcom method in it. >> >> This would be a great feature to add. That's a great idea. >>> 4) When I search for something that's not found, I get a screen that >>> looks pretty familiar, with the extra text "No match to string >>> found" >>> down at the bottom. It took me a while to even notice it. >>> (Studies show >>> that most users don't read most of the text on a page.) Bold >>> might be >>> nice here. Or put the error at the top of the screen. Or both. Added to the list. >>> 5) I'll save my stupidest comment for last - please make the page >>> title >>> "Bioperl Deobfuscator", so that when I bookmark it I'll know what >>> the >>> bookmark stands for. Added to the list. Not stupid, by the way -- much to my surprise, there are at least 2 or 3 other (obviously inferior :) ) deobfuscators floating around out there. >> Another minor interface thought: >> 6) Have a little more cell padding in all the tables. Things are >> just a >> little too cramped and things start to look messy/ run into each >> other. Added to the list. Thanks to all of you for taking the time to give such detailed feedback -- it's really helpful. There is a wiki page on the BioPerl site for this project (http://www.bioperl.org/wiki/Deobfuscator), so I'll be putting your comments there for tracking and further discussion. Please feel free to add to it. Dave -- Dave Messina WashU Genome Sequencing Center dmessina at wustl.edu 314-286-1825
From faruque at ebi.ac.uk Mon May 15 15:47:27 2006 From: faruque at ebi.ac.uk (Nadeem Faruque) Date: Mon, 15 May 2006 20:47:27 +0100 Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names Message-ID: <809AE0C7-A9B6-48A4-BF11-BA392C770CA9@ebi.ac.uk> >> My personal view is that having it as an annotation would serve no >> real >> purpose.
>> For me the whole point of any kind of species representation in
>> bioperl is to allow you to compare species in a biologically meaningful
>> way. If it's just some annotation then that means it's basically

I understand the need to find the species name of entries, especially now that so many complete genomes have been given their own strain-specific tax nodes, and I also think it is a shame that the NCBI tax dump does not give a rank to entries such as these (they cannot easily be distinguished from unofficial ranks higher in the tree without ascending the tree).

Would it be useful for the species name to be included within EMBL file headers, e.g. in a line called OB (OB is a terrible suggestion based on 'Organism Binomial' since OS is already in use)?

E.g. two examples of the species 'Apple stem grooving virus', where the second one would appear to be a different species without delving into the tax tree or the inclusion of an OB line.

AC   D14995; S47260;
DE   Apple stem grooving virus genome, complete sequence.
OS   Apple stem grooving virus
OB   Apple stem grooving virus
OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
OC   Capillovirus.

AC   AY646511;
DE   Citrus tatter leaf virus strain Kumquat 1, complete genome.
OS   Citrus tatter leaf virus
OB   Apple stem grooving virus
OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
OC   Capillovirus.

> My point is, a large number of users do NOT use, nor care about, taxonomic
> information to the degree they need to know the entire classification of the
> organism; many are just as happy about getting the scientific name only,
> which is in the GenBank/EMBL file itself. To take one extreme, it is not
> productive to force every user to download the NCBI tax database and use
> lookups just to convert sequences from EMBL format to GenBank format. It's
> not productive to allow users to spam the NCBI tax database remotely either,
> so hardcoding lookups is, IMHO, a big mistake.
I don't think you need to add any information to turn an embl-format file into a Genbank flatfile, but maybe I'm missing something obvious. Nadeem -- Dr S.M. Nadeem N. Faruque 9 Barley Court Saffron Walden Essex CB11 3HG 01799 500 120 From dmessina at wustl.edu Mon May 15 16:12:48 2006 From: dmessina at wustl.edu (David Messina) Date: Mon, 15 May 2006 15:12:48 -0500 Subject: [Bioperl-l] Deobfuscator interface now available Message-ID: <5A2309FD-8C6E-4349-99CC-B3EDA8B2F499@wustl.edu> On May 15, 2006, at 2:23 PM, Hilmar Lapp wrote: > I wasn't using the search. It's in the scrollable table for browsing. > -hilmar I'm seeing this too on OS X with Safari 2.0.3. If you type 'goflat' (without the quotes) into the search box, you'll see the behavior. Chris, can you try it again this way just to confirm it's an OS/browser-specific thing? Not sure what's going on, Hilmar -- I'll take a look. Dave From cjfields at uiuc.edu Mon May 15 16:56:29 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 15 May 2006 15:56:29 -0500 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: <57326DCD-D72B-4CED-801D-9E25609BF57C@gmx.net> Message-ID: <000a01c67862$0a00cab0$15327e82@pyrimidine> Okay, I see what you mean. Using the search term "Bio::Ont*" also explains why I didn't see it ;P. Yeah, the bug shows up here too (WinXP and Mac OS X), and those links are broken like you said. Could be something to do with indexing. 
Using the methods script in the FAQ (http://www.bioperl.org/wiki/FAQ#Why_can.27t_I_easily_get_a_list_of_all_the_methods_a_object_can_call.3F) I get this:

C:\Perl\Scripts>methods.pl Bio::OntologyIO::simplehierarchy
Bio::OntologyIO::simplehierarchy::Dumper
Bio::OntologyIO::simplehierarchy::basename
Bio::OntologyIO::simplehierarchy::dirname
Bio::OntologyIO::simplehierarchy::fileparse
Bio::OntologyIO::simplehierarchy::fileparse_set_fstype

Chris

> -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > Sent: Monday, May 15, 2006 2:24 PM > To: Chris Fields > Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' > Subject: Re: [Bioperl-l] Deobfuscator interface now available > > I wasn't using the search. It's in the scrollable table for browsing. > -hilmar > > On May 15, 2006, at 3:07 PM, Chris Fields wrote: > > > I'll have to give it a try on Mac OS X (we have an ancient G4 in > > the lab > > which I can try it on). I'll let you know what I find.
> > > > This is what I get when I do a search for 'Bio::Ont*' using Firefox > > on WinXP > > and this Deobfuscator link (http://bioperl.org/cgi-bin/ > > deob_interface.cgi?); > > all the classes have links that work (I added newline and tab to > > make it a > > bit more readable) : > > > > Bio::OntologyIO > > Parser factory for Ontology formats > > Bio::OntologyIO::Handlers::BaseSAXHandler > > no short description available > > Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler > > no short description available > > Bio::Ontology::OntologyI > > Interface for an ontology implementation > > Bio::Ontology::TermFactory > > Instantiates a new Bio::Ontology::TermI (or derived class) through a > > factory > > Bio::Ontology::OntologyStore > > A repository of ontologies > > Bio::Ontology::RelationshipFactory > > Instantiates a new Bio::Ontology::RelationshipI (or derived class) > > through a factory > > Bio::Ontology::Ontology > > standard implementation of an Ontology > > > > So the names seem fine here. > > > > When I click on a class (Bio::Ontology::Ontology) I get in the results > > section: > > > > Method Class > > Returns > > Usage > > add_relationship Bio::Ontology::Ontology > Its > > argument. add_relationship(RelationshipI relationship): > > RelationshipI > > add_relationship_type Bio::Ontology::OntologyEngineI not > > documented not documented > > add_term Bio::Ontology::Ontology its > > argument. add_term(TermI term): TermI > > > > ....and so on > > > > Where each method is clickable and opens a new page containing a > > table: > > > > Bio::Ontology::Ontology::add_relationship > > Usage add_relationship(RelationshipI relationship): RelationshipI > > Function Adds a relationship object to the ontology engine. > > Returns Its argument. > > Args A RelationshipI object. > > > > > > Each class is also linked to the bioperl-live PDOC. 
Clicking on class > > Bio::Ontology::Ontology in the results table gets me this page (no new > > page): > > > > http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html > > > > > > Chris > > > >> -----Original Message----- > >> From: Hilmar Lapp [mailto:hlapp at gmx.net] > >> Sent: Monday, May 15, 2006 1:09 PM > >> To: Chris Fields > >> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' > >> Subject: Re: [Bioperl-l] Deobfuscator interface now available > >> > >> Safari or Firefox on MacOSX don't do this. Note that the appearance > >> in the browsable list is already different (the prefix is missing), > >> and the JavaScript link also lacks the prefix in the module name in > >> contrast to others, e.g., Bio::Ontology::Ontology (which is one of > >> the few Bio::Ontology exceptions that do work and do display > >> correctly). > >> > >> I suppose there is something peculiar about the code formatting of > >> those modules? Some of the modules under Bio::OntologyIO are also > >> affected BTW. > >> > >> What happens is after you click on the link the page apppears to > >> reload (i.e., gets submitted) but the second table that is supposed > >> open underneath the first doesn't appear. However, the sort-by drop > >> down selector does appear. > >> > >> -hilmar > >> > >> On May 15, 2006, at 1:22 PM, Chris Fields wrote: > >> > >>> That's strange. Clicking on the list gives me the results for that > >>> module. > >>> When I click on the hyperlinks in the results section they open > >>> fine; the > >>> method column links opens a new page containing usage-function- > >>> returns-args > >>> and the class column links opens pdoc (same page) for bioperl- > >>> live. I'm > >>> using Firefox 1.5 on WinXP. 
> >>> > >>> Chris > >>> > >>>> -----Original Message----- > >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>>> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > >>>> Sent: Monday, May 15, 2006 12:01 PM > >>>> To: Mauricio Herrera Cuadra > >>>> Cc: bioperl-l > >>>> Subject: Re: [Bioperl-l] Deobfuscator interface now available > >>>> > >>>> Hey, thanks to Laura & David for this interface. > >>>> > >>>> Any idea why most of the Bio::Ontology::* modules show up without > >>>> their leading Bio::Ontology? And clicking on those hyperlinks > >>>> doesn't > >>>> go anywhere either ... Anything different with those modules that I > >>>> can fix? > >>>> > >>>> -hilmar > >>>> > >>>> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote: > >>>> > >>>>> I'm glad to announce the availability of the Deobfuscator > >>>>> interface at > >>>>> the BioPerl website. You can use it at the following URL: > >>>>> > >>>>> http://bioperl.org/cgi-bin/deob_interface.cgi > >>>>> > >>>>> Many thanks to Laura Kavanaugh and David Messina for this great > >>>>> contribution to the BioPerl project! > >>>>> > >>>>> Mauricio. 
From cjfields at uiuc.edu Mon May 15 17:29:14 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 15 May 2006 16:29:14 -0500 Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names In-Reply-To: <809AE0C7-A9B6-48A4-BF11-BA392C770CA9@ebi.ac.uk> Message-ID: <000b01c67866$9dac2620$15327e82@pyrimidine> > -----Original
Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Nadeem Faruque
> Sent: Monday, May 15, 2006 2:47 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::DB::Taxonomy:: mishandles
> species,subspecies/variant names
>
> >> My personal view is that having it as an annotation would serve no real
> >> purpose. For me the whole point of any kind of species representation in
> >> bioperl is to allow you to compare species in a biologically meaningful
> >> way. If it's just some annotation then that means it's basically
>
> I understand the need to find the species name of entries, especially
> now that so many complete genomes have been given their own strain-specific
> tax nodes, and I also think it is a shame that the ncbi tax
> dump does not give a rank to entries such as these (they cannot
> easily be distinguished from unofficial ranks higher in the tree
> without ascending the tree).
> Would it be useful for the species name to be included within EMBL
> file headers, eg in a line called OB (OB is a terrible suggestion
> based on 'Organism Binomial' since OS is already in use)?
>
> eg two examples of the species 'Apple stem grooving virus', where the
> second one would appear to be a different species without delving
> into the tax tree or the inclusion of an OB line.
>
> AC   D14995; S47260;
> DE   Apple stem grooving virus genome, complete sequence.
> OS   Apple stem grooving virus
> OB   Apple stem grooving virus
> OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
> OC   Capillovirus.
>
> AC   AY646511;
> DE   Citrus tatter leaf virus strain Kumquat 1, complete genome.
> OS   Citrus tatter leaf virus
> OB   Apple stem grooving virus
> OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
> OC   Capillovirus.

Jason also mentions a few examples (see below).
The problem lies in the fact that EMBL and GenBank flatfiles do not give
hierarchy ranking for taxonomy, so it's a best guess. What I'm seeing is that
the guess is wrong more often than not when it comes to complex scientific
names (viruses, bacteria, etc). Notice the doubling of the strain in the
following GenBank files passed through SeqIO (genbank->genbank conversion,
BTW; haven't tried EMBL):

SOURCE      Azoarcus sp. EbN1 EbN1
  ORGANISM  Azoarcus sp.
            Bacteria; Proteobacteria; Betaproteobacteria; Rhodocyclales;
            Rhodocyclaceae; Azoarcus.

SOURCE      Mycobacterium sp. KMS KMS
  ORGANISM  Mycobacterium sp.
            Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
            Corynebacterineae; Mycobacteriaceae; Mycobacterium.

SOURCE      Mycobacterium tuberculosis C C
  ORGANISM  Mycobacterium tuberculosis
            Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
            Corynebacterineae; Mycobacteriaceae; Mycobacterium; Mycobacterium;
            tuberculosis complex; Mycobacterium.

SOURCE      Bacillus subtilis subsp. subtilis str. 168 subtilis str. 168
  ORGANISM  Bacillus subtilis subsp.
            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus.

Here are Jason's examples, for posterity:

Can you guess what value is the strain versus sub-species? What happens when
there is a two part strain name (space separated) and a sub-species or
variety designation?

SOURCE      Staphylococcus haemolyticus JCSC1435
  ORGANISM  Staphylococcus haemolyticus JCSC1435
            Bacteria; Firmicutes; Bacillales; Staphylococcus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=279808
strain is JCSC1435

versus

SOURCE      Muntiacus muntjak vaginalis
  ORGANISM  Muntiacus muntjak vaginalis
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
            Pecora; Cervidae; Muntiacinae; Muntiacus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9887
species is muntjak, sub-species vaginalis ?
versus

SOURCE      Aspergillus nidulans FGSC A4
  ORGANISM  Aspergillus nidulans FGSC A4
            Eukaryota; Fungi; Ascomycota; Pezizomycotina; Eurotiomycetes;
            Eurotiales; Trichocomaceae; Emericella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=227321
Genus should be Aspergillus or Emericella ?

Strain and subspecies/variety in the same entry

SOURCE      Cryptococcus neoformans var. grubii H99
  ORGANISM  Cryptococcus neoformans var. grubii H99
            Eukaryota; Fungi; Basidiomycota; Hymenomycetes;
            Heterobasidiomycetes; Tremellomycetidae; Tremellales;
            Tremellaceae; Filobasidiella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=235443

> > My point is, a large number of users do NOT use, nor care about, taxonomic
> > information to the degree they need to know the entire classification of
> > the organism; many are just as happy about getting the scientific name
> > only, which is in the GenBank/EMBL file itself. To take one extreme, it is
> > not productive to force every user to download the NCBI tax database and
> > use lookups just to convert sequences from EMBL format to GenBank format.
> > It's not productive to allow users to spam the NCBI tax database remotely
> > either, so hardcoding lookups is, IMHO, a big mistake.
>
> I don't think you need to add any information to turn an embl-format
> file into a Genbank flatfile, but maybe I'm missing something obvious.

The issue is the way the SOURCE and ORGANISM lines are handled (OS/OC lines in
EMBL, I believe), which is using a Bio::Species object. The problem is, like I
mentioned above, no hierarchical ranking is in the flat file, just the order
of the ranking. We can try to make a best guess based on that but it's
obviously very tricky, particularly when dealing with subspecies, strains,
etc.
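To make the ambiguity concrete, here is a minimal sketch of a purely positional guess applied to organism names from the examples in this thread. It is an illustration only: naive_split is a hypothetical helper and is not how Bio::Species or Bio::SeqIO::genbank actually parse these lines. It assumes the first token is the genus and the second the species, and shows that whatever is left over could be a strain, a subspecies, or a variety plus a strain, with nothing in the name to tell them apart.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper for illustration only; this is NOT the parsing used by
# Bio::Species or Bio::SeqIO::genbank. It splits purely on whitespace.
sub naive_split {
    my ($organism) = @_;
    my ($genus, $species, @rest) = split ' ', $organism;
    return {
        genus   => $genus,
        species => $species,
        rest    => join(' ', @rest),   # strain? subspecies? variety? unknown
    };
}

# Organism names taken from the examples in this thread
my @names = (
    'Staphylococcus haemolyticus JCSC1435',     # leftover is a strain
    'Muntiacus muntjak vaginalis',              # leftover is a subspecies
    'Cryptococcus neoformans var. grubii H99',  # leftover is variety + strain
    'Azoarcus sp. EbN1',                        # "species" is only "sp."
);

for my $name (@names) {
    my $p = naive_split($name);
    printf "%-42s genus=%s species=%s rest=[%s]\n",
        $name, $p->{genus}, $p->{species}, $p->{rest};
}
```

The first two names have the same shape (three tokens) but the third token means different things, which is why a guess based on token position alone cannot be reliable without rank information from a taxonomy lookup.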
NCBI also states that many times the classification can be too long for a
file so may be incomplete (I think they leave out nodes which have 'no rank'
tags, but I can't be completely sure), so there's another issue.

Anyway, this is where the lookup would come in, which would require a local
taxonomy database (we can't spam the NCBI remote database, that would just be
rude) which would give the complete taxonomic classification if it worked
properly. So now we have three possible situations:

1) One extreme : We require a lookup to get it right (which, BTW, it
   currently doesn't); this by default requires a local database.
2) Middle of the road : we try and guess the information as best as we can
   with the information given (the current situation); this is breaking more
   and more often now, so is becoming more unreliable.
3) Other extreme : we punt and absolve ourselves of even trying to parse the
   data and just have a strict tagname->value or similar simple construct to
   handle the data.

#3 as default with option to do #1 is probably best (least error prone with
option for most information), with caching to speed up lookups as Sendu Bala
does now.

Chris

> Nadeem

From hlapp at gmx.net Mon May 15 17:37:56 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 17:37:56 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000a01c67862$0a00cab0$15327e82@pyrimidine>
References: <000a01c67862$0a00cab0$15327e82@pyrimidine>
Message-ID: <6CCA8112-651D-4154-94AE-88FE8EFBCD27@gmx.net>

It does have the following line though (and a 'use' statement for
OntologyIO):

    @ISA = qw( Bio::OntologyIO );

So what is it doing 'wrong' (there aren't any tests or so in which
anything erroneous would show)?

-hilmar

On May 15, 2006, at 4:56 PM, Chris Fields wrote:

> Okay, I see what you mean. Using the search term "Bio::Ont*" also explains
> why I didn't see it ;P. Yeah, the bug shows up here too (WinXP and Mac OS
> X), and those links are broken like you said. Could be something to do with
> indexing.
>
> Using the methods script in the FAQ
> (http://www.bioperl.org/wiki/FAQ#Why_can.27t_I_easily_get_a_list_of_all_the_methods_a_object_can_call.3F)
> I get this:
>
> C:\Perl\Scripts>methods.pl Bio::OntologyIO::simplehierarchy
> Bio::OntologyIO::simplehierarchy::Dumper
> Bio::OntologyIO::simplehierarchy::basename
> Bio::OntologyIO::simplehierarchy::dirname
> Bio::OntologyIO::simplehierarchy::fileparse
> Bio::OntologyIO::simplehierarchy::fileparse_set_fstype
>
> Chris

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at uiuc.edu Mon May 15 18:03:48 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 17:03:48 -0500
Subject: [Bioperl-l]
Deobfuscator interface now available
In-Reply-To: <6CCA8112-651D-4154-94AE-88FE8EFBCD27@gmx.net>
Message-ID: <000d01c6786b$71c04e60$15327e82@pyrimidine>

And Bio::OntologyIO works on its own:

C:\Perl\Scripts>methods.pl Bio::OntologyIO
Bio::OntologyIO::DESTROY
Bio::OntologyIO::new
Bio::OntologyIO::next_ontology
Bio::OntologyIO::term_factory
Bio::OntologyIO::unescape
Bio::Root::IO::catfile
Bio::Root::IO::close
Bio::Root::IO::dup
Bio::Root::IO::exists_exe
Bio::Root::IO::file
Bio::Root::IO::flush
Bio::Root::IO::gensym
Bio::Root::IO::mode
Bio::Root::IO::noclose
Bio::Root::IO::qualify
Bio::Root::IO::qualify_to_ref
Bio::Root::IO::rmtree
Bio::Root::IO::tempdir
Bio::Root::IO::tempfile
Bio::Root::IO::ungensym
Bio::Root::Root::confess
Bio::Root::Root::debug
Bio::Root::Root::throw
Bio::Root::Root::verbose
Bio::Root::RootI::carp
Bio::Root::RootI::deprecated
Bio::Root::RootI::stack_trace
Bio::Root::RootI::stack_trace_dump
Bio::Root::RootI::throw_not_implemented
Bio::Root::RootI::warn
Bio::Root::RootI::warn_not_implemented

But when I try these:

C:\Perl\Scripts>methods.pl Bio::OntologyIO::goflat

C:\Perl\Scripts>methods.pl Bio::OntologyIO::dagflat

I get nada. It could be related to the way the methods are parsed using
Class::Inspector :

    print join ("\n", sort
        @{Class::Inspector->methods($class,'full','public')}), "\n";

I haven't tried it on all the weird Bio::Ontology-missing modules (don't have
time today).
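For readers who want to reproduce this kind of listing without extra modules, here is a rough core-Perl sketch. It is not the FAQ's methods.pl and it does not use Class::Inspector: list_methods is a hypothetical helper that walks the symbol table of each class in the inheritance chain (via mro::get_linear_isa, core since Perl 5.10). Because it reads the symbol table, functions imported into a package show up alongside real methods, and a class that fails to load is reported with a warning instead of silently printing nothing.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use mro;    # core since Perl 5.10; provides mro::get_linear_isa

# Hypothetical helper (not part of BioPerl or Class::Inspector): return the
# fully qualified names of all subs visible in $class and its parent classes.
sub list_methods {
    my ($class) = @_;
    my (%seen, @found);
    for my $pkg (@{ mro::get_linear_isa($class) }) {
        no strict 'refs';
        my $stash = \%{"${pkg}::"};                    # package symbol table
        for my $name (sort keys %$stash) {
            next if $name =~ /::$/;                    # skip nested packages
            next unless defined &{"${pkg}::${name}"};  # keep only subs
            next if $seen{$name}++;                    # child overrides parent
            push @found, "${pkg}::${name}";
        }
    }
    return sort @found;
}

# Usage: perl list_methods.pl Some::Class [Other::Class ...]
for my $class (@ARGV) {
    # String eval so that a class name from the command line can be loaded
    eval "require $class; 1"
        or do { warn "could not load $class: $@"; next };
    print "$_\n" for list_methods($class);
}
```

If a class prints only imported helpers, or nothing at all, the warning (or its absence) at least separates "failed to load" from "loaded but defines no subs under that package name".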
It's not common to all of those modules though:

C:\Perl\Scripts>methods.pl Bio::OntologyIO::InterProParser
Bio::OntologyIO::DESTROY
Bio::OntologyIO::InterProParser::next_ontology
Bio::OntologyIO::InterProParser::parse
Bio::OntologyIO::InterProParser::secondary_accessions_map
Bio::OntologyIO::new
Bio::OntologyIO::term_factory
Bio::OntologyIO::unescape
Bio::Root::IO::catfile
Bio::Root::IO::close
Bio::Root::IO::dup
Bio::Root::IO::exists_exe
Bio::Root::IO::file
Bio::Root::IO::flush
Bio::Root::IO::gensym
Bio::Root::IO::mode
Bio::Root::IO::noclose
Bio::Root::IO::qualify
Bio::Root::IO::qualify_to_ref
Bio::Root::IO::rmtree
Bio::Root::IO::tempdir
Bio::Root::IO::tempfile
Bio::Root::IO::ungensym
Bio::Root::Root::confess
Bio::Root::Root::debug
Bio::Root::Root::throw
Bio::Root::Root::verbose
Bio::Root::RootI::carp
Bio::Root::RootI::deprecated
Bio::Root::RootI::stack_trace
Bio::Root::RootI::stack_trace_dump
Bio::Root::RootI::throw_not_implemented
Bio::Root::RootI::warn
Bio::Root::RootI::warn_not_implemented

Chris
From cjfields at uiuc.edu Mon May 15 20:14:28 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Mon, 15 May 2006 19:14:28 -0500
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
Message-ID:

---- Original message ----
>Date: Mon, 15 May 2006 15:40:15 -0400
>From: "Clarke, Wayne"
>Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
>To:
>
>Hey everyone,
>
>I have been developing some code to download and parse blast reports
>from a remote server using Soap::Lite as well as insert the results into
>a mysql database. The problem I am having is that my program seems to be
>taking up and huge amount of RAM. For a single job of 10000 queries it
>can consume as much as a couple hundred Mb inside an hour.

If you're parsing 10000 queries (10000 different BLAST reports, right?) then
it's not necessarily a memory leak as much as it is object creation. Each
report generates hit objects which in turn generate hsp objects. I think Jason
recommends using the tabular output option (-m8 or -m9) for huge reports as it
cuts down considerably on this. If you are cycling through each report it
shouldn't be as much of a problem unless your BLAST reports are really huge.
Have you tried parsing a single report to see if the problem persists?

Now, if you are using Bioperl 1.5.1 with BLAST 2.2.13 or newer, you'll likely
run into a problem with an infinite loop that occurs due to a change in NCBI's
text output. You can try updating bioperl from CVS in either case to see if
that helps any. Tabular output and XML output, AFAIK, is the same regardless
of version; this bug only affected text output of BLAST reports.

> I realize
>that a lot of work is being done but this seems like way too much. This
>leads me to the subject of my post. I think I may have traced the source
>of the memory leak to Bio::SearchIO.
>I have used Devel::Size to track
>the size of my variables and done other debugging steps and have had no
>luck with resolving this very frustrating problem. My code is as
>follows:
>
>    my $result = $connector->getQueryResult($query_id);
>
>    my $FH;
>    open $FH, "<", \$result;
>
>    my $searchio = new Bio::SearchIO(-format => "blast",
>                                     -fh => $FH);
>
>    while (my $o_blast = $searchio->next_result()) {
>        my $clone_id = $o_blast->query_name();
>
>        my $statement = $bdbi->form_push_SQL ($o_blast, $clone_id, 5);
>
>this is just the leading and tailing code surrounding the use of
>Bio::SearchIO since there is quite a lot. I am mostly just wondering if
>anyone has ever had problems with SearchIO and its memory usage. I
>looked at the source code for it but am afraid it is out of my league.
>Any help/suggestions/questions would be great. Thanks

From torsten.seemann at infotech.monash.edu.au Mon May 15 20:18:44 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 16 May 2006 10:18:44 +1000
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
In-Reply-To: <320530F83FA47047823E57F110DDEAADB159EC@onncrxms4.agr.gc.ca>
References: <320530F83FA47047823E57F110DDEAADB159EC@onncrxms4.agr.gc.ca>
Message-ID: <44691A64.8040607@infotech.monash.edu.au>

> taking up and huge amount of RAM. For a single job of 10000 queries it
> can consume as much as a couple hundred Mb inside an hour. I realize

> my $result = $connector->getQueryResult($query_id);
> my $searchio = new Bio::SearchIO(-format => "blast",
> while (my $o_blast = $searchio->next_result()) {
> my $clone_id = $o_blast->query_name();
> my $statement = $bdbi->form_push_SQL ($o_blast, $clone_id, 5);
}

Some comments:

Have you considered that whatever class/module $bdbi belongs to is causing the
problem? ie.
is it keeping a reference to $o_blast around?

Are you aware that Perl garbage collection does not necessarily return freed
memory back to the OS? This may affect how you were measuring "memory usage".

--
Dr Torsten Seemann http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia

From kmdaily at indiana.edu Mon May 15 17:00:12 2006
From: kmdaily at indiana.edu (Daily, Kenneth Michael)
Date: Mon, 15 May 2006 17:00:12 -0400
Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
Message-ID: <20528E699A515C499B80C222BDBEBC34043FF8@iu-mssg-mbx108.ads.iu.edu>

I just installed Bioperl 1.4, and entrezgene.pm is not included (should be in
Bio/SeqIO). How can I get this module?

Kenny Daily
IU School of Informatics
kmdaily at indiana.edu

From letondal at pasteur.fr Tue May 16 02:06:19 2006
From: letondal at pasteur.fr (Catherine Letondal)
Date: Tue, 16 May 2006 08:06:19 +0200
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To:
References: <000901c67827$d99eabb0$15327e82@pyrimidine>
Message-ID: <9c36140009c3d80bbb0d543376afa6e0@pasteur.fr>

On May 15, 2006, at 9:34 PM, David Messina wrote:

>>>> A couple of minor interface thoughts.
>>>>
>>>> 1)There's quite a lot of methods for many of the classes. As such, I
>>>> think I'll often want to browse through what's available in a class. But
>>>> 60% or so of the screen real estate is used for "Enter a search
>>>> string... OR select a class from the list". IMO, it would be better to
>>>> have two pages, a search page and a result page. It only takes a click
>>>> on Back (or a "new search" button) to get to a new search, and now you
>>>> can use your whole screen for reading your results.
>>>
>>> As the compromise it must be, I like the way it behaves. I don't like
>>> lots of windows. I especially don't like pop up windows.
Right now >>> when >>> I'm using the bioperl docs I tend to have a whole bunch of tabs >>> open to >>> different class pages at once, so being able to see an overview >>> all on >>> one page in Deobfuscator is very nice. > > I think the current behavior makes sense as the default, but I like > the idea of being able to view the search results in a separate > window for easier browsing. Thanks for the suggestion; I'll add it to > the list. > First, thanks for this very useful Web interface! There are examples (quite ajaxian ones) that reach a compromise between several windows for easily browsing large results, and composing everything in one window to get an overview - the 2 examples that come in my mind currently are (not biology related): - http://montreal.mspace.fm/chi/sched/ - http://www.live.com/ (see the slider on the top right enabling to squeeze or enlarge the results area) -- Catherine Letondal -- Institut Pasteur From cjfields at uiuc.edu Tue May 16 07:38:42 2006 From: cjfields at uiuc.edu (Christopher Fields) Date: Tue, 16 May 2006 06:38:42 -0500 Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO Message-ID: <36b52ba4.b94c5b79.8198c00@expms6.cites.uiuc.edu> You'll have to install from CVS. I believe Brian added Entrezgene.pm after the lst developer release (1.5.1): http://www.bioperl.org/wiki/Installing_BioPerl Chris ---- Original message ---- >Date: Mon, 15 May 2006 17:00:12 -0400 >From: "Daily, Kenneth Michael" >Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO >To: > >I just installed Bioperl 1.4, and entrezgene.pm is not included (should be in Bio/SeqIO). How can I get this module? 
> >Kenny Daily >IU School of Informatics >kmdaily at indiana.edu > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l From bernd.web at gmail.com Tue May 16 07:37:46 2006 From: bernd.web at gmail.com (Bernd Web) Date: Tue, 16 May 2006 13:37:46 +0200 Subject: [Bioperl-l] Bio::DB::Query::GenBank checks Message-ID: <716af09c0605160437tfcf824dxa514f38f6b94d423@mail.gmail.com> Hi all, I was using Bio::DB::Query::GenBank to obtain only IDs from Entrez and found some issues and differences (bugs?) in behaviour wrt the pod. Do these look familiar ? Some example code: my $query = Bio::DB::Query::GenBank->new (-query =>'Lassa Virus[ORGN]', -reldate => '30', -db => 'protein', -ids => [195052,2981014,11127914], -maxids => 30 ); $gb = new Bio::DB::GenBank(format=>'fasta'); my $seqio = $gb->get_Stream_by_query($query); while (my $seq = $seqio->next_seq) { print $seq->desc,"\n"; } The module states that if we provide -ids that: If you provide an array reference of IDs in -ids, the query will be ignored and the list of IDs will be used when the query is passed to a Bio::DB::GenBank object's get_Stream_by_query() method. In the above case actually the query is passed ('Lassa Virus[ORGN]), not the IDs. Also $query->query shows the original query. Am I doing something wrong or is the pod not reflecting current behaviour of this module? I was also surprised that if internet is down no warning is thrown for $query->query or $query->count at all. Only the get_Stream_by_query above will warn us if the site is unreachable (500 Internal Server Error). $query->ids or $query->count will not throw a warning and @ids=$query->ids will just be an empty array. (I realize $query->count is not initialized, so I am using this now to check for succes, but a warning from WebDBSeqI would me more approprotiate I think). 
Last, the example from the pod is not working, but no warnings are raised: # initialize the list yourself my $query = Bio::DB::Query::GenBank->new(-ids=>[195052,2981014,11127914]); $query->count returns zero w/o any warning. Of course this query did not specify a DB. Only if we specify -db=>'nucleotide' $query->count is 3. However, why not any warning if we set -db->'protein' or if we did not set this? On the NCBI website searching Protein DB returns for 19505: See Details. No items found. The following term(s) refer to a different DB:195052 But this is not reflected via Bio::DB::Query::GenBank. Can I check for this situation in the code apart from checking on $query->count == 0 ? Or would it indeed be better to check for these situations in the module? Regards, Bernd From chen_li3 at yahoo.com Tue May 16 10:55:51 2006 From: chen_li3 at yahoo.com (chen li) Date: Tue, 16 May 2006 07:55:51 -0700 (PDT) Subject: [Bioperl-l] module for 6 reading frames Message-ID: <20060516145551.50370.qmail@web36802.mail.mud.yahoo.com> Hi all, I wonder which module is available for translating DNA sequence into 6 reading frames. Thank you, Li __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From smarkel at scitegic.com Tue May 16 11:10:35 2006 From: smarkel at scitegic.com (smarkel at scitegic.com) Date: Tue, 16 May 2006 08:10:35 -0700 Subject: [Bioperl-l] module for 6 reading frames In-Reply-To: <20060516145551.50370.qmail@web36802.mail.mud.yahoo.com> Message-ID: Li, Use the translate() function in Bio::Tools::CodonTable. Scott Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel at scitegic.com SciTegic Inc. 
mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 279 8804 USA web: http://www.scitegic.com bioperl-l-bounces at lists.open-bio.org wrote on 16.05.2006 07:55:51: > Hi all, > > I wonder which module is available for translating DNA > sequence into 6 reading frames. > > Thank you, > > Li From golharam at umdnj.edu Tue May 16 12:18:19 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Tue, 16 May 2006 12:18:19 -0400 Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene? Message-ID: <001f01c67904$59b08ad0$2f01a8c0@GOLHARMOBILE1> I just updated my local copy of bioperl from cvs. When I ran the configure script, it says I need the external module Bio::ASN1::EntrezGene. Which package contains this module? -- Ryan Golhar - golharam at umdnj.edu The Informatics Institute of UMDNJ From golharam at umdnj.edu Tue May 16 12:24:03 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Tue, 16 May 2006 12:24:03 -0400 Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene? Message-ID: <002001c67905$2622a580$2f01a8c0@GOLHARMOBILE1> Never mind. I see its in CPAN. -----Original Message----- From: Ryan Golhar [mailto:golharam at umdnj.edu] Sent: Tuesday, May 16, 2006 12:18 PM To: 'bioperl-l at bioperl.org' Subject: Where is Bio::ASN1::EntrezGene? I just updated my local copy of bioperl from cvs. When I ran the configure script, it says I need the external module Bio::ASN1::EntrezGene. Which package contains this module? -- Ryan Golhar - golharam at umdnj.edu The Informatics Institute of UMDNJ From cjfields at uiuc.edu Tue May 16 13:27:32 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 16 May 2006 12:27:32 -0500 Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene? 
In-Reply-To: <001f01c67904$59b08ad0$2f01a8c0@GOLHARMOBILE1> Message-ID: <002701c6790e$03d8f110$15327e82@pyrimidine> It's actually not part of Bioperl currently; you can find it on CPAN: http://search.cpan.org/~mingyiliu/Bio-ASN1-EntrezGene-1.091/lib/Bio/ASN1/Ent rezGene.pm Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Ryan Golhar > Sent: Tuesday, May 16, 2006 11:18 AM > To: bioperl-l at bioperl.org > Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene? > > I just updated my local copy of bioperl from cvs. When I ran the > configure script, it says I need the external module > Bio::ASN1::EntrezGene. Which package contains this module? > > -- > Ryan Golhar - golharam at umdnj.edu > The Informatics Institute of UMDNJ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From ClarkeW at AGR.GC.CA Tue May 16 16:57:13 2006 From: ClarkeW at AGR.GC.CA (Clarke, Wayne) Date: Tue, 16 May 2006 16:57:13 -0400 Subject: [Bioperl-l] Memory Leak in Bio::SearchIO Message-ID: <320530F83FA47047823E57F110DDEAADB159FB@onncrxms4.agr.gc.ca> With regards to the suggestions/comments made thank you. However I think I should clear a few things up. I am running bioperl v1.4, I am cycling through the blast reports which should not be of absurd size since they only contain the top 5 hits, and I am using top to track(although I realize fairly inacuately) the memory usage. I have looked through the code for both AAFCBLAST and BEAST_UPDATE but do not believe the leak/problem to be contained within them since they are almost exclusively using method calls and those variables should be destroyed upon leaving the scope of the method. I have used Devel::Size to check the size of the variables $bdbi and $searchio and $connector and on each iteration these variables have the same size. 
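Torsten's first question, whether some helper is keeping a reference to each result, can be tested in isolation. The sketch below is not the code from this thread: Demo::Result and Demo::Hoarder are hypothetical stand-ins for a parsed result and for a helper such as $bdbi, and the instance counter exists only for the demonstration. It shows that a lexical declared inside a loop is freed on every iteration unless something else still holds a reference to it.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parsed result; counts instances still alive.
package Demo::Result;
our $live = 0;
sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    $live++;
    return bless $self, $class;
}
sub DESTROY { $live-- }

# Hypothetical stand-in for a helper that quietly caches what it is given.
package Demo::Hoarder;
sub new {
    my ($class) = @_;
    my $self = { seen => [] };
    return bless $self, $class;
}
sub push_result {
    my ($self, $result) = @_;
    push @{ $self->{seen} }, $result;    # this reference keeps $result alive
    return scalar @{ $self->{seen} };
}
sub forget {
    my ($self) = @_;
    @{ $self->{seen} } = ();
    return;
}

package main;

# Create $n results in a loop; optionally hand each one to a helper.
# Returns how many results are still alive once the loop has finished.
sub run_loop {
    my ($helper, $n) = @_;
    for my $i (1 .. $n) {
        my $result = Demo::Result->new(id => $i);   # scoped to this iteration
        $helper->push_result($result) if $helper;
    }
    return $Demo::Result::live;
}

my $without = run_loop(undef, 1000);      # nothing retains the results
my $hoarder = Demo::Hoarder->new;
my $with    = run_loop($hoarder, 1000);   # the helper retains every result
$hoarder->forget;
my $after   = $Demo::Result::live;        # released once the cache is cleared

print "alive without helper:      $without\n";   # 0
print "alive with caching helper: $with\n";      # 1000
print "alive after clearing:      $after\n";     # 0
```

If a counter like this grows with the number of iterations in a real run, some object is holding on to the results. If it stays flat, the growth reported by top may simply be memory that perl has freed internally but not returned to the OS, as Torsten notes.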
Any other suggestions would be greatly appreciated as I have nearly gone
insane trying to track this problem down.

Thanks, Wayne

From smarkel at scitegic.com Tue May 16 16:52:05 2006
From: smarkel at scitegic.com (smarkel at scitegic.com)
Date: Tue, 16 May 2006 13:52:05 -0700
Subject: [Bioperl-l] module for 6 reading frames
In-Reply-To: <20060516200436.34908.qmail@web36812.mail.mud.yahoo.com>
Message-ID:

Li,

You can either do the substring, and reverse complement, yourself
or you can use the translate() function in Bio::PrimarySeq. It
inherits from Bio::PrimarySeqI, so check there for the documentation.
That translate() function takes a "-frame" argument.

Scott

PS In future, please respond to the list. That way others see
the questions and answers.

chen li wrote on 16.05.2006 13:04:36:

> Dear Dr. Markel,
>
> I browse through the document of
> Bio::Tools::CodonTable and find this line:
>
> my $translation= $CodonTable->translate($seq);
>
> I think this line is to do the translation. Here is my
> question: which line in the doc says how to translate
> the remaining frames 2,3, and -1, -2, -3?
>
> Thank you,
>
> Li

From cjfields at uiuc.edu Tue May 16 17:15:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 May 2006 16:15:10 -0500
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
In-Reply-To: <320530F83FA47047823E57F110DDEAADB159FB@onncrxms4.agr.gc.ca>
Message-ID: <000601c6792d$d0ab1500$15327e82@pyrimidine>

I mentioned two possibilities last time I posted: 1) that the BLAST file was
too large, or 2) that you are using an old version of bioperl in which
SearchIO is broken. You seem to fit #2.

The issue is that NCBI does not consider text BLAST output sacrosanct and
routinely makes changes to it that break parsing. Due to this,
SearchIO::blast needs to be constantly updated, so much so that there are
normally a few updates a year to fix parsing issues in that module alone
compared to BioPerl as a whole. And, BTW, although bioperl-1.4 is about 2
years old now, even bioperl-1.5.1 SearchIO is broken when it comes to the
latest NCBI BLAST (2.2.14 now).

I seriously suggest updating your local bioperl distribution to the latest
bioperl-live (from CVS). Take one of those 10000 reports, just one, and try
parsing it. If you have the same problem (a CPU spike and increasing memory
usage) then it may be fixed in CVS.

Chris

From ClarkeW at AGR.GC.CA Tue May 16 17:24:51 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Tue, 16 May 2006 17:24:51 -0400
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
Message-ID: <320530F83FA47047823E57F110DDEAADB159FC@onncrxms4.agr.gc.ca>

Thanks Chris,

I did forget to mention however that I did parse one single report and
found no problems; it finished fast and with no noticeable memory usage.
I will consider getting my SA to update bioperl from CVS as a precaution
but he has already stated he prefers to wait for the release of v1.5.
Even a single job of 10000 will finish but the problem is that I am
trying to loop through many jobs of 10000 and it seems to be additive
for reasons I can not determine. During testing I noticed that the RSS
on top decreased around 80% MEM usage, but then the shared mem
increased. I am wondering if this is due to the perl garbage collector
freeing up memory but keeping it in its pool for use; if so that is fine
as long as it does not then want to reach into swapped mem.

Thanks again, Wayne

From cjfields at uiuc.edu Tue May 16 17:45:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 May 2006 16:45:16 -0500
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
In-Reply-To: <320530F83FA47047823E57F110DDEAADB159FC@onncrxms4.agr.gc.ca>
Message-ID: <000801c67932$050dbd30$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne
> Sent: Tuesday, May 16, 2006 4:25 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO
>
> Thanks Chris,
>
> I did forget to mention however that I did parse one single report and
> found no problems, it finished fast and with no noticeable memory usage.
> I will consider getting my SA to update bioperl from CVS as a precaution
> but he has already stated he prefers to wait for the release of v1.5.

Um, you can tell him the last release was v.1.5.1 (last October). It's
considered a developer release but is pretty stable; well, except for that
whole SearchIO quibble, and that's not our fault.

You could also install a local version in case he doesn't budge; see here:

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_IN_A_PERSONAL_MODULE_AREA

Chris

> Even a single job of 10000 will finish but the problem is that I am
> trying to loop through many jobs of 10000 and it seems to be additive
> for reasons I can not determine. During testing I noticed that the RSS
> on top decreased around 80% MEM usage, but then the shared mem
> increased.
> I am wondering if this is due to the perl garbage collector
> freeing up memory but keeping it in its pool for use, if so that is fine
> as long as it does not then want to reach into swapped mem.
>
> Thanks again, Wayne
> ...

From cjfields at uiuc.edu Tue May 16 18:20:29 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 May 2006 17:20:29 -0500
Subject: [Bioperl-l] Bio::DB::Query::GenBank checks
In-Reply-To: <716af09c0605160437tfcf824dxa514f38f6b94d423@mail.gmail.com>
Message-ID: <000901c67936$f0896990$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Bernd Web
> Sent: Tuesday, May 16, 2006 6:38 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::DB::Query::GenBank checks
>
> Hi all,
>
> I was using Bio::DB::Query::GenBank to obtain only IDs from Entrez and
> found some issues and differences (bugs?) in behaviour wrt the pod.
> Do these look familiar?
>
> Some example code:
> my $query = Bio::DB::Query::GenBank->new
>                             (-query   => 'Lassa Virus[ORGN]',
>                              -reldate => '30',
>                              -db      => 'protein',
>                              -ids     => [195052,2981014,11127914],
>                              -maxids  => 30 );
>
> my $gb = Bio::DB::GenBank->new(-format => 'fasta');
> my $seqio = $gb->get_Stream_by_query($query);
> while (my $seq = $seqio->next_seq) {
>   print $seq->desc,"\n"; }
>
> The pod says this about -ids:
>    If you provide an array reference of IDs in -ids, the query will be
>    ignored and the list of IDs will be used when the query is passed
>    to a Bio::DB::GenBank object's get_Stream_by_query() method.
>
> In the above case the query ('Lassa Virus[ORGN]') is actually passed,
> not the IDs. Also $query->query shows the original query. Am I doing
> something wrong or is the pod not reflecting the current behaviour of
> this module?
>
> I was also surprised that if the internet connection is down no warning
> is thrown for $query->query or $query->count at all. Only the
> get_Stream_by_query above will warn us if the site is unreachable (500
> Internal Server Error).

I believe this has to do with the difference in the objects and the way they
retrieve request data. Bio::DB::GenBank and Bio::DB::Query::GenBank use
different methods to retrieve IDs; Bio::DB::GenBank's get_Stream_by_query
method just makes it a bit easier to retrieve a list of UIDs directly
instead of saving them as an array and then reposting them using
get_Stream_by_id. Not foolproof but it works okay.

> $query->ids or $query->count will not throw a warning and
> @ids=$query->ids will just be an empty array. (I realize $query->count
> is not initialized, so I am using this now to check for success, but a
> warning from WebDBSeqI would be more appropriate I think).

WebDBSeqI would be the place to make general warnings (it's supposed to be
an interface for any web seq DB), but not eutils-specific warnings.

> Last, the example from the pod is not working, but no warnings are raised:
> # initialize the list yourself
> my $query =
> Bio::DB::Query::GenBank->new(-ids=>[195052,2981014,11127914]);
>
> $query->count returns zero w/o any warning. Of course this query did
> not specify a DB. Only if we specify -db=>'nucleotide' is $query->count 3.
> However, why is there no warning if we set -db=>'protein' or if we did
> not set this?
>
> On the NCBI website searching Protein DB returns for 195052:
>    See Details. No items found.
>    The following term(s) refer to a different DB:195052
>
> But this is not reflected via Bio::DB::Query::GenBank.
>
> Can I check for this situation in the code apart from checking on
> $query->count == 0 ? Or would it indeed be better to check for these
> situations in the module?
>
> Regards,
> Bernd

I can probably play around with adding a few things in tomorrow and clean up
the POD somewhat. I'm planning a rewrite for EUtilities-based searches but
that's a ways off still... Can't promise much; I'm pretty busy till next
week.
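Until the module itself warns, the checks discussed in this exchange can be wrapped in a small helper. This is a sketch, not part of BioPerl: ids_or_die() and the Demo::Query stand-in are hypothetical names. The helper assumes only the count() and ids() methods used in this thread, and that count() comes back undefined when the request never completed, as Bernd observed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a query object, so the sketch runs offline.
package Demo::Query;
sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}
sub count { return $_[0]{count} }
sub ids   { return @{ $_[0]{ids} || [] } }

package main;

# Return the ID list from a query object, or die with a readable message.
# Works with any object offering count() and ids(); nothing else is assumed.
sub ids_or_die {
    my ($query, $label) = @_;

    my $count = eval { $query->count };
    die "$label: query raised an error: $@" if $@;
    die "$label: no count returned (server unreachable?)\n"
        unless defined $count;
    die "$label: query matched no records (wrong -db?)\n"
        if $count == 0;

    my @ids = $query->ids;
    die "$label: count is $count but no IDs were returned\n" unless @ids;
    return @ids;
}

# Three IDs found: the list comes back unchanged.
my @ok = ids_or_die(
    Demo::Query->new(count => 3, ids => [195052, 2981014, 11127914]),
    'nucleotide',
);

# Zero hits, e.g. IDs looked up in the wrong database.
my $empty = eval {
    ids_or_die(Demo::Query->new(count => 0, ids => []), 'protein'); 1;
} ? '' : $@;

# count() never initialized, e.g. the network was down.
my $unset = eval {
    ids_or_die(Demo::Query->new(ids => []), 'offline'); 1;
} ? '' : $@;

print scalar(@ok), " ids returned\n";
print $empty, $unset;
```

With a real query, the same call would take the Bio::DB::Query::GenBank object built as in Bernd's example; that needs network access, so it is left out of the runnable part.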
Chris

From chen_li3 at yahoo.com Tue May 16 20:53:17 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 16 May 2006 17:53:17 -0700 (PDT)
Subject: [Bioperl-l] module for formating sequence output on the screen
Message-ID: <20060517005317.3976.qmail@web36815.mail.mud.yahoo.com>

Hi all,

Thank you very much for the help.

I have some DNA sequences printed on the screen, but the
default output is longer than I expect. I need 50
nucleotides/line. I searched CPAN but could not find the right
module. Which bioperl module can do this job?

Li

From kmdaily at indiana.edu Tue May 16 09:57:52 2006
From: kmdaily at indiana.edu (Daily, Kenneth Michael)
Date: Tue, 16 May 2006 09:57:52 -0400
Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
References: <36b52ba4.b94c5b79.8198c00@expms6.cites.uiuc.edu>
Message-ID: <20528E699A515C499B80C222BDBEBC34043FFB@iu-mssg-mbx108.ads.iu.edu>

OK, got that installed. But I still get an error:

Can't locate object method "url" via package "Bio::Annotation::DBLink" at /home/kmdaily/src/bioperl/core/Bio/SeqIO/entrezgene.pm line 557.

I am using this on a shared system, and an older version of Bioperl was
installed by the admin. But the path to the one I downloaded via CVS is
first in the list @INC, and PERL5LIB="/home/kmdaily/src/bioperl/core".

Kenny Daily
IU School of Informatics
kmdaily at indiana.edu

From skirov at utk.edu Wed May 17 07:48:29 2006
From: skirov at utk.edu (Stefan Kirov)
Date: Wed, 17 May 2006 07:48:29 -0400
Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
In-Reply-To: <20528E699A515C499B80C222BDBEBC34043FFB@iu-mssg-mbx108.ads.iu.edu>
References: <36b52ba4.b94c5b79.8198c00@expms6.cites.uiuc.edu> <20528E699A515C499B80C222BDBEBC34043FFB@iu-mssg-mbx108.ads.iu.edu>
Message-ID: <446B0D8D.40901@utk.edu>

You are using an old Bio::Annotation::DBLink module. Did you download
only entrezgene.pm or the whole bioperl? If the whole distribution, what
do the tests tell you?

Stefan

From osborne1 at optonline.net Tue May 16 20:46:00 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 16 May 2006 20:46:00 -0400
Subject: [Bioperl-l] module for 6 reading frames
In-Reply-To:
Message-ID:

Chen Li,

There's some documentation on translate() in bptutorial:

http://bioperl.org/Core/Latest/bptutorial.html

You could also use the translate_6frames() method of Bio::SeqUtils.

Brian O.

On 5/16/06 4:52 PM, "smarkel at scitegic.com" wrote:

> Li,
>
> You can either do the substring, and reverse complement, yourself
> or you can use the translate() function in Bio::PrimarySeq. It
> inherits from Bio::PrimarySeqI, so check there for the documentation.
> That translate() function takes a "-frame" argument.
>
> Scott
>
> PS In future, please respond to the list. That way others see
> the questions and answers.
>
> chen li wrote on 16.05.2006 13:04:36:
>
>> Dear Dr. Markel,
>>
>> I browse through the document of
>> Bio::Tools::CodonTable and find this line:
>>
>> my $translation= $CodonTable->translate($seq);
>>
>> I think this line is to do the translation.
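The do-it-yourself route Scott mentions (take the substring for each offset, and do the same on the reverse complement) can be sketched in plain Perl. This is an illustration under stated assumptions rather than BioPerl's implementation: it hard-codes the standard genetic code, writes X for any codon containing something other than A, C, G or T, and labels the reverse-strand frames by their offset on the reverse complement. The names revcom(), translate_frame() and six_frames() are made up for the example.

```perl
use strict;
use warnings;

# Standard genetic code, with codons enumerated in T, C, A, G order.
my @bases = qw(T C A G);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
my $index = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $codon_table{"$b1$b2$b3"} = substr($aas, $index++, 1);
        }
    }
}

# Reverse complement of a DNA string (A, C, G, T only).
sub revcom {
    my ($dna) = @_;
    my $rc = scalar reverse uc $dna;
    $rc =~ tr/ACGT/TGCA/;
    return $rc;
}

# Translate $dna starting at $offset (0, 1 or 2); a trailing partial
# codon is ignored and unknown codons become X.
sub translate_frame {
    my ($dna, $offset) = @_;
    my $protein = '';
    for (my $pos = $offset; $pos + 3 <= length($dna); $pos += 3) {
        my $codon = uc substr($dna, $pos, 3);
        $protein .= exists $codon_table{$codon} ? $codon_table{$codon} : 'X';
    }
    return $protein;
}

# Return a hash of frame label (+1..+3, -1..-3) to translation.
sub six_frames {
    my ($dna) = @_;
    my $rc = revcom($dna);
    my %frames;
    for my $offset (0 .. 2) {
        $frames{ '+' . ($offset + 1) } = translate_frame($dna, $offset);
        $frames{ '-' . ($offset + 1) } = translate_frame($rc,  $offset);
    }
    return %frames;
}

my %frames = six_frames('ATGGCCTAA');
print "$_ $frames{$_}\n" for sort keys %frames;
# +1 MA*
# +2 WP
# +3 GL
# -1 LAH
# -2 *A
# -3 RP
```

In practice the BioPerl routes named in this thread, translate() with its -frame argument or translate_6frames() in Bio::SeqUtils, are the ones to reach for; the sketch only makes visible what the six frames are.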
Here is my >> question: which line in the doc says how to translate >> the remaining frames 2,3, and -1, -2, -3? >> >> >> Thank you, >> >> Li >> >> --- smarkel at scitegic.com wrote: >> >>> Li, >>> >>> Use the translate() function in >>> Bio::Tools::CodonTable. >>> >>> Scott >>> >>> Scott Markel, Ph.D. >>> Principal Bioinformatics Architect email: >>> smarkel at scitegic.com >>> SciTegic Inc. mobile: +1 858 >>> 205 3653 >>> 10188 Telesis Court, Suite 100 voice: +1 858 >>> 799 5603 >>> San Diego, CA 92121 fax: +1 858 >>> 279 8804 >>> USA web: >>> http://www.scitegic.com >>> >>> >>> bioperl-l-bounces at lists.open-bio.org wrote on >>> 16.05.2006 07:55:51: >>> >>>> Hi all, >>>> >>>> I wonder which module is available for translating >>> DNA >>>> sequence into 6 reading frames. >>>> >>>> Thank you, >>>> >>>> Li >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> >> __________________________________________________ >> Do You Yahoo!? >> Tired of spam? Yahoo! Mail has the best spam protection around >> http://mail.yahoo.com >> >> >> -- >> Click on the link below to report this email as spam >> https://www.mailcontrol. >> com/sr/YWaRnXqa+nSyeG1Z34OqL4dC5eYKMoJmYLQSBonkiAgNVwARwO! >> frAkRrVu9wDE5L8wrIaSzXTpcs3mxX9Ufx7LAO0PQl77O8HiAh50c4TI! >> ysIW++WTn79gM0HS11zvKPuUVANsGXCZT! >> LRAY3PyyLo6NzoChgLXk6YfX05ndLG3vE+GH2aUSTxvV3pwd2! 
>> JlBh9ARAt+OXXsyYtG6VgFNOO9GFnNxV > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From e-just at northwestern.edu Wed May 17 11:03:41 2006 From: e-just at northwestern.edu (Eric Just) Date: Wed, 17 May 2006 10:03:41 -0500 Subject: [Bioperl-l] Modware: a BioPerl based API for Chado Message-ID: <6.1.1.1.2.20060517095821.13353920@hecky.it.northwestern.edu> Hi Everyone, We are announcing a new Sourceforge Project called Modware that may be of interest to you. It is an object-oriented API written in Perl that creates BioPerl object representations of biological features stored in a Chado database. It basically creates a Bio::Seq object for chromosomes in Chado and creates Bio::SeqFeature::Gene objects for protein coding transcripts stored in Chado. Things like contigs are represented as Bio::SeqFeature::Generic objects. We also provide many methods for manipulating these objects once they are in memory. For download please visit our Sourceforge project page: http://sourceforge.net/projects/gmod-ware For API documentation and some short examples of selected use cases visit our project home page: http://gmod-ware.sourceforge.net/ This software is adapted from the production middleware code that dictyBase uses. Modware 0.1 requires the latest stable GMOD release: 0.003 be installed. We are currently calling it a release candidate and if we get some feedback will call it an official release if there are no major install bugs (we've installed it only on two different machines). If you would like a version that works on the latest CVS version of GMOD, let me know and I'll expedite getting that out the door. Lastly, please use the direct download version, we have not fully recovered from the recent Sourceforge CVS issues. Please try the software out and let us know what you think! 
Sincerely, Eric Just and Sohel Merchant e-just at northwestern.edu s-merchant at northwestern.edu ============================================ Eric Just e-just at northwestern.edu dictyBase Programmer Center for Genetic Medicine Northwestern University http://dictybase.org ============================================ From sb at mrc-dunn.cam.ac.uk Wed May 17 13:46:45 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Wed, 17 May 2006 18:46:45 +0100 Subject: [Bioperl-l] Bio::Map:: enhancements Message-ID: <446B6185.1000602@mrc-dunn.cam.ac.uk> I added bug http://bugzilla.bioperl.org/show_bug.cgi?id=1998 I'm interested in what people have to say about the secondary enhancement I talk about there. Is it a sane thing to do? What are the better ways of doing that? If it /is/ ok, I suppose I'd have to go back and alter Bio::Map::MappableI and Bio::Map::MarkerI as well, not just Marker. Oh, on a side note, you'll see I had to override RangeI's intersection method to work on multiple ranges. Why is RangeI limited to an intersection of only two ranges? Cheers, Sendu. From David_Waner/San_Diego/Accelrys at scitegic.com Thu May 18 15:30:46 2006 From: David_Waner/San_Diego/Accelrys at scitegic.com (David_Waner/San_Diego/Accelrys at scitegic.com) Date: Thu, 18 May 2006 12:30:46 -0700 Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8 on Windows Message-ID: BioPerl Users/Developers, In our testing we have found severe performance problems using BioPerl with Perl 5.8 on Windows (but not on Linux). They show up especially in SeqIO when reading or writing Fasta files containing large (~16 MB) sequences. The same files that can be read in 1 or 2 seconds with Windows Perl 5.6 or Linux Perl 5.8, take minutes in Windows Perl 5.8. Although the fault is clearly with Perl, not with BioPerl, I have identified a couple of places where BioPerl could be modified in order to save Windows Perl 5.8 users a lot of time, while not affecting other users. 
For example, in my testing the following excerpt from Bio::Root::IO::_readline() takes 50 seconds (!) to execute (when reading a 16 MB sequence):

    if( (!$param{-raw}) && (defined $line) ) {
        $line =~ s/\015?\012/\n/g;
        $line =~ s/\015/\n/g unless $ONMAC;
    }

whereas the following replacement code should be equivalent:

    if( (!$param{-raw}) && (defined $line) ) {
        $line =~ s/\015\012/\012/g;          # Change all CR/LF pairs to LF
        $line =~ tr/\015/\n/ unless $ONMAC;  # Change all single CRs to NEWLINE
    }

but executes in less than 1 second.

In addition, changing:

    defined $sequence && $sequence =~ s/\s//g;  # Remove whitespace

to:

    defined $sequence && $sequence =~ tr/ \t\n\r//d;  # Remove whitespace

in Bio::SeqIO::fasta.pm saves an additional ~20 seconds.

There are also problems in reading files with the <> operator when $/ is redefined to "\n>", where reading the first line of Fasta files containing large sequences takes ~50 seconds, but reading subsequent lines or files takes about 1 second. I don't have a work-around for this.

I would like to ask the mailing list:

1. Has anyone else run into this problem? Any fixes?
2. Do you think BioPerl should incorporate these changes?

I plan to submit a bug report to perlbug, but don't know when or if the problem will be fixed.

- David

From cjfields at uiuc.edu Thu May 18 16:07:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 18 May 2006 15:07:14 -0500
Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8 on Windows
In-Reply-To:
Message-ID: <002901c67ab6$a84c3140$15327e82@pyrimidine>

David,

I have seen some slowdowns with Bio::SeqIO associated with GenBank files, which this could be related to. I can't do anything about it (test or commit changes) until next week but someone else using Windows might (though we are few and far between, and I'm switching to Mac OS X in fall).

Would be nice to try the changes and test it out on a few platforms.
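
For anyone who wants to try the proposed changes before touching BioPerl itself, here is a minimal core-Perl sketch that checks the two line-ending variants agree and times them with the standard Benchmark module. The sub names and the sample data are made up for illustration, and the $ONMAC guard from Bio::Root::IO is left out, so this assumes a platform where "\n" is \012. Note that \s also matches form feed, so the tr/ \t\n\r//d version is only equivalent for input that contains none.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Current approach: two global regex substitutions.
sub normalize_regex {
    my ($line) = @_;
    $line =~ s/\015?\012/\n/g;
    $line =~ s/\015/\n/g;
    return $line;
}

# Proposed approach: one substitution for CR/LF pairs, then tr for lone CRs.
sub normalize_tr {
    my ($line) = @_;
    $line =~ s/\015\012/\012/g;
    $line =~ tr/\015/\n/;
    return $line;
}

# Whitespace removal, regex version.
sub strip_ws_regex {
    my ($seq) = @_;
    $seq =~ s/\s//g;
    return $seq;
}

# Whitespace removal, tr version (space, tab, LF, CR only).
sub strip_ws_tr {
    my ($seq) = @_;
    $seq =~ tr/ \t\n\r//d;
    return $seq;
}

# Illustrative data only: mixed CR/LF, lone CR and lone LF line endings.
my $sample = "ACGTACGT\015\012ACGTACGT\015ACGTACGT\012" x 50_000;

die "line-ending variants disagree\n"
    unless normalize_regex($sample) eq normalize_tr($sample);
die "whitespace variants disagree\n"
    unless strip_ws_regex($sample) eq strip_ws_tr($sample);

# Timings depend heavily on Perl version and platform.
timethese(20, {
    normalize_regex => sub { normalize_regex($sample) },
    normalize_tr    => sub { normalize_tr($sample) },
    strip_ws_regex  => sub { strip_ws_regex($sample) },
    strip_ws_tr     => sub { strip_ws_tr($sample) },
});
```

Running the same script under Windows Perl 5.6, Windows Perl 5.8 and Linux Perl 5.8 should show whether the slowdown described above reproduces outside BioPerl.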
Chris

From osborne1 at optonline.net Thu May 18 16:27:57 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 18 May 2006 16:27:57 -0400
Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8 on Windows
In-Reply-To:
Message-ID:

David,

What are the results from the relevant t/*t files before and after these patches?

Brian O.

From hubert.prielinger at gmx.at Thu May 18 16:41:27 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 18 May 2006 14:41:27 -0600
Subject: [Bioperl-l] parsing xml output
Message-ID: <446CDBF7.10908@gmx.at>

hi,
what is the best way to parse NCBI- and WU- Blast XML output....
and is it possible to parse both with the same parser, or differ their XML output...
thanks From staffa at niehs.nih.gov Thu May 18 16:49:15 2006 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C]) Date: Thu, 18 May 2006 16:49:15 -0400 Subject: [Bioperl-l] Reading GenBank Genomic File Annotation Message-ID: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov> Would like a fairly simple way to extract certain information from Genbank Genomic File Annotations. Namely the six D.melanogaster sequences. Specifically to find gene entries and learn the gene name, begin and end and CDS. Please point me to appropriate modules and documentation. Nick Staffa Telephone: 919-316-4569 (NIEHS: 6-4569) Scientific Computing Support Group NIEHS Information Technology Support Services Contract (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov ) National Institute of Environmental Health Sciences National Institutes of Health Research Triangle Park, North Carolina From adamnkraut at gmail.com Thu May 18 17:07:42 2006 From: adamnkraut at gmail.com (Adam Kraut) Date: Thu, 18 May 2006 17:07:42 -0400 Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline C? Message-ID: <134ede0b0605181407l52d1c2c3x79dd7f177ae7b828@mail.gmail.com> I am currently using a pairwise alignment algorithm written in C (not by me). The program consists of a library of routines, structures, and definitions which I do not want to spend a lot of time abstracting. I already have a hack method of writing the parameters and inputs I want from perl, calling the c program with system( ), and then parsing the output in Perl. Any good programmer would probably smack me but I'm just an undergrad and I needed to show my boss that this works in order to spend more time on it. So on to my question, what is the preferred method of extending Bioperl to use this algorithm? I have just read the XS tutorial and a bit about Inline C. Can I put the main function in my script using Inline, and then just point Inline at the rest of the C library? 
The program has several C-structures that are semantically equivalent to Bioperl objects, so just need somewhere to start. I will spend some more time so that I have a more specific question, I just wanted a little feedback, this is my first post to the bioperl list. Thanks, Adam From osborne1 at optonline.net Thu May 18 17:54:01 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Thu, 18 May 2006 17:54:01 -0400 Subject: [Bioperl-l] Reading GenBank Genomic File Annotation In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov> Message-ID: Nick, Have you read the Feature-Annotation HOWTO? This would be a good starting point... Brian O. On 5/18/06 4:49 PM, "Staffa, Nick (NIH/NIEHS) [C]" wrote: > Would like a fairly simple way to extract certain information from Genbank > Genomic File Annotations. > Namely the six D.melanogaster sequences. > Specifically to find gene entries and learn the gene name, begin and end and > CDS. > Please point me to appropriate modules and documentation. > > > Nick Staffa > Telephone: 919-316-4569 (NIEHS: 6-4569) > Scientific Computing Support Group > NIEHS Information Technology Support Services Contract > (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov ) > National Institute of Environmental Health Sciences > National Institutes of Health > Research Triangle Park, North Carolina > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Thu May 18 18:22:32 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu, 18 May 2006 18:22:32 -0400 Subject: [Bioperl-l] parsing xml output In-Reply-To: <446CDBF7.10908@gmx.at> References: <446CDBF7.10908@gmx.at> Message-ID: we don't parse WU-BLAST XML at this time. We'd welcome someone contributing this. ncbi XML is parsed with blastxml format. 
-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From MEC at stowers-institute.org Thu May 18 18:39:15 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 18 May 2006 17:39:15 -0500
Subject: [Bioperl-l] module for formating sequence output on the screen
Message-ID:

Li,

Here's a one-liner that uses bioperl's Bio::SeqIO module to reformat fasta on standard input to 50 char wide fasta on standard output.

    perl -MBio::SeqIO -e 'select Bio::SeqIO->newFh(-format => "fasta", -width => 50); $in = Bio::SeqIO->newFh(-format => "fasta", -fh => \*STDIN); print while <$in>'

Since it reads standard input, you can call it like this:

    perl -MBio::SeqIO -e 'select Bio::SeqIO->newFh(-format => "fasta", -width => 50); $in = Bio::SeqIO->newFh(-format => "fasta", -fh => \*STDIN); print while <$in>' < inputfile.fasta > outputfile.fasta

Does this help?

--Malcolm Cook

>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chen li
>Sent: Tuesday, May 16, 2006 7:53 PM
>To: bioperl-l at bioperl.org
>Subject: [Bioperl-l] module for formating sequence output on the screen
>
>Hi all,
>
>Thank you very much for the help.
>
>I have some DNA sequences printed on the screen. But
>the default output is longer than I expect. I need 50
>nucleotides/line. I search CPAN but can not get the
>right module. Which bioperl module can do this job?
>
>Li
>
>__________________________________________________
>Do You Yahoo!?
>Tired of spam? Yahoo!
Mail has the best spam protection around >http://mail.yahoo.com >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > From gish at watson.wustl.edu Thu May 18 19:57:03 2006 From: gish at watson.wustl.edu (Warren Gish) Date: Thu, 18 May 2006 18:57:03 -0500 Subject: [Bioperl-l] parsing xml output In-Reply-To: Message-ID: <009f01c67ad6$c359a560$0d00a8c0@PM> Just to clarify, the XML output from WU-BLAST conforms to the standard NCBI_BlastOutput.dtd. Technically, contents of data fields could still be incompatible, but care was taken to ensure compatibility. If someone identifies a difference that prevents parsing or proper interpretation of the WU-BLAST output, please let me know. Regards, --Warren > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Jason Stajich > Sent: Thursday, May 18, 2006 5:23 PM > To: Hubert Prielinger > Cc: bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] parsing xml output > > we don't parse WU-BLAST XML at this time. We'd welcome someone > contributing this. > > ncbi XML is parsed with blastxml format. > > -jason > On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote: > > > hi, > > what is the best way to parse NCBI- and WU- Blast XML output.... > > and is it possible to parse both with the same parser, or > differ their > > XML output... 
> > > > thanks > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at uiuc.edu Thu May 18 21:10:50 2006 From: cjfields at uiuc.edu (Christopher Fields) Date: Thu, 18 May 2006 20:10:50 -0500 Subject: [Bioperl-l] parsing xml output Message-ID: Just to make sure everybody knows, if you use bioperl v1.5.1, SearchIO::blastxml uses XML::Parser which should come with most recent perl distributions. The bioperl-live version has switched over to XML::SAX for SAX2 parsing and it is recommended that you install XML::SAX::ExpatXS as well for faster parsing. Chris ---- Original message ---- >Date: Thu, 18 May 2006 18:57:03 -0500 >From: "Warren Gish" >Subject: Re: [Bioperl-l] parsing xml output >To: "'Hubert Prielinger'" >Cc: bioperl-l at bioperl.org > >Just to clarify, the XML output from WU-BLAST conforms to the standard >NCBI_BlastOutput.dtd. Technically, contents of data fields could still be >incompatible, but care was taken to ensure compatibility. If someone >identifies a difference that prevents parsing or proper interpretation of >the WU-BLAST output, please let me know. >Regards, >--Warren > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >> Jason Stajich >> Sent: Thursday, May 18, 2006 5:23 PM >> To: Hubert Prielinger >> Cc: bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] parsing xml output >> >> we don't parse WU-BLAST XML at this time. We'd welcome someone >> contributing this. >> >> ncbi XML is parsed with blastxml format. 
>> >> -jason >> On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote: >> >> > hi, >> > what is the best way to parse NCBI- and WU- Blast XML output.... >> > and is it possible to parse both with the same parser, or >> differ their >> > XML output... >> > >> > thanks >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Fri May 19 08:52:13 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri, 19 May 2006 08:52:13 -0400 Subject: [Bioperl-l] parsing xml output In-Reply-To: <009f01c67ad6$c359a560$0d00a8c0@PM> References: <009f01c67ad6$c359a560$0d00a8c0@PM> Message-ID: <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu> Whoops - sorry Warren - for some reason I had it in my mind that it was different. So the blastxml parser should work fine. The WUBLAST tab-delimited output is different than NCBI's -m8/9 though, right? -jason On May 18, 2006, at 7:57 PM, Warren Gish wrote: > Just to clarify, the XML output from WU-BLAST conforms to the standard > NCBI_BlastOutput.dtd. Technically, contents of data fields could > still be > incompatible, but care was taken to ensure compatibility. If someone > identifies a difference that prevents parsing or proper > interpretation of > the WU-BLAST output, please let me know. 
> Regards, > --Warren > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >> Jason Stajich >> Sent: Thursday, May 18, 2006 5:23 PM >> To: Hubert Prielinger >> Cc: bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] parsing xml output >> >> we don't parse WU-BLAST XML at this time. We'd welcome someone >> contributing this. >> >> ncbi XML is parsed with blastxml format. >> >> -jason >> On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote: >> >>> hi, >>> what is the best way to parse NCBI- and WU- Blast XML output.... >>> and is it possible to parse both with the same parser, or >> differ their >>> XML output... >>> >>> thanks >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From torsten.seemann at infotech.monash.edu.au Thu May 18 18:42:05 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 19 May 2006 08:42:05 +1000 Subject: [Bioperl-l] parsing xml output In-Reply-To: <446CDBF7.10908@gmx.at> References: <446CDBF7.10908@gmx.at> Message-ID: <446CF83D.60207@infotech.monash.edu.au> > what is the best way to parse NCBI- and WU- Blast XML output.... > and is it possible to parse both with the same parser, or differ their > XML output... For NCBI BLAST XML format, use Bio::SearchIO->new(-format=>'blastxml', ...) 
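
As a rough sketch of what that looks like in practice, assuming BioPerl and the XML parser module its blastxml reader depends on are installed: the sub name report_blastxml and the command-line handling are made up for illustration, while the Bio::SearchIO calls are the usual result/hit/HSP iteration described in the SearchIO HOWTO.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SearchIO;

# Hypothetical helper: print one line per HSP from a BLAST XML report.
sub report_blastxml {
    my ($file) = @_;
    my $in = Bio::SearchIO->new(-format => 'blastxml', -file => $file);

    while (my $result = $in->next_result) {        # one result per query
        print "Query: ", $result->query_name, "\n";
        while (my $hit = $result->next_hit) {      # one hit per database sequence
            while (my $hsp = $hit->next_hsp) {     # one HSP per aligned segment
                printf "  %s\tE=%s\tidentity=%.1f%%\n",
                    $hit->name, $hsp->evalue, $hsp->percent_identity;
            }
        }
    }
    return;
}

# Usage: perl report_blastxml.pl blast_output.xml
report_blastxml($ARGV[0]) if @ARGV;
```

Since the same code path handles any file that follows the NCBI BLAST XML layout, pointing it at a WU-BLAST XML report is a quick way to test compatibility.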
I don't know if 'blastxml' will load WU-BLAST XML format. http://www.bioperl.org/wiki/HOWTO:SearchIO does not mention it. Why not try it, and report back the results to the bioperl list? -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia -------------- next part -------------- A non-text attachment was scrubbed... Name: torsten.seemann.vcf Type: text/x-vcard Size: 348 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060519/b6343abe/attachment.vcf From torsten.seemann at infotech.monash.edu.au Thu May 18 18:37:17 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 19 May 2006 08:37:17 +1000 Subject: [Bioperl-l] Reading GenBank Genomic File Annotation In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov> References: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov> Message-ID: <446CF71D.2070207@infotech.monash.edu.au> Staffa, Nick (NIH/NIEHS) [C] wrote: > Would like a fairly simple way to extract certain information from Genbank Genomic File Annotations. > Namely the six D.melanogaster sequences. > Specifically to find gene entries and learn the gene name, begin and end and CDS. > Please point me to appropriate modules and documentation. http://www.bioperl.org/ -> http://www.bioperl.org/wiki/HOWTOs -> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation http://www.bioperl.org/ -> http://www.bioperl.org/wiki/FAQ -> http://www.bioperl.org/wiki/FAQ#Annotations_and_Features -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia -------------- next part -------------- A non-text attachment was scrubbed... 
Name: torsten.seemann.vcf Type: text/x-vcard Size: 348 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060519/27f849fc/attachment.vcf From gish at watson.wustl.edu Fri May 19 10:50:08 2006 From: gish at watson.wustl.edu (Warren Gish) Date: Fri, 19 May 2006 09:50:08 -0500 Subject: [Bioperl-l] parsing xml output In-Reply-To: <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu> References: <009f01c67ad6$c359a560$0d00a8c0@PM> <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu> Message-ID: Right, the WU-BLAST tabbed output contains more fields. (See http:// blast.wustl.edu/blast/tabular.html). --Warren > Whoops - sorry Warren - for some reason I had it in my mind that it > was different. So the blastxml parser should work fine. The > WUBLAST tab-delimited output is different than NCBI's -m8/9 though, > right? > > -jason From adamnkraut at gmail.com Fri May 19 11:04:01 2006 From: adamnkraut at gmail.com (Adam Kraut) Date: Fri, 19 May 2006 11:04:01 -0400 Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline C? In-Reply-To: References: <134ede0b0605181407l52d1c2c3x79dd7f177ae7b828@mail.gmail.com> Message-ID: <134ede0b0605190804i60ee5ce1v984a33e0c91adf52@mail.gmail.com> The program generates an ensemble of weighted suboptimal alignments by use of a partition function and stochastic backtracking. The algorithm is quite novel and it's really only part of a larger multi-scale comparative modeling project. There documentation is here: http://www.tbi.univie.ac.at/~ulim/probA/probA_lib.html While I think this would be useful to the bioperl community if it were fully abstracted/extended, I would at the least like to be able to pass in any two sequences and get back SimpleAlign objects for our internal uses first. I have a good idea on how to get started. I will be sure to post when I get into trouble. 
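
For the last step described above (getting SimpleAlign objects back), here is a minimal sketch, assuming the gapped sequence strings have already been parsed out of the external program's output. The helper name pair_to_simplealign and the example sequences are hypothetical; nothing here calls the probA library itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::LocatableSeq;
use Bio::SimpleAlign;

# Hypothetical helper: wrap two already-aligned (gapped) strings in a
# Bio::SimpleAlign object.
sub pair_to_simplealign {
    my ($id1, $gapped1, $id2, $gapped2) = @_;
    die "aligned strings differ in length\n"
        unless length($gapped1) == length($gapped2);

    my $aln = Bio::SimpleAlign->new();
    for my $row ([$id1, $gapped1], [$id2, $gapped2]) {
        my ($id, $gapped) = @$row;
        # -end has to match the number of residues, so strip '-' and '.'
        # gap characters before counting.
        (my $residues = $gapped) =~ tr/-.//d;
        $aln->add_seq(Bio::LocatableSeq->new(
            -id    => $id,
            -seq   => $gapped,
            -start => 1,
            -end   => length($residues),
        ));
    }
    return $aln;
}

# Made-up example alignment.
my $aln = pair_to_simplealign('seqA', 'ACGT-ACGT', 'seqB', 'ACGTTAC-T');
printf "alignment has %d columns\n", $aln->length;
```

An ensemble of suboptimal alignments could then be returned as a list of such objects, one per sampled alignment.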
On 5/19/06, aaron.j.mackey at gsk.com wrote:
>
> bioperl-ext is the package in which alignment algorithms and/or BioPerl
> "wrapped" external C libraries live. Subprojects in bioperl-ext use both
> XS and Inline::C, that's up to you.
>
> You'll need to get your C code compiled to a dynamically loaded library
> (.so) to use either XS or Inline::C; this precludes any reuse of the C
> main() function (although your Inline::C wrapper might recapitulate/copy
> the main() function code).
>
> Out of curiosity, what pairwise alignment algorithm are you using? This
> is a heavily beaten path, you might want to dig around first to see if
> someone else already has what you need.
>
> -Aaron

From slenk at emich.edu Fri May 19 10:42:41 2006
From: slenk at emich.edu (Stephen Gordon Lenk)
Date: Fri, 19 May 2006 10:42:41 -0400
Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline C?
Message-ID:

There is nothing wrong with a reasonable way that works - better not to put yourself down.

Inline is good if you can get it to work for you - I have had issues with linking Inline to dynamic libraries. I believe Inline makes a file that has linkage characteristics specified. Try it and see, then tell people how you did it. My two cents.

Another way to use exterior executables is open3 (from IPC::Open3), then reading and writing to the pipes. I use it (primer3 and local lab automation code) - snippet follows:

    use IPC::Open3;   # provides open3()
    use IO::Handle;   # provides autoflush() on filehandles

    my $pid    = 0;
    my $cancmd = 'cancmd.exe';
    my $write  = 0;
    my $read   = 0;

    sub new {
        my $c = {};
        # the child's STDOUT and STDERR are both read from RDRFH
        $pid   = open3(\*WTRFH, \*RDRFH, \*RDRFH, $cancmd);
        $write = \*WTRFH;
        $read  = \*RDRFH;
        $write->autoflush();
        bless $c;
        return $c;
    }

Just write your request, then read it back - I make sure that each pair is a newline terminated text line - be sure you harvest the child pid (with waitpid) when you are done.

----- Original Message -----
From: Adam Kraut
Date: Thursday, May 18, 2006 5:07 pm
Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline C?
> I am currently using a pairwise alignment algorithm written in C (not by
> me).  The program consists of a library of routines, structures, and
> definitions which I do not want to spend a lot of time abstracting.  I
> already have a hack method of writing the parameters and inputs I want
> from perl, calling the c program with system( ), and then parsing the
> output in Perl.  Any good programmer would probably smack me but I'm just
> an undergrad and I needed to show my boss that this works in order to
> spend more time on it.
>
> So on to my question, what is the preferred method of extending Bioperl
> to use this algorithm?  I have just read the XS tutorial and a bit about
> Inline C.  Can I put the main function in my script using Inline, and
> then just point Inline at the rest of the C library?  The program has
> several C-structures that are semantically equivalent to Bioperl objects,
> so just need somewhere to start.  I will spend some more time so that I
> have a more specific question, I just wanted a little feedback, this is
> my first post to the bioperl list.
>
> Thanks,
> Adam

From hubert.prielinger at gmx.at Fri May 19 12:52:28 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 19 May 2006 10:52:28 -0600
Subject: [Bioperl-l] parsing xml output
In-Reply-To:
References: <009f01c67ad6$c359a560$0d00a8c0@PM> <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu>
Message-ID: <446DF7CC.5060509@gmx.at>

hi,
I wondered whether it is also possible in the xml output (either WU or
NCBI - Blast) to get the species (taxonomy) for every hit, if I do a
general search.
regards

Warren Gish wrote:
> Right, the WU-BLAST tabbed output contains more fields.  (See
> http://blast.wustl.edu/blast/tabular.html).
> --Warren
>
>
>> Whoops - sorry Warren - for some reason I had it in my mind that it
>> was different.  So the blastxml parser should work fine.  The
>> WUBLAST tab-delimited output is different than NCBI's -m8/9 though,
>> right?
>>
>> -jason
>>

From staffa at niehs.nih.gov Fri May 19 14:12:47 2006
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C])
Date: Fri, 19 May 2006 14:12:47 -0400
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To:
Message-ID: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov>

Specifically:
I have the document to which you refer,
but have not seen this one thing I need in the printout of tags etc.:
the values in this line;
mRNA join(380..509,578..1913,7784..8649,9439..10200)
Is that a location object?

Nick Staffa
Telephone: 919-316-4569 (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina

> ----------
> From: Brian Osborne
> Sent: Thursday, May 18, 2006 5:54 PM
> To: Staffa, Nick (NIH/NIEHS) [C]; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Reading GenBank Genomic File Annotation
>
> Nick,
>
> Have you read the Feature-Annotation HOWTO? This would be a good starting
> point...
>
> Brian O.
>
>
> On 5/18/06 4:49 PM, "Staffa, Nick (NIH/NIEHS) [C]"
> wrote:
>
> > Would like a fairly simple way to extract certain information from Genbank
> > Genomic File Annotations.
> > Namely the six D.melanogaster sequences.
> > Specifically to find gene entries and learn the gene name, begin and end and
> > CDS.
> > Please point me to appropriate modules and documentation.

From chandan.kr.singh at gmail.com Fri May 19 14:37:26 2006
From: chandan.kr.singh at gmail.com (CHANDAN SINGH)
Date: Sat, 20 May 2006 00:07:26 +0530
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov>
References: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov>
Message-ID: <2d4f320605191137n11017ec0xe41a632a3c7ea9a9@mail.gmail.com>

On 5/19/06, Staffa, Nick (NIH/NIEHS) [C] wrote:
>
> Specifically:
> I have the document to which you refer,
> but have not seen this one thing I need in the printout of tags etc.:
> the values in this line;
> mRNA join(380..509,578..1913,7784..8649,9439..10200)
> Is that a location object?

Yes, it is a location object. If you want it as a string (which is what
it seems from your mail), you just have to do this:

    # $fet is the feature object you are looking at
    my $loc     = $fet->location();
    my $loc_str = $loc->to_FTstring();

Hope it helps.
Chandan

From osborne1 at optonline.net Fri May 19 15:39:36 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 19 May 2006 15:39:36 -0400
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov>
Message-ID:

Nick,

This is from the HOWTO:

Another way of describing a feature in Genbank involves multiple start and
end positions. These could be called "split" locations, and a very common
example is the join statement in the CDS feature found in Genbank entries
(e.g. join(45..122,233..267)). This calls for a specialized object,
Bio::Location::SplitLocationI, which is a container for Location objects:

    for my $feature ( $seqobj->top_SeqFeatures ) {
        if (   $feature->location->isa('Bio::Location::SplitLocationI')
            && $feature->primary_tag eq 'CDS' )
        {
            for my $location ( $feature->location->sub_Location ) {
                print $location->start . ".." . $location->end . "\n";
            }
        }
    }

Brian O.

From hubert.prielinger at gmx.at Fri May 19 16:42:09 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 19 May 2006 14:42:09 -0600
Subject: [Bioperl-l] parsing xml output
In-Reply-To:
References: <009f01c67ad6$c359a560$0d00a8c0@PM> <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu> <446DF7CC.5060509@gmx.at>
Message-ID: <446E2DA1.1050503@gmx.at>

hi warren,
does that mean that if I alter the DTD (if that is possible) by adding
the taxonomic id to it, then I should (theoretically) get a taxonomic id
tag in the xml file?
But I guess this is only possible with a local search (blastall), not
with an online search.

greetings

Warren Gish wrote:
>
> On May 19, 2006, at 11:52 AM, Hubert Prielinger wrote:
>
>> hi,
>> I wondered whether is it also possible in the xml output (either WU
>> or NCBI - Blast) to get the species (taxononmy) for every hit, if I
>> do a general search.
>> regards
>>
> The taxonomic id is not an entity in the NCBI XML DTD.  If the
> information was embedded in deflines, one could conceivably parse for
> it, but I believe the NCBI only distributes taxids in their ASN.1 data
> and in their pre-formatted BLAST databases, and NCBI BLAST only reports
> taxids in its ASN.1 output format, where taxid is available as an entity.
>
> --Warren
>
>

From cjfields at uiuc.edu Fri May 19 16:56:56 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Fri, 19 May 2006 15:56:56 -0500
Subject: [Bioperl-l] parsing xml output
Message-ID: <5c1c5a79.bb0af5aa.8198d00@expms6.cites.uiuc.edu>

You'll have to pull the GI or accession from each hit and do a lookup,
either by grabbing the sequence and using Bio::Species or by using
Bio::DB::Taxonomy; there isn't any tax information directly incorporated
into BLAST reports AFAIK.

Chris

From cjfields at uiuc.edu Fri May 19 16:59:35 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Fri, 19 May 2006 15:59:35 -0500
Subject: [Bioperl-l] parsing xml output
Message-ID: <65932c77.bb0b33b0.8253400@expms6.cites.uiuc.edu>

Um, I don't think it works that way.  I'm pretty sure the XML is
generated from the ASN1 output.  I don't think (like Warren says) that
you can directly get to the tax information.
Indirectly is another matter...

Chris

From hubert.prielinger at gmx.at Fri May 19 17:30:20 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 19 May 2006 15:30:20 -0600
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <446E3854.5010708@gmx.at>
References: <5c1c5a79.bb0af5aa.8198d00@expms6.cites.uiuc.edu> <446E3854.5010708@gmx.at>
Message-ID: <446E38EC.9020100@gmx.at>

ok, thanks,
it appears that I only need the species the protein is derived from, so
I guess Bio::Species would satisfy me, right?
And would it work if I just pull the accession from the blast output
file, then pass in the accession code and get the species name as the
return value?
Is it possible to pass in just the accession code? When I looked it up,
they were always talking about the entire file.

regards

From jason.stajich at duke.edu Fri May 19 18:40:54 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 19 May 2006 18:40:54 -0400
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <446E38EC.9020100@gmx.at>
References: <5c1c5a79.bb0af5aa.8198d00@expms6.cites.uiuc.edu> <446E3854.5010708@gmx.at> <446E38EC.9020100@gmx.at>
Message-ID:

There is a gi2taxid table in the /pub/taxonomy part of the NCBI FTP site
(ftp.ncbi.nih.gov) -- I have used this to take GI numbers from a report
and get taxonomy for overall classification.

I think something like this exists in the scripts or examples directory
in the bioperl distro.  I know I posted about it when I wrote about it a
while ago.

-jason

On May 19, 2006, at 5:30 PM, Hubert Prielinger wrote:

> ok, thanks,
> it appears that I only need the species where the Protein is derived
> from, so I guess Bio:Species would satisfy me, or?
> and it would work that I just pull off the accession from the blast
> output file and then assign the accession code and get as return value
> the species name.
> is it possible to just assign the accession code, because I looked up
> but they were always talking of the entire file.
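
A rough sketch of the GI-to-taxid table lookup described above, in plain
Perl with no BioPerl dependency. It assumes the table is a two-column,
tab-delimited text file (GI, then taxid); the sub name taxids_for_gis and
the sample GI/taxid pairs are made up for illustration, and the table is
read from an in-memory string so the example runs on its own.

```perl
use strict;
use warnings;

# Read a two-column, tab-delimited "GI<TAB>taxid" table from a filehandle
# and keep only the GIs we were asked about (the full table is very large,
# so it is scanned line by line rather than loaded whole).
sub taxids_for_gis {
    my ($fh, @gis) = @_;
    my %wanted = map { $_ => 1 } @gis;
    my %taxid_of;
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid && $wanted{$gi};
        $taxid_of{$gi} = $taxid;
        last if keys(%taxid_of) == keys(%wanted);    # all found, stop early
    }
    return \%taxid_of;
}

# Demo with an in-memory table; these GI and taxid values are invented.
my $table = "101\t9606\n102\t7227\n103\t4932\n";
open my $fh, '<', \$table or die "cannot open in-memory table: $!";
my $taxid_of = taxids_for_gis($fh, 102, 103);
close $fh;

print "$_ => $taxid_of->{$_}\n" for sort keys %$taxid_of;
```

In real use the filehandle would be opened on the downloaded table, and
the GIs would come from the hit names in the parsed BLAST report.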
>
> regards

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/

From ewijaya at i2r.a-star.edu.sg Sat May 20 08:36:44 2006
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Sat, 20 May 2006 20:36:44 +0800
Subject: [Bioperl-l] Method for checking Sequence type of a file
Message-ID: <30362db229c.446f7ddc@i2r.a-star.edu.sg>

Dear expert,

Is there any Bioperl method that allows
you to verify the sequence type of a file?

For example, given a file we wish
to check (return true or false) whether
it is in FASTA format, GENBANK format, etc.

This method is useful in web applications
as a taint-checking procedure.

Regards,
Edward WIJAYA
SINGAPORE

------------ Institute For Infocomm Research - Disclaimer -------------
This email is confidential and may be privileged.  If you are not the
intended recipient, please delete it and notify us immediately. Please do
not copy or use it for any purpose, or disclose its contents to any other
person. Thank you.
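
Independent of any BioPerl module, the true/false check asked about
above can be roughed out by looking at the first non-blank line of the
input: FASTA records begin with ">" and GenBank flat files begin with a
LOCUS line. This is a minimal sketch only; sniff_seq_format is a
hypothetical helper name, and a first-line check is a coarse filter, not
a validation of the whole file.

```perl
use strict;
use warnings;

# Guess the format from the first non-blank line of the text.
# Returns 'fasta', 'genbank', or nothing if neither pattern matches.
sub sniff_seq_format {
    my ($text) = @_;
    return unless defined $text;
    my ($first) = grep { /\S/ } split /\r?\n/, $text;
    return unless defined $first;
    return 'fasta'   if $first =~ /^>/;              # FASTA header line
    return 'genbank' if $first =~ /^LOCUS\s+\S/;     # GenBank LOCUS line
    return;
}

# Made-up inputs for illustration.
my %sample = (
    fasta_like   => ">seq1 test sequence\nACGTACGT\n",
    genbank_like => "LOCUS       AB000001   707 bp\n",
    other        => "hello world\n",
);

for my $name (sort keys %sample) {
    my $format = sniff_seq_format($sample{$name});
    printf "%-12s %s\n", $name, defined $format ? $format : 'unknown';
}
```

For a web form, a check like this would sit in front of the real parser
and simply reject input that matches none of the accepted formats.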
--------------------------------------------------------

From aaron.j.mackey at gsk.com Fri May 19 09:33:01 2006
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Fri, 19 May 2006 09:33:01 -0400
Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline C?
In-Reply-To: <134ede0b0605181407l52d1c2c3x79dd7f177ae7b828@mail.gmail.com>
Message-ID:

bioperl-ext is the package in which alignment algorithms and/or BioPerl
"wrapped" external C libraries live.  Subprojects in bioperl-ext use both
XS and Inline::C, that's up to you.

You'll need to get your C code compiled to a dynamically loaded library
(.so) to use either XS or Inline::C; this precludes any reuse of the C
main() function (although your Inline::C wrapper might recapitulate/copy
the main() function code).

Out of curiosity, what pairwise alignment algorithm are you using?  This
is a heavily beaten path, you might want to dig around first to see if
someone else already has what you need.

-Aaron

From jason.stajich at duke.edu Sat May 20 10:50:17 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat, 20 May 2006 10:50:17 -0400
Subject: [Bioperl-l] Method for checking Sequence type of a file
In-Reply-To: <30362db229c.446f7ddc@i2r.a-star.edu.sg>
References: <30362db229c.446f7ddc@i2r.a-star.edu.sg>
Message-ID:

Try Bio::Tools::GuessSeqFormat

On May 20, 2006, at 8:36 AM, Wijaya Edward wrote:

>
> Dear expert,
>
> Is there any Bioperl method that allows
> you to check verify sequence type in a file?
>
> For example, given a file we wish
> to check (return true or false) whether
> it is in FASTA format, GENBANK format, etc.
>
> This method is useful in web application
> as taint checking procedure.
>
> Regards,
> Edward WIJAYA
> SINGAPORE

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From chen_li3 at yahoo.com Sat May 20 20:15:01 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sat, 20 May 2006 17:15:01 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
Message-ID: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>

Dear all,

I tried one script from the GraphicsHowTo under a Cygwin environment (GD
and libpng already installed). I typed this line in a Cygwin X window:

$ perl render_blast1.pl data1.txt | display -

And here is the result:

display: no decode delegate for this image format
`/tmp/magick-qKiRPDRS'.

Any idea?

Thank you very much,

Li

From osborne1 at optonline.net Sat May 20 20:59:06 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Sat, 20 May 2006 20:59:06 -0400
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
Message-ID:

Chen,

Not sure. However, whenever I see a new or incomprehensible error message
like "display: no decode delegate for this image format" I Google it.

Brian O.

From n.saunders at uq.edu.au Sun May 21 18:17:44 2006
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Mon, 22 May 2006 08:17:44 +1000
Subject: [Bioperl-l] problems with Bio::Graph
Message-ID: <4470E708.3070402@uq.edu.au>

dear all,

I am having some problems with the Bio::Graph modules.  Running Bioperl
1.5.0 RC1 with Ubuntu 5.10 i686.

I would like to parse files in PSI MI XML 2.5 format and, for selected
proteins, get the Uniprot accession of interacting partners (this is
outlined in the documentation for Bio::Graph::ProteinGraph).  I wrote a
very simple test script and ran it on a selection of XML files.  The
script is simply:

----------------------------------------------------------------
use strict;
use Bio::Graph::IO;

my $mifile  = shift || die("Usage = biograph.pl \n");
my $graphio = Bio::Graph::IO->new('-file'   => $mifile,
                                  '-format' => 'psi_xml');
my $gr      = $graphio->next_network;
----------------------------------------------------------------

Here's a summary of the error messages with some sample files (I tried
PSI MI XML versions 1 and 2.5):

1.  MINT database 9707552_small.xml (PSI 2.5)
Can't call method "att" on an undefined value at
/usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173.

2.  IntAct database yeast_small-11.xml (PSI 2.5)
Can't call method "att" on an undefined value at
/usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173.

3.  IntAct database yeast_small-11.xml (PSI 1)
Use of uninitialized value in string eq at
/usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 126.

4.  DIP files Scere20060402.mif, Ecoli20060402.mif (PSI 1)
These give no errors

5.  DIP file dip20060402.mif (PSI 1, complete dataset)
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Invalid species name 'immunodeficiency virus type 1, HIV-1'
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.7/Bio/Root/Root.pm:328
STACK: Bio::Species::validate_species_name /usr/local/share/perl/5.8.7/Bio/Species.pm:340
STACK: Bio::Species::classification /usr/local/share/perl/5.8.7/Bio/Species.pm:170
STACK: Bio::Species::new /usr/local/share/perl/5.8.7/Bio/Species.pm:118
STACK: Bio::Graph::IO::psi_xml::_proteinInteractor /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:105
STACK: XML::Twig::_twig_end /usr/share/perl5/XML/Twig.pm:1473
STACK: XML::Parser::Expat::parse /usr/lib/perl5/XML/Parser/Expat.pm:469
STACK: XML::Parser::parse /usr/lib/perl5/XML/Parser.pm:187
STACK: XML::Parser::parsefile /usr/lib/perl5/XML/Parser.pm:233
STACK: Bio::Graph::IO::psi_xml::next_network /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:79
STACK: ./biograph.pl:18
-----------------------------------------------------------

Looking at the module code, it seems that the first 2 errors relate to a
parameter "proteinInteractorRef", found in PSI MI version 1 but not
version 2.5.  Error 3 I haven't yet figured out.  DIP PSI MI XML version 1
for a single species seems OK, but it seems there are species names in the
complete dataset that cause problems (error 5).

Is the CVS version of Bio::Graph any better at handling PSI MI XML?  Are
there plans to get it to work with version 2.5 files from all sources
(MINT and IntAct)?  Googling and checking the list archives didn't give a
lot of hits, which made me think it's not a widely-used module.
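
One way to tell the PSI MI flavours apart before handing a file to a
parser is to read the attributes on the root element. The following is a
minimal sketch in plain Perl, under the assumption that the root element
is entrySet and carries level and version attributes (level 1 for the
original schema, level 2 with version 5 for 2.5). The helper name
psi_mi_level_version is hypothetical, and the sample strings are made up
rather than copied from MINT, IntAct or DIP files.

```perl
use strict;
use warnings;

# Pull the level and version attributes off the <entrySet ...> root tag.
# Returns (level, version), or an empty list if they cannot be found.
sub psi_mi_level_version {
    my ($xml) = @_;
    my ($attrs) = $xml =~ /<entrySet\b([^>]*)>/ or return;
    my ($level)   = $attrs =~ /\blevel\s*=\s*["'](\d+)["']/;
    my ($version) = $attrs =~ /\bversion\s*=\s*["'](\d+)["']/;
    return unless defined $level && defined $version;
    return ($level, $version);
}

# Invented root tags, one per flavour.
my @samples = (
    '<entrySet level="2" version="5" minorVersion="3">',
    '<entrySet level="1" version="1">',
);

for my $xml (@samples) {
    my ($level, $version) = psi_mi_level_version($xml);
    print "level $level, version $version\n";
}
```

A test script could use this to skip, or at least label, files in a
flavour the parser is not expected to handle.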
DIP file dip20060402.mif (PSI 1, complete dataset) ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Invalid species name 'immunodeficiency virus type 1, HIV-1' STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.7/Bio/Root/Root.pm:328 STACK: Bio::Species::validate_species_name /usr/local/share/perl/5.8.7/Bio/Species.pm:340 STACK: Bio::Species::classification /usr/local/share/perl/5.8.7/Bio/Species.pm:170 STACK: Bio::Species::new /usr/local/share/perl/5.8.7/Bio/Species.pm:118 STACK: Bio::Graph::IO::psi_xml::_proteinInteractor /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:105 STACK: XML::Twig::_twig_end /usr/share/perl5/XML/Twig.pm:1473 STACK: XML::Parser::Expat::parse /usr/lib/perl5/XML/Parser/Expat.pm:469 STACK: XML::Parser::parse /usr/lib/perl5/XML/Parser.pm:187 STACK: XML::Parser::parsefile /usr/lib/perl5/XML/Parser.pm:233 STACK: Bio::Graph::IO::psi_xml::next_network /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:79 STACK: ./biograph.pl:18 ----------------------------------------------------------- Looking at the module code, it seems that the first 2 errors relate to a parameter "proteinInteractorRef", found in PSI MI version 1 but not version 2.5. Error 3 I haven't yet figured out. DIP PSI MI XML version 1 for single species seems OK, but it seems there are species names in the complete dataset that cause problems (error 5). Is the CVS version of Bio::Graph any better at handling PSI MI XML? Are there plans to get it to work with version 2.5 files from all sources (MINT and IntAct) ? Googling and checking the list archives didn't give a lot of hits which made me think it's not a widely-used module. 
thanks, Neil -- School of Molecular and Microbial Sciences University of Queensland Brisbane 4072 Australia http://psychro.bioinformatics.unsw.edu.au/neil From torsten.seemann at infotech.monash.edu.au Sun May 21 21:31:56 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Mon, 22 May 2006 11:31:56 +1000 Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com> References: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com> Message-ID: <4471148C.5090404@infotech.monash.edu.au> > I try one script from GraphicsHowTo under Cygwin > environment(GD and libpng already installed). I type > this line in Cygwin X window: > $ perl render_blast1.pl data1.txt | display - > display: no decode delegate for this image format > `/tmp/magick-qKiRPDRS'. You are piping the output of the Perl script (which is a GIF/PNG image) into the input of a program called "display". This program is part of the ImageMagick toolkit, standard on most Linux installations. Because you are using Windows you probably don't have it installed! Try this: $ perl render_blast1.pl data1.txt > image.gif Then load 'image.gif' into whatever your favourite image viewer is. -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From darin.london at duke.edu Mon May 22 11:29:45 2006 From: darin.london at duke.edu (Darin London) Date: Mon, 22 May 2006 11:29:45 -0400 Subject: [Bioperl-l] BOSC 2006 2nd Call for Papers In-Reply-To: <4471CE49.80109@duke.edu> References: <44294B65.4050207@duke.edu> <4471CE49.80109@duke.edu> Message-ID: <4471D8E9.8090109@duke.edu> 2nd CALL FOR SPEAKERS This is the second and last official call for speakers to submit their abstracts to speak at BOSC 2006 in Fortaleza, Brasil. In order to be considered as a potential speaker, an abstract must be recieved by Monday, June 5th, 2006. We look forward to a great conference this year. 
Please consult The Official BOSC 2006 Website at: http://www.open-bio.org/wiki/BOSC_2006 for more details and information. In addition, a BOSC weblog has been setup to make it easier to desiminate all BOSC related announcements: http://wiki.open-bio.org/boscblog/ And if you have an ICAL compatible Calendar, there is an EventDB calendar set up with all BOSC related deadlines. http://eventful.com/groups/G0-001-000014747-0 More information about ISMB can be found at the Official ISMB 2006 Website: http://ismb2006.cbi.cnptia.embrapa.br/ Thank You, and we look forward to seeing you all, The BOSC Organizing Committee. From darin.london at duke.edu Mon May 22 12:00:55 2006 From: darin.london at duke.edu (Darin London) Date: Mon, 22 May 2006 09:00:55 -0700 Subject: [Bioperl-l] [Bioperl-announce-l] BOSC 2006 2nd Call for Papers In-Reply-To: <4471CE49.80109@duke.edu> References: <44294B65.4050207@duke.edu> <4471CE49.80109@duke.edu> Message-ID: <000301c67db8$e8391f70$6400a8c0@CodonSolutions.local> 2nd CALL FOR SPEAKERS This is the second and last official call for speakers to submit their abstracts to speak at BOSC 2006 in Fortaleza, Brasil. In order to be considered as a potential speaker, an abstract must be recieved by Monday, June 5th, 2006. We look forward to a great conference this year. Please consult The Official BOSC 2006 Website at: http://www.open-bio.org/wiki/BOSC_2006 for more details and information. In addition, a BOSC weblog has been setup to make it easier to desiminate all BOSC related announcements: http://wiki.open-bio.org/boscblog/ And if you have an ICAL compatible Calendar, there is an EventDB calendar set up with all BOSC related deadlines. http://eventful.com/groups/G0-001-000014747-0 More information about ISMB can be found at the Official ISMB 2006 Website: http://ismb2006.cbi.cnptia.embrapa.br/ Thank You, and we look forward to seeing you all, The BOSC Organizing Committee. 
_______________________________________________ Bioperl-announce-l mailing list Bioperl-announce-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l From osborne1 at optonline.net Mon May 22 17:37:50 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Mon, 22 May 2006 17:37:50 -0400 Subject: [Bioperl-l] problems with Bio::Graph In-Reply-To: <4470E708.3070402@uq.edu.au> Message-ID: Neil, Let me propose an alternative. In the past few months I've been working on a Bioperl package for handling protein interaction networks, it is called bioperl-network. It's similar to the Bio::Graph modules, except for the following: - It does not use Nat Goodman's SimpleGraph, it uses Perl's Graph. The advantage is that we are not responsible for maintaining the algorithm code, the disadvantage is that Graph has some bugs but Jarkko Hietaniemi has been working on these and has fixed some significant ones recently. - It uses names and concepts from Graph. It also has separate notions of edge and interaction, where one edge can have one or more interactions. - It uses more method names and conventions borrowed from interaction databases and PSI MI. For example, a node can be a protein complex composed of multiple Seq objects, not just a protein. This package is a makeover of Bio::Graph, therefore Nat Goodman and Richard Adams are major contributors to it. It's also worth mentioning that it's not complete, meaning it won't parse all fields from PSI MI 2 or 2.5 but I think it should be able to handle the code you've shown (and if it cannot then I'll see that it's fixed). I don't know about PSI MI version 1 but if I'm not mistaken there's a version 1 -> version 2 converter. I'm about to put this into CVS so you can take a look, should you choose to. Brian O. On 5/21/06 6:17 PM, "Neil Saunders" wrote: > dear all, > > I am having some problems with the Bio::Graph modules. Running Bioperl 1.5.0 > RC1 with Ubuntu 5.10 i686. 
> > I would like to parse files in PSI MI XML 2.5 format and for selected > proteins, > get the Uniprot accession of interacting partners (this is outlined in the > documentation for Bio::Graph::ProteinGraph). I wrote a very simple test > script > and ran it on a selection of XML files. The script is simply: > > ---------------------------------------------------------------- > use strict; > use Bio::Graph::IO; > > my $mifile = shift || die("Usage = biograph.pl \n"); > my $graphio = Bio::Graph::IO->new('-file' => $mifile, > '-format' => 'psi_xml'); > my $gr = $graphio->next_network; > ---------------------------------------------------------------- > > Here's a summary of the error messages with some sample files (I tried PSI MI > XML versions 1 and 2.5): > > 1. MINT database 9707552_small.xml (PSI 2.5) > Can't call method "att" on an undefined value at > /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173. > > 2. IntAct database yeast_small-11.xml (PSI 2.5) > Can't call method "att" on an undefined value at > /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173. > > 3. IntAct database yeast_small-11.xml (PSI 1) > Use of uninitialized value in string eq at > /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 126. > > 4. DIP files Scere20060402.mif, Ecoli20060402.mif (PSI 1) > These give no errors > > 5. 
DIP file dip20060402.mif (PSI 1, complete dataset) > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Invalid species name 'immunodeficiency virus type 1, HIV-1' > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.7/Bio/Root/Root.pm:328 > STACK: Bio::Species::validate_species_name > /usr/local/share/perl/5.8.7/Bio/Species.pm:340 > STACK: Bio::Species::classification > /usr/local/share/perl/5.8.7/Bio/Species.pm:170 > STACK: Bio::Species::new /usr/local/share/perl/5.8.7/Bio/Species.pm:118 > STACK: Bio::Graph::IO::psi_xml::_proteinInteractor > /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:105 > STACK: XML::Twig::_twig_end /usr/share/perl5/XML/Twig.pm:1473 > STACK: XML::Parser::Expat::parse /usr/lib/perl5/XML/Parser/Expat.pm:469 > STACK: XML::Parser::parse /usr/lib/perl5/XML/Parser.pm:187 > STACK: XML::Parser::parsefile /usr/lib/perl5/XML/Parser.pm:233 > STACK: Bio::Graph::IO::psi_xml::next_network > /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:79 > STACK: ./biograph.pl:18 > ----------------------------------------------------------- > > > Looking at the module code, it seems that the first 2 errors relate to a > parameter "proteinInteractorRef", found in PSI MI version 1 but not version > 2.5. > Error 3 I haven't yet figured out. DIP PSI MI XML version 1 for single > species seems OK, but it seems there are species names in the complete dataset > that cause problems (error 5). > > > Is the CVS version of Bio::Graph any better at handling PSI MI XML? Are there > plans to get it to work with version 2.5 files from all sources (MINT and > IntAct) ? Googling and checking the list archives didn't give a lot of hits > which made me think it's not a widely-used module. 
> > thanks, > Neil From torsten.seemann at infotech.monash.edu.au Mon May 22 17:53:02 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 23 May 2006 07:53:02 +1000 Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <20060522132553.21896.qmail@web36804.mail.mud.yahoo.com> References: <20060522132553.21896.qmail@web36804.mail.mud.yahoo.com> Message-ID: <447232BE.1080001@infotech.monash.edu.au> Chen Li > perl render_blast1.pl data1.txt >im.png Based on http://bioperl.org/wiki/HOWTO:Graphics I believe the example script is creating a PNG image. The last line is: print $panel->png; > and Perl runs without any problem. I use adobe > photoshop to open them and Adobe can't recognize them. > If I use ACDSee to open them I only get a black > background. If I issue this line under Cygwin X window > display im.png or display im.gif > Cygwin says: > display: Improper image header `im.png'. > It seems Perl can't produce an image with right > format. Are you sure Perl is producing a PNG file at all? How many bytes does im.png use? Zero? Did you notice this in http://bioperl.org/wiki/HOWTO:Graphics ? It says: "If you are on a Windows platform, you need to put STDOUT into binary mode so that the PNG file does not go through Window's carriage return/linefeed transformations. Before the final print statement, put the statement binmode(STDOUT)." ie. your script should have binmode(STDOUT); print $panel->png; as the last 2 lines. > Do you experience the same problem before? No. --Torsten From chen_li3 at yahoo.com Mon May 22 09:25:53 2006 From: chen_li3 at yahoo.com (chen li) Date: Mon, 22 May 2006 06:25:53 -0700 (PDT) Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <4471148C.5090404@infotech.monash.edu.au> Message-ID: <20060522132553.21896.qmail@web36804.mail.mud.yahoo.com> Dear Dr. Seemann, Thank you very much for the reply. 
I issue this line: perl render_blast1.pl data1.txt >im.gif or perl render_blast1.pl data1.txt >im.png and Perl runs without any problem. I use adobe photoshop to open them and Adobe can't recognize them. If I use ACDSee to open them I only get a black background. If I issue this line under Cygwin X window display im.png or display im.gif Cygwin says: display: Improper image header `im.png'. or display: Improper image header `im.gif'. It seems Perl can't produce an image with right format. Do you experience the same problem before? Li --- Torsten Seemann wrote: > > I try one script from GraphicsHowTo under Cygwin > > environment(GD and libpng already installed). I > type > > this line in Cygwin X window: > > $ perl render_blast1.pl data1.txt | display - > > display: no decode delegate for this image format > > `/tmp/magick-qKiRPDRS'. > > You are piping the output of the Perl script (which > is a GIF/PNG image) > into the input of a program called "display". This > program is part of > the ImageMagick toolkit, standard on most Linux > installations. Because > you are using Windows you probably don't have it > installed! Try this: > > $ perl render_blast1.pl data1.txt > image.gif > > Then load 'image.gif' into whatever your favourite > image viewer is. > > -- > Dr Torsten Seemann > http://www.vicbioinformatics.com > Victorian Bioinformatics Consortium, Monash > University, Australia > > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! 
Mail has the best spam protection around http://mail.yahoo.com From chen_li3 at yahoo.com Mon May 22 18:57:42 2006 From: chen_li3 at yahoo.com (chen li) Date: Mon, 22 May 2006 15:57:42 -0700 (PDT) Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <447232BE.1080001@infotech.monash.edu.au> Message-ID: <20060522225742.78245.qmail@web36804.mail.mud.yahoo.com> Hi, I tried both, with and without the statement binmode(STDOUT) before the last line, print $panel->png; but there is no difference. I get a file of 2432 bytes. Li > Chen Li > > > perl render_blast1.pl data1.txt >im.png > > Based on http://bioperl.org/wiki/HOWTO:Graphics I > believe the example > script is creating a PNG image. The last line is: > print $panel->png; > > > and Perl runs without any problem. I use adobe > > photoshop to open them and Adobe can't recognize > them. > > If I use ACDSee to open them I only get a black > > background. If I issue this line under Cygwin X > window > > display im.png or display im.gif > > Cygwin says: > > display: Improper image header `im.png'. > > It seems Perl can't produce an image with right > > format. > > Are you sure Perl is producing a PNG file at all? > How many bytes does im.png use? Zero? > > Did you notice this in > http://bioperl.org/wiki/HOWTO:Graphics ? > > It says: "If you are on a Windows platform, you need > to put STDOUT into > binary mode so that the PNG file does not go through > Window's carriage > return/linefeed transformations. Before the final > print statement, put > the statement binmode(STDOUT)." > > ie. your script should have > > binmode(STDOUT); > print $panel->png; > > as the last 2 lines. > > > Do you experience the same problem before? > > No. > > --Torsten > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo!
Mail has the best spam protection around http://mail.yahoo.com From barry.moore at genetics.utah.edu Mon May 22 21:00:06 2006 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Mon, 22 May 2006 19:00:06 -0600 Subject: [Bioperl-l] Problems with Unflattener.pm Message-ID: <729FFBBD-955B-4689-8A27-66733E81431C@genetics.utah.edu> Hi All, NT_113910 appears to throw Bio::SeqFeature::Tools::Unflattener into an infinite recursive loop. The trouble occurs in the method find_best_matches between lines 2258 and 2281, and in particular the loop is perpetuated by line 2273. NT_113910 has a fairly complex features table, but I have as yet been unable to figure out why this loop is not exiting properly. This has been submitted to bugzilla, but I'll post here so it gets documented on the list also. Any suggestions from Chris or others would be greatly appreciated. This problem can be recreated as follows: Grab NT_113910 from GenBank. bp_fetch.pl -fmt genbank net::genbank:NT_113910 > NT_113910.gbk Pass NT_113910.gbk on the command line to the attached script.

#!/usr/bin/perl

use strict;
use warnings;

use Bio::SeqIO;
use Bio::SeqFeature::Tools::Unflattener;

my $file = shift;

# generate an Unflattener object
my $unflattener = Bio::SeqFeature::Tools::Unflattener->new;
#$unflattener->verbose(1);

# first fetch a genbank SeqI object
my $seqio = Bio::SeqIO->new(-file => $file, -format => 'GenBank');
my $out = Bio::SeqIO->new(-format => 'asciitree');

while (my $seq = $seqio->next_seq()) {
    # get top level unflattened SeqFeatureI objects
    $unflattener->unflatten_seq(-seq => $seq, -use_magic => 1);
    $out->write_seq($seq);
}

From miker at biotiquesystems.com Mon May 22 19:56:52 2006 From: miker at biotiquesystems.com (Michael Rogoff) Date: Mon, 22 May 2006 16:56:52 -0700 Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version Message-ID: <002a01c67dfb$663cc600$c100a8c0@mike> As best as I can tell, using Bio::SeqIO to parse a uniprot file ignores the sequence version, and calling seq_version() on the resulting RichSeq object returns undef. It looks like swiss.pm is trying to parse the version out of the SV line, which apparently doesn't exist any more? The sequence version(s) are now specified as part of the Date (DT) lines. Is this not a bug? Is swiss.pm not designed to parse uniprot files? Thanks for any help ... From jason.stajich at duke.edu Mon May 22 21:37:13 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon, 22 May 2006 21:37:13 -0400 Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version In-Reply-To: <002a01c67dfb$663cc600$c100a8c0@mike> References: <002a01c67dfb$663cc600$c100a8c0@mike> Message-ID: Sounds like a "missing feature" =) AFAIK the module was only written for swissprot files. It is possible there have been changes in the format that have not been tracked to the current code. We'd certainly appreciate someone testing it out as versions evolve. If you submit a bug to bugzilla with version of bioperl and example files you can track when a fix is in.
We of course appreciate anyone's efforts to provide a patch as most bugs get fixed of late when someone gets "itchy" enough to fix them. -jason On May 22, 2006, at 7:56 PM, Michael Rogoff wrote: > > As best as I can tell, using Bio::SeqIO to parse a uniprot file > ignores the > sequence version, and calling seq_version() on the resulting > RichSeq object > returns undef. > > It looks like swiss.pm is trying to parse the version out of the SV > line, which > apparently doesn't exist any more? The sequence version(s) are now > specified as > part of the Date (DT) lines. > > Is this not a bug? Is swiss.pm not designed to parse uniprot files? > > Thanks for any help ... > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From jason.stajich at duke.edu Mon May 22 22:04:17 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon, 22 May 2006 22:04:17 -0400 Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version In-Reply-To: <003301c67e0b$5dd44410$c100a8c0@mike> References: <003301c67e0b$5dd44410$c100a8c0@mike> Message-ID: <3607997C-DAD4-4E0E-A919-7D9212AC6D50@duke.edu> We ask that people post patches to the bugzilla as an attachment to the bugzilla so we can track what and why the bug was that the patch fixes. I am not totally sure this patch works because it seems like we need to strip out more information now from the DT line if the $date actually contains more information than just the date. If you would go ahead and create a bug in bugzilla for this (http:// bugzilla.open-bio.org) this sort of conversation can be tracked to the bug. If any of this is unclear please let us know - I though we had put some pages up about this sort of thing on the wiki but maybe they need to be expanded. 
-jason On May 22, 2006, at 9:51 PM, Michael Rogoff wrote: > I have a patch that seems to work but I'm not familiar with the > proper method to > "provide" it. How do I go about that? > > The patch is pretty simple, it just parses the sequence version out > of the date > line where it now hides:
>
> #date
> elsif( /^DT\s+(.*)/ ) {
>     my $date = $1;
> +
> +   if ($date =~ /sequence version (\d+)/i) {
> +       $params{'-seq_version'} ||= $1;
> +   }
> +
>     $date =~ s/\;//;
>     $date =~ s/\s+$//;
>     push @{$params{'-dates'}}, $date;
> }
>
> By the way, what is the difference between Bio::Seq::version and > Bio::Seq::RichSeq::seq_version? > > >> -----Original Message----- >> From: Jason Stajich [mailto:jason.stajich at duke.edu] >> Sent: Monday, May 22, 2006 6:37 PM >> To: Michael Rogoff >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version >> >> >> Sounds like a "missing feature" =) >> >> AFAIK the module was only written for swissprot files. It is >> possible there have been changes in the format that have not been >> tracked to the current code. We'd certainly appreciate someone >> testing it out as versions evolve. If you submit a bug to bugzilla >> with version of bioperl and example files you can track when >> a fix is >> in. We of course appreciate anyone's efforts to provide a patch as >> most bugs get fixed of late when someone gets "itchy" enough to fix >> them. >> >> -jason >> >> On May 22, 2006, at 7:56 PM, Michael Rogoff wrote: >> >>> >>> As best as I can tell, using Bio::SeqIO to parse a uniprot file >>> ignores the >>> sequence version, and calling seq_version() on the resulting >>> RichSeq object >>> returns undef. >>> >>> It looks like swiss.pm is trying to parse the version out >> of the SV >>> line, which >>> apparently doesn't exist any more? The sequence version(s) >> are now >>> specified as >>> part of the Date (DT) lines. >>> >>> Is this not a bug? Is swiss.pm not designed to parse uniprot files?
>>> >>> Thanks for any help ... >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From Marc.Logghe at DEVGEN.com Tue May 23 03:08:37 2006 From: Marc.Logghe at DEVGEN.com (Marc Logghe) Date: Tue, 23 May 2006 09:08:37 +0200 Subject: [Bioperl-l] problems iwth Bio::graphics module Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746E4E@ANTARESIA.be.devgen.com> Hi Li, Did you check your script for any other print statements (to STDOUT, that is) that potentially could contaminate your png stream ? Marc > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chen li > Sent: Tuesday, May 23, 2006 12:58 AM > To: Torsten Seemann > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] problems iwth Bio::graphics module > > Hi, > > I try both: either with or without this statement > binmode(STDOUT) before the last line print $panel->png; But > there are no differenes. I get a file of 2432 bytes. > > Li > > > > > Chen Li > > > > > perl render_blast1.pl data1.txt >im.png > > > > Based on http://bioperl.org/wiki/HOWTO:Graphics I believe > the example > > script is creating a PNG image. The last line is: > > print $panel->png; > > > > > and Perl runs without any problem. I use adobe photoshop to open > > > them and Adobe can't recognize > > them. > > > If I use ACDSee to open them I only get a black background. If I > > > issue this line under Cygwin X > > window > > > display im.png or display im.gif > > > Cygwin says: > > > display: Improper image header `im.png'. > > > It seems Perl can't produce an image with right format. > > > > Are you sure Perl is producing a PNG file at all? > > How many bytes does im.png use? 
Zero? > > > > Did you notice this in > > http://bioperl.org/wiki/HOWTO:Graphics ? > > > > It says: "If you are on a Windows platform, you need to put STDOUT > > into binary mode so that the PNG file does not go through Window's > > carriage return/linefeed transformations. Before the final print > > statement, put the statement binmode(STDOUT)." > > > > ie. your script should have > > > > binmode(STDOUT); > > print $panel->png; > > > > as the last 2 lines. > > > > > Do you experience the same problem before? > > > > No. > > > > --Torsten > > > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection > around http://mail.yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From chen_li3 at yahoo.com Tue May 23 09:27:06 2006 From: chen_li3 at yahoo.com (chen li) Date: Tue, 23 May 2006 06:27:06 -0700 (PDT) Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746E4E@ANTARESIA.be.devgen.com> Message-ID: <20060523132706.57245.qmail@web36811.mail.mud.yahoo.com> Dear Dr. Logghe, Thank you so much. I got the script to work under Cygwin after following your suggestion. Here are the last two lines: either binmode (STDOUT); print STDOUT $panel->png; or only print STDOUT $panel->png; They both work for me. I know that Perl's default output goes to the screen, but I don't know why it works once STDOUT is added after print. Could you explain it? BTW I copied this script from the GraphicsHowTo on the Bioperl website, and only one line contains a print statement, which is 'print $panel->png'. Once again thank you so much, Li --- Marc Logghe wrote: > Hi Li, > Did you check your script for any other print > statements (to STDOUT, > that is) that potentially could contaminate your png > stream ?
> > Marc > > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org > > [mailto:bioperl-l-bounces at lists.open-bio.org] On > Behalf Of chen li > > Sent: Tuesday, May 23, 2006 12:58 AM > > To: Torsten Seemann > > Cc: bioperl-l at lists.open-bio.org > > Subject: Re: [Bioperl-l] problems iwth > Bio::graphics module > > > > Hi, > > > > I try both: either with or without this statement > > binmode(STDOUT) before the last line print > $panel->png; But > > there are no differenes. I get a file of 2432 > bytes. > > > > Li > > > > > > > > > Chen Li > > > > > > > perl render_blast1.pl data1.txt >im.png > > > > > > Based on http://bioperl.org/wiki/HOWTO:Graphics > I believe > > the example > > > script is creating a PNG image. The last line > is: > > > print $panel->png; > > > > > > > and Perl runs without any problem. I use adobe > photoshop to open > > > > them and Adobe can't recognize > > > them. > > > > If I use ACDSee to open them I only get a > black background. If I > > > > issue this line under Cygwin X > > > window > > > > display im.png or display im.gif > > > > Cygwin says: > > > > display: Improper image header `im.png'. > > > > It seems Perl can't produce an image with > right format. > > > > > > Are you sure Perl is producing a PNG file at > all? > > > How many bytes does im.png use? Zero? > > > > > > Did you notice this in > > > http://bioperl.org/wiki/HOWTO:Graphics ? > > > > > > It says: "If you are on a Windows platform, you > need to put STDOUT > > > into binary mode so that the PNG file does not > go through Window's > > > carriage return/linefeed transformations. Before > the final print > > > statement, put the statement binmode(STDOUT)." > > > > > > ie. your script should have > > > > > > binmode(STDOUT); > > > print $panel->png; > > > > > > as the last 2 lines. > > > > > > > Do you experience the same problem before? > > > > > > No. 
> > > > > > --Torsten > > > > > > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! Mail has the best spam > protection > > around http://mail.yahoo.com > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From lstein at cshl.edu Tue May 23 10:06:27 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 23 May 2006 10:06:27 -0400 Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com> References: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com> Message-ID: <200605231006.28392.lstein@cshl.edu> Hi, It is possible that your version of display can't handle PNG images. Try saving the output as a file and then opening it in another image program: perl render_blast1.pl data1.txt > data1.png Another thing to watch out for is that, depending on what version of Perl you're using, you may have to insert this statement into the render_blast1.pl script (somewhere near the top): binmode STDOUT; Lincoln On Saturday 20 May 2006 20:15, chen li wrote: > Dear all, > > > I try one script from GraphicsHowTo under Cygwin > environment(GD and libpng already installed). I type > this line in Cygwin X window: > > > $ perl render_blast1.pl data1.txt | display - > > And here is the result: > > display: no decode delegate for this image format > `/tmp/magick-qKiRPDRS'. > > Any idea? > > > Thank you very much, > > Li > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! 
Mail has the best spam protection around > http://mail.yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From Derek.Fairley at bll.n-i.nhs.uk Tue May 23 10:39:16 2006 From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek) Date: Tue, 23 May 2006 15:39:16 +0100 Subject: [Bioperl-l] Bio::Restriction::IO query Message-ID: Hi folks, I'm new to BioPerl, and struggling to make the Bio::Restriction::* modules work (using BioPerl-1.4; Perl-5.8.1; Linux-2.4). Specifically, I'm having some trouble understanding the behaviour of the Bio::Restriction::IO module. I'm trying to use this to create a Bio::Restriction::EnzymeCollection object from a local REBASE file (which is in bairoch-format); this will in turn be passed to a Bio::Restriction::Analysis object. The following test script (derived from the Bio::Restriction::IO perldoc) runs fine:

#! /usr/bin/perl -w
use strict;
use warnings;
use Bio::Restriction::IO;

my $in = Bio::Restriction::IO->new( -file => "REBASE_file", -format =>'Bairoch');
my $collection = $in->read();
print "Number of REs in the collection: ", scalar $collection->each_enzyme, "\n";

#note that using -format=>'bairoch' without capitalisation (as shown in
#perldoc synopsis) throws an exception:
#Failed to load module Bio::Restriction::IO::bairoch...

However... the test script returns the number 532 - the number of enzymes in the default enzyme set - regardless of the number of enzymes in the file. A default Bio::Restriction::EnzymeCollection object has presumably been created (as the 'read()' and 'each_enzyme' methods are available) but it didn't come from the local file.
The result is the same if the Bio::Restriction::IO->new() method is called with no arguments - a default EnzymeCollection object is created. It's not clear to me where this has come from. My (mis?)understanding was that the default set of enzymes would be loaded on creation of a new Bio::Restriction::Analysis object (in the absence of a -enzymes=>... argument). Presumably this is down to my poor understanding of the BioPerl object model... ;-) So: how should I create an EnzymeCollection object from file? Any help or advice would be gratefully received. PS. Congratulations to the development team for creating a very impressive and useful open source toolkit. Derek. ----------------------------------------- Derek Fairley, Ph.D. Regional Virus Laboratory, Kelvin Building, Royal Victoria Hospital, Grosvenor Road, Belfast, N. Ireland. BT12 6BA Tel. +44 (0)2890 635303 From rowan.mitchell at bbsrc.ac.uk Tue May 23 10:53:42 2006 From: rowan.mitchell at bbsrc.ac.uk (rowan mitchell (RRes-Roth)) Date: Tue, 23 May 2006 15:53:42 +0100 Subject: [Bioperl-l] Assembly::IO ace output Message-ID: Hi I am very interested in writing ace format files and had assumed that I would be able to do this with Assembly::IO until I tried it! I see there has been some correspondence last year on this, but as far as I can see this is still not implemented in 1.5.1. Is this correct ? Is it planned to be included; are there modules under development available ? many thanks Rowan =============================================== Dr Rowan Mitchell Rothamsted Research Harpenden Herts AL5 2JQ UK Tel: +44 (0)1582 763133 x2469 Fax: +44 (0)1582 763010 E-mail: rowan.mitchell at bbsrc.ac.uk WWW: http://www.rothamsted.bbsrc.ac.uk/ =============================================== Rothamsted Research is a company limited by guarantee, registered in England under the registration number 2393175 and a not for profit charity number 802038. 
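If write support is indeed missing from Bio::Assembly::IO, a small ace file can still be printed by hand. The following is only a minimal sketch, assuming the usual phrap/consed ace record layout (AS, CO, BQ, AF, BS, RD, QA); the ace_text function and the hash layout it takes are hypothetical illustrations, not part of BioPerl, and the single BS line that assigns the whole consensus to the first read is a simplification.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl API): format one contig and its reads
# as ace records. Read starts are 1-based padded consensus positions.
sub ace_text {
    my ($contig) = @_;
    my @reads = @{ $contig->{reads} };
    my $cons  = $contig->{consensus};
    my $out   = '';

    # AS: number of contigs, total number of reads in the file
    $out .= sprintf "AS 1 %d\n\n", scalar @reads;

    # CO: name, padded bases, reads, base segments, U (uncomplemented)
    $out .= sprintf "CO %s %d %d 1 U\n%s\n\n",
        $contig->{name}, length($cons), scalar(@reads), $cons;

    # BQ: one quality value per unpadded consensus base ('*' marks a pad)
    (my $unpadded = $cons) =~ tr/*//d;
    my $n_qual = length $unpadded;
    $out .= "BQ\n" . join(' ', ($contig->{qual}) x $n_qual) . "\n\n";

    # AF: read name, orientation, padded start position in the consensus
    $out .= sprintf "AF %s U %d\n", $_->{name}, $_->{start} for @reads;

    # BS: simplification, one segment covering the consensus from read 1
    $out .= sprintf "BS 1 %d %s\n\n", length($cons), $reads[0]{name};

    # RD: name, padded bases, info items, read tags; QA: clipping ranges
    for my $r (@reads) {
        my $len = length $r->{seq};
        $out .= sprintf "RD %s %d 0 0\n%s\n\n", $r->{name}, $len, $r->{seq};
        $out .= sprintf "QA 1 %d 1 %d\n\n", $len, $len;
    }
    return $out;
}

# Made-up example data, for illustration only
my %contig = (
    name      => 'Contig1',
    consensus => 'ACGTACGTAC',
    qual      => 20,
    reads     => [
        { name => 'read1', start => 1, seq => 'ACGTACGTAC' },
        { name => 'read2', start => 3, seq => 'GTACGTAC' },
    ],
);

print ace_text(\%contig);
```

Whether consed or other tools accept such a file depends on details this sketch ignores (real base segments, complemented reads, DS lines), so treat it as a starting point only.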
From rfsouza at cecm.usp.br Tue May 23 16:17:36 2006 From: rfsouza at cecm.usp.br (Robson Francisco de Souza {S}) Date: Tue, 23 May 2006 17:17:36 -0300 Subject: [Bioperl-l] Assembly::IO ace output In-Reply-To: References: Message-ID: <20060523201736.GA28401@cecm.usp.br> Hi Rowan, On Tue, May 23, 2006 at 03:53:42PM +0100, rowan mitchell (RRes-Roth) wrote: > Hi > > I am very interested in writing ace format files and had assumed that I > would be able to do this with Assembly::IO until I tried it! I see there > has been some correspondence last year on this, but as far as I can see > this is still not implemented in 1.5.1. Is this correct ? Is it planned > to be included; are there modules under development available ? As far as I know, there are no plans to add write support to Bio::Assembly::IO. When I wrote the original modules there was no need for this so I left it aside. Best regards, Robson > many thanks > > Rowan > > =============================================== > Dr Rowan Mitchell > Rothamsted Research > Harpenden > Herts AL5 2JQ UK > > Tel: +44 (0)1582 763133 x2469 > Fax: +44 (0)1582 763010 > E-mail: rowan.mitchell at bbsrc.ac.uk > WWW: http://www.rothamsted.bbsrc.ac.uk/ > =============================================== > Rothamsted Research is a company limited by guarantee, registered in > England under the registration number 2393175 and a not for profit > charity number 802038. 
> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From lstein at cshl.edu Tue May 23 16:53:34 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 23 May 2006 16:53:34 -0400 Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <200605231006.28392.lstein@cshl.edu> References: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com> <200605231006.28392.lstein@cshl.edu> Message-ID: <200605231653.36087.lstein@cshl.edu> Hi Chen, It looks to me like you cut and paste the data1.txt file from the web site, consequently replacing the tabs with spaces. Please get table1.txt from the BioPerl distribution, as instructed in the tutorial. Best, Lincoln On Tuesday 23 May 2006 10:06, Lincoln Stein wrote: > Hi, > > It is possible that your version of display can't handle PNG images. Try > saving the output as a file and then opening it in another image program: > > perl render_blast1.pl data1.txt > data1.png > > Another thing to watch out for is that, depending on what version of Perl > you're using, you may have to insert this statement into the > render_blast1.pl script (somewhere near the top): > > binmode STDOUT; > > Lincoln > > On Saturday 20 May 2006 20:15, chen li wrote: > > Dear all, > > > > > > I try one script from GraphicsHowTo under Cygwin > > environment(GD and libpng already installed). I type > > this line in Cygwin X window: > > > > > > $ perl render_blast1.pl data1.txt | display - > > > > And here is the result: > > > > display: no decode delegate for this image format > > `/tmp/magick-qKiRPDRS'. > > > > Any idea? > > > > > > Thank you very much, > > > > Li > > > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! 
Mail has the best spam protection around > > http://mail.yahoo.com > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From chen_li3 at yahoo.com Tue May 23 17:46:16 2006 From: chen_li3 at yahoo.com (chen li) Date: Tue, 23 May 2006 14:46:16 -0700 (PDT) Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <200605231653.36087.lstein@cshl.edu> Message-ID: <20060523214616.15131.qmail@web36813.mail.mud.yahoo.com> Dear Dr. Stein, Thank you so much. I follow your suggestions and download codes from the Bioperl CVS website. Now everything is working. Li --- Lincoln Stein wrote: > Hi Chen, > > It looks to me like you cut and paste the data1.txt > file from the web site, > consequently replacing the tabs with spaces. Please > get table1.txt from the > BioPerl distribution, as instructed in the tutorial. > > Best, > > Lincoln > > On Tuesday 23 May 2006 10:06, Lincoln Stein wrote: > > Hi, > > > > It is possible that your version of display can't > handle PNG images. Try > > saving the output as a file and then opening it in > another image program: > > > > perl render_blast1.pl data1.txt > data1.png > > > > Another thing to watch out for is that, depending > on what version of Perl > > you're using, you may have to insert this > statement into the > > render_blast1.pl script (somewhere near the top): > > > > binmode STDOUT; > > > > Lincoln > > > > On Saturday 20 May 2006 20:15, chen li wrote: > > > Dear all, > > > > > > > > > I try one script from GraphicsHowTo under Cygwin > > > environment(GD and libpng already installed). 
I > type > > > this line in Cygwin X window: > > > > > > > > > $ perl render_blast1.pl data1.txt | display - > > > > > > And here is the result: > > > > > > display: no decode delegate for this image > format > > > `/tmp/magick-qKiRPDRS'. > > > > > > Any idea? > > > > > > > > > Thank you very much, > > > > > > Li > > > > > > > > > > __________________________________________________ > > > Do You Yahoo!? > > > Tired of spam? Yahoo! Mail has the best spam > protection around > > > http://mail.yahoo.com > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From chen_li3 at yahoo.com Tue May 23 18:59:46 2006 From: chen_li3 at yahoo.com (chen li) Date: Tue, 23 May 2006 15:59:46 -0700 (PDT) Subject: [Bioperl-l] How to download sequence files either in EMBL format Message-ID: <20060523225946.2118.qmail@web36805.mail.mud.yahoo.com> Hi all, I need to download one sequence for a gene. I go to NCBI website,find the gene of interest,download the file in Genbank format(saved as sequence.genbank). But to my surprise this so-called genbank format file doesn't contain many features such as exons,compared to the one in Emsembl. My question: where can I download this sequence file in EMBL format? It looks like the one in EMBL might contain other information such exon. Thank you very much, Li __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! 
Mail has the best spam protection around
http://mail.yahoo.com

From osborne1 at optonline.net  Wed May 24 10:33:16 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 24 May 2006 10:33:16 -0400
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060523132706.57245.qmail@web36811.mail.mud.yahoo.com>
Message-ID: 

Li,

The Graphics HOWTO talks about this Windows workaround in _four_ different
places; it's impossible to miss if you read it from start to finish. This is
what one should do if one wants to use these modules and one is a novice.
Example:

Important! Remember that if you are on a Windows platform, you need to put
STDOUT into binary mode so that the PNG file does not go through Windows'
carriage return/linefeed transformations. Before the final print statement,
write binmode(STDOUT).

Brian O.

On 5/23/06 9:27 AM, "chen li" wrote:

> BTW I copy this script from GraphicsHowTo on Bioperl
> website and only one line contains print statement,
> which is 'print $panel->png'.

From chen_li3 at yahoo.com  Wed May 24 12:17:15 2006
From: chen_li3 at yahoo.com (chen li)
Date: Wed, 24 May 2006 09:17:15 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: 
Message-ID: <20060524161715.45141.qmail@web36807.mail.mud.yahoo.com>

Thanks, but Dr. Stein already helped me figure out what was going on: I
should have copied the source code for the examples from CVS instead of
cutting and pasting from the HOWTO tutorial. Sorry for any inconvenience.

Li

--- Brian Osborne wrote:

> Li,
>
> The Graphics HOWTO talks about this Windows workaround in _four_ different
> places; it's impossible to miss if you read it from start to finish. This is
> what one should do if one wants to use these modules and one is a novice.
> Example:
>
> Important! Remember that if you are on a Windows platform, you need to put
> STDOUT into binary mode so that the PNG file does not go through Windows'
> carriage return/linefeed transformations.
Before the
> final print statement,
> write binmode(STDOUT).
>
> Brian O.
>
>
> On 5/23/06 9:27 AM, "chen li" wrote:
>
> > BTW I copy this script from GraphicsHowTo on Bioperl
> > website and only one line contains print statement,
> > which is 'print $panel->png'.
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From ULNJUJERYDIX at spammotel.com  Wed May 24 21:59:36 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Thu, 25 May 2006 09:59:36 +0800
Subject: [Bioperl-l] URGENT: Bio::Graphics::Panel make the ruler have negative (-) position numbering imagemap making
Message-ID: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>

Hi,
thanks for the help offered thus far!
Sigh, I am trying to annotate TFBS on a -1000 to +50 bp promoter seq using
BioPerl. Therefore I was asked to make the numberings as such (-1000). Is
there any way at all to do this in BioPerl without changing the .pm file?

Thanks guys..
kevin

From cjfields at uiuc.edu  Thu May 25 12:43:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 25 May 2006 11:43:37 -0500
Subject: [Bioperl-l] Problems with Unflattener.pm
In-Reply-To: <729FFBBD-955B-4689-8A27-66733E81431C@genetics.utah.edu>
Message-ID: <009d01c6801a$5f75d2a0$15327e82@pyrimidine>

I was able to reproduce this using WinXP and bioperl-live. It seems to get
caught up in the loop during recursion: debugging shows it is unable to get
past 'find_best_matches: (/15)'. There are lots of unmatched pairs here with
this sequence, so could that be the problem? I'm not terribly familiar with
Unflattener...

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Barry Moore
> Sent: Monday, May 22, 2006 8:00 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Problems with Unflattener.pm
>
> Hi All,
>
> NT_113910 appears to throw Bio::SeqFeature::Tools::Unflattener into
> an infinite recursive loop. The trouble occurs in the method
> find_best_matches between lines 2258 and 2281, and in particular the
> loop is perpetuated by line 2273. NT_113910 has a fairly complex
> features table, but I have as yet been unable to figure out why
> this loop is not exiting properly. This has been submitted to
> bugzilla, but I'll post here so it gets documented on the list also.
> Any suggestions from Chris or others would be greatly appreciated.
>
> This problem can be recreated as follows:
>
> Grab NT_113910 from GenBank.
> bp_fetch.pl -fmt genbank net::genbank:NT_113910 > NT_113910.gbk
>
> Pass NT_113910.gbk on the command line to the attached script.
>
>
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> use Bio::SeqIO;
> use Bio::SeqFeature::Tools::Unflattener;
>
> my $file = shift;
>
> # generate an Unflattener object
> my $unflattener = Bio::SeqFeature::Tools::Unflattener->new;
> #$unflattener->verbose(1);
>
> # first fetch a genbank SeqI object
> my $seqio =
>     Bio::SeqIO->new(-file   => $file,
>                     -format => 'GenBank');
> my $out =
>     Bio::SeqIO->new(-format => 'asciitree');
> while (my $seq = $seqio->next_seq()) {
>
>     # get top level unflattened SeqFeatureI objects
>     $unflattener->unflatten_seq(-seq       => $seq,
>                                 -use_magic => 1);
>     $out->write_seq($seq);
> }
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at uiuc.edu  Thu May 25 15:44:01 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 25 May 2006 14:44:01 -0500
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To: <3607997C-DAD4-4E0E-A919-7D9212AC6D50@duke.edu>
Message-ID: <00a101c68033$95606dd0$15327e82@pyrimidine>

This is due to recent changes in the SwissProt/UniProt format (there
apparently are many other changes besides this).

From UniProtKB news (http://ca.expasy.org/sprot/relnotes/sp_news.html) is
this tidbit:

----------------------------------------------------------
UniProtKB release 7.0 of 07-Feb-2006

Changes concerning dates and versions numbers (DT lines)

We changed from showing only the dates corresponding to full UniProtKB
releases in the DT lines to displaying the date of the biweekly release at
which an entry is integrated or updated. We dropped the information
concerning the release number and introduced entry and sequence version
numbers in the DT lines.

The new format of the three DT lines is:

DT   DD-MMM-YYYY, integrated into UniProtKB/database_name.
DT   DD-MMM-YYYY, sequence version version_number.
DT   DD-MMM-YYYY, entry version version_number.
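
As a rough illustration of the DT format just described, here is a minimal
sketch of splitting such a line into its date and its version information.
This is not the swiss.pm code: parse_dt_line() and the hash keys it returns
are hypothetical names chosen for this example, and the regex is only an
assumption based on the three line shapes listed above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split one DT line into the
# date and whatever follows it, so the date can be stored on its own.
sub parse_dt_line {
    my ($line) = @_;
    return unless defined $line
        && $line =~ /^DT\s+(\d{2}-[A-Z]{3}-\d{4})\s*,?\s*(.*?)\.?\s*$/;

    my ($date, $rest) = ($1, $2);
    my %info = (date => $date);

    # Only the three new-style suffixes are recognised; anything else
    # (for example an old-style release comment) is simply ignored.
    if    ($rest =~ /^sequence version (\d+)$/i) { $info{seq_version}     = $1 }
    elsif ($rest =~ /^entry version (\d+)$/i)    { $info{entry_version}   = $1 }
    elsif ($rest =~ /^integrated into (.+)$/i)   { $info{integrated_into} = $1 }

    return \%info;
}

# Try it on lines shaped like the format description above.
for my $line (
    'DT   01-JAN-1998, integrated into UniProtKB/Swiss-Prot.',
    'DT   15-OCT-2001, sequence version 3.',
    'DT   01-APR-2004, entry version 14.',
) {
    my $info = parse_dt_line($line) or next;
    print join(', ', map { "$_=$info->{$_}" } sort keys %$info), "\n";
}
```

Keeping the date separate from the version text is one way a parser could
avoid storing the whole line as a date; whether that suits swiss.pm is a
separate question.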
Example for UniProtKB/Swiss-Prot: DT 01-JAN-1998, integrated into UniProtKB/Swiss-Prot. DT 15-OCT-2001, sequence version 3. DT 01-APR-2004, entry version 14. Example for UniProtKB/TrEMBL: DT 01-FEB-1999, integrated into UniProtKB/TrEMBL. DT 15-OCT-2000, sequence version 2. DT 15-DEC-2004, entry version 5. The sequence version number of an entry is incremented by one when its amino acid sequence is modified. The entry version number is incremented by one whenever any data in the flat file representation of the entry is modified. We retrofitted the entry and sequence version numbers, as well as all dates, using archived UniProtKB releases. ---------------------------------------------------------- Probably should explain on the swissprot wiki page that the format is in a state of flux at the moment. I've added this tidbit to the bug page (#2003) as well. Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Jason Stajich > Sent: Monday, May 22, 2006 9:04 PM > To: Michael Rogoff > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version > > We ask that people post patches to the bugzilla as an attachment to > the bugzilla so we can track what and why the bug was that the patch > fixes. > > I am not totally sure this patch works because it seems like we need > to strip out more information now from the DT line if the $date > actually contains more information than just the date. > > If you would go ahead and create a bug in bugzilla for this (http:// > bugzilla.open-bio.org) this sort of conversation can be tracked to > the bug. > > If any of this is unclear please let us know - I though we had put > some pages up about this sort of thing on the wiki but maybe they > need to be expanded. 
> > -jason > On May 22, 2006, at 9:51 PM, Michael Rogoff wrote: > > > I have a patch that seems to work but I'm not familiar with the > > proper method to > > "provide" it. How do I go about that? > > > > The patch is pretty simple, it just parses the sequence version out > > of the date > > line where it now hides: > > > > #date > > elsif( /^DT\s+(.*)/ ) { > > my $date = $1; > > + > > + if ($date =~ /sequence version (\d+)/i) { > > + $params{'-seq_version'} ||= $1; > > + } > > + > > $date =~ s/\;//; > > $date =~ s/\s+$//; > > push @{$params{'-dates'}}, $date; > > } > > > > By the way, what is the difference between Bio::Seq::version and > > Bio::Seq::RichSeq::seq_version? > > > > > >> -----Original Message----- > >> From: Jason Stajich [mailto:jason.stajich at duke.edu] > >> Sent: Monday, May 22, 2006 6:37 PM > >> To: Michael Rogoff > >> Cc: bioperl-l at lists.open-bio.org > >> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version > >> > >> > >> Sounds like a "missing feature" =) > >> > >> AFAIK the module was only written for swissprot files. It is > >> possible there have been changes in the format that have not been > >> tracked to the current code. We'd certainly appreciate someone > >> testing it out as versions evolve. If you submit a bug to bugzilla > >> with version of bioperl and example files you can track when > >> a fix is > >> in. We of course appreciate anyone's efforts to provide a patch as > >> most bugs get fixed of late when someone gets "itchy" enough to fix > >> them. > >> > >> -jason > >> > >> On May 22, 2006, at 7:56 PM, Michael Rogoff wrote: > >> > >>> > >>> As best as I can tell, using Bio::SeqIO to parse a uniprot file > >>> ignores the > >>> sequence version, and calling seq_version() on the resulting > >>> RichSeq object > >>> returns undef. > >>> > >>> It looks like swiss.pm is trying to parse the version out > >> of the SV > >>> line, which > >>> apparently doesn't exist any more? 
The sequence version(s) > >> are now > >>> specified as > >>> part of the Date (DT) lines. > >>> > >>> Is this not a bug? Is swiss.pm not designed to parse uniprot files? > >>> > >>> Thanks for any help ... > >>> > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> -- > >> Jason Stajich > >> Duke University > >> http://www.duke.edu/~jes12 > >> > >> > >> > > > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From miker at biotiquesystems.com Mon May 22 21:51:10 2006 From: miker at biotiquesystems.com (Michael Rogoff) Date: Mon, 22 May 2006 18:51:10 -0700 Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version In-Reply-To: Message-ID: <003301c67e0b$5dd44410$c100a8c0@mike> I have a patch that seems to work but I'm not familiar with the proper method to "provide" it. How do I go about that? The patch is pretty simple, it just parses the sequence version out of the date line where it now hides: #date elsif( /^DT\s+(.*)/ ) { my $date = $1; + + if ($date =~ /sequence version (\d+)/i) { + $params{'-seq_version'} ||= $1; + } + $date =~ s/\;//; $date =~ s/\s+$//; push @{$params{'-dates'}}, $date; } By the way, what is the difference between Bio::Seq::version and Bio::Seq::RichSeq::seq_version? > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich at duke.edu] > Sent: Monday, May 22, 2006 6:37 PM > To: Michael Rogoff > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version > > > Sounds like a "missing feature" =) > > AFAIK the module was only written for swissprot files. It is > possible there have been changes in the format that have not been > tracked to the current code. 
We'd certainly appreciate someone > testing it out as versions evolve. If you submit a bug to bugzilla > with version of bioperl and example files you can track when > a fix is > in. We of course appreciate anyone's efforts to provide a patch as > most bugs get fixed of late when someone gets "itchy" enough to fix > them. > > -jason > > On May 22, 2006, at 7:56 PM, Michael Rogoff wrote: > > > > > As best as I can tell, using Bio::SeqIO to parse a uniprot file > > ignores the > > sequence version, and calling seq_version() on the resulting > > RichSeq object > > returns undef. > > > > It looks like swiss.pm is trying to parse the version out > of the SV > > line, which > > apparently doesn't exist any more? The sequence version(s) > are now > > specified as > > part of the Date (DT) lines. > > > > Is this not a bug? Is swiss.pm not designed to parse uniprot files? > > > > Thanks for any help ... > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > From chen_li3 at yahoo.com Tue May 23 11:48:46 2006 From: chen_li3 at yahoo.com (chen li) Date: Tue, 23 May 2006 08:48:46 -0700 (PDT) Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <200605231006.28392.lstein@cshl.edu> Message-ID: <20060523154846.70831.qmail@web36815.mail.mud.yahoo.com> Dear Dr. Stein, I have the job partially done by adding this line (under Cygwin) print STDOUT $panel->png; It is done because I can produce the image to be viewed by other programs but it is only partially done because I don't get exactly the same image as that shown on the website. Enclosed is the image I get. Thank you, Li --- Lincoln Stein wrote: > Hi, > > It is possible that your version of display can't > handle PNG images. 
Try > saving the output as a file and then opening it in > another image program: > > perl render_blast1.pl data1.txt > data1.png > > Another thing to watch out for is that, depending on > what version of Perl > you're using, you may have to insert this statement > into the render_blast1.pl > script (somewhere near the top): > > binmode STDOUT; > > Lincoln > > > On Saturday 20 May 2006 20:15, chen li wrote: > > Dear all, > > > > > > I try one script from GraphicsHowTo under Cygwin > > environment(GD and libpng already installed). I > type > > this line in Cygwin X window: > > > > > > $ perl render_blast1.pl data1.txt | display - > > > > And here is the result: > > > > display: no decode delegate for this image format > > `/tmp/magick-qKiRPDRS'. > > > > Any idea? > > > > > > Thank you very much, > > > > Li > > > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! Mail has the best spam > protection around > > http://mail.yahoo.com > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com -------------- next part -------------- A non-text attachment was scrubbed... 
Name: im1 Type: image/x-png Size: 2423 bytes Desc: 2615755531-im1 Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060523/6870f840/attachment.bin From cjfields at uiuc.edu Thu May 25 21:28:14 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 25 May 2006 20:28:14 -0500 Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version In-Reply-To: <003301c67e0b$5dd44410$c100a8c0@mike> References: <003301c67e0b$5dd44410$c100a8c0@mike> Message-ID: This patch works only for the recent change in swissprot seq format for sequence versions on the DT line. I checked it out vs the test data provided with bioperl (t\data\swiss.dat). I did manage to get it working for both old and new using a modification to your patch but there's another issue; using $seq->get_dates, which should only show dates, shows the entire line (date and version info). Jason mentioned that there needs to be a better way to address this which I'm looking into. Chris On May 22, 2006, at 8:51 PM, Michael Rogoff wrote: > I have a patch that seems to work but I'm not familiar with the > proper method to > "provide" it. How do I go about that? > > The patch is pretty simple, it just parses the sequence version out > of the date > line where it now hides: > > #date > elsif( /^DT\s+(.*)/ ) { > my $date = $1; > + > + if ($date =~ /sequence version (\d+)/i) { > + $params{'-seq_version'} ||= $1; > + } > + > $date =~ s/\;//; > $date =~ s/\s+$//; > push @{$params{'-dates'}}, $date; > } > > By the way, what is the difference between Bio::Seq::version and > Bio::Seq::RichSeq::seq_version? > > >> -----Original Message----- >> From: Jason Stajich [mailto:jason.stajich at duke.edu] >> Sent: Monday, May 22, 2006 6:37 PM >> To: Michael Rogoff >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version >> >> >> Sounds like a "missing feature" =) >> >> AFAIK the module was only written for swissprot files. 
It is >> possible there have been changes in the format that have not been >> tracked to the current code. We'd certainly appreciate someone >> testing it out as versions evolve. If you submit a bug to bugzilla >> with version of bioperl and example files you can track when >> a fix is >> in. We of course appreciate anyone's efforts to provide a patch as >> most bugs get fixed of late when someone gets "itchy" enough to fix >> them. >> >> -jason >> >> On May 22, 2006, at 7:56 PM, Michael Rogoff wrote: >> >>> >>> As best as I can tell, using Bio::SeqIO to parse a uniprot file >>> ignores the >>> sequence version, and calling seq_version() on the resulting >>> RichSeq object >>> returns undef. >>> >>> It looks like swiss.pm is trying to parse the version out >> of the SV >>> line, which >>> apparently doesn't exist any more? The sequence version(s) >> are now >>> specified as >>> part of the Date (DT) lines. >>> >>> Is this not a bug? Is swiss.pm not designed to parse uniprot files? >>> >>> Thanks for any help ... >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From lstein at cshl.edu Fri May 26 10:38:29 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 26 May 2006 10:38:29 -0400 Subject: [Bioperl-l] URGENT: Bio::Graphics::Panel make the ruler have negative (-) position numbering imagemap making In-Reply-To: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com> References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com> Message-ID: <200605261038.30380.lstein@cshl.edu> Hi, For some reason I didn't see the first posting on this. In current bioperl live, the ruler can have negative numberings - I use this routinely. You need to create a feature that starts in negative coordinates. What is happening to you when you try this? Lincoln On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote: > Hi > thanks for the help offered thus far! > sigh I am trying to annotate TFBS on a -1000 to +50 bp promtoer seq using > bioperl. therefore i was asked to make the numberings as such (-1000) is > there any way at all to do this in bioperl without changing the .pm file? > > thanks guys.. > kevin > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From jelenaob at gmail.com Fri May 26 12:47:05 2006 From: jelenaob at gmail.com (Jelena Obradovic) Date: Fri, 26 May 2006 09:47:05 -0700 Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file Message-ID: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com> Hi there, I have tried loading enzyme list from a file REBASE bairoch.605 using Bio::Restriction::IO; 1. 
But for some reason the number of enzymes in the list is always 532,
which is the default set of enzymes in the enzyme collection.

Is there any known issue with this module or a workaround?

And here is the code I have been using:

my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",-format=>"Bairoch")
        || die "can't load the file bairoch.605: $!";
my $enzymes = $re_in->read;
print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n";

2. The other problem is that when I try to use the format name in lower case
it throws an exception, but when "B" is capitalized it is ok.
I assume it cannot load the file and does not initialize the enzyme
collection properly.

Can't call method "each_enzyme" on an undefined value at
.../cgi-bin/seq-load.pl line 51.

Any thoughts?


Thanks in advance,


Jelena Obradovic
jelenaob at gmail.com

From cjfields at uiuc.edu  Fri May 26 15:27:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 26 May 2006 14:27:13 -0500
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com>
Message-ID: <002601c680fa$644635a0$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jelena Obradovic
> Sent: Friday, May 26, 2006 11:47 AM
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
>
> Hi there,
>
> I have tried loading enzyme list from a file REBASE bairoch.605 using
> Bio::Restriction::IO;
>
> 1. But for some reason the number of enzymes in the list is always 532
> which is a default set of enzymes in enzyme collection.
>
> Is there any known issue with this module or a workaround?
>
> And here is the code I have been using:
>
> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",-format=>"Bairoch")
>         || die "can't load the file bairoch.605: $!";
> my $enzymes = $re_in->read;
> print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n";

my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",-format=>"Bairoch");

should be

my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",-format=>"bairoch");

Note the case change for the format; this is noted in the bug report you
submitted earlier. Bio::Restriction::IO works similarly to Bio::SeqIO (i.e.
requires a specific format, which I believe is case-sensitive). Judging by
the modules in the Bio/Restriction/IO directory, it looks like the
Bio::Restriction::IO format should match one of the following formats:
bairoch, itype2, withrefm. You can also build your own if needed, using the
previous ones as examples and implementing Bio::Restriction::IO::base.

> 2. The other problem is when trying to use format that is lower-case
> it throws an exception, but when "B" is capitalized it is ok.
> I assume it cannot load a file and does not initialize enzyme
> collection properly.
>
> Can't call method "each_enzyme" on an undefined value at
> .../cgi-bin/seq-load.pl line 51.

My guess? The reason it works with an uppercase ('Bairoch') is that it can't
find the module and uses the default set of enzymes as a fallback. The
exception that you reported when you use lowercase ('bairoch') is real and I
reported it as a bug (there are a few I found in that module). You might
want to try using one of the other formats if you can get the files in the
right format from REBASE. I'm looking into the bugs specifically associated
with Bio::Restriction::IO::bairoch.

> Any thoughts?
>
>
> Thanks in advance,
>
>
> Jelena Obradovic
> jelenaob at gmail.com
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From osborne1 at optonline.net  Fri May 26 15:43:18 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 26 May 2006 15:43:18 -0400
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <002601c680fa$644635a0$15327e82@pyrimidine>
Message-ID: 

Chris,

SeqIO's arguments are case-insensitive (e.g. 'fasta', 'Fasta', 'FASTA'
should work). This is what the documentation says and what the code seems to
suggest. This is probably what the Restriction modules should do as well.

Brian O.

From cjfields at uiuc.edu  Fri May 26 16:21:03 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 26 May 2006 15:21:03 -0500
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: 
Message-ID: <002701c68101$e9432540$15327e82@pyrimidine>

Okay, my bad. Having the format be case-insensitive makes sense and is
probably an easy fix, but there seem to be more serious issues with the
Bio::Restriction::IO modules at the moment. None have implemented write
methods though POD implies they work:

SYNOPSIS
    use Bio::Restriction::IO;

    $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
                                     -format => 'withrefm');
    $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
                                     -format => 'bairoch');
    my $res = $in->read; # a Bio::Restriction::EnzymeCollection
    $out->write($res);

and no tests exist for Bio::Restriction::IO::bairoch yet. In fact, the tests
are pretty confusing; when did we allow this syntax: '-format => 8'?

Anyway, I'm muddling my way through this and will probably write something
up for the project priority list if I can't work this bug out.
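
For the case-insensitivity part, a minimal sketch of the idea, under stated
assumptions: normalize_format() is a hypothetical helper, not something that
exists in BioPerl, and the list of known names is just the three formats
mentioned earlier in this thread (bairoch, itype2, withrefm). It lowercases
whatever the caller passed and refuses names it does not know, rather than
quietly falling back to something else.

```perl
use strict;
use warnings;

# Assumption: only the format names mentioned in this thread are listed.
my %KNOWN_FORMAT = map { $_ => 1 } qw(bairoch itype2 withrefm);

# Hypothetical helper (not part of BioPerl): turn a user-supplied format
# name into its lowercase form, or die if it is not a known format.
sub normalize_format {
    my ($format) = @_;
    die "No format given\n"
        unless defined $format && length $format;

    my $name = lc $format;
    die "Unknown format '$format'\n"
        unless $KNOWN_FORMAT{$name};

    return $name;
}

# 'Bairoch', 'BAIROCH' and 'bairoch' all map to the same name.
print normalize_format($_), "\n" for qw(Bairoch BAIROCH withrefm);
```

The returned name could then be handed to the -format argument of
Bio::Restriction::IO->new as in the code quoted above. Dying on an unknown
name is a deliberate choice in this sketch, so that a typo does not end up
silently producing the default enzyme set.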
Chris > -----Original Message----- > From: Brian Osborne [mailto:osborne1 at optonline.net] > Sent: Friday, May 26, 2006 2:43 PM > To: Chris Fields; 'Jelena Obradovic'; Bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bio::Restriction::IO and REBASE file > > Chris, > > SeqIO's arguments are case-insensitive (e.g. 'fasta', 'Fasta', 'FASTA' > should work). This is what the documentation says and what the code seems > to > suggest. This is probably what the Restriction modules should do as well. > > Brian O. > > From andreas.bender at complife.org Fri May 26 10:50:03 2006 From: andreas.bender at complife.org (Andreas Bender (CompLife'06)) Date: Fri, 26 May 2006 10:50:03 -0400 Subject: [Bioperl-l] Bioperl-based Applications for "Free Software" Session? Message-ID: Dear All, Did anyone of you implement some cool programs/tools using Bioperl? Or is there someone from the Bioperl core team who wants to present Bioperl itself at our conference? We are holding a "free software" session (free at least as in free beer, ideally also open source, some GNU-type license) at our "Computational Life Sciences" Conference in Cambridge/UK later this year and you are warmly welcome to present your software there. Please contact me directly or visit the website in case of any questions. Enjoy the weekend, Andreas Call for Contributions ================================================== LIFE SCIENCE FREE SOFTWARE SESSION held at CompLife 2006 (http://www.complife.org) in Cambridge, United Kingdom, on September 27 - 29, 2006 ================================================== In the last years more and more free and open source software has been developed for chemo- and bioinformatics, molecular modelling or other Life Science applications, but many of the programs are not well known. During the CompLife 2006 conference we will organize a special session dedicated to this type of free software. 
The demo session will be preceded by a short session having room for brief introductory presentations whereas the demo session itself will allow attendees to see the tools in action. Authors of free software will have the opportunity to present their program to the CompLife audience which will consist of researchers and users from computer science, biology, chemistry and everything in between. In case you are interested in the free software session, send us an email at fss at complife.org and briefly describe your program and how you intend to present it at the conference (1-2 pages max - please include URL to downloadable version where available). The only restrictions are that the program must be freely available for everyone or even open source and that it must be related to Life Science applications. The deadline for these proposals is June 16th, 2006. In mid July we will notify you if your software demo was accepted. ************************ -- Computational Life Sciences '06 Cambridge/UK, 27-29 September 2006: Visit http://www.complife.org for more information! Andreas Kieron Patrick Bender - http://www.andreasbender.de Novartis Institutes for BioMedical Research, Cambridge/MA From cjfields at uiuc.edu Fri May 26 17:19:08 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 26 May 2006 16:19:08 -0500 Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file In-Reply-To: <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com> Message-ID: <002b01c6810a$06642400$15327e82@pyrimidine> The POD documentation is a bit misleading for Bio::Restriction::IO. Brian's right, there needs to be more flexibility with the case for the formats used. I found a few other odd things as well which I may file bug reports for. Looks like another post for the project priority list.
Chris _____ From: Jelena Obradovic [mailto:jobradovic at gmail.com] Sent: Friday, May 26, 2006 3:56 PM To: Chris Fields Cc: Jelena Obradovic; Bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Bio::Restriction::IO and REBASE file Hi guys, I tried with the other formats, and it works fine with "withrefm" format but not with "withref". Thanks a lot for your reponse. Cheers, Jelena On 5/26/06, Chris Fields wrote: > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Jelena Obradovic > Sent: Friday, May 26, 2006 11:47 AM > To: Bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file > > Hi there, > > I have tried loading enzyme list from a file REBASE bairoch.605 using > Bio::Restriction::IO; > > 1. But for some reason the number of enzymes in the list is always 532 > which is a default set of enzymes in enzyme collection. > > Is there any known issue with this module or a workaround? > > And here is the code I have been using: > > my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- > format=>"Bairoch") > || die "can't load the file bairoch.605: $!"; > my $enzymes = $re_in->read; > print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n"; my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- format=>"Bairoch"); should be my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- format=>"bairoch"); Note the case change for the format; this is noted in the bug report you submitted earlier. Bio::Restriction::IO works similarly to Bio::SeqIO ( i.e. requires a specific format, which I believe is case-sensitive). Judging by the modules in Bio/Restriction/IO directory, looks like the Bio::Restriction::IO format should match one of the following formats: bairoch, itype2, withrefm, and you can also build your own if needed using the previous as examples and implementing Bio::Restriction::IO::base. > 2. 
The other problem is when trying to use format that is lower-case > it throws an exception, but when "B" is capitalized it is ok. > I assume it cannot load a file and does not initilize enzyme > collection properly. > > Can't call method "each_enzyme" on an undefined value at > .../cgi-bin/seq-load.pl line 51. My guess? The reason it works with an uppercase ('Bairoch') is that it can't find the module and uses the default set of enzymes as a fallback. The exception that you reported when you use lowercase ('bairoch') is real and I reported it as a bug (there are a few I found in that module). You might want to try using one of the other formats if you can get the files in the right format from REBASE. I'm looking into the bugs specifically associated with Bio::Restriction::IO::bairoch. > Any thoughts? > > > Thanks in advance, > > > Jelena Obradovic > jelenaob at gmail.com > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jelena Obradovic Email: jobradovic at gmail.com From jay at jays.net Sat May 27 12:47:27 2006 From: jay at jays.net (Jay Hannah) Date: Sat, 27 May 2006 11:47:27 -0500 Subject: [Bioperl-l] "Project OpenLab" (working title) Message-ID: <4478829F.5030508@jays.net> Hola -- We've been kicking around this idea for a few months now. I'm threatening to start coding. Once I do I might not sleep for a few weeks so I thought I'd solicit feedback now. :) "Project OpenLab": http://omaha.pm.org/kwiki/?BioPerl - Does any such project already exist? - If there's no other obvious choice already bent to BioPerl / BioPerl DB / BioSQL, I'll probably be writing the web framework in Perl's Template Toolkit. The server is Linux, Apache, mySQL (BioPerl DBs). - I'll be using BioPerl objects for the persistence layer as much as possible. Where not possible I'll ask this list about my patches/additions/ugly hackery. 
- I'll be discussing my back office tables like "users" that don't belong in bioperl-db; and my questions about new tables that might belong there on the BioSQL-l mailing list. - I'm not a computer language zealot (usually), so I'm open to out-of-the-box ideas from anyone. - I'm a biology newb with a long Perl/database/web/e-commerce background, so please feel free to point out any bio idiocy I engage in. Thanks for your time, j From fernan at iib.unsam.edu.ar Sat May 27 18:30:44 2006 From: fernan at iib.unsam.edu.ar (Fernan Aguero) Date: Sat, 27 May 2006 19:30:44 -0300 Subject: [Bioperl-l] "Project OpenLab" (working title) In-Reply-To: <4478829F.5030508@jays.net> References: <4478829F.5030508@jays.net> Message-ID: <20060527223044.GA40583@iib.unsam.edu.ar> +----[ Jay Hannah (27.May.2006 15:15): | | Hola -- Hola! | We've been kicking around this idea for a few months now. I'm threatening to start coding. Once I do I might not sleep for a few weeks so I thought I'd solicit feedback now. :) | | "Project OpenLab": | http://omaha.pm.org/kwiki/?BioPerl | | - Does any such project already exist? mmm ... maybe ... both GUS (Genomics Unified Schema: gusdb.org, though not developed around bioperl) and GMOD (Generic Model Organism Database: gmod.org) provide you with i) RDBMS storage ii) a Perl object layer iii) a web app framework Though certainly overkill for the needs you describe in the wiki, they can be customized to work in the way you describe or at least serve as a guide. | - If there's no other obvious choice already bent to BioPerl / BioPerl DB / BioSQL, I'll probably be writing the web framework in Perl's Template Toolkit. The server is Linux, Apache, mySQL (BioPerl DBs). Have you considered Perl Catalyst? It has the benefits of allowing you to work with bioperl modules naturally (it's Perl!) a choice of templating toolkits (Template Toolkit, Mason, among others) and will provide you with an almost ready to go controller/url dispatcher. 
| - I'll be using BioPerl objects for the persistence layer as much as possible. Where not possible I'll ask this list about my patches/additions/ugly hackery. | - I'll be discussing my back office tables like "users" that don't belong in bioperl-db; and my questions about new tables that might belong there on the BioSQL-l mailing list. | - I'm not a computer language zealot (usually), so I'm open to out-of-the-box ideas from anyone. | - I'm a biology newb with a long Perl/database/web/e-commerce background, so please feel free to point out any bio idiocy I engage in. | | Thanks for your time, | | j | +----] Good luck, Fernan From epsteinj at mail.nih.gov Fri May 26 14:46:32 2006 From: epsteinj at mail.nih.gov (Epstein, Jonathan A (NIH/NICHD) [E]) Date: Fri, 26 May 2006 14:46:32 -0400 Subject: [Bioperl-l] URGENT: Bio::Graphics::Panel make the ruler havenegative (-) position numbering imagemap making In-Reply-To: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com> References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com> Message-ID: <42504F69898FE546B3F0238C9BD032750915F8@NIHCESMLBX7.nih.gov> While this is being discussed and we have Lincoln's attention; in example 4 on the Biographics Howto: http://stein.cshl.org/genome_informatics/BioGraphics/Graphics-HOWTO.html how can one assign directional arrows to the graded segments which represent the BLAST hits? I.e., is there a glyph type which is both an 'arrow' and a 'graded_segment'? What other techniques do you recommend for associating directionality with these hits? 
Thanks & regards, Jonathan From jobradovic at gmail.com Fri May 26 16:55:35 2006 From: jobradovic at gmail.com (Jelena Obradovic) Date: Fri, 26 May 2006 13:55:35 -0700 Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file In-Reply-To: <002601c680fa$644635a0$15327e82@pyrimidine> References: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com> <002601c680fa$644635a0$15327e82@pyrimidine> Message-ID: <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com> Hi guys, I tried with the other formats, and it works fine with "withrefm" format but not with "withref". Thanks a lot for your response. Cheers, Jelena On 5/26/06, Chris Fields wrote: > > > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Jelena Obradovic > > Sent: Friday, May 26, 2006 11:47 AM > > To: Bioperl-l at lists.open-bio.org > > Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file > > > > Hi there, > > > > I have tried loading enzyme list from a file REBASE bairoch.605 using > > Bio::Restriction::IO; > > > > 1. But for some reason the number of enzymes in the list is always 532 > > which is a default set of enzymes in enzyme collection. > > > > Is there any known issue with this module or a workaround? > > > > And here is the code I have been using: > > > > my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- > > format=>"Bairoch") > > || die "can't load the file bairoch.605: $!"; > > my $enzymes = $re_in->read; > > print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n"; > > my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- > format=>"Bairoch"); > > should be > > my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- > format=>"bairoch"); > > Note the case change for the format; this is noted in the bug report you > submitted earlier. Bio::Restriction::IO works similarly to Bio::SeqIO ( > i.e.
> requires a specific format, which I believe is case-sensitive). Judging > by > the modules in Bio/Restriction/IO directory, looks like the > Bio::Restriction::IO format should match one of the following formats: > bairoch, itype2, withrefm, and you can also build your own if needed using > the previous as examples and implementing Bio::Restriction::IO::base. > > > 2. The other problem is when trying to use format that is lower-case > > it throws an exception, but when "B" is capitalized it is ok. > > I assume it cannot load a file and does not initilize enzyme > > collection properly. > > > > Can't call method "each_enzyme" on an undefined value at > > .../cgi-bin/seq-load.pl line 51. > > My guess? The reason it works with an uppercase ('Bairoch') is that it > can't find the module and uses the default set of enzymes as a fallback. > The exception that you reported when you use lowercase ('bairoch') is real > and I reported it as a bug (there are a few I found in that module). > > You might want to try using one of the other formats if you can get the > files in the right format from REBASE. I'm looking into the bugs > specifically associated with Bio::Restriction::IO::bairoch. > > > Any thoughts? > > > > > > Thanks in advance, > > > > > > Jelena Obradovic > > jelenaob at gmail.com > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- Jelena Obradovic Email: jobradovic at gmail.com From gad14 at cornell.edu Fri May 26 16:02:33 2006 From: gad14 at cornell.edu (Genevieve DeClerck) Date: Fri, 26 May 2006 16:02:33 -0400 Subject: [Bioperl-l] results problem with StandAloneBlast Message-ID: <44775ED9.4020208@cornell.edu> Hi, I'm running local blast with Bio::Tools::Run::StandAloneBlast. Everything seems to work ok up to the point of accessing the results. 
I am able to print the results but when I try to do more than one thing with the result, nothing is returned for the second activity.. I'd like to first sort the results into groups of results that hit the db seq once, twice, three times, etc - where the results are stored as SeqFeature objects in temporary arrays whose contents are printed sequentially to stdout when the whole sort is complete. Secondly, I need to print the results in Hit Table (i.e. -m 8) format to stdout. If I've sorted the results the sorted-results will print to screen, however when I try to print the Hit Table results nothing is returned, as if the blast results have evaporated.... and visa versa, if i comment out the part where i point my sorting subroutine to the blast results reference, my hit table results suddenly prints to screen. It's almost like the reference to the SearchIO obj that holds the StandAloneBlast results is lost after one use?? (I'm beginning to think there is something naive about the way I'm using references?..) Here's an abbreviated version of my code: my $ref_seq_objs; # ref to array of Sequence obj's my $genome_seq; # fasta containing 1 genomic sequence my @params = ('program' => 'blastn', 'database' => $genome_seq, ); my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); my $blast_report = $factory->blastall($ref_seq_objs); #OK ####### ### the following 2 actions seem to be mutually exclusive. # 1) sort results into 1-hitter, 2-hitter, etc. groups of # SeqFeature objs stored in arrays. arrays are then printed # to stdout &sort_results($blast_report); # 2) print blast results &print_blast_results($blast_report); ####### sub print_blast_results{ my $report = shift; while(my $result = $report->next_result()){ while(my $hit = $result->next_hit()){ while(my $hsp = $hit->next_hsp()){ my $q_name = $hsp_q_seq_obj->display_id; print join(", ",$q_name,$hit->name,$hsp->bits)."\n"; } } } } I'm about to lose my mind on this... any assistance appreciated! 
Thanks, Genevieve From rvosa at sfu.ca Sun May 28 03:43:23 2006 From: rvosa at sfu.ca (Rutger Vos) Date: Sun, 28 May 2006 00:43:23 -0700 Subject: [Bioperl-l] "Project OpenLab" (working title) In-Reply-To: <4478829F.5030508@jays.net> References: <4478829F.5030508@jays.net> Message-ID: <4479549B.5030202@sfu.ca> The TreeBaseII team (part of the cipres project: http://www.phylo.org) are working on a lab database system for storage of intermediate calculation results and data (sequence alignments, trees, taxon sets). I think what you're discussing is a bit more molecular and less phylogenetic, but it does sound similar in spirit. Rutger Jay Hannah wrote: > Hola -- > > We've been kicking around this idea for a few months now. I'm threatening to start coding. Once I do I might not sleep for a few weeks so I thought I'd solicit feedback now. :) > > "Project OpenLab": > http://omaha.pm.org/kwiki/?BioPerl > > - Does any such project already exist? > - If there's no other obvious choice already bent to BioPerl / BioPerl DB / BioSQL, I'll probably be writing the web framework in Perl's Template Toolkit. The server is Linux, Apache, mySQL (BioPerl DBs). > - I'll be using BioPerl objects for the persistence layer as much as possible. Where not possible I'll ask this list about my patches/additions/ugly hackery. > - I'll be discussing my back office tables like "users" that don't belong in bioperl-db; and my questions about new tables that might belong there on the BioSQL-l mailing list. > - I'm not a computer language zealot (usually), so I'm open to out-of-the-box ideas from anyone. > - I'm a biology newb with a long Perl/database/web/e-commerce background, so please feel free to point out any bio idiocy I engage in. 
> > Thanks for your time, > > j > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- ++++++++++++++++++++++++++++++++++++++++++++++++++++ Rutger Vos, PhD. candidate Department of Biological Sciences Simon Fraser University 8888 University Drive Burnaby, BC, V5A1S6 Phone: 604-291-5625 Fax: 604-291-3496 Personal site: http://www.sfu.ca/~rvosa FAB* lab: http://www.sfu.ca/~fabstar Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ ++++++++++++++++++++++++++++++++++++++++++++++++++++ From cjfields at uiuc.edu Sun May 28 09:55:47 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 28 May 2006 08:55:47 -0500 Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file In-Reply-To: <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com> References: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com> <002601c680fa$644635a0$15327e82@pyrimidine> <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com> Message-ID: Again, it's b/c 'withrefm' is a valid Restriction::IO module and 'withref' is not. Similar to the case issue you saw before with 'bairoch.' Making this more lenient would help but there are more serious issues with these modules that need to be addressed... http://www.bioperl.org/wiki/Project_priority_list#Restriction_Enzymes Chris On May 26, 2006, at 3:55 PM, Jelena Obradovic wrote: > Hi guys, I tried with the other formats, and it works fine with > "withrefm" > format but not with "withref". > > Thanks a lot for your reponse. 
> > Cheers, > > Jelena > > On 5/26/06, Chris Fields wrote: >> >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Jelena Obradovic >>> Sent: Friday, May 26, 2006 11:47 AM >>> To: Bioperl-l at lists.open-bio.org >>> Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file >>> >>> Hi there, >>> >>> I have tried loading enzyme list from a file REBASE bairoch.605 >>> using >>> Bio::Restriction::IO; >>> >>> 1. But for some reason the number of enzymes in the list is >>> always 532 >>> which is a default set of enzymes in enzyme collection. >>> >>> Is there any known issue with this module or a workaround? >>> >>> And here is the code I have been using: >>> >>> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- >>> format=>"Bairoch") >>> || die "can't load the file bairoch.605: $!"; >>> my $enzymes = $re_in->read; >>> print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n"; >> >> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- >> format=>"Bairoch"); >> >> should be >> >> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- >> format=>"bairoch"); >> >> Note the case change for the format; this is noted in the bug >> report you >> submitted earlier. Bio::Restriction::IO works similarly to >> Bio::SeqIO ( >> i.e. >> requires a specific format, which I believe is case-sensitive). >> Judging >> by >> the modules in Bio/Restriction/IO directory, looks like the >> Bio::Restriction::IO format should match one of the following >> formats: >> bairoch, itype2, withrefm, and you can also build your own if >> needed using >> the previous as examples and implementing Bio::Restriction::IO::base. >> >>> 2. The other problem is when trying to use format that is lower-case >>> it throws an exception, but when "B" is capitalized it is ok. >>> I assume it cannot load a file and does not initilize enzyme >>> collection properly. 
>>> >>> Can't call method "each_enzyme" on an undefined value at >>> .../cgi-bin/seq-load.pl line 51. >> >> My guess? The reason it works with an uppercase ('Bairoch') is >> that it >> can't find the module and uses the default set of enzymes as a >> fallback. >> The exception that you reported when you use lowercase ('bairoch') >> is real >> and I reported it as a bug (there are a few I found in that module). >> >> You might want to try using one of the other formats if you can >> get the >> files in the right format from REBASE. I'm looking into the bugs >> specifically associated with Bio::Restriction::IO::bairoch. >> >>> Any thoughts? >>> >>> >>> Thanks in advance, >>> >>> >>> Jelena Obradovic >>> jelenaob at gmail.com >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > -- > Jelena Obradovic > Email: jobradovic at gmail.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From osborne1 at optonline.net Sun May 28 11:03:37 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Sun, 28 May 2006 11:03:37 -0400 Subject: [Bioperl-l] results problem with StandAloneBlast In-Reply-To: <44775ED9.4020208@cornell.edu> Message-ID: Genevieve, Does this simplified code, without the &sort_results($blast_report) line, work? By the way, no one can really help you here because you haven't shown us all of the code. The code you are showing certainly looks OK. Brian O. 
On 5/26/06 4:02 PM, "Genevieve DeClerck" wrote: > &sort_results($blast_report); From simon.rayner.mlist at gmail.com Mon May 29 03:37:24 2006 From: simon.rayner.mlist at gmail.com (mailing lists) Date: Mon, 29 May 2006 15:37:24 +0800 Subject: [Bioperl-l] installation problems with bioperl-ext on x86_64 running SuSE linux Message-ID: Hello, i'm having a problem trying to install the bioperl-ext package on my system. biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align # perl Makefile.PL Writing Makefile for Bio::Ext::Align biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align # make cc -c -I./libs -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -fPIC -O2 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -g -Wall -pipe -DVERSION=\"0.1\" -DXS_VERSION= \"0.1\" -fPIC "-I/usr/lib/perl5/5.8.7/x86_64-linux-thread-multi/CORE" -DPOSIX -DNOERROR Align.c In file included from Align.xs:12: ./libs/sw.h:1360:1: warning: "/*" within comment . . . Running Mkbootstrap for Bio::Ext::Align () chmod 644 Align.bs rm -f blib/arch/auto/Bio/Ext/Align/Align.so LD_RUN_PATH="" cc -shared -L/usr/local/lib64 Align.o -o blib/arch/auto/Bio/Ext/Align/Align.so libs/libsw.a -lm /usr/lib64/gcc/x86_64-suse-linux/4.0.2/../../../../x86_64-suse-linux/bin/ld: libs/libsw.a(aln.o): relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC libs/libsw.a: could not read symbols: Bad value collect2: ld returned 1 exit status make: *** [blib/arch/auto/Bio/Ext/Align/Align.so] Error 1 biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align # biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align # biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align # the -fPIC flag is already set in the makefile. I found a similar problem in an earlier posting with the following suggestions.... From: Aaron J. 
Mackey pcbi.upenn.edu> Subject: Re: compiling bioperl-ext Newsgroups: gmane.comp.lang.perl.bio.general Date: 2004-06-09 20:46:05 GMT (1 year, 50 weeks, 3 days, 3 hours and 50 minutes ago) 1) Are you starting with a clean build directory? 2) Does installing other compiled Perl modules work for you (e.g. Data::Dumper or Storable)? That's a pretty arcane error, and if the answer to #2 is "no", then I don't think we can help you. -Aaron ....In my case, both 1) and 2) are true. I installed Data::Dumper without any problems. I've found plenty of similar incidences for other sofware and it seems to relate to 32/64bit issues. Does anyone have any suggestions about how to get around this? thanks Simon Rayner From ULNJUJERYDIX at spammotel.com Mon May 29 05:46:21 2006 From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau) Date: Mon, 29 May 2006 17:46:21 +0800 Subject: [Bioperl-l] **Fwd: Re: URGENT: Bio::Graphics::Panel make the ruler have In-Reply-To: <200605261038.30380.lstein@cshl.edu> References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com> <200605261038.30380.lstein@cshl.edu> Message-ID: <5b6410e0605290246p8875c78n286caa672a55b4de@mail.gmail.com> Hi! oh it was in a slightly different header asking about the create image map feature. I am using the stable version 1.4 of bioperl now. In any case I have not added the sequence as a feature annotated seq. as I already have the bp where the TF binds (in 1-1050 numberings) so what I did was to just add graded segments based on the position. I saw that there is a scale function for the arrow glyp however, it is a multiply function, can it be hacked to take in a offset value (ie minus the scale by 1000?) cheers kevin Hi, > > For some reason I didn't see the first posting on this. In current bioperl > live, the ruler can have negative numberings - I use this routinely. You > need > to create a feature that starts in negative coordinates. What is happening > to > you when you try this? 
> > Lincoln > > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote: > > Hi > > thanks for the help offered thus far! > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promtoer seq > using > > bioperl. therefore i was asked to make the numberings as such (-1000) is > > there any way at all to do this in bioperl without changing the .pm > file? > > > > thanks guys.. > > kevin > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From shameer at ncbs.res.in Mon May 29 06:07:17 2006 From: shameer at ncbs.res.in (Shameer Khadar) Date: Mon, 29 May 2006 15:37:17 +0530 (IST) Subject: [Bioperl-l] Reg. Integrated Server / CGI to pass PDB to multiple Servers Message-ID: <49187.192.168.1.1.1148897237.squirrel@192.168.1.1> Dear All, My query may not be directly related to BioPERL, But am sure I will get some idea to move on. Some possibilities wil be available from Pise or related modules Query : --------- We have several public servers(say a,b,c). All of them will take a pdb-file as an input and process it and displays it. Now, I need to create a web page(a meta-server/integrated web-server) with three radio buttons(a,b,c) and a single input form(to accept pdb file from the users ...:( - File passing as an argument seems to be some what impossible to me). I need output as 3 links in next page. Is there any Bio-PERL module / CGI / Perl tricks to do it ? Thanks in advance, -- Shameer Khadar Prof. R. 
Sowdhamini's Lab (# 25) The Computational Biology Group National Centre for Biological Sciences (TIFR) UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India T - 91-080-23636420-32 EXT 4241 F - 91-080-23636662/23636675 W - http://caps.ncbs.res.in -------------------------------------------------- "Refrain from illusions, insist on work and not words, patiently seek divine and scientific truth." From torsten.seemann at infotech.monash.edu.au Tue May 30 02:41:31 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 30 May 2006 16:41:31 +1000 Subject: [Bioperl-l] results problem with StandAloneBlast In-Reply-To: <44775ED9.4020208@cornell.edu> References: <44775ED9.4020208@cornell.edu> Message-ID: <447BE91B.30001@infotech.monash.edu.au> > my $ref_seq_objs; # ref to array of Sequence obj's > my $genome_seq; # fasta containing 1 genomic sequence > my @params = ('program' => 'blastn', > 'database' => $genome_seq, > ); The database parameter needs to be the same thing you would pass to the "-d" option in "blastall". I don't think you can pass a perl string here. ie. there needs to be a properly formatted set of blast indices for your genome sequence on the disk in the appropriate place. See ftp://ftp.ncbi.nlm.nih.gov/blast/documents/blast.html > my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); > my $blast_report = $factory->blastall($ref_seq_objs); #OK But I could be wrong, and $blast_report here contains a valid BLAST report. 
-- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From sb at mrc-dunn.cam.ac.uk Tue May 30 03:59:28 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Tue, 30 May 2006 08:59:28 +0100 Subject: [Bioperl-l] results problem with StandAloneBlast In-Reply-To: <44775ED9.4020208@cornell.edu> References: <44775ED9.4020208@cornell.edu> Message-ID: <447BFB60.4000006@mrc-dunn.cam.ac.uk> Genevieve DeClerck wrote: > Hi, [snip] > If I've sorted the results the sorted-results will print to screen, > however when I try to print the Hit Table results nothing is returned, > as if the blast results have evaporated.... and visa versa, if i comment > out the part where i point my sorting subroutine to the blast results > reference, my hit table results suddenly prints to screen. [snip] > Here's an abbreviated version of my code: [snip] > ####### > ### the following 2 actions seem to be mutually exclusive. > # 1) sort results into 1-hitter, 2-hitter, etc. groups of > # SeqFeature objs stored in arrays. arrays are then printed > # to stdout > &sort_results($blast_report); > > # 2) print blast results > &print_blast_results($blast_report); > sub print_blast_results{ > my $report = shift; > while(my $result = $report->next_result()){ [snip] You didn't give us your sort_results subroutine, but is it as simple as they both use $report->next_result (and/or $result->next_hit), but you don't reset the internal counter back to the start, so the second subroutine tries to get the next_result and finds the first subroutine has already looked at the last result and so next_result returns false? From a quick look it wasn't obvious how to reset the counter. Hopefully this can be done and someone else knows how. 
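The one-pass behaviour suspected above can be illustrated with a small stand-in. Note the assumptions: OnePassReport is a hypothetical class written only for this sketch and is not the Bio::SearchIO implementation; it merely mimics an object whose next_result() hands out each result once. The second half shows one possible workaround, which is to drain the report into an array once and give that array to every consumer.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a one-pass report object (an assumption for
# this sketch, not BioPerl code): each result is handed out exactly once.
package OnePassReport;

sub new {
    my ($class, @results) = @_;
    return bless { queue => [@results] }, $class;
}

sub next_result {
    my ($self) = @_;
    return shift @{ $self->{queue} };   # undef once the queue is empty
}

package main;

# A consumer that walks the report to the end, like each of the two
# subroutines in the original script.
sub count_results {
    my ($report) = @_;
    my $n = 0;
    $n++ while defined($report->next_result);
    return $n;
}

my $report = OnePassReport->new(qw(r1 r2 r3));
my $first  = count_results($report);    # consumes all three results
my $second = count_results($report);    # nothing left to read
print "first pass: $first, second pass: $second\n";

# Possible workaround: read everything once, then reuse the array.
my $fresh = OnePassReport->new(qw(r1 r2 r3));
my @results;
while (defined(my $r = $fresh->next_result)) {
    push @results, $r;
}
print "cached results available on pass $_: ", scalar(@results), "\n" for 1 .. 2;
```

If the real report object behaves the same way, collecting the results (and their hits) into arrays before calling the sorting and printing subroutines would let both see the full set.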
From torsten.seemann at infotech.monash.edu.au Tue May 30 04:18:45 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 30 May 2006 18:18:45 +1000 Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef" Message-ID: <447BFFE5.8010508@infotech.monash.edu.au> FYI Bioperl developers: I just audited the bioperl-live CVS and found about 450 occurrences of "return undef". Page 199 of "Perl Best Practices" by Damian Conway, and this URL http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html suggest: "Use return; instead of return undef; if you want to return nothing. If someone assigns the return value to an array, the latter creates an array of one value (undef), which evaluates to true. The former will correctly handle all contexts." So I'm guessing at least some of these 450 occurrences *could* result in bugs and should probably be changed. Your opinion may differ :-) -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From cjfields at uiuc.edu Tue May 30 10:07:45 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 30 May 2006 09:07:45 -0500 Subject: [Bioperl-l] For CVS developers - potential pitfall with "returnundef" In-Reply-To: <447BFFE5.8010508@infotech.monash.edu.au> Message-ID: <000c01c683f2$6ca62570$15327e82@pyrimidine> Torsten, Any way you can post a list of some/all of the offending lines or modules? Sounds like something to consider, but if the list is as large as you say we made need something (bugzilla? wiki?) to track the changes and make sure they pass tests; I'm sure a large majority will. I'm guessing Jason would want this somewhere on the project priority list or bugzilla, with a link to the actual list, but I'm not sure. Maybe start a page on the wiki for proposed code changes? 
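For anyone wanting to see the pitfall quoted above in isolation, here is a minimal sketch in plain Perl. The two subroutine names are made up for the illustration and do not come from bioperl-live.

```perl
use strict;
use warnings;

# Two hypothetical subroutines that both mean "nothing found".
sub find_with_undef { return undef }   # explicit undef
sub find_with_bare  { return }         # bare return

# List context: "return undef" produces a one-element list containing
# undef, so the array is non-empty and tests as true.
my @a = find_with_undef();
my @b = find_with_bare();
print "return undef: ", scalar(@a), " element(s), ", (@a ? "true" : "false"), "\n";
print "return:       ", scalar(@b), " element(s), ", (@b ? "true" : "false"), "\n";

# Scalar context: both forms give undef, so callers that assign to a
# scalar see no difference.
my $x = find_with_undef();
my $y = find_with_bare();
print "scalar context: ",
    (defined($x) ? "defined" : "undef"), " / ",
    (defined($y) ? "defined" : "undef"), "\n";
```

Only callers that use the result in list context are affected, which is presumably why just some of the occurrences found in the audit could turn into real bugs.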
Chris From lstein at cshl.edu Tue May 30 10:47:48 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 30 May 2006 10:47:48 -0400 Subject: [Bioperl-l] **Fwd: Re: URGENT: Bio::Graphics::Panel make the ruler have In-Reply-To: <5b6410e0605290246p8875c78n286caa672a55b4de@mail.gmail.com> References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com> <200605261038.30380.lstein@cshl.edu> <5b6410e0605290246p8875c78n286caa672a55b4de@mail.gmail.com> Message-ID: <200605301047.49127.lstein@cshl.edu> Hi Kevin, I'm afraid that there is no offset value. You'll need the 1.51 version of bioperl to handle negative numbers properly. I understand your reluctance to upgrade just to get the Bio::Graphics functionality.
You might consider checking out just the Bio/Graphics subtree and installing that. It should work on top of 1.4 Lincoln On Monday 29 May 2006 05:46, Kevin Lam Koiyau wrote: > Hi! > oh it was in a slightly different header asking about the create image map > feature. > I am using the stable version 1.4 of bioperl now. In any case I have not > added the sequence as a feature annotated seq. as I already have the bp > where the TF binds (in 1-1050 numberings) so what I did was to just add > graded segments based on the position. > I saw that there is a scale function for the arrow glyp however, it is a > multiply function, can it be hacked to take in a offset value (ie minus the > scale by 1000?) > > cheers > kevin > > > Hi, > > > For some reason I didn't see the first posting on this. In current > > bioperl live, the ruler can have negative numberings - I use this > > routinely. You need > > to create a feature that starts in negative coordinates. What is > > happening to > > you when you try this? > > > > Lincoln > > > > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote: > > > Hi > > > thanks for the help offered thus far! > > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promtoer seq > > > > using > > > > > bioperl. therefore i was asked to make the numberings as such (-1000) > > > is there any way at all to do this in bioperl without changing the .pm > > > > file? > > > > > thanks guys.. > > > kevin > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > Lincoln D. 
Stein > > Cold Spring Harbor Laboratory > > 1 Bungtown Road > > Cold Spring Harbor, NY 11724 > > (516) 367-8380 (voice) > > (516) 367-8389 (fax) > > FOR URGENT MESSAGES & SCHEDULING, > > PLEASE CONTACT MY ASSISTANT, > > SANDRA MICHELSEN, AT michelse at cshl.edu > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Tue May 30 10:50:06 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 30 May 2006 09:50:06 -0500 Subject: [Bioperl-l] Bio::Restriction::IO issues Message-ID: <000f01c683f8$5771ed50$15327e82@pyrimidine> Jason, Brian, et al, I found several major issues with Bio::Restriction::IO (this popped up while bug squashing). In particular, the POD is pretty misleading. It states (directly from perldoc): SYNOPSIS use Bio::Restriction::IO; $in = Bio::Restriction::IO->new(-file => "inputfilename" , -format => 'withrefm'); $out = Bio::Restriction::IO->new(-file => ">outputfilename" , -format => 'bairoch'); my $res = $in->read; # a Bio::Restriction::EnzymeCollection $out->write($res); # or # use Bio::Restriction::IO; # # #input file format can be read from the file extension (dat|xml) # $in = Bio::Restriction::IO->newFh(-file => "inputfilename"); # $out = Bio::Restriction::IO->newFh('-format' => 'xml'); # # # World's shortest flat<->xml format converter: # print $out $_ while <$in>; So, I have found several problems with these modules. 
I really hate to criticize code here, as my own is pretty hacky, but I think these are things to seriously mull over:

1) Although some of the lines above are commented out, they are still in the POD and therefore show up in perldoc/pod2html output. Read at face value, the synopsis suggests that the script reads in one format and writes out another (like SeqIO). However, NONE of the current write() methods are implemented for any of the IO modules (withrefm, base, itype2, bairoch), so this does not happen as expected; writing throws the nasty 'method not implemented' error instead.
2) The commented statements in the POD above also suggest that the REBASE XML format is supported, when there is no XML module.
3) The Bio::Restriction::IO::bairoch module had multiple bugs which made it unusable until I added a few small changes; it still can't handle multisite/multicut enzymes properly, so in essence it is useless until that is addressed.
4) Bio::Restriction::IO inherits from Bio::SeqIO, though I'm not sure why. Shouldn't it just inherit from Bio::Root::Root/Bio::Root::IO and make up its own methods?

I'm working on at least getting the 'bairoch' input format up and running (so that it at least gets the enzymes into a Bio::Restriction::EnzymeCollection). Beyond that I'm not sure how to proceed. The POD obviously needs to be corrected to reflect that writing formats is not implemented (and the bit about XML should be taken out completely); that's the easy part, which I am working on and plan on committing today. However, these modules don't seem to be used very often, so I'm not sure whether it's worth spending much time getting them up to speed right now (adding write methods, switching to Bio::Root::Root, etc.); I have other priorities at the moment (including a way overdue ListSummary).
I'm also not sure who else is (using|working) on these so I don't want to (make too many changes|step on someone else's toes), but these are, IMHO, pretty serious problems. Any thoughts? Chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Tue May 30 12:34:18 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 30 May 2006 11:34:18 -0500 Subject: [Bioperl-l] Bio::Restriction::IO changes Message-ID: <001401c68406$e71e9850$15327e82@pyrimidine> Jason, Brian, et al: I have made changes to the Bio::Restriction::IO POD to remove any reference to write functions since almost none have been implemented yet, so including this into POD is a bit misleading. At the moment, you can't write to any REBASE format except for 'base', which I found is the only one that works. And, upon further checking, even that one has issues: it looks like there are problems with multicut/multisite enzymes when writing in 'base' format which I'm not delving into ('TaqII' only displays one site when writing when it has two cut sites). I'll add this to the wiki and a bug report (enhancement) for this module. I am also removing mention of XML and 'bairoch' formats (the former isn't present and the latter is broken at the moment) and added a few things to the POD TO DO section. Rob (if you're out there somewhere in the ether), have you made any more changes to these modules that need to be committed? Didn't know if any of these issues have already been addressed/changed etc. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry University of Illinois Urbana-Champaign From jelenaob at gmail.com Tue May 30 00:58:35 2006 From: jelenaob at gmail.com (Jelena Obradovic) Date: Mon, 29 May 2006 21:58:35 -0700 Subject: [Bioperl-l] Bio::Graphic::Panel backgroud color Message-ID: <5042a62b0605292158g187f4855hd93f76e0086ac27d@mail.gmail.com> Hello everybody, does anybody know how to remove the background color of the Panel. Currently, I am not adding anything to it, so I can troubleshot the problem, and I have tried setting up all color attributes I could find to the panel, but no luck. Whatever I do, I get the BLUE border of the panel. Has anybody faced the same problem? Thanks in advance, Jelena And here is the code I am currently using: ----------------------------------------------------------------------------------------------------------- my $panel = Bio::Graphics::Panel->new(-length => $prim_seq->length() + 200, -width => 800, -pad_left => 10, -pad_right => 10, -key_color => 'white', -bgcolor => 'white', -gridcolor=>'black', -fgcolor => 'black', -grid => 0, ); my ($url,$map,$mapname) = $panel->image_and_map( #-root=>'$root_url' , -url => '/tmpimages'); #make clickable image print $cgi->img({-src=>$url,-usemap=>"#$mapname"}); print $map; -----------------------------------------------------------------------------------------------------------
From luciap at sas.upenn.edu Tue May 30 14:49:48 2006 From: luciap at sas.upenn.edu (Lucia Peixoto) Date: Tue, 30 May 2006 14:49:48 -0400 Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function Message-ID: <1149014988.447c93cc01761@128.91.55.38> Hi I am here again, I finally got to write the "collapse nodes" function and have a couple of questions. In order to collpase any node $node, I first have to get the parent which I can do as $parent=$node->ancestor and then the children as: @children=$node->get_all_Descendents (or should I use each descendent?) Then before deleting $node I have to assign all its children to $parent, and here is where I am kind of confussed. Can I use the add_Descendent function for this? I've been tryig to write something like this: foreach $child (@children){ $parent=add_Descendent->$child; } but this doesn't work and I think it is because I don't have any idea of what I am doing any suggestions?
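The reparenting step asked about here can be sketched on a toy tree. Everything in this block is an assumption made for illustration: the nodes are plain hashes rather than Bio::Tree::Node objects, make_node, add_child, collapse_node and to_newick are invented helper names, and the support cutoff of 70 is arbitrary. The sketch hands only the immediate children of the collapsed node to its parent; passing every node of the subtree instead would flatten the whole clade. In BioPerl terms that would presumably mean each_Descendent rather than get_all_Descendents, but that mapping is not tested here.

```perl
use strict;
use warnings;

# Toy hash-based nodes: a stand-in for illustration, NOT Bio::Tree::Node.
sub make_node {
    my ($id, $support) = @_;
    return { id => $id, support => $support, parent => undef, children => [] };
}

sub add_child {
    my ($parent, $child) = @_;
    $child->{parent} = $parent;
    push @{ $parent->{children} }, $child;
    return;
}

# Move the node's immediate children up to its parent, then detach the node.
# Returns 0 for the root (it has no parent), 1 otherwise.
sub collapse_node {
    my ($node) = @_;
    my $parent = $node->{parent} or return 0;
    add_child( $parent, $_ ) for @{ $node->{children} };
    $parent->{children} = [ grep { $_ != $node } @{ $parent->{children} } ];
    $node->{children} = [];
    $node->{parent}   = undef;
    return 1;
}

# Minimal Newick-style writer so the effect is visible.
sub to_newick {
    my ($node) = @_;
    my @kids = @{ $node->{children} };
    return $node->{id} unless @kids;
    return '(' . join( ',', map { to_newick($_) } @kids ) . ')';
}

# Build ((A,B),C) where the inner node has support 50.
my $root  = make_node( 'root',  undef );
my $inner = make_node( 'inner', 50 );
add_child( $root,  $inner );
add_child( $inner, make_node( 'A', undef ) );
add_child( $inner, make_node( 'B', undef ) );
add_child( $root,  make_node( 'C', undef ) );

my $before = to_newick($root);
collapse_node($inner) if $inner->{support} < 70;
my $after = to_newick($root);

print "$before -> $after\n";   # prints "((A,B),C) -> (C,A,B)"
```

When collapsing several nodes in one pass, it is safer to collect the list of low-support internal nodes first and modify the tree afterwards, so the traversal never walks a structure that is changing underneath it.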
thanks Lucia Peixoto Department of Biology,SAS University of Pennsylvania From rvosa at sfu.ca Tue May 30 14:52:52 2006 From: rvosa at sfu.ca (Rutger Vos) Date: Tue, 30 May 2006 11:52:52 -0700 Subject: [Bioperl-l] For CVS developers - potential pitfall with "returnundef" In-Reply-To: <000c01c683f2$6ca62570$15327e82@pyrimidine> References: <000c01c683f2$6ca62570$15327e82@pyrimidine> Message-ID: <447C9484.9030102@sfu.ca> Although I agree with the sentiment of following PBP, I'm not so sure changing 'return undef' to 'return' *now* will fix any bugs without introducing new, subtle ones. -- ++++++++++++++++++++++++++++++++++++++++++++++++++++ Rutger Vos, PhD. candidate Department of Biological Sciences Simon Fraser University 8888 University Drive Burnaby, BC, V5A1S6 Phone: 604-291-5625 Fax: 604-291-3496 Personal site: http://www.sfu.ca/~rvosa FAB* lab: http://www.sfu.ca/~fabstar Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ ++++++++++++++++++++++++++++++++++++++++++++++++++++ From luciap at sas.upenn.edu Tue May 30 16:11:52 2006 From: luciap at sas.upenn.edu (Lucia Peixoto) Date: Tue, 30 May 2006 16:11:52 -0400 Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function In-Reply-To: References: Message-ID: <1149019912.447ca7085124e@128.91.55.38> Hi OK that was silly, but what I have in my code is what you just wrote But the problem is that if I write $parent->add_Descendent($child) it tells me that I am calling the method "ass_Descendent" on an undefined value (but I did define $parent before??) So here it goes the code so far: use Bio::TreeIO; my $in = new Bio::TreeIO(-file => 'Test2.tre', -format => 'newick'); my $out = new Bio::TreeIO(-file => '>mytree.out', -format => 'newick'); while( my $tree = $in->next_tree ) { foreach my $node ( grep { !
$_->is_Leaf() } $tree->get_nodes() ) { my $bootstrap=$node->_creation_id; if ($bootstrap < 70 ){ my $parent = $node->ancestor; my @children=$node->get_all_Descendents; foreach my $child (@children){ $parent->add_Descendent($child); } ........ eventually I'll add (once I assigned the children to the parent succesfully): $tree->remove_Node($node); } } $out->write_tree($tree); } Quoting aaron.j.mackey at gsk.com: > > foreach $child (@children){ > > $parent=add_Descendent->$child; > > } > > I think what you want is $parent->add_Descendent($child) > > -Aaron > Lucia Peixoto Department of Biology,SAS University of Pennsylvania From jason.stajich at duke.edu Tue May 30 16:30:56 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue, 30 May 2006 16:30:56 -0400 Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function In-Reply-To: <1149019912.447ca7085124e@128.91.55.38> References: <1149019912.447ca7085124e@128.91.55.38> Message-ID: <6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu> you need to special case the root - it won't have an ancestor. just protect the my $parent = $node->ancestor with an if statement as I did below On May 30, 2006, at 4:11 PM, Lucia Peixoto wrote: > Hi > OK that was silly, but what I have in my code is what you just wrote > But the problem is that if I write > > $parent->add_Descendent($child) > > it tells me that I am calling the method "ass_Descendent" on an > undefined value > (but I did define $parent before??) > > So here it goes the code so far: > > use Bio::TreeIO; > my $in = new Bio::TreeIO(-file => 'Test2.tre', > -format => 'newick'); > my $out = new Bio::TreeIO(-file => '>mytree.out', > -format => 'newick'); > while( my $tree = $in->next_tree ) { > foreach my $node ( grep { ! 
$_->is_Leaf() } $tree->get_nodes() ) { > my $bootstrap=$node->_creation_id; > > if ($bootstrap < 70 ){ > >>> if( my $parent = $node->ancestor ) { > my @children=$node->get_all_Descendents; > foreach my $child (@children){ > $parent->add_Descendent($child); > } } > > ........ > > eventually I'll add (once I assigned the children to the parent > succesfully): > $tree->remove_Node($node); > > } > } > $out->write_tree($tree); > } > > Quoting aaron.j.mackey at gsk.com: > >>> foreach $child (@children){ >>> $parent=add_Descendent->$child; >>> } >> >> I think what you want is $parent->add_Descendent($child) >> >> -Aaron >> > > > Lucia Peixoto > Department of Biology,SAS > University of Pennsylvania > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From cjfields at uiuc.edu Tue May 30 17:40:18 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 30 May 2006 16:40:18 -0500 Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef" In-Reply-To: <447C9484.9030102@sfu.ca> Message-ID: <001801c68431$a586b2d0$15327e82@pyrimidine> Agreed, though I think these changes should be implemented at some point (Conway's argument here makes sense and it is nice for Torsten to check this out). If proper tests are written then any changes resulting in errors should be picked up by checking the appropriate test suite, though I know it doesn't absolutely guarantee it. 
; P Chris From rvosa at sfu.ca Tue May 30 17:58:25 2006 From: rvosa at sfu.ca (Rutger Vos) Date: Tue, 30 May 2006 14:58:25 -0700 Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef" In-Reply-To: <001901c68433$026b1ad0$15327e82@pyrimidine> References: <001901c68433$026b1ad0$15327e82@pyrimidine> Message-ID: <447CC001.4050000@sfu.ca> I've been following the perl6 mailing lists for a while now. I think this time around it won't really take that long (one year?) for pugs/perl6 stacks to become more than just toys. I think especially large projects, like bioperl, will really benefit from the improved OO implementation in perl6, so it might be of interest to at least fantasize about it.
Chris Fields wrote: > Ha! Or may be the 'nonexistent' bioperl-experimental. Wonder what'll > happen once Perl6 comes to term? > > -CJF > > >> -----Original Message----- >> From: Rutger Vos [mailto:rvosa at sfu.ca] >> Sent: Tuesday, May 30, 2006 4:48 PM >> To: Chris Fields >> Subject: Re: [Bioperl-l] For CVS developers - potential >> pitfallwith"returnundef" >> >> Surely this will all sort itself out in bioperl6 ;-) -- ++++++++++++++++++++++++++++++++++++++++++++++++++++ Rutger Vos, PhD. candidate Department of Biological Sciences Simon Fraser University 8888 University Drive Burnaby, BC, V5A1S6 Phone: 604-291-5625 Fax: 604-291-3496 Personal site: http://www.sfu.ca/~rvosa FAB* lab: http://www.sfu.ca/~fabstar Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ ++++++++++++++++++++++++++++++++++++++++++++++++++++ From cjfields at uiuc.edu Tue May 30 18:08:26 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 30 May 2006 17:08:26 -0500 Subject: [Bioperl-l] For CVS developers - potentialpitfallwith"returnundef" In-Reply-To: <447CC001.4050000@sfu.ca> Message-ID: <001a01c68435$93135a50$15327e82@pyrimidine> Agreed. I would say, probably 6-12 months time, might be a good idea to try getting something actually started, maybe under the 'bioperl-experimental' title Jason has mentioned.
One could always try getting a Bio::Root-like object going in Pugs/Perl6 as a starter and work up from there, with emphasis on key areas (seq. parsing, so on). CJF > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Rutger Vos > Sent: Tuesday, May 30, 2006 4:58 PM > To: bioperl list > Subject: Re: [Bioperl-l] For CVS developers - > potentialpitfallwith"returnundef" > > I've been following the perl6 mailing lists for a while now. I think > this time around it won't really take that long (one year?) for > pugs/perl6 stacks to become more than just toys. I think especially > large projects, like bioperl, will really benefit from the improved OO > implementation in perl6, so it might be of interest to at least > fantasize about it. > > -- > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > Rutger Vos, PhD.
candidate
> Department of Biological Sciences
> Simon Fraser University
> 8888 University Drive
> Burnaby, BC, V5A1S6
> Phone: 604-291-5625
> Fax: 604-291-3496
> Personal site: http://www.sfu.ca/~rvosa
> FAB* lab: http://www.sfu.ca/~fabstar
> Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++

From ULNJUJERYDIX at spammotel.com Tue May 30 23:45:12 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Wed, 31 May 2006 11:45:12 +0800
Subject: [Bioperl-l] SOLVED Bio::Graphics::Panel make ruler have neg values
Message-ID: <5b6410e0605302045x5c420674x6f898a8a2973991a@mail.gmail.com>

I am so sorry for the truncated email; I accidentally hit reply.

If anyone is interested, I have opted to change line 161 of arrow.pm
(Perl/site/lib/Bio/Graphics/Glyph/arrow.pm; on Linux it is
/usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Glyph/arrow.pm) from

    $gd->string($font,$middle,$center+$a2-1,$label,$font_color)

to

    $gd->string($font,$middle,$center+$a2-1,$label-1000,$font_color)

just for this one-off use.

Strangely, at line 112 of arrow.pm in bioperl ver 1.51 I found a hidden option for coords offset?

    my $relative_coords_offset = $self->option('relative_coords_offset');
    $relative_coords_offset = 1 unless defined $relative_coords_offset;

but entering the option -relative_coords_offset=>1000 in the arrow glyphs didn't do anything...

Hi!
> oh it was in a slightly different header asking about the create image map
> feature.
> I am using the stable version 1.4 of bioperl now. In any case I have not
> added the sequence as a feature annotated seq. as I already have the bp
> where the TF binds (in 1-1050 numberings) so what I did was to just add
> graded segments based on the position.
> I saw that there is a scale function for the arrow glyp however, it is a > multiply function, can it be hacked to take in a offset value (ie minus > the > scale by 1000?) > > cheers > kevin > > > Hi, > > > > For some reason I didn't see the first posting on this. In current > bioperl > > live, the ruler can have negative numberings - I use this routinely. You > > need > > to create a feature that starts in negative coordinates. What is > happening > > to > > you when you try this? > > > > Lincoln > > > > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote: > > > Hi > > > thanks for the help offered thus far! > > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promtoer seq > > using > > > bioperl. therefore i was asked to make the numberings as such (-1000) > is > > > there any way at all to do this in bioperl without changing the .pm > > file? > > > > > > thanks guys.. > > > kevin > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > Lincoln D. 
Stein
> > Cold Spring Harbor Laboratory
> > 1 Bungtown Road
> > Cold Spring Harbor, NY 11724
> > (516) 367-8380 (voice)
> > (516) 367-8389 (fax)
> > FOR URGENT MESSAGES & SCHEDULING,
> > PLEASE CONTACT MY ASSISTANT,
> > SANDRA MICHELSEN, AT michelse at cshl.edu

From sb at mrc-dunn.cam.ac.uk Wed May 31 04:40:08 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 31 May 2006 09:40:08 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447C7985.9000404@cornell.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk> <447C7985.9000404@cornell.edu>
Message-ID: <447D5668.7070500@mrc-dunn.cam.ac.uk>

Genevieve DeClerck wrote:
> Thanks for your comment Sendu, it was very helpful. I think this must be
> what's going on.. I am using $blast_report->next_result in both
> subroutines. It appears that analyzing the blast results first w/ my
> sort subroutine empties (?) the $blast_result object so that when I try
> to print, there is nothing left to print. (and vice versa when I print
> first then try to sort).
> So, from the looks of things, using next_result has the effect of
> popping the Bio::Search::Result::ResultI objects off of the SearchIO
> blast report object??

Not quite. It's more or less exactly like opening a file and then trying to read it all twice like this:

    open(FILE, "file");
    while (<FILE>) {
        print; # prints each line in the file
    }
    while (<FILE>) {
        print; # never happens, we never enter this while loop
    }

To get the second while loop to print anything we need to say seek(FILE, 0, 0) before it. Or in the first while loop store each line in an array, and then make the second loop a foreach through that array.

> It seems I could get around this by making a copy of the blast report by
> setting it to another new variable...(not the most elegant solution) but
> I'm having trouble with this...
>
> If I do:
>
> my $blast_report_copy = $blast_report;
>
> I'm just copying the reference to the SearchIO blast result, so it
> doesn't help me. How can I make another physical copy of this blast
> result object? Seems like a simple thing but how to do it is escaping me.

Not really a good idea, and it may not work anyway if the object contains a filehandle. But for a simple object you might recursively loop through the data structure and copy each element out into a similar data structure.

> But better yet, the way to go is to 'reset the counter,' or to find a
> way to look at/print/sort the results without removing data from the
> blast result object. How is this done though??

It would be rather nice if this worked:

    my $blast_report = $factory->blastall($ref_seq_objs);
    my $blast_fh = $blast_report->fh();
    while (<$blast_fh>) {
        # $_ is a ResultI object, use as normal
    }
    seek($blast_fh, 0, 0); # this would be great, but does it work?
    while (<$blast_fh>) {
        # go through the results again in your second subroutine
    }

An alternative hacky way of doing it, which may also not work, would be to go through your $blast_report as normal, but then before going through it a second time, say

    my $fh = $blast_report->_fh;
    seek($fh, 0, 0);

Finally, the most sensible way (assuming bioperl provides no methods of its own for this) of solving the problem is, the first time you go through each next_result, next_hit and next_hsp, just store the returned objects in an array of arrays of arrays. Then the second time get the objects from your array structure instead of with the method calls.
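
The file-handle analogy in this message can be tried out without BioPerl. What follows is only a minimal sketch: it uses an in-memory filehandle as a stand-in for a report, and the data and variable names are made up for illustration. It shows that a second pass over an exhausted handle reads nothing, and compares the two workarounds named above (rewinding with seek, or caching records on the first pass). Whether seek works on the handle held by a SearchIO object is not established here.

```perl
use strict;
use warnings;

# Stand-in for a report file: an in-memory filehandle (hypothetical data).
my $report = "result 1\nresult 2\nresult 3\n";
open(my $fh, '<', \$report) or die "cannot open in-memory handle: $!";

# First pass: consume the stream and cache every record as we go.
my @cache;
while (my $line = <$fh>) {
    chomp $line;
    push @cache, $line;
}

# A second pass over the same handle reads nothing: we are at end of file.
my $second_pass = 0;
$second_pass++ while <$fh>;

# Workaround 1: rewind the handle, then read again.
seek($fh, 0, 0) or die "cannot seek: $!";
my $after_seek = 0;
$after_seek++ while <$fh>;

# Workaround 2: loop over the cached records instead of the handle.
my $from_cache = 0;
$from_cache++ for @cache;

print "second pass: $second_pass, after seek: $after_seek, from cache: $from_cache\n";
```

With a SearchIO report the caching pattern would presumably look the same: push each object returned by next_result onto an array during the first pass, and loop over that array in the second subroutine.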
From heikki at sanbi.ac.za Wed May 31 06:55:18 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 31 May 2006 12:55:18 +0200
Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef"
In-Reply-To: <001801c68431$a586b2d0$15327e82@pyrimidine>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
Message-ID: <200605311255.19166.heikki@sanbi.ac.za>

In my opinion the sooner the bugs get exposed the better. It is much more likely that there is a well hidden bug caused by assigning accidentally undef into a one element array than someone intentionally writing code that expects that behaviour!

I removed (but did not commit yet) all undefs from my old Bio::Variation code and could not see any differences in the test output.

Let's remove them!

-Heikki

On Tuesday 30 May 2006 23:40, Chris Fields wrote:
> Agreed, though I think these changes should be implemented at some point
> (Conway's argument here makes sense and it is nice for Torsten to check
> this out). If proper tests are written then any changes resulting in
> errors should be picked up by checking the appropriate test suite, though I
> know it doesn't absolutely guarantee it. ; P
>
> Chris

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of the Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From heikki at sanbi.ac.za Wed May 31 06:44:28 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 31 May 2006 12:44:28 +0200
Subject: [Bioperl-l] Bio::Restriction::IO issues
In-Reply-To: <000f01c683f8$5771ed50$15327e82@pyrimidine>
References: <000f01c683f8$5771ed50$15327e82@pyrimidine>
Message-ID: <200605311244.29187.heikki@sanbi.ac.za>

Chris,

Thanks for stepping in. I feel partly responsible here because I originally changed some of Rob's code but have not followed up since.

There has not been active development on these modules so do not worry about stepping on anyone's toes.

-Heikki

On Tuesday 30 May 2006 16:50, Chris Fields wrote:
> Jason, Brian, et al,
>
> I found several major issues with Bio::Restriction::IO (this popped up
> while bug squashing). In particular, the POD is pretty misleading. It
> states (directly from perldoc):
>
> SYNOPSIS
>     use Bio::Restriction::IO;
>
>     $in = Bio::Restriction::IO->new(-file => "inputfilename" ,
>                                     -format => 'withrefm');
>     $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
>                                      -format => 'bairoch');
>     my $res = $in->read; # a Bio::Restriction::EnzymeCollection
>     $out->write($res);
>
>     # or
>
>     # use Bio::Restriction::IO;
>     #
>     # #input file format can be read from the file extension (dat|xml)
>     # $in = Bio::Restriction::IO->newFh(-file => "inputfilename");
>     # $out = Bio::Restriction::IO->newFh('-format' => 'xml');
>     #
>     # # World's shortest flat<->xml format converter:
>     # print $out $_ while <$in>;
>
> So, I have found several problems with these modules. I really hate to
> criticize code here, as my own is pretty hacky, but I think these are
> things to seriously mull over:
>
> 1) Note that, though some of the lines above are commented they are
> still there in POD and thus present in perldoc/pod2html etc. So, judging
> from the above, it suggests using the script above should read in from one
> format and write out to another (like SeqIO). However, NONE of the current
> write() methods are implemented for any of the IO modules (withref, base,
> itype2, bairoch), so this does not happen as expected. You get the nasty
> thrown 'method not implemented error' instead when writing.
> 2) The commented statements in POD above also suggest that REBASE XML
> format is supported when there is no XML module.
> 3) The Bio::Restriction::IO::bairoch module had multiple bugs which
> made it unusable until I added a few small changes; it still can't handle
> multisite/multicut enzymes properly, so in essence it is useless until that
> is addressed.
> 4) Bio::Restriction::IO inherits from Bio::SeqIO, though I'm not sure
> why. Shouldn't it just inherit from Bio::Root::Root/Bio::Root::IO and make
> up its own methods?
>
> I'm working on at least getting the 'bairoch' input format up and running
> (so at least it gets the enzymes into a
> Bio::Restriction::Enzyme::Collection). From this point I'm not sure where
> to proceed. The POD obviously needs to be corrected to reflect that
> writing formats is not implemented (and the bit about XML should be taken
> out completely); that's the easy part which I am working on and plan
> on committing today. However, these modules don't seem to be used too
> frequently so I'm not sure whether it's worth spending too much time
> getting these up to speed at the moment (adding write methods, switching to
> Bio::Root::Root, etc); I have other priorities at the moment (including a
> way overdue ListSummary). I'm also not sure who else is (using|working) on
> these so I don't want to (make too many changes|step on someone else's
> toes), but these are, IMHO, pretty serious problems.
>
> Any thoughts?
>
> Chris
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of the Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From cjfields at uiuc.edu Wed May 31 09:10:00 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 08:10:00 -0500
Subject: [Bioperl-l] Bio::Restriction::IO issues
In-Reply-To: <200605311244.29187.heikki@sanbi.ac.za>
References: <000f01c683f8$5771ed50$15327e82@pyrimidine> <200605311244.29187.heikki@sanbi.ac.za>
Message-ID:

Heikki,

I mainly just changed a few things so no one would get the wrong ideas from POD (that they write format as well) and added a few things to the TO DO. I also added a warning to Bio::Restriction::IO::bairoch for the multisite/multicut issue. Besides that I haven't done much to them.

I also added a bit to the Project Priority List in case someone wants to take it up. I may tinker with it but it's not really high on my priority list. I've been pretty busy getting the ListSummaries back up to speed (very busy mail lists since the last one) and am writing/testing a new interface to NCBI EUtilities which I may donate at some point in the next few months or so.

Chris

On May 31, 2006, at 5:44 AM, Heikki Lehvaslaiho wrote:
>
> Chris,
>
> Thanks for stepping in. I feel partly responsible here because I
> originally changed some of Rob's code but have not followed up since.
>
> There has not been active development on these modules so do not
> worry about stepping on anyone's toes.
>
> -Heikki

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From jay at jays.net Wed May 31 09:07:10 2006
From: jay at jays.net (Jay Hannah)
Date: Wed, 31 May 2006 08:07:10 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
Message-ID: <447D94FE.8090305@jays.net>

http://www.bioperl.org/wiki/Bptutorial.pl

I think I just partially fulfilled this TODO:

   TODO: check if the POD is in the Wiki yet, and if not, put it here?

I used Pod::Simple::Wiki (format 'mediawiki') to burn bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it into the wiki page via my web browser. (Is that proper procedure? Is the plan to just do that manually from time to time as the document changes?)

Now what?

Should there be a new link on the far left of bioperl.org called "Tutorial"?

It's an amazing document. IMHO it should be listed prominently on bioperl.org.
HTH,

j

From osborne1 at optonline.net Wed May 31 09:58:01 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 31 May 2006 09:58:01 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447D94FE.8090305@jays.net>
Message-ID:

Jay,

Excellent! Now we need to answer a few more questions for ourselves:

- Do we remove the file bptutorial.pl from the package now? I'd say yes, we don't want to have to maintain two bptutorials.

- What do we do with the script part of bptutorial.pl? It certainly could be excised and put into the examples/ directory, for example, but this would break a few of the paths that are being used.

- A link to bptutorial? Or a link to the existing tutorials page? http://www.bioperl.org/wiki/Tutorials.

Any thoughts on these?

Brian O.

On 5/31/06 9:07 AM, "Jay Hannah" wrote:
> http://www.bioperl.org/wiki/Bptutorial.pl
>
> I think I just partially fulfilled this TODO:
>
> TODO: check if the POD is in the Wiki yet, and if not, put it here?
>
> I used Pod::Simple::Wiki (format 'mediawiki') to burn
> bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it into the
> wiki page via my web browser. (Is that proper procedure? Is the plan to just
> do that manually from time to time as the document changes?)
>
> Now what?
>
> Should there be a new link on the far left of bioperl.org called "Tutorial"?
>
> It's an amazing document. IMHO it should be listed prominently on bioperl.org.
>
> HTH,
>
> j

From luciap at sas.upenn.edu Wed May 31 10:06:13 2006
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Wed, 31 May 2006 10:06:13 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
In-Reply-To: <6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu>
References: <1149019912.447ca7085124e@128.91.55.38> <6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu>
Message-ID: <1149084373.447da2d5c5339@128.91.55.38>

Hi

Thanks. A couple more questions:
Why is the bootstrap value stored as the node id? Is that right?
Also, in the add_descendant method, how do you set the $ignoreoverwrite parameter to true?

Lucia

Quoting Jason Stajich :
> you need to special case the root - it won't have an ancestor. just
> protect the my $parent = $node->ancestor with an if statement as I
> did below
>
> On May 30, 2006, at 4:11 PM, Lucia Peixoto wrote:
>
> > Hi
> > OK that was silly, but what I have in my code is what you just wrote
> > But the problem is that if I write
> >
> > $parent->add_Descendent($child)
> >
> > it tells me that I am calling the method "add_Descendent" on an
> > undefined value
> > (but I did define $parent before??)
> >
> > So here it goes the code so far:
> >
> > use Bio::TreeIO;
> > my $in = new Bio::TreeIO(-file => 'Test2.tre',
> >                          -format => 'newick');
> > my $out = new Bio::TreeIO(-file => '>mytree.out',
> >                           -format => 'newick');
> > while( my $tree = $in->next_tree ) {
> > foreach my $node ( grep { ! $_->is_Leaf() } $tree->get_nodes() ) {
> > my $bootstrap=$node->_creation_id;
> >
> > if ($bootstrap < 70 ){
> >
>>> if( my $parent = $node->ancestor ) {
> > my @children=$node->get_all_Descendents;
> > foreach my $child (@children){
> > $parent->add_Descendent($child);
> > }
> }
> >
> > ........
> >
> > eventually I'll add (once I assigned the children to the parent
> > successfully):
> > $tree->remove_Node($node);
> >
> > }
> > }
> > $out->write_tree($tree);
> > }
> >
> > Quoting aaron.j.mackey at gsk.com:
> >
> >>> foreach $child (@children){
> >>> $parent=add_Descendent->$child;
> >>> }
> >>
> >> I think what you want is $parent->add_Descendent($child)
> >>
> >> -Aaron
> >>
> >
> >
> > Lucia Peixoto
> > Department of Biology,SAS
> > University of Pennsylvania
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12

Lucia Peixoto
Department of Biology,SAS
University of Pennsylvania

From sb at mrc-dunn.cam.ac.uk Wed May 31 10:56:49 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 31 May 2006 15:56:49 +0100
Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef"
In-Reply-To: <200605311255.19166.heikki@sanbi.ac.za>
References: <001801c68431$a586b2d0$15327e82@pyrimidine> <200605311255.19166.heikki@sanbi.ac.za>
Message-ID: <447DAEB1.4040509@mrc-dunn.cam.ac.uk>

Heikki Lehvaslaiho wrote:
> In my opinion the sooner the bugs get exposed the better. It is much more
> likely that there is a well hidden bug caused by assigning accidentally undef
> into a one element array than someone intentionally writing code that
> expects that behaviour!
>
> I removed (but did not commit yet) all undefs from my old Bio::Variation code
> and could not see any differences in the test output.
>
> Let's remove them!

Just looking for all return undef;s isn't enough. It's entirely possible to do something like:

    my $return_value;
    {
        # do something that assigns to return_value on success
        # on failure, just do nothing
    }
    return $return_value;

The bioperl docs will typically explicitly state that undef is returned, and under what circumstance.
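
The behaviour debated in this thread, including the case just described where no literal "return undef" appears in the source, can be checked with a few lines of plain Perl. This is only an illustrative sketch; the sub names are hypothetical and are not taken from BioPerl.

```perl
use strict;
use warnings;

# Hypothetical subs, for illustration only.
sub find_undef { return undef; }   # one-element list (undef) in list context
sub find_bare  { return; }         # empty list in list context, undef in scalar context

# The variant described above: no literal "return undef" to search for,
# yet list context still receives a single undef.
sub find_var {
    my $return_value;              # never assigned on failure
    return $return_value;
}

my @a = find_undef();
my @b = find_bare();
my @c = find_var();
my $s = find_bare();

printf "undef: %d, bare: %d, var: %d, scalar defined: %s\n",
    scalar(@a), scalar(@b), scalar(@c), defined $s ? 'yes' : 'no';

# An array holding a single undef is still true in boolean context.
print "find_undef() looks like success in list context\n" if @a;
```

So an audit that only searches for the string "return undef" would catch find_undef but not find_var, even though callers assigning to an array see the same result from both.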
If a user suffers from the undef-into-array-problem, yes it can be slightly unexpected, but lots of unexpected things will happen when you don't use a method correctly, as per the docs!

Fixing the return of undef is either a job that shouldn't be done, or a much harder job than expected.

From bernd.web at gmail.com Wed May 31 10:30:30 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 31 May 2006 16:30:30 +0200
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To:
References: <447D94FE.8090305@jays.net>
Message-ID: <716af09c0605310730o7de20489m674a07b5a928039d@mail.gmail.com>

Hi,

I am not sure to what extent bptutorial will be removed, but I actually like having bptutorial.pl in my BioPerl base for reference.

regards,
Bernd

On 5/31/06, Brian Osborne wrote:
> Jay,
>
> Excellent! Now we need to answer a few more questions for ourselves:
>
> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we
> don't want to have to maintain two bptutorials.
>
> - What do we do with the script part of bptutorial.pl? It certainly could be
> excised and put into the examples/ directory, for example, but this would
> break a few of the paths that are being used.
>
> - A link to bptutorial? Or a link to the existing tutorials page?
> http://www.bioperl.org/wiki/Tutorials.
>
> Any thoughts on these?
>
> Brian O.

From lstein at cshl.edu Wed May 31 12:03:13 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 12:03:13 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef"
In-Reply-To: <200605311255.19166.heikki@sanbi.ac.za>
References: <001801c68431$a586b2d0$15327e82@pyrimidine> <200605311255.19166.heikki@sanbi.ac.za>
Message-ID: <200605311203.13922.lstein@cshl.edu>

I'm afraid that everything depends on the context. If the subroutine is documented to return a single scalar, then returning undef is appropriate. If the subroutine is documented to return "false" on failure, then one must call return (or "return ()" ).

Changing all the return undefs to return is going to expose hidden bugs in the code written by people who are using BioPerl. While I agree wholeheartedly with the proposed audit, I think we need to expect that people are going to complain.

Lincoln

On Wednesday 31 May 2006 06:55, Heikki Lehvaslaiho wrote:
> In my opinion the sooner the bugs get exposed the better. It is much more
> likely that there is a well hidden bug caused by assigning accidentally
> undef into an one element array that someone intentionally writing code
> that expects that behaviour!
>
> I removed (but did not commit yet) all undefs from my old Bio::Variation
> code and could not see any differences in the test output.
>
> Let's remove them!
> > -Heikki > > On Tuesday 30 May 2006 23:40, Chris Fields wrote: > > Agreed, though I think these changes should be implemented at some point > > (Conway's argument here makes sense and it is nice for Torsten to check > > this out). If proper tests are written then any changes resulting in > > errors should be picked up by checking the appropriate test suite, though > > I know it doesn't absolutely guarantee it. ; P > > > > Chris > > > > > -----Original Message----- > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > bounces at lists.open-bio.org] On Behalf Of Rutger Vos > > > Sent: Tuesday, May 30, 2006 1:53 PM > > > To: bioperl-l at lists.open-bio.org > > > Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith > > > "returnundef" > > > > > > Although I agree with the sentiment of following PBP, I'm not so sure > > > changing 'return undef' to 'return' *now* will fix any bugs without > > > introducing new, subtle ones. > > > > > > Chris Fields wrote: > > > > Torsten, > > > > > > > > Any way you can post a list of some/all of the offending lines or > > > > > > modules? > > > > > > > Sounds like something to consider, but if the list is as large as you > > > > > > say we > > > > > > > made need something (bugzilla? wiki?) to track the changes and make > > > > sure they pass tests; I'm sure a large majority will. > > > > > > > > I'm guessing Jason would want this somewhere on the project priority > > > > > > list or > > > > > > > bugzilla, with a link to the actual list, but I'm not sure. Maybe > > > > start > > > > > > a > > > > > > > page on the wiki for proposed code changes? 
> > > > > > > > Chris > > > > > > > >> -----Original Message----- > > > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > >> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann > > > >> Sent: Tuesday, May 30, 2006 3:19 AM > > > >> To: bioperl-l at lists.open-bio.org > > > >> Subject: [Bioperl-l] For CVS developers - potential pitfall with > > > >> "returnundef" > > > >> > > > >> FYI Bioperl developers: > > > >> > > > >> I just audited the bioperl-live CVS and found about 450 occurrences > > > >> of "return undef". > > > >> > > > >> Page 199 of "Perl Best Practices" by Damian Conway, and this URL > > > >> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html > > > >> suggest: > > > >> > > > >> "Use return; instead of return undef; if you want to return nothing. > > > >> If someone assigns the return value to an array, the latter creates > > > >> an array of one value (undef), which evaluates to true. The former > > > >> will correctly handle all contexts." > > > >> > > > >> So I'm guessing at least some of these 450 occurrences *could* > > > >> result > > > > > > in > > > > > > >> bugs and should probably be changed. > > > >> > > > >> Your opinion may differ :-) > > > >> > > > >> -- > > > >> Dr Torsten Seemann http://www.vicbioinformatics.com > > > >> Victorian Bioinformatics Consortium, Monash University, Australia > > > >> > > > >> _______________________________________________ > > > >> Bioperl-l mailing list > > > >> Bioperl-l at lists.open-bio.org > > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l at lists.open-bio.org > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > Rutger Vos, PhD. 
candidate > > > Department of Biological Sciences > > > Simon Fraser University > > > 8888 University Drive > > > Burnaby, BC, V5A1S6 > > > Phone: 604-291-5625 > > > Fax: 604-291-3496 > > > Personal site: http://www.sfu.ca/~rvosa > > > FAB* lab: http://www.sfu.ca/~fabstar > > > Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Wed May 31 12:34:54 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 11:34:54 -0500 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: Message-ID: <001201c684d0$263c5530$15327e82@pyrimidine> Brian, Jay, I think it would be nice to have the tutorial prominently displayed somehow (Jay's suggestion), with a link provided via the tutorials page. Hopefully this will help with the bioperl newbies. Jay, looks like there are still some weird formatting issues with the bptutorial wiki page, something which I ran into before when getting the Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more spaces preceding a line denotes code for some reason). Not much you can do in these cases except remove the extra spaces in those spots. Looking good though! 
Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Brian Osborne > Sent: Wednesday, May 31, 2006 8:58 AM > To: Jay Hannah; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl > > Jay, > > Excellent! Now we need to answer a few more questions for ourselves: > > - Do we remove the file bptutorial.pl from the package now? I'd say yes, > we > don't want to have to maintain two bptutorials. > > - What do we do with the script part of bptutorial.pl? It certainly could > be > excised and put into the examples/ directory, for example, but this would > break a few of the paths that are being used. > > - A link to bptutorial? Or a link to the existing tutorials page? > http://www.bioperl.org/wiki/Tutorials. > > Any thoughts on these? > > > Brian O. > > > On 5/31/06 9:07 AM, "Jay Hannah" wrote: > > > http://www.bioperl.org/wiki/Bptutorial.pl > > > > I think I just partially fulfilled this TODO: > > > > TODO: check if the POD is in the Wiki yet, and if not, put it here? > > > > I used Pod::Simple::Wiki (format 'mediawiki') to burn > > bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it > the > > wiki page via my web browser. (Is that proper procedure? Is the plan to > just > > do that manually from time to time as the document changes?) > > > > Now what? > > > > Should there be a new link on the far left of bioperl.org called > "Tutorial"? > > > > It's an amazing document. IMHO it should be listed prominently on > bioperl.org. 
> > > > HTH, > > > > j > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Wed May 31 12:44:31 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 11:44:31 -0500 Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef" In-Reply-To: <200605311203.13922.lstein@cshl.edu> Message-ID: <001301c684d1$7e849fd0$15327e82@pyrimidine> My feeling is the test suite 'should' pick up a large majority of problems if changes are made to these lines, the quotes there indicating the utopian idea that the tests are all written well (I believe 99% of the tests are, BTW). You can always try the changes (wholesale or on smaller chunks of code), see if they pass tests on different OS's using 'make/nmake test', revert the ones that didn't pass, etc. It's a matter of someone willing to try it out. I think the original argument proposed here (originating from Damian Conway and 'Perl Best Practices') is maybe using 'return undef' is something we shouldn't be doing since this can lead to subtle errors itself. Not that everything we do is considered 'a good practice' by any means. If I remember correctly from 'OOPerl', Conway doesn't like combined get/setters either (he prefers separate getters and setters); we use the 'bad' combined version predominately in Bioperl. 
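For anyone following the thread, here is a minimal sketch of the two behaviours being argued about. The subroutine names are made up for illustration and are not BioPerl methods. It shows the list-context case quoted from "Perl Best Practices", the scalar case where the two styles agree, and one way a blanket switch to a bare return could shift values inside a key/value list.

```perl
use strict;
use warnings;

# Hypothetical subroutines for illustration only; neither is a BioPerl method.
sub fail_with_undef  { return undef; }  # one-element list (undef) in list context
sub fail_with_return { return; }        # empty list in list context, undef in scalar context

# List assignment: the case described in "Perl Best Practices".
my @from_undef  = fail_with_undef();
my @from_return = fail_with_return();
print "return undef: ", scalar(@from_undef),  " element(s), so the array tests true\n";
print "return:       ", scalar(@from_return), " element(s), so the array tests false\n";

# Scalar assignment: both styles give undef, so such callers see no difference.
my $s_undef  = fail_with_undef();
my $s_return = fail_with_return();
print "scalar context: both undefined\n"
    if !defined $s_undef && !defined $s_return;

# Inside a key/value list the empty list vanishes and the later items shift,
# which is one way a blanket change could introduce new bugs.
my @pairs_undef  = (id => fail_with_undef(),  name => 'seq1');
my @pairs_return = (id => fail_with_return(), name => 'seq1');
print "pairs with return undef: ", scalar(@pairs_undef),  " items\n";
print "pairs with return:       ", scalar(@pairs_return), " items\n";
```

Callers that only ever assign the result to a scalar are unaffected either way; the difference shows up only where the call sits in list context.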
Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Lincoln Stein > Sent: Wednesday, May 31, 2006 11:03 AM > To: bioperl-l at lists.open-bio.org > Cc: Heikki Lehvaslaiho > Subject: Re: [Bioperl-l] For CVS developers - potential > pitfallwith"returnundef" > > I'm afraid that everything depends on the context. If the subroutine is > documented to return a single scalar, then returning undef is appropriate. > If > the subroutine is documented to return "false" on failure, then one must > call > return (or "return ()" ). > > Changing all the return undefs to return is going to expose hidden bugs in > the > code written by people who are using BioPerl. While I agree wholeheartedly > with the proposed audit, I think we need to expect that people are going > to > complain. > > Lincoln
From hlapp at gmx.net Wed May 31 10:59:53 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 31 May 2006 10:59:53 -0400 Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef" In-Reply-To: <200605311255.19166.heikki@sanbi.ac.za> References: <001801c68431$a586b2d0$15327e82@pyrimidine> <200605311255.19166.heikki@sanbi.ac.za> Message-ID: <949F348A-391B-495D-ABCE-30BABC37FF05@gmx.net> I agree. Thanks to Torsten for the audit and Chris for stepping up. -hilmar On May 31, 2006, at 6:55 AM, Heikki Lehvaslaiho wrote: > In my opinion the sooner the bugs get exposed the better. It is > much more > likely that there is a well hidden bug caused by assigning > accidentally undef > into an one element array that someone intentionally writing code that > expects that behaviour! > > I removed (but did not commit yet) all undefs from my old > Bio::Variation code > and could not see any differences in the test output. > > Let's remove them! > > -Heikki > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of the Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : ===========================================================
From hlapp at gmx.net Wed May 31 14:08:43 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 31 May 2006 14:08:43 -0400 Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef" In-Reply-To: <200605311203.13922.lstein@cshl.edu> References: <001801c68431$a586b2d0$15327e82@pyrimidine> <200605311255.19166.heikki@sanbi.ac.za> <200605311203.13922.lstein@cshl.edu> Message-ID: On May 31, 2006, at 12:03 PM, Lincoln Stein wrote: > If the subroutine is documented to return
"false" on failure, then > one must call > return (or "return ()" ). The problem seems to be that 'a value that evaluates to either true or false' and 'a [meaningful] value or undef' and 'a value or false' ('a value or no value) are not the same in perl. And what would/should one expect if the doc states 'true on success and false otherwise'? Maybe the documentation should also be fixed to avoid any ambiguity. I.e., avoid documenting 'a value or false' because it may be ambiguous (not only) to the less proficient. 'True or false' should imply a value being returned. Comments? -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From lstein at cshl.edu Wed May 31 14:14:59 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Wed, 31 May 2006 14:14:59 -0400 Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef" In-Reply-To: References: <001801c68431$a586b2d0$15327e82@pyrimidine> <200605311203.13922.lstein@cshl.edu> Message-ID: <200605311415.00414.lstein@cshl.edu> If the documentation says "returns false" then I expect to be able to do this: @result = foo(); die "foo() failed" unless @result; If the documentation says "returns undef" then I expect this: @result = foo(); die "foo() failed" unless $result[0]; Lincoln On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote: > On May 31, 2006, at 12:03 PM, Lincoln Stein wrote: > > If the subroutine is documented to return "false" on failure, then > > one must call > > return (or "return ()" ). > > The problem seems to be that 'a value that evaluates to either true > or false' and 'a [meaningful] value or undef' and 'a value or > false' ('a value or no value) are not the same in perl. And what > would/should one expect if the doc states 'true on success and false > otherwise'? > > Maybe the documentation should also be fixed to avoid any ambiguity. 
> I.e., avoid documenting 'a value or false' because it may be > ambiguous (not only) to the less proficient. 'True or false' should > imply a value being returned. > > Comments? > > -hilmar -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From hlapp at gmx.net Wed May 31 14:31:21 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 31 May 2006 14:31:21 -0400 Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef" In-Reply-To: <200605311415.00414.lstein@cshl.edu> References: <001801c68431$a586b2d0$15327e82@pyrimidine> <200605311203.13922.lstein@cshl.edu> <200605311415.00414.lstein@cshl.edu> Message-ID: <241E77AE-8D1E-4708-9C4C-8A9619822DB4@gmx.net> On May 31, 2006, at 2:14 PM, Lincoln Stein wrote: > If the documentation says "returns false" then I expect to be able > to do this: > > @result = foo(); > die "foo() failed" unless @result; Except if the alternative to 'false' would be a scalar, you normally wouldn't assign it to an array, would you? I.e., I wouldn't expect behavior this strict from an open-source package written largely by people whose job is biological science, not programming perl knowing and following DC to the letter ... I'd rather be on the safe side and assign to a scalar. Just my $0.02 ...
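A minimal sketch of the caller-side checks discussed in this exchange, assuming a hypothetical lookup routine (find_name is invented for illustration and is not a BioPerl method). It contrasts the scalar assignment suggested above with the array check from the quoted message.

```perl
use strict;
use warnings;

# Hypothetical lookup for illustration; not a BioPerl method.
# Swap the final 'return;' for 'return undef;' to compare the two styles.
sub find_name {
    my ($id) = @_;
    my %names = (1 => 'seqA');
    return $names{$id} if exists $names{$id};
    return;
}

# Scalar assignment: detects failure under either return style.
my $name = find_name(2);
print "scalar check: lookup failed\n" unless defined $name;

# List assignment to a single variable: also works under either style.
my ($first) = find_name(2);
print "first-element check: lookup failed\n" unless defined $first;

# Array check: only detects failure when the routine uses a bare 'return;'.
my @result = find_name(2);
print "array check: lookup failed\n" unless @result;
```

With the bare return all three checks report the failure. If the routine ended in 'return undef;' instead, the array check would stay silent, because @result would then hold one undefined element.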
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Wed May 31 14:50:30 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 13:50:30 -0500 Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef" In-Reply-To: <447DAEB1.4040509@mrc-dunn.cam.ac.uk> Message-ID: <001801c684e3$16e33730$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Sendu Bala > Sent: Wednesday, May 31, 2006 9:57 AM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] For CVS developers - potential > pitfallwith"returnundef" > > Heikki Lehvaslaiho wrote: > > In my opinion the sooner the bugs get exposed the better. It is much > more > > likely that there is a well hidden bug caused by assigning accidentally > undef > > into an one element array that someone intentionally writing code that > > expects that behaviour! > > > > I removed (but did not commit yet) all undefs from my old Bio::Variation > code > > and could not see any differences in the test output. > > > > Let's remove them! > > Just looking for all return undef;s isn't enough. It's entirely possible > to do something like: > > my $return_value; > { > # do something that assigns to return_value on success > # on failure, just do nothing > } > return $return_value; Agreed, though looking for these is obviously much harder. The way to get around those is: return $return_value if $return_value; return; which I've seen used in a number of get/set methods. > The bioperl docs will typically explicitly state that undef is returned, > and under what circumstance. 
If a user suffers from the > undef-into-array-problem, yes it can be slightly unexpected, but lots of > unexpected things will happen when you don't use a method correctly, as > per the docs! Right, but the argument you make is that code will always work as expected from the perldoc examples. My recent experiences with the Bio::Restriction::IO and Bio::Species classes show that the docs are not always up-to-date and may indicate the unimplemented intent of the author more than the actual implementation. Again, I believe a large majority of the docs are fine, but it's those few errors that made a devil's advocate of me... > Fixing the return of undef is either a job that shouldn't be done, or a > much harder job than expected. I don't think ignoring the problem is the best answer here, though I agree the problem is more complicated than it looks at first glance. Judging from code I've trawled through a bit lately, I've seen a lot of methods (mainly get/setters) that are essentially copied multiple times in the same or across similar modules to save time. You could see a scenario where, in those instances, so-called 'bad code' would spread quite quickly. I think adding a wiki page to address some of these issues would be nice, something separate from the Project Priority List. Chris From forward at hongyu.org Wed May 31 14:03:46 2006 From: forward at hongyu.org (Hongyu Zhang) Date: Wed, 31 May 2006 11:03:46 -0700 Subject: [Bioperl-l] New functions for SimpleAlign.pm Message-ID: <20060531110346.78xod658td8o0w0w@hongyu.org> Greetings, I am a new member in this mailing list. Nice to be here. I wrote two more functions for the alignment module SimpleAlign.pm that calculate the percentage of identity based on the shortest and longest sequence length, respectively.
I also found an error in the no_residues() function that calculates the number of residues in the alignment. I am wondering whether they can be added to the official bioperl package. I contacted the original author of this module, Heikki Lehvaslaiho, a couple of weeks ago, but haven't heard from him yet. Thanks. -- Hongyu Zhang, Ph.D. Computational biologist Ceres Inc. From cjfields at uiuc.edu Wed May 31 15:39:26 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 14:39:26 -0500 Subject: [Bioperl-l] New functions for SimpleAlign.pm In-Reply-To: <20060531110346.78xod658td8o0w0w@hongyu.org> Message-ID: <001901c684e9$ed4a1720$15327e82@pyrimidine> I added a bit to the FAQ about this: http://www.bioperl.org/wiki/FAQ#How_do_I_submit_a_patch_or_enhancement_to_BioPerl.3F and the HOWTO explains things a bit more directly: http://www.bioperl.org/wiki/HOWTO:SubmitPatch In brief, these need to be submitted to Bugzilla as either code enhancements (for your added methods) or bugs with the patch to the relevant code. Code enhancements probably should include some code and test cases to demonstrate usage. Patches to buggy code are checked by the core developers to make sure they pass the relevant tests. Submitting it to the mail list is definitely the first step, though, so you're on the right path. Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hongyu Zhang > Sent: Wednesday, May 31, 2006 1:04 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] New functions for SimpleAlign.pm > > Greetings, > > I am a new member in this mailing list. Nice to be here. > > I wrote two more functions for the alignment module SimpleAlign.pm > that calculate the percentage of identity based on the shortest and > longest sequence length, respectively. I also found an error in the > no_residues() function that calculates the number of residues in the > alignment.
> > I am wondering whether they can be added to the official bioperl > package. I've contacted the original author of this module, Heikki > Lehvaslaiho, a couple of weeks ago, but haven't heard from him yet. > > Thanks. > > -- > Hongyu Zhang, Ph.D. > Computational biologist > Ceres Inc. > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Wed May 31 16:40:19 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 15:40:19 -0500 Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef" In-Reply-To: <200605311415.00414.lstein@cshl.edu> Message-ID: <002001c684f2$6fb7daf0$15327e82@pyrimidine> What about modules that have 'throw_not_implemented' statements present? Here's a list with the total for each. Some of these are interfaces (I got rid of a number that ended in 'I' or 'IO' to remove the I/IO interfaces but it misses a few). 
There are a number here that are implementations, though (Bio::AlignIO::maf, Bio::Restriction::IO::*), so they are technically incomplete:

Instances: 1 Module : Bio::AlignIO::maf
Instances: 25 Module : Bio::Assembly::Contig
Instances: 2 Module : Bio::Assembly::ContigAnalysis
Instances: 2 Module : Bio::Biblio::BiblioBase
Instances: 4 Module : Bio::DB::Expression
Instances: 2 Module : Bio::DB::Expression::geo
Instances: 5 Module : Bio::DB::Flat
Instances: 2 Module : Bio::DB::Query::WebQuery
Instances: 17 Module : Bio::DB::SeqFeature::Store
Instances: 2 Module : Bio::DB::SeqVersion
Instances: 3 Module : Bio::DB::Taxonomy
Instances: 1 Module : Bio::FeatureIO::bed
Instances: 1 Module : Bio::Map::Marker
Instances: 1 Module : Bio::MapIO::fpc
Instances: 1 Module : Bio::MapIO::mapmaker
Instances: 1 Module : Bio::Restriction::IO::bairoch
Instances: 1 Module : Bio::Restriction::IO::itype2
Instances: 1 Module : Bio::Restriction::IO::withrefm
Instances: 1 Module : Bio::Tools::Analysis::SimpleAnalysisBase
Instances: 3 Module : Bio::Tools::Run::WrapperBase

Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Lincoln Stein > Sent: Wednesday, May 31, 2006 1:15 PM > To: Hilmar Lapp > Cc: bioperl-l at lists.open-bio.org; Heikki Lehvaslaiho > Subject: Re: [Bioperl-l] For CVS developers - potential > pitfallwith"returnundef" > > If the documentation says "returns false" then I expect to be able to do > this: > > @result = foo(); > die "foo() failed" unless @result; > > If the documentation says "returns undef" then I expect this: > > @result = foo(); > die "foo() failed" unless $result[0]; > > Lincoln > > > On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote: > > On May 31, 2006, at 12:03 PM, Lincoln Stein wrote: > > > If the subroutine is documented to return "false" on failure, then > > > one must call > > > return (or "return ()" ).
> > > > The problem seems to be that 'a value that evaluates to either true > > or false' and 'a [meaningful] value or undef' and 'a value or > > false' ('a value or no value) are not the same in perl. And what > > would/should one expect if the doc states 'true on success and false > > otherwise'? > > > > Maybe the documentation should also be fixed to avoid any ambiguity. > > I.e., avoid documenting 'a value or false' because it may be > > ambiguous (not only) to the less proficient. 'True or false' should > > imply a value being returned. > > > > Comments? > > > > -hilmar > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From lstein at cshl.edu Wed May 31 17:07:06 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Wed, 31 May 2006 17:07:06 -0400 Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef" In-Reply-To: <002001c684f2$6fb7daf0$15327e82@pyrimidine> References: <002001c684f2$6fb7daf0$15327e82@pyrimidine> Message-ID: <200605311707.08196.lstein@cshl.edu> > Instances: 17 Module : Bio::DB::SeqFeature::Store This is intentional. Bio::DB::SeqFeature::Store is intended to be a virtual base class. The throw_not_implemented() calls are there to force developers to override the needed interface methods. If this is not the right way to do it, let me know and I'll fix it. 
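For readers following along, the pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual Bio::DB::SeqFeature::Store code: the package names My::Store and My::Store::Memory are hypothetical, and the local throw_not_implemented() is a plain-Perl stand-in for the BioPerl method of the same name, so the example runs without BioPerl installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a virtual base class; names are made up.
package My::Store;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Plain-Perl stand-in for BioPerl's throw_not_implemented(): die, naming
# the abstract method and the concrete class that failed to override it.
sub throw_not_implemented {
    my ($self) = @_;
    my $method = (caller(1))[3];    # fully qualified name of the calling sub
    die "Abstract method '$method' is not implemented by " . ref($self) . "\n";
}

# Interface methods: subclasses are expected to override these.
sub store { $_[0]->throw_not_implemented }
sub fetch { $_[0]->throw_not_implemented }

# A concrete subclass that overrides store() but forgets fetch().
package My::Store::Memory;
our @ISA = ('My::Store');

sub store {
    my ($self, $id, $feature) = @_;
    $self->{features}{$id} = $feature;
    return 1;
}

package main;

my $db = My::Store::Memory->new;
$db->store(f1 => 'gene');

# The missing override surfaces as an exception at call time.
my $ok = eval { $db->fetch('f1'); 1 };
print "fetch failed: $@" unless $ok;
```

The point of the design is that a forgotten override fails loudly at the first call, with a message naming the method, rather than silently returning nothing.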
Lincoln > Instances: 2 Module : Bio::DB::SeqVersion > Instances: 3 Module : Bio::DB::Taxonomy > Instances: 1 Module : Bio::FeatureIO::bed > Instances: 1 Module : Bio::Map::Marker > Instances: 1 Module : Bio::MapIO::fpc > Instances: 1 Module : Bio::MapIO::mapmaker > Instances: 1 Module : Bio::Restriction::IO::bairoch > Instances: 1 Module : Bio::Restriction::IO::itype2 > Instances: 1 Module : Bio::Restriction::IO::withrefm > Instances: 1 Module : Bio::Tools::Analysis::SimpleAnalysisBase > Instances: 3 Module : Bio::Tools::Run::WrapperBase > > Chris > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Lincoln Stein > > Sent: Wednesday, May 31, 2006 1:15 PM > > To: Hilmar Lapp > > Cc: bioperl-l at lists.open-bio.org; Heikki Lehvaslaiho > > Subject: Re: [Bioperl-l] For CVS developers - potential > > pitfallwith"returnundef" > > > > If the documentation says "returns false" then I expect to be able to do > > this: > > > > @result = foo(); > > die "foo() failed" unless @result; > > > > If the documentation says "returns undef" then I expect this: > > > > @result = foo(); > > die "foo() failed" unless $result[0]; > > > > Lincoln > > > > On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote: > > > On May 31, 2006, at 12:03 PM, Lincoln Stein wrote: > > > > If the subroutine is documented to return "false" on failure, then > > > > one must call > > > > return (or "return ()" ). > > > > > > The problem seems to be that 'a value that evaluates to either true > > > or false' and 'a [meaningful] value or undef' and 'a value or > > > false' ('a value or no value) are not the same in perl. And what > > > would/should one expect if the doc states 'true on success and false > > > otherwise'? > > > > > > Maybe the documentation should also be fixed to avoid any ambiguity. > > > I.e., avoid documenting 'a value or false' because it may be > > > ambiguous (not only) to the less proficient. 
'True or false' should > > > imply a value being returned. > > > > > > Comments? > > > > > > -hilmar > > > > -- > > Lincoln D. Stein > > Cold Spring Harbor Laboratory > > 1 Bungtown Road > > Cold Spring Harbor, NY 11724 > > (516) 367-8380 (voice) > > (516) 367-8389 (fax) > > FOR URGENT MESSAGES & SCHEDULING, > > PLEASE CONTACT MY ASSISTANT, > > SANDRA MICHELSEN, AT michelse at cshl.edu > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From hlapp at gmx.net Wed May 31 17:21:57 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 31 May 2006 17:21:57 -0400 Subject: [Bioperl-l] For CVS developers - throw_not_implemented In-Reply-To: <002001c684f2$6fb7daf0$15327e82@pyrimidine> References: <002001c684f2$6fb7daf0$15327e82@pyrimidine> Message-ID: On May 31, 2006, at 4:40 PM, Chris Fields wrote: > What about modules that have 'throw_not_implemented' statements > present? Those are often if not always legitimate - the problem is those that don't have them but fail to override an inherited interface or abstract method. If something is not implemented, what better way is there to express this than throwing an exception?
(and if it's not an interface or abstract base class, saying so in the documentation) -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Wed May 31 17:25:48 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 31 May 2006 17:25:48 -0400 Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef" In-Reply-To: <001801c684e3$16e33730$15327e82@pyrimidine> References: <001801c684e3$16e33730$15327e82@pyrimidine> Message-ID: <8AA04BF0-FA79-43CF-9FBB-310314FECD91@gmx.net> On May 31, 2006, at 2:50 PM, Chris Fields wrote: > I've seen a lot of methods (mainly get/setters) > that are essentially copied multiple times in the same or across > similar > modules to save time. You could see a scenario where, in those > instances, > so-called 'bad code' would spread quite quickly. This will usually be code generated by macros, e.g. the emacs macros for getter/setter generation for properties. If the macro generates wrong code, that's indeed pretty bad. (We've had that.) OTOH it should be spotted quickly as well. And macro changes or new macros should probably be scrutinized by all eyes watching ... -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Wed May 31 17:40:22 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 16:40:22 -0500 Subject: [Bioperl-l] For CVS developers - throw_not_implemented In-Reply-To: Message-ID: <002401c684fa$d28e7640$15327e82@pyrimidine> I think, as long as it's reflected in the docs that something doesn't work (hasn't been implemented) then there's no problem. It's when the docs are misleading that we run into problems. 
The sticking point lies with some classes, such as IO classes (like SeqIO, or Restriction::IO, with read and write methods) where the IO base class specifies that it is possible to read and write a particular format but the actual implementation varies according to whether or not the derived class overrides the base or interface method (in other words, 'doesn't work as advertised' only in specific circumstances). I don't know how to solve this issue except to add in the docs that specific formats don't implement write() methods. Personally, I haven't had an issue with it and it probably makes no difference, but I think it needs to be pointed out. The most extreme case I ran into was Bio::Restriction::IO, which had 3 out of 4 plugin modules that didn't implement the write() method but left this in the synopsis in POD:

  use Bio::Restriction::IO;

  $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
                                   -format => 'withrefm');
  $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
                                   -format => 'bairoch');
  my $res = $in->read; # a Bio::Restriction::EnzymeCollection
  $out->write($res);

  # or

  # use Bio::Restriction::IO;
  #
  # #input file format can be read from the file extension (dat|xml)
  # $in  = Bio::Restriction::IO->newFh(-file => "inputfilename");
  # $out = Bio::Restriction::IO->newFh('-format' => 'xml');
  #
  # # World's shortest flat<->xml format converter:
  # print $out $_ while <$in>;

None of this code works; in fact, no XML parser even exists for these IO classes! Bio::AlignIO also has a few as well (maf and Stockholm formats don't write). Chris > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: Wednesday, May 31, 2006 4:22 PM > To: Chris Fields > Cc: lstein at cshl.edu; bioperl-l at lists.open-bio.org; 'Heikki Lehvaslaiho' > Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented > > > On May 31, 2006, at 4:40 PM, Chris Fields wrote: > > > What about modules that have 'throw_not_implemented' statements > > present?
> > Those are often if not always legitimate - the problem are those that > don't have them but fail to override an inherited interface or > abstract method. > > If something is not implemented what is the better way to express > this other than throwing an exception? (and if it's not an interface > or abstract base class, saying so in the documentation) > > -hilmar > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From hlapp at gmx.net Wed May 31 17:55:37 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 31 May 2006 17:55:37 -0400 Subject: [Bioperl-l] For CVS developers - throw_not_implemented In-Reply-To: <002401c684fa$d28e7640$15327e82@pyrimidine> References: <002401c684fa$d28e7640$15327e82@pyrimidine> Message-ID: This is documentation cruft resulting from copy&paste w/o later fixing it. (which isn't a justification) Note that not implementing the write is as legitimate as not implementing the read method ... It should be pointed out in the documentation though that it will depend on the actual implementation of the format whether it supports reading or writing or both. -hilmar On May 31, 2006, at 5:40 PM, Chris Fields wrote: > I think, as long as it's reflected in the docs that something > doesn't work > (hasn't been implemented) then there's no problem. It's when the > docs are > misleading that we run into problems. > > The sticking point lies with some classes, such as IO classes (like > SeqIO, > or Restrict::IO, with read and write methods) where the IO base class > specifies that it is possible to read and write a particular format > but the > actual implementation varies according to whether or not the > derived class > overrides the base or interface method (in other words, 'doesn't > work as > advertised' only in specific circumstances). 
I don't know how to > solve this > issue except to add in the docs that specific formats don't implement > write() methods. > > Personally, I haven't had an issue with it and it probably makes no > difference, but I think it needs to be pointed out. The most > extreme I ran > into was Bio::Restriction::IO, which had 3 out of 4 plugin modules > that > didn't implement the write() method but left this in the synopsis > in POD: > > use Bio::Restriction::IO; > > $in = Bio::Restriction::IO->new(-file => "inputfilename" , > -format => 'withrefm'); > $out = Bio::Restriction::IO->new(-file => ">outputfilename" , > -format => 'bairoch'); > my $res = $in->read; # a Bio::Restriction::EnzymeCollection > $out->write($res); > > # or > > # use Bio::Restriction::IO; > # > # #input file format can be read from the file extension (dat| > xml) > # $in = Bio::Restriction::IO->newFh(-file => "inputfilename"); > # $out = Bio::Restriction::IO->newFh('-format' => 'xml'); > # > # # World's shortest flat<->xml format converter: > # print $out $_ while <$in>; > > None of this code works; in fact, no XML parser even exists for > these IO > classes! Bio::AlignIO also has a few as well (maf and Stockholm > formats > don't write). > > Chris > > >> -----Original Message----- >> From: Hilmar Lapp [mailto:hlapp at gmx.net] >> Sent: Wednesday, May 31, 2006 4:22 PM >> To: Chris Fields >> Cc: lstein at cshl.edu; bioperl-l at lists.open-bio.org; 'Heikki >> Lehvaslaiho' >> Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented >> >> >> On May 31, 2006, at 4:40 PM, Chris Fields wrote: >> >>> What about modules that have 'throw_not_implemented' statements >>> present? >> >> Those are often if not always legitimate - the problem are those that >> don't have them but fail to override an inherited interface or >> abstract method. >> >> If something is not implemented what is the better way to express >> this other than throwing an exception? 
(and if it's not an interface >> or abstract base class, saying so in the documentation) >> >> -hilmar >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From slenk at emich.edu Wed May 31 17:52:13 2006 From: slenk at emich.edu (Stephen Gordon Lenk) Date: Wed, 31 May 2006 17:52:13 -0400 Subject: [Bioperl-l] For CVS developers - throw_not_implemented Message-ID: <100682f110067a83.10067a83100682f1@emich.edu> Isn't it fairly standard in OO schemes/languages to have an exception thrown if a method can't be found at the end of a search up the class hierarchy? I recall being very mad at Smalltalk because "method not found" kept biting me. C++ has pure virtual base classes that do not allow objects to be instantiated directly; they are meant to be inherited and then implemented. Perl 6 was mentioned a bit back. Is this issue addressed there? Should it be? Do the Bioperl people feed their needs into Perl 6 so that all the code effort to make Bio::Root is handled for them in the next effort by Perl 6 itself. Make the Perl 6 people solve these issues with your input, then you will not have to deal with implementing it yourselves. I'll just bet that you are not the only potential users of Perl 6 who will have to solve these issues eventually. ----- Original Message ----- From: Hilmar Lapp Date: Wednesday, May 31, 2006 5:21 pm Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented > > On May 31, 2006, at 4:40 PM, Chris Fields wrote: > > > What about modules that have 'throw_not_implemented' statements > > present? 
> > Those are often if not always legitimate - the problem are those > that > don't have them but fail to override an inherited interface or > abstract method. > > If something is not implemented what is the better way to express > this other than throwing an exception? (and if it's not an > interface > or abstract base class, saying so in the documentation) > > -hilmar > > -- > ========================================================= == > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > ========================================================= == > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From arareko at campus.iztacala.unam.mx Wed May 31 18:49:03 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Wed, 31 May 2006 17:49:03 -0500 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: <001201c684d0$263c5530$15327e82@pyrimidine> References: <001201c684d0$263c5530$15327e82@pyrimidine> Message-ID: <447E1D5F.1050807@campus.iztacala.unam.mx> Brian, Jay, Chris, I agree with what Bernd Web said in another reply. For some people it will be nice to still be able to run the script from the codebase and interact with it. I don't think it should be much of a problem to maintain both tutorials, as long as the 'main' one is the one in the CVS tree. From reading what Jay did to convert it into mediawiki format, I suppose this can easily be done again for each new change to the script (again, this is just my guess). Besides, as far as I've seen, commits to the script are infrequent. I've added a link in the left menu of the wiki. If you think it should point to the Tutorials page instead of the Bptutorial.pl page please let me know. Regards, Mauricio.
Chris Fields wrote: > Brian, Jay, > > I think it would be nice to have the tutorial prominently displayed somehow > (Jay's suggestion), with a link provided via the tutorials page. Hopefully > this will help with the bioperl newbies. > > Jay, looks like there are still some weird formatting issues with the > bptutorial wiki page, something which I ran into before when getting the > Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more > spaces preceding a line denotes code for some reason). Not much you can do > in these cases except remove the extra spaces in those spots. Looking good > though! > > Chris > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Brian Osborne >> Sent: Wednesday, May 31, 2006 8:58 AM >> To: Jay Hannah; bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl >> >> Jay, >> >> Excellent! Now we need to answer a few more questions for ourselves: >> >> - Do we remove the file bptutorial.pl from the package now? I'd say yes, >> we >> don't want to have to maintain two bptutorials. >> >> - What do we do with the script part of bptutorial.pl? It certainly could >> be >> excised and put into the examples/ directory, for example, but this would >> break a few of the paths that are being used. >> >> - A link to bptutorial? Or a link to the existing tutorials page? >> http://www.bioperl.org/wiki/Tutorials. >> >> Any thoughts on these? >> >> >> Brian O. >> >> >> On 5/31/06 9:07 AM, "Jay Hannah" wrote: >> >>> http://www.bioperl.org/wiki/Bptutorial.pl >>> >>> I think I just partially fulfilled this TODO: >>> >>> TODO: check if the POD is in the Wiki yet, and if not, put it here? >>> >>> I used Pod::Simple::Wiki (format 'mediawiki') to burn >>> bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it >> the >>> wiki page via my web browser. (Is that proper procedure? 
Is the plan to >> just >>> do that manually from time to time as the document changes?) >>> >>> Now what? >>> >>> Should there be a new link on the far left of bioperl.org called >> "Tutorial"? >>> It's an amazing document. IMHO it should be listed prominently on >> bioperl.org. >>> HTH, >>> >>> j >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From cjfields at uiuc.edu Wed May 31 20:43:48 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 19:43:48 -0500 Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef" In-Reply-To: <200605311707.08196.lstein@cshl.edu> Message-ID: <002801c68514$72f11480$15327e82@pyrimidine> > -----Original Message----- > From: Lincoln Stein [mailto:lstein at cshl.edu] > Sent: Wednesday, May 31, 2006 4:07 PM > To: Chris Fields > Cc: 'Hilmar Lapp'; bioperl-l at lists.open-bio.org; 'Heikki Lehvaslaiho' > Subject: Re: [Bioperl-l] For CVS developers - potential > pitfallwith"returnundef" > > > > Instances: 17 Module : Bio::DB::SeqFeature::Store > > This is intentional. Bio::DB::SeqFeature::Store is intended to be a > virtual > base class. The throw_not_implemented() calls are there to force > developers > to override the needed interface methods. > > If this is not the right way to do it, let me know and I'll fix it.
That looks like the right way to me, though I don't really know what the 'right way' is. Sorry Lincoln, I didn't mean to aim anything at you specifically; I responded to your last post to stay in the thread, so to speak. It was meant to be a general statement that some classes haven't implemented methods specified by their abstract base or interface class. This is just output from a quickie script I wrote up to check on this and see how many of these statements are out there, and since there isn't a foolproof way to tell what an abstract base class is, it pulls in a few abstract classes (such as yours) along with all the others. At least there aren't as many hits as Torsten's ~400-500 for 'return undef'! Anyway, I'm not sure what would be the best place to address code problems or issues like the unimplemented methods issue or Torsten's audits (list, wiki, etc); it's a delicate issue b/c it's bordering on code critiquing and what constitutes good vs. bad code. I remember some pretty heated arguments about the 'proper' way to do things a while back involving AUTOLOAD'ing methods, which I think are summarized somewhere in the wiki. Myself, I'm a microbiologist and not a programmer, so I'm prone to bouts of hackery, but I try to have the code at least do what the docs state.
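The script itself was not posted, so the following is only a guess at what such a check might look like: a minimal sketch that counts lines matching a pattern in every .pm file under a directory and prints them in the same "Instances / Module" layout as the list earlier in the thread. The function name count_matches and the command-line handling are made up for this illustration. It is a crude line count, so it also picks up mentions in POD and comments, and it makes no attempt to tell interfaces from implementations.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Count lines matching $pattern in every .pm file under $dir.
# Returns a hash ref: module name => number of matching lines.
# (Hypothetical helper; crude on purpose, it also counts POD and comments.)
sub count_matches {
    my ($dir, $pattern) = @_;
    my %count;
    find(
        {
            no_chdir => 1,
            wanted   => sub {
                my $path = $File::Find::name;
                return unless -f $path && $path =~ /\.pm$/;
                open my $fh, '<', $path or return;
                my $hits = grep { /$pattern/ } <$fh>;   # scalar grep = count
                close $fh;
                return unless $hits;
                # e.g. <dir>/Bio/AlignIO/maf.pm -> Bio::AlignIO::maf
                (my $module = $path) =~ s{^\Q$dir\E/*}{};
                $module =~ s/\.pm$//;
                $module =~ s{/+}{::}g;
                $count{$module} = $hits;
            },
        },
        $dir
    );
    return \%count;
}

# Usage: perl count_matches.pl /path/to/bioperl-live [pattern]
if (@ARGV) {
    my ($dir, $pattern) = @ARGV;
    $pattern = 'throw_not_implemented' unless defined $pattern;
    my $count = count_matches($dir, qr/$pattern/);
    printf "Instances: %d Module : %s\n", $count->{$_}, $_
        for sort keys %$count;
}
```

Passing a different pattern as the second argument (for example 'return\s+undef') would give a rough count for the other audit mentioned in this thread; filtering out module names ending in 'I' or 'IO', as described earlier, would be one extra grep on the keys.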
Chris > Lincoln > > > > Instances: 2 Module : Bio::DB::SeqVersion > > Instances: 3 Module : Bio::DB::Taxonomy > > Instances: 1 Module : Bio::FeatureIO::bed > > Instances: 1 Module : Bio::Map::Marker > > Instances: 1 Module : Bio::MapIO::fpc > > Instances: 1 Module : Bio::MapIO::mapmaker > > Instances: 1 Module : Bio::Restriction::IO::bairoch > > Instances: 1 Module : Bio::Restriction::IO::itype2 > > Instances: 1 Module : Bio::Restriction::IO::withrefm > > Instances: 1 Module : Bio::Tools::Analysis::SimpleAnalysisBase > > Instances: 3 Module : Bio::Tools::Run::WrapperBase > > > > Chris > > > > > -----Original Message----- > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > bounces at lists.open-bio.org] On Behalf Of Lincoln Stein > > > Sent: Wednesday, May 31, 2006 1:15 PM > > > To: Hilmar Lapp > > > Cc: bioperl-l at lists.open-bio.org; Heikki Lehvaslaiho > > > Subject: Re: [Bioperl-l] For CVS developers - potential > > > pitfallwith"returnundef" > > > > > > If the documentation says "returns false" then I expect to be able to > do > > > this: > > > > > > @result = foo(); > > > die "foo() failed" unless @result; > > > > > > If the documentation says "returns undef" then I expect this: > > > > > > @result = foo(); > > > die "foo() failed" unless $result[0]; > > > > > > Lincoln > > > > > > On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote: > > > > On May 31, 2006, at 12:03 PM, Lincoln Stein wrote: > > > > > If the subroutine is documented to return "false" on failure, then > > > > > one must call > > > > > return (or "return ()" ). > > > > > > > > The problem seems to be that 'a value that evaluates to either true > > > > or false' and 'a [meaningful] value or undef' and 'a value or > > > > false' ('a value or no value) are not the same in perl. And what > > > > would/should one expect if the doc states 'true on success and false > > > > otherwise'? > > > > > > > > Maybe the documentation should also be fixed to avoid any ambiguity. 
> > > > I.e., avoid documenting 'a value or false' because it may be > > > > ambiguous (not only) to the less proficient. 'True or false' should > > > > imply a value being returned. > > > > > > > > Comments? > > > > > > > > -hilmar > > > > > > -- > > > Lincoln D. Stein > > > Cold Spring Harbor Laboratory > > > 1 Bungtown Road > > > Cold Spring Harbor, NY 11724 > > > (516) 367-8380 (voice) > > > (516) 367-8389 (fax) > > > FOR URGENT MESSAGES & SCHEDULING, > > > PLEASE CONTACT MY ASSISTANT, > > > SANDRA MICHELSEN, AT michelse at cshl.edu > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Wed May 31 20:56:12 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 19:56:12 -0500 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: <447E1D5F.1050807@campus.iztacala.unam.mx> Message-ID: <002901c68516$316d4fe0$15327e82@pyrimidine> Mauricio et al, Sounds good, except that there are a few issues with the formatting done by Pod::Simple::Wiki, such as changing some things to code tags when they obviously aren't code; I don't know if there is a workaround for that (Jay?). It may not be anything too serious though. There was a similar issue with the INSTALL doc conversion to wiki that I ran into, in that I don't think it will be easy converting one way or the other (POD->wiki or wiki->POD or text), so syncing updates with wiki and CVS docs could be an issue we'll have to face in the future.
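For reference, the POD-to-wiki direction that Jay described can be sketched as below. This is a minimal sketch under stated assumptions: it relies on the non-core CPAN module Pod::Simple::Wiki being installed, the helper name pod_to_mediawiki is made up for this example, and the exact markup produced depends on the module version, so no output is shown. The helper returns nothing when the module is unavailable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert a POD string to MediaWiki markup.
# Hypothetical helper; returns nothing if Pod::Simple::Wiki (CPAN,
# non-core) cannot be loaded.
sub pod_to_mediawiki {
    my ($pod) = @_;
    return unless eval { require Pod::Simple::Wiki; 1 };

    my $parser = Pod::Simple::Wiki->new('mediawiki');
    my $wiki   = '';
    $parser->output_string(\$wiki);          # collect output in a string
    $parser->parse_string_document($pod);    # both inherited from Pod::Simple
    return $wiki;
}

my $pod = <<'END_POD';
=head1 NAME

bptutorial - a made-up two-line POD document

=head1 SYNOPSIS

  perl bptutorial.pl

=cut
END_POD

my $wiki = pod_to_mediawiki($pod);
print defined $wiki ? $wiki : "Pod::Simple::Wiki is not installed\n";
```

A wrapper like this only covers POD->wiki; nothing here addresses the reverse direction or the stray formatting mentioned above, so it would not by itself solve the syncing problem.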
We could strip the POD out of the script and have the docs on the wiki (Brian's idea), or have minimal POD in the tutorial and keep the wiki updated, just to simplify things, but this may not appeal to those who use perldoc frequently (I personally use browsable prettified HTML). cjf > -----Original Message----- > From: Mauricio Herrera Cuadra [mailto:arareko at campus.iztacala.unam.mx] > Sent: Wednesday, May 31, 2006 5:49 PM > To: Chris Fields > Cc: 'Brian Osborne'; 'Jay Hannah'; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl > > Brian, Jay, Chris, > > I agree with what Bernd Web said in another reply. For some people will > be nice to still be able to run the script from the codebase and > interact with it. > > I don't think it should be a lot of problem to maintain both tutorials, > as long as the 'main' one is the one in the CVS tree. By reading what > Jay did in order to convert it into mediawiki format, I suppose this can > be easily done again for each new change to the script (again, this is > just my guessing). Besides, as far as I've seen, there aren't frequent > commits to the script at all. > > I've added a link in the left menu of the wiki. If you think it should > point to the Tutorials page instead of the Bptutorial.pl page please let > me know. > > Regards, > Mauricio. > > Chris Fields wrote: > > Brian, Jay, > > > > I think it would be nice to have the tutorial prominently displayed > somehow > > (Jay's suggestion), with a link provided via the tutorials page. > Hopefully > > this will help with the bioperl newbies. > > > > Jay, looks like there are still some weird formatting issues with the > > bptutorial wiki page, something which I ran into before when getting the > > Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or > more > > spaces preceding a line denotes code for some reason). Not much you can > do > > in these cases except remove the extra spaces in those spots. 
Looking > good > > though! > > > > Chris > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of Brian Osborne > >> Sent: Wednesday, May 31, 2006 8:58 AM > >> To: Jay Hannah; bioperl-l at lists.open-bio.org > >> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl > >> > >> Jay, > >> > >> Excellent! Now we need to answer a few more questions for ourselves: > >> > >> - Do we remove the file bptutorial.pl from the package now? I'd say > yes, > >> we > >> don't want to have to maintain two bptutorials. > >> > >> - What do we do with the script part of bptutorial.pl? It certainly > could > >> be > >> excised and put into the examples/ directory, for example, but this > would > >> break a few of the paths that are being used. > >> > >> - A link to bptutorial? Or a link to the existing tutorials page? > >> http://www.bioperl.org/wiki/Tutorials. > >> > >> Any thoughts on these? > >> > >> > >> Brian O. > >> > >> > >> On 5/31/06 9:07 AM, "Jay Hannah" wrote: > >> > >>> http://www.bioperl.org/wiki/Bptutorial.pl > >>> > >>> I think I just partially fulfilled this TODO: > >>> > >>> TODO: check if the POD is in the Wiki yet, and if not, put it here? > >>> > >>> I used Pod::Simple::Wiki (format 'mediawiki') to burn > >>> bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it > >> the > >>> wiki page via my web browser. (Is that proper procedure? Is the plan > to > >> just > >>> do that manually from time to time as the document changes?) > >>> > >>> Now what? > >>> > >>> Should there be a new link on the far left of bioperl.org called > >> "Tutorial"? > >>> It's an amazing document. IMHO it should be listed prominently on > >> bioperl.org. 
> >>> HTH,
> >>>
> >>> j
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >
>
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM

From osborne1 at optonline.net Wed May 31 21:37:15 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 31 May 2006 21:37:15 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E1D5F.1050807@campus.iztacala.unam.mx>
Message-ID:

Mauricio,

Bernd didn't say he wanted the _script_ in the package, he said he wanted bptutorial.pl in the package, not indicating whether it was the documentation or the script that was important. It's my suspicion that the documentation is more important than the script, and this is what my last letter was asking, in part: is the script important? Or can we focus on the text/POD part?

Brian O.

On 5/31/06 6:49 PM, "Mauricio Herrera Cuadra" wrote:

> I agree with what Bernd Web said in another reply. For some people it will
> be nice to still be able to run the script from the codebase and
> interact with it.
From cjfields at uiuc.edu Wed May 31 21:42:54 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 20:42:54 -0500 Subject: [Bioperl-l] For CVS developers - throw_not_implemented In-Reply-To: <100682f110067a83.10067a83100682f1@emich.edu> Message-ID: <002a01c6851c$b3b8a980$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Stephen Gordon Lenk > Sent: Wednesday, May 31, 2006 4:52 PM > To: Hilmar Lapp > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented > > > Isn't it fairly standard in OO schemes/languages to have an exception > thrown if a method > can't be found at the > end of a search up the class hierarchy? I recall being very mad at > Smalltalk because "method > not found" kept > biting me. C++ has pure virtual base classes that do not allow objects to > be instantiated > directly; they are > meant to be inherited and then implemented. Perl will throw an error if it can't find a method in a class hierarchy. It will do a few things first before dying, like looking for AUTOLOAD, etc. AUTOLOAD has it's supporters and detractors; I try to stay away from it as much as possible. Not sure about C++ like pure virtual classes in Perl5, i.e. not allowing direct object instantiation, but Perl6 is supposed to have them, at least according to Apocalypse 12. From what Mr. Wall says about OOP in Perl5, it's essentially 'bolted on' but works with caveats (is 'private' really 'private'?). Perl6 is rebuilt from scratch (internals are OO). > Perl 6 was mentioned a bit back. Is this issue addressed there? Should it > be? Do the Bioperl > people feed their > needs into Perl 6 so that all the code effort to make Bio::Root is handled > for them in the next > effort by Perl 6 > itself. 
Make the Perl 6 people solve these issues with your input, then
> you will not have to deal with implementing it yourselves. I'll just bet
> that you are not the only potential users of Perl 6 who will have to solve
> these issues eventually.

I think Perl6 will solve most (if not all) of these problems since it's a complete rebuild. In fact, it's pretty much a new language altogether from what I have seen (and the little I have played around with using Pugs). Parrot is supposed to handle mixes of Perl5/Perl6, so it may not be necessary to immediately convert all of bioperl to Perl6. Though I have also heard of a Perl5->6 converter in the works as well...

From an OO standpoint, I believe everything is considered an object in Perl6, though it's not supposed to force you into using objects according to the Apocalypses that I have read. I actually see a lot there that reminds me of C++ (but in a Perl-ish way, of course). Apocalypse 12 is a good primer, though you may want to go through the others first, they're heavy slogging:

http://dev.perl.org/perl6/doc/design/apo/A12.html

Not sure what you mean by 'feeding our needs into Perl6'. I have periodically checked on perl6 progress and they seem to have everything well under control.

Chris

> ----- Original Message -----
> From: Hilmar Lapp
> Date: Wednesday, May 31, 2006 5:21 pm
> Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented
>
> >
> > On May 31, 2006, at 4:40 PM, Chris Fields wrote:
> >
> > > What about modules that have 'throw_not_implemented' statements
> > > present?
> >
> > Those are often if not always legitimate - the problem is those that
> > don't have them but fail to override an inherited interface or
> > abstract method.
> >
> > If something is not implemented what is the better way to express
> > this other than throwing an exception?
(and if it's not an > > interface > > or abstract base class, saying so in the documentation) > > > > -hilmar > > > > -- > > > ========================================================= > == > > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > > > ========================================================= > == > > > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jay at jays.net Wed May 31 21:54:01 2006 From: jay at jays.net (Jay Hannah) Date: Wed, 31 May 2006 20:54:01 -0500 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: References: Message-ID: <447E48B9.4080503@jays.net> Brian Osborne wrote: > - Do we remove the file bptutorial.pl from the package now? I'd say yes, we > don't want to have to maintain two bptutorials. We certainly wouldn't want to try to maintain two copies, one POD one in wiki. That would be the worst of all options. One option that hasn't been mentioned yet is to keep maintenance of that in POD in the distro (leaving the cool runability alone), and then flag that document as unchangeable in the wiki with a note on top "Maintenance of this document is done in POD in the distro. Submit POD patches to bioperl-l and we'll re-post an updated copy to this wiki." Just a thought. > - What do we do with the script part of bptutorial.pl? It certainly could be > excised and put into the examples/ directory, for example, but this would > break a few of the paths that are being used. /README says this: scripts/ - Useful production-quality scripts with POD documentation examples/ - Scripts demonstrating the many uses of Bioperl I'm personally not clear on the difference. 
Little stuff should start in examples/ and graduate to scripts/ once they've matured? Is the doc/ tree being abandoned? doc/faq (empty?) doc/howto doc/howto/examples doc/howto/figs (empty?) doc/howto/html (empty?) doc/howto/pdf (empty?) doc/howto/sgml (empty?) doc/howto/txt (empty?) doc/howto/xml (empty?) Does all that stuff officially live in and is being changed in the wiki, never to return to the distro? Any reason those empty dirs aren't nuked out of CVS? Chris Fields wrote: > Jay, looks like there are still some weird formatting issues with the > bptutorial wiki page, something which I ran into before when getting the > Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more > spaces preceding a line denotes code for some reason). Not much you can do > in these cases except remove the extra spaces in those spots. Looking good > though! Sorry, I spent zero time on the whole conversion. I'm not sure what parts didn't convert well. I've never done that conversion before, and know nothing about mediawiki. I just blindly let Pod::Simple::Wiki do its thing then ran off to work. :) Mauricio Herrera Cuadra wrote: > I've added a link in the left menu of the wiki. If you think it should > point to the Tutorials page instead of the Bptutorial.pl page please let > me know. Instead of all these competing links on the left, maybe we should have a master "documentation" page linked on the left cascading like so? Documentation (linked on the left menu) - Quick start - FAQ - HOWTOs - Tutorials (What's the conceptual difference between a HOWTO and a tutorial?) It's hard for me to dive into a wiki lifestyle for the huge documentation pillars since it can't ever get back into the distro... (can it?) Small, throw away stuff is great for the wiki, but huge, established, thoughtful, long documents should be left in the distro? Present (and searchable) on the wiki but static? Why isn't the short "Current events" just listed on the top of the "News" page? 
Sick of my endless questions yet? -grin- j From cjfields at uiuc.edu Wed May 31 23:09:38 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 22:09:38 -0500 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: <447E48B9.4080503@jays.net> Message-ID: <000001c68528$d1b6ec10$15327e82@pyrimidine> ... > We certainly wouldn't want to try to maintain two copies, one POD one in > wiki. That would be the worst of all options. One option that hasn't been > mentioned yet is to keep maintenance of that in POD in the distro (leaving > the cool runability alone), and then flag that document as unchangeable in > the wiki with a note on top "Maintenance of this document is done in POD > in the distro. Submit POD patches to bioperl-l and we'll re-post an > updated copy to this wiki." > > Just a thought. There are probably three schools of thought on docs: those that like nice docs with links within and beyond BioPerl (hence the wiki), those who like including docs with the distribution, and those that would like both. The latter would be nice but isn't realistic unless we can come up with a way to sync changes between the wiki and CVS those docs we want to include with the distribution w/o too much trouble. I'm in the first school of thought since rich text with links is better and more informative than plain text any day. It might be a very small school though... > > - What do we do with the script part of bptutorial.pl? It certainly > could be > > excised and put into the examples/ directory, for example, but this > would > > break a few of the paths that are being used. > > /README says this: > > scripts/ - Useful production-quality scripts with POD documentation > examples/ - Scripts demonstrating the many uses of Bioperl > > I'm personally not clear on the difference. Little stuff should start in > examples/ and graduate to scripts/ once they've matured? > > Is the doc/ tree being abandoned? 
Most docs have been moved over to the wiki, which generates nicely formatted docs for printing. ... > Does all that stuff officially live in and is being changed in the wiki, > never to return to the distro? It's easier to add changes in the wiki and add markup, links, etc. Much richer text, so on. > Any reason those empty dirs aren't nuked out of CVS? > > Chris Fields wrote: > > Jay, looks like there are still some weird formatting issues with the > > bptutorial wiki page, something which I ran into before when getting the > > Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or > more > > spaces preceding a line denotes code for some reason). Not much you can > do > > in these cases except remove the extra spaces in those spots. Looking > good > > though! > > Sorry, I spent zero time on the whole conversion. I'm not sure what parts > didn't convert well. I've never done that conversion before, and know > nothing about mediawiki. I just blindly let Pod::Simple::Wiki do its thing > then ran off to work. :) No big deal. > Mauricio Herrera Cuadra wrote: > > I've added a link in the left menu of the wiki. If you think it should > > point to the Tutorials page instead of the Bptutorial.pl page please let > > me know. > > Instead of all these competing links on the left, maybe we should have a > master "documentation" page linked on the left cascading like so? > > Documentation (linked on the left menu) > - Quick start > - FAQ > - HOWTOs > - Tutorials Okay, though Mauricio may know a bit more on how/if this can be done. Mauricio? > (What's the conceptual difference between a HOWTO and a tutorial?) I believe the reasoning is along these lines: HOWTO's are focused in on specific areas (graphics, trees, BLAST report parsing, etc) and thus usually has greater detail. The tutorials are more broadly based (sort of a general bioperl HOWTO). 
The only exception is the Beginner's HOWTO, but even that has additional information over the tutorial (at least it did the last time I looked at the tutorial, which has been a while). > It's hard for me to dive into a wiki lifestyle for the huge documentation > pillars since it can't ever get back into the distro... (can it?) Small, > throw away stuff is great for the wiki, but huge, established, thoughtful, > long documents should be left in the distro? Present (and searchable) on > the wiki but static? Hence the problem we face now. It is something we need to really look into before adding too much more to the wiki. IMHO, I think we should have very little information directly in the distribution itself since it's already quite large. It's almost as easy to have a bare-bones INSTALL file, which would point to the wiki for additional information. But I may be very much alone in that train of thought ; > > Why isn't the short "Current events" just listed on the top of the "News" > page? Don't know. > Sick of my endless questions yet? -grin- Not really. cjf > j > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From gad14 at cornell.edu Tue May 30 12:57:41 2006 From: gad14 at cornell.edu (Genevieve DeClerck) Date: Tue, 30 May 2006 12:57:41 -0400 Subject: [Bioperl-l] results problem with StandAloneBlast In-Reply-To: <447BFB20.40501@mrc-dunn.cam.ac.uk> References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk> Message-ID: <447C7985.9000404@cornell.edu> Thanks for your comment Sendu, it was very helpful. I think this must be what's going on.. I am using $blast_report->next_result in both subroutines. It appears that analyzing the blast results first w/ my sort subroutine empties (?) the $blast_result object so that when I try to print, there is nothing left to print. (and visa-versa when I print first then try to sort). 
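One way around the one-pass behaviour described above is to read every result into an array once and hand that array to both subroutines. The sketch below is a hedged illustration, not code from this thread: `collect_results` and the `FakeReport` stand-in are hypothetical names, and the only thing assumed about the real report object is that it has a `next_result` method that returns a false value once the results run out.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a SearchIO-style report: a one-pass iterator
# whose next_result() returns undef once the items are used up.
package FakeReport;

sub new {
    my ($class, @items) = @_;
    return bless { items => [@items] }, $class;
}

sub next_result {
    my $self = shift;
    return shift @{ $self->{items} };
}

package main;

# Drain the iterator once and keep every result in memory.
# (collect_results is a made-up helper name, not a BioPerl method.)
sub collect_results {
    my $report = shift;
    my @results;
    while ( defined( my $result = $report->next_result ) ) {
        push @results, $result;
    }
    return @results;
}

my $report  = FakeReport->new(qw(result1 result2 result3));
my @results = collect_results($report);

# Both passes now work on the saved list, in either order.
my @sorted  = sort { $b cmp $a } @results;    # stands in for sort_results
my @printed = map  { "seen $_" } @results;    # stands in for print_blast_results

print scalar(@sorted), " sorted, ", scalar(@printed), " printed\n";
```

With a real report the same loop would be fed the object returned by blastall, and the two subroutines would take the array (or a reference to it) instead of the report. The trade-off is memory: every result stays loaded at once. The hit and HSP iterators inside each result may be one-pass cursors as well, so the same caution could apply one level down.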
So, from the looks of things, using next_result has the effect of popping the Bio::Search::Result::ResultI objects off of the SearchIO blast report object??

It seems I could get around this by making a copy of the blast report by setting it to another new variable...(not the most elegant solution) but I'm having trouble with this... If I do:

my $blast_report_copy = $blast_report;

I'm just copying the reference to the SearchIO blast result, so it doesn't help me. How can I make another physical copy of this blast result object? Seems like a simple thing but how to do it is escaping me.

But better yet, the way to go is to 'reset the counter,' or to find a way to look at/print/sort the results without removing data from the blast result object. How is this done though??

Sendu and Brian, I didn't post the sort_results subroutine because it is sprawling, as is a lot of my code. The code I provided was more like an aid for my explanation of the problem.. it doesn't actually run - sorry for the confusion, I should have been more clear on that. The important thing to know perhaps is that both sort_results and print_blast_results contain a loop where I am using the 'next_result' method to view blast results.

(And to clarify for Torsten, the blastall() is working just fine - the analysis/viewing of the results object is where I am encountering the problem.)

Any other ideas would be greatly appreciated...

Thank you,
Genevieve

Sendu Bala wrote:
> Genevieve DeClerck wrote:
>
>> Hi,
>
> [snip]
>
>> If I've sorted the results the sorted-results will print to screen,
>> however when I try to print the Hit Table results nothing is returned,
>> as if the blast results have evaporated.... and vice versa, if i
>> comment out the part where i point my sorting subroutine to the blast
>> results reference, my hit table results suddenly print to screen.
> > [snip] > >> Here's an abbreviated version of my code: > > [snip] > >> ####### >> ### the following 2 actions seem to be mutually exclusive. >> # 1) sort results into 1-hitter, 2-hitter, etc. groups of >> # SeqFeature objs stored in arrays. arrays are then printed >> # to stdout >> &sort_results($blast_report); >> >> # 2) print blast results >> &print_blast_results($blast_report); > > >> sub print_blast_results{ >> my $report = shift; >> while(my $result = $report->next_result()){ > > [snip] > > You didn't give us your sort_results subroutine, but is it as simple as > they both use $report->next_result (and/or $result->next_hit), but you > don't reset the internal counter back to the start, so the second > subroutine tries to get the next_result and finds the first subroutine > has already looked at the last result and so next_result returns false? > > From a quick look it wasn't obvious how to reset the counter. Hopefully > this can be done and someone else knows how. > From lstein at cshl.edu Wed May 31 11:17:39 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Wed, 31 May 2006 11:17:39 -0400 Subject: [Bioperl-l] SOLVED Bio::Graphics::Panel make ruler have neg values In-Reply-To: <5b6410e0605302045x5c420674x6f898a8a2973991a@mail.gmail.com> References: <5b6410e0605302045x5c420674x6f898a8a2973991a@mail.gmail.com> Message-ID: <200605311117.41479.lstein@cshl.edu> Hi Kevin, Since you are modifying the Panel.pm source code, why don't you just go ahead and use the current Bio::Graphics development tree? Since 1.5.1 it supports negative coordinates. 
Here's an illustration:

#!/usr/bin/perl

use strict;
use Bio::Graphics;
use Bio::Graphics::Feature;

my $whole   = Bio::Graphics::Feature->new(-start=>-200,-end=>+200);
my $feature = Bio::Graphics::Feature->new(-start=>-100,-end=>+100,-strand=>+1);
my $panel   = Bio::Graphics::Panel->new(-start     => -200,
                                        -end       => +200,
                                        -width     => 800,
                                        -pad_left  => 10,
                                        -pad_right => 10);
$panel->add_track($whole,   -glyph=>'arrow', -double=>1, -tick=>2);
$panel->add_track($feature, -glyph=>'box',   -stranded=>1);
print $panel->png;
exit 0;

The resulting image is attached.

Lincoln

On Tuesday 30 May 2006 23:45, Kevin Lam Koiyau wrote:
> I am so sorry for the truncated email accidentally hit reply.
> if anyone is interested i have opted to change
>
> change line 161 of arrow.pm in Perl/site/lib/Bio/Graphics/Glyph/arrow.pm
> in linux its
> /usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Glyph/arrow.pm
>
>
> $gd->string($font,$middle,$center+$a2-1,$label,$font_color)
>
> to
>
> $gd->string($font,$middle,$center+$a2-1,$label-1000,$font_color)
>
> just for this one-off use.
>
>
>
> strangely I found at line 112 for ver 1.51 bioperl in arrow.pm a hidden
> option for coords offset?
> my $relative_coords_offset = $self->option('relative_coords_offset');
> $relative_coords_offset = 1 unless defined $relative_coords_offset;
> but entering the option -relative_coords_offset=>1000 in the arrow glyphs
> didn't do anything...
>
>
> > Hi!
> >
> > oh it was in a slightly different header asking about the create image
> > map feature.
> > I am using the stable version 1.4 of bioperl now. In any case I have not
> > added the sequence as a feature annotated seq. as I already have the bp
> > where the TF binds (in 1-1050 numberings) so what I did was to just add
> > graded segments based on the position.
> > I saw that there is a scale function for the arrow glyp however, it is a
> > multiply function, can it be hacked to take in a offset value (ie minus the
> > scale by 1000?)
> > > > cheers > > kevin > > > > > > Hi, > > > > > For some reason I didn't see the first posting on this. In current > > > > bioperl > > > > > live, the ruler can have negative numberings - I use this routinely. > > > You need > > > to create a feature that starts in negative coordinates. What is > > > > happening > > > > > to > > > you when you try this? > > > > > > Lincoln > > > > > > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote: > > > > Hi > > > > thanks for the help offered thus far! > > > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promtoer seq > > > > > > using > > > > > > > bioperl. therefore i was asked to make the numberings as such (-1000) > > > > is > > > > > > there any way at all to do this in bioperl without changing the .pm > > > > > > file? > > > > > > > thanks guys.. > > > > kevin > > > > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l at lists.open-bio.org > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > > > Lincoln D. Stein > > > Cold Spring Harbor Laboratory > > > 1 Bungtown Road > > > Cold Spring Harbor, NY 11724 > > > (516) 367-8380 (voice) > > > (516) 367-8389 (fax) > > > FOR URGENT MESSAGES & SCHEDULING, > > > PLEASE CONTACT MY ASSISTANT, > > > SANDRA MICHELSEN, AT michelse at cshl.edu > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. 
Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: negatives.png
Type: image/png
Size: 1065 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060531/eaeb5e28/attachment.png

From lstein at cshl.edu Wed May 31 12:05:47 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 12:05:47 -0400
Subject: [Bioperl-l] Fwd: Re: SOLVED Bio::Graphics::Panel make ruler have neg values
Message-ID: <200605311205.48122.lstein@cshl.edu>

Oddly, bioperl-l listserver is holding this mail because it has "a suspicious header". I took out Kevin's email address in case it is the "spammotel" header that is bothering it.

Lincoln

---------- Forwarded Message ----------

Subject: Re: [Bioperl-l] SOLVED Bio::Graphics::Panel make ruler have neg values
Date: Wednesday 31 May 2006 11:17
From: Lincoln Stein
To: bioperl-l at lists.open-bio.org
Cc: "Kevin Lam Koiyau"

Hi Kevin,

Since you are modifying the Panel.pm source code, why don't you just go ahead and use the current Bio::Graphics development tree? Since 1.5.1 it supports negative coordinates.

Here's an illustration:

#!/usr/bin/perl

use strict;
use Bio::Graphics;
use Bio::Graphics::Feature;

my $whole   = Bio::Graphics::Feature->new(-start=>-200,-end=>+200);
my $feature = Bio::Graphics::Feature->new(-start=>-100,-end=>+100,-strand=>+1);
my $panel   = Bio::Graphics::Panel->new(-start     => -200,
                                        -end       => +200,
                                        -width     => 800,
                                        -pad_left  => 10,
                                        -pad_right => 10);
$panel->add_track($whole,   -glyph=>'arrow', -double=>1, -tick=>2);
$panel->add_track($feature, -glyph=>'box',   -stranded=>1);
print $panel->png;
exit 0;

The resulting image is attached.
Lincoln On Tuesday 30 May 2006 23:45, Kevin Lam Koiyau wrote: > I am so sorry for the truncated email accidentally hit reply. > if anyone is interested i have opted to change > > change line 161 of arrow.pm in Perl/site/lib/Bio/Graphics/Glyph/arrow.pm > in linux its > /usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Glyph/arrow.pm > > > $gd->string($font,$middle,$center+$a2-1,$label,$font_color) > > to > > $gd->string($font,$middle,$center+$a2-1,$label-1000,$font_color) > > just for this one-off use. > > > > strangely I found at line 112 for ver 1.51 bioperl in arrow.pm a hidden > option for coords offset? > my $relative_coords_offset = $self->option('relative_coords_offset'); > $relative_coords_offset = 1 unless defined $relative_coords_offset; > but entering the option -relative_coords_offset=>1000 in the arrow glyphs > didn't do anything... > > > > Hi! > > > oh it was in a slightly different header asking about the create image > > map feature. > > I am using the stable version 1.4 of bioperl now. In any case I have not > > added the sequence as a feature annotated seq. as I already have the bp > > where the TF binds (in 1-1050 numberings) so what I did was to just add > > graded segments based on the position. > > I saw that there is a scale function for the arrow glyp however, it is a > > multiply function, can it be hacked to take in a offset value (ie minus > > the > > scale by 1000?) > > > > cheers > > kevin > > > > > > Hi, > > > > > For some reason I didn't see the first posting on this. In current > > > > bioperl > > > > > live, the ruler can have negative numberings - I use this routinely. > > > You need > > > to create a feature that starts in negative coordinates. What is > > > > happening > > > > > to > > > you when you try this? > > > > > > Lincoln > > > > > > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote: > > > > Hi > > > > thanks for the help offered thus far! 
> > > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promtoer seq > > > > > > using > > > > > > > bioperl. therefore i was asked to make the numberings as such (-1000) > > > > is > > > > > > there any way at all to do this in bioperl without changing the .pm > > > > > > file? > > > > > > > thanks guys.. > > > > kevin > > > > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l at lists.open-bio.org > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > > > Lincoln D. Stein > > > Cold Spring Harbor Laboratory > > > 1 Bungtown Road > > > Cold Spring Harbor, NY 11724 > > > (516) 367-8380 (voice) > > > (516) 367-8389 (fax) > > > FOR URGENT MESSAGES & SCHEDULING, > > > PLEASE CONTACT MY ASSISTANT, > > > SANDRA MICHELSEN, AT michelse at cshl.edu > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu ------------------------------------------------------- -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu -------------- next part -------------- A non-text attachment was scrubbed... 
Name: negatives.png Type: image/png Size: 1065 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060531/6c5f4137/attachment.png From rvosa at sfu.ca Tue May 30 15:10:17 2006 From: rvosa at sfu.ca (Rutger Vos) Date: Tue, 30 May 2006 12:10:17 -0700 Subject: [Bioperl-l] New mailing list for Bio::Phylo Message-ID: <447C9899.5060102@sfu.ca> Dear recipients, the open bioinformatics foundation has been kind enough to host a mailing list for Bio::Phylo (http://search.cpan.org/~rvosa/Bio-Phylo/, the cpan distribution for phylogenetic analysis using perl). The scope of this list is at present fairly broad as it is both meant for user questions and development discussion on deeper integration with bioperl. You are invited to sign up at: http://lists.open-bio.org/mailman/listinfo/bio-phylo-l Best wishes, Rutger Vos -- ++++++++++++++++++++++++++++++++++++++++++++++++++++ Rutger Vos, PhD. candidate Department of Biological Sciences Simon Fraser University 8888 University Drive Burnaby, BC, V5A1S6 Phone: 604-291-5625 Fax: 604-291-3496 Personal site: http://www.sfu.ca/~rvosa FAB* lab: http://www.sfu.ca/~fabstar Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ ++++++++++++++++++++++++++++++++++++++++++++++++++++ From bioperlanand at yahoo.com Mon May 1 14:36:20 2006 From: bioperlanand at yahoo.com (Anand Venkatraman) Date: Mon, 1 May 2006 11:36:20 -0700 (PDT) Subject: [Bioperl-l] how to obtain GIs from clone_ids Message-ID: <20060501183620.85791.qmail@web37901.mail.mud.yahoo.com> Hi everybody, I have a file containing clone_ids (from the Features annotation section of a GenBank entry) ------------------------------------------------------------ FEATURES Location/Qualifiers source 1..707 /clone="C0005918b04" ------------------------------------------------------------ Is there a way in Bioperl to send a query over the internet (one clone_id at a time) and get out just the GI number for that clone_id? Any suggestions.. Thanks in advance. 
Anand

From cuiw at mail.nih.gov Mon May 1 15:39:01 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Mon, 1 May 2006 15:39:01 -0400
Subject: [Bioperl-l] how to obtain GIs from clone_ids
In-Reply-To: <20060501183620.85791.qmail@web37901.mail.mud.yahoo.com>
Message-ID:

use strict;
use Bio::DB::Query::GenBank;

my $query_string = 'EST["C0005918b04"]';
my $query = Bio::DB::Query::GenBank->new(-db    => 'nucleotide',
                                         -query => $query_string);
my $count = $query->count;   # number of matching records
my @ids   = $query->ids;     # IDs of the matching records

# print one ID per line
for (@ids) {
    print "$_\n";
}

-----Original Message-----
From: Anand Venkatraman [mailto:bioperlanand at yahoo.com]
Sent: Monday, May 01, 2006 2:36 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] how to obtain GIs from clone_ids

Hi everybody,

I have a file containing clone_ids (from the Features annotation section of a GenBank entry)

------------------------------------------------------------
FEATURES             Location/Qualifiers
     source          1..707
                     /clone="C0005918b04"
------------------------------------------------------------

Is there a way in Bioperl to send a query over the internet (one clone_id at a time) and get out just the GI number for that clone_id?

Any suggestions..

Thanks in advance.
Anand

From s.ryazansky at gmail.com Mon May 1 17:55:13 2006
From: s.ryazansky at gmail.com (Sergei Ryazansky)
Date: Mon, 1 May 2006 21:55:13 +0000 (UTC)
Subject: [Bioperl-l] blast program to run locally on windows
References: <007c01c66883$61f29490$15327e82@pyrimidine> <20060425215433.35436.qmail@web36613.mail.mud.yahoo.com>
Message-ID:

Hi,
Can you post your formatdb.log file here?
From cjfields at uiuc.edu Tue May 2 00:15:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 1 May 2006 23:15:19 -0500
Subject: [Bioperl-l] blast program to run locally on windows
In-Reply-To:
References: <007c01c66883$61f29490$15327e82@pyrimidine> <20060425215433.35436.qmail@web36613.mail.mud.yahoo.com>
Message-ID:

We managed to work our way through it. He hadn't set ncbi.ini to the correct directories; the database was formatted correctly.

Chris

On May 1, 2006, at 4:55 PM, Sergei Ryazansky wrote:

> Hi,
> Can you post your formatdb.log file here?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Tue May 2 12:19:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 2 May 2006 11:19:34 -0500
Subject: [Bioperl-l] Bio::DB::GenBank and complexity
Message-ID: <000901c66e04$33e07370$15327e82@pyrimidine>

I ran into some wonkiness with using extra parameters ('seq_start', 'seq_stop', 'strand', and 'complexity') with Bio::DB::GenBank that I have gone through, fixed, and committed. I have also added a few tests to DB.t for everything (all changes were in Bio::DB::WebDBSeqI and Bio::DB::NCBIHelper). The 'complexity' tag is the strangest, though I did manage to get it added as well (with tests). This is how NCBI defines complexity:

complexity regulates the display:
  0 - get the whole blob
  1 - get the bioseq for gi of interest (default in Entrez)
  2 - get the minimal bioseq-set containing the gi of interest
  3 - get the minimal nuc-prot containing the gi of interest
  4 - get the minimal pub-set containing the gi of interest

Here's my quandary: when setting complexity to '0', you get a glob back (the main sequence as well as any subsequences, such as CDS); this is in essence a sequence stream with multiple alphabet types. So, I now have it set up to do this:

    my $factory = Bio::DB::GenBank->new(-format     => 'fasta',
                                        -complexity => 0
                                       );

    my $seqin = $factory->get_Seq_by_acc($acc);

    while (my $seq = $seqin->next_seq) {
        $seqout->write_seq($seq);
    }

since I thought returning an array would be horrendously expensive on memory, esp. with larger sequences. Currently this is only set up for sequences which are retrieved when complexity is set to '0', so it's a pretty unique case. Regardless, I'm worried that, since users expect a Bio::Seq object instead of a Bio::SeqIO object here, it will cause a lot of confusion with the API. Any suggestions/gripes?

Chris

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From mamillerpa at yahoo.com Tue May 2 07:41:01 2006
From: mamillerpa at yahoo.com (Mark A. Miller)
Date: Tue, 2 May 2006 04:41:01 -0700 (PDT)
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines
Message-ID: <20060502114101.29745.qmail@web50409.mail.yahoo.com>

Hello all.

I have a recently downloaded UniProt/TrEMBL flat file. I am trying to make FASTA subset files for some bacterial strains. I haven't been able to parse out the strain information from the OS or RC lines. These lines typically look like:

OS   Somegenus somespecies subsp. somesubspecies strain ABC123.
RC   STRAIN=ABC123.

I'm not especially good with Perl, and I'm definitely weak when it comes to OOP.

I have included some code I pasted together from various pages on the bioperl wiki. In addition to the wiki, I have been making use of www.pasteur.fr/recherche/unites/sis/formation/bioperl/ch02s02.html

The code I have so far reports the species but not the subspecies or variant. I have also tried to walk through all of the feature, annotation and reference objects but I still can't seem to parse out the information I need. (For brevity, the example I'm including below only lists the code I used for the annotation objects.) Also, this code only prints the information... I know that I'll have to write a FASTA sequence object separately.

Any suggestions?

Thanks,
Mark

--- --- ---

#!/usr/bin/perl

use Bio::SeqIO;

my $usage  = "getaccs.pl file format\n";
my $file   = shift or die $usage;
my $format = shift or die $usage;

my $inseq = Bio::SeqIO->new(-file   => "<$file",
                            -format => $format );

while (my $seq = $inseq->next_seq) {

    my $species_object = $seq->species;
    my $species_string = $species_object->species;
    my $variant_string = $species_object->variant;
    my $common_string  = $species_object->common_name;
    my $sub_string     = $species_object->sub_species;
    my $binomial       = $species_object->binomial('FULL');

    print "display ",$seq->display_id,"\n";
    print "accession ",$seq->accession_number,"\n";
    print "desc ",$seq->desc,"\n";

    print "species ",$species_string,"\n";
    print "variant ",$variant_string,"\n";
    print "common ",$common_string,"\n";
    print "sub ",$sub_string,"\n";
    print "binomial ",$binomial,"\n";

    print $seq->seq,"\n";

    my $anno_collection = $seq->annotation;
    for my $key ( $anno_collection->get_all_annotation_keys ) {
        my @annotations = $anno_collection->get_Annotations($key);
        for my $value ( @annotations ) {
            print "tagname : ", $value->tagname, "\n";
            # $value is a Bio::Annotation, and has an "as_text" method
            print " annotation value: ", $value->as_text, "\n";

            if ($value->tagname eq "reference") {
                my $hash_ref = $value->hash_tree;
                for my $key (keys %{$hash_ref}) {
                    print $key,": ",$hash_ref->{$key},"\n";
                }
            }
        }
    }
    print "\n";
}
exit;

--- --- --- --- --- --- --- ---

Mark A. Miller

From cjfields at uiuc.edu Tue May 2 14:01:58 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 2 May 2006 13:01:58 -0500
Subject: [Bioperl-l] Bio::DB::GenBank and complexity
In-Reply-To: <000901c66e04$33e07370$15327e82@pyrimidine>
Message-ID: <000a01c66e12$8131a960$15327e82@pyrimidine>

I hate responding to my own post! Just wanted to add that I'm adding a warning for the get_Seq* methods to use the appropriate get_Stream* method when complexity == 0 before returning the Bio::SeqIO object.

CJF

From jason.stajich at duke.edu Tue May 2 14:36:08 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 2 May 2006 14:36:08 -0400
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines
In-Reply-To: <20060502114101.29745.qmail@web50409.mail.yahoo.com>
References: <20060502114101.29745.qmail@web50409.mail.yahoo.com>
Message-ID: <7B49D031-9F74-43C3-AA4F-2AE115BB843D@duke.edu>

This is really a limitation of the EMBL/GenBank format.

See this thread:
http://lists.open-bio.org/pipermail/bioperl-l/2006-March/021068.html
or on GMANE
http://comments.gmane.org/gmane.comp.lang.perl.bio.general/10557

I don't know if any of this has really been resolved, so hopefully James will speak up if he's implemented anything.

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From mblanche at berkeley.edu Tue May 2 15:30:49 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Tue, 02 May 2006 12:30:49 -0700
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
Message-ID:

Dear all--

I have been trying to use the intersection function to extract the overlapping region from alternatively spliced exons, as in the following script. The object returned from 'my $overlap = $exon1->intersection($exon2);' is actually losing the strand of $exon1 if $exon1 is from the negative strand. Is this behavior expected? Should I check the strand of $exon1 before working on the object returned by any Bio::RangeI function?

Many thanks

#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GFF;

MAIN:{

    my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
                                -dsn     => 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
                                -user    => 'guest');
    my $test_db = $db->segment('4');

    # Load up the exons into $exons_p
    for my $gene ($test_db->features(-types => 'gene')){

        my $exons_p = extractExons($gene);

        cluster($exons_p) unless ($#{$exons_p} == -1);

    }
}

sub extractExons {
    my $gene = shift;
    my %ex_list;
    my @tcs = $gene->features( -type       => 'processed_transcript',
                               -attributes => {Gene => $gene->group});

    for my $tc (@tcs){
        my @exons = $tc->features (-type       => 'exon',
                                   -attributes => {Parent => $tc->group}
                                  );

        for (@exons){
            my $ex_id = $_->id;
            $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
        }
    }
    my @values = values %ex_list;
    return(\@values);
}

sub cluster {
    my $exons_p = shift;

    for (my $s = 0; $s <= $#{$exons_p}; $s++){
        for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
            my $exon1 = $exons_p->[$s];
            my $exon2 = $exons_p->[$t];

            if (!($exon1->equals($exon2)) &&
                $exon1->overlaps($exon2)){

                my $overlap = $exon1->intersection($exon2);

                print "===\n";
                print "ex1\n", $exon1->seq, "\n";
                print "ex2\n", $exon2->seq, "\n";
                print "overlap\n", $overlap->seq, "\n";
            }
        }
    }
}

______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
--

From osborne1 at optonline.net Tue May 2 16:17:29 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 02 May 2006 16:17:29 -0400
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To:
Message-ID:

Marco,

Yes, this is how intersection() is supposed to work. If both of the Range objects have the same strand, then the strand information is returned as part of the result, but if they aren't on the same strand then no strand information is returned.

Brian O.

From mblanche at berkeley.edu Tue May 2 16:32:58 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Tue, 02 May 2006 13:32:58 -0700
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To:
Message-ID:

Brian--

Even when both elements of intersection() are from the negative strand, the returned object is from the positive strand and $overlap is actually the reverse complement of the intersection between the 2 exons. Here is part of the output from the script below:

===
ex1 Strand: -1
CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTG
ex2 Strand: -1
CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTGGTACGATGTCAAAGCTCCGAATATGTTTCAAACCCGT
CAAATCG
overlap Strand: 1
CAGTCCTTGCGAGAAAACGGGTCCACCACCTTCTTCTTACCGCCCTTCTTACCACCCTTGGAAAGACCTTTATTTT
TGCCGACTGCCATGTTCAACTAATAAACCGG
AAAAGGTCGAATCACGTTGACGACGTATGTGGAAAAAAG
...

If both are from the positive strand, the returned object is positive, as in:

===
ex1 Strand: 1
CAACGCAGACGTGGTACGGCGTTTTAAATCTGATAACATTTTGAACCGGGAATTATTTTAGAGTACCATTCTTTGT
TTTGTGCCTGTTTCAGTATAAATTAATTATG
CGCCTGATTTAAAGTACAAAATGTGTAAATATATCACCTTACCGTCGCGGGTGCACCCAATTGTGCTTTGATGAAT
AAATATACATATATGCAACATATATAACTTC
CTGTGTTAGTATAAGTGTATGTCAGCCAAAAACAAATATATATATGAGTGTTTATCGGCATTCGTGTGCTGGCAGA
GCAGCGATCAAAGCTGCGTTCGGTACTCGTT
GACTGGCCCAAGAATGAATTCTCGTGCAAGTGTGTTGATAAAAAGTATACGTATGTAT
ex2 Strand: 1
ATCGACAGTTGCCATCGTCGTTATTCCAGCACTAATTTAAAAAAAATTCGATCAACGCAGACGTG
overlap Strand: 1
CAACGCAGACGTG

Is there something I am missing? Here is the script generating the output.

Many thanks all...

Marco

use strict;
use warnings;
use Bio::DB::GFF;

MAIN:{

    my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
                                -dsn     => 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
                                -user    => 'guest');
    my $test_db = $db->segment('4');

    # Load up the exons into $exons_p
    for my $gene ($test_db->features(-types => 'gene')){

        my $exons_p = extractExons($gene);

        cluster($exons_p) unless ($#{$exons_p} == -1);

    }
}

sub extractExons {
    my $gene = shift;
    my %ex_list;
    my @tcs = $gene->features( -type       => 'processed_transcript',
                               -attributes => {Gene => $gene->group});

    for my $tc (@tcs){
        my @exons = $tc->features (-type       => 'exon',
                                   -attributes => {Parent => $tc->group}
                                  );

        for (@exons){
            my $ex_id = $_->id;
            $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
        }
    }
    my @values = values %ex_list;
    return(\@values);
}

sub cluster {
    my $exons_p = shift;

    for (my $s = 0; $s <= $#{$exons_p}; $s++){
        for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
            my $exon1 = $exons_p->[$s];
            my $exon2 = $exons_p->[$t];

            if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){

                my $overlap = $exon1->intersection($exon2);

                # Report the strand of each exon and of the intersection
                print "===\n";
                print "ex1\tStrand: ", $exon1->strand, "\n", $exon1->seq, "\n";
                print "ex2\tStrand: ", $exon2->strand, "\n", $exon2->seq, "\n";
                print "overlap\tStrand: ", $overlap->strand, "\n", $overlap->seq, "\n";
            }
        }
    }
}

______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
--

From osborne1 at optonline.net Tue May 2 17:49:49 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 02 May 2006 17:49:49 -0400
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To:
Message-ID:

Marco,

Odd, because the intersection() code is quite simple and it's clear how it should behave. What version of Bioperl are you using? I'm looking at the latest, in bioperl-live...

Brian O.

From mblanche at berkeley.edu Tue May 2 18:31:44 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Tue, 02 May 2006 15:31:44 -0700
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To:
Message-ID:

Brian--

I checked out last week's version from CVS.

Silly question: how do I get the version of BioPerl I am using? I have never had to check a module/bundle version number before...

Marco

On 5/2/06 14:49, "Brian Osborne" wrote:

> Marco,
>
> Odd, because the intersection() code is quite simple and it's clear how it
> should behave. What version of Bioperl are you using? I'm looking at the
> latest, in bioperl-live...
>
> Brian O.
>
>
> On 5/2/06 4:32 PM, "Marco Blanchette" wrote:
>
>> Brian--
>>
>> Even when both elements of intersection() are from the negative strand, the
>> return object is from the positive strand and $overlap is actually the
>> revervese complement of the intersection between the 2 exons. Here is part
>> of the output from the script below:
>>
>> ===
>> ex1 Strand: -1
>> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
>> AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
>> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTG
>> ex2 Strand: -1
>> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
>> AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
>> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTGGTACGATGTCAAAGCTCCGAATATGTTTCAAACCCGT
>> CAAATCG
>> overlap Strand: 1
>> CAGTCCTTGCGAGAAAACGGGTCCACCACCTTCTTCTTACCGCCCTTCTTACCACCCTTGGAAAGACCTTTATTTT
>> TGCCGACTGCCATGTTCAACTAATAAACCGG
>> AAAAGGTCGAATCACGTTGACGACGTATGTGGAAAAAAG
>> ...
>> >> If both are from the positive strand, the return object is positive as in: >> >> === >> ex1 Strand: 1 >> CAACGCAGACGTGGTACGGCGTTTTAAATCTGATAACATTTTGAACCGGGAATTATTTTAGAGTACCATTCTTTGT >> TTTGTGCCTGTTTCAGTATAAATTAATTATG >> CGCCTGATTTAAAGTACAAAATGTGTAAATATATCACCTTACCGTCGCGGGTGCACCCAATTGTGCTTTGATGAAT >> AAATATACATATATGCAACATATATAACTTC >> CTGTGTTAGTATAAGTGTATGTCAGCCAAAAACAAATATATATATGAGTGTTTATCGGCATTCGTGTGCTGGCAGA >> GCAGCGATCAAAGCTGCGTTCGGTACTCGTT >> GACTGGCCCAAGAATGAATTCTCGTGCAAGTGTGTTGATAAAAAGTATACGTATGTAT >> ex2 Strand: 1 >> ATCGACAGTTGCCATCGTCGTTATTCCAGCACTAATTTAAAAAAAATTCGATCAACGCAGACGTG >> overlap Strand: 1 >> CAACGCAGACGTG >> >> Is there something I am missing? Here is the script generating the output >> >> Many thanks all... >> >> Marco >> >> >> use strict; >> use warnings; >> use Bio::DB::GFF; >> >> MAIN:{ >> >> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql', >> -dsn => >> 'dbi:mysql:database=dmel_43_LS;host=riolab.net', >> -user => 'guest'); >> my $test_db = $db->segment('4'); >> >> # Load up the exons into $exons_p >> for my $gene ($test_db->features(-types => 'gene')){ >> >> my $exons_p = extractExons($gene); >> >> cluster($exons_p) unless ($#{$exons_p} == -1); >> >> } >> } >> >> sub extractExons { >> my $gene = shift; >> my %ex_list; >> my @tcs = $gene->features( -type =>'processed_transcript', >> -attributes =>{Gene => $gene->group}); >> >> for my $tc (@tcs){ >> my @exons = $tc->features (-type => 'exon', >> -attributes => {Parent => $tc->group} >> ); >> >> for (@exons){ >> my $ex_id = $_->id; >> $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id}); >> >> } >> >> } >> my @values = values %ex_list; >> return(\@values); >> } >> >> sub cluster { >> my $exons_p = shift; >> >> for (my $s = 0; $s <= $#{$exons_p}; $s++){ >> for (my $t = $s+1; $t <= $#{$exons_p}; $t++){ >> my $exon1 = $exons_p->[$s]; >> my $exon2 = $exons_p->[$t]; >> >> if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){ >> >> my $overlap = $exon1->intersection($exon2); >> >> 
print "===\n";; >> print "ex1\tStrand: ", $exon1->strand, "\n", >> $exon1->seq, "\n"; >> print "ex2\tStrand: ", $exon2->strand, "\n", >> $exon2->seq, "\n"; >> print "overlap\tStrand: ", $overlap->strand, "\n", >> $overlap->seq, "\n"; >> } >> } >> } >> } >> >> On 5/2/06 13:17, "Brian Osborne" wrote: >> >>> Marco, >>> >>> Yes, this is how intersection() is supposed to work. If both of the Range >>> objects have the same strand then the strand information is returned as part >>> of the result but if they aren't on the same strand then no strand >>> information is returned. >>> >>> Brian O. >>> >>> >>> On 5/2/06 3:30 PM, "Marco Blanchette" wrote: >>> >>>> Dear all-- >>>> >>>> I have been trying to use the intersection function to extract overlapping >>>> region from alternatively spliced exons as in the following script. The >>>> returned object from the 'my $overlap = $exon1->intersection($exon2);' is >>>> actually loosing the strand of $exon1 if $exon1 is from the negative >>>> strand. >>>> Is this behavior expected? Should I check the strand of $exon1 before >>>> working on the object return by any Bio::RangeI function? 
>>>> >>>> Many thanks >>>> >>>> #!/usr/bin/perl >>>> use strict; >>>> use warnings; >>>> use Bio::DB::GFF; >>>> >>>> MAIN:{ >>>> >>>> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql', >>>> -dsn => >>>> 'dbi:mysql:database=dmel_43_LS;host=riolab.net', >>>> -user => 'guest'); >>>> my $test_db = $db->segment('4'); >>>> >>>> # Load up the exons into $exons_p >>>> for my $gene ($test_db->features(-types => 'gene')){ >>>> >>>> my $exons_p = extractExons($gene); >>>> >>>> cluster($exons_p) unless ($#{$exons_p} == -1); >>>> >>>> } >>>> } >>>> >>>> sub extractExons { >>>> my $gene = shift; >>>> my %ex_list; >>>> my @tcs = $gene->features( -type =>'processed_transcript', >>>> -attributes =>{Gene => $gene->group}); >>>> >>>> for my $tc (@tcs){ >>>> my @exons = $tc->features (-type => 'exon', >>>> -attributes => {Parent => $tc->group} >>>> ); >>>> >>>> for (@exons){ >>>> my $ex_id = $_->id; >>>> $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id}); >>>> >>>> } >>>> >>>> } >>>> my @values = values %ex_list; >>>> return(\@values); >>>> } >>>> >>>> sub cluster { >>>> my $exons_p = shift; >>>> >>>> for (my $s = 0; $s <= $#{$exons_p}; $s++){ >>>> for (my $t = $s+1; $t <= $#{$exons_p}; $t++){ >>>> my $exon1 = $exons_p->[$s]; >>>> my $exon2 = $exons_p->[$t]; >>>> >>>> if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){ >>>> >>>> my $overlap = $exon1->intersection($exon2); >>>> >>>> print "===\n";; >>>> print "ex1\n", $exon1->seq, "\n"; >>>> print "ex2\n", $exon2->seq, "\n"; >>>> print "overlap\n", $overlap->seq, "\n"; >>>> } >>>> } >>>> } >>>> } >>>> ______________________________ >>>> Marco Blanchette, Ph.D. >>>> >>>> mblanche at uclink.berkeley.edu >>>> >>>> Donald C. Rio's lab >>>> Department of Molecular and Cell Biology >>>> 16 Barker Hall >>>> University of California >>>> Berkeley, CA 94720-3204 >>>> >>>> Tel: (510) 642-1084 >>>> Cell: (510) 847-0996 >>>> Fax: (510) 642-6062 >>> >>> >> >> ______________________________ >> Marco Blanchette, Ph.D. 
>>
>> mblanche at uclink.berkeley.edu
>>
>> Donald C. Rio's lab
>> Department of Molecular and Cell Biology
>> 16 Barker Hall
>> University of California
>> Berkeley, CA 94720-3204
>>
>> Tel: (510) 642-1084
>> Cell: (510) 847-0996
>> Fax: (510) 642-6062
>
>
______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
--

From arareko at campus.iztacala.unam.mx Tue May 2 18:32:24 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Tue, 02 May 2006 17:32:24 -0500
Subject: [Bioperl-l] BioPerl-run in FreeBSD
Message-ID: <4457DDF8.4050005@campus.iztacala.unam.mx>

It's my great pleasure to announce the availability of the BioPerl-run
packages (stable & developer releases) for the FreeBSD operating system.

For instructions on how to install BioPerl ports in FreeBSD, please take
a look into the Getting Bioperl section of the BioPerl Wiki.

Regards,
Mauricio.

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From heikki at sanbi.ac.za Wed May 3 02:51:12 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 3 May 2006 08:51:12 +0200
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To:
References:
Message-ID: <200605030851.13007.heikki@sanbi.ac.za>

On Wednesday 03 May 2006 00:31, Marco Blanchette wrote:
> Brian--
>
> I checked out last week's version from the CVS.
>
> Silly question: How do I get the version of BioPerl I am using... Never had
> to check a module/bundle version number before...

It is not that silly. The syntax is not too easy:

perl -MBio::Perl -le 'print Bio::Perl->VERSION;'

You can use any module in bioperl, of course.
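For anyone who would rather not remember the one-liner, the same check can be sketched as a small script. This is only an illustration: the helper name module_version is made up for this example, and it relies on nothing but Perl's built-in VERSION method, so it reports undef for modules that are missing or do not set $VERSION.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the version of an installed module, or
# undef if it cannot be loaded or does not define $VERSION.
sub module_version {
    my ($module) = @_;
    # Accept only plain package names before building the require string.
    return unless $module =~ /\A\w+(?:::\w+)*\z/;
    return unless eval "require $module; 1";
    return $module->VERSION;
}

# Check the modules named on the command line, or Bio::Perl by default.
for my $module (@ARGV ? @ARGV : ('Bio::Perl')) {
    my $version = module_version($module);
    print "$module: ",
        (defined $version ? $version : 'not installed or no $VERSION'), "\n";
}
```

Run it as, for example, "perl modversion.pl Bio::Perl Bio::SeqIO" (the script name is arbitrary).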
-Heikki > Marco > > On 5/2/06 14:49, "Brian Osborne" wrote: > > Marco, > > > > Odd, because the intersection() code is quite simple and it's clear how > > it should behave. What version of Bioperl are you using? I'm looking at > > the latest, in bioperl-live... > > > > Brian O. > > > > On 5/2/06 4:32 PM, "Marco Blanchette" wrote: > >> Brian-- > >> > >> Even when both elements of intersection() are from the negative strand, > >> the return object is from the positive strand and $overlap is actually > >> the revervese complement of the intersection between the 2 exons. Here > >> is part of the output from the script below: > >> > >> === > >> ex1 Strand: -1 > >> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAA > >>AATA AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG > >> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTG > >> ex2 Strand: -1 > >> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAA > >>AATA AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG > >> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTGGTACGATGTCAAAGCTCCGAATATGTTTCAAAC > >>CCGT CAAATCG > >> overlap Strand: 1 > >> CAGTCCTTGCGAGAAAACGGGTCCACCACCTTCTTCTTACCGCCCTTCTTACCACCCTTGGAAAGACCTTTA > >>TTTT TGCCGACTGCCATGTTCAACTAATAAACCGG > >> AAAAGGTCGAATCACGTTGACGACGTATGTGGAAAAAAG > >> ... > >> > >> If both are from the positive strand, the return object is positive as > >> in: > >> > >> === > >> ex1 Strand: 1 > >> CAACGCAGACGTGGTACGGCGTTTTAAATCTGATAACATTTTGAACCGGGAATTATTTTAGAGTACCATTCT > >>TTGT TTTGTGCCTGTTTCAGTATAAATTAATTATG > >> CGCCTGATTTAAAGTACAAAATGTGTAAATATATCACCTTACCGTCGCGGGTGCACCCAATTGTGCTTTGAT > >>GAAT AAATATACATATATGCAACATATATAACTTC > >> CTGTGTTAGTATAAGTGTATGTCAGCCAAAAACAAATATATATATGAGTGTTTATCGGCATTCGTGTGCTGG > >>CAGA GCAGCGATCAAAGCTGCGTTCGGTACTCGTT > >> GACTGGCCCAAGAATGAATTCTCGTGCAAGTGTGTTGATAAAAAGTATACGTATGTAT > >> ex2 Strand: 1 > >> ATCGACAGTTGCCATCGTCGTTATTCCAGCACTAATTTAAAAAAAATTCGATCAACGCAGACGTG > >> overlap Strand: 1 > >> CAACGCAGACGTG > >> > >> Is there something I am missing? 
Here is the script generating the > >> output > >> > >> Many thanks all... > >> > >> Marco > >> > >> > >> use strict; > >> use warnings; > >> use Bio::DB::GFF; > >> > >> MAIN:{ > >> > >> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql', > >> -dsn => > >> 'dbi:mysql:database=dmel_43_LS;host=riolab.net', > >> -user => 'guest'); > >> my $test_db = $db->segment('4'); > >> > >> # Load up the exons into $exons_p > >> for my $gene ($test_db->features(-types => 'gene')){ > >> > >> my $exons_p = extractExons($gene); > >> > >> cluster($exons_p) unless ($#{$exons_p} == -1); > >> > >> } > >> } > >> > >> sub extractExons { > >> my $gene = shift; > >> my %ex_list; > >> my @tcs = $gene->features( -type =>'processed_transcript', > >> -attributes =>{Gene => > >> $gene->group}); > >> > >> for my $tc (@tcs){ > >> my @exons = $tc->features (-type => 'exon', > >> -attributes => {Parent => > >> $tc->group} ); > >> > >> for (@exons){ > >> my $ex_id = $_->id; > >> $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id}); > >> > >> } > >> > >> } > >> my @values = values %ex_list; > >> return(\@values); > >> } > >> > >> sub cluster { > >> my $exons_p = shift; > >> > >> for (my $s = 0; $s <= $#{$exons_p}; $s++){ > >> for (my $t = $s+1; $t <= $#{$exons_p}; $t++){ > >> my $exon1 = $exons_p->[$s]; > >> my $exon2 = $exons_p->[$t]; > >> > >> if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){ > >> > >> my $overlap = $exon1->intersection($exon2); > >> > >> print "===\n";; > >> print "ex1\tStrand: ", $exon1->strand, "\n", > >> $exon1->seq, "\n"; > >> print "ex2\tStrand: ", $exon2->strand, "\n", > >> $exon2->seq, "\n"; > >> print "overlap\tStrand: ", $overlap->strand, "\n", > >> $overlap->seq, "\n"; > >> } > >> } > >> } > >> } > >> > >> On 5/2/06 13:17, "Brian Osborne" wrote: > >>> Marco, > >>> > >>> Yes, this is how intersection() is supposed to work. 
If both of the > >>> Range objects have the same strand then the strand information is > >>> returned as part of the result but if they aren't on the same strand > >>> then no strand information is returned. > >>> > >>> Brian O. > >>> > >>> On 5/2/06 3:30 PM, "Marco Blanchette" wrote: > >>>> Dear all-- > >>>> > >>>> I have been trying to use the intersection function to extract > >>>> overlapping region from alternatively spliced exons as in the > >>>> following script. The returned object from the 'my $overlap = > >>>> $exon1->intersection($exon2);' is actually loosing the strand of > >>>> $exon1 if $exon1 is from the negative strand. > >>>> Is this behavior expected? Should I check the strand of $exon1 before > >>>> working on the object return by any Bio::RangeI function? > >>>> > >>>> Many thanks > >>>> > >>>> #!/usr/bin/perl > >>>> use strict; > >>>> use warnings; > >>>> use Bio::DB::GFF; > >>>> > >>>> MAIN:{ > >>>> > >>>> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql', > >>>> -dsn => > >>>> 'dbi:mysql:database=dmel_43_LS;host=riolab.net', > >>>> -user => 'guest'); > >>>> my $test_db = $db->segment('4'); > >>>> > >>>> # Load up the exons into $exons_p > >>>> for my $gene ($test_db->features(-types => 'gene')){ > >>>> > >>>> my $exons_p = extractExons($gene); > >>>> > >>>> cluster($exons_p) unless ($#{$exons_p} == -1); > >>>> > >>>> } > >>>> } > >>>> > >>>> sub extractExons { > >>>> my $gene = shift; > >>>> my %ex_list; > >>>> my @tcs = $gene->features( -type =>'processed_transcript', > >>>> -attributes =>{Gene => > >>>> $gene->group}); > >>>> > >>>> for my $tc (@tcs){ > >>>> my @exons = $tc->features (-type => 'exon', > >>>> -attributes => {Parent => > >>>> $tc->group} ); > >>>> > >>>> for (@exons){ > >>>> my $ex_id = $_->id; > >>>> $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id}); > >>>> > >>>> } > >>>> > >>>> } > >>>> my @values = values %ex_list; > >>>> return(\@values); > >>>> } > >>>> > >>>> sub cluster { > >>>> my $exons_p = shift; > >>>> > 
>>>> for (my $s = 0; $s <= $#{$exons_p}; $s++){ > >>>> for (my $t = $s+1; $t <= $#{$exons_p}; $t++){ > >>>> my $exon1 = $exons_p->[$s]; > >>>> my $exon2 = $exons_p->[$t]; > >>>> > >>>> if (!($exon1->equals($exon2)) && > >>>> $exon1->overlaps($exon2)){ > >>>> > >>>> my $overlap = $exon1->intersection($exon2); > >>>> > >>>> print "===\n";; > >>>> print "ex1\n", $exon1->seq, "\n"; > >>>> print "ex2\n", $exon2->seq, "\n"; > >>>> print "overlap\n", $overlap->seq, "\n"; > >>>> } > >>>> } > >>>> } > >>>> } > >>>> ______________________________ > >>>> Marco Blanchette, Ph.D. > >>>> > >>>> mblanche at uclink.berkeley.edu > >>>> > >>>> Donald C. Rio's lab > >>>> Department of Molecular and Cell Biology > >>>> 16 Barker Hall > >>>> University of California > >>>> Berkeley, CA 94720-3204 > >>>> > >>>> Tel: (510) 642-1084 > >>>> Cell: (510) 847-0996 > >>>> Fax: (510) 642-6062 > >> > >> ______________________________ > >> Marco Blanchette, Ph.D. > >> > >> mblanche at uclink.berkeley.edu > >> > >> Donald C. Rio's lab > >> Department of Molecular and Cell Biology > >> 16 Barker Hall > >> University of California > >> Berkeley, CA 94720-3204 > >> > >> Tel: (510) 642-1084 > >> Cell: (510) 847-0996 > >> Fax: (510) 642-6062 > > ______________________________ > Marco Blanchette, Ph.D. > > mblanche at uclink.berkeley.edu > > Donald C. 
Rio's lab
> Department of Molecular and Cell Biology
> 16 Barker Hall
> University of California
> Berkeley, CA 94720-3204
>
> Tel: (510) 642-1084
> Cell: (510) 847-0996
> Fax: (510) 642-6062

--
______ _/ _/_____________________________________________________
_/ _/
_/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za
_/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho
_/ _/ _/ SANBI, South African National Bioinformatics Institute
_/ _/ _/ University of Western Cape, South Africa
_/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From nuclearn at gmail.com Wed May 3 02:05:42 2006
From: nuclearn at gmail.com (Li Xiao)
Date: Wed, 3 May 2006 14:05:42 +0800
Subject: [Bioperl-l] about the frame and strand of a blastx report
Message-ID: <150864390605022305p5a04e743l24938386af12edf3@mail.gmail.com>

Hi everybody,

I am parsing a blastx report with BioPerl modules (Bio::SearchIO). The blastx result was created by NCBI-BLAST. How can I obtain the strand (+ or -) of the query sequence against the hit protein? I tried the strand function, but nothing was reported. I also tried the frame function, but it usually returns 0, 1, or 2, which gives no information about the query strand (+ or -). How do I obtain the strand of a query sequence?
--
*********************************************************************
Li Xiao
Sichuan Key Laboratory of Molecular Biology and Biotechnology
College of Life Science, Sichuan University
Chengdu, SiChuan, P.R.China
TEL:86-28-85470083 FAX:86-28-85412738
E-MAIL: nuclearn at gmail.com
URL: http://scbi.scu.edu.cn
**********************************************************************

From cjfields at uiuc.edu Wed May 3 09:38:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 3 May 2006 08:38:17 -0500
Subject: [Bioperl-l] about the frame and strand of a blastx report
In-Reply-To: <150864390605022305p5a04e743l24938386af12edf3@mail.gmail.com>
Message-ID: <000601c66eb6$d5d5f530$15327e82@pyrimidine>

Use $hsp->strand():

    use strict;
    use warnings;
    use Bio::SearchIO;

    # Parse the BLAST report named on the command line.
    my $parser = Bio::SearchIO->new(-file   => shift @ARGV,
                                    -format => 'blast');

    # Walk every result, hit and HSP, printing the strand of each HSP.
    while (my $result = $parser->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                print $hsp->strand, "\n";
            }
        }
    }

This will give 1 or -1.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Li Xiao
> Sent: Wednesday, May 03, 2006 1:06 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] about the frame and strand of a blastx report
>
> Hi everybody,
>
> I am parsing a blastx report with BioPerl modules (Bio::SearchIO).
> The blastx result was created by NCBI-BLAST. How can I obtain the
> strand (+ or -) of the query sequence against the hit protein? I tried
> the strand function, but nothing was reported. I also tried the frame
> function, but it usually returns 0, 1, or 2, which gives no information
> about the query strand (+ or -).
> How do I obtain the strand of a query sequence?
> -- > ********************************************************************* > Li Xiao > Sichuan Key Laboratory of Molecular Biology and Biotechnology > College of Life Science, Sichuan University > Chengdu, SiChuan, P.R.China > TEL:86-28-85470083 FAX:86-28-85412738 > E-MAIL: nuclearn at gmail.com > URL: http://scbi.scu.edu.cn > ********************************************************************** > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From osborne1 at optonline.net Wed May 3 11:22:27 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Wed, 03 May 2006 11:22:27 -0400 Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines In-Reply-To: <20060502114101.29745.qmail@web50409.mail.yahoo.com> Message-ID: Mark, So you're trying to get the information in the RC line from a Swissprot format file? Brian O. On 5/2/06 7:41 AM, "Mark A. Miller" wrote: > Hello all. > > I have a recently donwloaded UniProt/TrEMBL flat file. I am trying to > make FASTA subset files for some bacterial strains. I haven't been > able to parse out the strain information from the OS or RC lines. > These lines typically look like: > > OS Somegenus somespecies subsp. somesubspecies strain ABC123. > RC STRAIN=ABC123. > > I'm not especiialy good with Perl, and I'm definitely weak when it > comes to OOP. > > I have included some code I pasted together from various pages on the > bioperl wiki. In addition to the wiki, I have been making use of > www.pasteur.fr/recherche/unites/sis/formation/bioperl/ch02s02.html > > The code I have so far reports the species but not the subspecies or > variant. I have also tried to walk through all of the feature, > annotation and reference objects but I still can't seem to parse out > the information I need. (For brevity, the example I'm including below > only lists the code I used for the annotation objects.) 
Also, this > code only prints the information... I know that I'll have to write a > FASTA sequence object seperately. > > Any suggestions? > > Thanks, > Mark > > --- --- --- > > > #!/usr/bin/perl > > > > use Bio::SeqIO; > > > > my $usage = "getaccs.pl file format\n"; > > my $file = shift or die $usage; > > my $format = shift or die $usage; > > > > my $inseq = Bio::SeqIO->new(-file => "<$file", > > -format => $format ); > > > > while (my $seq = $inseq->next_seq) { > > > > my $species_object = $seq->species; > > my $species_string = $species_object->species; > > my $variant_string = $species_object->variant; > > my $common_string = $species_object->common_name; > > my $sub_string = $species_object->sub_species; > > my $binomial = $species_object->binomial('FULL'); > > > > print "display ",$seq->display_id,"\n"; > > print "accession ",$seq->accession_number,"\n"; > > print "desc ",$seq->desc,"\n"; > > > > print "species ",$species_string,"\n"; > > print "variant ",$variant_string,"\n"; > > print "common ",$common_string,"\n"; > > print "sub ",$sub_string,"\n"; > > print "binomial ",$binomial,"\n"; > > > > print $seq->seq,"\n"; > > > > my $anno_collection = $seq->annotation; > > for my $key ( $anno_collection->get_all_annotation_keys ) { > > my @annotations = $anno_collection->get_Annotations($key); > > for my $value ( @annotations ) { > > print "tagname : ", $value->tagname, "\n"; > > # $value is an Bio::Annotation, and has an "as_text" method > > print " annotation value: ", $value->as_text, "\n"; > > > > if ($value->tagname eq "reference") { > > my $hash_ref = $value->hash_tree; > > for my $key (keys %{$hash_ref}) { > > print $key,": ",$hash_ref->{$key},"\n"; > > } > > } > > } > > } > > print "\n"; > > } > > exit; > > > > > > --- --- --- --- --- --- --- --- > > Mark A. Miller > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! 
Mail has the best spam protection around > http://mail.yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From MEC at stowers-institute.org Wed May 3 11:09:04 2006 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Wed, 3 May 2006 10:09:04 -0500 Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF Message-ID: Marco, It appears that your code assumes that the exons as returned from call to BIO::DB::GFF::features are sorted by start; I don't think is guaranteed (at least not in the documentation I'm reading). Also I think your code will not report overlap between two exons that have an intervening overlapping exon. Depending on what you're application is, you may care. For example, e1, e2, e3 all intersect pairwise, but your code won't report on e1's overlap with e3. e1 ---*******------- e2 -----******------ e3 ------***-------- Out of curiousity, what is your application? Designing primers for gene resequencing? Cheers, Malcolm Cook Database Applications Manager, Bioinformatics Stowers Institute for Medical Research >-----Original Message----- >From: bioperl-l-bounces at lists.open-bio.org >[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >Marco Blanchette >Sent: Tuesday, May 02, 2006 2:31 PM >To: bioperl-l at lists.open-bio.org >Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF > >Dear all-- > >I have been trying to use the intersection function to extract >overlapping >region from alternatively spliced exons as in the following script. The >returned object from the 'my $overlap = >$exon1->intersection($exon2);' is >actually loosing the strand of $exon1 if $exon1 is from the >negative strand. >Is this behavior expected? Should I check the strand of $exon1 before >working on the object return by any Bio::RangeI function? 
> >Many thanks > >#!/usr/bin/perl >use strict; >use warnings; >use Bio::DB::GFF; > >MAIN:{ > > my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql', > -dsn => >'dbi:mysql:database=dmel_43_LS;host=riolab.net', > -user => 'guest'); > my $test_db = $db->segment('4'); > > # Load up the exons into $exons_p > for my $gene ($test_db->features(-types => 'gene')){ > > my $exons_p = extractExons($gene); > > cluster($exons_p) unless ($#{$exons_p} == -1); > > } >} > >sub extractExons { > my $gene = shift; > my %ex_list; > my @tcs = $gene->features( -type =>'processed_transcript', > -attributes =>{Gene => >$gene->group}); > > for my $tc (@tcs){ > my @exons = $tc->features (-type => 'exon', > -attributes => {Parent => >$tc->group} >); > > for (@exons){ > my $ex_id = $_->id; > $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id}); > > } > > } > my @values = values %ex_list; > return(\@values); >} > >sub cluster { > my $exons_p = shift; > > for (my $s = 0; $s <= $#{$exons_p}; $s++){ > for (my $t = $s+1; $t <= $#{$exons_p}; $t++){ > my $exon1 = $exons_p->[$s]; > my $exon2 = $exons_p->[$t]; > > if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){ > > my $overlap = $exon1->intersection($exon2); > > print "===\n";; > print "ex1\n", $exon1->seq, "\n"; > print "ex2\n", $exon2->seq, "\n"; > print "overlap\n", $overlap->seq, "\n"; > } > } > } >} >______________________________ >Marco Blanchette, Ph.D. > >mblanche at uclink.berkeley.edu > >Donald C. 
Rio's lab >Department of Molecular and Cell Biology >16 Barker Hall >University of California >Berkeley, CA 94720-3204 > >Tel: (510) 642-1084 >Cell: (510) 847-0996 >Fax: (510) 642-6062 >-- > > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > From sdavis2 at mail.nih.gov Wed May 3 12:18:48 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed, 03 May 2006 12:18:48 -0400 Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF In-Reply-To: Message-ID: On 5/3/06 11:09 AM, "Cook, Malcolm" wrote: > Marco, > > It appears that your code assumes that the exons as returned from call > to BIO::DB::GFF::features are sorted by start; I don't think is > guaranteed (at least not in the documentation I'm reading). Also I > think your code will not report overlap between two exons that have an > intervening overlapping exon. Depending on what you're application is, > you may care. For example, e1, e2, e3 all intersect pairwise, but your > code won't report on e1's overlap with e3. > > e1 ---*******------- > e2 -----******------ > e3 ------***-------- I think this can be done (looking for "superexons") via the UCSC table browser or via Penn State University's Galaxy server (written in python and downloadable) in case you want a quick solution to what I think is your problem.... Sean From osborne1 at optonline.net Wed May 3 16:22:57 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Wed, 03 May 2006 16:22:57 -0400 Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines In-Reply-To: <20060503193446.92476.qmail@web50412.mail.yahoo.com> Message-ID: Mark, The RC line is part of the description of a reference, I'm guessing 'RC' stands for Reference Comment. 
In order to get the attributes of a reference you'll first do something like:

    my $anno_collection = $seq->annotation;
    my @references = $anno_collection->get_Annotations('reference');

To get the comment field for a specific reference you can do:

    $references[0]->comment;

See the Feature-Annotation HOWTO for more information on Annotations; the Reference object is a kind of Annotation object.

Brian O.

On 5/3/06 3:34 PM, "Mark A. Miller" wrote:

> Yeah. Do you have any experience with that?
>
> Mark
>
> --- Brian Osborne wrote:
>
>> Mark,
>>
>> So you're trying to get the information in the RC line from a
>> Swissprot format file?
>>
>> Brian O.
>
>
> --- --- --- --- --- --- --- ---
>
> Mark A. Miller
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam? Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com

From cjfields at uiuc.edu Wed May 3 17:09:36 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 3 May 2006 16:09:36 -0500
Subject: [Bioperl-l] Batch retrieval partially implemented in Bio::DB::GenBank/GenPept
Message-ID: <000601c66ef5$e3066d90$15327e82@pyrimidine>

Just wanted to let you guys know I have added a few bits and pieces to Bio::DB::Gen* and BioLLDB::NCBIHelper for batch retrieval using epost/efetch. I didn't want to break anything too severely, so for now you can only use this through get_seq_stream (i.e. NOT through the get_Stream* methods yet). I also added tests to DB.t, a few each for protein and nucleotide retrieval using batch mode, and so far they all pass fine.

I haven't tested the upper sequence limit for this yet to see if it's at all comparable to just using efetch, but it seems a bit faster. The eutils coursebook states that one should only post ~500 at a time (I think you can get a bit higher though).

Also, at the moment it only works for GI's (NOT accessions, which apparently epost does not accept).
If we want to continue using this method for retrieval then we may need a workaround for accs. CJF Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From torsten.seemann at infotech.monash.edu.au Wed May 3 17:44:48 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Thu, 04 May 2006 07:44:48 +1000 Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF In-Reply-To: References: Message-ID: <1146692688.12571.1.camel@chauvel.csse.monash.edu.au> Marco, > Silly question: How do I get the version of BioPerl I am using... Never had > to check a module/bundle version number before... http://bioperl.org/wiki/FAQ#How_can_I_tell_what_version_of_BioPerl_is_installed.3F -- Torsten Seemann Victorian Bioinformatics Consortium From cjfields at uiuc.edu Wed May 3 18:08:37 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 3 May 2006 17:08:37 -0500 Subject: [Bioperl-l] Batch retrieval partially implemented inBio::DB::GenBank/GenPept In-Reply-To: <000601c66ef5$e3066d90$15327e82@pyrimidine> Message-ID: <000001c66efe$21dbcf80$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Chris Fields > Sent: Wednesday, May 03, 2006 4:10 PM > To: 'Jason Stajich'; 'Brian Osborne'; bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Batch retrieval partially implemented > inBio::DB::GenBank/GenPept > > Just wanted to let you guys know I have added a few bits and pieces to > Bio::DB::Gen* and BioLLDB::NCBIHelper for batch retrieval using ^^^^^^^^^^^^^^^^^^^ Bio::DB::NCBIHelper Fat fingers! > epost/efetch. I didn't want to break anything too severely so you can > only > use this at the moment using get_seq_stream (i.e. NOT through get_Stream* > methods yet). I also added tests to DB.t, a few each for protein and > nucleotide retrieval using batch mode and so far they all pass fine. 
> > I haven't tested the upper sequence limit for this yet to see if it's at > all > comparable to just using efetch but it seems a bit faster. The eutils > coursebook states that one should only post ~500 at a time (I think you > can > get a bit higher though). > > Also, at the moment it only works at the moment for GI's (NOT accessions, > which apparently epost does not accept). If we want to continue using > this > method for retrieval then we may need a workaround for accs. > > CJF > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From arareko at campus.iztacala.unam.mx Wed May 3 18:24:23 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Wed, 03 May 2006 17:24:23 -0500 Subject: [Bioperl-l] Batch retrieval partially implemented inBio::DB::GenBank/GenPept In-Reply-To: <000001c66efe$21dbcf80$15327e82@pyrimidine> References: <000001c66efe$21dbcf80$15327e82@pyrimidine> Message-ID: <44592D97.6090906@campus.iztacala.unam.mx> hehehe :) Chris Fields wrote: > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Chris Fields >> Sent: Wednesday, May 03, 2006 4:10 PM >> To: 'Jason Stajich'; 'Brian Osborne'; bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] Batch retrieval partially implemented >> inBio::DB::GenBank/GenPept >> >> Just wanted to let you guys know I have added a few bits and pieces to >> Bio::DB::Gen* and BioLLDB::NCBIHelper for batch retrieval using > ^^^^^^^^^^^^^^^^^^^ > Bio::DB::NCBIHelper > Fat fingers! > >> epost/efetch. I didn't want to break anything too severely so you can >> only >> use this at the moment using get_seq_stream (i.e. NOT through get_Stream* >> methods yet). 
I also added tests to DB.t, a few each for protein and
>> nucleotide retrieval using batch mode and so far they all pass fine.
>>
>> I haven't tested the upper sequence limit for this yet to see if it's at all
>> comparable to just using efetch but it seems a bit faster. The eutils
>> coursebook states that one should only post ~500 at a time (I think you can
>> get a bit higher though).
>>
>> Also, at the moment it only works at the moment for GI's (NOT accessions,
>> which apparently epost does not accept). If we want to continue using this
>> method for retrieval then we may need a workaround for accs.
>>
>> CJF
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From fernan at iib.unsam.edu.ar Wed May 3 20:38:07 2006
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Wed, 3 May 2006 21:38:07 -0300
Subject: [Bioperl-l] BioPerl-run in FreeBSD
In-Reply-To: <4457DDF8.4050005@campus.iztacala.unam.mx>
References: <4457DDF8.4050005@campus.iztacala.unam.mx>
Message-ID: <20060504003807.GA86447@iib.unsam.edu.ar>

+----[ Mauricio Herrera Cuadra (02.May.2006 19:49):
|
| It's my great pleasure to announce the availability of the BioPerl-run
| packages (stable & developer releases) for the FreeBSD operating system.
|
| For instructions on how to install BioPerl ports in FreeBSD, please take
| a look into the Getting Bioperl section of the BioPerl Wiki.
| +----] Great job Mauricio, thanks for contributing this! Fernan From miker at biotiquesystems.com Tue May 2 23:31:59 2006 From: miker at biotiquesystems.com (Michael Rogoff) Date: Tue, 2 May 2006 20:31:59 -0700 Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps Message-ID: <007b01c66e62$23161d20$c100a8c0@mike> I've encountered a pretty serious bug in Bio::SeqIO when parsing certain genbank files that contain CONTIG entries with gaps. One such record is NW_925173. When I try to parse this file using Bio::SeqIO::genbank, it will enter an infinite loop and spin until it runs out of memory. I'm pretty certain it relates to this bug: http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to indicate that genbank records with CONTIG gaps are not valid and can't be parsed. But this bug actually claims to be fixed, which is strange, since looking at the code for FTLocationFactory (where the loop is) it's still right there. I assume that this may be fixed in other contexts but is still not fixed in Bio::SeqIO::genbank? Or am I doing something wrong? I think that this should probably be filed as an open bug. I would think that even if bioperl isn't interested in parsing this type of file via SeqIO, certainly you'd want to ensure that no finite input file would send the parser into an infinite loop. Have others encountered this problem? Is there any plan to address it? Thanks very much for any information or help! -Mike P.S. I've played around with my version of FTLocationFactory and it seems to actually work and parse the gaps. I'm not sure if I've created other bugs or if it works in all cases, but at least the parser doesn't die. I also don't know that my hacky code is appropriate for putting back in to BioPerl, but I'm happy to provide it if someone wants to check it out and/or consider it for checkin. 
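
Until the parser itself is fixed, one possible stopgap for the runaway parse described above is to put a time limit around the parsing call. The following is only a sketch using core Perl; the helper name run_with_timeout is made up for illustration and is not part of BioPerl, the busy loop merely stands in for a parse that never returns, and alarm() may not behave this way on every platform (Windows in particular).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): run $code, but give up
# after $seconds. Returns (1, result) on success and (0, error message)
# on timeout or on any other exception thrown by $code.
sub run_with_timeout {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        $result = $code->();
        alarm 0;
        1;
    };
    my $err = $@;
    alarm 0;    # make sure no alarm is left pending if $code died
    return (1, $result) if $ok;
    return (0, $err || "unknown error\n");
}

# Stand-in for a parse that never returns. In real use the code ref would
# wrap the loop that pulls records from the parser object.
my ($ok, $value) = run_with_timeout(1, sub { my $n = 0; while (1) { $n++ } });
print $ok ? "finished\n" : "gave up: $value";
```

This does not address the underlying bug; it only keeps a batch job from spinning until memory runs out on one bad record.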
From ULNJUJERYDIX at spammotel.com Wed May 3 04:20:38 2006 From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau) Date: Wed, 3 May 2006 16:20:38 +0800 Subject: [Bioperl-l] Bio::Graphics::Panel imagemap making with Bio::Graphics::Panel Message-ID: <5b6410e0605030120q31d1f554mbc4bf104deca48bf@mail.gmail.com> Help! I can't figure out the docs instructions. I want to create an imagemap of short sequence matches with a longer one with clickable imagemaps for the short sequences. I figure I can do this easily enough using the example script for parsing blast output but I need an example script to understand how to produce the html code for the imagemap. I can find only rather cryptic references about how this can be done (see below). $boxes = $panel->boxes @boxes = $panel->boxes The boxes() method returns a list of arrayrefs containing the coordinates of each glyph. The method is useful for constructing an image map. In a scalar context, boxes() returns an arrayref. In a list context, the method returns the list directly. Each member of the list is an arrayref of the following format: [ $feature, $x1, $y1, $x2, $y2, $track ] The first element is the feature object; either an Ace::Sequence::Feature, a Das::Segment::Feature, or another Bioperl Bio::SeqFeatureI object. The coordinates are the topleft and bottomright corners of the glyph, including any space allocated for labels. The track is the Bio::Graphics::Glyph object corresponding to the track that the feature is rendered inside. $position = $panel->track_position($track) After calling gd() or boxes(), you can learn the resulting Y coordinate of a track by calling track_position() with the value returned by add_track() or unshift_track(). This will return undef if called before gd() or boxes() or with an invalid track. @pixel_coords = $panel->location2pixel(@feature_coords) Public routine to map feature coordinates (in base pairs) into pixel coordinates relative to the left-hand edge of the picture.
If you define a -background callback, the callback may wish to invoke this routine in order to translate base coordinates into pixel coordinates. $left = $panel->left $right = $panel->right $top = $panel->top $bottom = $panel->bottom Return the pixel coordinates of the *drawing area* of the panel, that is, exclusive of the padding. Got it from http://docs.bioperl.org/bioperl-live/Bio/Graphics/Panel.html From s.johri at imperial.ac.uk Thu May 4 08:50:34 2006 From: s.johri at imperial.ac.uk (Johri, Saurabh) Date: Thu, 4 May 2006 13:50:34 +0100 Subject: [Bioperl-l] Fu and Li's D statistic - calculate Message-ID: <4A98ACB8EC146149872BAC9A132A582C277AB3@icex5.ic.ac.uk> Hi all, I'm trying to calculate Fu and Li's D summary statistic for a group of sequences. The function fu_and_li_D(@ingroup,$extmutations) takes 2 args, the first being the ingroup (population) and the second being the number of external mutations, which is calculated from an outgroup sequence. My question is, which function do I use to calculate the number of external mutations? Would this be the singleton_count() function? The singleton_count() function takes a PopGen object, which represents a clustal alignment file. Would I include the outgroup in a multiple fasta file for alignment with clustal? Any suggestions as to how to calculate the number of external mutations would be much appreciated. Thanks for your help! Saurabh Johri Centre for Molecular Microbiology & Infection Imperial College London SW7 2AZ From hlapp at gmx.net Thu May 4 12:30:05 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 4 May 2006 12:30:05 -0400 Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps In-Reply-To: <007b01c66e62$23161d20$c100a8c0@mike> References: <007b01c66e62$23161d20$c100a8c0@mike> Message-ID: Infinite loop on a file you can download (i.e., as opposed to a file you tinkered with) is never ok. Could you file this as a bug report? And ideally attach your patch?
Thanks, -hilmar On May 2, 2006, at 11:31 PM, Michael Rogoff wrote: > > I've encountered a pretty serious bug in Bio::SeqIO when parsing > certain genbank > files that contain CONTIG entries with gaps. One such record is > NW_925173. > > When I try to parse this file using Bio::SeqIO::genbank, it will > enter an > infinite loop and spin until it runs out of memory. > > I'm pretty certain it relates to this bug: > http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to > indicate that > genbank records with CONTIG gaps are not valid and can't be > parsed. But this > bug actually claims to be fixed, which is strange, since looking at > the code for > FTLocationFactory (where the loop is) it's still right there. I > assume that > this may be fixed in other contexts but is still not fixed in > Bio::SeqIO::genbank? Or am I doing something wrong? > > I think that this should probably be filed as an open bug. I would > think that > even if bioperl isn't interested in parsing this type of file via > SeqIO, > certainly you'd want to ensure that no finite input file would send > the parser > into an infinite loop. Have others encountered this problem? Is > there any plan > to address it? > > Thanks very much for any information or help! > > -Mike > > P.S. I've played around with my version of FTLocationFactory and > it seems to > actually work and parse the gaps. I'm not sure if I've created > other bugs or if > it works in all cases, but at least the parser doesn't die. I also > don't know > that my hacky code is appropriate for putting back in to BioPerl, > but I'm happy > to provide it if someone wants to check it out and/or consider it > for checkin. 
> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From saldroubi at yahoo.com Thu May 4 13:03:00 2006 From: saldroubi at yahoo.com (Sam Al-Droubi) Date: Thu, 4 May 2006 10:03:00 -0700 (PDT) Subject: [Bioperl-l] Is webiste down? Message-ID: <20060504170300.12178.qmail@web34301.mail.mud.yahoo.com> All, Is the bioperl website down? I can't get to http://www.bioperl.org Thank you. Sincerely, Sam Al-Droubi, M.S. saldroubi at yahoo.com From arareko at campus.iztacala.unam.mx Thu May 4 14:22:52 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Thu, 04 May 2006 13:22:52 -0500 Subject: [Bioperl-l] Is webiste down? In-Reply-To: <20060504170300.12178.qmail@web34301.mail.mud.yahoo.com> References: <20060504170300.12178.qmail@web34301.mail.mud.yahoo.com> Message-ID: <445A467C.4070700@campus.iztacala.unam.mx> Website is ok, maybe your gateway can't lookup the bioperl server at the moment. Regards, Mauricio. Sam Al-Droubi wrote: > All, > > Is the bioperl website down? I can't get to http://www.bioperl.org > > > Thank you. > > > > Sincerely, > Sam Al-Droubi, M.S. 
> saldroubi at yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From cjfields at uiuc.edu Thu May 4 14:40:32 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 4 May 2006 13:40:32 -0500 Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps In-Reply-To: <007b01c66e62$23161d20$c100a8c0@mike> Message-ID: <000001c66faa$3a25b130$15327e82@pyrimidine> Are you using the CONTIG record or the full GenBank file? I see problems with both (using bioperl-live) which seem unrelated to one another. The full file seems to be running a bit slow b/c the full GenBank record is huge (~55 MB) but the CONTIG file does exactly what you said (runs out of memory). Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Michael Rogoff > Sent: Tuesday, May 02, 2006 10:32 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > > I've encountered a pretty serious bug in Bio::SeqIO when parsing certain > genbank > files that contain CONTIG entries with gaps. One such record is > NW_925173. > > When I try to parse this file using Bio::SeqIO::genbank, it will enter an > infinite loop and spin until it runs out of memory. > > I'm pretty certain it relates to this bug: > http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to indicate > that > genbank records with CONTIG gaps are not valid and can't be parsed. But > this > bug actually claims to be fixed, which is strange, since looking at the > code for > FTLocationFactory (where the loop is) it's still right there.
I assume > that > this may be fixed in other contexts but is still not fixed in > Bio::SeqIO::genbank? Or am I doing something wrong? > > I think that this should probably be filed as an open bug. I would think > that > even if bioperl isn't interested in parsing this type of file via SeqIO, > certainly you'd want to ensure that no finite input file would send the > parser > into an infinite loop. Have others encountered this problem? Is there > any plan > to address it? > > Thanks very much for any information or help! > > -Mike > > P.S. I've played around with my version of FTLocationFactory and it seems > to > actually work and parse the gaps. I'm not sure if I've created other bugs > or if > it works in all cases, but at least the parser doesn't die. I also don't > know > that my hacky code is appropriate for putting back in to BioPerl, but I'm > happy > to provide it if someone wants to check it out and/or consider it for > checkin. > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From j.abbott at imperial.ac.uk Thu May 4 11:44:44 2006 From: j.abbott at imperial.ac.uk (James Abbott) Date: Thu, 04 May 2006 16:44:44 +0100 Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines In-Reply-To: <7B49D031-9F74-43C3-AA4F-2AE115BB843D@duke.edu> References: <20060502114101.29745.qmail@web50409.mail.yahoo.com> <7B49D031-9F74-43C3-AA4F-2AE115BB843D@duke.edu> Message-ID: <445A216C.7090108@imperial.ac.uk> Jason Stajich wrote: > I don't know if any of this has been resolved really so hopefully > James will speak up if he's implemented anything. Not as yet, I'm afraid - $job is keeping me overly busy at the moment, but it's on my todo list.... Cheers, James -- Dr. 
James Abbott Bioinformatics Software Developer, Bioinformatics Support Service Imperial College, London From hubert.prielinger at gmx.at Thu May 4 15:35:42 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Thu, 04 May 2006 13:35:42 -0600 Subject: [Bioperl-l] can't parse blast file anymore Message-ID: <445A578E.8050207@gmx.at> Hi, the following perl script worked fine until a few days ago.... ============================================================== #!/usr/bin/perl -w use Bio::SearchIO; use strict; use DBI; use Net::MySQL; #use lib qw(/usr/local/lib/perl5/site_perl/5.8.6/i686-linux); print "trying to connect to database \n"; my $database = 'antimicro_peptides'; my $host = 'ppc7.bio.ucalgary.ca'; my $user = 'Hubert'; my $password = 'Col00eng30'; my $mysql = Net::MySQL->new( hostname => $host, database => $database, user => $user, password => $password, ); print "Connection established \n"; my $selectID = 0; my $count = 0; ##output database results #while (my @row = $sth->fetchrow_array) # { print "@row\n" } print "start program\n"; my $directory = '/home/Hubert/test'; opendir(DIR, $directory) || die("Cannot open directory"); print "opened directory\n"; foreach my $file (readdir(DIR)) { if ($file =~ /txt$/) { $count++; print "read file $file \n"; $file = $directory . '/' . $file; my $search = new Bio::SearchIO (-format => 'blast', -file => $file); print "bioperl seems to work....\n"; my $cutoff_len = 10; #iterate over each query sequence print "try to enter while loop\n"; while (my $result = $search->next_result) { print "entered 1st while loop\n"; #iterate over each hit on the query sequence while (my $hit = $result->next_hit) { print "entered 2nd while loop\n"; #iterate over each HSP in the hit while (my $hsp = $hit->next_hsp) { print "entered 3rd while loop\n"; if ($hsp->length('sbjct') <= $cutoff_len) { #print $hsp->hit_string, "\n"; for ($hsp->hit_string) { #$hsp->hit_string print "count files....., $count ,\n"; ................. 
=================================================================== Output: [Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl trying to connect to database Connection established start program opened directory read file 40026.txt bioperl seems to work.... try to enter while loop but it doesn't enter the first while loop, it stuck there, first I thought it is a linux problem, because I updated from FC4 to FC5, but it isn't because perl is working fine, and it seems bioperl is working fine too, but it cannot parse the file anymore..... regards Hubert From barry.moore at genetics.utah.edu Thu May 4 17:22:51 2006 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Thu, 4 May 2006 15:22:51 -0600 Subject: [Bioperl-l] [BULK] can't parse blast file anymore In-Reply-To: <445A578E.8050207@gmx.at> References: <445A578E.8050207@gmx.at> Message-ID: Hubert, My first suggestion would be to log onto your calgary server and change your password real quick (unless that is intended to post you password to the world). Well, this isn't an answer, but it may help you find one. Use perl -d your_script.pl to run your script under the debugger. Type 'n' to step forward to the line where you start the while loop. Type 'x $result' to see that an object exists (it should or you'd have gotten an error). Type 's' to step into the next_results call, and then continue to type 'n' and 's' as needed to burrow down to see if you can find where you're hanging. Barry On May 4, 2006, at 1:35 PM, Hubert Prielinger wrote: > Hi, > the following perl script worked fine until a few days ago.... 
> > ============================================================== > #!/usr/bin/perl -w > > use Bio::SearchIO; > use strict; > use DBI; > use Net::MySQL; > > #use lib qw(/usr/local/lib/perl5/site_perl/5.8.6/i686-linux); > > print "trying to connect to database \n"; > my $database = 'antimicro_peptides'; > my $host = 'ppc7.bio.ucalgary.ca'; > my $user = 'Hubert'; > my $password = 'Col00eng30'; > > my $mysql = Net::MySQL->new( > hostname => $host, > database => $database, > user => $user, > password => $password, > ); > > > print "Connection established \n"; > > my $selectID = 0; > my $count = 0; > > > > ##output database results > #while (my @row = $sth->fetchrow_array) > # { print "@row\n" } > > > > print "start program\n"; > my $directory = '/home/Hubert/test'; > opendir(DIR, $directory) || die("Cannot open directory"); > print "opened directory\n"; > > foreach my $file (readdir(DIR)) { > if ($file =~ /txt$/) { > $count++; > print "read file $file \n"; > > > $file = $directory . '/' . $file; > > my $search = new Bio::SearchIO (-format => 'blast', > -file => $file); > print "bioperl seems to work....\n"; > my $cutoff_len = 10; > > #iterate over each query sequence > print "try to enter while loop\n"; > while (my $result = $search->next_result) { > print "entered 1st while loop\n"; > > #iterate over each hit on the query sequence > while (my $hit = $result->next_hit) { > print "entered 2nd while loop\n"; > > #iterate over each HSP in the hit > while (my $hsp = $hit->next_hsp) { > print "entered 3rd while loop\n"; > > if ($hsp->length('sbjct') <= $cutoff_len) { > #print $hsp->hit_string, "\n"; > > for ($hsp->hit_string) { #$hsp->hit_string > print "count files....., $count ,\n"; > ................. 
> > =================================================================== > > Output: > > [Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl > trying to connect to database > Connection established > start program > opened directory > read file 40026.txt > bioperl seems to work.... > try to enter while loop > > > but it doesn't enter the first while loop, it stuck there, first I > thought it is a linux problem, because I updated from FC4 to FC5, > but it > isn't because perl is working fine, and it seems bioperl is working > fine > too, but it cannot parse the file anymore..... > > regards > Hubert > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Thu May 4 18:27:57 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 4 May 2006 17:27:57 -0500 Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps In-Reply-To: <000001c66faa$3a25b130$15327e82@pyrimidine> Message-ID: <000001c66fc9$fe7e5680$15327e82@pyrimidine> Here's another odd bit. This is what I get for the CONTIG line when I passed a simple contig file (NW_925062, with one join) through Bio::SeqIO: ----------------------------------- .... FEATURES Location/Qualifiers source 1..8541 /db_xref="taxon:9606" /mol_type="genomic DNA" /chromosome="11" /organism="Homo sapiens" CONTIG AADB02014027.1:1..8541 // ----------------------------------- Here's the original: ----------------------------------- FEATURES Location/Qualifiers source 1..8541 /organism="Homo sapiens" /mol_type="genomic DNA" /db_xref="taxon:9606" /chromosome="11" CONTIG join(AADB02014027.1:1..8541) // ----------------------------------- Looks like it lopped out the 'join' here as well. 
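
For anyone who wants to compare CONTIG lines before and after a round trip without tripping over this difference, here is a rough sketch in plain Perl. It assumes that a join(...) with a single member means the same thing as the bare location, which I have not checked against the feature table spec, and the function name normalize_location is invented for this example (it is not a BioPerl call).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: strip a join(...) wrapper when it encloses exactly
# one location, so that "join(X)" and "X" compare equal. Any other
# location string is returned unchanged (apart from whitespace removal).
sub normalize_location {
    my ($loc) = @_;
    $loc =~ s/\s+//g;    # location strings may be wrapped across lines
    return $loc unless $loc =~ /^join\((.*)\)$/;
    my $inner = $1;
    my $depth = 0;
    for my $ch (split //, $inner) {
        $depth++ if $ch eq '(';
        $depth-- if $ch eq ')';
        return $loc if $depth < 0;                   # wrapper does not span the whole string
        return $loc if $ch eq ',' && $depth == 0;    # more than one member
    }
    return $loc if $depth != 0;                      # unbalanced parentheses
    return $inner;
}

my $original  = 'join(AADB02014027.1:1..8541)';
my $roundtrip = 'AADB02014027.1:1..8541';
print normalize_location($original) eq normalize_location($roundtrip)
    ? "equivalent\n"
    : "different\n";
```

Joins with several members, including ones with gap() entries, are left alone, so a real change in the CONTIG line would still show up in the comparison.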
Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Chris Fields > Sent: Thursday, May 04, 2006 1:41 PM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > Are you using the CONTIG record or the full GenBank file? I see > problems with both (using bioperl-live) which seem unrelated to one > another. > The full file seems to be running a bit slow b/c the full GenBank record > is > huge (~55 MB) but the CONTIG file does exactly what you said (runs out of > memory). > > Chris > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Michael Rogoff > > Sent: Tuesday, May 02, 2006 10:32 PM > > To: bioperl-l at lists.open-bio.org > > Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > > > > > I've encountered a pretty serious bug in Bio::SeqIO when parsing certain > > genbank > > files that contain CONTIG entries with gaps. One such record is > > NW_925173. > > > > When I try to parse this file using Bio::SeqIO::genbank, it will enter > an > > infinite loop and spin until it runs out of memory. > > > > I'm pretty certain it relates to this bug: > > http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to indicate > > that > > genbank records with CONTIG gaps are not valid and can't be parsed. But > > this > > bug actually claims to be fixed, which is strange, since looking at the > > code for > > FTLocationFactory (where the loop is) it's still right there. I assume > > that > > this may be fixed in other contexts but is still not fixed in > > Bio::SeqIO::genbank? Or am I doing something wrong? > > > > I think that this should probably be filed as an open bug. 
I would > think > > that > > even if bioperl isn't interested in parsing this type of file via SeqIO, > > certainly you'd want to ensure that no finite input file would send the > > parser > > into an infinite loop. Have others encountered this problem? Is there > > any plan > > to address it? > > > > Thanks very much for any information or help! > > > > -Mike > > > > P.S. I've played around with my version of FTLocationFactory and it > seems > > to > > actually work and parse the gaps. I'm not sure if I've created other > bugs > > or if > > it works in all cases, but at least the parser doesn't die. I also > don't > > know > > that my hacky code is appropriate for putting back in to BioPerl, but > I'm > > happy > > to provide it if someone wants to check it out and/or consider it for > > checkin. > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Thu May 4 18:39:05 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 4 May 2006 18:39:05 -0400 Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps In-Reply-To: <000001c66fc9$fe7e5680$15327e82@pyrimidine> References: <000001c66fc9$fe7e5680$15327e82@pyrimidine> Message-ID: <2E0D7723-FA6E-4812-8DBB-30FCD11FA85C@gmx.net> The two notations are equivalent and syntactically correct, or so I believe ... I don't think 100% verbatim preservation should be the goal. Or am I missing the point? On May 4, 2006, at 6:27 PM, Chris Fields wrote: > Here's another odd bit. This is what I get for the CONTIG line when I > passed a simple contig file (NW_925062, with one join) through > Bio::SeqIO: > > ----------------------------------- > .... 
> FEATURES Location/Qualifiers > source 1..8541 > /db_xref="taxon:9606" > /mol_type="genomic DNA" > /chromosome="11" > /organism="Homo sapiens" > CONTIG AADB02014027.1:1..8541 > > // > ----------------------------------- > Here's the original: > ----------------------------------- > FEATURES Location/Qualifiers > source 1..8541 > /organism="Homo sapiens" > /mol_type="genomic DNA" > /db_xref="taxon:9606" > /chromosome="11" > CONTIG join(AADB02014027.1:1..8541) > // > ----------------------------------- > > Looks like it lopped out the 'join' here as well. > > Chris > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Chris Fields >> Sent: Thursday, May 04, 2006 1:41 PM >> To: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps >> >> Are you using the CONTIG record or the full GenBank file? I see >> problems with both (using bioperl-live) which seem unrelated to one >> another. >> The full file seems to be running a bit slow b/c the full GenBank >> record >> is >> huge (~55 MB) but the CONTIG file does exactly what you said (runs >> out of >> memory). >> >> Chris >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Michael Rogoff >>> Sent: Tuesday, May 02, 2006 10:32 PM >>> To: bioperl-l at lists.open-bio.org >>> Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps >>> >>> >>> I've encountered a pretty serious bug in Bio::SeqIO when parsing >>> certain >>> genbank >>> files that contain CONTIG entries with gaps. One such record is >>> NW_925173. >>> >>> When I try to parse this file using Bio::SeqIO::genbank, it will >>> enter >> an >>> infinite loop and spin until it runs out of memory. 
>>> >>> I'm pretty certain it relates to this bug: >>> http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to >>> indicate >>> that >>> genbank records with CONTIG gaps are not valid and can't be >>> parsed. But >>> this >>> bug actually claims to be fixed, which is strange, since looking >>> at the >>> code for >>> FTLocationFactory (where the loop is) it's still right there. I >>> assume >>> that >>> this may be fixed in other contexts but is still not fixed in >>> Bio::SeqIO::genbank? Or am I doing something wrong? >>> >>> I think that this should probably be filed as an open bug. I would >> think >>> that >>> even if bioperl isn't interested in parsing this type of file via >>> SeqIO, >>> certainly you'd want to ensure that no finite input file would >>> send the >>> parser >>> into an infinite loop. Have others encountered this problem? Is >>> there >>> any plan >>> to address it? >>> >>> Thanks very much for any information or help! >>> >>> -Mike >>> >>> P.S. I've played around with my version of FTLocationFactory and it >> seems >>> to >>> actually work and parse the gaps. I'm not sure if I've created >>> other >> bugs >>> or if >>> it works in all cases, but at least the parser doesn't die. I also >> don't >>> know >>> that my hacky code is appropriate for putting back in to BioPerl, >>> but >> I'm >>> happy >>> to provide it if someone wants to check it out and/or consider it >>> for >>> checkin. 
>>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hubert.prielinger at gmx.at Thu May 4 19:57:44 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Thu, 04 May 2006 17:57:44 -0600 Subject: [Bioperl-l] can't parse blast file anymore In-Reply-To: <445A7449.1080607@infotech.monash.edu.au> References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au> Message-ID: <445A94F8.9000903@gmx.at> Torsten Seemann wrote: > Hubert > >> the following perl script worked fine until a few days ago.... >> >> #iterate over each query sequence >> print "try to enter while loop\n"; >> >> > die "Bad BLAST report" if not defined $search; > >> while (my $result = $search->next_result) { >> print "entered 1st while loop\n"; >> >> Output: >> >> [Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl >> try to enter while loop >> >> but it doesn't enter the first while loop, it stuck there, first I >> > What is the value of $search before you start the WHILE loop ? 
> > hi, $search is defined, like my $search = new Bio::SearchIO (-format => 'blast', -file => $file) if I try it with the debugger as barry has suggested than I get the following DB<1> n main::(Blast.pl:24): print "Connection established \n"; DB<1> n Connection established main::(Blast.pl:26): my $selectID = 0; DB<1> n main::(Blast.pl:27): my $count = 0; DB<1> n main::(Blast.pl:37): print "start program\n"; DB<1> n start program main::(Blast.pl:38): my $directory = '/home/Hubert/test'; DB<1> n main::(Blast.pl:39): opendir(DIR, $directory) || die("Cannot open directory"); DB<1> n main::(Blast.pl:40): print "opened directory\n"; DB<1> n opened directory main::(Blast.pl:42): foreach my $file (readdir(DIR)) { DB<1> n main::(Blast.pl:43): if ($file =~ /txt$/) { DB<1> n main::(Blast.pl:44): $count++; DB<1> n main::(Blast.pl:45): print "read file $file \n"; DB<1> n read file 40026.txt main::(Blast.pl:48): $file = $directory . '/' . $file; DB<1> n main::(Blast.pl:50): my $search = new Bio::SearchIO (-format => 'blast', main::(Blast.pl:51): -file => $file); DB<1> n main::(Blast.pl:52): print "bioperl seems to work....\n"; DB<1> s $search main::((eval 14)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3): 3: $search; DB<<2>> n DB<2> n bioperl seems to work.... main::(Blast.pl:53): my $cutoff_len = 10; DB<2> n main::(Blast.pl:56): print "try to enter while loop\n"; DB<2> n try to enter while loop main::(Blast.pl:57): while (my $result = $search->next_result) { DB<2> s $result main::((eval 15)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3): 3: $result; DB<<3>> From torsten.seemann at infotech.monash.edu.au Thu May 4 17:38:17 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 05 May 2006 07:38:17 +1000 Subject: [Bioperl-l] can't parse blast file anymore In-Reply-To: <445A578E.8050207@gmx.at> References: <445A578E.8050207@gmx.at> Message-ID: <445A7449.1080607@infotech.monash.edu.au> Hubert >the following perl script worked fine until a few days ago.... 
> > #iterate over each query sequence > print "try to enter while loop\n"; > > die "Bad BLAST report" if not defined $search; > while (my $result = $search->next_result) { > print "entered 1st while loop\n"; > >Output: > >[Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl >try to enter while loop > >but it doesn't enter the first while loop, it stuck there, first I > > What is the value of $search before you start the WHILE loop ? From barry.moore at genetics.utah.edu Thu May 4 20:39:57 2006 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Thu, 4 May 2006 18:39:57 -0600 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <445A94F8.9000903@gmx.at> References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au> <445A94F8.9000903@gmx.at> <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu> Message-ID: <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu> That should be 'x $result', and you should see the object dumped to the screen. Or just type 's' by itself on the while line, which will step you into the next_result sub, where you can look around and watch what's happening.
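
Alongside the debugger, it may be worth ruling out a problem with the report files themselves before they reach Bio::SearchIO, for example an empty or unreadable file left over after the upgrade. This is only a guess at one possible cause, not a diagnosis. The sketch below uses core Perl only; the helper name usable_reports is made up for illustration, and the .txt filter simply follows the script quoted in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full paths of *.txt files in $dir that
# are plain, readable and non-empty, sorted by name. Files that fail a
# check are reported on STDERR and skipped.
sub usable_reports {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @names = sort grep { /\.txt$/ } readdir $dh;
    closedir $dh;

    my @usable;
    for my $name (@names) {
        my $path = File::Spec->catfile($dir, $name);
        if    (!-f $path) { warn "skipping $path: not a plain file\n" }
        elsif (!-r $path) { warn "skipping $path: not readable\n" }
        elsif (-z $path)  { warn "skipping $path: empty file\n" }
        else              { push @usable, $path }
    }
    return @usable;
}

# List the usable reports in the directory given on the command line,
# or in the current directory if none is given.
my $dir = @ARGV ? $ARGV[0] : '.';
print "$_\n" for usable_reports($dir);
```

If every file passes these checks, the problem is more likely in the parsing step, and stepping into next_result as described above is the next thing to try.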
B > DB<2> s $result > main::((eval 15)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3): > 3: $result; > DB<<3>> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hubert.prielinger at gmx.at Thu May 4 22:04:20 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Thu, 04 May 2006 20:04:20 -0600 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu> References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au> <445A94F8.9000903@gmx.at> <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu> Message-ID: <445AB2A4.7020405@gmx.at> if I do so it returns: 0 undef Barry Moore wrote: > That should be 'x $resust' and you should see the object dumped to > the screen. > > or just 's' by itself which will step you into the sub on the while > line will step you into the next_result sub, and you can look around > and watch what's happening. 
> > B > > >> DB<2> s $result >> main::((eval 15)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3): >> 3: $result; >> DB<<3>> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From torsten.seemann at infotech.monash.edu.au Fri May 5 00:40:34 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 05 May 2006 14:40:34 +1000 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <445AB2A4.7020405@gmx.at> References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au> <445A94F8.9000903@gmx.at> <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu> <445AB2A4.7020405@gmx.at> Message-ID: <445AD742.4070408@infotech.monash.edu.au> Hubert Prielinger wrote: > if I do so it returns: > 0 undef That means the value of $search was undef. That means that it could not parse or open the BLAST report. I repeat the line that I put in my earlier email which you ignored. # your line my $search = Bio::SearchIO->new( ..... ); # then check if it was successful! die "could not open blast report" if not defined $search; --Torsten From jason.stajich at duke.edu Fri May 5 09:21:38 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri, 5 May 2006 09:21:38 -0400 Subject: [Bioperl-l] bioperl-AlignIO problems parsing fasta files In-Reply-To: <5d35ac4d.b35ce863.8198d00@expms1.cites.uiuc.edu> References: <5d35ac4d.b35ce863.8198d00@expms1.cites.uiuc.edu> Message-ID: Space after the > is causing the problem since we infer the ID as the everything after the '>' BEFORE the first whitespace. Get rid of the space. 
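To illustrate the ID inference Jason describes, here is a minimal sketch in plain Perl. The helper fasta_id is hypothetical and only mimics the rule stated above (the ID is the text between '>' and the first whitespace); it is not BioPerl's actual parser. Under that assumption, a header with a space right after '>' yields an empty ID, which would be consistent with the "No sequence with name []" exception, and stripping the space recovers the gi string.

```perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl code): take the ID to be everything
# between '>' and the first whitespace character.
sub fasta_id {
    my ($header) = @_;
    my ($id) = $header =~ /^>(\S*)/;
    return $id;
}

# Header line as it appears in the problem file (note the space after '>').
my $bad = '> gi|90108701|pdb|2AHZ|B Chain B, K+ Complex Of The Nak Channel';

# Apply the substitution suggested in this thread: drop whitespace after '>'.
(my $good = $bad) =~ s/^>\s+/>/;

printf "before: [%s]\n", fasta_id($bad);    # prints "before: []"
printf "after:  [%s]\n", fasta_id($good);   # prints "after:  [gi|90108701|pdb|2AHZ|B]"
```

The same substitution can be applied to a whole file in place; keeping a backup copy is a sensible precaution.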
$ perl -i.backup -p -e 's/^>\s+/>/' YOURFASALNFILE On May 4, 2006, at 7:00 PM, Gloria Rendon wrote: > contents of the input file has a single sequence: > >> gi|90108701|pdb|2AHZ|B Chain B, K+ Complex Of The Nak Channel > MLSFLLTLKRMLRACLRAWKDKEFQVLFVLTILTLISGTIFYSTVEGLRPIDALYFSVVTLTTVGDGNFS > PQTDFGKIFTILYIFIGIGLVFGFIHKLAVNVQLPSILSN > ------------------------------------------ > this is the script that tries to parse it: > > use Bio::AlignIO; > my $inseq = Bio::AlignIO->new(-format => 'fasta', > -file => 'test.fasta'); > while( my $aln = $inseq->next_aln ) { > print "name: ", $aln->displayname; > print "length: ", $aln->length; > print "\n"; > } > > ------------------------------------------ > and this is the result of running that script on winxp > > D:\msa\NAK MUTANTS>perl parseFasta.pl > > > ------------- EXCEPTION ------------- > MSG: No sequence with name [] > STACK Bio::SimpleAlign::displayname > C:/Perl/site/lib/Bio/SimpleAlign.pm:2047 > STACK toplevel parseFasta.pl:11 > > -------------------------------------- > D:\msa\NAK MUTANTS> -- Jason Stajich Duke University http://www.duke.edu/~jes12/ From thoufek at pngg.org Thu May 4 12:50:44 2006 From: thoufek at pngg.org (T.D. Houfek) Date: Thu, 04 May 2006 12:50:44 -0400 Subject: [Bioperl-l] Bio::Seq::Quality description line problem In-Reply-To: References: Message-ID: <445A30E4.6070103@pngg.org> Using Bioperl 1.5, having trouble with writing FASTA-style quality files using Bio::Seq::Quality. I create the Bio::Seq::Quality object, giving its constructor an ID, a description, a nucleotide sequence, and a quality sequence. I then write the sequence FASTA and the quality FASTA. The description string will appear in the header line of the sequence FASTA, but not in the header line of the quality FASTA. Can anybody help me figure out how to fix this? I've attached a sample script and output. -T.D. 
------------------- sample script follows --------------------------------------- #!/usr/bin/perl use strict; use Bio::Seq::Quality; use Bio::SeqIO; my $id = "bogus_id"; my $desc = "bogus description"; my $seq = "ATTATTATTATTATT"; my $qual = "10 20 30 10 20 30 10 20 30 10 20 30 10 20 30"; my $sequal_obj = Bio::Seq::Quality->new( -display_id => $id, -desc => $desc, -seq => $seq, -qual => $qual ); my $qualout = Bio::SeqIO->new( -file => ">myfile.qual", -format => 'qual' ); my $seqout = Bio::SeqIO->new( -file => ">myfile.seq", -format => 'Fasta' ); $seqout->write_seq($sequal_obj); $qualout->write_seq($sequal_obj); ------------------ sample output follows --------------------------------------- tdhoufek at aether:~$ cat myfile.seq >bogus_id bogus description ATTATTATTATTATT tdhoufek at aether:~$ cat myfile.qual >bogus_id 10 20 30 10 20 30 10 20 30 10 20 30 10 20 30 -------------------------------------------------------------------------------------------------- -- T.D. Houfek senior bioinformatics developer plant nematode genetics group north carolina state university Email: thoufek at pngg.org ---------------------------------------------------------- use Bio::Seq; @a =qw/NNN CCT GAG CAT GCG TGT AAG AAC TAG/; $u=seq;$r=Bio::Seq;sub c{$c=$r->new(-$u=>"@_[0]")->revcom; $t=$c->$u;}map{m/\d/?$g=c($a[$_]):tr/a-i/1-9/&&($g=$a[$_]) ;$x[$i++]=$g;} split //,"dgh5cb40ab120cdefb4";$z=$r->new(- $u=>(join"", at x))->translate()->$u;$z =~s/X/ /g;print"$z\n" From jason.stajich at duke.edu Fri May 5 09:27:51 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri, 5 May 2006 09:27:51 -0400 Subject: [Bioperl-l] bioperl-AlignIO problems parsing fasta files In-Reply-To: References: <5d35ac4d.b35ce863.8198d00@expms1.cites.uiuc.edu> Message-ID: <0F79C9AD-DE36-4424-9E59-37ABE8B62A5E@duke.edu> [replying to myself] although if you are trying to just read a sequence not an alignment then you want to use Bio::SeqIO. 
See the copious help on the HOWTO page at bioperl website including a sequence and feature howto and beginner's guide. http://bioperl.org/wiki/HOWTOs -jason On May 5, 2006, at 9:21 AM, Jason Stajich wrote: > Space after the > is causing the problem since we infer the ID as the > everything after the '>' BEFORE the first whitespace. Get rid of the > space. > $ perl -i.backup -p -e 's/^>\s+/>/' YOURFASALNFILE > > On May 4, 2006, at 7:00 PM, Gloria Rendon wrote: > >> contents of the input file has a single sequence: >> >>> gi|90108701|pdb|2AHZ|B Chain B, K+ Complex Of The Nak Channel >> MLSFLLTLKRMLRACLRAWKDKEFQVLFVLTILTLISGTIFYSTVEGLRPIDALYFSVVTLTTVGDGNF >> S >> PQTDFGKIFTILYIFIGIGLVFGFIHKLAVNVQLPSILSN >> ------------------------------------------ >> this is the script that tries to parse it: >> >> use Bio::AlignIO; >> my $inseq = Bio::AlignIO->new(-format => 'fasta', >> -file => 'test.fasta'); >> while( my $aln = $inseq->next_aln ) { >> print "name: ", $aln->displayname; >> print "length: ", $aln->length; >> print "\n"; >> } >> >> ------------------------------------------ >> and this is the result of running that script on winxp >> >> D:\msa\NAK MUTANTS>perl parseFasta.pl >> >> >> ------------- EXCEPTION ------------- >> MSG: No sequence with name [] >> STACK Bio::SimpleAlign::displayname >> C:/Perl/site/lib/Bio/SimpleAlign.pm:2047 >> STACK toplevel parseFasta.pl:11 >> >> -------------------------------------- >> D:\msa\NAK MUTANTS> > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12/ From osborne1 at optonline.net Fri May 5 10:04:02 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Fri, 05 May 2006 10:04:02 -0400 Subject: [Bioperl-l] Bio::Seq::Quality description line problem In-Reply-To: 
<445A30E4.6070103@pngg.org> Message-ID: T.D., According to the documentation, http://www.bioperl.org/wiki/Qual_sequence_format, your *qual file looks right. What are you trying to create? Brian O. On 5/4/06 12:50 PM, "T.D. Houfek" wrote: > Using Bioperl 1.5, having trouble with writing FASTA-style quality files > using Bio::Seq::Quality. > > I create the Bio::Seq::Quality object, giving its constructor an ID, a > description, a nucleotide sequence, and a quality sequence. I then write > the sequence FASTA and the quality FASTA. The description string will > appear in the header line of the sequence FASTA, but not in the header > line of the quality FASTA. > > Can anybody help me figure out how to fix this? I've attached a sample > script and output. > > -T.D. > > ------------------- sample script follows > --------------------------------------- > > #!/usr/bin/perl > use strict; > use Bio::Seq::Quality; > use Bio::SeqIO; > > my $id = "bogus_id"; > my $desc = "bogus description"; > my $seq = "ATTATTATTATTATT"; > my $qual = "10 20 30 10 20 30 10 20 30 10 20 30 10 20 30"; > > my $sequal_obj = Bio::Seq::Quality->new( > -display_id => $id, > -desc => $desc, > -seq => $seq, > -qual => $qual > ); > > my $qualout = Bio::SeqIO->new( > -file => ">myfile.qual", > -format => 'qual' > ); > my $seqout = Bio::SeqIO->new( > -file => ">myfile.seq", > -format => 'Fasta' > ); > > $seqout->write_seq($sequal_obj); > $qualout->write_seq($sequal_obj); > > > ------------------ sample output follows > --------------------------------------- > > tdhoufek at aether:~$ cat myfile.seq >> bogus_id bogus description > ATTATTATTATTATT > tdhoufek at aether:~$ cat myfile.qual >> bogus_id > 10 20 30 10 20 30 10 20 30 10 20 30 10 20 30 > > ------------------------------------------------------------------------------ > -------------------- > > > From cjfields at uiuc.edu Fri May 5 10:24:05 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 5 May 2006 09:24:05 -0500 Subject: [Bioperl-l] Bug in 
genbank parsing: CONTIG gaps In-Reply-To: <2E0D7723-FA6E-4812-8DBB-30FCD11FA85C@gmx.net> Message-ID: <001701c6704f$90dbd090$15327e82@pyrimidine> I'm not sure it's a valid CONTIG file w/o the join(...). This is a chunk from the longer file Michael used as an example here (NW_925173). I believe the CONTIG line is currently handled like a feature so I think it goes through Bio::SeqIO::FTHelper, which is where Michael mentions his bugfix is; I think it's getting beaten up in there somehow. I may see what happens if it's treated like a WGS line (like a Bio::Annotation::SimpleValue object) and just glob the whole mess together as is. Chris ... FEATURES Location/Qualifiers source 1..44976370 /organism="Homo sapiens" /mol_type="genomic DNA" /db_xref="taxon:9606" /chromosome="11" CONTIG join(AADB02014316.1:1..1482320,gap(67),AADB02014317.1:1..577321, gap(441),AADB02014318.1:1..173584,gap(676), AADB02014319.1:1..377558,gap(20), complement(AADB02014320.1:1..431263),gap(20), AADB02014321.1:1..794957,gap(1241),AADB02014322.1:1..1366198, gap(6446),AADB02014323.1:1..3366,gap(20),AADB02014324.1:1..4771, gap(4611),AADB02014325.1:1..383881,gap(20), complement(AADB02014326.1:1..381633),gap(1930), complement(AADB02014327.1:1..460053),gap(20), AADB02014328.1:1..4186,gap(1587), ... > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > Sent: Thursday, May 04, 2006 5:39 PM > To: Chris Fields > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > The two notations are equivalent and syntactically correct, or so I > believe ... I don't think 100% verbatim preservation should be the > goal. Or am I missing the point? > > On May 4, 2006, at 6:27 PM, Chris Fields wrote: > > > Here's another odd bit. 
This is what I get for the CONTIG line when I > > passed a simple contig file (NW_925062, with one join) through > > Bio::SeqIO: > > > > ----------------------------------- > > .... > > FEATURES Location/Qualifiers > > source 1..8541 > > /db_xref="taxon:9606" > > /mol_type="genomic DNA" > > /chromosome="11" > > /organism="Homo sapiens" > > CONTIG AADB02014027.1:1..8541 > > > > // > > ----------------------------------- > > Here's the original: > > ----------------------------------- > > FEATURES Location/Qualifiers > > source 1..8541 > > /organism="Homo sapiens" > > /mol_type="genomic DNA" > > /db_xref="taxon:9606" > > /chromosome="11" > > CONTIG join(AADB02014027.1:1..8541) > > // > > ----------------------------------- > > > > Looks like it lopped out the 'join' here as well. > > > > Chris > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of Chris Fields > >> Sent: Thursday, May 04, 2006 1:41 PM > >> To: bioperl-l at lists.open-bio.org > >> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > >> > >> Are you using the CONTIG record or the full GenBank file? I see > >> problems with both (using bioperl-live) which seem unrelated to one > >> another. > >> The full file seems to be running a bit slow b/c the full GenBank > >> record > >> is > >> huge (~55 MB) but the CONTIG file does exactly what you said (runs > >> out of > >> memory). > >> > >> Chris > >> > >>> -----Original Message----- > >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>> bounces at lists.open-bio.org] On Behalf Of Michael Rogoff > >>> Sent: Tuesday, May 02, 2006 10:32 PM > >>> To: bioperl-l at lists.open-bio.org > >>> Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > >>> > >>> > >>> I've encountered a pretty serious bug in Bio::SeqIO when parsing > >>> certain > >>> genbank > >>> files that contain CONTIG entries with gaps. 
One such record is > >>> NW_925173. > >>> > >>> When I try to parse this file using Bio::SeqIO::genbank, it will > >>> enter > >> an > >>> infinite loop and spin until it runs out of memory. > >>> > >>> I'm pretty certain it relates to this bug: > >>> http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to > >>> indicate > >>> that > >>> genbank records with CONTIG gaps are not valid and can't be > >>> parsed. But > >>> this > >>> bug actually claims to be fixed, which is strange, since looking > >>> at the > >>> code for > >>> FTLocationFactory (where the loop is) it's still right there. I > >>> assume > >>> that > >>> this may be fixed in other contexts but is still not fixed in > >>> Bio::SeqIO::genbank? Or am I doing something wrong? > >>> > >>> I think that this should probably be filed as an open bug. I would > >> think > >>> that > >>> even if bioperl isn't interested in parsing this type of file via > >>> SeqIO, > >>> certainly you'd want to ensure that no finite input file would > >>> send the > >>> parser > >>> into an infinite loop. Have others encountered this problem? Is > >>> there > >>> any plan > >>> to address it? > >>> > >>> Thanks very much for any information or help! > >>> > >>> -Mike > >>> > >>> P.S. I've played around with my version of FTLocationFactory and it > >> seems > >>> to > >>> actually work and parse the gaps. I'm not sure if I've created > >>> other > >> bugs > >>> or if > >>> it works in all cases, but at least the parser doesn't die. I also > >> don't > >>> know > >>> that my hacky code is appropriate for putting back in to BioPerl, > >>> but > >> I'm > >>> happy > >>> to provide it if someone wants to check it out and/or consider it > >>> for > >>> checkin. 
> >>> > >>> > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Fri May 5 10:47:50 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 5 May 2006 10:47:50 -0400 Subject: [Bioperl-l] Bio::Seq::Quality description line problem In-Reply-To: References: Message-ID: <2E1683FE-57E4-4D97-A958-1B529973E89E@gmx.net> He wants the description on the description line, like for the sequence file. Thomas, my guess is the code doesn't print the description to the line although I haven't made sure. Do you want to volunteer and check, add that print statement and post the patch? -hilmar On May 5, 2006, at 10:04 AM, Brian Osborne wrote: > T.D., > > According to the documentation, > http://www.bioperl.org/wiki/Qual_sequence_format, your *qual file > looks > right. What are you trying to create? > > Brian O. > > > On 5/4/06 12:50 PM, "T.D. Houfek" wrote: > >> Using Bioperl 1.5, having trouble with writing FASTA-style quality >> files >> using Bio::Seq::Quality. >> >> I create the Bio::Seq::Quality object, giving its constructor an >> ID, a >> description, a nucleotide sequence, and a quality sequence. 
I then >> write >> the sequence FASTA and the quality FASTA. The description string will >> appear in the header line of the sequence FASTA, but not in the >> header >> line of the quality FASTA. >> >> Can anybody help me figure out how to fix this? I've attached a >> sample >> script and output. >> >> -T.D. >> >> ------------------- sample script follows >> --------------------------------------- >> >> #!/usr/bin/perl >> use strict; >> use Bio::Seq::Quality; >> use Bio::SeqIO; >> >> my $id = "bogus_id"; >> my $desc = "bogus description"; >> my $seq = "ATTATTATTATTATT"; >> my $qual = "10 20 30 10 20 30 10 20 30 10 20 30 10 20 30"; >> >> my $sequal_obj = Bio::Seq::Quality->new( >> -display_id => $id, >> -desc => $desc, >> -seq => $seq, >> -qual => $qual >> ); >> >> my $qualout = Bio::SeqIO->new( >> -file => ">myfile.qual", >> -format => 'qual' >> ); >> my $seqout = Bio::SeqIO->new( >> -file => ">myfile.seq", >> -format => 'Fasta' >> ); >> >> $seqout->write_seq($sequal_obj); >> $qualout->write_seq($sequal_obj); >> >> >> ------------------ sample output follows >> --------------------------------------- >> >> tdhoufek at aether:~$ cat myfile.seq >>> bogus_id bogus description >> ATTATTATTATTATT >> tdhoufek at aether:~$ cat myfile.qual >>> bogus_id >> 10 20 30 10 20 30 10 20 30 10 20 30 10 20 30 >> >> --------------------------------------------------------------------- >> --------- >> -------------------- >> >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From dmessina at wustl.edu Fri May 5 11:24:47 2006 From: dmessina at wustl.edu (David Messina) Date: Fri, 5 May 2006 10:24:47 -0500 Subject: [Bioperl-l] Bio::Seq::Quality description line problem In-Reply-To: 
<445A30E4.6070103@pngg.org> References: <445A30E4.6070103@pngg.org> Message-ID: <5A549C57-A310-4623-BC44-787AC8BFD6C2@wustl.edu>

Apologies if this is a repost -- mail troubles this morning.

Hilmar is correct. From a cursory walk through the code in a debugger, it looks like Bio::SeqIO::qual's write_seq method doesn't read the 'desc' out of the Bio::Seq::Quality object.

I think there should be something like this:

if ($source->can('desc') and my $desc = $source->desc()) {
    $desc =~ s/\n//g;
    $header .= " $desc";   # append only when a description exists; $desc is scoped to this block
}

before line 218 in Bio::SeqIO::qual (where the header is printed):

$self->_print (">$header \n");

Dave

From dmessina at wustl.edu Fri May 5 10:53:15 2006
From: dmessina at wustl.edu (David Messina)
Date: Fri, 5 May 2006 09:53:15 -0500
Subject: [Bioperl-l] Bio::Seq::Quality description line problem
In-Reply-To: <445A30E4.6070103@pngg.org>
References: <445A30E4.6070103@pngg.org>
Message-ID:

T.D.,

From a cursory walk through your code in a debugger, it looks like Bio::SeqIO::qual's write_seq method doesn't read the 'desc' out of the Bio::Seq::Quality object.

I think there should be something like this:

if ($source->can('desc') and my $desc = $source->desc()) {
    $desc =~ s/\n//g;
    $header .= " $desc";   # append only when a description exists; $desc is scoped to this block
}

before line 218 in Bio::SeqIO::qual (where the header is printed):

$self->_print (">$header \n");

Dave

From hubert.prielinger at gmx.at Fri May 5 14:30:24 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 05 May 2006 12:30:24 -0600
Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore
In-Reply-To: <445AD742.4070408@infotech.monash.edu.au>
References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au> <445A94F8.9000903@gmx.at> <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu> <445AB2A4.7020405@gmx.at> <445AD742.4070408@infotech.monash.edu.au>
Message-ID: <445B99C0.6050407@gmx.at>

hi,
I have done, as you suggested and I got the error message:

Can't call method "next_result" on an undefined value at....

then I looked up at the internet and found a thread which suggested to use strict and then the problem is solved....
but I'm already using use strict..

thanks

Torsten Seemann wrote:
> Hubert Prielinger wrote:
>
>> if I do so it returns:
>> 0 undef
>>
>
> That means the value of $search was undef.
> That means that it could not parse or open the BLAST report.
> I repeat the line that I put in my earlier email which you ignored.
>
> # your line
> my $search = Bio::SearchIO->new( ..... );
>
> # then check if it was successful!
> die "could not open blast report" if not defined $search; > > --Torsten > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From cjfields at uiuc.edu Fri May 5 15:18:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 5 May 2006 14:18:16 -0500 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <445B99C0.6050407@gmx.at> Message-ID: <000001c67078$a9a7ca10$15327e82@pyrimidine> What happens if you add the verbose flag? my $search = new Bio::SearchIO (-verbose => 1, -format => 'blast', -file => $file); Added thought : you might want to look at File::Find for stepping through your files and performing a task on each one, such as parsing output. It changes into the working directory each time; you should be able to do something like this: use File::Find; use Bio::SearchIO; Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger > Sent: Friday, May 05, 2006 1:30 PM > To: Torsten Seemann; bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore > > hi, > I have done, as you suggested and I got the error message: > > Can't call method "next_result" on an undefined value at.... > > then I looked up at the internet and found a thread which suggested to > use strict and then the problem is solved.... > but I'm already using use strict.. > > thanks > > Torsten Seemann wrote: > > Hubert Prielinger wrote: > > > >> if I do so it returns: > >> 0 undef > >> > > > > That means the value of $search was undef. > > That means that it could not parse or open the BLAST report. > > I repeat the line that I put in my earlier email which you ignored. > > > > # your line > > my $search = Bio::SearchIO->new( ..... ); > > > > # then check if it was successful! 
> > die "could not open blast report" if not defined $search;
> >
> > --Torsten
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> >
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at uiuc.edu Fri May 5 15:27:12 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 14:27:12 -0500
Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore
In-Reply-To: <445B99C0.6050407@gmx.at>
Message-ID: <000101c67079$e8c86a00$15327e82@pyrimidine>

Sorry, mail got sent before I finished it! Here I go again...

What happens if you add the verbose flag?

my $search = new Bio::SearchIO (-verbose => 1,
                                -format  => 'blast',
                                -file    => $file);

Added thought : you might want to look at File::Find for stepping through your files and performing a task on each one, such as parsing output. It changes into the working directory each time; you should be able to do something like this:

use File::Find;
use Bio::SearchIO;

my @dirlist = ("/home/Hubert/test");

find (\&dir, @dirlist);

sub printdir {
    return unless /txt$/;
    return if (-d);
    my $parser = Bio::SearchIO->new(-file   => $_,
                                    -format => 'blast');
    while (my $result = $parser->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                # do stuff here
            }
        }
    }
}

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> Sent: Friday, May 05, 2006 1:30 PM
> To: Torsten Seemann; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore
>
> hi,
> I have done, as you suggested and I got the error message:
>
> Can't call method "next_result" on an undefined value at....
> > then I looked up at the internet and found a thread which suggested to > use strict and then the problem is solved.... > but I'm already using use strict.. > > thanks > > Torsten Seemann wrote: > > Hubert Prielinger wrote: > > > >> if I do so it returns: > >> 0 undef > >> > > > > That means the value of $search was undef. > > That means that it could not parse or open the BLAST report. > > I repeat the line that I put in my earlier email which you ignored. > > > > # your line > > my $search = Bio::SearchIO->new( ..... ); > > > > # then check if it was successful! > > die "could not open blast report" if not defined $search; > > > > --Torsten > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From barry.moore at genetics.utah.edu Fri May 5 15:39:37 2006 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Fri, 5 May 2006 13:39:37 -0600 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <445B99C0.6050407@gmx.at> References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au> <445A94F8.9000903@gmx.at> <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu> <445AB2A4.7020405@gmx.at> <445AD742.4070408@infotech.monash.edu.au> <445B99C0.6050407@gmx.at> Message-ID: <7F3D73A6-392E-4728-ACB9-FD3BEDFD3C18@genetics.utah.edu> Hubert- If you want to send me your script and input file I'll try to have a look at it. Barry On May 5, 2006, at 12:30 PM, Hubert Prielinger wrote: > hi, > I have done, as you suggested and I got the error message: > > Can't call method "next_result" on an undefined value at.... > > then I looked up at the internet and found a thread which suggested to > use strict and then the problem is solved.... 
> but I'm already using use strict..
>
> thanks
>
> Torsten Seemann wrote:
>> Hubert Prielinger wrote:
>>
>>> if I do so it returns:
>>> 0 undef
>>>
>>
>> That means the value of $search was undef.
>> That means that it could not parse or open the BLAST report.
>> I repeat the line that I put in my earlier email which you ignored.
>>
>> # your line
>> my $search = Bio::SearchIO->new( ..... );
>>
>> # then check if it was successful!
>> die "could not open blast report" if not defined $search;
>>
>> --Torsten
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at uiuc.edu Fri May 5 16:07:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 15:07:53 -0500
Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore
In-Reply-To: <000101c67079$e8c86a00$15327e82@pyrimidine>
Message-ID: <000201c6707f$97aaaba0$15327e82@pyrimidine>

Oops! This is what happens when I copy and paste in a hurry.

> use File::Find;
> use Bio::SearchIO;
>
> my @dirlist = ("/home/Hubert/test");
>
> find (\&dir, @dirlist);
>
> sub printdir {
  ^^^^^^^^^^^
Should be:

sub dir {

>     return unless /txt$/;
>     return if (-d);
>     my $parser = Bio::SearchIO->new(-file => $_,
>                                     -format => 'blast');
>     while (my $result = $parser->next_result) {
>         while (my $hit = $result->next_hit) {
>             while (my $hsp = $hit->next_hsp) {
>                 # do stuff here
>             }
>         }
>     }
> }

Hubert, if the file you are parsing looks fine (i.e. valid BLAST output), post it and your script on Bugzilla and let us take a look.
Leave out your password though ; > Chris From golharam at umdnj.edu Fri May 5 15:58:03 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Fri, 05 May 2006 15:58:03 -0400 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <000001c67078$a9a7ca10$15327e82@pyrimidine> Message-ID: <02f101c6707e$39a03a30$2f01a8c0@GOLHARMOBILE1> I'm not sure how applicable this is, but I've seen a problem with Perl if the LANG environment variable contain UTF8 (ex LANG=en_US.UTF8). I've changed mine to en_US and lots of perl string parsing problems went away. Also, what about running the bioperl tests on your installation (make test). What happens? -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields Sent: Friday, May 05, 2006 3:18 PM To: 'Hubert Prielinger'; 'Torsten Seemann'; bioperl-l at bioperl.org Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore What happens if you add the verbose flag? my $search = new Bio::SearchIO (-verbose => 1, -format => 'blast', -file => $file); Added thought : you might want to look at File::Find for stepping through your files and performing a task on each one, such as parsing output. It changes into the working directory each time; you should be able to do something like this: use File::Find; use Bio::SearchIO; Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger > Sent: Friday, May 05, 2006 1:30 PM > To: Torsten Seemann; bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore > > hi, > I have done, as you suggested and I got the error message: > > Can't call method "next_result" on an undefined value at.... > > then I looked up at the internet and found a thread which suggested to > use strict and then the problem is solved.... but I'm already using > use strict.. 
> > thanks > > Torsten Seemann wrote: > > Hubert Prielinger wrote: > > > >> if I do so it returns: > >> 0 undef > >> > > > > That means the value of $search was undef. > > That means that it could not parse or open the BLAST report. I > > repeat the line that I put in my earlier email which you ignored. > > > > # your line > > my $search = Bio::SearchIO->new( ..... ); > > > > # then check if it was successful! > > die "could not open blast report" if not defined $search; > > > > --Torsten > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri May 5 17:56:29 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 5 May 2006 16:56:29 -0500 Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps In-Reply-To: <001701c6704f$90dbd090$15327e82@pyrimidine> Message-ID: <000901c6708e$c77442b0$15327e82@pyrimidine> Okay, I have changed the way the CONTIG line is handled in Bio::SeqIO::genbank. It was handling it as a feature; I just changed it over to handling it as a Bio::Annotation::SimpleValue object with the value being the entire contig section. It seems to pass tests fine but I'm operating off Windows and my wife's IBook went to the great desktop in the sky (motherboard), so I can't test it there. Pulling the file off using Bio::DB::GenBank (using the no-redirect flag) works w/o crashing out. 
Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Chris Fields > Sent: Friday, May 05, 2006 9:24 AM > To: 'Hilmar Lapp' > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > I'm not sure it's a valid CONTIG file w/o the join(...). This is a chunk > from the longer file Michael used as an example here (NW_925173). I > believe > the CONTIG line is currently handled like a feature so I think it goes > through Bio::SeqIO::FTHelper, which is where Michael mentions his bugfix > is; > I think it's getting beaten up in there somehow. I may see what happens if > it's treated like a WGS line (like a Bio::Annotation::SimpleValue object) > and just glob the whole mess together as is. > > > Chris > > ... > FEATURES Location/Qualifiers > source 1..44976370 > /organism="Homo sapiens" > /mol_type="genomic DNA" > /db_xref="taxon:9606" > /chromosome="11" > CONTIG > join(AADB02014316.1:1..1482320,gap(67),AADB02014317.1:1..577321, > gap(441),AADB02014318.1:1..173584,gap(676), > AADB02014319.1:1..377558,gap(20), > complement(AADB02014320.1:1..431263),gap(20), > AADB02014321.1:1..794957,gap(1241),AADB02014322.1:1..1366198, > > gap(6446),AADB02014323.1:1..3366,gap(20),AADB02014324.1:1..4771, > gap(4611),AADB02014325.1:1..383881,gap(20), > complement(AADB02014326.1:1..381633),gap(1930), > complement(AADB02014327.1:1..460053),gap(20), > AADB02014328.1:1..4186,gap(1587), > ... > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > > Sent: Thursday, May 04, 2006 5:39 PM > > To: Chris Fields > > Cc: bioperl-l at lists.open-bio.org > > Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > > > The two notations are equivalent and syntactically correct, or so I > > believe ... 
I don't think 100% verbatim preservation should be the > > goal. Or am I missing the point? > > > > On May 4, 2006, at 6:27 PM, Chris Fields wrote: > > > > > Here's another odd bit. This is what I get for the CONTIG line when I > > > passed a simple contig file (NW_925062, with one join) through > > > Bio::SeqIO: > > > > > > ----------------------------------- > > > .... > > > FEATURES Location/Qualifiers > > > source 1..8541 > > > /db_xref="taxon:9606" > > > /mol_type="genomic DNA" > > > /chromosome="11" > > > /organism="Homo sapiens" > > > CONTIG AADB02014027.1:1..8541 > > > > > > // > > > ----------------------------------- > > > Here's the original: > > > ----------------------------------- > > > FEATURES Location/Qualifiers > > > source 1..8541 > > > /organism="Homo sapiens" > > > /mol_type="genomic DNA" > > > /db_xref="taxon:9606" > > > /chromosome="11" > > > CONTIG join(AADB02014027.1:1..8541) > > > // > > > ----------------------------------- > > > > > > Looks like it lopped out the 'join' here as well. > > > > > > Chris > > > > > >> -----Original Message----- > > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >> bounces at lists.open-bio.org] On Behalf Of Chris Fields > > >> Sent: Thursday, May 04, 2006 1:41 PM > > >> To: bioperl-l at lists.open-bio.org > > >> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > >> > > >> Are you using the CONTIG record or the full GenBank file? I see > > >> problems with both (using bioperl-live) which seem unrelated to one > > >> another. > > >> The full file seems to be running a bit slow b/c the full GenBank > > >> record > > >> is > > >> huge (~55 MB) but the CONTIG file does exactly what you said (runs > > >> out of > > >> memory). 
> > >> > > >> Chris > > >> > > >>> -----Original Message----- > > >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >>> bounces at lists.open-bio.org] On Behalf Of Michael Rogoff > > >>> Sent: Tuesday, May 02, 2006 10:32 PM > > >>> To: bioperl-l at lists.open-bio.org > > >>> Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > >>> > > >>> > > >>> I've encountered a pretty serious bug in Bio::SeqIO when parsing > > >>> certain > > >>> genbank > > >>> files that contain CONTIG entries with gaps. One such record is > > >>> NW_925173. > > >>> > > >>> When I try to parse this file using Bio::SeqIO::genbank, it will > > >>> enter > > >> an > > >>> infinite loop and spin until it runs out of memory. > > >>> > > >>> I'm pretty certain it relates to this bug: > > >>> http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to > > >>> indicate > > >>> that > > >>> genbank records with CONTIG gaps are not valid and can't be > > >>> parsed. But > > >>> this > > >>> bug actually claims to be fixed, which is strange, since looking > > >>> at the > > >>> code for > > >>> FTLocationFactory (where the loop is) it's still right there. I > > >>> assume > > >>> that > > >>> this may be fixed in other contexts but is still not fixed in > > >>> Bio::SeqIO::genbank? Or am I doing something wrong? > > >>> > > >>> I think that this should probably be filed as an open bug. I would > > >> think > > >>> that > > >>> even if bioperl isn't interested in parsing this type of file via > > >>> SeqIO, > > >>> certainly you'd want to ensure that no finite input file would > > >>> send the > > >>> parser > > >>> into an infinite loop. Have others encountered this problem? Is > > >>> there > > >>> any plan > > >>> to address it? > > >>> > > >>> Thanks very much for any information or help! > > >>> > > >>> -Mike > > >>> > > >>> P.S. I've played around with my version of FTLocationFactory and it > > >> seems > > >>> to > > >>> actually work and parse the gaps. 
I'm not sure if I've created > > >>> other > > >> bugs > > >>> or if > > >>> it works in all cases, but at least the parser doesn't die. I also > > >> don't > > >>> know > > >>> that my hacky code is appropriate for putting back in to BioPerl, > > >>> but > > >> I'm > > >>> happy > > >>> to provide it if someone wants to check it out and/or consider it > > >>> for > > >>> checkin. > > >>> > > >>> > > >>> > > >>> _______________________________________________ > > >>> Bioperl-l mailing list > > >>> Bioperl-l at lists.open-bio.org > > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > >> > > >> _______________________________________________ > > >> Bioperl-l mailing list > > >> Bioperl-l at lists.open-bio.org > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > -- > > =========================================================== > > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > > =========================================================== > > > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hubert.prielinger at gmx.at Fri May 5 19:54:55 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Fri, 05 May 2006 17:54:55 -0600 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <02f101c6707e$39a03a30$2f01a8c0@GOLHARMOBILE1> References: <02f101c6707e$39a03a30$2f01a8c0@GOLHARMOBILE1> Message-ID: <445BE5CF.2000007@gmx.at> hi ryan, nothing happend if I add the verbose flag and how can I test my bioperl 
installation..... Ryan Golhar wrote: > I'm not sure how applicable this is, but I've seen a problem with Perl > if the LANG environment variable contain UTF8 (ex LANG=en_US.UTF8). > I've changed mine to en_US and lots of perl string parsing problems went > away. > > Also, what about running the bioperl tests on your installation (make > test). What happens? > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields > Sent: Friday, May 05, 2006 3:18 PM > To: 'Hubert Prielinger'; 'Torsten Seemann'; bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore > > > What happens if you add the verbose flag? > > my $search = new Bio::SearchIO (-verbose => 1, > -format => 'blast', > -file => $file); > > Added thought : you might want to look at File::Find for stepping > through your files and performing a task on each one, such as parsing > output. It changes into the working directory each time; you should be > able to do something like this: > > use File::Find; > use Bio::SearchIO; > > > > > Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger >> Sent: Friday, May 05, 2006 1:30 PM >> To: Torsten Seemann; bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore >> >> hi, >> I have done, as you suggested and I got the error message: >> >> Can't call method "next_result" on an undefined value at.... >> >> then I looked up at the internet and found a thread which suggested to >> > > >> use strict and then the problem is solved.... but I'm already using >> use strict.. >> >> thanks >> >> Torsten Seemann wrote: >> >>> Hubert Prielinger wrote: >>> >>> >>>> if I do so it returns: >>>> 0 undef >>>> >>>> >>> That means the value of $search was undef. 
>>> That means that it could not parse or open the BLAST report. I >>> repeat the line that I put in my earlier email which you ignored. >>> >>> # your line >>> my $search = Bio::SearchIO->new( ..... ); >>> >>> # then check if it was successful! >>> die "could not open blast report" if not defined $search; >>> >>> --Torsten >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From hubert.prielinger at gmx.at Fri May 5 20:01:11 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Fri, 05 May 2006 18:01:11 -0600 Subject: [Bioperl-l] [BULK] can't parse blast file anymore Message-ID: <445BE747.5020202@gmx.at> hi I have posted my script and the blast file to bugzilla...... From hubert.prielinger at gmx.at Fri May 5 21:21:33 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Fri, 05 May 2006 19:21:33 -0600 Subject: [Bioperl-l] [BULK] can't parse blast file anymore In-Reply-To: <445BE747.5020202@gmx.at> References: <445BE747.5020202@gmx.at> Message-ID: <445BFA1D.5060008@gmx.at> they bugzilla posting didn't work, what is the exact email address for bugzilla Hubert Prielinger wrote: > hi > I have posted my script and the blast file to bugzilla...... 

From cjfields at uiuc.edu Fri May 5 21:38:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 20:38:47 -0500
Subject: [Bioperl-l] [BULK] can't parse blast file anymore
In-Reply-To: <445BFA1D.5060008@gmx.at>
Message-ID: <000d01c670ad$d209f980$15327e82@pyrimidine>

Hubert,

Calm down. Breathe in, breathe out. Relax.......

Okay, here is the place to start. Read the instructions there first.

http://www.bioperl.org/wiki/Bugs

Bugs are reported at this site:

http://bugzilla.bioperl.org/

Again, follow the instructions. You will have to create a user name and
password to submit. Once that is set up, click the "Submit a new bug" link
on the main bugzilla page. On that page, fill out all information first and
a description of the error and hit 'commit'. Add the BLAST report and some
sample script by clicking on the "Create a New Attachment" link (you'll have
to do this for each file). Once you go back to the bug page you should see
two attachments and the bug report. Any commits get sent through the
bioperl-guts-l mail list which most developers subscribe to, so they'll know
there's a new bug out there.

I will not be able to get to it personally; our home computer died a slow
painful death today (RIP 2002-2006) but I can get to it next week. If you
post the bug, somebody might be able to get to it sooner!

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> Sent: Friday, May 05, 2006 8:22 PM
> To: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] [BULK] can't parse blast file anymore
>
> they bugzilla posting didn't work, what is the exact email address for
> bugzilla
>
> Hubert Prielinger wrote:
> > hi
> > I have posted my script and the blast file to bugzilla......

From cjfields at uiuc.edu Fri May 5 22:26:35 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 21:26:35 -0500
Subject: [Bioperl-l] Changes to NCBIHelper (RE: CONTIG, genome files)
Message-ID: <000f01c670b4$7f22f760$15327e82@pyrimidine>

I committed a change to NCBIHelper that permits the downloading of CON
(contig) files and corrects an issue where no sequence features were saved
when rebuilding those files. If you use Bio::DB::GenBank regularly to
download genome files, this likely will NOT affect your code unless you
explicitly set the format type to 'genbank', like so:

    $factory = Bio::DB::GenBank->new(-format => 'gb'); # or 'genbank'

I believe most will not have that setting since the default was already
'gb'. Now, the default is 'gbwithparts', which returns the full sequence
regardless. If it is a file with a CONTIG line, the sequence is built on
NCBI's end (and will include seq features if they are present). As Brian
said, we'll let NCBI do the work for us!

If you need the actual file w/o sequence, then you can set the format to
'genbank' (like above) and it will grab it for you.

There was an unrelated problem with CONTIG line parsing that I also fixed,
where I changed the format over to a Bio::Annotation::SimpleValue as a
workaround for now; for some reason some CON files were misparsed and
resulted in infinite loops or missing 'join' statements.

Chris

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept.
of Biochemistry University of Illinois Urbana-Champaign From hubert.prielinger at gmx.at Sat May 6 18:22:05 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Sat, 06 May 2006 16:22:05 -0600 Subject: [Bioperl-l] [BULK] can't parse blast file anymore In-Reply-To: <000d01c670ad$d209f980$15327e82@pyrimidine> References: <000d01c670ad$d209f980$15327e82@pyrimidine> Message-ID: <445D218D.2030504@gmx.at> ok, thanks I have submitted the bug bug #1994 Chris Fields wrote: > Hubert, > > Calm down. Breathe in, breath out. Relax....... > > Okay, here is the place to start. Read the instructions there first. > > http://www.bioperl.org/wiki/Bugs > > Bugs are reported at this site: > > http://bugzilla.bioperl.org/ > > Again, follow the instructions. You will have to create a user name and > password to submit. Once that is set up, click the "Submit a new bug" link > on the main bugzilla page. On that page, fill out all information first and > a description of the error and hit 'commit'. Add the BLAST report and some > sample script by clicking on the "Create a New Attachment" link (you'll have > to do this for each file). Once you go back to the bug page you should see > two attachments and the bug report. Any commits get sent through the > bioperl-guts-l mail list which most developers subscribe to, so they'll know > there's a new bug out there. > > I will not be able to get to it personally; our home computer died a slow > painful death today (RIP 2002-2006) but I can get to it next week. If you > post the bug, somebody might be able to get to it sooner! 
> > Chris > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger >> Sent: Friday, May 05, 2006 8:22 PM >> To: bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] [BULK] can't parse blast file anymore >> >> they bugzilla posting didn't work, what is the exact email address for >> bugzilla >> >> Hubert Prielinger wrote: >> >>> hi >>> I have posted my script and the blast file to bugzilla...... >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From torsten.seemann at infotech.monash.edu.au Sat May 6 20:57:14 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Sun, 07 May 2006 10:57:14 +1000 Subject: [Bioperl-l] [BULK] can't parse blast file anymore In-Reply-To: <445D218D.2030504@gmx.at> References: <000d01c670ad$d209f980$15327e82@pyrimidine> <445D218D.2030504@gmx.at> Message-ID: <445D45EA.8020804@infotech.monash.edu.au> Hubert Prielinger wrote: > ok, thanks > I have submitted the bug > bug #1994 This is a line from the script you sent to Bugzilla: my $search = new Bio::SearchIO ( -verbose => 1,-format => 'blast', -file => $file) or die "could not open blast report" if not defined my $search; Althoygh syntactically correct, I don't think it is doing what you want. 
Please change it to this:

my $search = new Bio::SearchIO(-format => 'blast', -file => $file)
    or die "could not open blast report";

or alternatively, this:

my $search = new Bio::SearchIO(-format => 'blast', -file => $file);
if (not defined $search) {
    die "could not open blast report";
}

and let us know what happens.

All the example output you have supplied still suggests that Bio::SearchIO
can not load or parse your blast report.

--
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia

From mamillerpa at yahoo.com Sat May 6 19:07:30 2006
From: mamillerpa at yahoo.com (Mark A. Miller)
Date: Sat, 6 May 2006 16:07:30 -0700 (PDT)
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines
In-Reply-To:
Message-ID: <20060506230730.56480.qmail@web50410.mail.yahoo.com>

Thanks for your responses, Jason and Brian.

Brian, your suggestion works great.

I had really hoped that by parsing the OS line as well, I could be sure I
wasn't missing any sequences from my organisms. Well, I gave up on that and
just obtained the NCBI taxonomy values. I find it pretty easy to work with
them in bioperl.

Unfortunately, walking through all of Trembl takes a while, and I'm getting
this error:

Can't call method "ncbi_taxid" on an undefined value at ./ga2.pl line 55, line 3253682.

when I try to extract annotations, etc., from entries like:

DHE4_UNKP

with:

my $species_object = $seq->species;
my $taxid_string = $species_object->ncbi_taxid;

I guess I have to write an error handler for incomplete taxonomy values.

Bye for now,
Mark

--- Brian Osborne wrote:

> Mark,
>
> The RC line is part of the description of a reference, I'm guessing 'RC'
> stands for Reference Comment.
In order to get the attributes of a > reference > you'll first do something like: > > my $anno_collection = $seq->annotation; > my @references = $anno_collection->get_Annotations('reference'); > > To get the comment field for a specific reference you can do: > > $references[0]->comment; > > See the Feature-Annotation HOWTO for more information on Annotations, > the > Reference object is a kind of Annotation object. > > Brian O. > > > On 5/3/06 3:34 PM, "Mark A. Miller" wrote: > > > Yeah. Do you have any experience with that? > > > > Mark > > > > --- Brian Osborne wrote: > > > >> Mark, > >> > >> So you're trying to get the information in the RC line from a > >> Swissprot > >> format file? > >> > >> Brian O. > > > > > > --- --- --- --- --- --- --- --- > > > > Mark A. Miller > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! Mail has the best spam protection around > > http://mail.yahoo.com > > > --- --- --- --- --- --- --- --- Mark A. Miller __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From cjfields at uiuc.edu Sat May 6 23:33:40 2006 From: cjfields at uiuc.edu (Christopher Fields) Date: Sat, 6 May 2006 22:33:40 -0500 Subject: [Bioperl-l] [BULK] can't parse blast file anymore Message-ID: <65109dc1.b47d779e.81acb00@expms6.cites.uiuc.edu> The -verbose flag was my suggestion; it should output a ton of debugging info from SearchIO::blast; if you see anything there, then it means that it's at least attempting to parse the report. Of course I can't test this myself at the moment since my wife's computer died (along with the bioperl setup); I'm using a loaner computer at the moment. 

Chris

---- Original message ----
>Date: Sun, 07 May 2006 10:57:14 +1000
>From: Torsten Seemann
>Subject: Re: [Bioperl-l] [BULK] can't parse blast file anymore
>To: Hubert Prielinger
>Cc: bioperl-l at bioperl.org
>
>Hubert Prielinger wrote:
>> ok, thanks
>> I have submitted the bug
>> bug #1994
>
>This is a line from the script you sent to Bugzilla:
>
>my $search = new Bio::SearchIO (
>-verbose => 1,-format => 'blast', -file => $file)
>or die "could not open blast report" if not defined my $search;
>
>Although syntactically correct, I don't think it is doing what you want.
>Please change it to this:
>
>my $search = new Bio::SearchIO(-format => 'blast', -file => $file) or die
>"could not open blast report";
>
>or alternatively, this:
>
>my $search = new Bio::SearchIO(-format => 'blast', -file => $file);
>if (not defined $search) {
>    die "could not open blast report";
>}
>
>and let us know what happens.
>
>all the example output you have supplied still suggests that Bio::SearchIO
>can not load or parse your blast report.
>
>--
>Torsten Seemann
>Victorian Bioinformatics Consortium, Monash University, Australia

From chen_li3 at yahoo.com Sun May 7 03:34:55 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sun, 7 May 2006 00:34:55 -0700 (PDT)
Subject: [Bioperl-l] primer parameters using primer3
Message-ID: <20060507073455.11849.qmail@web36815.mail.mud.yahoo.com>

Hi all,

I use Bio::Tools::Run::Primer3 to design PCR primers. I want to change some
default values, for example, to increase the PCR product size to 490-510 bp
instead of using the default value of 100-300 bp. What should I do?

Thanks,

Li

From jason.stajich at duke.edu Sun May 7 16:49:29 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun, 7 May 2006 16:49:29 -0400
Subject: [Bioperl-l] [BULK] can't parse blast file anymore
In-Reply-To: <65109dc1.b47d779e.81acb00@expms6.cites.uiuc.edu>
References: <65109dc1.b47d779e.81acb00@expms6.cites.uiuc.edu>
Message-ID:

The problem is in how SearchIO was being initialized. The code basically
looked like this:

my $x = new Foo() or die if not defined my $x;

which is invalid for two reasons.

1) if not defined my $x;
   The second "my $x" declares a brand-new $x, which will ALWAYS be
   undefined, so the test tells you nothing about the object you just built.

2) my $x = new Foo() or die ;
   Will cast the new object as a boolean.

Whenever things aren't working, take a look at the code and try and walk
through any shortcuts.

For clarity make it a two-step process:

my $x = new Foo();
die "no valid $x" unless defined $x;

Please note that currently BioPerl WILL die (via throw) if you try and ask
for an invalid file when you initialize a new IO object -- this is handled
by code in Bio::Root::IO (line 313 in Bio/Root/IO.pm) which all the IO
objects use, so you don't really need to do a test on the object after all.

--jason

On May 6, 2006, at 11:33 PM, Christopher Fields wrote:

> The -verbose flag was my suggestion; it should output a ton of debugging
> info from SearchIO::blast; if you see anything there, then it means that
> it's at least attempting to parse the report.
>
> Of course I can't test this myself at the moment since my wife's computer
> died (along with the bioperl setup); I'm using a loaner computer at the
> moment.
> > Chris > > ---- Original message ---- >> Date: Sun, 07 May 2006 10:57:14 +1000 >> From: Torsten Seemann >> Subject: Re: [Bioperl-l] [BULK] can't parse blast file anymore >> To: Hubert Prielinger >> Cc: bioperl-l at bioperl.org >> >> Hubert Prielinger wrote: >>> ok, thanks >>> I have submitted the bug >>> bug #1994 >> >> This is a line from the script you sent to Bugzilla: >> >> my $search = new Bio::SearchIO ( >> -verbose => 1,-format => 'blast', -file => $file) >> or die "could not open blast report" if not defined my $search; >> >> Althoygh syntactically correct, I don't think it is doing what you >> want. >> Please change it to this: >> >> my $search = new Bio::SearchIO(-format => 'blast', -file => $file) >> or die >> "could not open blast report"; >> >> or alternatively, this: >> >> my $search = new Bio::SearchIO(-format => 'blast', -file => $file); >> if (not defined $search) { >> die "could not open blast report"; >> } >> >> and let us know what happens. >> >> all the example output you have supplied still suggests that >> Bio::SearchIO can >> not load or parse your blast report. 
>> >> -- >> Torsten Seemann >> Victorian Bioinformatics Consortium, Monash University, Australia >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From jason.stajich at duke.edu Sun May 7 17:01:29 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Sun, 7 May 2006 17:01:29 -0400 Subject: [Bioperl-l] primer parameters using primer3 In-Reply-To: <20060507073455.11849.qmail@web36815.mail.mud.yahoo.com> References: <20060507073455.11849.qmail@web36815.mail.mud.yahoo.com> Message-ID: I put up some info on the wiki (and I encourage other people to do the same!) http://bioperl.org/wiki/Module:Bio::Tools::Run::Primer3 Set the command line parameters by just calling a function of the name of the parameter. To get a list of the available options, this perl code will report it to you: # what are the arguments, and what do they mean? my $args = $primer3->arguments; print "ARGUMENT\tMEANING\n"; foreach my $key (keys %{$args}) {print "$key\t", $$args{$key}, "\n"} The info for PRODUCT_SIZE_RANGE is: (size range list, default 100-300) space separated list of product sizes eg - - I believe you can set the PCR product size with $primer3->primer_product_size_range("490-510"); -jason On May 7, 2006, at 3:34 AM, chen li wrote: > Hi all, > > I use Bio::Tools::Run::Primer3 to design PCR primers. > I want to change some default values, for example, to > increase the PCR product size to 490-510 bp instead of > using the default value of 100-300 bp. What should I > do ? > > > Thanks, > > Li > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! 
Mail has the best spam protection around > http://mail.yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From chen_li3 at yahoo.com Sun May 7 21:18:17 2006 From: chen_li3 at yahoo.com (chen li) Date: Sun, 7 May 2006 18:18:17 -0700 (PDT) Subject: [Bioperl-l] primer parameters using primer3 In-Reply-To: Message-ID: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com> Hi Jason, I add the line code $primer3->primer_product_size_range("490-510"); to my script. But it doesn't work nor primer3 complains it. Li --- Jason Stajich wrote: > I put up some info on the wiki (and I encourage > other people to do > the same!) > http://bioperl.org/wiki/Module:Bio::Tools::Run::Primer3 > > Set the command line parameters by just calling a > function of the > name of the parameter. To get a list of the > available options, this > perl code will report it to you: > > # what are the arguments, and what do they mean? > my $args = $primer3->arguments; > > print "ARGUMENT\tMEANING\n"; > foreach my $key (keys %{$args}) {print "$key\t", > $$args{$key}, "\n"} > > The info for PRODUCT_SIZE_RANGE is: > (size range list, default 100-300) space > separated list of product > sizes eg - - > > I believe you can set the PCR product size with > $primer3->primer_product_size_range("490-510"); > > -jason > On May 7, 2006, at 3:34 AM, chen li wrote: > > > Hi all, > > > > I use Bio::Tools::Run::Primer3 to design PCR > primers. > > I want to change some default values, for example, > to > > increase the PCR product size to 490-510 bp > instead of > > using the default value of 100-300 bp. What should > I > > do ? > > > > > > Thanks, > > > > Li > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! 

From hubert.prielinger at gmx.at Sun May 7 21:41:14 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Sun, 07 May 2006 19:41:14 -0600
Subject: [Bioperl-l] [BULK] can't parse blast file anymore
In-Reply-To: <445D45EA.8020804@infotech.monash.edu.au>
References: <000d01c670ad$d209f980$15327e82@pyrimidine> <445D218D.2030504@gmx.at> <445D45EA.8020804@infotech.monash.edu.au>
Message-ID: <445EA1BA.9050301@gmx.at>

hi,
I have corrected that, and now I finally get a few error messages:

blast.pm: unrecognized line Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer,
blast.pm: unrecognized line Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman
blast.pm: unrecognized line (1997), "Gapped BLAST and PSI-BLAST: a new generation of
blast.pm: unrecognized line protein database search programs", Nucleic Acids Res. 25:3389-3402.
blast.pm: unrecognized line RID: 1137529800-24476-151611170370.BLASTQ1

After that line it stops without terminating.

Torsten Seemann wrote:
> Hubert Prielinger wrote:
>> ok, thanks
>> I have submitted the bug
>> bug #1994
>
> This is a line from the script you sent to Bugzilla:
>
> my $search = new Bio::SearchIO (
>     -verbose => 1,-format => 'blast', -file => $file)
>     or die "could not open blast report" if not defined my $search;
>
> Although syntactically correct, I don't think it is doing what you want.
> Please change it to this:
>
> my $search = new Bio::SearchIO(-format => 'blast', -file => $file) or
>     die "could not open blast report";
>
> or alternatively, this:
>
> my $search = new Bio::SearchIO(-format => 'blast', -file => $file);
> if (not defined $search) {
>     die "could not open blast report";
> }
>
> and let us know what happens.
>
> All the example output you have supplied still suggests that
> Bio::SearchIO cannot load or parse your blast report.

From cjfields at uiuc.edu Sun May 7 22:04:13 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Sun, 7 May 2006 21:04:13 -0500
Subject: [Bioperl-l] [BULK] can't parse blast file anymore
Message-ID: <42d52830.b4f91bfc.81e4600@expms6.cites.uiuc.edu>

These are debugging lines (not errors); you still have the -verbose
flag set.

Did you follow Jason's advice? I believe he's right on the money about
the issue at hand...

Chris

From jason.stajich at duke.edu Sun May 7 22:47:00 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun, 7 May 2006 22:47:00 -0400
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
References: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
Message-ID: <430DE892-8EE8-4FC9-8BAC-7D344C876B72@duke.edu>

I'm not really familiar with the module beyond what the documentation
says, so did you try using the add_targets method to add arguments
instead? I had thought the AUTOLOAD method took care of access to the
command-line arguments, as it does for the other Run modules, but I am
not really sure. Perhaps folks on the list who use this module can
provide better advice.

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From osborne1 at optonline.net Mon May 8 10:49:22 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 08 May 2006 10:49:22 -0400
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
Message-ID:

Li,

Read the documentation, Bio::Tools::Run::Primer3. It shows examples of
the correct syntax. Also look at bioperl-run/t/Primer3.t.

Brian O.

From roy at colibase.bham.ac.uk Mon May 8 07:12:49 2006
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Mon, 08 May 2006 12:12:49 +0100
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
References: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
Message-ID: <445F27B1.40501@colibase.bham.ac.uk>

Hi Li,

I think the syntax you need is:

  $primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');

I guess you may also need to change the parameter
PRIMER_PRODUCT_OPT_SIZE.

Incidentally, such a restricted product size range may mean that
Primer3 is unable to design any suitable primers. If I recall
correctly, this doesn't cause an error; you just get a
Bio::Tools::Primer3 object with no primers in it. I have had some
success with testing for this and, if necessary, relaxing some
constraints on primer design and re-running Primer3.

Hope this helps.
Roy.

--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.

http://xbase.bham.ac.uk

From chen_li3 at yahoo.com Mon May 8 09:21:54 2006
From: chen_li3 at yahoo.com (chen li)
Date: Mon, 8 May 2006 06:21:54 -0700 (PDT)
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <445F27B1.40501@colibase.bham.ac.uk>
Message-ID: <20060508132154.71440.qmail@web36802.mail.mud.yahoo.com>

I think Dr. Chaudhuri is correct. I added the following lines to my
script (actually copied from the documentation)

  $primer3->add_targets(
      PRIMER_PRODUCT_SIZE_RANGE=>'490-510');
  $primer3->add_targets('PRIMER_MIN_TM'=>60, 'PRIMER_MAX_TM'=>64);

to design primers with a product size of 490-510 bp and a primer
annealing Tm of 60 to 64 C. Here is part of the output in the file
called temp.out:

.......... original sequence.....
GTGGGCTGGTGTTGCTTGGAAAATTTCAAAATCCCAAAGTTTCAGGCTTCCCAAAGTTGGCTTGGAAAAATGTGATAGTCTCACCTGAGTCTAGACATGT
.................
PRIMER_PRODUCT_SIZE_RANGE=490-510
PRIMER_MIN_TM=60
PRIMER_MAX_TM=64
PRIMER_PAIR_PENALTY=0.1544
PRIMER_LEFT_PENALTY=0.081468
PRIMER_RIGHT_PENALTY=0.072951
PRIMER_LEFT_SEQUENCE=CCAAAGTTGGCTTGGAAAAA
...............................
PRIMER_PRODUCT_SIZE=501
..............

This is what I want. If you don't set special parameters such as the
annealing Tm, the program will use the default ones. If you set your
own parameters, they will show up after the sequence (see this output
example). If one needs to set more parameters and wants to know which
parameters are available, just browse the BEGIN section of the module
code.

Now I have another question: the program always prints out the
original sequence at the beginning. Is it possible not to do that?
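
One possible way to approach the question above, sketched here rather than taken from the thread, is to leave primer3's output alone and filter the echoed input sequence out afterwards. The sketch assumes that temp.out holds Boulder-IO style TAG=VALUE lines ending with a line containing only "=", and that the echoed sequence sits under a tag named SEQUENCE; both are assumptions about the primer3 version in use, and filter_primer3_record is a hypothetical helper, not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split Boulder-IO style "TAG=VALUE" text into [tag, value] pairs,
# dropping every tag listed in the %$skip hash.
# A line consisting only of "=" is treated as the end of the record.
sub filter_primer3_record {
    my ($text, $skip) = @_;
    my @kept;
    for my $line (split /\n/, $text) {
        last if $line eq '=';                    # end-of-record marker
        my ($tag, $value) = split /=/, $line, 2;
        next unless defined $value;              # not a TAG=VALUE line
        next if $skip->{$tag};                   # e.g. the echoed input sequence
        push @kept, [$tag, $value];
    }
    return @kept;
}

# Sample record built from values quoted in the message above.
# The SEQUENCE tag name and the shortened sequence are assumptions.
my $record = join "\n",
    'SEQUENCE=GTGGGCTGGTGTTGCTTGGA',
    'PRIMER_PRODUCT_SIZE_RANGE=490-510',
    'PRIMER_MIN_TM=60',
    'PRIMER_MAX_TM=64',
    'PRIMER_LEFT_SEQUENCE=CCAAAGTTGGCTTGGAAAAA',
    'PRIMER_PRODUCT_SIZE=501',
    '=';

# Print every tag except the echoed input sequence.
my @kept = filter_primer3_record($record, { SEQUENCE => 1 });
print "$_->[0]=$_->[1]\n" for @kept;
```

To use it on a real file, read temp.out into a string and pass it in place of $record; tags other than SEQUENCE can be hidden by adding them to the skip hash.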

Thanks to all for joining this thread,

Li

From hubert.prielinger at gmx.at Mon May 8 15:09:29 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 08 May 2006 13:09:29 -0600
Subject: [Bioperl-l] [BULK] can't parse blast file anymore
In-Reply-To: <42d52830.b4f91bfc.81e4600@expms6.cites.uiuc.edu>
References: <42d52830.b4f91bfc.81e4600@expms6.cites.uiuc.edu>
Message-ID: <445F9769.70500@gmx.at>

Hi all,

I have solved the problem. I am parsing BLAST 2.2.13 reports with an
early BioPerl 1.5.1 installation in which bug 1934 was not yet fixed,
so I had to replace the blast.pm file, and now it works properly.

Thank you very much,
Hubert

From s.johri at imperial.ac.uk Mon May 8 11:38:13 2006
From: s.johri at imperial.ac.uk (Johri, Saurabh)
Date: Mon, 8 May 2006 16:38:13 +0100
Subject: [Bioperl-l] PAML + Codeml problem..
Message-ID: <4A98ACB8EC146149872BAC9A132A582C277AC4@icex5.ic.ac.uk>

Hi all,

I'm trying to use codeml from PAML to estimate Ka and Ks values from
sequences within a multi-FASTA file. I'm using the code which has been
posted on the bioperl wiki...

However, when I run the code, I get the errors shown below.

I did a Google search to see if anyone had come across similar
problems; in those cases the problem seems to have been that the
sequences were not a multiple of 3 in length. In my code I check
whether each sequence is a multiple of 3 and, if not, I alter the
sequence until this is the case, but I still get the same error
messages.

Any suggestions as to why this could be happening?

Thanks!!!

Saurabh Johri
Tuberculosis Research Group
Centre for Molecular Microbiology & Infection
Imperial College London
SW7 2AZ

-------------------- WARNING ---------------------
MSG: There was an error - see error_string for the program output
---------------------------------------------------

------------- EXCEPTION Bio::Root::NotImplemented -------------
MSG: Unknown format of PAML output
STACK Bio::Tools::Phylo::PAML::_parse_summary /sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:359
STACK Bio::Tools::Phylo::PAML::next_result /sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:224
------------------------------------

>Rv3923c
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
>Rv3923c_mtb_cdc1551
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcac
>Rv3923c_mtb_f11
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
>Rv3923c_mtb_c1
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
>Rv3923c_mtb_210
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
>Rv3923c_mbovis
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa

------------------------------------

From chen_li3 at yahoo.com Mon May 8 20:21:42 2006
From: chen_li3 at yahoo.com (chen li)
Date: Mon, 8 May 2006 17:21:42 -0700 (PDT)
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
Message-ID: <20060509002142.94880.qmail@web36806.mail.mud.yahoo.com>

Dear all,

The following is the script I use to design primers for one sequence:

  #!/cygdrive/c/Perl/bin/perl.exe
  use warnings;
  use strict;
  use Bio::Tools::Run::Primer3;
  use Bio::SeqIO;

  my $file_in='piwil2.fa';
  my $file_out='temp.out';

  my $seqio=Bio::SeqIO->new(-file=>$file_in);
  my $seq=$seqio->next_seq;

  my $primer3=Bio::Tools::Run::Primer3->new(
      -seq=>$seq,
      -outfile=>$file_out,
      -path=>"c:/Perl/local/primer3_1.0.0/src/primer3.exe"
  );

  unless ($primer3->executable){
      print "primer3 can not be found. Is it installed?\n";
      exit(-1);
  }

  $primer3->add_targets(
      # set your own parameters for the primers or product
      'PRIMER_OPT_GC_PERCENT'=>' 50 ',
      'PRIMER_OPT_SIZE'=> '24 ',
      'PRIMER_OPT_TM'=> ' 60 ');

  my $result=$primer3->run;
  exit;

I tried to modify it for multiple sequences by using a while loop as
follows:

  while ($seq=$seqio->next_seq){
      my $primer3=Bio::Tools::Run::Primer3->new()  # design the primer
      ....
  }

I get primers only for the last sequence. It seems the earlier ones
are overwritten.

Any ideas will be highly appreciated.

Li

From jason.stajich at duke.edu Mon May 8 20:59:26 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon, 8 May 2006 20:59:26 -0400
Subject: [Bioperl-l] PAML + Codeml problem..
In-Reply-To: <4A98ACB8EC146149872BAC9A132A582C277AC4@icex5.ic.ac.uk>
References: <4A98ACB8EC146149872BAC9A132A582C277AC4@icex5.ic.ac.uk>
Message-ID: <4796FE3D-9D14-4D93-B455-69EDFE2B2B62@duke.edu>

Saurabh -

a) These sequences are identical except for a difference in length, so
there aren't going to be any interesting values from PAML, but maybe
you are just providing an example?

b) I think you are missing the trailing gaps in the alignment of the
Rv3923c_mtb_cdc1551 sequence, as it is shorter. PAML requires aligned
sequences as input.

c) The sequences, in the reading frame you have provided (and using
the standard translation table), have stop codons in them; this will
cause failure as well.

Which code from the wiki are you running, the 'running PAML' part of
the HOWTO?

Try looking at the actual output from PAML to figure out what is
wrong. Add this when initializing the Run object:

  -save_tempfiles => 1, -verbose => 1,

then open up the tempdir that is reported and look at the output files
(mlc file).
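
Points (b) and (c) above can be checked before codeml is run at all. The following is a minimal sketch in plain Perl, not code from the thread or from BioPerl: parse_fasta and check_coding_sequence are hypothetical helpers, the three toy sequences are invented for illustration, and only the standard-code stop codons in the first reading frame are tested.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read FASTA text into an ordered list of [id, sequence] pairs.
sub parse_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\n/, $text) {
        $line =~ s/\s+$//;                       # trim trailing whitespace
        next if $line eq '';
        if ($line =~ /^>(\S+)/) {
            push @records, [$1, ''];             # start a new record
        } elsif (@records) {
            $records[-1][1] .= lc $line;         # append sequence data
        }
    }
    return @records;
}

# Return a list of problems for one sequence: a length that is not a
# multiple of 3, a length that differs from $expected_len, or a
# standard-code stop codon in frame 1 before the final codon.
sub check_coding_sequence {
    my ($id, $seq, $expected_len) = @_;
    my @problems;
    my $len = length $seq;
    push @problems, "$id: length $len is not a multiple of 3" if $len % 3;
    push @problems, "$id: length $len differs from $expected_len (not aligned?)"
        if $len != $expected_len;
    # Walk complete codons only; the final codon may legitimately be a stop.
    my $last_start = $len - ($len % 3) - 3;
    for (my $i = 0; $i < $last_start; $i += 3) {
        my $codon = substr $seq, $i, 3;
        if ($codon =~ /^(?:taa|tag|tga)$/) {
            push @problems, "$id: in-frame stop codon $codon at position " . ($i + 1);
            last;                                # report the first one only
        }
    }
    return @problems;
}

# Toy input, invented for this sketch.
my $fasta = <<'END';
>seq_ok
atggcccattga
>seq_stop
atgtaacattga
>seq_short
atggcccatt
END

my @records  = parse_fasta($fasta);
my $expected = length $records[0][1];            # first sequence sets the length
for my $rec (@records) {
    print "$_\n" for check_coding_sequence(@$rec, $expected);
}
```

A sequence that passes these checks can still fail in codeml for other reasons, so the saved mlc file remains the place to look for the actual error.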
-jason On May 8, 2006, at 11:38 AM, Johri, Saurabh wrote: > Hi all, > > I'm trying to use codeml from PAML to estimate Ka, Ks values from > sequences within a multi fasta file: > i'm using the code which has been posted on the bioperl wiki... > > However, when I run the code, i get the following errors: > > I did a google search to see if anyone had come across similar > problems.... in which case the problem seems to have been due to the > sequences not being a multiple of 3, > In my code I check if the sequence is a multiple of 3 and if not, i > alter the sequences until this is the case, although I still have the > same error messages, > > Any suggestions as to why this could be happening? > > Thanks!!! > > Saurabh Johri > Tuberculosis Research Group > Centre for Molecular Microbiology & Infection > Imperial College London > SW7 2AZ > > > > > -------------------- WARNING --------------------- > MSG: There was an error - see error_string for the program output > --------------------------------------------------- > > ------------- EXCEPTION Bio::Root::NotImplemented ------------- > MSG: Unknown format of PAML output > STACK Bio::Tools::Phylo::PAML::_parse_summary > /sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:359 > STACK Bio::Tools::Phylo::PAML::next_result > /sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:224 > ------------------------------------ > >> Rv3923c > caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg > ag > gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg > ac > ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc > gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg > gt > acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc > gc > aaataagcccggtgttgcaatcaa >> Rv3923c_mtb_cdc1551 > caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg > ag > gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg > ac > 
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc > gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg > gt > acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcac >> Rv3923c_mtb_f11 > caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg > ag > gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg > ac > ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc > gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg > gt > acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc > gc > aaataagcccggtgttgcaatcaa >> Rv3923c_mtb_c1 > caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg > ag > gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg > ac > ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc > gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg > gt > acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc > gc > aaataagcccggtgttgcaatcaa >> Rv3923c_mtb_210 > caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg > ag > gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg > ac > ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc > gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg > gt > acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc > gc > aaataagcccggtgttgcaatcaa >> Rv3923c_mbovis > caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg > ag > gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg > ac > ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc > gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg > gt > acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc > gc > aaataagcccggtgttgcaatcaa > > ------------------------------------ > > 
_______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From osborne1 at optonline.net Mon May 8 21:17:22 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Mon, 08 May 2006 21:17:22 -0400 Subject: [Bioperl-l] use primer3 to design primers with multiple sequences In-Reply-To: <20060509002142.94880.qmail@web36806.mail.mud.yahoo.com> Message-ID: Li, If you're analyzing multiple input sequences you're going to have to create multiple output sequences. Brian O. On 5/8/06 8:21 PM, "chen li" wrote: > I get primers only for the last sequence. It seems the > earlier ones are overwritten. From WiersmaP at AGR.GC.CA Mon May 8 21:28:27 2006 From: WiersmaP at AGR.GC.CA (Wiersma, Paul) Date: Mon, 8 May 2006 21:28:27 -0400 Subject: [Bioperl-l] use primer3 to design primers with multiple sequences Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C41@onncrxms5.agr.gc.ca> Hi Li, When you execute $primer3->run with a Bio::Tools::Run::Primer3 object it opens -outfile=>"filename" for writing and then closes. That's why putting it in a loop will overwrite your output file each time so you only see the last one. I suppose you could read in each output file before looping to the next seq and append it to another file. If you're doing a fair bit of work with this module it would be worth looking at the Bio::Tools::Primer3 module. The statement $result = $primer3->run produces a Bio::Tools::Primer3 object which has all the methods you need for customizing your output. Paul Paul A. Wiersma Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada Summerland, BC wiersmap at agr.gc.ca From simon_sask at yahoo.com Tue May 9 04:06:04 2006 From: simon_sask at yahoo.com (Simon K. 
Chan)
Date: Tue, 9 May 2006 01:06:04 -0700 (PDT)
Subject: [Bioperl-l] Raw Blast Alignment
Message-ID: <20060509080604.53621.qmail@web54104.mail.yahoo.com>

Hi Fellow Bioperl-ers,

bioperl-live/examples/searchio/rawwriter.pl is supposed to show the raw
alignments using Bio::SearchIO. The script is written to parse a
PSI-BLAST report. I found an old email in the archive from Jason stating
that this should parse other flavors of blast reports as well.

What do I need to do to make this script parse non-PSI blast reports?
I tried to just specify a file and that the -format is 'blast', but I
get an error stating that the object method 'raw_hit_data' is not
defined in Bio::Search::Hit::BlastHit.

Basically, I want to obtain the raw alignment because I'd like to get
the size of the gaps, not just the number.

Any help will be much appreciated.
Many thanks

From cjfields at uiuc.edu Tue May 9 08:21:02 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Tue, 9 May 2006 07:21:02 -0500
Subject: [Bioperl-l] Raw Blast Alignment
Message-ID:

You need to read the SearchIO HOWTO, which gives several examples:

http://www.bioperl.org/wiki/HOWTO:SearchIO

Chris

From peterm at bioinf.uni-leipzig.de Tue May 9 08:44:25 2006
From: peterm at bioinf.uni-leipzig.de (Peter Menzel)
Date: Tue, 09 May 2006 14:44:25 +0200
Subject: [Bioperl-l] colorize features
Message-ID: <44608EA9.1030808@bioinf.uni-leipzig.de>

Hi all,
I am using the Bio::Graphics module to draw sequences and their features
with Bio::SeqFeature::Generic.
The features I want to highlight are occurrences of transcription
binding factors. Therefore I want to give every factor its own color,
but I didn't see how to manage it. I can only colorize complete tracks.
Is there a known workaround?

Thanks,
Peter

From Marc.Logghe at DEVGEN.com Tue May 9 10:13:24 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Tue, 9 May 2006 16:13:24 +0200
Subject: [Bioperl-l] colorize features
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746D88@ANTARESIA.be.devgen.com>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Peter Menzel
> Sent: Tuesday, May 09, 2006 2:44 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] colorize features
>
> Hi all,
> I am using the Bio::Graphics module to draw sequences and
> their features with Bio::SeqFeature::Generic.
> The features I want to highlight are occurrences of
> transcription binding factors. Therefore I want to give every
> factor its own color, but i didn't see how to manage it. I
> only can colorize complete tracks.
> Is there a known workaround?

Yes, instead of giving a hardcoded color value you can pass a subroutine
to the option.

-bgcolor => sub {
    my $feat = shift;
    # get your attribute on which you want to base your color
    my ($attr) = $feat->get_tag_values('my_attribute');

    return $attr > 10 ? 'red' : 'green'
}

Not sure about the method calls I am making here (could as well be
get_attributes()) but you get the idea.
Cheers,
Marc

From Marc.Logghe at DEVGEN.com Tue May 9 10:47:06 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Tue, 9 May 2006 16:47:06 +0200
Subject: [Bioperl-l] colorize features
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746D89@ANTARESIA.be.devgen.com>

Hi Peter,
Actually it is explained much better in this howto:
http://bioperl.org/wiki/HOWTO:Graphics
The examples show the principle I mentioned in my previous post (e.g.
Example 4), but then for the -label or -description options. But as
said, you can apply this to (most of?) the other options as well.
Regards,
ML

From WiersmaP at AGR.GC.CA Tue May 9 11:49:33 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Tue, 9 May 2006 11:49:33 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C42@onncrxms5.agr.gc.ca>

Hi Li,

The line "my $result = $primer3->run" is already in the code you
submitted. In the Bio::Tools::Primer3 module the author uses "$p3" for
the object. If you change your line to "my $p3 = $primer3->run" you
should be able to run the examples below. Process the results for each
sequence and output the results before looping to the next sequence.

Taken from Bio::Tools::Primer3.pm:

# how many results were there?
my $num=$p3->number_of_results;
print "There were $num results\n";

# get all the results
my $all_results=$p3->all_results;
print "ALL the results\n";
foreach my $key (keys %{$all_results}) {
    print "$key\t${$all_results}{$key}\n"
}

# get specific results
my $result1=$p3->primer_results(1);
print "The first primer is\n";
foreach my $key (keys %{$result1}) {
    print "$key\t${$result1}{$key}\n"
}

Paul

Paul A.
Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca

-----Original Message-----
From: chen li [mailto:chen_li3 at yahoo.com]
Sent: Monday, May 08, 2006 8:32 PM
To: Wiersma, Paul
Subject: Re: [Bioperl-l] use primer3 to design primers with multiple
sequences

Hi Paul,

I read both documents. What I understand is that
Bio::Tools::Run::Primer3 is for designing primers and
Bio::Tools::Primer3 is for parsing the results. When I read the
documents I do not see this line $result = $primer3->run in
Bio::Tools::Primer3. I wonder how you got this information.

Thanks,

Li

From chen_li3 at yahoo.com Tue May 9 13:32:32 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 9 May 2006 10:32:32 -0700 (PDT)
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C42@onncrxms5.agr.gc.ca>
Message-ID: <20060509173232.18843.qmail@web36802.mail.mud.yahoo.com>

Thanks Paul it REALLY works.

I have other questions:

1) When I run the script I use this line on the command prompt

   perl primer.pl >test

When I check the default output file (temp.out) used by the script I
only see the information about the last sequence, which is different
from what is in the test file. In the test file I can get all the
information for all the sequences.

2) Is it possible to use Bio::Tools::Primer3 directly to print out
selected information such as the primer sequence and the size of the
PCR product? Or do I have to parse the file myself?

After I get all this information I would like to post the script for
batch-designing PCR primers.

Thanks,

Li

From WiersmaP at AGR.GC.CA Tue May 9 13:59:20 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Tue, 9 May 2006 13:59:20 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C43@onncrxms5.agr.gc.ca>

Hi Li,

I've attached some code I used to explore basic functionality of
Primer3.pm modules. Hopefully you can see how I've picked out parts of
the results for printing. You can modify it as you need to output only
some results.

>>>>>>>>
# design the primers. This runs primer3 and returns a
# Bio::Tools::Run::Primer3 object with the results
my $results=$primer3->run;

# see the Bio::Tools::Run::Primer3 pod for
# things that you can get from this. For example:
print "There were ", $results->number_of_results+1, " primers\n";

my @out_keys_part = qw(
    START
    LENGTH
    TM
    GC_PERCENT
    SELF_ANY
    SELF_END
    SEQUENCE
);

for (my $i=0;$i <= $results->number_of_results;$i++){
    # get specific results
    my $result1=$results->primer_results($i);
    print "\n",$i+1;
    for my $key (qw(PRIMER_LEFT PRIMER_RIGHT)) {
        # PRIMER_LEFT and PRIMER_RIGHT hold "start,length" pairs
        my ($start, $length) = split /,/, ${$result1}{$key};
        ${$result1}{$key."_START"} = $start;
        ${$result1}{$key."_LENGTH"} = $length;
        foreach my $partkey (@out_keys_part) {
            print "\t", ${$result1}{$key."_".$partkey};
        }
        print "\n";
    }
    print "\tPRODUCT SIZE: ", ${$result1}{'PRIMER_PRODUCT_SIZE'},
          ", PAIR ANY COMPL: ", ${$result1}{'PRIMER_PAIR_COMPL_ANY'};
    print ", PAIR 3\' COMPL: ", ${$result1}{'PRIMER_PAIR_COMPL_END'}, "\n";
}
>>>>>>>>>>>>>>>

Paul A.
Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Telephone/Téléphone: 250-494-6388
Facsimile/Télécopieur: 250-494-0755
Box 5000, 4200 Hwy 97
Summerland, BC V0H 1Z0
wiersmap at agr.gc.ca

From cjfields at uiuc.edu Tue May 9 17:13:43 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 9 May 2006 16:13:43 -0500
Subject: [Bioperl-l] Oddness in Bio::SeqIO
Message-ID: <000601c673ad$74601c30$15327e82@pyrimidine>

I noticed an odd thing with SeqIO parsing of species lines (those
problematic bacterial tax names again). I have a simple script that runs
output to STDOUT to generate a list of hits. Here's what I get:

Bacterium: Corynebacterium glutamicum ATCC 13032
hits: 4
Bacterium: Corynebacterium jeikeium K411 K411 <--
hits: 1
Bacterium: Frankia sp. CcI3 CcI3 <--
hits: 1
Bacterium: Frankia sp. EAN1pec EAN1pec <--
hits: 1
Bacterium: Janibacter sp. HTCC2649 HTCC2649 <--
hits: 1
Bacterium: Kineococcus radiotolerans SRS30216 SRS30216 <--
hits: 1
Bacterium: Leifsonia xyli subsp. xyli str. CTCB07 xyli str. CTCB07 <--
hits: 1
Bacterium: Mycobacterium avium subsp. paratuberculosis K-10 paratuberculosis K-10 <--
...

Most (but not all) of the strain numbers get repeated (marked with
arrows). This is actually in the GenBank file itself, downloaded via
Bio::DB::GenBank (and thus passed through Bio::SeqIO). Anyone seen this
before?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From torsten.seemann at infotech.monash.edu.au Tue May 9 19:42:29 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 10 May 2006 09:42:29 +1000
Subject: [Bioperl-l] Oddness in Bio::SeqIO
In-Reply-To: <000601c673ad$74601c30$15327e82@pyrimidine>
References: <000601c673ad$74601c30$15327e82@pyrimidine>
Message-ID: <446128E5.1000908@infotech.monash.edu.au>

Chris,

> I noticed an odd thing with SeqIO parsing of species lines (those
> problematic bacterial tax names again). I have a simple script that runs
> output to STDOUT to generate a list of hits. Here's what I get:
> Bacterium: Mycobacterium avium subsp. paratuberculosis K-10 paratuberculosis
> K-10 <--

In this case,

Genus = Mycobacterium
Species = avium
Subspecies = paratuberculosis
Strain = K-10

which suggests that BioPerl is trying to handle something special,
because the 'subsp.' is gone?

Here are the pertinent parts of the Genbank file (apologies for the
wrapping):

LOCUS       NC_002944 4829781 bp DNA circular BCT 18-JAN-2006
DEFINITION  Mycobacterium avium subsp. paratuberculosis K-10, complete
            genome.
SOURCE      Mycobacterium avium subsp. paratuberculosis K-10
  ORGANISM  Mycobacterium avium subsp. paratuberculosis K-10
            Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
            Corynebacterineae; Mycobacteriaceae; Mycobacterium;
            Mycobacterium avium complex (MAC).
            /organism="Mycobacterium avium subsp. paratuberculosis K-10"
            /strain="K-10"
            /sub_species="paratuberculosis"

> Most (but not all) of the strain numbers get repeated (marked with arrows).
> This is actually in the GenBank file itself, downloaded via Bio::DB::GenBank
> (and thus passed through Bio::SeqIO). Anyone seen this before?

The problem is mentioned in the wiki so it must have come up before?
http://bioperl.org/wiki/Project_priority_list#Taxonomy_.2F_Species_data

I also deal with Bacteria mainly, and should also look into this. I
haven't been using the genbank headers directly, only the features, so I
never came across this.

Another thing which may crop up is when no Species has been allocated
yet but the genus is known (or something like that). In that case the
name is written as "Genus spp." e.g. Gallibacterium spp.

--Torsten

From chen_li3 at yahoo.com Tue May 9 21:04:08 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 9 May 2006 18:04:08 -0700 (PDT)
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C47@onncrxms5.agr.gc.ca>
Message-ID: <20060510010408.24494.qmail@web36804.mail.mud.yahoo.com>

Hi Paul,

Thank you very much. As you point out in your latest email, I have now
figured out that the line "my $result1=$results->primer_results(1);"
returns a hash reference containing all the information for the first
pair of primers. 1) Since it is a hash I should be able to get the
specific value for its corresponding key by telling Perl which key is
the entry for the value. 2) Also, since it is a reference I should
dereference it to get the so-called true value.

I don't know too much OO and Perl and your code looks a little bit
complicated to me.
But I get the job done by adding the following lines directly:

###############################################
#from Primer3 module to get all the information
#foreach my $key (sort keys %{$result1}) {
#print "$key\t${$result1}{$key}\n"}
##################################################

#get the value for the key in the hash reference
my $key_PRIMER_LEFT_SEQUENCE='PRIMER_LEFT_SEQUENCE';
print "$key_PRIMER_LEFT_SEQUENCE\t${$result1}{$key_PRIMER_LEFT_SEQUENCE}\n";

There is one point I don't understand: When I add these two lines into
my code (line 49 in my code)

my $key_PRIMER_SEQUENCE_ID='PRIMER_SEQUENCE_ID';
print "$key_PRIMER_SEQUENCE_ID\t${$result1}{$key_PRIMER_SEQUENCE_ID}\n";

I don't get the PRIMER_SEQUENCE_ID. Perl complains and says "Use of
uninitialized value in concatenation (.) or string at primer3-3 line
49."

Li

--- "Wiersma, Paul" wrote:

> Hi Li,
>
> Just a bit of clarification of the code that I sent
> earlier.
> The line "my $result1=$results->primer_results($i);"
> gives you a
> reference to a hash that contains all of the
> information for a primer
> pair.
> To access the entries you dereference the hash, i.e.
> the hash is
> %{$result1} and ${$result1}{'PRIMER_PRODUCT_SIZE'}
> gives you the entry
> for product size. The following are the available
> entries. All are
> single values or strings except PRIMER_RIGHT and
> PRIMER_LEFT which are
> start,length pairs (e.g. PRIMER_LEFT => '60,20')
> which can be pulled out
> with split.
> my ($start, $length) = split /,/,
> ${$result1}{'PRIMER_LEFT'};
> my $right_Tm = ${$result1}{'PRIMER_RIGHT_TM'}
> PRIMER_PRODUCT_SIZE
> PRIMER_PAIR_COMPL_ANY
> PRIMER_PAIR_COMPL_END
> PRIMER_PAIR_PENALTY
>
> PRIMER_LEFT
> PRIMER_LEFT_END_STABILITY
> PRIMER_LEFT_PENALTY
> PRIMER_LEFT_TM
> PRIMER_LEFT_GC_PERCENT
> PRIMER_LEFT_SELF_ANY
> PRIMER_LEFT_SELF_END
> PRIMER_LEFT_SEQUENCE
>
> PRIMER_RIGHT
> PRIMER_RIGHT_END_STABILITY
> PRIMER_RIGHT_PENALTY
> PRIMER_RIGHT_TM
> PRIMER_RIGHT_GC_PERCENT
> PRIMER_RIGHT_SELF_ANY
> PRIMER_RIGHT_SELF_END
> PRIMER_RIGHT_SEQUENCE
>
> Paul A. Wiersma
> Agriculture and Agri-Food Canada/Agriculture et
> Agroalimentaire Canada
> Summerland, BC
> wiersmap at agr.gc.ca

From zhouyubio at gmail.com Tue May 9 21:35:01 2006
From: zhouyubio at gmail.com (Yu ZHOU)
Date: Wed, 10 May 2006 01:35:01 +0000 (UTC)
Subject: [Bioperl-l] pubmed
References: <6.1.2.0.2.20050331171052.03830ba8@qfdong.mail.iastate.edu>
Message-ID:

Qunfeng iastate.edu> writes:

>
> Hi there,
>
> http://bioperl.org/HOWTOs/Feature-Annotation/anno_from_genbank.html
>
> I am not very familiar with BioPerl. I tried to follow the example showing
> in the above page to retrieve pubmed ID under each Reference tag , i.e.,
> $value->pubmed(), but it doesn't work for me for the seq gi#56961711. The
> authors() works for me. Appreciate any suggestions.
>
> Qunfeng
>

Hi,

I have the same problem as you. Here is what I have done: use a regular
expression to match the value of the 'location' tag, if there is one.

#------------------
my $ann = $seqobj->annotation(); # annotation object
foreach my $ref ( $ann->get_Annotations('reference') ) {
    print "Title: ", $ref->title,"\n";
    print "Location: ", $ref->location, "\n";
    if ($ref->location =~ /PUBMED\s+(\d+)/) {
        my $pmid = $1;
        print "PMID: ", $pmid, "\n";
    }
    print "Authors: ", $ref->authors, "\n";
}
#------------------

From osborne1 at optonline.net Tue May 9 23:01:49 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 09 May 2006 23:01:49 -0400
Subject: [Bioperl-l] pubmed
In-Reply-To:
Message-ID:

Qunfeng,

I'm using bioperl-live, and I'm able to retrieve the single PubMed id
found in the 56961711 entry using the pubmed() method. Note that there
are 4 references, only one of which has a Pubmed id. Also, the authors()
method prints out the authors, not the Pubmed id. If you have a problem
please show your code and tell us which version of Bioperl you're using.

Brian O.

use strict;
use lib "/Users/bosborne/bioperl-live";
use Bio::DB::GenBank;

my $db = Bio::DB::GenBank->new;
my $seq = $db->get_Seq_by_id(56961711);
my $ann_coll = $seq->annotation;

foreach my $ann ($ann_coll->get_Annotations('reference')) {
    print "Author: ", $ann->authors, "\nPubmed id: ", $ann->pubmed, "\n";
}

From sb at mrc-dunn.cam.ac.uk Wed May 10 05:30:59 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 10 May 2006 10:30:59 +0100
Subject: [Bioperl-l] Bio::Taxonomy confusion
Message-ID: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>

Hi,

I'm a little confused as to how names are supposed to work in
Bio::Taxonomy::Node. In the bioperl versions that I've looked at a Node
doesn't seem to store the most important information about itself - its
scientific name - in an obvious place.

bioperl 1.5.1 puts it at the start of the classification list. I'd have
thought sticking it in -name would make more sense, but this is used
only for the GenBank common name.

The Bio::Taxonomy docs still suggest:

my $node_species_sapiens = Bio::Taxonomy::Node->new(
    -object_id => 9606, # or -ncbi_taxid. Required tag
    -names => {
        'scientific' => ['sapiens'],
        'common_name' => ['human']
    },
    -rank => 'species' # Required tag
);

and whilst Bio::Taxonomy::Node does not accept -names, it does have a
'name' method which claims to work like:

$obj->name('scientific', 'sapiens');

This kind of thing would be really nice, but afaics
Bio::Taxonomy::Node->new takes the -name value and makes a common name
out of it, whilst the name() method passes any 'scientific' name to the
scientific_name() method which is unable to set any value (and warns
about this), only get.

It seems like the need to have this classification array work the same
way as Bio::Species is causing some unnecessary restrictions. Can't the
more sensible idea of having a dedicated storage spot for the
ScientificName and other parameters be used, with the classification
array either being generated just-in-time from the hash-stored data, or
indeed being generated from the Lineage field?

Also, why does a node store the complete hierarchy on itself in the
classification array? If we're going that far, why don't the
Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a
get_taxonomy() method instead of a get_Taxonomy_Node() method.
get_taxonomy() could, from a single efetch.fcgi lookup, create a
complete Bio::Taxonomy with all the nodes. Whilst most nodes would only
have a minimum of information, if you could simply ask a node what its
rank and scientific name was you could easily build a classification
array, or ask what Kingdom your species was in etc.

Are there good reasons for Taxonomy working the way it does in 1.5.1, or
would I not be wasting my time re-writing things to make more sense (to
me)?

Cheers,
Sendu.

From osborne1 at optonline.net Wed May 10 10:33:18 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 10 May 2006 10:33:18 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C43@onncrxms5.agr.gc.ca>
Message-ID:

Paul,

I took your code, added some "run" code and made it into a script and
added this to CVS, examples/tools/run_primer3.pl. I hope this is OK with
you.

Brian O.
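
For anyone adapting that script: the per-pair result discussed earlier in this thread is a plain hash reference, so selected fields can be printed without parsing the output file. What follows is only a minimal sketch. It assumes a hash shaped like the one primer_results() is described as returning, the helper name describe_primer is hypothetical, and every value is made up for illustration (only the "start,length" form such as '60,20' comes from the thread). No BioPerl call is made.

```perl
use strict;
use warnings;

# Hypothetical stand-in for one primer pair. In a real script this would
# come from $results->primer_results($i); all values here are invented.
my $pair = {
    PRIMER_LEFT           => '60,20',
    PRIMER_RIGHT          => '310,21',
    PRIMER_LEFT_SEQUENCE  => 'ACGTACGTACGTACGTACGT',
    PRIMER_RIGHT_SEQUENCE => 'TGCATGCATGCATGCATGCAT',
    PRIMER_PRODUCT_SIZE   => 251,
};

# Return "sequence<TAB>start<TAB>length" for one side (LEFT or RIGHT).
sub describe_primer {
    my ($pair, $side) = @_;
    my $key = "PRIMER_$side";
    # The bare PRIMER_LEFT / PRIMER_RIGHT entries are "start,length" strings.
    my ($start, $length) = split /,/, $pair->{$key};
    return join "\t", $pair->{"${key}_SEQUENCE"}, $start, $length;
}

for my $side (qw(LEFT RIGHT)) {
    print "$side\t", describe_primer($pair, $side), "\n";
}
print "PRODUCT SIZE\t$pair->{PRIMER_PRODUCT_SIZE}\n";
```

Printing only the keys you care about this way avoids looping over the whole hash, and a key that is absent from the hash shows up as an "uninitialized value" warning rather than as a silent blank.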
On 5/9/06 1:59 PM, "Wiersma, Paul" wrote:

> $results->number_of_results

From stoltzfu at umbi.umd.edu  Tue May  9 16:22:43 2006
From: stoltzfu at umbi.umd.edu (Arlin Stoltzfus)
Date: Tue, 09 May 2006 16:22:43 -0400
Subject: [Bioperl-l] proposal: CDAT (character data and trees) integrative object
Message-ID:

Dear developers--

We propose a Bio::CDAT (Character Data And Trees) module to facilitate
comparative analysis using evolutionary methods by 1) managing
evolutionary relationships (by linking data to trees) and 2) allowing
coordinated analysis of different types of data (by implementing a
generic concept of "character-state" data). Bio::CDAT would take
advantage of existing BioPerl objects and would include the
functionality of Rutger Vos's Bio::Phylo. It would provide the framework
to develop interfaces to analysis tools (phylogeny inference,
evolutionary rate models, functional shift inference, etc), as well as
to file formats and visualization methods appropriate for such analyses.

A proposal is attached. We would like to hear your thoughts (e.g., see
the section on "Questions to consider")!

Thanks

Arlin Stoltzfus
WeiGang Qiu
Rutger Vos
(with thanks to Justin Reese and Aaron Mackey)

------------------
Arlin Stoltzfus (stoltzfu at umbi.umd.edu)
CARB, 9600 Gudelsky Drive, Rockville, Maryland 20850
tel 240 314 6208, fax 240 314 6255, www.molevol.org/camel
---------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CDAT-proposal.pdf
Type: application/pdf
Size: 193701 bytes
Desc: not available
URL:

From zhouyubio at gmail.com  Wed May 10 04:55:46 2006
From: zhouyubio at gmail.com (Yu Zhou)
Date: Wed, 10 May 2006 16:55:46 +0800
Subject: [Bioperl-l] pubmed
In-Reply-To:
References:
Message-ID: <613ffb490605100155w43a9ea4sca23818bc7fa4e33@mail.gmail.com>

Thanks! I am using Bioperl-1.4, not bioperl-live. That may be the reason
why it does not work!

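The regular-expression workaround quoted in this thread (pulling the PubMed id out of a reference's location string when pubmed() returns nothing) can be tried without BioPerl or a network connection. The following is only a minimal sketch: pmid_from_location is a hypothetical helper name, and the two location strings are invented for illustration, not taken from the 56961711 entry.

```perl
use strict;
use warnings;

# Return the PubMed id embedded in a reference "location" string,
# or undef when the string carries no PUBMED tag.
# (Hypothetical helper; the pattern is the one used in the thread.)
sub pmid_from_location {
    my ($location) = @_;
    return undef unless defined $location;
    return $location =~ /PUBMED\s+(\d+)/ ? $1 : undef;
}

# Invented location strings, for illustration only.
my @locations = (
    'J. Hypothetical Biol. 12 (3), 45-67 (2004) PUBMED 12345678',
    'Unpublished',
);

for my $loc (@locations) {
    my $pmid = pmid_from_location($loc);
    print "Location: $loc\n";
    print "PMID: ", (defined $pmid ? $pmid : 'none found'), "\n";
}
```

With a real sequence object, the string passed in would be the value of $ref->location for each reference annotation, as in the quoted loop.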
On 5/10/06, Brian Osborne wrote:
> Qunfeng,
>
> I'm using bioperl-live, and I'm able to retrieve the single PubMed id
> found in the 56961711 entry using the pubmed() method. Note that there
> are 4 references, only one of which has a Pubmed id. Also, the
> authors() method prints out the authors, not the Pubmed id. If you have
> a problem please show your code and tell us which version of Bioperl
> you're using.
>
> Brian O.
>
> use strict;
> use lib "/Users/bosborne/bioperl-live";
> use Bio::DB::GenBank;
>
> my $db = Bio::DB::GenBank->new;
> my $seq = $db->get_Seq_by_id(56961711);
> my $ann_coll = $seq->annotation;
>
> foreach my $ann ($ann_coll->get_Annotations('reference')) {
>     print "Author: ", $ann->authors, "\nPubmed id: ", $ann->pubmed, "\n";
> }
>
> On 5/9/06 9:35 PM, "Yu ZHOU" wrote:
>
> > Qunfeng iastate.edu> writes:
> >
> >> Hi there,
> >>
> >> http://bioperl.org/HOWTOs/Feature-Annotation/anno_from_genbank.html
> >>
> >> I am not very familiar with BioPerl. I tried to follow the example
> >> shown in the above page to retrieve the pubmed ID under each
> >> Reference tag, i.e., $value->pubmed(), but it doesn't work for me
> >> for the seq gi#56961711. The authors() works for me. Appreciate any
> >> suggestions.
> >>
> >> Qunfeng
> >
> > Hi,
> >
> > I have the same problem as you. Here is what I have done, using a
> > regular expression to match the value of the 'location' tag, if
> > there is one.
> >
> > #------------------
> > my $ann = $seqobj->annotation(); # annotation object
> > foreach my $ref ( $ann->get_Annotations('reference') ) {
> >     print "Title: ", $ref->title,"\n";
> >     print "Location: ", $ref->location, "\n";
> >     if ($ref->location =~ /PUBMED\s+(\d+)/) {
> >         my $pmid = $1;
> >         print "PMID: ", $pmid, "\n";
> >     }
> >     print "Authors: ", $ref->authors, "\n";
> > }
> > #------------------

--
Best Wishes!
Yu

From cjfields at uiuc.edu  Wed May 10 11:46:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 May 2006 10:46:27 -0500
Subject: [Bioperl-l] Oddness in Bio::SeqIO
In-Reply-To: <446128E5.1000908@infotech.monash.edu.au>
Message-ID: <000f01c67448$e63973b0$15327e82@pyrimidine>

This actually pops up when using $seq->species->common_name; using
$seq->species->binomial chops some of the strain designations off, so
really neither one works optimally for bacterial genus-species-strain
taxonomy. Hilmar made the suggestion that it's probably best to grab the
NCBI TaxID and parse it out that way by looking it up in the taxonomy
database (using Bio::DB::Taxonomy), but at the moment that's not what
Bio::SeqIO::genbank does.

I wonder if we should be trying to shove most of this stuff into species
objects directly from the beginning; in other words, maybe we should try
to get the information in Bio::Annotation objects and then, after the
parsing/IO is finished, have a method to get the information into
Bio::Species objects when wanted/needed; a check could be added against
the NCBI Taxonomy database there.

Anyway, I really haven't looked at how they are parsed out and don't
have the time at the moment. I may look into this as well but not until
I get back from conference (end of May).
Jason and Brian have been calling for a refactoring of
Bio::SeqIO::genbank for a while; maybe it's getting time to do something
about it...

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Torsten Seemann
> Sent: Tuesday, May 09, 2006 6:42 PM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Oddness in Bio::SeqIO
>
> Chris,
>
> > I noticed an odd thing with SeqIO parsing of species lines (those
> > problematic bacterial tax names again). I have a simple script that
> > runs output to STDOUT to generate a list of hits. Here's what I get:
>
> > Bacterium: Mycobacterium avium subsp. paratuberculosis K-10 paratuberculosis
> > K-10 <--
>
> In this case,
>
> Genus = Mycobacterium
> Species = avium
> Subspecies = paratuberculosis
> Strain = K-10
>
> which suggests that BioPerl is trying to handle something special,
> because the 'subsp.' is gone?
>
> Here's the pertinent parts of the Genbank file
> (apologies for the wrapping):
>
> LOCUS       NC_002944  4829781 bp  DNA  circular  BCT  18-JAN-2006
> DEFINITION  Mycobacterium avium subsp. paratuberculosis K-10, complete
>             genome.
> SOURCE      Mycobacterium avium subsp. paratuberculosis K-10
>   ORGANISM  Mycobacterium avium subsp. paratuberculosis K-10
>             Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
>             Corynebacterineae; Mycobacteriaceae; Mycobacterium;
>             Mycobacterium avium complex (MAC).
>
>             /organism="Mycobacterium avium subsp.
>             paratuberculosis K-10"
>             /strain="K-10"
>             /sub_species="paratuberculosis"
>
> > Most (but not all) of the strain numbers get repeated (marked with
> > arrows). This is actually in the GenBank file itself, downloaded via
> > Bio::DB::GenBank (and thus passed through Bio::SeqIO). Anyone seen
> > this before?
>
> The problem is mentioned in the wiki so it must have come up before?
> http://bioperl.org/wiki/Project_priority_list#Taxonomy_.2F_Species_data
>
> I also deal with Bacteria mainly, and should also look into this. I
> haven't been using the genbank headers directly, only the features, so
> I never came across this.
>
> Another thing which may crop up is when no Species has been allocated
> yet but the genus is known (or something like that). In that case the
> name is written as "Genus spp.", e.g. Gallibacterium spp.
>
> --Torsten

From cuiw at mail.nih.gov  Wed May 10 12:02:55 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Wed, 10 May 2006 12:02:55 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiplesequences
In-Reply-To: <20060510010408.24494.qmail@web36804.mail.mud.yahoo.com>
Message-ID:

'PRIMER_SEQUENCE_ID' is not a key in the Bio::Tools::Primer3 output
hash. You can find all legal keys by "print keys %{$result1};"

There is one point I don't understand: When I add these two lines into
my code (line 49 in my code)

my $key_PRIMER_SEQUENCE_ID='PRIMER_SEQUENCE_ID';
print "$key_PRIMER_SEQUENCE_ID\t${$result1}{$key_PRIMER_SEQUENCE_ID}\n";

I don't get the PRIMER_SEQUENCE_ID. Perl complains about it and says
"Use of uninitialized value in concatenation (.) or string at primer3-3
line 49."

Li

From WiersmaP at AGR.GC.CA  Wed May 10 12:08:37 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Wed, 10 May 2006 12:08:37 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C48@onncrxms5.agr.gc.ca>

Brian, no problem with the code, thanks for asking.

Li, PRIMER_SEQUENCE_ID and SEQUENCE are not part of the individual
results but only end up by default with $results->primer_results(0).
If you try to access them using $results->primer_results(1) (or anything
but 0) you will get an error.

Paul

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca

-----Original Message-----
From: chen li [mailto:chen_li3 at yahoo.com]
Sent: Tuesday, May 09, 2006 6:04 PM
To: Wiersma, Paul
Cc: bioperl-l at bioperl.org
Subject: RE: [Bioperl-l] use primer3 to design primers with multiple sequences

Hi Paul,

Thank you very much. Just as you point out in your latest email, I now
figure out that the line "my $result1=$results->primer_results(1);"
returns a hash reference containing all the information for the first
pair of primers. 1) Since it is a hash, I should be able to get the
specific value for its corresponding key by telling Perl which key is
the entry for the value. 2) Also, since it is a reference, I should
dereference it to get the so-called true value. I don't know too much OO
and Perl and your code looks a little bit complicated to me. But I get
the job done by adding the following lines directly:

###############################################
#from Primer3 module to get all the information
#foreach my $key (sort keys %{$result1}) {
#print "$key\t${$result1}{$key}\n"}
##################################################
#get the value for the key in the hash reference
my $key_PRIMER_LEFT_SEQUENCE='PRIMER_LEFT_SEQUENCE';
print "$key_PRIMER_LEFT_SEQUENCE\t${$result1}{$key_PRIMER_LEFT_SEQUENCE}\n";

There is one point I don't understand: When I add these two lines into
my code (line 49 in my code)

my $key_PRIMER_SEQUENCE_ID='PRIMER_SEQUENCE_ID';
print "$key_PRIMER_SEQUENCE_ID\t${$result1}{$key_PRIMER_SEQUENCE_ID}\n";

I don't get the PRIMER_SEQUENCE_ID. Perl complains about it and says
"Use of uninitialized value in concatenation (.) or string at primer3-3
line 49."

Li

--- "Wiersma, Paul" wrote:

> Hi Li,
>
> Just a bit of clarification of the code that I sent earlier.
> The line "my $result1=$results->primer_results($i);" gives you a
> reference to a hash that contains all of the information for a primer
> pair.
> To access the entries you dereference the hash, i.e. the hash is
> %{$result1} and ${$result1}{'PRIMER_PRODUCT_SIZE'} gives you the entry
> for product size. The following are the available entries. All are
> single values or strings except PRIMER_RIGHT and PRIMER_LEFT which are
> start,length pairs (e.g. PRIMER_LEFT => '60,20') which can be pulled
> out with split.
>
> my ($start, $length) = split /,/, ${$result1}{'PRIMER_LEFT'};
> my $right_Tm = ${$result1}{'PRIMER_RIGHT_TM'};
>
> PRIMER_PRODUCT_SIZE
> PRIMER_PAIR_COMPL_ANY
> PRIMER_PAIR_COMPL_END
> PRIMER_PAIR_PENALTY
>
> PRIMER_LEFT
> PRIMER_LEFT_END_STABILITY
> PRIMER_LEFT_PENALTY
> PRIMER_LEFT_TM
> PRIMER_LEFT_GC_PERCENT
> PRIMER_LEFT_SELF_ANY
> PRIMER_LEFT_SELF_END
> PRIMER_LEFT_SEQUENCE
>
> PRIMER_RIGHT
> PRIMER_RIGHT_END_STABILITY
> PRIMER_RIGHT_PENALTY
> PRIMER_RIGHT_TM
> PRIMER_RIGHT_GC_PERCENT
> PRIMER_RIGHT_SELF_ANY
> PRIMER_RIGHT_SELF_END
> PRIMER_RIGHT_SEQUENCE
>
> Paul A. Wiersma
> Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
> Summerland, BC
> wiersmap at agr.gc.ca

From cuiw at mail.nih.gov  Wed May 10 14:42:36 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Wed, 10 May 2006 14:42:36 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiplesequences: bug in code!
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C48@onncrxms5.agr.gc.ca>
Message-ID:

Hope this works!
Bio::Tools::Primer3 line 264 should be:

$self->{seqobject}=Bio::Seq->new(-seq=>$value, -id=>$id);

Then you should be able to display PRIMER_SEQUENCE_ID by

####read primer3 output file############
my $p3=Bio::Tools::Primer3->new(-file=>"data/primer3_output.txt");
######## print id###############
print $p3->seqobject->id;

Wenwu Cui, PhD
NIH/NCI

From cjfields at uiuc.edu  Wed May 10 14:58:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 May 2006 13:58:19 -0500
Subject: [Bioperl-l] ListSummaries for April 26-May 9
Message-ID: <001801c67463$b3c0a910$15327e82@pyrimidine>

ListSummaries for April 26-May 9 are up at the usual place:

http://www.bioperl.org/wiki/Mailing_list_summaries

Direct link:

http://www.bioperl.org/wiki/ListSummary:April_26-May_9%2C2006

It's a bit of a hurried one so don't be surprised to find a few spelling
errors here and there. I'm getting ready for a conference in a couple
weeks so I may be off the radar a bit here and there. The next
ListSummary won't be posted until May 26.

Enjoy!

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept.
of Biochemistry
University of Illinois Urbana-Champaign

From chen_li3 at yahoo.com  Wed May 10 20:27:34 2006
From: chen_li3 at yahoo.com (chen li)
Date: Wed, 10 May 2006 17:27:34 -0700 (PDT)
Subject: [Bioperl-l] What is the relationship between primer3 module and run-primer3 module?
Message-ID: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>

First, thank you all for replying to my previous post about primer3.

But now I am a little confused even after I read the documents: What is
the relationship between these two modules? What is the correct/standard
way to use them to do batch primer design? What I do is use
Bio::Tools::Run::Primer3 to design primers. Based on Dr. Roy Chaudhuri's
information I can set the parameters using the following syntax:

$primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');

Based on Paul A. Wiersma's explanation I can also print out part of the
primer results (because I don't need all the information). But there is
a little trouble: PRIMER_SEQUENCE_ID can't be accessed using this
method. And Paul points out that "PRIMER_SEQUENCE_ID and SEQUENCE are
not part of the individual results but only end up by default with
$results->primer_results(0)". So it seems there is no way to get around
this problem using Bio::Tools::Run::Primer3. And others suggest using
Bio::Tools::Primer3 to parse the results. So is it true that
Bio::Tools::Run::Primer3 is for primer design and Bio::Tools::Primer3 is
for parsing the results from Bio::Tools::Run::Primer3? But what I find
is that I get almost all the results (except PRIMER_SEQUENCE_ID and
SEQUENCE) without putting the line

use Bio::Tools::Primer3

in the script. How to explain this? Is it because of the following line
of code?

my $result=$primer3->run;

The last question: which line of code is used to invoke the program
primer3.exe? How does the Perl script call primer3.exe?

Once again thank you all very much,

Li

From jason.stajich at duke.edu  Wed May 10 20:41:31 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 10 May 2006 20:41:31 -0400
Subject: [Bioperl-l] What is the relationship between primer3 module and run-primer3 module?
In-Reply-To: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>
References: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>
Message-ID:

Bio::Tools::Run::XXX modules are for running applications...

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From jason.stajich at duke.edu  Wed May 10 20:53:43 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 10 May 2006 20:53:43 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>
References: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>
Message-ID: <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu>

I would use the implementation that talks to the flatfile db as the
standard here. Nodes are defined by the data in the taxonomy dump dbs
from NCBI. The eutils is pretty worthless except for taxid->name or the
reverse; you can't get the full taxonomy (or couldn't when that
implementation was written).

The "name" method refers to the name of the node - each level in the
taxonomy can have a "name".

The bits of hackiness relate to wrapping the node object as a
Bio::Species and/or being able to read a genbank file and the organism
taxonomy data as a list and instantiating. If we could rely on
everything being in a DB of course this would be simpler.

Another problem is that the depth of the taxonomy is not constant for
every node, so assuming that a fixed number of slots will be filled in
to generate the taxonomy leads to problems.

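The variable-depth point can be illustrated without BioPerl by following parent links until there are none left. This is only a sketch under stated assumptions: %node, classification and ancestor_with_rank are hypothetical names, not Bio::Taxonomy API; the ids, names and ranks are those of the Frankia sp. CcI3 lineage posted elsewhere in this thread; and every parent link other than 106370 -> 1854 is assumed from the order of that lineage.

```perl
use strict;
use warnings;

# Toy node table keyed by taxid. Each node knows only its own name,
# rank and parent. Parent links other than 106370 -> 1854 are ASSUMED
# from the order of the lineage posted in this thread.
my %node = (
    131567 => { name => 'cellular organisms',     rank => 'no rank',      parent => undef  },
    2      => { name => 'Bacteria',               rank => 'superkingdom', parent => 131567 },
    201174 => { name => 'Actinobacteria',         rank => 'phylum',       parent => 2      },
    1760   => { name => 'Actinobacteria (class)', rank => 'class',        parent => 201174 },
    85003  => { name => 'Actinobacteridae',       rank => 'subclass',     parent => 1760   },
    2037   => { name => 'Actinomycetales',        rank => 'order',        parent => 85003  },
    85013  => { name => 'Frankineae',             rank => 'suborder',     parent => 2037   },
    74712  => { name => 'Frankiaceae',            rank => 'family',       parent => 85013  },
    1854   => { name => 'Frankia',                rank => 'genus',        parent => 74712  },
    106370 => { name => 'Frankia sp. CcI3',       rank => 'species',      parent => 1854   },
);

# Walk from a node up to the root, collecting names. The list is as
# long as the lineage happens to be; no fixed number of slots.
sub classification {
    my ($taxid) = @_;
    my @names;
    while (defined $taxid && exists $node{$taxid}) {
        push @names, $node{$taxid}{name};
        $taxid = $node{$taxid}{parent};
    }
    return @names;    # most specific name first
}

# Name of the first node at or above $taxid that has the given rank.
sub ancestor_with_rank {
    my ($taxid, $rank) = @_;
    while (defined $taxid && exists $node{$taxid}) {
        return $node{$taxid}{name} if $node{$taxid}{rank} eq $rank;
        $taxid = $node{$taxid}{parent};
    }
    return;
}

my @class = classification(106370);
print join('; ', reverse @class), "\n";
print "superkingdom: ", ancestor_with_rank(106370, 'superkingdom'), "\n";
```

Asking for a rank that a lineage lacks simply returns nothing, which is the behaviour a fixed-slot classification array cannot express.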
Use the flatfile implementation (Bio::DB::Taxonomy::flatfile) as the
best example of working code as this is how I really wanted it to work;
the Bio::Species hacks are only there to shoehorn in data retrieved from
genbank files. With the flatfile implementation you have to walk all the
way up the db hierarchy to get the kingdom for a node, so you do have to
build up the classification hierarchy as each node only stores data
about itself.

I'm not exactly sure what you are proposing to do, but would definitely
enjoy another pair of hands; I don't really have time to mess with it
any time soon.

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From cuiw at mail.nih.gov  Wed May 10 21:46:00 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Wed, 10 May 2006 21:46:00 -0400
Subject: [Bioperl-l] What is the relationship between primer3 module andrun-primer3 module?
References: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>
Message-ID:

1. Bio::Tools::Primer3 is already included in the
   Bio::Tools::Run::Primer3 module so that you can parse the result file.
2. There is a bug in Bio::Tools::Primer3.pm line 264 as I mentioned.
   Once fixed, it can output PRIMER_SEQUENCE_ID.
3. primer3.exe is called in the Bio::Tools::Run::Primer3 "run" function;
   please read the function definition.

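For readers following the hash-reference discussion, here is a small sketch of reading one primer result while guarding against keys that are absent, such as PRIMER_SEQUENCE_ID. It does not call primer3 or BioPerl: $result1 is a hand-made stand-in for the hash reference that primer_results() is described as returning in this thread, all values in it are invented, and value_or_na is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hand-made stand-in for the hash reference described in this thread.
# All values are invented for illustration.
my $result1 = {
    PRIMER_LEFT          => '60,20',
    PRIMER_RIGHT         => '560,20',
    PRIMER_LEFT_SEQUENCE => 'ACGTACGTACGTACGTACGT',
    PRIMER_RIGHT_TM      => '59.8',
    PRIMER_PRODUCT_SIZE  => 501,
};

# Look up a key, returning a placeholder instead of undef so that
# printing never triggers "Use of uninitialized value".
sub value_or_na {
    my ($result, $key) = @_;
    return defined $result->{$key} ? $result->{$key} : 'n/a';
}

# PRIMER_LEFT and PRIMER_RIGHT are "start,length" pairs.
my ($start, $length) = split /,/, $result1->{PRIMER_LEFT};
print "left primer starts at $start, length $length\n";

for my $key (qw(PRIMER_LEFT_SEQUENCE PRIMER_RIGHT_TM PRIMER_SEQUENCE_ID)) {
    print "$key\t", value_or_na($result1, $key), "\n";
}
```

The arrow form $result1->{KEY} used here is equivalent to the ${$result1}{KEY} form used in the thread.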
From cjfields at uiuc.edu  Wed May 10 23:36:39 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 May 2006 22:36:39 -0500
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu>
Message-ID: <000301c674ac$1d40f0f0$15327e82@pyrimidine>

I think you can get pretty much everything now, though I can definitely
see the use of a local database. I ran a few tests, really unrelated to
this, using the powerscripting test page at NCBI for eutils (for the
curious, at http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/eu.cgi) and
was able to retrieve XML-formatted taxonomic information; here's the
bacterium Frankia sp. CcI3 TaxID info, which looks like they have
everything set up by rank. It gives quite a bit of information.

106370
Frankia sp. CcI3
1854
species
Bacteria
11 Bacterial and Plant Plastid
0 Unspecified
cellular organisms; Bacteria; Actinobacteria; Actinobacteria (class);
Actinobacteridae; Actinomycetales; Frankineae; Frankiaceae; Frankia
131567 cellular organisms no rank
2 Bacteria superkingdom
201174 Actinobacteria phylum
1760 Actinobacteria (class) class
85003 Actinobacteridae subclass
2037 Actinomycetales order
85013 Frankineae suborder
74712 Frankiaceae family
1854 Frankia genus
1999/10/22
2005/01/19
2000/02/02

Chris

From jason.stajich at duke.edu  Thu May 11 08:04:54 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 11 May 2006 08:04:54 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <000301c674ac$1d40f0f0$15327e82@pyrimidine>
References: <000301c674ac$1d40f0f0$15327e82@pyrimidine>
Message-ID:

Great - now we just need someone to volunteer to actually work on this.
The current code grabs most of this but I believe expects a different
XML.

On May 10, 2006, at 11:36 PM, Chris Fields wrote:

> I think you can get pretty much everything now, though I can
> definitely see the use of a local database. I ran a few tests, really
> unrelated to this, using the powerscripting test page at NCBI for
> eutils (for the curious, at
> http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/eu.cgi) and was able
> to retrieve XML-formatted taxonomic information; here's the bacterium
> Frankia sp. CcI3 TaxID info, which looks like they have everything set
> up by rank. It gives quite a bit of information.
>
>
> "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/taxon.dtd">
>
>
>
> 106370
> Frankia sp.
CcI3 > 1854 > species > Bacteria > > 11 > Bacterial and Plant Plastid > > > 0 > Unspecified > > cellular organisms; Bacteria; Actinobacteria; > Actinobacteria > (class); Actinobacteridae; Actinomycetales; Frankineae; Frankiaceae; > Frankia > > > 131567 > cellular organisms > no rank > > > 2 > Bacteria > superkingdom > > > 201174 > Actinobacteria > phylum > > > 1760 > Actinobacteria (class) > class > > > 85003 > Actinobacteridae > subclass > > > 2037 > Actinomycetales > order > > > 85013 > Frankineae > suborder > > > 74712 > Frankiaceae > family > > > 1854 > Frankia > genus > > > 1999/10/22 > 2005/01/19 > 2000/02/02 > > > > Chris > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Jason Stajich >> Sent: Wednesday, May 10, 2006 7:54 PM >> To: Sendu Bala >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Bio::Taxonomy confusion >> >> I would use the implementation that talks to the flatfile db as the >> standard here. nodes are defined by the data in from taxonomy dump >> dbs from ncbi. >> the eutils is pretty worthless except for taxid->name or reverse, you >> can't get the full taxonomy (or couldn't when that implementation was >> written). >> >> The "name" method refers to the name of the node - each level in the >> taxonomy can have a "name". >> >> The bits of hackiness relate to wrapping the node object as a >> Bio::Species and/or being able to read a genbank file and the >> organism taxonomy data as a list and instantiating. If we could rely >> on everything being in a DB of course this would be simpler. >> >> Another problem is the depth of the taxonomy is not constant for >> every node so assuming that a fixed number of slots will be filled in >> to generate the taxonomy leads to problems. 
>> >> Use the flatfile implementation (Bio::DB::Taxonomy::flatfile) as the >> best example of working code as this is how I really wanted it to >> work, the Bio::Species hacks are only there to shoehorn data >> retrieved from genbank files in. With the flatfile implementation >> you have to walk all the way up the db hierarchy to get the kingdom >> for a node so you do have to build up the classification hierarchy as >> each node only stores data about itsself. >> >> I'm not exactly sure what you are proposing to do, but would >> definitely enjoy another pair of hands, I don't really have time to >> mess with it any time soon. >> >> -jason >> On May 10, 2006, at 5:30 AM, Sendu Bala wrote: >> >>> Hi, >>> I'm a little confused as to how names are supposed to work in >>> Bio::Taxonomy::Node. >>> >>> In the bioperl versions that I've looked at a Node doesn't seem to >>> store >>> the most important information about itself - it's scientific name >>> - in >>> an obvious place. bioperl 1.5.1 puts it at the start of the >>> classification list. I'd have thought sticking it in -name would >>> make >>> more sense, but this is used only for the GenBank common name. >>> >>> The Bio::Taxonomy docs still suggests: >>> >>> my $node_species_sapiens = Bio::Taxonomy::Node->new( >>> -object_id => 9606, # or -ncbi_taxid. Requird tag >>> -names => { >>> 'scientific' => ['sapiens'], >>> 'common_name' => ['human'] >>> }, >>> -rank => 'species' # Required tag >>> ); >>> >>> and whilst Bio::Taxonomy::Node does not accept -names, it does >>> have a >>> 'name' method which claims to work like: >>> >>> $obj->name('scientific', 'sapiens'); >>> >>> This kind of thing would be really nice, but afaics >>> Bio::Taxonomy::Node->new takes the -name value and makes a common >>> name >>> out of it, whilst the name() method passes any 'scientific' name to >>> the >>> scientific_name() method which is unable to set any value (and warns >>> about this), only get. 
>>> >>> It seems like the need to have this classification array work the >>> same >>> way as Bio::Species is causing some unnecessary restrictions. Can't >>> the >>> more sensible idea of having a dedicated storage spot for the >>> ScientificName and other parameters be used, with the classification >>> array either being generated just-in-time from the hash-stored >>> data, or >>> indeed being generated from the Lineage field? >>> >>> >>> Also, why does a node store the complete hierarchy on itself in the >>> classification array? If we're going that far, why don't the >>> Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a >>> get_taxonomy() method instead of a get_Taxonomy_Node() method. >>> get_taxonomy() could, from a single efetch.fcgi lookup, create a >>> complete Bio::Taxonomy with all the nodes. Whilst most nodes would >>> only >>> have a minimum of information, if you could simply ask a node >>> what its >>> rank and scientific name was you could easily build a classification >>> array, or ask what Kingdom your species was in etc. >>> >>> Are there good reasons for Taxonomy working the way it does in >>> 1.5.1, or >>> would I not be wasting my time re-writing things to make more sense >>> (to me)? >>> >>> >>> Cheers, >>> Sendu. 
>>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From sb at mrc-dunn.cam.ac.uk Thu May 11 07:51:44 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Thu, 11 May 2006 12:51:44 +0100 Subject: [Bioperl-l] Bio::Taxonomy confusion In-Reply-To: <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu> References: <4461B2D3.7010603@mrc-dunn.cam.ac.uk> <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu> Message-ID: <44632550.3040603@mrc-dunn.cam.ac.uk> Jason Stajich wrote: > I would use the implementation that talks to the flatfile db as the > standard here. nodes are defined by the data in from taxonomy dump > dbs from ncbi. the eutils is pretty worthless except for taxid->name > or reverse, you can't get the full taxonomy (or couldn't when that > implementation was written). I'm not sure what you mean. In 1.5.1 you have access to the full taxonomy because you're using efetch.fcgi. Indeed, you parse the full taxonomy already to get the classification. > The "name" method refers to the name of the node - each level in the > taxonomy can have a "name". Yes, and to me the 'name of the node' is its scientific name (something like 'sapiens'), not a 'common' name. So why is it stored as a 'common' name in the object? Why don't the DB::Taxonomy modules store the actual common names (something like 'human')? > The bits of hackiness relate to wrapping the node object as a > Bio::Species and/or being able to read a genbank file and the > organism taxonomy data as a list and instantiating. 
> If we could rely on everything being in a DB of course this would be
> simpler.

I think that Taxonomy stuff could be done in a 'pure' way, with a new
Bio::Species made as a wrapper around an appropriate Taxonomy module(s)
that cheated and made fake nodes from a genbank list and then made a
proper Bio::Taxonomy.

> With the flatfile implementation you have to walk all the way up the
> db hierarchy to get the kingdom for a node so you do have to build up
> the classification hierarchy as each node only stores data about
> itself.

I'm still actually using bioperl 1.4 but I'm looking at 1.5.1 assuming it
is the latest available and I see that the flatfile implementation works
the same way as the entrez one. The requested node is fetched, but then
internally it walks the hierarchy purely so it can build a classification
list which is then stored on the object. If you're already retrieving
every node above the requested node, why not just return every node?
Why not just return a whole Bio::Taxonomy?

> I'm not exactly sure what you are proposing to do, but would
> definitely enjoy another pair of hands, I don't really have time to
> mess with it any time soon.

I shouldn't really be spending any time on it either, but I knocked up a
quick implementation for myself yesterday/today. I'm working on a bunch of
modules that inherit from bioperl and then add/alter to suit my needs. In
this regard they're a bit limited and kind of hard-coded to my way of
thinking, but hopefully you can see my intent and perhaps use some of my
implementation.

In my implementation:
# DB::Taxonomy::* return a Bio::Taxonomy equivalent with a single
  database lookup.
# The Taxonomy is implicitly a tree.
# The Taxonomy can have branches of different length from root to the
  same rank level.
# The Taxonomy isn't told what ranks it has (isn't limited by some
  supplied rank list); it has the ranks that its Nodes have and knows
  (without being told) what order those ranks should be in.
# The Taxonomy is made of Nodes that truly only contain information about
  themselves and have no classification array or anything like that.
# A Node can still be classified.
# We can have Nodes of rank 'no rank' that will be correctly ordered in
  the classification.
# Nodes have a scientific name and common names
# You get parent and all children nodes without database lookups.
# There is a Bio::Species like thing that wraps around this and gives
  easy access to what I really want to do:
  my $human = TFBS::Species->new(-common_name => 'human');
  my @classification = $human->classification; # returns the array you'd
      # expect from a normally created, fully classified Bio::Species
  my $kingdom = $human->kingdom; # returns 'Metazoa'
# For genbank, we can still supply TFBS::Species a classification array

http://bix.sendu.me.uk/files/taxonomy_the_tfbs_way.tar.gz
(only tested inheriting from bioperl 1.4, but ideally that shouldn't make
any difference!)

Is there any scope for bioperl Taxonomy becoming more like this? Or are
there problems with my design (quite likely!)? Or are there good reasons
for maintaining the current way of working? Please feel free to shoot me
down/discuss.

Cheers,
Sendu.

From sb at mrc-dunn.cam.ac.uk Thu May 11 08:22:53 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 11 May 2006 13:22:53 +0100
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To:
References: <000301c674ac$1d40f0f0$15327e82@pyrimidine>
Message-ID: <44632C9D.4010408@mrc-dunn.cam.ac.uk>

Jason Stajich wrote:
> Great - now we just need someone to volunteer to actually work on this.

Now I'm really confused...

> The current code grabs most of this but I believe expects a different XML

No, I think the code in bioperl 1.5.1 Bio::DB::Taxonomy::entrez expects
that XML, and parses it as fully as flatfile.pm does. Nothing more to do.
Weren't you the person that wrote that parser?
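
For illustration, the node-per-taxon design argued for in this thread (each
Taxon becomes a Node that knows only itself and its parent, and the
classification is derived on demand) might look roughly like the sketch
below. This is plain Perl with no BioPerl dependency; the package name
Sketch::TaxNode and its methods are hypothetical and are not part of BioPerl
or of the TFBS modules linked above. The lineage values are a subset of the
Frankia sp. CcI3 record quoted earlier in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical node class (an assumption for this sketch, not a BioPerl API):
# each node stores only its own data plus a reference to its parent.
package Sketch::TaxNode;

sub new {
    my ($class, %args) = @_;
    my $self = {
        id     => $args{id},
        name   => $args{name},      # scientific name
        rank   => $args{rank},
        parent => $args{parent},    # another Sketch::TaxNode, or undef for the root
    };
    return bless $self, $class;
}

sub name   { $_[0]{name} }
sub rank   { $_[0]{rank} }
sub parent { $_[0]{parent} }

# Build the classification on demand, most specific name first,
# by walking up the parent links; nothing is cached on the node.
sub classification {
    my ($self) = @_;
    my @names;
    for (my $node = $self; defined $node; $node = $node->parent) {
        push @names, $node->name;
    }
    return @names;
}

# Name of the first node (self or ancestor) that has the given rank.
sub name_at_rank {
    my ($self, $rank) = @_;
    for (my $node = $self; defined $node; $node = $node->parent) {
        return $node->name if $node->rank eq $rank;
    }
    return;
}

package main;

# Part of the Frankia sp. CcI3 lineage from this thread, root first.
my @lineage = (
    [ 131567, 'cellular organisms', 'no rank'      ],
    [ 2,      'Bacteria',           'superkingdom' ],
    [ 201174, 'Actinobacteria',     'phylum'       ],
    [ 1854,   'Frankia',            'genus'        ],
    [ 106370, 'Frankia sp. CcI3',   'species'      ],
);

# Chain the nodes together; the last node created is the species.
my $node;
for my $entry (@lineage) {
    my ($id, $name, $rank) = @$entry;
    $node = Sketch::TaxNode->new(
        id => $id, name => $name, rank => $rank, parent => $node,
    );
}

print join('; ', $node->classification), "\n";
print $node->name_at_rank('phylum'), "\n";
```

Because a 'no rank' node is just another link in the chain, it is ordered
correctly without any fixed list of rank slots, which is the property the
thread is asking for.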

I parse the same XML in my version of entrez.pm (see my previous email);
the main difference being I make Nodes out of each Taxon instead of just
adding each Taxon's ScientificName to the classification array.

From jason.stajich at duke.edu Thu May 11 09:53:56 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 11 May 2006 09:53:56 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <44632C9D.4010408@mrc-dunn.cam.ac.uk>
References: <000301c674ac$1d40f0f0$15327e82@pyrimidine> <44632C9D.4010408@mrc-dunn.cam.ac.uk>
Message-ID:

I guess so - long since forgotten what it supports though since I don't
regularly use it. Sorry.

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From cjfields at uiuc.edu Thu May 11 10:57:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 11 May 2006 09:57:20 -0500
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To:
Message-ID: <000b01c6750b$33e95ea0$15327e82@pyrimidine>

Heh...

To tell the truth, I haven't looked at Bio::DB::Taxonomy in any depth yet,
but I myself have seen issues with the way Bio::Species treats bacterial
strains (I guess this also involves Bio::Taxonomy::Node since that's what
Bio::Species delegates to). Seems it likes to repeat some strain names when
using $seq->species->common_name. Not a killer problem but annoying since
the correct name is in the source tag in the feature table! I 'could' take
a look at it but I can't guarantee quick results.

Jason, I could add Taxonomy to the EUtilities overhaul I mentioned to you
previously but it'll take a while to get going. I'm really more interested
in getting epost-esearch-efetch sequence retrieval up and running first with
the same API as Bio::DB::GenBank/Genpept and Bio::DB::Query::GenBank, donate
the code (late summer/fall???) after working out namespace issues so it
doesn't conflict with current Bio::DB::WebDBSeqI inheritance. I suppose I
could also look at Bio::DB::Taxonomy to see what's up in the next couple of
weeks (after conference), unless someone gets to it sooner.

Chris

From jason.stajich at duke.edu Thu May 11 11:42:07 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 11 May 2006 11:42:07 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <000b01c6750b$33e95ea0$15327e82@pyrimidine>
References: <000b01c6750b$33e95ea0$15327e82@pyrimidine>
Message-ID: <0C1C2DAC-F388-465E-B6C2-7217A3B4CC6C@duke.edu>

I think you'll see it is different and mostly a limitation of the genbank
format and the Bio::Species objects that you get from a genbank parse do
represent the full capabilities of a Taxonomy::Node.

I am happy for someone to overhaul things, but it all boils down to
inferring which part of a list of names is the species versus sub-species
versus strain when none of the members of the list are labeled. These are
some of the same problems we have for swissprot as well.

I just don't think we can do it right only from the genbank file data so I
don't see a lot of point of expecting Bio::Species to provide more than a
representation of what is in the file and just return that array. It has
seemed like we need to special case things pretty heavily or do a lookup in
the taxonomydb for something.

Can you guess what value is the strain versus sub-species?
What happens when there is a two part strain name (space separated) and a
sub-species or variety designation?

SOURCE      Staphylococcus haemolyticus JCSC1435
  ORGANISM  Staphylococcus haemolyticus JCSC1435
            Bacteria; Firmicutes; Bacillales; Staphylococcus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=279808
strain is JCSC1435

versus

SOURCE      Muntiacus muntjak vaginalis
  ORGANISM  Muntiacus muntjak vaginalis
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
            Euteleostomi; Mammalia; Eutheria; Laurasiatheria;
            Cetartiodactyla; Ruminantia; Pecora; Cervidae; Muntiacinae;
            Muntiacus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9887
species is muntjak, sub-species vaginalis?

versus

SOURCE      Aspergillus nidulans FGSC A4
  ORGANISM  Aspergillus nidulans FGSC A4
            Eukaryota; Fungi; Ascomycota; Pezizomycotina; Eurotiomycetes;
            Eurotiales; Trichocomaceae; Emericella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=227321
Genus should be Aspergillus or Emericella?

Strain and subspecies/variety in the same entry

SOURCE      Cryptococcus neoformans var. grubii H99
  ORGANISM  Cryptococcus neoformans var. grubii H99
            Eukaryota; Fungi; Basidiomycota; Hymenomycetes;
            Heterobasidiomycetes; Tremellomycetidae; Tremellales;
            Tremellaceae; Filobasidiella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=235443

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From WiersmaP at AGR.GC.CA Thu May 11 13:04:01 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Thu, 11 May 2006 13:04:01 -0400
Subject: [Bioperl-l] What is the relationship between primer3 module and run-primer3 module?
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C4D@onncrxms5.agr.gc.ca>

The bug that Wenwu referred to should only occur when reading a Primer3
output file; the Bio::Tools::Run::Primer3->run method takes the results and
directly transfers them to a Bio::Tools::Primer3 object without an
intermediate file. A Data::Dumper look at the Bio::Tools::Primer3 object
shows the keys and results for PRIMER_SEQUENCE_ID and SEQUENCE in 'results'
and then again in the 'results_by_number' hash but only in the '0' hash.

All of this doesn't really matter for Li's original concern. If you want to
include the id of the sequence along with the primer3 results just take it
from the seq object (i.e. $seq->display_id()).
Since you are in a loop taking one sequence at a time, this $seq will be the one that was sent to primer3.

PAW

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Cui, Wenwu (NIH/NCI) [F]
Sent: Wednesday, May 10, 2006 6:46 PM
To: chen li; bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] What is the relationship between primer3 moduleandrun-primer3 module?

1. Bio::Tools::Primer3 is already included in the Bio::Tools::Run::Primer3 module, so you can parse the result file.
2. There is a bug in Bio::Tools::Primer3.pm line 264 as I mentioned. Once fixed, it can output
3. primer3.exe is called in the Bio::Tools::Run::Primer3 "run" function; please read the function definition.

From cjfields at uiuc.edu Thu May 11 13:16:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 11 May 2006 12:16:19 -0500
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <0C1C2DAC-F388-465E-B6C2-7217A3B4CC6C@duke.edu>
Message-ID: <000f01c6751e$9e89d6a0$15327e82@pyrimidine>

> I think you'll see it is different and mostly a limitation of the
> genbank format and the Bio::Species objects that you get from a
> genbank parse do represent the full capabilities of a Taxonomy::Node.

I definitely see the rationale for using a TaxID lookup (I think Hilmar said so as well), especially for local databases. I wonder, though, if there is a way that RichSeqs like GenBank, when passed through SeqIO, can just be 'short-circuited' using the sequence builder to accept what's on the SOURCE or ORGANISM line of a file as is, without forcing it into Bio::Species/Bio::Taxonomy::Node.
Or maybe diminish the role of the SOURCE/ORGANISM lines altogether to just simple Annotation objects and place much greater emphasis on the TaxID itself, in effect decoupling the TaxID (taxonomic information) from SOURCE/ORGANISM (annotation information). In other words, have GenBank/EMBL classification lines and organism lines essentially stay like they are in the input file (use simple objects). Then, if one were really intent on getting the full name, classification, etc., or one wanted to store their sequences in bioperl-db, they would be required to either have a local db of NCBI Taxonomy or remote access to a similar database (NCBI or something else) so a lookup could be accomplished using the TaxID. If they use BioSQL, then require them to preload their BioSQL database with NCBI's taxonomy, something Hilmar already strongly suggests.

If anyone isn't interested in the taxonomic information or doesn't want to bother grabbing the database or setting up remote access, tough luck; just grab the Bio::Annotation/Bio::Species object and use that.

As the saying goes, "you can't be all things to all people." At some point you have to throw your arms in the air, do the best you can, but give up trying to please everyone.

> I am happy for someone to overhaul things, but it all boils down to
> inferring which part of a list of names is the species versus sub-species
> versus strain when none of the members of the list are labeled.
> This is some of the same problems we have for swissprot as well.
> I just don't think we can do it right only from the genbank
> file data so I don't see a lot of point of expecting Bio::Species to
> provide more than a representation of what is in the file and just
> return that array.
>
> It has seemed like we need to special case things pretty heavily or
> do a lookup in the taxonomydb for something.
>
> Can you guess what value is the strain versus sub-species? What
> happens when there is a two part strain name (space separated) and a
> sub-species or variety designation?
>
> SOURCE      Staphylococcus haemolyticus JCSC1435
>   ORGANISM  Staphylococcus haemolyticus JCSC1435
>             Bacteria; Firmicutes; Bacillales; Staphylococcus.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=279808
> strain is JCSC1435
>
> versus
> SOURCE      Muntiacus muntjak vaginalis
>   ORGANISM  Muntiacus muntjak vaginalis
>             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
>             Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
>             Pecora; Cervidae; Muntiacinae; Muntiacus.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9887
> species is muntjak, sub-species vaginalis ?
>
> versus
> SOURCE      Aspergillus nidulans FGSC A4
>   ORGANISM  Aspergillus nidulans FGSC A4
>             Eukaryota; Fungi; Ascomycota; Pezizomycotina; Eurotiomycetes;
>             Eurotiales; Trichocomaceae; Emericella.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=227321
>
> Genus should be Aspergillus or Emericella ?
>
> Strain and subspecies/variety in the same entry
> SOURCE      Cryptococcus neoformans var. grubii H99
>   ORGANISM  Cryptococcus neoformans var. grubii H99
>             Eukaryota; Fungi; Basidiomycota; Hymenomycetes;
>             Heterobasidiomycetes; Tremellomycetidae; Tremellales; Tremellaceae;
>             Filobasidiella.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=235443

Definitely tricky! This really points out the problem here. It used to be a problem for only a few cases, but with so many bacterial and fungal genomes that's changed. The Frankia XML example has the scientific name set to "Frankia sp. CcI3", which matches the SOURCE/ORGANISM line in NCBI's GenBank files and the OS line in EMBL files. It looks like the lines are parsed into and then built from the ground up in Bio::SeqIO::genbank using Bio::Species objects, which, in my case with the strain designation, is where the problem lies.
They could be placed in annotation objects with (-tagname => 'SOURCE', -value => 'Frankia sp. CcI3') or similar settings. Or simplify Bio::Species to only represent the information in the GenBank SOURCE/ORGANISM/CLASSIFICATION or EMBL OS/OC lines and nothing more complex than that (no complex taxonomy; for that you use the TaxID and a local database).

Okay, I need to lay off the coffee now...

Chris

> On May 11, 2006, at 10:57 AM, Chris Fields wrote:
>
> > Heh...
> >
> > To tell the truth, I haven't looked at Bio::DB::Taxonomy in any depth yet,
> > but I myself have seen issues with the way Bio::Species treats bacterial
> > strains (I guess this also involves Bio::Taxonomy::Node since that's what
> > Bio::Species delegates to). Seems it likes to repeat some strain names when
> > using $seq->species->common_name. Not a killer problem but annoying since
> > the correct name is in the source tag in the feature table! I 'could' take
> > a look at it but I can't guarantee quick results.
> >
> > Jason, I could add Taxonomy to the EUtilities overhaul I mentioned to you
> > previously but it'll take awhile to get going. I'm really more interested
> > in getting epost-esearch-efetch sequence retrieval up and running first with
> > the same API as Bio::DB::GenBank/Genpept and Bio::DB::Query::GenBank, donate
> > the code (late summer/fall???) after working out namespace issues so it
> > doesn't conflict with current Bio::DB::WebDBSeqI inheritance. I suppose I
> > could also look at Bio::DB::Taxonomy to see what's up in the next couple of
> > weeks (after conference), unless someone gets to it sooner.
> > > > Chris > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of Jason Stajich > >> Sent: Thursday, May 11, 2006 7:05 AM > >> To: Chris Fields > >> Cc: bioperl-l at lists.open-bio.org; 'Sendu Bala' > >> Subject: Re: [Bioperl-l] Bio::Taxonomy confusion > >> > >> Great - now we just need someone to volunteer to actually work on > >> this. > >> > >> The current code grabs most of this but I believe expects a different > >> XML > >> > >> > >> On May 10, 2006, at 11:36 PM, Chris Fields wrote: > >> > >>> I think you can get pretty much everything now, though I can > >>> definitely see > >>> the use of a local database. I ran a few tests, really unrelated > >>> to this, > >>> using the powerscripting test page at NCBI for eutils (for the > >>> curious, at > >>> http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/eu.cgi) and was > >>> able to > >>> retrieve XML-formatted taxonomic information; here's the bacterium > >>> Frankia > >>> sp. CcI3 TaxID info, which looks like they have everything set up > >>> by rank. > >>> It gives quite a bit of information. > >>> > >>> > >>> >>> "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/taxon.dtd"> > >>> > >>> > >>> > >>> 106370 > >>> Frankia sp. 
> >>>>>
> >>>>> Are there good reasons for Taxonomy working the way it does in 1.5.1, or
> >>>>> would I not be wasting my time re-writing things to make more sense
> >>>>> (to me)?
> >>>>>
> >>>>> Cheers,
> >>>>> Sendu.
> >>>>
> >>>> --
> >>>> Jason Stajich
> >>>> Duke University
> >>>> http://www.duke.edu/~jes12
> >>>
> >>
> >> --
> >> Jason Stajich
> >> Duke University
> >> http://www.duke.edu/~jes12
> >
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12

From WiersmaP at AGR.GC.CA Thu May 11 20:13:12 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Thu, 11 May 2006 20:13:12 -0400
Subject: [Bioperl-l] What is the relationship between primer3 module andrun-primer3 module?
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C52@onncrxms5.agr.gc.ca>

Li,

If you are only "a little confused" by the OO concepts in the primer3 modules then you are doing well. To expand a little on Wenwu's explanations:

A Bio::Tools::Run::Primer3 object is a "wrapper" around the Primer3 program. All the commands and parameters that Primer3 needs for it to run are collected inside the object. This includes a sequence (which you must supply as a sequence object) and parameters (most of which are already supplied by default but can be changed using the $primer3_object->add_targets method).
Then, when everything is set the way you want it you 'run' the Primer3 program by using $primer3_object->run. The "wrapper" collects all the run parameters and sends them off to the Primer3 executable. Primer3 does the analysis and outputs the results to "stdout" in boulder-io format.

By redirecting the output (i.e. perl p3run_script.pl > out.txt) you will get the Primer3 output directly in the boulder-io format ('tag'='value') stored in out.txt. Because out.txt is not being closed between each sequence called in the script you get all of the results concatenated in out.txt. However, if you supplied an output filename (-outfile=>$file_out) in the "wrapper", each line of output from Primer3 will be written to $file_out and at the end of Primer3 output the file will be closed. Now if your script loops to another sequence it will open the same outfile again and overwrite.

One last important detail for the "wrapper" object. When Primer3 is executed the $primer3_object is designed to return a Bio::Tools::Primer3 object (the code is: my $results_object = $primer3_object->run). $results_object is a Bio::Tools::Primer3 object and contains the results of your Primer3 run as well as having methods for getting at that information. This includes finding out how many primer sets were found and the means to access the primer set results one at a time. It does work as advertised.

Because all of the primer sets are based on the same sequence, Primer3 only outputs the SEQUENCE and PRIMER_SEQUENCE_ID one time instead of for each primer set. That is why they only show up in $results_object as if they belonged with the first primer set (set '0') and they are not available for the other primer sets.

PAW

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca
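
The boulder-io point above (one 'tag'='value' pair per line, with the results of several runs concatenated in out.txt) can be illustrated with a minimal pure-Perl sketch. It assumes each Primer3 record ends with a line holding a lone "=", which is how the Primer3 boulder-io output is normally terminated. The helper name parse_boulder_records and the sample data are made up for illustration and are not part of BioPerl; PRIMER_LEFT_SEQUENCE and PRIMER_RIGHT_SEQUENCE are assumed to be the tags for the first primer pair.

```perl
use strict;
use warnings;

# Parse concatenated boulder-io text (TAG=VALUE lines, a lone "=" ends a
# record) into a list of hash references, one per Primer3 run.
# Hypothetical helper, not a BioPerl function.
sub parse_boulder_records {
    my ($text) = @_;
    my (@records, %current);
    for my $line (split /\r?\n/, $text) {
        if ($line eq '=') {                    # record terminator
            push @records, {%current} if %current;
            %current = ();
        }
        elsif ($line =~ /^([^=]+)=(.*)$/) {    # split on the first "=" only
            $current{$1} = $2;
        }
    }
    push @records, {%current} if %current;     # tolerate a missing final "="
    return @records;
}

# Made-up two-record sample standing in for the contents of out.txt.
my $sample = join "\n",
    'PRIMER_SEQUENCE_ID=seq1',
    'PRIMER_LEFT_SEQUENCE=ACGTACGTACGTACGTACGT',
    'PRIMER_RIGHT_SEQUENCE=TGCATGCATGCATGCATGCA',
    '=',
    'PRIMER_SEQUENCE_ID=seq2',
    'PRIMER_LEFT_SEQUENCE=GGGTTTAAACCCGGGTTTAA',
    'PRIMER_RIGHT_SEQUENCE=CCCAAATTTGGGCCCAAATT',
    '=';

# One tab-separated line per record, carrying the sequence id along with
# the first primer pair.
for my $rec (parse_boulder_records($sample)) {
    print join("\t", map { $rec->{$_} // 'NA' }
        qw(PRIMER_SEQUENCE_ID PRIMER_LEFT_SEQUENCE PRIMER_RIGHT_SEQUENCE)), "\n";
}
```

Because the sequence id is stored once per record, keeping each record as its own hash makes it available next to every primer pair of that record, which is the same effect as taking $seq->display_id() inside the loop.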
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chen li
Sent: Wednesday, May 10, 2006 5:28 PM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] What is the relationship between primer3 module andrun-primer3 module?

First, thank you all for replying to my previous post about primer3. But now I am a little confused even after I read the documents: What is the relationship between these two modules? What is the correct/standard way to use them to do batch primer design?

What I do is use Bio::Tools::Run::Primer3 to design primers. Based on Dr. Roy Chaudhuri's information I can set the parameters using the following syntax:

  $primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');

Based on Paul A. Wiersma's explanation I can also print out part of the primer results (because I don't need all the information). But there is a little trouble: PRIMER_SEQUENCE_ID can't be accessed using this method. And Paul points out that "PRIMER_SEQUENCE_ID and SEQUENCE are not part of the individual results but only end up by default with $results->primer_results(0)". So it seems there is no way to get around this problem using Bio::Tools::Run::Primer3. And others suggest using Bio::Tools::Primer3 to parse the results.

So is it true that Bio::Tools::Run::Primer3 is for primer design and Bio::Tools::Primer3 is for parsing the results from Bio::Tools::Run::Primer3? But what I find is that I get almost all the results (except PRIMER_SEQUENCE_ID and SEQUENCE) without putting the line "use Bio::Tools::Primer3" in the script. How to explain this? Is it because of the following line of code?

  my $result=$primer3->run;

The last question: which line of code is used to invoke the program primer3.exe? How does the Perl script call primer3.exe?

Once again thank you all very much,

Li

From torsten.seemann at infotech.monash.edu.au Fri May 12 00:29:37 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 12 May 2006 14:29:37 +1000
Subject: [Bioperl-l] Using bioperl to convert gene predictions to gff
In-Reply-To: <000301c6698f$b17a4d20$0202a8c0@GosinkFranklin>
References: <000301c6698f$b17a4d20$0202a8c0@GosinkFranklin>
Message-ID: <44640F31.6090702@infotech.monash.edu.au>

Mark,

> I'd like to reformat gene predictions from several different programs
> (genscan, glimmerhmm, fgenesh) to gff format. I know bioperl can parse the
> output from these and other predictors and that it can export into GFF. But
> I'm not clear on how to string the two together.
> Can anyone point me at any example code?

The parser modules for the gene predictions generally allow you to iterate through the predicted genes. Each prediction is usually returned as a Bio::SeqFeatureI-derived object. Those objects have a gff_string() method to print them as GFF.

So something as simple as this *may* work:

  use Bio::Tools::Glimmer;
  my $parser = Bio::Tools::Glimmer->new(-file => 'glimmer.out');
  while (my $gene = $parser->next_prediction) {
    # gff_string() returns the line without a trailing newline
    print $gene->gff_string, "\n";
  }

If you want separate GFF lines for each exon, you'll have to do another loop over $gene->exons() etc., each of which are luckily also Bio::SeqFeatures!

Or if you want to modify some of the GFF columns first, e.g. the source tag, just do $gene->source_tag('mynewtag') before printing it.
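
The two suggestions above (an inner loop over the exons, and changing the source tag before printing) could be combined roughly as follows. This is only a sketch under assumptions: predictions_to_gff is a hypothetical helper name, not a BioPerl function; it relies on the parser offering next_prediction() and on each prediction offering exons(), source_tag() and gff_string() as described in the message; and the Bio::Tools::Glimmer constructor call simply repeats the one shown above, which may need extra arguments depending on the BioPerl version.

```perl
use strict;
use warnings;

# Print one GFF line per predicted gene, followed by one line per exon.
# $parser : any object with next_prediction() (e.g. a Bio::Tools::Glimmer
#           or Bio::Tools::Genscan parser)
# $out    : output filehandle
# $source : optional replacement for the GFF source column
# Returns the number of genes written. Hypothetical helper name.
sub predictions_to_gff {
    my ($parser, $out, $source) = @_;
    my $genes = 0;
    while (my $gene = $parser->next_prediction) {
        $gene->source_tag($source) if defined $source;
        print {$out} $gene->gff_string, "\n";
        for my $exon ($gene->exons) {
            $exon->source_tag($source) if defined $source;
            print {$out} $exon->gff_string, "\n";
        }
        $genes++;
    }
    return $genes;
}

# Usage (assumed): perl pred2gff.pl glimmer.out > predictions.gff
if (@ARGV) {
    require Bio::Tools::Glimmer;    # loaded only when a file is supplied
    my $parser = Bio::Tools::Glimmer->new(-file => $ARGV[0]);
    predictions_to_gff($parser, \*STDOUT, 'mynewtag');
}
```

Passing the parser in as an argument keeps the loop independent of the prediction program, so the same helper should work for any parser that follows the next_prediction() convention.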
Hope this helps,

--
Dr Torsten Seemann http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia

From torsten.seemann at infotech.monash.edu.au Fri May 12 00:36:46 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 12 May 2006 14:36:46 +1000
Subject: [Bioperl-l] Bio::Graphics::Panel imagemap making with Bio::Graphics::Panel
In-Reply-To: <5b6410e0605030120q31d1f554mbc4bf104deca48bf@mail.gmail.com>
References: <5b6410e0605030120q31d1f554mbc4bf104deca48bf@mail.gmail.com>
Message-ID: <446410DE.7070305@infotech.monash.edu.au>

Kevin,

> I want to create an imagemap of short sequence matches with a longer one
> with clickable imagemaps for the short sequences. I figure I can do this
> easily enough using the example script for parsing blast output but I need
> an example script to understand how to produce the html code for the
> imagemap. I can find only rather cryptic references about how this can be
> done (see below).

The "blastGraphic" project probably has Perl code that could help you.

http://www.gmod.org/blastGraphic.shtml

It is/was part of the GMOD project. It produces pretty clickable image maps from BLAST reports.

Hope it helps,

--
Dr Torsten Seemann http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia

From brianjgilmartin at hotmail.com Fri May 12 05:29:15 2006
From: brianjgilmartin at hotmail.com (brian gilmartin)
Date: Fri, 12 May 2006 10:29:15 +0100
Subject: [Bioperl-l] (no subject)
Message-ID:

please remove me from the list

From sb at mrc-dunn.cam.ac.uk Fri May 12 06:24:39 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Fri, 12 May 2006 11:24:39 +0100
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names
Message-ID: <44646267.2000802@mrc-dunn.cam.ac.uk>

In bioperl up to at least 1.5.1, when one of the database modules comes across a species rank it does:

  if ($rank eq 'species') {
    # get rid of genus from species name
    (undef,$taxon_name) = split(/\s+/,$taxon_name,2);
  }

However, even though the true scientific name is usually 'Genus species' in the database, note the 'usually' - sometimes the species is a multiword item that does not include the Genus, so we can't do some simple split and take the second word. The same applies to levels below species, e.g. 'Avian erythroblastosis virus' is a variant of the species 'Avian leukosis virus' but 'Avian erythroblastosis virus (strain ES4)' is a variant of that variant...

My solution is to just remove whatever is the same between the current rank and the previous rank. Maybe even that's not so perfect, but it must be a lot better than turning the species 'Avian leukosis virus' into the species 'virus' (especially given that the genus here is 'Alpharetrovirus')!

  # we need to be going root(kingdom) -> leaf (species or lower) order
  #
  # we need to be storing untouched versions of the scientific name of
  # the previous rank ($self->{_last_raw})
  #
  # probably only bother start doing this when we get to genus
  my $last_raw = $self->{_last_raw} || undef;
  $self->{_last_raw} = $sci_name;
  if ($last_raw) {
    # \Q...\E so that names containing '(' or '.' are matched literally
    $sci_name =~ s/\Q$last_raw\E//;
    $sci_name =~ s/^\s+//;
  }

Are there even more strange species (and lower) names that would still not work well with the above solution?

Cheers,
Sendu.
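
As a self-contained illustration of the "remove whatever the previous rank already said" idea, here is a minimal sketch. It is a variant rather than the code proposed above: the helper name strip_parent_name is hypothetical, the match is anchored to the start of the name, and the parent name is quoted with \Q...\E so that names containing parentheses (such as '(strain ES4)') are treated literally. The ranks in the sample lineage are placeholders chosen for illustration only.

```perl
use strict;
use warnings;

# Remove the parent rank's scientific name from the front of a child's
# name. The child is returned unchanged when it does not start with the
# parent's name. Hypothetical helper, not part of Bio::DB::Taxonomy.
sub strip_parent_name {
    my ($parent, $child) = @_;
    return $child unless defined $parent && length $parent;
    (my $short = $child) =~ s/^\Q$parent\E\s+//;
    return $short;
}

# Walk a lineage in root -> leaf order, remembering the untouched name of
# the previous rank, as the message describes. Rank labels are placeholders.
my @lineage = (
    [ 'genus',   'Alpharetrovirus' ],
    [ 'species', 'Avian leukosis virus' ],
    [ 'no rank', 'Avian erythroblastosis virus' ],
    [ 'no rank', 'Avian erythroblastosis virus (strain ES4)' ],
);

my $last_raw;
for my $node (@lineage) {
    my ($rank, $name) = @$node;
    printf "%-8s %s\n", $rank, strip_parent_name($last_raw, $name);
    $last_raw = $name;    # keep the raw name for the next level down
}
```

With this variant a species name that does not begin with its genus is left untouched, while a strain that repeats its parent's full name is reduced to the part that is new at that level.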
From s_maheshwari84 at rediffmail.com Fri May 12 09:55:49 2006
From: s_maheshwari84 at rediffmail.com (saurabh maheshwari)
Date: 12 May 2006 13:55:49 -0000
Subject: [Bioperl-l] problem help me...........please
Message-ID: <20060512135549.27106.qmail@webmail9.rediffmail.com>

hello

I am a student at the Center for DNA Finger Printing and Diagnostics (CDFD). I am working on protein-protein interaction but I am unable to use the protein interaction module, i.e. ProteinGraph.pm. Actually I am facing lots of problems in the programme I have written. Please help me; for the last four months I have not been able to solve the same problem. I am pasting my programme here and I am also attaching it.

......

#!usr/bin/perl
use lib "/usr/local/bioxapps/bioperl/library/";
use strict;
use Bio::Graph::SimpleGraph;
use Bio::Graph::IO;
our @ISA=qw( Bio::SeqI);
use Bio::Graph::Edge;
use Bio::Graph::IO::dip;
use Bio::Graph::IO::psi_xml;
use Clone qw(clone);
use vars qw(@ISA);
use Bio::AnnotatableI;
use Bio::IdentifiableI;
our @ISA = qw(Bio::Graph::SimpleGraph);
@ISA = qw(Bio::Graph::IO);
our @ISA=qw(Expoerter);
use Bio::Graph::ProteinGraph;
use Class::AutoClass;
use Bio::Graph::SimpleGraph::Traversal;

my $graphio = Bio::Graph::IO->new(-file => '/users/saurabh/perl_program/sample1.txt',-format => 'dip');
print "$graphio";
my $graph = $graphio->next_network();
print "$graph->nodes\t";
$graph->remove_dup_edges();
my @un=$graph->unconnected_nodes();
print "\nthe unconnected nodes are =@un";
my @n=$graph->subgraph();
print "\subgraph=@n\n";
#print "Please the protein-id whose clusering coefficient is to be detemined\n";
#my $v=<STDIN>;
my $density = $graph->density();
print "\ngraph density=$density\n";
my @graphs = $graph->components();
print "\nno of Connected components=$#graphs\n";
print "\nplease enter the protein-id whom you want to remove from the network\n";
my $no=<STDIN>;
$graph->remove_nodes($graph->nodes_by_id($no));
my $count = $graph->edge_count();
print "\nno of edges=$count\n ";
my $ncount = $graph->node_count();
print "\nno of nodes=$ncount\n ";
print"\nenter the protein whose interactions is to be find ";
my $x=<STDIN>;
my $node = $graph->nodes_by_id($x);
#print " this is $node\n";
my @neighbors = $graph->neighbors($node);
print "to check";
print join",",map{$_->object_id()} @neighbors;
my @nodes = $graph->nodes();
print "\nno of nodes = @nodes\t\n";
my @hubs;
foreach my $nodi (@nodes) {
  if ($graph->neighbor_count($node) > 10) {
    push @hubs, $nodi;
  }
}
foreach my $r(@hubs) {
  my @y=@$r;
  print "the following proteins have > 10 interactors=@y\n";
}
#siblingual protein
my @edgeref = $graph->articulation_points();
print "no of articulation points=$#edgeref\n";
print "please enter the protein whom you want to check for articulation point \n ";
my $nod=<STDIN>;
# make pathgen graph
my $grap = Bio::Graph::IO->new(-file => 'org.txt',-format => 'dip');
my $gra = $grap->next_network();
$graph->remove_dup_edges();
$graph->union($gra);
my @duplicates = $graph->dup_edges();
print "these interactions exist in cere and c.elegan\n=@duplicates";
print "please enter the first protein for identifiaction of shortest path\n";
my $p1=<STDIN>;
print "please enter the second protein for identifiaction of shortest path\n";
my $p2=<STDIN>;
my @a=$graph->shortest_paths();
print "shortest path=@a\t\n";

with Regards
SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI

-------------- next part --------------
A non-text attachment was scrubbed...
Name: from.pl
Type: application/octet-stream
Size: 2723 bytes
Desc: not available
URL:

From chen_li3 at yahoo.com Thu May 11 13:47:33 2006
From: chen_li3 at yahoo.com (chen li)
Date: Thu, 11 May 2006 10:47:33 -0700 (PDT)
Subject: [Bioperl-l] script for batch-primer design using primer3 module
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C4D@onncrxms5.agr.gc.ca>
Message-ID: <20060511174733.68836.qmail@web36812.mail.mud.yahoo.com>

Hi all,

With the valuable input from many of you, I have finally come up with a script for my personal needs:

1) batch primer design
2) setting some of the parameters instead of using all the default values
3) outputting only part of the information for the first pair of primers rather than all of it (but you can choose)
4) results that can be exported into Excel for my convenience.

Enclosed are the script and the tested results. I have also included some lines showing how I figured out which keys/entries are available to change. If you don't want the sequence part, just add # to comment it out.

Any comments are welcome. BTW, the solution suggested by Dr. Cui and Paul doesn't work for me.

Once again, thank you very much,

Li

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: primer3-5
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: result1.txt
URL:

From Marc.Logghe at DEVGEN.com Fri May 12 11:28:55 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Fri, 12 May 2006 17:28:55 +0200
Subject: [Bioperl-l] problem help me...........please
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746DAB@ANTARESIA.be.devgen.com>

Hi,

What is actually the problem? Do you have errors? Is the script not behaving as you expect? You also might attach the input file sample1.txt so that people can try it.
Regards,
Marc

From stoltzfu at umbi.umd.edu Fri May 12 11:56:06 2006
From: stoltzfu at umbi.umd.edu (Arlin Stoltzfus)
Date: Fri, 12 May 2006 11:56:06 -0400
Subject: [Bioperl-l] proposal: Bio::CDAT (character data and trees)
Message-ID:

Dear developers--

We propose a Bio::CDAT (Character Data And Trees) module to facilitate comparative analysis using evolutionary methods by 1) managing evolutionary relationships (by linking data to trees) and 2) allowing coordinated analysis of different types of data (by implementing a generic concept of "character-state" data). Bio::CDAT would leverage existing BioPerl objects and include the functionality of Rutger Vos's Bio::Phylo. It would provide the framework to develop interfaces to analysis tools (phylogeny inference, evolutionary rate models, functional shift inference, etc), as well as to file formats and visualization methods appropriate for such analyses.

A proposal is available at http://www.molevol.org/camel/projects/CDAT-proposal.pdf

We would like to hear your thoughts (e.g., see the section on "Questions to consider")!

Thanks

Arlin Stoltzfus
WeiGang Qiu
Rutger Vos
(with thanks to Justin Reese and Aaron Mackey)

------------------
Arlin Stoltzfus (stoltzfu at umbi.umd.edu)
CARB, 9600 Gudelsky Drive, Rockville, Maryland 20850
tel 240 314 6208, fax 240 314 6255, www.molevol.org/camel

From sdavis2 at mail.nih.gov Fri May 12 11:54:57 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Fri, 12 May 2006 11:54:57 -0400
Subject: [Bioperl-l] problem help me...........please
In-Reply-To: <20060512135549.27106.qmail@webmail9.rediffmail.com>
Message-ID:

On 5/12/06 9:55 AM, "saurabh maheshwari" wrote:

>
> hello
> I am a studnt at Center for DNA Finger Printing and Diagnostics(CDFD).
> I am working on protein protein interaction but I am unable to use the protein
> interaction module i.e. ProteinGraph.pm..
> Actially I am facing lots of problem in the programme I have written Please
> help me since last four months I am not able to solve the same problem..
> I am pasting my programe here also I am attaching it also. ......

You haven't really told us what you are trying to do or what problems you are having.

Sean

From cjfields at uiuc.edu Fri May 12 13:08:11 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 12 May 2006 12:08:11 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names
In-Reply-To: <44646267.2000802@mrc-dunn.cam.ac.uk>
Message-ID: <000f01c675e6$a61bde90$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Friday, May 12, 2006 5:25 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles
> species,subspecies/variant names
>
> In bioperl up to at least 1.5.1, when one of the database modules comes
> across a species rank it does:
>
> if ($rank eq 'species') {
>     # get rid of genus from species name
>     (undef,$taxon_name) = split(/\s+/,$taxon_name,2);
> }

The XML example from NCBI Taxonomy I mentioned previously seems to have everything in the classification, from superkingdom down to species (no strain unfortunately, and I'm not sure about subspecies); if it's missing the rank then the designation doesn't exist or is tagged as 'no rank'. Like I mentioned before, I'm not intimately familiar with Bio::Taxonomy, Bio::DB::Taxonomy, or Bio::Species, so I don't have a clue as to how everything is parsed and plugged in to Bio::Taxonomy objects. I do know that XML::Twig is used for parsing through the data, so it shouldn't be too hard to change what you want.
I haven't tried using Bio::DB::Taxonomy directly yet, but I would have thought that the binomial is just built from the XML twig 'LineageEx' Rank=Genus + Rank=Species, that the genus comes from the tag 'Genus' and species from 'Species', and that the scientific name is from the tag 'ScientificName'. Guess not.

> However even though true scientific name is usually 'Genus species' in
> the database, note the 'usually' - sometimes the species is a multiword
> item that does not include the Genus, so we can't do some simple split
> and take the second word.
> The same applies to levels below species, eg. 'Avian erythroblastosis
> virus' is a variant of the species 'Avian leukosis virus' but 'Avian
> erythroblastosis virus (strain ES4)' is a variant of that variant...
>
> My solution is to just remove whatever is the same between the current
> rank and the previous rank. Maybe even that's not so perfect, but it
> must be a lot better than turning the species 'Avian leukosis virus'
> into the species 'virus' (especially given that the genus here is
> 'Alpharetrovirus')!
>
> # we need to be going root(kingdom) -> leaf (species or lower) order
> #
> # we need to be storing untouched versions of the scientific name of
> # the previous rank ($self->{_last_raw})
> #
> # probably only bother start doing this when we get to genus
> my $last_raw = $self->{_last_raw} || undef;
> $self->{_last_raw} = $sci_name;
> if ($last_raw) {
>     $sci_name =~ s/$last_raw//;
>     $sci_name =~ s/^\s+//;
> }
>
> Are there even more strange species (and lower) names that would still
> not work well with the above solution?

I don't think taking Genus/Species directly from the scientific name (normally what is in the SOURCE or ORGANISM annotation for GenBank or OS for EMBL) is the best way to go about it, since it's really a best guess using a regex; Jason pointed out several examples where this falls apart, and being a bacterial man I have found many examples myself.
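One detail of the quoted snippet can be shown in isolation: names such as 'Avian erythroblastosis virus (strain ES4)' contain regex metacharacters, so interpolating a raw name into s/$last_raw// may not match literally. The following is only a sketch in plain Perl, not BioPerl code. The function name strip_parent_name is hypothetical, and anchoring the match at the start of the name (the quoted code matches anywhere) is an assumption made here for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): remove the parent rank's
# name from the front of a child rank's name.
# \Q...\E quotes regex metacharacters such as the parentheses in
# "(strain ES4)", which a bare s/$parent// would treat as a capture group.
sub strip_parent_name {
    my ($name, $parent) = @_;
    return $name unless defined $parent && length $parent;
    my $short = $name;
    # Assumption: only strip at the start, and only on a whole-word match
    $short =~ s/^\Q$parent\E(?:\s+|\z)//;
    # If stripping leaves nothing, keep the full name
    return length $short ? $short : $name;
}

# Lineage taken from the example names in the message above
my @lineage = (
    'Alpharetrovirus',
    'Avian leukosis virus',
    'Avian erythroblastosis virus',
    'Avian erythroblastosis virus (strain ES4)',
);

my $previous;
for my $raw (@lineage) {
    printf "%-45s => %s\n", $raw, strip_parent_name($raw, $previous);
    $previous = $raw;    # store the untouched name for the next rank
}
```

With these inputs the species keeps its full name 'Avian leukosis virus' rather than becoming 'virus', and only the strain entry is shortened.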
I'm also not sure that forcing a lookup for every TaxID in every sequence every time it's passed through SeqIO is the best way to go either, though I think it should be required for storing sequences. It's a tricky balance.

I still think that maybe we should absolve ourselves from using SOURCE/ORGANISM or OS/OC information in GenBank files as anything more than strictly annotation, or reconstruct Bio::Species to maybe a Bio::Annotation::Species object to handle that annotation and either deprecate Bio::Species or separate it completely from any Bio::Taxonomy objects. It would really simplify things. Then, if anyone is interested in taxonomy, either install a local database or use Entrez efetch, and then use Bio::DB::Taxonomy (fixed of course) to grab the TaxID info. Seems like we're running more and more into exceptions to the rule as more genomes are made available.

Anyway, using Bio::Species for GenBank is really screwy for bacterial names, so currently I get around BioPerl issues with bacterial names by grabbing the 'source' seqfeature and pulling the 'organism' tag out. But it really shouldn't be that obfuscated, right?

Chris

> Cheers,
> Sendu.

From sdavis2 at mail.nih.gov Sat May 13 08:19:21 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Sat, 13 May 2006 08:19:21 -0400
Subject: [Bioperl-l] problem help me...........please
In-Reply-To: <20060513041853.16091.qmail@webmail31.rediffmail.com>
References: <20060513041853.16091.qmail@webmail31.rediffmail.com>
Message-ID: <4465CEC9.2010909@mail.nih.gov>

saurabh maheshwari wrote:
>
> hello
> Thanks for your prompt reply.
> Actaully I am trying to make a protein interaction graph from a dip
> file.But I am not able to do so.In my last mail I have already attached
> my program which is giving some error and I am not able troble shot
> them.Please help
> Thanks

I meant that since we don't know what error(s) you are getting, it is really not possible to determine what the problem is. Also, someone else on the list offered to look at your code if you were to provide the input file.

I find it helpful to look at this webpage every now and then to remind myself what constitutes a useful question to email lists:

http://www.catb.org/~esr/faqs/smart-questions.html

Sean

From s_maheshwari84 at rediffmail.com Sat May 13 01:17:58 2006
From: s_maheshwari84 at rediffmail.com (saurabh maheshwari)
Date: 13 May 2006 05:17:58 -0000
Subject: [Bioperl-l] problem help me...........please
Message-ID: <20060513051758.4610.qmail@webmail31.rediffmail.com>

Hello,

I am very happy to see the prompt reply from the group members.
As you all suggested that I attach the required files, I have attached all three: first the input file, second an error file in which I saved the errors I was getting, and third the program file.

There is something in the error file that I want to understand. Here is one line from it:

## no of nodes = Bio::Seq::RichSeq=HASH(0x11aa700) ##

What does this stand for?

Second, I want to get the connected graph. Let me explain by example which type of connected graph I mean. Let there be five objects such that:

A connected to B
A connected to C
B connected to C
D connected to C
E connected to A

I want to create a whole link between all five.

Please help me, I am not getting the result.

with Regards
SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sample.dip
Type: application/octet-stream
Size: 5794 bytes
Desc: not available
URL:
-------------- next part --------------
bash-2.05b$ perl from.pl
Bio::Graph::ProteinGraph=HASH(0x1182e70) Bio::Graph::ProteinGraph=HASH(0x1182e70)->nodes the unconnected nodes are =subgraph=Bio::Graph::SimpleGraph=HASH(0x11e2160) graph density=0.00826446280991736 no of Connected components=60 please enter the protein-id whom you want to remove from the network XMECF2 no of edges=61 no of nodes=122 enter the protein whose interactions is to be find XMECF2 XMECF2 interacts with map{->object_id()} no of nodes = Bio::Seq::RichSeq=HASH(0x11aa700) Bio::Seq::RichSeq=HASH(0x11d1850 ) Bio::Seq::RichSeq=HASH(0x11bd4c0) Bio::Seq::RichSeq=HASH(0x11c2fd0) Bio::Seq:: RichSeq=HASH(0x11aa7f0) Bio::Seq::RichSeq=HASH(0x1198340) Bio::Seq::RichSeq=HASH (0x11d81a0) Bio::Seq::RichSeq=HASH(0x11ca320) Bio::Seq::RichSeq=HASH(0x11b5e40) Bio::Seq::RichSeq=HASH(0x1190e00) Bio::Seq::RichSeq=HASH(0x11c1350) Bio::Seq::Ri chSeq=HASH(0x11b2e20) Bio::Seq::RichSeq=HASH(0x11cb360) Bio::Seq::RichSeq=HASH(0 x1198250) Bio::Seq::RichSeq=HASH(0x11d0240)
Bio::Seq::RichSeq=HASH(0x11c8f20) Bi o::Seq::RichSeq=HASH(0x11b4ef0) Bio::Seq::RichSeq=HASH(0x119f7a0) Bio::Seq::Rich Seq=HASH(0x11c2ee0) Bio::Seq::RichSeq=HASH(0x11dba20) Bio::Seq::RichSeq=HASH(0x1 1e2300) Bio::Seq::RichSeq=HASH(0x11b2f10) Bio::Seq::RichSeq=HASH(0x11b4b90) Bio: :Seq::RichSeq=HASH(0x11d4df0) Bio::Seq::RichSeq=HASH(0x11d4b80) Bio::Seq::RichSe q=HASH(0x11d8e70) Bio::Seq::RichSeq=HASH(0x11a1270) Bio::Seq::RichSeq=HASH(0x11c b5d0) Bio::Seq::RichSeq=HASH(0x11d5cc0) Bio::Seq::RichSeq=HASH(0x11d32a0) Bio::S eq::RichSeq=HASH(0x11b4c80) Bio::Seq::RichSeq=HASH(0x119e0c0) Bio::Seq::RichSeq= HASH(0x11b7ed0) Bio::Seq::RichSeq=HASH(0x11ad490) Bio::Seq::RichSeq=HASH(0x1196e 60) Bio::Seq::RichSeq=HASH(0x119b7f0) Bio::Seq::RichSeq=HASH(0x11cef60) Bio::Seq ::RichSeq=HASH(0x11b7b70) Bio::Seq::RichSeq=HASH(0x11dd330) Bio::Seq::RichSeq=HA SH(0x11da8c0) Bio::Seq::RichSeq=HASH(0x11a9f70) Bio::Seq::RichSeq=HASH(0x119b700 ) Bio::Seq::RichSeq=HASH(0x119a550) Bio::Seq::RichSeq=HASH(0x11ba910) Bio::Seq:: RichSeq=HASH(0x11e0b30) Bio::Seq::RichSeq=HASH(0x11d3030) Bio::Seq::RichSeq=HASH (0x11c62d0) Bio::Seq::RichSeq=HASH(0x11abb20) Bio::Seq::RichSeq=HASH(0x11d5bd0) Bio::Seq::RichSeq=HASH(0x11b03c0) Bio::Seq::RichSeq=HASH(0x119e1b0) Bio::Seq::Ri chSeq=HASH(0x11aa060) Bio::Seq::RichSeq=HASH(0x11a5700) Bio::Seq::RichSeq=HASH(0 x11a81e0) Bio::Seq::RichSeq=HASH(0x1196b00) Bio::Seq::RichSeq=HASH(0x11c1260) Bi o::Seq::RichSeq=HASH(0x11a2800) Bio::Seq::RichSeq=HASH(0x11c63c0) Bio::Seq::Rich Seq=HASH(0x11b60b0) Bio::Seq::RichSeq=HASH(0x11b93b0) Bio::Seq::RichSeq=HASH(0x1 1a4490) Bio::Seq::RichSeq=HASH(0x11ded50) Bio::Seq::RichSeq=HASH(0x11bbcd0) Bio: :Seq::RichSeq=HASH(0x1194780) Bio::Seq::RichSeq=HASH(0x11aedd0) Bio::Seq::RichSe q=HASH(0x11cd300) Bio::Seq::RichSeq=HASH(0x11a14e0) Bio::Seq::RichSeq=HASH(0x11c 4630) Bio::Seq::RichSeq=HASH(0x11a43a0) Bio::Seq::RichSeq=HASH(0x11a80f0) Bio::S eq::RichSeq=HASH(0x11bbbe0) Bio::Seq::RichSeq=HASH(0x11d5960) Bio::Seq::RichSeq= HASH(0x11c8e30) 
Bio::Seq::RichSeq=HASH(0x11cd3f0) Bio::Seq::RichSeq=HASH(0x11dd4 20) Bio::Seq::RichSeq=HASH(0x11cee70) Bio::Seq::RichSeq=HASH(0x11dbb10) Bio::Seq ::RichSeq=HASH(0x119a460) Bio::Seq::RichSeq=HASH(0x11aaa60) Bio::Seq::RichSeq=HA SH(0x11d1760) Bio::Seq::RichSeq=HASH(0x11cb6c0) Bio::Seq::RichSeq=HASH(0x11c7530 ) Bio::Seq::RichSeq=HASH(0x11deae0) Bio::Seq::RichSeq=HASH(0x11c4720) Bio::Seq:: RichSeq=HASH(0x119f890) Bio::Seq::RichSeq=HASH(0x11a6c40) Bio::Seq::RichSeq=HASH (0x11ad130) Bio::Seq::RichSeq=HASH(0x11e23f0) Bio::Seq::RichSeq=HASH(0x11d2f40) Bio::Seq::RichSeq=HASH(0x1194640) Bio::Seq::RichSeq=HASH(0x11d8f60) Bio::Seq::Ri chSeq=HASH(0x11d0150) Bio::Seq::RichSeq=HASH(0x119d070) Bio::Seq::RichSeq=HASH(0 x11a5610) Bio::Seq::RichSeq=HASH(0x11aa2d0) Bio::Seq::RichSeq=HASH(0x11b94a0) Bi o::Seq::RichSeq=HASH(0x11bd5b0) Bio::Seq::RichSeq=HASH(0x11c0ff0) Bio::Seq::Rich Seq=HASH(0x11a6b50) Bio::Seq::RichSeq=HASH(0x119cf80) Bio::Seq::RichSeq=HASH(0x1 1baa00) Bio::Seq::RichSeq=HASH(0x11c7620) Bio::Seq::RichSeq=HASH(0x119fb00) Bio: :Seq::RichSeq=HASH(0x11a2a70) Bio::Seq::RichSeq=HASH(0x11b1960) Bio::Seq::RichSe q=HASH(0x11ab8b0) Bio::Seq::RichSeq=HASH(0x11e0c20) Bio::Seq::RichSeq=HASH(0x11a d3a0) Bio::Seq::RichSeq=HASH(0x1197fe0) Bio::Seq::RichSeq=HASH(0x11b1870) Bio::S eq::RichSeq=HASH(0x11a2b60) Bio::Seq::RichSeq=HASH(0x1192750) Bio::Seq::RichSeq= HASH(0x11c9190) Bio::Seq::RichSeq=HASH(0x11e08c0) Bio::Seq::RichSeq=HASH(0x11dd6 90) Bio::Seq::RichSeq=HASH(0x11da7d0) Bio::Seq::RichSeq=HASH(0x11aece0) Bio::Seq ::RichSeq=HASH(0x11d80b0) Bio::Seq::RichSeq=HASH(0x11ca0b0) Bio::Seq::RichSeq=HA SH(0x1196bf0) Bio::Seq::RichSeq=HASH(0x11b7de0) Bio::Seq::RichSeq=HASH(0x11b02d0 ) Can't call method "isa" on an undefined value at /usr/local/bioxapps/bioperl/lib rary//Bio/Graph/ProteinGraph.pm line 477, line 2. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: from.pl
Type: application/octet-stream
Size: 2723 bytes
Desc: not available
URL:

From cjfields at uiuc.edu Sat May 13 14:18:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 13 May 2006 13:18:53 -0500
Subject: [Bioperl-l] problem help me...........please
In-Reply-To: <20060513051758.4610.qmail@webmail31.rediffmail.com>
Message-ID: <000901c676b9$b14479c0$15327e82@pyrimidine>

I really hate to break the bad news here, but I'm going to be brutally honest. I have not looked at any of the Bio::Graph modules and have no idea how they are implemented, and I haven't looked at your input file, but I can tell right off the bat your script has major logic problems. I can also pretty much tell that you don't understand the object model we use here, at all. This is why I say that (from your last response):

> ## no of nodes = Bio::Seq::RichSeq=HASH(0x11aa700) ##
> what this stand for

Did you cut and paste from several other scripts hoping that it would work? I say that b/c you mix styles quite frequently here, using objects correctly (deref'ing with '->') and incorrectly (print "$object"). You also declare (and redeclare) @ISA four times for a script (not needed unless you're declaring a class and inheriting methods from other modules). You also use @ISA once with a misspelled module name (I don't think there is a module named 'Expoerter'). So, I'm actually stunned that the script doesn't crash at all. Yikes!

Okay, brutal honesty time over. Any time you see something like this:

Bio::Graph::ProteinGraph=HASH(0x1182e70)

it means that what you are printing out is a reference to an object (it shows the object's class and its location in memory) and is NOT what you want. You should be doing something along the lines of $object->method, not 'print $object', to get at the object data and methods. You use this several times in your script already; that should be a big hint, as the areas where it doesn't work do not use this syntax.
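The difference between printing an object and calling its methods can be reproduced without BioPerl. The sketch below uses two hypothetical stand-in classes, Toy::Graph and Toy::Node, which are not BioPerl modules and exist only so the example runs on its own; the same behaviour is what produces the Bio::Seq::RichSeq=HASH(...) entries in the posted output.

```perl
use strict;
use warnings;

# Hypothetical stand-in classes, NOT BioPerl: just enough to show the behaviour.
package Toy::Node;
sub new { my ($class, $id) = @_; return bless { id => $id }, $class }
sub id  { return $_[0]{id} }

package Toy::Graph;
sub new {
    my ($class, @ids) = @_;
    return bless { nodes => [ map { Toy::Node->new($_) } @ids ] }, $class;
}
sub nodes { return @{ $_[0]{nodes} } }

package main;

my $graph = Toy::Graph->new(qw(A B C));

# Printing the object itself shows only its class and memory address
print "object as string: $graph\n";    # e.g. Toy::Graph=HASH(0x...)

# The same happens for every element of a list of objects
my @nodes = $graph->nodes;
print "nodes as strings: @nodes\n";    # Toy::Node=HASH(0x...) three times

# Calling a method on each object returns the data it holds
my @ids = map { $_->id } @nodes;
print "node ids: @ids\n";              # node ids: A B C
print "node count: ", scalar(@nodes), "\n";
```

In the real script the equivalent step would be to call an accessor on each node object instead of interpolating the node list into a string; which accessor is appropriate depends on the node class and should be checked in its documentation.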
Read the documentation for the many varied modules you use in your script. Look at script examples. Start simply, then work your way up.

Also, using the '->' dereferencing operator inside double quotes doesn't work; you have to do something like:

print $graph->nodes,"\t";

not

print "$graph->nodes\t";

That's why you get this in your output:

Bio::Graph::ProteinGraph=HASH(0x1182e70)->nodes

which just prints the object reference followed by the string '->nodes'.

If any of what I just said doesn't make sense, you really need to pick up 'Learning Perl' and 'Intermediate Perl' by Schwartz et al and 'Programming Perl' by Wall et al. I don't know if anyone can really help at this point w/o completely writing the script for you. We will fix problems to a point but we, for the most part, will not do your work for you.

Chris

From hubert.prielinger at gmx.at Sat May 13 23:45:58 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Sat, 13 May 2006 21:45:58 -0600
Subject: [Bioperl-l] parsing output files from other tools
Message-ID: <4466A7F6.30204@gmx.at>

hi,

Is it possible to parse text output files other than BLAST output files, such as the text output files from the search tool mpSrch that is offered by EBI? I ask because WU-BLAST output files can already be parsed with BioPerl.

thanks
Hubert

From arareko at campus.iztacala.unam.mx Sun May 14 00:09:35 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Sat, 13 May 2006 23:09:35 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
Message-ID: <4466AD7F.6050700@campus.iztacala.unam.mx>

I'm glad to announce the availability of the Deobfuscator interface at the BioPerl website. You can use it at the following URL:

http://bioperl.org/cgi-bin/deob_interface.cgi

Many thanks to Laura Kavanaugh and David Messina for this great contribution to the BioPerl project!

Mauricio.

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From cjfields at uiuc.edu Sun May 14 12:18:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 14 May 2006 11:18:10 -0500
Subject: [Bioperl-l] parsing output files from other tools
In-Reply-To: <4466A7F6.30204@gmx.at>
Message-ID: <000301c67772$00b4e4f0$15327e82@pyrimidine>

These are the current report types parsed through SearchIO:

http://www.bioperl.org/wiki/Module:Bio::SearchIO

I don't see mpsrch among them. If you want, you could create a new plugin module to parse those reports; the SearchIO HOWTO gives some pointers:

http://www.bioperl.org/wiki/HOWTO:SearchIO

You can always look at some of the current modules like blast, blastxml, or fasta to get an idea of how it works.
Judging by the mpsrch output, I'm pretty sure you would have to build a custom plugin for it. A viable alternative: looking through the mail list, it looks like mpsrch is a multiprocessor implementation of ssearch, itself an implementation of the Smith-Waterman algorithm for local alignments in the FASTA package of programs:

http://www.bioperl.org/wiki/SSEARCH

You might be able to use SearchIO::fasta there...

Chris

From chen_li3 at yahoo.com Sun May 14 13:14:30 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sun, 14 May 2006 10:14:30 -0700 (PDT)
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
Message-ID: <20060514171430.74846.qmail@web36802.mail.mud.yahoo.com>

Hi all,

I need to get a reverse-complementary sequence out of a FASTA sequence file. The Synopsis of Bio::Seq says I can do it this way:

$revcom=$seqobj->revcom();

I used the following script to try to get the job done, but it doesn't work. Then I read the documentation of Bio::Seq, and it looks like it doesn't contain a revcom method.

Any idea will be appreciated.
Li

###############################
Here is the code:

#!c:/perl/bin/perl.exe
use strict;
use warnings;

use Bio::Seq;
use Bio::SeqIO;

my $file='c:/perl/local/primer3_1.0.0/src/est.txt';

my $seqIO=Bio::SeqIO->new(-file=>"<$file", -format=>'fasta' );

my $seqobj=$seqIO->next_seq();#create object

print "what attributes/keys are available:\n";
for my $key (sort keys %$seqobj){
    my $value=$seqobj->{$key};
    print "$key\t=>\t$value\n"
}
# These are the output on the screen
#primary_id => gi|54093|emb|X61809.1|
#primary_seq => Bio::PrimarySeq=HASH(0x10492848)

#based on these results primary_id can get
#access right away
# as to primary_seq it is an object in
#Bio::Primaryseq and it provides the following
#methods after reading the documentaion:
#new
#seq
#validate_seq
#subseq
#length
#display_id
#accession_number
#primary_id
#alphabet
#desc
#can_call_new
#id
#is_circular
#object_id
#version
#authority
#namespace
#display_name
#description

print "primary_id=",$seqobj->primary_id, "\n\n";
print "id=",$seqobj->id, "\n\n";
print "revcom=",$seqobj->revcom,"\n\n";

my $now_time=localtime;
print $now_time, "\n\n";
exit;

#These are the output on the screen
#primary_id=gi|54093|emb|X61809.1|
#id=gi|54093|emb|X61809.1
#revcom=Bio::Seq=HASH(0x10493304)
#Sun May 14 12:45:20 2006

From cjfields at uiuc.edu Sun May 14 13:39:50 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 14 May 2006 12:39:50 -0500
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
In-Reply-To: <20060514171430.74846.qmail@web36802.mail.mud.yahoo.com>
Message-ID: <000401c6777d$66ddb120$15327e82@pyrimidine>

This line should give you the hint:

#revcom=Bio::Seq=HASH(0x10493304)

You're getting an object ref here. The actual way to get the rev. comp on the wiki states '$seq->revcom->seq', not '$seq->revcom'.
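The chain '$seq->revcom->seq' works because revcom returns a new sequence object and seq then returns that object's string. A minimal sketch of that pattern follows. It deliberately uses a hypothetical stand-in class, Toy::Seq, rather than Bio::Seq, so that it runs without BioPerl installed; Toy::Seq mimics only these two methods and handles only the plain A/C/G/T alphabet.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a sequence class, NOT Bio::Seq itself.
package Toy::Seq;
sub new { my ($class, $seq) = @_; return bless { seq => $seq }, $class }
sub seq { return $_[0]{seq} }

# Returns a NEW object holding the reverse complement, not a string
sub revcom {
    my ($self) = @_;
    my $rc = scalar reverse $self->seq;   # reverse the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;         # complement each base
    return Toy::Seq->new($rc);
}

package main;

my $seqobj = Toy::Seq->new('AAGGC');

my $rc_obj = $seqobj->revcom;         # an object
my $rc_str = $seqobj->revcom->seq;    # the sequence string

print "revcom alone: $rc_obj\n";      # e.g. Toy::Seq=HASH(0x...)
print "revcom then seq: $rc_str\n";   # revcom then seq: GCCTT
```

With BioPerl, the corresponding change to the posted script is the one stated above: print $seqobj->revcom->seq instead of $seqobj->revcom.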
When I ran your script and change your line to the wiki version I get (using my test seq): what attributes/keys are available: primary_id => test, primary_seq => Bio::PrimarySeq=HASH(0x1d47fe0) primary_id=test, id=test, revcom=GGAACGAGATCTCCATGCCGCGCACCATCGGCCCGGGATGCAGCACGATCGCGCGGTCCGGCAGCATCG CCTGGCGCTTCTCGGACAATCCGTAGCGCACCGAGTACTCACGCGCGGA CGGGAAGAAACTGCCGTTCATGCGTTCGGCCTGCACGCGCAGCATGAGCACCGCGTCGGCCGCGGGCAGTTCGGCG TCCAGGTCATAGGACACGGTCACCGGCCAGTTCTCGACGCCCCTGGGGA GCAGCGTCGGTGGGGACACCAGCACCACCTCGGCCCCGAGGGTGTGCAGCAGCGTCACGTTGGAGCGGGCCACGCG GCTGTGCAGCACGTCGCCGACGATCACCACGCGCTTGCCCTCGACGCTG Sun May 14 17:34:45 2006 Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of chen li > Sent: Sunday, May 14, 2006 12:15 PM > To: bioperl-l at bioperl.org > Subject: [Bioperl-l] no revcom method in Bio::Seq module? > > Hi all, > > I need to get a reverse-complemenary sequence out of a > fasta sequence file. And the Synopsis of Bio::Seq > points out I can do like this way: > > $revcom=$seqobj->revcom(); > > I use the following script trying to get the job done > but it doesn't work. Then I read documentation of > Bio::Seq and it looks like it doesn't contain revcom > method. > > Any idea will be appreciated. 
> > Li > > > ############################### > Here is the code: > > #!c:/perl/bin/perl.exe > use strict; > use warnings; > > use Bio::Seq; > use Bio::SeqIO; > > my $file='c:/perl/local/primer3_1.0.0/src/est.txt'; > > > my $seqIO=Bio::SeqIO->new(-file=>"<$file", > -format=>'fasta' ); > > my $seqobj=$seqIO->next_seq();#create object > > print "what attributes/keys are available:\n"; > for my $key (sort keys %$seqobj){ > my $value=$seqobj->{$key}; > print "$key\t=>\t$value\n" > } > # These are the output on the screen > #primary_id => gi|54093|emb|X61809.1| > #primary_seq => Bio::PrimarySeq=HASH(0x10492848) > > #based on these results primary_id can get > #access right away > # as to primary_seq it is an object in > #Bio::Primaryseq and it provides the following > #methods after reading the documentaion: > #new > #seq > #validate_seq > #subseq > #length > #display_id > #accession_number > #primary_id > #alphabet > #desc > #can_call_new > #id > #is_circular > #object_id > #version > #authority > #namespace > #display_name > #description > > print "primary_id=",$seqobj->primary_id, "\n\n"; > print "id=",$seqobj->id, "\n\n"; > print "revcom=",$seqobj->revcom,"\n\n"; > > my $now_time=localtime; > print $now_time, "\n\n"; > exit; > > #These are the output on the screen > #primary_id=gi|54093|emb|X61809.1| > #id=gi|54093|emb|X61809.1 > #revcom=Bio::Seq=HASH(0x10493304) > #Sun May 14 12:45:20 2006 > > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From chen_li3 at yahoo.com Sun May 14 14:08:49 2006 From: chen_li3 at yahoo.com (chen li) Date: Sun, 14 May 2006 11:08:49 -0700 (PDT) Subject: [Bioperl-l] no revcom method in Bio::Seq module? 
In-Reply-To: <000401c6777d$66ddb120$15327e82@pyrimidine>
Message-ID: <20060514180849.55423.qmail@web36808.mail.mud.yahoo.com>

Hi Chris,

Thank you very much. But could you please give me the link for this syntax: $seq->revcom->seq?

Li

From cjfields at uiuc.edu Sun May 14 14:28:14 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Sun, 14 May 2006 13:28:14 -0500
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
Message-ID:

I think the confusion lies in what revcom returns. This page

http://www.bioperl.org/wiki/Getting_Started

shows a quick way of using revcom (which I mentioned previously), while this page

http://www.bioperl.org/wiki/HOWTO:Beginners

explains what is returned when you use revcom. '$seq_obj->revcom' returns a sequence object (not a sequence string):

http://www.bioperl.org/wiki/HOWTO:Beginners#The_Sequence_Object

which is why you need to use the 'seq' method to get the string. Hence, '$seq_obj->revcom->seq'.

Chris

From Marc.Logghe at DEVGEN.com Sun May 14 16:28:34 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Sun, 14 May 2006 22:28:34 +0200
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746DAC@ANTARESIA.be.devgen.com>

Hi Li,

> doesn't work. Then I read documentation of Bio::Seq and it
> looks like it doesn't contain revcom method.

Here, the Deobfuscator interface that Mauricio announced earlier comes in handy.

http://bioperl.org/cgi-bin/deob_interface.cgi?Search=Search&module=Bio%3A%3ASeq&sort_order=by+method&search_string=

If you look in the methods table, you will find out that the revcom method is inherited from, and implemented by, Bio::PrimarySeqI.
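
A minimal sketch of the point made in this thread, assuming BioPerl is installed: revcom returns a new sequence object, and calling seq on that object gives the string. The short sequence and the id 'test' are made up for illustration and are not the sequence used in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;

# Hypothetical short test sequence (not the one from the thread).
my $seqobj = Bio::Seq->new(
    -id       => 'test',
    -seq      => 'ATGGCC',
    -alphabet => 'dna',
);

# revcom() hands back a sequence object, so printing it directly
# would only show a reference such as Bio::Seq=HASH(0x...).
my $rc_obj = $seqobj->revcom;
print "class of result: ", ref($rc_obj), "\n";

# Calling seq() on that object returns the reverse-complemented string.
print "revcom=", $rc_obj->seq, "\n";    # revcom=GGCCAT
```

The same chain can be written in one step as $seqobj->revcom->seq, which is the form given on the wiki.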
HTH,
Marc

From sb at mrc-dunn.cam.ac.uk Mon May 15 04:18:11 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 15 May 2006 09:18:11 +0100
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names
In-Reply-To: <000f01c675e6$a61bde90$15327e82@pyrimidine>
References: <000f01c675e6$a61bde90$15327e82@pyrimidine>
Message-ID: <44683943.5020307@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> Sendu Bala wrote:
>> In bioperl up to at least 1.5.1, when one of the database modules
>> comes across a species rank it does:
>>
>> if ($rank eq 'species') {
>>     # get rid of genus from species name
>>     (undef,$taxon_name) = split(/\s+/,$taxon_name,2);
>> }
>
> The XML example from NCBI Taxonomy I mentioned previously seems to
> have everything in the classification, from superkingdom down to
> species (no strain unfortunately, and I'm not sure about subspecies);
> if it's missing the rank then the designation doesn't exist or is
> tagged as 'no rank'. Like I mentioned before I'm not intimately
> familiar with Bio::Taxonomy, Bio::DB::Taxonomy, or Bio::Species, so I
> don't have a clue as to how everything is parsed and plugged in to
> Bio::Taxonomy objects. I do know that XML::Twig is used for parsing
> through the data so it shouldn't be too hard to change what you
> want.

Yes, that's all true, but I'm not sure what it has to do with what I was saying. FYI, you do get a 'subspecies' rank but no 'variant' rank. In my own implementation I change the rank of all 'no rank' Nodes below species to 'variant'.

> I haven't tried using Bio::DB::Taxonomy directly yet, but I would
> have thought that the binomial is just built from the XML twig
> 'LineageEx' Rank=Genus + Rank=Species, that the genus comes from the
> tag 'Genus' and species from 'Species', and that the scientific name
> is from the tag 'ScientificName'. Guess not.

No. See above for what it actually does. That is a copy/paste from the code (there, $taxon_name == ScientificName).
When it finds a species rank it does that split because in the ncbi taxonomy database the 'genus' rank for a human has a ScientificName of 'Homo', whilst the 'species' rank has a ScientificName of 'Homo sapiens', and the bioperl model (quite rightly, I think) wants the 'species' node to not have information of other nodes (well, except for the classification array). So it removes the 'Homo' from 'Homo sapiens' giving a species name of 'sapiens'. This then allows the binomial method to return 'Homo sapiens' instead of 'Homo Homo sapiens'.

(though in a bizarre twist, and this is one of my problems with how names are currently represented in the Taxonomy modules, 'Scientific Name' and 'binomial' are synonymous)

[snip]
>> My solution is to just remove whatever is the same between the
>> current rank and the previous rank. Maybe even that's not so
>> perfect, but it must be a lot better than turning the species
>> 'Avian leukosis virus' into the species 'virus' (especially given
>> that the genus here is 'Alpharetrovirus')!
>
> I don't think taking Genus/Species directly from the scientific
> name (normally what is in the SOURCE or ORGANISM annotation for
> GenBank or OS for EMBL) is the best way to go about it [snip]

Perhaps, but again I'm not sure what this has to do with what I was saying. If you don't want your species name to contain your genus name you have to do some kind of parsing. My post merely pointed out that the parsing currently in bioperl does not work for viruses and possibly other species. I'd like to think that someone cares about this error and would do the simple fix I offered, or that they already know about the problem and have done their own fix.

> I'm also not sure that forcing a lookup for every TaxID in every
> sequence every time it's passed through SeqIO is the best way to go
> either, though I think it should be required for storing sequences.
> It's a tricky balance.

In my own implementation any database lookups are cached, and you have the option of not doing any database lookup at all and 'faking' a taxonomy from the supplied list of names (so it works just like normal Bio::Seq).

> I still think that maybe we should absolve ourselves from using
> SOURCE/ORGANISM or OS/OC information in GenBank files as anything
> more than strictly annotation, or reconstruct Bio::Species to maybe a
> Bio::Annotation::Species object to handle that annotation and either
> deprecate Bio::Species or separate it completely from any
> Bio::Taxonomy objects. It would really simplify things. Then, if
> anyone is interested in taxonomy, either install a local database or
> use Entrez efetch, and then use Bio::DB::Taxonomy (fixed of course)
> to grab the TaxID info.

My personal view is that having it as an annotation would serve no real purpose. For me the whole point of any kind of species representation in bioperl is to allow you to compare species in a biologically meaningful way. If it's just some annotation then that means it's basically free-form text and you have no guarantee that two sequences from the same species are annotated exactly the same - no guarantee that your code would identify that those sequences are from the same species. The only other useful thing that a species object needs to do is let you know how related two different species are - you need to be able to ask what a species' class, kingdom etc. are. Again, not viable with an annotation - you need something strict like a properly constructed Taxonomy.

I guess it comes down to the philosophy of parsing a file. Do you try and reflect exactly what the file contains, letter for letter, so that your resulting object can recreate that file letter for letter, or do you parse the file and extract the correct /meaning/ in order to be more useful?
I think there can be a choice by the user, and this is best done by making Bio::Species a clever wrapper around an improved Bio::Taxonomy, as in my own implementation.

From s_maheshwari84 at rediffmail.com Mon May 15 04:15:26 2006
From: s_maheshwari84 at rediffmail.com (saurabh maheshwari)
Date: 15 May 2006 08:15:26 -0000
Subject: [Bioperl-l] please help
Message-ID: <20060515081526.27270.qmail@webmail7.rediffmail.com>

Hello All

I sent a problem to the list earlier, but my problem is still unsolved, so I have restated it in another way. Could anybody please give me code to make a graph between some items which are in a text file in the following format:

Example
item1 interacts with item2, and I want to make a graph by giving any item as input and asking for all interactions of that item.

item 1   item 2
A        B
A        C
C        B
D        B
D        E
A        F
G        A

with Regards
SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI

From sdavis2 at mail.nih.gov Mon May 15 06:26:53 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 15 May 2006 06:26:53 -0400
Subject: [Bioperl-l] please help
In-Reply-To: <20060515081526.27270.qmail@webmail7.rediffmail.com>
Message-ID:

On 5/15/06 4:15 AM, "saurabh maheshwari" wrote:

> Hello All
> I sent a problem to the list earlier, but my problem is still unsolved, so I
> have restated it in another way. Could anybody please give me code to
> make a graph between some items which are in a text file in the following
> format:
> Example
> item1 interacts with item2, and I want to make a graph by giving any item as
> input and asking for all interactions of that item.
>
> item 1   item 2
> A        B
> A        C
> C        B
> D        B
> D        E
> A        F
> G        A

Not a bioperl answer, but in your case, I would suggest looking at using cytoscape to do this.
Look here for details:

http://www.cytoscape.org/

Sean

From sdavis2 at mail.nih.gov Mon May 15 07:03:28 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 15 May 2006 07:03:28 -0400
Subject: [Bioperl-l] please help
In-Reply-To:
Message-ID:

On 5/15/06 6:26 AM, "Sean Davis" wrote:

> On 5/15/06 4:15 AM, "saurabh maheshwari" wrote:
>
>> Hello All
>> I sent a problem to the list earlier, but my problem is still unsolved, so
>> I have restated it in another way. Could anybody please give me code to
>> make a graph between some items which are in a text file in the following
>> format:
>> Example
>> item1 interacts with item2, and I want to make a graph by giving any item as
>> input and asking for all interactions of that item.
>>
>> item 1   item 2
>> A        B
>> A        C
>> C        B
>> D        B
>> D        E
>> A        F
>> G        A
>
> Not a bioperl answer, but in your case, I would suggest looking at using
> cytoscape to do this. Look here for details:
>
> http://www.cytoscape.org/

I forgot to mention, if you are looking for a perl solution, I would look at the Graph module.

http://search.cpan.org/~jhi/Graph-0.69/lib/Graph.pod

You can create the graph according to the docs and then use the neighbors() method (if I remember correctly) to get the nodes connected to the query node.

Sean

From akarger at CGR.Harvard.edu Mon May 15 08:20:11 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 15 May 2006 08:20:11 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
Message-ID:

This tool is quite nice, and may save me a lot of perldoc'ing.

A couple of minor interface thoughts.

1) There's quite a lot of methods for many of the classes. As such, I think I'll often want to browse through what's available in a class. But 60% or so of the screen real estate is used for "Enter a search string... OR select a class from the list". IMO, it would be better to have two pages, a search page and a result page.
It only takes a click on Back (or a "new search" button) to get to a new search, and now you can use your whole screen for reading your results.

2) Please sort the "select a class from the list" alphabetically. I guess I can enter a search term to get the right classes, but it would be nice to be able to browse.

2a) If you want to be really fancy, make a javascript nested menu with expandable submenus. OK, maybe not.

3) Minimalist is nice, but documentation is even nicer. It wasn't clear to me that the search searches within class names rather than function names. What I really want to know sometimes is which module has, say, the revcom method in it. So, if it's not easy to include that within this search, then at least tell me what my search space is.

4) When I search for something that's not found, I get a screen that looks pretty familiar, with the extra text "No match to string found" down at the bottom. It took me a while to even notice it. (Studies show that most users don't read most of the text on a page.) Bold might be nice here. Or put the error at the top of the screen. Or both.

5) I'll save my stupidest comment for last - please make the page title "Bioperl Deobfuscator", so that when I bookmark it I'll know what the bookmark stands for.

Thanks, Laura Kavanaugh and David Messina, for a neat AND useful tool.

- Amir Karger
Computational Biology Group
Bauer Center for Genomics Research
Harvard University
617-496-0626

From sb at mrc-dunn.cam.ac.uk Mon May 15 09:08:32 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 15 May 2006 14:08:32 +0100
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To:
References:
Message-ID: <44687D50.6080306@mrc-dunn.cam.ac.uk>

Amir Karger wrote:
> This tool is quite nice, and may save me a lot of perldoc'ing.

Yes, many thanks to everyone involved.

> A couple of minor interface thoughts.
>
> 1)There's quite a lot of methods for many of the classes.
As such, I > think I'll often want to browse through what's available in a class. But > 60% or so of the screen real estate is used for "Enter a search > string... OR select a class from the list". IMO, it would be better to > have two pages, a search page and a result page. It only takes a click > on Back (or a "new search" button) to get to a new search, and now you > can use your whole screen for reading your results. As the compromise it must be, I like the way it behaves. I don't like lots of windows. I especially don't like pop up windows. Right now when I'm using the bioperl docs I tend to have a whole bunch of tabs open to different class pages at once, so being able to see an overview all on one page in Deobfuscator is very nice. Further to that, I'd love it if clicking on a method name caused an in-place css(&|javascript) reveal (similar to how a well implemented drop down menu works in a website) rather than a new window opened. Alternatively, just have more columns in the results table, ie. usage, function, returns, args columns. I feel that opening a window for each method you want to understand is far too slow. I'd also really like a link to the code for the method as well. The bioperl docs are rarely complete enough that you can really understand what every method is supposed to do without looking at the code. > 3) Minimalist is nice, but documentation is even nicer. It wasn't clear > to me that the search searches within class names rather than function > names. What I really want to know sometimes is which module has, say, > the revcom method in it. This would be a great feature to add. Another minor interface thought: 6) Have a little more cell padding in all the tables. Things are just a little too cramped and things start to look messy/ run into each other. 
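
On the question raised above of finding which module actually supplies a method such as revcom, here is a minimal sketch in plain Perl. The classes Toy::PrimarySeqI and Toy::Seq and the helper implementing_class are made up for illustration; they are not BioPerl code. The helper walks the inheritance chain with the core mro module and reports the first class that defines the method itself rather than inheriting it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use mro;

# Hypothetical toy classes standing in for an interface class and a
# class that inherits from it.
package Toy::PrimarySeqI;
sub revcom { return 'revcom called' }

package Toy::Seq;
our @ISA = ('Toy::PrimarySeqI');
sub new { return bless {}, shift }

package main;

# Return the first class in the method resolution order of $class
# that defines $method in its own namespace, or nothing if none does.
sub implementing_class {
    my ($class, $method) = @_;
    no strict 'refs';
    for my $candidate (@{ mro::get_linear_isa($class) }) {
        return $candidate if defined &{"${candidate}::${method}"};
    }
    return;
}

print implementing_class('Toy::Seq', 'revcom'), "\n";    # Toy::PrimarySeqI
print implementing_class('Toy::Seq', 'new'),    "\n";    # Toy::Seq
```

The same helper could in principle be pointed at a real class once that class has been loaded with require; that use is an assumption and is not tested here.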

From cjfields at uiuc.edu Mon May 15 09:59:57 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 08:59:57 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <44687D50.6080306@mrc-dunn.cam.ac.uk>
Message-ID: <000901c67827$d99eabb0$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Monday, May 15, 2006 8:09 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Deobfuscator interface now available
>
> Amir Karger wrote:
> > This tool is quite nice, and may save me a lot of perldoc'ing.
>
> Yes, many thanks to everyone involved.

The Deobfuscator currently indexes bioperl-1.4, so it's not completely up-to-date. I believe Mauricio and Dave may be working on updating to the newer versions and maybe bioperl-live, as well as getting the other bioperl packages up and running. For modules added after v1.4 I use the script in the FAQ question mentioned on the Deobfuscator wiki page to get up-to-date methods, then grab the ActiveState HTML'd perldocs pumped out when installing using PPM (I make a custom PPM/PPD file and install myself every once in a while):

#!/usr/bin/perl -w
use Class::Inspector;
$class = shift || die "Usage: methods perl_class_name\n";
eval "require $class";
print join ("\n", sort @{Class::Inspector->methods($class,'full','public')}), "\n";

> > A couple of minor interface thoughts.
> >
> > 1)There's quite a lot of methods for many of the classes. As such, I
> > think I'll often want to browse through what's available in a class. But
> > 60% or so of the screen real estate is used for "Enter a search
> > string... OR select a class from the list". IMO, it would be better to
> > have two pages, a search page and a result page. It only takes a click
> > on Back (or a "new search" button) to get to a new search, and now you
> > can use your whole screen for reading your results.
> > As the compromise it must be, I like the way it behaves. I don't like > lots of windows. I especially don't like pop up windows. Right now when > I'm using the bioperl docs I tend to have a whole bunch of tabs open to > different class pages at once, so being able to see an overview all on > one page in Deobfuscator is very nice. > > Further to that, I'd love it if clicking on a method name caused an > in-place css(&|javascript) reveal (similar to how a well implemented > drop down menu works in a website) rather than a new window opened. > Alternatively, just have more columns in the results table, ie. usage, > function, returns, args columns. I feel that opening a window for each > method you want to understand is far too slow. Agreed. > I'd also really like a link to the code for the method as well. The > bioperl docs are rarely complete enough that you can really understand > what every method is supposed to do without looking at the code. The methods that pop up are in columns along with the class module that implements the method. If you click on that link you get PDOC documentation for the module which includes most of the code (strangely, though Deobfuscator indexes bioperl 1.4, the PDOC corresponds to bioperl-live). Is that what you meant, or something a bit more detailed? > > 3) Minimalist is nice, but documentation is even nicer. It wasn't clear > > to me that the search searches within class names rather than function > > names. What I really want to know sometimes is which module has, say, > > the revcom method in it. That's listed in the method results table (the next column has the module with a link to the module's online docs). Chris > This would be a great feature to add. > > > Another minor interface thought: > 6) Have a little more cell padding in all the tables. Things are just a > little too cramped and things start to look messy/ run into each other. 

From cjfields at uiuc.edu Mon May 15 12:08:30 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 11:08:30 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names
In-Reply-To: <44683943.5020307@mrc-dunn.cam.ac.uk>
Message-ID: <001601c67839$cf289490$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Monday, May 15, 2006 3:18 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names
>
> Chris Fields wrote:
> > Sendu Bala wrote:
> >> In bioperl up to at least 1.5.1, when one of the database modules
> >> comes across a species rank it does:
> >>
> >> if ($rank eq 'species') {
> >>     # get rid of genus from species name
> >>     (undef,$taxon_name) = split(/\s+/,$taxon_name,2);
> >> }
> >
> > The XML example from NCBI Taxonomy I mentioned previously seems to
> > have everything in the classification, from superkingdom down to
> > species (no strain unfortunately, and I'm not sure about subspecies);
> > if it's missing the rank then the designation doesn't exist or is
> > tagged as 'no rank'. Like I mentioned before I'm not intimately
> > familiar with Bio::Taxonomy, Bio::DB::Taxonomy, or Bio::Species, so I
> > don't have a clue as to how everything is parsed and plugged in to
> > Bio::Taxonomy objects. I do know that XML::Twig is used for parsing
> > through the data so it shouldn't be too hard to change what you
> > want.
>
> Yes, that's all true, but I'm not sure what it has to do with what I was
> saying. FYI, you do get a 'subspecies' rank but no 'variant' rank. In my
> own implementation I change the rank of all 'no rank' Nodes below
> species to 'variant'.

Sorry; wandered a bit off topic there.

> > I haven't tried using Bio::DB::Taxonomy directly yet, but I would
> > have thought that the binomial is just built from the XML twig
> > 'LineageEx' Rank=Genus + Rank=Species, that the genus comes from the
> > tag 'Genus' and species from 'Species', and that the scientific name
> > is from the tag 'ScientificName'. Guess not.
>
> No. See above for what it actually does. That is a copy/paste from the
> code (there, $taxon_name == ScientificName). When it finds a species
> rank it does that split because in the
> ncbi taxonomy database the 'genus' rank for a human has a ScientificName
> of 'Homo', whilst the 'species' rank has a ScientificName of 'Homo
> sapiens', and the bioperl model (quite rightly, I think) wants the
> 'species' node to not have information of other nodes (well, except for
> the classification array). So it removes the 'Homo' from 'Homo sapiens'
> giving a species name of 'sapiens'. This then allows the binomial method
> to return 'Homo sapiens' instead of 'Homo Homo sapiens'.
>
> (though in a bizarre twist, and this is one of my problems with how
> names are currently represented in the Taxonomy modules, 'Scientific
> Name' and 'binomial' are synonymous)

Ah, now I see. That's a bit screwy, but it's not on our end so we have to deal with it. I also noticed that subspecies also contains the entire string:

<TaxId>135461</TaxId>
<ScientificName>Bacillus subtilis subsp. subtilis</ScientificName>
<Rank>subspecies</Rank>

As for the 'scientific_name' method when accessed through Bio::DB::Taxonomy, almost every time I fail to get the actual scientific name for the node (from the GenBank ORGANISM line); I get the name with the strain chopped off instead, and a number of times the names get mangled. The regexes below only grab from the topmost tags:

Script:
---------------------------------
#!perl

use strict;
use warnings;
use Bio::DB::Taxonomy;

my $file = shift @ARGV;
print "\nNCBI XML output ScientificName tag for each node:\n";
my @taxid = ();
open (TAXFILE, "<$file") or die "Cannot open $file: $!";
while (<TAXFILE>){
    if (/^\s{2}<TaxId>(\d+)<\/TaxId>/) {
        print "$1\t";
        push @taxid, $1;
    }
    print "$1\n" if /^\s{2}<ScientificName>(.*)<\/ScientificName>/;
}
close TAXFILE;

print "\nBio::DB::Taxonomy scientific_name:\n";
for my $id (@taxid){
    my $factory = Bio::DB::Taxonomy->new(-source => 'entrez');
    my $node = $factory->get_Taxonomy_Node(-taxonid => $id);
    print $node->ncbi_taxid,"\t",$node->scientific_name,"\n";
}
---------------------------------

Output:
---------------------------------
NCBI XML output ScientificName tag for each node:
191218  Bacillus anthracis str. A2012
198094  Bacillus anthracis str. Ames
222523  Bacillus cereus ATCC 10987
224308  Bacillus subtilis subsp. subtilis str. 168
226186  Bacteroides thetaiotaomicron VPI-5482
226900  Bacillus cereus ATCC 14579
246194  Carboxydothermus hydrogenoformans Z-2901
260799  Bacillus anthracis str. Sterne
261594  Bacillus anthracis str. 'Ames Ancestor'
264462  Bdellovibrio bacteriovorus HD100
272558  Bacillus halodurans C-125
272559  Bacteroides fragilis NCTC 9343
279010  Bacillus licheniformis ATCC 14580
281309  Bacillus thuringiensis serovar konkukian str. 97-27
288681  Bacillus cereus E33L
295405  Bacteroides fragilis YCH46
66692   Bacillus clausii KSM-K16
76114   Azoarcus sp. EbN1

Bio::DB::Taxonomy scientific_name:
191218  Bacillus cereus group anthracis
198094  Bacillus cereus group anthracis
222523  Bacillus cereus group cereus
224308  subtilis Bacillus subtilis subsp. subtilis
226186  Bacteroides thetaiotaomicron
226900  Bacillus cereus group cereus
246194  Carboxydothermus hydrogenoformans
260799  Bacillus cereus group anthracis
261594  Bacillus cereus group anthracis
264462  Bdellovibrio bacteriovorus
272558  Bacillus halodurans
272559  Bacteroides fragilis
279010  Bacillus licheniformis
281309  Bacillus cereus group thuringiensis
288681  Bacillus cereus group cereus
295405  Bacteroides fragilis
66692   Bacillus clausii
76114   Azoarcus sp.
---------------------------------

Note Bacillus subtilis in the Bio::Tax output above. Not one of those is the scientific name as defined by NCBI (and most taxonomists for that matter).

So, in a nutshell, there's a problem here. I don't know if your fix works for that, but I definitely don't think the 'scientific name' should be assembled ad hoc; it should be taken from the tagname for that node. I am currently reduced to grabbing the feature primary_tagged 'source' and getting the 'organism' tagname from that. I cannot stress enough that it should NOT be that way.

As for 'binomial' == 'scientific_name', I agree; I see it as well and that should be fixed.

...
> Perhaps, but again I'm not sure what this has to do with what I was
> saying. If you don't want your species name to contain your genus name
> you have to do some kind of parsing. My post merely pointed out that the
> parsing currently in bioperl does not work for viruses and possibly
> other species. I'd like to think that someone cares about this error and
> would do the simple fix I offered, or that they already know about the
> problem and have done their own fix.

Again me going off-topic, so my apologies; it's more to do with my frustrations with Bio::Species (not Bio::DB::Taxonomy). My point here was, since there is no real way to surmise from a GenBank flatfile what the taxonomic ranks are w/o guessing (which seems to break more often than not when dealing with complex names), there shouldn't be any tie to Bio::Tax objects, at least directly.
I guess methods could be incorporated into Bio::Species for those who want to give it a try, but I would like to get a GenBank file, for once, in which the scientific name/binomial name isn't mangled by Bio::Species. Back to Bio::DB::Taxonomy; I don't have a problem with implementing your methods here; on the contrary, if they fix my problem above then I'll be more than glad to. I can't get to it immediately but maybe later today/tomorrow. > > I'm also not sure that forcing a lookup for every TaxID in every > > sequence every time it's passed through SeqIO is the best way to go > > either, though I think it should be required for storing sequences. > > It's a tricky balance. > > In my own implementation any database lookups are cached, and you have > the option of not doing any database lookup at all and 'faking' a > taxonomy from the supplied list of names (so it works just like normal > Bio::Seq). > > > > I still think that maybe we should absolve ourselves from using > > SOURCE/ORGANISM or OS/OC information in GenBank files as anything > > more than strictly annotation, or reconstruct Bio::Species to maybe a > > Bio::Annotation::Species object to handle that annotation and either > > deprecate Bio::Species or separate it completely from any > > Bio::Taxonomy objects. It would really simplify things. Then, if > > anyone is interested in taxonomy, either install a local database or > > use Entrez efetch, and then use Bio::DB::Taxonomy (fixed of course) > > to grab the TaxID info. > > My personal view is that having it as an annotation would serve no real > purpose. For me the whole point of any kind of species representation in > bioperl is to allow you to compare species in a biologically meaningful > way. 
If it's just some annotation then that means it's basically
> free-form text and you have no guarantee that two sequences from the
> same species are annotated exactly the same - no guarantee that your
> code would identify that those sequences are from the same species.
> The only other useful thing that a species object needs to do is let you
> know how related two different species are - you need to be able to ask
> what a species' class, kingdom etc. are. Again, not viable with an
> annotation - you need something strict like a properly constructed
> Taxonomy.

My point is, a large number of users do NOT use, nor care about, taxonomic information to the degree that they need to know the entire classification of the organism; many are just as happy getting the scientific name only, which is in the GenBank/EMBL file itself. To take one extreme, it is not productive to force every user to download the NCBI tax database and use lookups just to convert sequences from EMBL format to GenBank format. It's not productive to allow users to spam the NCBI tax database remotely either, so hardcoding lookups is, IMHO, a big mistake.

> I guess it comes down to the philosophy of parsing a file. Do you try
> and reflect exactly what the file contains, letter for letter, so that
> your resulting object can recreate that file letter for letter, or do
> you parse the file and extract the correct /meaning/ in order to be more
> useful?
> I think there can be a choice by the user, and this is best done by
> making Bio::Species a clever wrapper around an improved Bio::Taxonomy,
> as in my own implementation.

I understand both philosophies, but the latter implies that you know the intention of the ones submitting the sequence. 99.9% of the time that's fine, something I can live with.
However, when we mess up something as simple as getting the scientific name for an organism when the information is directly in the flat file (ORGANISM line) by trying to 'imply' what the classification is, yes, I get frustrated. Even more frustrating to me is that Bio::DB::Taxonomy, which should return accurate information directly from the Taxonomy database, still manages to screw up the scientific name.

The NCBI definition in the sample record:

http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html

states that the ORGANISM line contains the formal scientific name and its lineage (no ranking). If the lineage is very long it is abbreviated, so you don't get the same thing as you would through using TaxID.

So, in essence, I believe you are correct that Bio::Species can be used as a 'wrapper' for Bio::Taxonomy objects, but only up to a certain degree, with caveats or warnings for possible inaccuracies. I also believe that lookups should be allowed but optional, not required (i.e. left up to the user, as you state). I just feel that it's somewhat misleading to imply, by delegating to Bio::Taxonomy, that Bio::Species contains accurate taxonomic information when NCBI themselves state that the GenBank flatfile classification can be incomplete and does not supply rankings (genus, species) in the file. It's our best guess in most cases, and a best guess by definition is not very accurate. If you want taxonomic accuracy, use the TaxID and a local tax database. I feel that we shouldn't punish those who don't worry/care about taxonomy by implementing Bio::Species with methods that mangle data that's directly in the flat file they're parsing.

Okay, not to cut short this discussion, but I have to get back to $job. I'll try adding your fixes a bit later today/tomorrow; if they pass tests I'll commit them.
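The "lookups allowed but optional, and cached when used" idea from the discussion above can be sketched in plain Perl. This is only an illustration under stated assumptions: `make_name_resolver` and the lookup callback are hypothetical names, not part of Bio::Species or Bio::DB::Taxonomy, and the "database" is a hash standing in for a local taxonomy dump or a remote query.

```perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl API): build a resolver around an
# optional lookup callback that maps a TaxID to a scientific name.
sub make_name_resolver {
    my ($lookup) = @_;
    my %cache;
    return sub {
        my ($taxid, $organism_line) = @_;
        # No lookup configured, or no TaxID: use the flat-file name untouched.
        return $organism_line unless $lookup && defined $taxid;
        # Cache every answer, misses included, so each TaxID is asked once.
        $cache{$taxid} = $lookup->($taxid) unless exists $cache{$taxid};
        return defined $cache{$taxid} ? $cache{$taxid} : $organism_line;
    };
}

# Stand-in for a taxonomy database; counts how often it is consulted.
my $calls   = 0;
my %fake_db = (224308 => 'Bacillus subtilis subsp. subtilis str. 168');
my $resolver = make_name_resolver(sub { $calls++; return $fake_db{ $_[0] } });

print $resolver->(224308, 'Bacillus subtilis'), "\n";
print $resolver->(224308, 'Bacillus subtilis'), "\n";
print "lookups performed: $calls\n";

# With no callback at all, the ORGANISM text is returned as-is.
my $offline = make_name_resolver();
print $offline->(224308, 'Bacillus subtilis subsp. subtilis str. 168'), "\n";
```

The design choice here is that the caller decides whether a lookup exists at all; a format conversion that never passes a callback never touches a database, and a caller that does pass one pays for each TaxID at most once.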
Chris

From hlapp at gmx.net Mon May 15 12:59:06 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 12:59:06 -0400
Subject: [Bioperl-l] error loading uniprot release 49.6 into mysql
In-Reply-To: <051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net>
References: <051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net>
Message-ID:

You found the right instance. Unfortunately, with the way the bioperl swissprot parser works, the group (RG) isn't promoted to author if there is no author in addition (in fact you may debate whether that would even be the best way of doing things), so it doesn't find it on second occurrence by unique key.

If you can live without this entry, or any other entry that causes a hiccup, just supply the flag --safe and it will gracefully move on to the next entry.

Fixing the issue would require either fixing the bioperl swissprot parser (or Bio::Annotation::Reference) to stick the RG group into the author slot if there is no author, or fixing Bioperl Bio::Annotation::Reference to also feature a group and biosql to use it in place of a missing author.

Actually there is $reference->rg. Maybe Bioperl-db (and hence Biosql) should just use that in place of a missing author? The downside is that upon round-tripping an entry, the RG annotation line will become an RA annotation line. How bad would that be?

Any thoughts from anyone?

	-hilmar

On May 15, 2006, at 8:34 AM, s.rayner at att.net wrote:

> I found where the script is hiccuping....
>
> The Uniprot release contains lines with identical annotation for
> the RL keyword for two different sequences.
>
> ___________________
>
> First occurrence...
> ___________________
>
> ID   1433T_PONPY    STANDARD;      PRT;   245 AA.
> AC   Q5RFJ2; Q5RDK2;
> DT   05-JUL-2005, integrated into UniProtKB/Swiss-Prot.
> DT   05-JUL-2005, sequence version 2.
> DT   18-APR-2006, entry version 13.
> DE   14-3-3 protein theta.
> GN   Name=YWHAQ;
> OS   Pongo pygmaeus (Orangutan).
> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
> OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
> OC   Catarrhini; Hominidae; Pongo.
> OX   NCBI_TaxID=9600;
> RN   [1]
> RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
> RC   TISSUE=Brain cortex, and Kidney;
> RG   The German cDNA consortium;
> RL   Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.   <====== Not Unique
>
>
> ___________________
>
> Second occurrence...
> ___________________
>
>
> ID   1433G_PONPY    STANDARD;      PRT;   246 AA.
> AC   Q5RC20;
> DT   05-JUL-2005, integrated into UniProtKB/Swiss-Prot.
> DT   05-JUL-2005, sequence version 2.
> DT   18-APR-2006, entry version 13.
> DE   14-3-3 protein gamma.
> GN   Name=YWHAG;
> OS   Pongo pygmaeus (Orangutan).
> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
> OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
> OC   Catarrhini; Hominidae; Pongo.
> OX   NCBI_TaxID=9600;
> RN   [1]
> RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
> RC   TISSUE=Heart;
> RG   The German cDNA consortium;
> RL   Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.   <====== Not Unique
>
>
>
> in these two cases the generated CRC key is identical and so MySQL
> throws a wobbly.
>
> if i look at the MySQL entry in the REFERENCE table for the first
> sequence
> ------+-------+---------+----------------------+
> | 139 | NULL | Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases. | NULL | NULL | CRC-E7973FEA4B5611DC |
> +--------------+-----------
> +----------------------------------------------------
>
> and the error when the script choked was
>
> MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed,
> values were
> ("","","Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.","CRC-E7973FEA4B5611DC","","","") FKs ( Duplicate entry 'CRC-E7973FEA4B5611DC' for key 3
>
> hence the problem.
>
> I'm guessing i'm not the first person to encounter this, but don't
> see any hints for an easy way around this.
>
> any suggestions....?
> > ta > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Mon May 15 13:01:14 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 15 May 2006 13:01:14 -0400 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: <4466AD7F.6050700@campus.iztacala.unam.mx> References: <4466AD7F.6050700@campus.iztacala.unam.mx> Message-ID: <068E49BD-2DE4-47BA-BD7C-D6FD487DF095@gmx.net> Hey, thanks to Laura & David for this interface. Any idea why most of the Bio::Ontology::* modules show up without their leading Bio::Ontology? And clicking on those hyperlinks doesn't go anywhere either ... Anything different with those modules that I can fix? -hilmar On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote: > I'm glad to announce the availability of the Deobfuscator interface at > the BioPerl website. You can use it at the following URL: > > http://bioperl.org/cgi-bin/deob_interface.cgi > > Many thanks to Laura Kavanaugh and David Messina for this great > contribution to the BioPerl project! > > Mauricio. 
>
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at uiuc.edu Mon May 15 13:22:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 12:22:13 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <068E49BD-2DE4-47BA-BD7C-D6FD487DF095@gmx.net>
Message-ID: <000301c67844$1b506280$15327e82@pyrimidine>

That's strange. Clicking on the list gives me the results for that module. When I click on the hyperlinks in the results section they open fine; the method column links opens a new page containing usage-function-returns-args and the class column links opens pdoc (same page) for bioperl-live. I'm using Firefox 1.5 on WinXP.

Chris

From sb at mrc-dunn.cam.ac.uk Mon May 15 14:00:15 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 15 May 2006 19:00:15 +0100
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names
In-Reply-To: <001601c67839$cf289490$15327e82@pyrimidine>
References: <001601c67839$cf289490$15327e82@pyrimidine>
Message-ID: <4468C1AF.9080400@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
>
> Ah, now I see. That's a bit screwy, but it's not on our end so we have to
> deal with it. I also noticed that subspecies also contains the entire
> string:
>
>   <TaxId>135461</TaxId>
>   <ScientificName>Bacillus subtilis subsp. subtilis</ScientificName>
>   <Rank>subspecies</Rank>
>

Yes, this is one of the problems I mentioned in the first post to this thread.

> As for the 'scientific_name' method when accessed through Bio::DB::Taxonomy,
> I don't get the actual scientific name for the node (from the GenBank
> ORGANISM line) almost every time; I get the name with the strain chopped off
> instead and a number of times the names get mangled.
[snip, should be:]
> 224308 Bacillus subtilis subsp. subtilis str. 168
> 281309 Bacillus thuringiensis serovar konkukian str. 97-27
[snip, but Bio::DB::Taxonomy gives:]
> 224308 subtilis Bacillus subtilis subsp. subtilis
> 281309 Bacillus cereus group thuringiensis
[snip]
> So, in a nutshell, there's a problem here. I don't know if your fix works
> for that, but I definitely don't think the 'scientific name' should be
> assembled ad hoc but should be taken from the tagname for that node.

Yes, my implementation will get you the correct answer, but not quite as you say. My solution was to munge the actual ScientificName but 'ensure' that the binomial would give you back the actual binomial name you wanted - which is the intent of current Bio::DB::Taxonomy code.

my $species0 = TFBS::Species->new(-ncbi_taxid => 224308);
my $leaf_node = $species0->taxonomy->get_leaves();
print "sci_name of Node = '", $leaf_node->scientific_name, "'\n";
print "Species0 subspecies = '", $species0->subspecies, "'\n";
print "Species0 variants = '", scalar($species0->variant), "'\n";
print "Species0 binomial = '", $species0->binomial('FULL'), "'\n";

gives:

sci_name of Node = 'str. 168'
Species0 subspecies = 'subsp. subtilis'
Species0 variants = 'str. 168'
Species0 binomial = 'Bacillus subtilis subsp. subtilis str. 168'

and the same again for id 281309:

sci_name of Node = 'str. 97-27'
Species0 subspecies = ''
Species0 variants = 'serovar konkukian str. 97-27'
Species0 binomial = 'Bacillus thuringiensis serovar konkukian str. 97-27'

I've done it this way because even though strictly speaking the ScientificName for 224308 (a 'no rank') is 'Bacillus subtilis subsp. subtilis str. 168', when I ask for the variant I don't want that whole string. I just want the bit that will be different when comparing other strains of this subspecies of this species of Bacillus. I want 'str. 168'.
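The decomposition described above (each node keeps only the part of its name that its ancestors do not already carry, and the full name is rebuilt by joining the parts) can be sketched in plain Perl. This is not the TFBS::Species code, which is not shown in the thread; `relative_names` is a hypothetical helper, and the lineage is typed in by hand from the names quoted in this discussion rather than fetched from NCBI.

```perl
use strict;
use warnings;

# Hypothetical helper: given full ScientificNames ordered from genus down
# to the leaf, strip from each name the nearest ancestor name that is a
# word-boundary prefix of it.
sub relative_names {
    my @full = @_;
    my @relative;
    for my $i (0 .. $#full) {
        my $name = $full[$i];
        for (my $j = $i - 1; $j >= 0; $j--) {
            my $parent = $full[$j];
            if (index($name, "$parent ") == 0) {
                $name = substr($name, length($parent) + 1);
                last;
            }
        }
        push @relative, $name;
    }
    return @relative;
}

my @lineage = (
    'Bacillus',
    'Bacillus subtilis',
    'Bacillus subtilis subsp. subtilis',
    'Bacillus subtilis subsp. subtilis str. 168',
);

my @parts = relative_names(@lineage);
print join(' | ', @parts), "\n";   # the per-node pieces
print join(' ', @parts), "\n";     # rejoined full name
```

For this lineage the pieces are 'Bacillus', 'subtilis', 'subsp. subtilis' and 'str. 168', and joining them gives back the original leaf name. Joining every piece blindly is not always safe, though: with an intermediate node such as 'Bacillus cereus group' above 'Bacillus thuringiensis', the same helper yields 'Bacillus', 'cereus group', 'thuringiensis', so a real implementation has to decide which ranks take part in the rebuilt name. Whether that is what happens inside Bio::DB::Taxonomy is not established here.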
Note that my objects never store the original ScientificName; it is due to 'luck' (or as I like to think, a good implementation) that the binomial method is able to reconstruct a string that is identical to what the original ScientificName was.

If you'd like to see my code let me know. You can't just drop the code snippet I posted in this thread into existing bioperl modules; quite a bit else has to change as well. I'll have to make an updated taxonomy_the_tfbs_way.tar.gz file available if you want an example implementation; the current version of that file is now out of date - it doesn't do any of what I describe above.

From hlapp at gmx.net Mon May 15 14:08:49 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 14:08:49 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000301c67844$1b506280$15327e82@pyrimidine>
References: <000301c67844$1b506280$15327e82@pyrimidine>
Message-ID:

Safari or Firefox on MacOSX don't do this. Note that the appearance in the browsable list is already different (the prefix is missing), and the JavaScript link also lacks the prefix in the module name in contrast to others, e.g., Bio::Ontology::Ontology (which is one of the few Bio::Ontology exceptions that do work and do display correctly).

I suppose there is something peculiar about the code formatting of those modules? Some of the modules under Bio::OntologyIO are also affected BTW.

What happens is after you click on the link the page appears to reload (i.e., gets submitted) but the second table that is supposed to open underneath the first doesn't appear. However, the sort-by drop down selector does appear.

	-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at uiuc.edu Mon May 15 15:07:59 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 14:07:59 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To:
Message-ID: <000501c67852$e1bb55c0$15327e82@pyrimidine>

I'll have to give it a try on Mac OS X (we have an ancient G4 in the lab which I can try it on). I'll let you know what I find.
This is what I get when I do a search for 'Bio::Ont*' using Firefox on WinXP and this Deobfuscator link (http://bioperl.org/cgi-bin/deob_interface.cgi?); all the classes have links that work (I added newline and tab to make it a bit more readable):

Bio::OntologyIO
	Parser factory for Ontology formats
Bio::OntologyIO::Handlers::BaseSAXHandler
	no short description available
Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler
	no short description available
Bio::Ontology::OntologyI
	Interface for an ontology implementation
Bio::Ontology::TermFactory
	Instantiates a new Bio::Ontology::TermI (or derived class) through a factory
Bio::Ontology::OntologyStore
	A repository of ontologies
Bio::Ontology::RelationshipFactory
	Instantiates a new Bio::Ontology::RelationshipI (or derived class) through a factory
Bio::Ontology::Ontology
	standard implementation of an Ontology

So the names seem fine here.

When I click on a class (Bio::Ontology::Ontology) I get in the results section:

Method                 Class                           Returns         Usage
add_relationship       Bio::Ontology::Ontology         Its argument.   add_relationship(RelationshipI relationship): RelationshipI
add_relationship_type  Bio::Ontology::OntologyEngineI  not documented  not documented
add_term               Bio::Ontology::Ontology         its argument.   add_term(TermI term): TermI

....and so on

Where each method is clickable and opens a new page containing a table:

Bio::Ontology::Ontology::add_relationship
Usage     add_relationship(RelationshipI relationship): RelationshipI
Function  Adds a relationship object to the ontology engine.
Returns   Its argument.
Args      A RelationshipI object.

Each class is also linked to the bioperl-live PDOC.
Clicking on class Bio::Ontology::Ontology in the results table gets me this page (no new page):

http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html

Chris

From cjfields at uiuc.edu Mon May 15 15:12:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 14:12:34 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
Message-ID: <000601c67853$85d49cc0$15327e82@pyrimidine>

I just tried the same thing (links, search, etc) with Mac OS X v
10.3.9 and Safari (no Firefox sorry) and it worked fine as well (all links, no missing Bio::Ontology, etc). Not sure what it could be... Chris > -----Original Message----- > From: Chris Fields [mailto:cjfields at uiuc.edu] > Sent: Monday, May 15, 2006 2:08 PM > To: 'Hilmar Lapp' > Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' > Subject: RE: [Bioperl-l] Deobfuscator interface now available > > I'll have to give it a try on Mac OS X (we have an ancient G4 in the lab > which I can try it on). I'll let you know what I find. > > This is what I get when I do a search for 'Bio::Ont*' using Firefox on > WinXP and this Deobfuscator link (http://bioperl.org/cgi- > bin/deob_interface.cgi?); all the classes have links that work (I added > newline and tab to make it a bit more readable) : > > Bio::OntologyIO > Parser factory for Ontology formats > Bio::OntologyIO::Handlers::BaseSAXHandler > no short description available > Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler > no short description available > Bio::Ontology::OntologyI > Interface for an ontology implementation > Bio::Ontology::TermFactory > Instantiates a new Bio::Ontology::TermI (or derived class) through a > factory > Bio::Ontology::OntologyStore > A repository of ontologies > Bio::Ontology::RelationshipFactory > Instantiates a new Bio::Ontology::RelationshipI (or derived class) > through a factory > Bio::Ontology::Ontology > standard implementation of an Ontology > > So the names seem fine here. > > When I click on a class (Bio::Ontology::Ontology) I get in the results > section: > > Method Class Returns > Usage > add_relationship Bio::Ontology::Ontology Its > argument. add_relationship(RelationshipI relationship): RelationshipI > add_relationship_type Bio::Ontology::OntologyEngineI not > documented not documented > add_term Bio::Ontology::Ontology its > argument. 
add_term(TermI term): TermI > > ....and so on > > Where each method is clickable and opens a new page containing a table: > > Bio::Ontology::Ontology::add_relationship > Usage add_relationship(RelationshipI relationship): RelationshipI > Function Adds a relationship object to the ontology engine. > Returns Its argument. > Args A RelationshipI object. > > > Each class is also linked to the bioperl-live PDOC. Clicking on class > Bio::Ontology::Ontology in the results table gets me this page (no new > page): > > http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html > > > Chris > > > -----Original Message----- > > From: Hilmar Lapp [mailto:hlapp at gmx.net] > > Sent: Monday, May 15, 2006 1:09 PM > > To: Chris Fields > > Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' > > Subject: Re: [Bioperl-l] Deobfuscator interface now available > > > > Safari or Firefox on MacOSX don't do this. Note that the appearance > > in the browsable list is already different (the prefix is missing), > > and the JavaScript link also lacks the prefix in the module name in > > contrast to others, e.g., Bio::Ontology::Ontology (which is one of > > the few Bio::Ontology exceptions that do work and do display correctly). > > > > I suppose there is something peculiar about the code formatting of > > those modules? Some of the modules under Bio::OntologyIO are also > > affected BTW. > > > > What happens is after you click on the link the page apppears to > > reload (i.e., gets submitted) but the second table that is supposed > > open underneath the first doesn't appear. However, the sort-by drop > > down selector does appear. > > > > -hilmar > > > > On May 15, 2006, at 1:22 PM, Chris Fields wrote: > > > > > That's strange. Clicking on the list gives me the results for that > > > module. 
> > > When I click on the hyperlinks in the results section they open > > > fine; the > > > method column links opens a new page containing usage-function- > > > returns-args > > > and the class column links opens pdoc (same page) for bioperl- > > > live. I'm > > > using Firefox 1.5 on WinXP. > > > > > > Chris > > > > > >> -----Original Message----- > > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > > >> Sent: Monday, May 15, 2006 12:01 PM > > >> To: Mauricio Herrera Cuadra > > >> Cc: bioperl-l > > >> Subject: Re: [Bioperl-l] Deobfuscator interface now available > > >> > > >> Hey, thanks to Laura & David for this interface. > > >> > > >> Any idea why most of the Bio::Ontology::* modules show up without > > >> their leading Bio::Ontology? And clicking on those hyperlinks doesn't > > >> go anywhere either ... Anything different with those modules that I > > >> can fix? > > >> > > >> -hilmar > > >> > > >> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote: > > >> > > >>> I'm glad to announce the availability of the Deobfuscator > > >>> interface at > > >>> the BioPerl website. You can use it at the following URL: > > >>> > > >>> http://bioperl.org/cgi-bin/deob_interface.cgi > > >>> > > >>> Many thanks to Laura Kavanaugh and David Messina for this great > > >>> contribution to the BioPerl project! > > >>> > > >>> Mauricio. 
> > >>> > > >>> -- > > >>> MAURICIO HERRERA CUADRA > > >>> arareko at campus.iztacala.unam.mx > > >>> Laboratorio de Genética > > >>> Unidad de Morfofisiología y Función > > >>> Facultad de Estudios Superiores Iztacala, UNAM > > >>> > > >>> _______________________________________________ > > >>> Bioperl-l mailing list > > >>> Bioperl-l at lists.open-bio.org > > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > >>> > > >> > > >> -- > > >> =========================================================== > > >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > > >> =========================================================== > > >> > > >> > > >> > > >> > > >> > > >> > > >> _______________________________________________ > > >> Bioperl-l mailing list > > >> Bioperl-l at lists.open-bio.org > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > -- > > =========================================================== > > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > > =========================================================== > > > > > > From arareko at campus.iztacala.unam.mx Mon May 15 15:20:10 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Mon, 15 May 2006 14:20:10 -0500 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: <000901c67827$d99eabb0$15327e82@pyrimidine> References: <000901c67827$d99eabb0$15327e82@pyrimidine> Message-ID: <4468D46A.8070203@campus.iztacala.unam.mx> Laura and Dave would be very happy to see all of your comments/suggestions/enhancements/complaints summarized on the appropriate wiki page. Just be sure to sign them properly with your name and date: http://bioperl.org/wiki/Deobfuscator I think they'll have to discuss which features would be nice to implement and which would not, depending on the direction they want their project to go. But don't worry, they're extremely nice people who are open to all kinds of ideas.
The best of all: the Deobfuscator is open-source so everyone is invited to contribute to it, just ask them for the code :) On my side, I'm working on tweaking the code so it would be able of browsing different BioPerl packages (core, run, ext) and their respective releases (stable, developer, cvs). Regards, Mauricio. Chris Fields wrote: >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Sendu Bala >> Sent: Monday, May 15, 2006 8:09 AM >> To: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Deobfuscator interface now available >> >> Amir Karger wrote: >>> This tool is quite nice, and may save me a lot of perdoc'ing. >> Yes, many thanks to everyone involved. > > The Deobfuscator currently indexes bioperl-1.4, so it's not completely > up-to-date. I believe Mauricio and Dave may be working on updating to the > newer versions and maybe bioperl-live, as well as getting the other bioperl > packages up and running. > > For modules added after v1.4 I use the script in the FAQ question mentioned > on the Deobfuscator wiki page to get up-to-date methods, then grab the that > ActiveState HTML'd perldocs pumped out when installing using PPM (I make a > custom PPM/PPD file and install myself every once in a while): > > #!/usr/bin/perl -w > use Class::Inspector; > $class = shift || die "Usage: methods perl_class_name\n"; > eval "require $class"; > print join ("\n", sort @{Class::Inspector- > >>> A couple of minor interface thoughts. >>> >>> 1)There's quite a lot of methods for many of the classes. As such, I >>> think I'll often want to browse through what's available in a class. But >>> 60% or so of the screen real estate is used for "Enter a search >>> string... OR select a class from the list". IMO, it would be better to >>> have two pages, a search page and a result page. 
It only takes a click >>> on Back (or a "new search" button) to get to a new search, and now you >>> can use your whole screen for reading your results. >> As the compromise it must be, I like the way it behaves. I don't like >> lots of windows. I especially don't like pop up windows. Right now when >> I'm using the bioperl docs I tend to have a whole bunch of tabs open to >> different class pages at once, so being able to see an overview all on >> one page in Deobfuscator is very nice. >> >> Further to that, I'd love it if clicking on a method name caused an >> in-place css(&|javascript) reveal (similar to how a well implemented >> drop down menu works in a website) rather than a new window opened. >> Alternatively, just have more columns in the results table, ie. usage, >> function, returns, args columns. I feel that opening a window for each >> method you want to understand is far too slow. > > Agreed. > >> I'd also really like a link to the code for the method as well. The >> bioperl docs are rarely complete enough that you can really understand >> what every method is supposed to do without looking at the code. > > The methods that pop up are in columns along with the class module that > implements the method. > > > If you click on that link you get PDOC documentation for the module which > includes most of the code (strangely, though Deobfuscator indexes bioperl > 1.4, the PDOC corresponds to bioperl-live). Is that what you meant, or > something a bit more detailed? > >>> 3) Minimalist is nice, but documentation is even nicer. It wasn't clear >>> to me that the search searches within class names rather than function >>> names. What I really want to know sometimes is which module has, say, >>> the revcom method in it. > > That's listed in the method results table (the next column has the module > with a link to the module's online docs). > > > Chris > > >> This would be a great feature to add. 
>> >> >> Another minor interface thought: >> 6) Have a little more cell padding in all the tables. Things are just a >> little too cramped and things start to look messy/ run into each other. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Gen?tica Unidad de Morfofisiolog?a y Funci?n Facultad de Estudios Superiores Iztacala, UNAM From hlapp at gmx.net Mon May 15 15:23:55 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 15 May 2006 15:23:55 -0400 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: <000501c67852$e1bb55c0$15327e82@pyrimidine> References: <000501c67852$e1bb55c0$15327e82@pyrimidine> Message-ID: <57326DCD-D72B-4CED-801D-9E25609BF57C@gmx.net> I wasn't using the search. It's in the scrollable table for browsing. -hilmar On May 15, 2006, at 3:07 PM, Chris Fields wrote: > I'll have to give it a try on Mac OS X (we have an ancient G4 in > the lab > which I can try it on). I'll let you know what I find. 
> > This is what I get when I do a search for 'Bio::Ont*' using Firefox > on WinXP > and this Deobfuscator link (http://bioperl.org/cgi-bin/ > deob_interface.cgi?); > all the classes have links that work (I added newline and tab to > make it a > bit more readable) : > > Bio::OntologyIO > Parser factory for Ontology formats > Bio::OntologyIO::Handlers::BaseSAXHandler > no short description available > Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler > no short description available > Bio::Ontology::OntologyI > Interface for an ontology implementation > Bio::Ontology::TermFactory > Instantiates a new Bio::Ontology::TermI (or derived class) through a > factory > Bio::Ontology::OntologyStore > A repository of ontologies > Bio::Ontology::RelationshipFactory > Instantiates a new Bio::Ontology::RelationshipI (or derived class) > through a factory > Bio::Ontology::Ontology > standard implementation of an Ontology > > So the names seem fine here. > > When I click on a class (Bio::Ontology::Ontology) I get in the results > section: > > Method Class > Returns > Usage > add_relationship Bio::Ontology::Ontology Its > argument. add_relationship(RelationshipI relationship): > RelationshipI > add_relationship_type Bio::Ontology::OntologyEngineI not > documented not documented > add_term Bio::Ontology::Ontology its > argument. add_term(TermI term): TermI > > ....and so on > > Where each method is clickable and opens a new page containing a > table: > > Bio::Ontology::Ontology::add_relationship > Usage add_relationship(RelationshipI relationship): RelationshipI > Function Adds a relationship object to the ontology engine. > Returns Its argument. > Args A RelationshipI object. > > > Each class is also linked to the bioperl-live PDOC. 
Clicking on class > Bio::Ontology::Ontology in the results table gets me this page (no new > page): > > http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html > > > Chris > >> -----Original Message----- >> From: Hilmar Lapp [mailto:hlapp at gmx.net] >> Sent: Monday, May 15, 2006 1:09 PM >> To: Chris Fields >> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' >> Subject: Re: [Bioperl-l] Deobfuscator interface now available >> >> Safari or Firefox on MacOSX don't do this. Note that the appearance >> in the browsable list is already different (the prefix is missing), >> and the JavaScript link also lacks the prefix in the module name in >> contrast to others, e.g., Bio::Ontology::Ontology (which is one of >> the few Bio::Ontology exceptions that do work and do display >> correctly). >> >> I suppose there is something peculiar about the code formatting of >> those modules? Some of the modules under Bio::OntologyIO are also >> affected BTW. >> >> What happens is after you click on the link the page apppears to >> reload (i.e., gets submitted) but the second table that is supposed >> open underneath the first doesn't appear. However, the sort-by drop >> down selector does appear. >> >> -hilmar >> >> On May 15, 2006, at 1:22 PM, Chris Fields wrote: >> >>> That's strange. Clicking on the list gives me the results for that >>> module. >>> When I click on the hyperlinks in the results section they open >>> fine; the >>> method column links opens a new page containing usage-function- >>> returns-args >>> and the class column links opens pdoc (same page) for bioperl- >>> live. I'm >>> using Firefox 1.5 on WinXP. 
>>> >>> Chris >>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp >>>> Sent: Monday, May 15, 2006 12:01 PM >>>> To: Mauricio Herrera Cuadra >>>> Cc: bioperl-l >>>> Subject: Re: [Bioperl-l] Deobfuscator interface now available >>>> >>>> Hey, thanks to Laura & David for this interface. >>>> >>>> Any idea why most of the Bio::Ontology::* modules show up without >>>> their leading Bio::Ontology? And clicking on those hyperlinks >>>> doesn't >>>> go anywhere either ... Anything different with those modules that I >>>> can fix? >>>> >>>> -hilmar >>>> >>>> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote: >>>> >>>>> I'm glad to announce the availability of the Deobfuscator >>>>> interface at >>>>> the BioPerl website. You can use it at the following URL: >>>>> >>>>> http://bioperl.org/cgi-bin/deob_interface.cgi >>>>> >>>>> Many thanks to Laura Kavanaugh and David Messina for this great >>>>> contribution to the BioPerl project! >>>>> >>>>> Mauricio. 
>>>>> >>>>> -- >>>>> MAURICIO HERRERA CUADRA >>>>> arareko at campus.iztacala.unam.mx >>>>> Laboratorio de Genética >>>>> Unidad de Morfofisiología y Función >>>>> Facultad de Estudios Superiores Iztacala, UNAM >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> >>>> -- >>>> =========================================================== >>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>>> =========================================================== >>>> >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From ClarkeW at AGR.GC.CA Mon May 15 15:40:15 2006 From: ClarkeW at AGR.GC.CA (Clarke, Wayne) Date: Mon, 15 May 2006 15:40:15 -0400 Subject: [Bioperl-l] Memory Leak in Bio::SearchIO Message-ID: <320530F83FA47047823E57F110DDEAADB159EC@onncrxms4.agr.gc.ca> Hey everyone, I have been developing some code to download and parse BLAST reports from a remote server using SOAP::Lite, as well as insert the results into a MySQL database. The problem I am having is that my program seems to be taking up a huge amount of RAM. For a single job of 10000 queries it can consume as much as a couple hundred MB within an hour.
I realize that a lot of work is being done but this seems like way too much. This leads me to the subject of my post. I think I may have traced the source of the memory leak to Bio::SearchIO. I have used Devel::Size to track the size of my variables and tried other debugging steps, but have had no luck resolving this very frustrating problem. My code is as follows:

my $result = $connector->getQueryResult($query_id);
my $FH;
open $FH, "<", \$result;
my $searchio = Bio::SearchIO->new(-format => "blast", -fh => $FH);
while (my $o_blast = $searchio->next_result()) {
    my $clone_id = $o_blast->query_name();
    my $statement = $bdbi->form_push_SQL($o_blast, $clone_id, 5);

This is just the leading and trailing code surrounding the use of Bio::SearchIO, since there is quite a lot. I am mostly just wondering if anyone has ever had problems with SearchIO and its memory usage. I looked at the source code for it but am afraid it is out of my league. Any help/suggestions/questions would be great. Thanks
From dmessina at wustl.edu Mon May 15 15:34:10 2006 From: dmessina at wustl.edu (David Messina) Date: Mon, 15 May 2006 14:34:10 -0500 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: <000901c67827$d99eabb0$15327e82@pyrimidine> References: <000901c67827$d99eabb0$15327e82@pyrimidine> Message-ID: Responding to: >>> Amir Karger >> Sendu Bala > Chris Fields > The Deobfuscator currently indexes bioperl-1.4, so it's not completely > up-to-date. I believe Mauricio and Dave may be working on updating > to the > newer versions and maybe bioperl-live, as well as getting the other > bioperl > packages up and running. That's correct -- Mauricio is currently working on a version that will allow you to search 1.4, 1.5.1, or bioperl-live. The Deobfuscator indexes will be updated (daily?) to keep them in sync with the CVS repository. >>> A couple of minor interface thoughts. >>> >>> 1)There's quite a lot of methods for many of the classes.
As such, I >>> think I'll often want to browse through what's available in a >>> class. But >>> 60% or so of the screen real estate is used for "Enter a search >>> string... OR select a class from the list". IMO, it would be >>> better to >>> have two pages, a search page and a result page. It only takes >>> a click >>> on Back (or a "new search" button) to get to a new search, and >>> now you >>> can use your whole screen for reading your results. >> >> As the compromise it must be, I like the way it behaves. I don't like >> lots of windows. I especially don't like pop up windows. Right now >> when >> I'm using the bioperl docs I tend to have a whole bunch of tabs >> open to >> different class pages at once, so being able to see an overview >> all on >> one page in Deobfuscator is very nice. I think the current behavior makes sense as the default, but I like the idea of being able to view the search results in a separate window for easier browsing. Thanks for the suggestion; I'll add it to the list. >> Further to that, I'd love it if clicking on a method name caused an >> in-place css(&|javascript) reveal (similar to how a well implemented >> drop down menu works in a website) rather than a new window opened. >> Alternatively, just have more columns in the results table, ie. >> usage, >> function, returns, args columns. I feel that opening a window for >> each >> method you want to understand is far too slow. > > Agreed. Yeah, the way it currently works is admittedly lame, and was done as a placeholder until we figured out a better way to do it. An in-place reveal sounds like a good solution. >>> 2) Please sort the "select a class from the list" alphabetically. I >>> guess I can enter a search term to get the right classes, but it >>> would >>> be nice to be able to browse. Agreed. I think we were doing this in an earlier test version, but I must have left it out of the release I handed off to Mauricio. >>> 3) Minimalist is nice, but documentation is even nicer. 
It wasn't >>> clear >>> to me that the search searches within class names rather than >>> function >>> names. What I really want to know sometimes is which module has, >>> say, >>> the revcom method in it. >> >> This would be a great feature to add. That's a great idea. >>> 4) When I search for something that's not found, I get a screen that >>> looks pretty familiar, with the extra text "No match to string >>> found" >>> down at the bottom. It took me a while to even notice it. >>> (Studies show >>> that most users don't read most of the text on a page.) Bold >>> might be >>> nice here. Or put the error at the top of the screen. Or both. Added to the list. >>> 5) I'll save my stupidest comment for last - please make the page >>> title >>> "Bioperl Deobfuscator", so that when I bookmark it I'll know what >>> the >>> bookmark stands for. Added to the list. Not stupid, by the way -- much to my surprise, there are at least 2 or 3 other (obviously inferior :) ) deobfuscators floating around out there. >> Another minor interface thought: >> 6) Have a little more cell padding in all the tables. Things are >> just a >> little too cramped and things start to look messy/ run into each >> other. Added to the list. Thanks to all of you for taking the time to give such detailed feedback -- it's really helpful. There is a wiki page on the BioPerl site for this project (http://www.bioperl.org/wiki/Deobfuscator), so I'll be putting your comments there for tracking and further discussion. Please feel free to add to it. Dave -- Dave Messina WashU Genome Sequencing Center dmessina at wustl.edu 314-286-1825 From faruque at ebi.ac.uk Mon May 15 15:47:27 2006 From: faruque at ebi.ac.uk (Nadeem Faruque) Date: Mon, 15 May 2006 20:47:27 +0100 Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names Message-ID: <809AE0C7-A9B6-48A4-BF11-BA392C770CA9@ebi.ac.uk> >> My personal view is that having it as an annotation would serve no >> real >> purpose.
For me the whole point of any kind of species >> representation in >> bioperl is to allow you to compare species in a biologically >> meaningful >> way. If it's just some annotation then that means it's basically I understand the need to find the species name of entries, especially now that so many complete genomes have been given their own strain- specific tax nodes, and I also think it is a shame that the ncbi tax dump does not give a rank to entries such as these (they cannot easily be distinguished from unofficial ranks higher in the tree without ascending the tree). Would it be useful for the species name to be included within EMBL file headers, eg in a line called OB (OB is a terrible suggestion based on 'Organism Binomial' since OS is already in use)? eg two examples of the species 'Apple stem grooving virus', where the second one would appear to be a different species without delving into the tax tree or the inclusion of an OB line. AC D14995; S47260; DE Apple stem grooving virus genome, complete sequence. OS Apple stem grooving virus OB Apple stem grooving virus OC Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae; OC Capillovirus. AC AY646511; DE Citrus tatter leaf virus strain Kumquat 1, complete genome. OS Citrus tatter leaf virus OB Apple stem grooving virus OC Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae; OC Capillovirus. > My point is, a large number of users do NOT use, nor care about, > taxonomic > information to the degree they need to know the entire > classification of the > organism; many are just as happy about getting the scientific name > only, > which is in the GenBank/EMBL file itself. To take one extreme, it > is not > productive to force every user to download the NCBI tax database > and use > lookups just to convert sequences from EMBL format to GenBank > format. It's > not productive to allow users to spam the NCBI tax database > remotely either, > so hardcoding lookups is, IMHO, a big mistake. 
I don't think you need to add any information to turn an embl-format file into a Genbank flatfile, but maybe I'm missing something obvious. Nadeem -- Dr S.M. Nadeem N. Faruque 9 Barley Court Saffron Walden Essex CB11 3HG 01799 500 120 From dmessina at wustl.edu Mon May 15 16:12:48 2006 From: dmessina at wustl.edu (David Messina) Date: Mon, 15 May 2006 15:12:48 -0500 Subject: [Bioperl-l] Deobfuscator interface now available Message-ID: <5A2309FD-8C6E-4349-99CC-B3EDA8B2F499@wustl.edu> On May 15, 2006, at 2:23 PM, Hilmar Lapp wrote: > I wasn't using the search. It's in the scrollable table for browsing. > -hilmar I'm seeing this too on OS X with Safari 2.0.3. If you type 'goflat' (without the quotes) into the search box, you'll see the behavior. Chris, can you try it again this way just to confirm it's an OS/browser-specific thing? Not sure what's going on, Hilmar -- I'll take a look. Dave From cjfields at uiuc.edu Mon May 15 16:56:29 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 15 May 2006 15:56:29 -0500 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: <57326DCD-D72B-4CED-801D-9E25609BF57C@gmx.net> Message-ID: <000a01c67862$0a00cab0$15327e82@pyrimidine> Okay, I see what you mean. Using the search term "Bio::Ont*" also explains why I didn't see it ;P. Yeah, the bug shows up here too (WinXP and Mac OS X), and those links are broken like you said. Could be something to do with indexing. 
Using the methods script in the FAQ (http://www.bioperl.org/wiki/FAQ#Why_can.27t_I_easily_get_a_list_of_all_the_methods_a_object_can_call.3F) I get this: C:\Perl\Scripts>methods.pl Bio::OntologyIO::simplehierarchy Bio::OntologyIO::simplehierarchy::Dumper Bio::OntologyIO::simplehierarchy::basename Bio::OntologyIO::simplehierarchy::dirname Bio::OntologyIO::simplehierarchy::fileparse Bio::OntologyIO::simplehierarchy::fileparse_set_fstype Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > Sent: Monday, May 15, 2006 2:24 PM > To: Chris Fields > Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' > Subject: Re: [Bioperl-l] Deobfuscator interface now available > > I wasn't using the search. It's in the scrollable table for browsing. > -hilmar > > On May 15, 2006, at 3:07 PM, Chris Fields wrote: > > > I'll have to give it a try on Mac OS X (we have an ancient G4 in > > the lab > > which I can try it on). I'll let you know what I find.
> > > > This is what I get when I do a search for 'Bio::Ont*' using Firefox > > on WinXP > > and this Deobfuscator link (http://bioperl.org/cgi-bin/ > > deob_interface.cgi?); > > all the classes have links that work (I added newline and tab to > > make it a > > bit more readable) : > > > > Bio::OntologyIO > > Parser factory for Ontology formats > > Bio::OntologyIO::Handlers::BaseSAXHandler > > no short description available > > Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler > > no short description available > > Bio::Ontology::OntologyI > > Interface for an ontology implementation > > Bio::Ontology::TermFactory > > Instantiates a new Bio::Ontology::TermI (or derived class) through a > > factory > > Bio::Ontology::OntologyStore > > A repository of ontologies > > Bio::Ontology::RelationshipFactory > > Instantiates a new Bio::Ontology::RelationshipI (or derived class) > > through a factory > > Bio::Ontology::Ontology > > standard implementation of an Ontology > > > > So the names seem fine here. > > > > When I click on a class (Bio::Ontology::Ontology) I get in the results > > section: > > > > Method Class > > Returns > > Usage > > add_relationship Bio::Ontology::Ontology > Its > > argument. add_relationship(RelationshipI relationship): > > RelationshipI > > add_relationship_type Bio::Ontology::OntologyEngineI not > > documented not documented > > add_term Bio::Ontology::Ontology its > > argument. add_term(TermI term): TermI > > > > ....and so on > > > > Where each method is clickable and opens a new page containing a > > table: > > > > Bio::Ontology::Ontology::add_relationship > > Usage add_relationship(RelationshipI relationship): RelationshipI > > Function Adds a relationship object to the ontology engine. > > Returns Its argument. > > Args A RelationshipI object. > > > > > > Each class is also linked to the bioperl-live PDOC. 
Clicking on class > > Bio::Ontology::Ontology in the results table gets me this page (no new > > page): > > > > http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html > > > > > > Chris > > > >> -----Original Message----- > >> From: Hilmar Lapp [mailto:hlapp at gmx.net] > >> Sent: Monday, May 15, 2006 1:09 PM > >> To: Chris Fields > >> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' > >> Subject: Re: [Bioperl-l] Deobfuscator interface now available > >> > >> Safari or Firefox on MacOSX don't do this. Note that the appearance > >> in the browsable list is already different (the prefix is missing), > >> and the JavaScript link also lacks the prefix in the module name in > >> contrast to others, e.g., Bio::Ontology::Ontology (which is one of > >> the few Bio::Ontology exceptions that do work and do display > >> correctly). > >> > >> I suppose there is something peculiar about the code formatting of > >> those modules? Some of the modules under Bio::OntologyIO are also > >> affected BTW. > >> > >> What happens is after you click on the link the page apppears to > >> reload (i.e., gets submitted) but the second table that is supposed > >> open underneath the first doesn't appear. However, the sort-by drop > >> down selector does appear. > >> > >> -hilmar > >> > >> On May 15, 2006, at 1:22 PM, Chris Fields wrote: > >> > >>> That's strange. Clicking on the list gives me the results for that > >>> module. > >>> When I click on the hyperlinks in the results section they open > >>> fine; the > >>> method column links opens a new page containing usage-function- > >>> returns-args > >>> and the class column links opens pdoc (same page) for bioperl- > >>> live. I'm > >>> using Firefox 1.5 on WinXP. 
> >>> > >>> Chris > >>> > >>>> -----Original Message----- > >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>>> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > >>>> Sent: Monday, May 15, 2006 12:01 PM > >>>> To: Mauricio Herrera Cuadra > >>>> Cc: bioperl-l > >>>> Subject: Re: [Bioperl-l] Deobfuscator interface now available > >>>> > >>>> Hey, thanks to Laura & David for this interface. > >>>> > >>>> Any idea why most of the Bio::Ontology::* modules show up without > >>>> their leading Bio::Ontology? And clicking on those hyperlinks > >>>> doesn't > >>>> go anywhere either ... Anything different with those modules that I > >>>> can fix? > >>>> > >>>> -hilmar > >>>> > >>>> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote: > >>>> > >>>>> I'm glad to announce the availability of the Deobfuscator > >>>>> interface at > >>>>> the BioPerl website. You can use it at the following URL: > >>>>> > >>>>> http://bioperl.org/cgi-bin/deob_interface.cgi > >>>>> > >>>>> Many thanks to Laura Kavanaugh and David Messina for this great > >>>>> contribution to the BioPerl project! > >>>>> > >>>>> Mauricio. 
> >>>>> > >>>>> -- > >>>>> MAURICIO HERRERA CUADRA > >>>>> arareko at campus.iztacala.unam.mx > >>>>> Laboratorio de Gen?tica > >>>>> Unidad de Morfofisiolog?a y Funci?n > >>>>> Facultad de Estudios Superiores Iztacala, UNAM > >>>>> > >>>>> _______________________________________________ > >>>>> Bioperl-l mailing list > >>>>> Bioperl-l at lists.open-bio.org > >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>> > >>>> > >>>> -- > >>>> =========================================================== > >>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > >>>> =========================================================== > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> _______________________________________________ > >>>> Bioperl-l mailing list > >>>> Bioperl-l at lists.open-bio.org > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >> > >> -- > >> =========================================================== > >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > >> =========================================================== > >> > >> > >> > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Mon May 15 17:29:14 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 15 May 2006 16:29:14 -0500 Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names In-Reply-To: <809AE0C7-A9B6-48A4-BF11-BA392C770CA9@ebi.ac.uk> Message-ID: <000b01c67866$9dac2620$15327e82@pyrimidine> > -----Original 
Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Nadeem Faruque > Sent: Monday, May 15, 2006 2:47 PM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bio::DB::Taxonomy:: mishandles > species,subspecies/variant names > > >> My personal view is that having it as an annotation would serve no > >> real > >> purpose. For me the whole point of any kind of species > >> representation in > >> bioperl is to allow you to compare species in a biologically > >> meaningful > >> way. If it's just some annotation then that means it's basically > > I understand the need to find the species name of entries, especially > now that so many complete genomes have been given their own strain- > specific tax nodes, and I also think it is a shame that the ncbi tax > dump does not give a rank to entries such as these (they cannot > easily be distinguished from unofficial ranks higher in the tree > without ascending the tree). > Would it be useful for the species name to be included within EMBL > file headers, eg in a line called OB (OB is a terrible suggestion > based on 'Organism Binomial' since OS is already in use)? > > eg two examples of the species 'Apple stem grooving virus', where the > second one would appear to be a different species without delving > into the tax tree or the inclusion of an OB line. > > AC D14995; S47260; > DE Apple stem grooving virus genome, complete sequence. > OS Apple stem grooving virus > OB Apple stem grooving virus > OC Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae; > OC Capillovirus. > > AC AY646511; > DE Citrus tatter leaf virus strain Kumquat 1, complete genome. > OS Citrus tatter leaf virus > OB Apple stem grooving virus > OC Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae; > OC Capillovirus. Jason also mentions a few examples (see below). 
The problem lies in the fact that EMBL and GenBank flatfiles do not give hierarchy ranking for taxonomy, so it's a best guess. What I'm seeing is that the guess is wrong more often than not when it comes to complex scientific names (viruses, bacteria, etc). Notice the doubling of the strain in the following GenBank files passed through SeqIO (genbank->genbank conversion, BTW; haven't tried EMBL): SOURCE Azoarcus sp. EbN1 EbN1 ORGANISM Azoarcus sp. Bacteria; Proteobacteria; Betaproteobacteria; Rhodocyclales; Rhodocyclaceae; Azoarcus. SOURCE Mycobacterium sp. KMS KMS ORGANISM Mycobacterium sp. Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales; Corynebacterineae; Mycobacteriaceae; Mycobacterium. SOURCE Mycobacterium tuberculosis C C ORGANISM Mycobacterium tuberculosis Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales; Corynebacterineae; Mycobacteriaceae; Mycobacterium; Mycobacterium; tuberculosis complex; Mycobacterium. SOURCE Bacillus subtilis subsp. subtilis str. 168 subtilis str. 168 ORGANISM Bacillus subtilis subsp. Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus. Here are Jason's examples, for posterity: Can you guess what value is the strain versus sub-species? What happens when there is a two part strain name (space separated) and a sub-species or variety designation? SOURCE Staphylococcus haemolyticus JCSC1435 ORGANISM Staphylococcus haemolyticus JCSC1435 Bacteria; Firmicutes; Bacillales; Staphylococcus. http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=279808 strain is JCSC1435 versus SOURCE Muntiacus muntjak vaginalis ORGANISM Muntiacus muntjak vaginalis Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia; Pecora; Cervidae; Muntiacinae; Muntiacus. http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9887 species is muntjak, sub-species vaginalis ? 
versus

SOURCE      Aspergillus nidulans FGSC A4
  ORGANISM  Aspergillus nidulans FGSC A4
            Eukaryota; Fungi; Ascomycota; Pezizomycotina; Eurotiomycetes;
            Eurotiales; Trichocomaceae; Emericella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=227321

Genus should be Aspergillus or Emericella ?

Strain and subspecies/variety in the same entry

SOURCE      Cryptococcus neoformans var. grubii H99
  ORGANISM  Cryptococcus neoformans var. grubii H99
            Eukaryota; Fungi; Basidiomycota; Hymenomycetes;
            Heterobasidiomycetes; Tremellomycetidae; Tremellales;
            Tremellaceae; Filobasidiella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=235443

> > My point is, a large number of users do NOT use, nor care about,
> > taxonomic information to the degree they need to know the entire
> > classification of the organism; many are just as happy about getting
> > the scientific name only, which is in the GenBank/EMBL file itself.
> > To take one extreme, it is not productive to force every user to
> > download the NCBI tax database and use lookups just to convert
> > sequences from EMBL format to GenBank format. It's not productive to
> > allow users to spam the NCBI tax database remotely either, so
> > hardcoding lookups is, IMHO, a big mistake.
>
> I don't think you need to add any information to turn an embl-format
> file into a Genbank flatfile, but maybe I'm missing something obvious.

The issue is the way the SOURCE and ORGANISM lines are handled (OS/OC lines in EMBL, I believe), which currently goes through a Bio::Species object. The problem is, like I mentioned above, that no hierarchical ranking is in the flat file, just the order of the ranking. We can try to make a best guess based on that but it's obviously very tricky, particularly when dealing with subspecies, strains, etc.
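To make the guessing problem concrete, here is a minimal sketch using three of Jason's ORGANISM lines from above. It is not how Bio::Species actually parses names; the helper `naive_split` is hypothetical and simply assumes the first word is the genus and the second the species. It shows that whatever is left over could be a strain, a subspecies, or both, and nothing in the string says which.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the Bio::Species logic: treat word 1 as genus,
# word 2 as species, and lump everything else together as "other".
sub naive_split {
    my ($organism) = @_;
    my ($genus, $species, @rest) = split ' ', $organism;
    return ($genus, $species, join(' ', @rest));
}

for my $name (
    'Staphylococcus haemolyticus JCSC1435',     # "other" is a strain
    'Muntiacus muntjak vaginalis',              # "other" is a subspecies
    'Cryptococcus neoformans var. grubii H99',  # "other" is variety + strain
) {
    my ($genus, $species, $other) = naive_split($name);
    print "genus=$genus species=$species other=[$other]\n";
}
```

The leftover text looks the same in all three cases, which is why a rank can only be assigned reliably by a lookup against the taxonomy itself.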
NCBI also states that many times the classification can be too long for a file so may be incomplete (I think they leave out nodes which have 'no rank' tags, but I can't be completely sure), so there's another issue. Anyway, this is where the lookup would come in, which would require a local taxonomy database (we can't spam the NCBI remote database, that would just be rude) which would give the complete taxonomic classification if it worked properly. So now we have three possible situations: 1) One extreme : We require a lookup to get it right (which, BTW, it currently doesn't); this by default requires a local database. 2) Middle of the road : we try and guess the information as best as we can with the information given (the current situation); this is breaking more and more often now, so is becoming more unreliable. 3) Other extreme : we punt and absolve ourselves of even trying to parse the data and just have a strict tagname->value or similar simple construct to handle the data. #3 as default with option to do #1 is probably best (least error prone with option for most information), with caching to speed up lookups as Sendu Bala does now. Chris > Nadeem > > > -- > Dr S.M. Nadeem N. 
Faruque > 9 Barley Court > Saffron Walden > Essex CB11 3HG > 01799 500 120 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Mon May 15 17:37:56 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 15 May 2006 17:37:56 -0400 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: <000a01c67862$0a00cab0$15327e82@pyrimidine> References: <000a01c67862$0a00cab0$15327e82@pyrimidine> Message-ID: <6CCA8112-651D-4154-94AE-88FE8EFBCD27@gmx.net> It does have the following line though (and a 'use' statement for OntologyIO); @ISA = qw( Bio::OntologyIO ); So what is it doing 'wrong' (there aren't any tests or so in which anything erroneous would show)? -hilmar On May 15, 2006, at 4:56 PM, Chris Fields wrote: > Okay, I see what you mean. Using the search term "Bio::Ont*" also > explains > why I didn't see it ;P. Yeah, the bug shows up here too (WinXP and > Mac OS > X), and those links are broken like you said. Could be something > to do with > indexing. > > Using the methods script in the FAQ > (http://www.bioperl.org/wiki/FAQ#Why_can. > 27t_I_easily_get_a_list_of_all_the_ > methods_a_object_can_call.3F) I get this: > > C:\Perl\Scripts>methods.pl Bio::OntologyIO::simplehierarchy > Bio::OntologyIO::simplehierarchy::Dumper > Bio::OntologyIO::simplehierarchy::basename > Bio::OntologyIO::simplehierarchy::dirname > Bio::OntologyIO::simplehierarchy::fileparse > Bio::OntologyIO::simplehierarchy::fileparse_set_fstype > > Chris > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp >> Sent: Monday, May 15, 2006 2:24 PM >> To: Chris Fields >> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' >> Subject: Re: [Bioperl-l] Deobfuscator interface now available >> >> I wasn't using the search. It's in the scrollable table for browsing. 
>> -hilmar >> >> On May 15, 2006, at 3:07 PM, Chris Fields wrote: >> >>> I'll have to give it a try on Mac OS X (we have an ancient G4 in >>> the lab >>> which I can try it on). I'll let you know what I find. >>> >>> This is what I get when I do a search for 'Bio::Ont*' using Firefox >>> on WinXP >>> and this Deobfuscator link (http://bioperl.org/cgi-bin/ >>> deob_interface.cgi?); >>> all the classes have links that work (I added newline and tab to >>> make it a >>> bit more readable) : >>> >>> Bio::OntologyIO >>> Parser factory for Ontology formats >>> Bio::OntologyIO::Handlers::BaseSAXHandler >>> no short description available >>> Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler >>> no short description available >>> Bio::Ontology::OntologyI >>> Interface for an ontology implementation >>> Bio::Ontology::TermFactory >>> Instantiates a new Bio::Ontology::TermI (or derived class) >>> through a >>> factory >>> Bio::Ontology::OntologyStore >>> A repository of ontologies >>> Bio::Ontology::RelationshipFactory >>> Instantiates a new Bio::Ontology::RelationshipI (or derived class) >>> through a factory >>> Bio::Ontology::Ontology >>> standard implementation of an Ontology >>> >>> So the names seem fine here. >>> >>> When I click on a class (Bio::Ontology::Ontology) I get in the >>> results >>> section: >>> >>> Method Class >>> Returns >>> Usage >>> add_relationship Bio::Ontology::Ontology >> Its >>> argument. add_relationship(RelationshipI relationship): >>> RelationshipI >>> add_relationship_type Bio::Ontology::OntologyEngineI >>> not >>> documented not documented >>> add_term Bio::Ontology::Ontology >>> its >>> argument. add_term(TermI term): TermI >>> >>> ....and so on >>> >>> Where each method is clickable and opens a new page containing a >>> table: >>> >>> Bio::Ontology::Ontology::add_relationship >>> Usage add_relationship(RelationshipI relationship): RelationshipI >>> Function Adds a relationship object to the ontology engine. >>> Returns Its argument. 
>>> Args A RelationshipI object. >>> >>> >>> Each class is also linked to the bioperl-live PDOC. Clicking on >>> class >>> Bio::Ontology::Ontology in the results table gets me this page >>> (no new >>> page): >>> >>> http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html >>> >>> >>> Chris >>> >>>> -----Original Message----- >>>> From: Hilmar Lapp [mailto:hlapp at gmx.net] >>>> Sent: Monday, May 15, 2006 1:09 PM >>>> To: Chris Fields >>>> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' >>>> Subject: Re: [Bioperl-l] Deobfuscator interface now available >>>> >>>> Safari or Firefox on MacOSX don't do this. Note that the appearance >>>> in the browsable list is already different (the prefix is missing), >>>> and the JavaScript link also lacks the prefix in the module name in >>>> contrast to others, e.g., Bio::Ontology::Ontology (which is one of >>>> the few Bio::Ontology exceptions that do work and do display >>>> correctly). >>>> >>>> I suppose there is something peculiar about the code formatting of >>>> those modules? Some of the modules under Bio::OntologyIO are also >>>> affected BTW. >>>> >>>> What happens is after you click on the link the page apppears to >>>> reload (i.e., gets submitted) but the second table that is supposed >>>> open underneath the first doesn't appear. However, the sort-by drop >>>> down selector does appear. >>>> >>>> -hilmar >>>> >>>> On May 15, 2006, at 1:22 PM, Chris Fields wrote: >>>> >>>>> That's strange. Clicking on the list gives me the results for >>>>> that >>>>> module. >>>>> When I click on the hyperlinks in the results section they open >>>>> fine; the >>>>> method column links opens a new page containing usage-function- >>>>> returns-args >>>>> and the class column links opens pdoc (same page) for bioperl- >>>>> live. I'm >>>>> using Firefox 1.5 on WinXP. 
>>>>> >>>>> Chris >>>>> >>>>>> -----Original Message----- >>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp >>>>>> Sent: Monday, May 15, 2006 12:01 PM >>>>>> To: Mauricio Herrera Cuadra >>>>>> Cc: bioperl-l >>>>>> Subject: Re: [Bioperl-l] Deobfuscator interface now available >>>>>> >>>>>> Hey, thanks to Laura & David for this interface. >>>>>> >>>>>> Any idea why most of the Bio::Ontology::* modules show up without >>>>>> their leading Bio::Ontology? And clicking on those hyperlinks >>>>>> doesn't >>>>>> go anywhere either ... Anything different with those modules >>>>>> that I >>>>>> can fix? >>>>>> >>>>>> -hilmar >>>>>> >>>>>> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote: >>>>>> >>>>>>> I'm glad to announce the availability of the Deobfuscator >>>>>>> interface at >>>>>>> the BioPerl website. You can use it at the following URL: >>>>>>> >>>>>>> http://bioperl.org/cgi-bin/deob_interface.cgi >>>>>>> >>>>>>> Many thanks to Laura Kavanaugh and David Messina for this great >>>>>>> contribution to the BioPerl project! >>>>>>> >>>>>>> Mauricio. 
-- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Mon May 15 18:03:48 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 15 May 2006 17:03:48 -0500 Subject: [Bioperl-l]
Deobfuscator interface now available
In-Reply-To: <6CCA8112-651D-4154-94AE-88FE8EFBCD27@gmx.net>
Message-ID: <000d01c6786b$71c04e60$15327e82@pyrimidine>

And Bio::OntologyIO works on its own:

C:\Perl\Scripts>methods.pl Bio::OntologyIO
Bio::OntologyIO::DESTROY
Bio::OntologyIO::new
Bio::OntologyIO::next_ontology
Bio::OntologyIO::term_factory
Bio::OntologyIO::unescape
Bio::Root::IO::catfile
Bio::Root::IO::close
Bio::Root::IO::dup
Bio::Root::IO::exists_exe
Bio::Root::IO::file
Bio::Root::IO::flush
Bio::Root::IO::gensym
Bio::Root::IO::mode
Bio::Root::IO::noclose
Bio::Root::IO::qualify
Bio::Root::IO::qualify_to_ref
Bio::Root::IO::rmtree
Bio::Root::IO::tempdir
Bio::Root::IO::tempfile
Bio::Root::IO::ungensym
Bio::Root::Root::confess
Bio::Root::Root::debug
Bio::Root::Root::throw
Bio::Root::Root::verbose
Bio::Root::RootI::carp
Bio::Root::RootI::deprecated
Bio::Root::RootI::stack_trace
Bio::Root::RootI::stack_trace_dump
Bio::Root::RootI::throw_not_implemented
Bio::Root::RootI::warn
Bio::Root::RootI::warn_not_implemented

But when I try these:

C:\Perl\Scripts>methods.pl Bio::OntologyIO::goflat

C:\Perl\Scripts>methods.pl Bio::OntologyIO::dagflat

I get nothing at all. It could be related to the way the methods are looked up using Class::Inspector:

print join ("\n", sort @{Class::Inspector->methods($class,'full','public')}), "\n";

I haven't tried it on all the weird Bio::Ontology-missing modules (don't have time today).
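For readers who want to see what such a method listing does under the hood, here is a rough sketch that walks the symbol table and @ISA with core Perl only. It is a stand-in technique, not Class::Inspector and not the FAQ's methods.pl. The classes Demo::Base and Demo::Child and the helper `public_methods` are hypothetical, made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical toy classes standing in for a parser and its base class.
package Demo::Base;
sub new           { return bless {}, shift }
sub next_ontology { return }
sub _private      { return }

package Demo::Child;
our @ISA = ('Demo::Base');
sub next_ontology { return }   # overrides the base class method
sub parse         { return }

package main;

# Collect public methods visible from $class, depth-first through @ISA.
# The first definition found wins, so overrides hide the parent's version.
sub public_methods {
    my ($class, $found, $seen) = @_;
    $found ||= {};
    $seen  ||= {};
    return $found if $seen->{$class}++;
    no strict 'refs';
    for my $name (keys %{"${class}::"}) {
        next if $name =~ /^_/ || $name =~ /::$/;   # skip private subs and nested packages
        next unless defined &{"${class}::$name"};  # keep only real subroutines
        $found->{$name} ||= "${class}::$name";
    }
    public_methods($_, $found, $seen) for @{"${class}::ISA"};
    return $found;
}

my @methods = sort values %{ public_methods('Demo::Child') };
print "$_\n" for @methods;
```

A listing like this can only report what is in the symbol table at the time it runs, so a package that was never loaded, or whose @ISA is not set up yet, comes back empty or incomplete. That may be worth checking for the modules that print nothing.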
It's not common to all of those modules though: C:\Perl\Scripts>methods.pl Bio::OntologyIO::InterProParser Bio::OntologyIO::DESTROY Bio::OntologyIO::InterProParser::next_ontology Bio::OntologyIO::InterProParser::parse Bio::OntologyIO::InterProParser::secondary_accessions_map Bio::OntologyIO::new Bio::OntologyIO::term_factory Bio::OntologyIO::unescape Bio::Root::IO::catfile Bio::Root::IO::close Bio::Root::IO::dup Bio::Root::IO::exists_exe Bio::Root::IO::file Bio::Root::IO::flush Bio::Root::IO::gensym Bio::Root::IO::mode Bio::Root::IO::noclose Bio::Root::IO::qualify Bio::Root::IO::qualify_to_ref Bio::Root::IO::rmtree Bio::Root::IO::tempdir Bio::Root::IO::tempfile Bio::Root::IO::ungensym Bio::Root::Root::confess Bio::Root::Root::debug Bio::Root::Root::throw Bio::Root::Root::verbose Bio::Root::RootI::carp Bio::Root::RootI::deprecated Bio::Root::RootI::stack_trace Bio::Root::RootI::stack_trace_dump Bio::Root::RootI::throw_not_implemented Bio::Root::RootI::warn Bio::Root::RootI::warn_not_implemented Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > Sent: Monday, May 15, 2006 4:38 PM > To: Chris Fields > Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' > Subject: Re: [Bioperl-l] Deobfuscator interface now available > > It does have the following line though (and a 'use' statement for > OntologyIO); > > @ISA = qw( Bio::OntologyIO ); > > So what is it doing 'wrong' (there aren't any tests or so in which > anything erroneous would show)? > > -hilmar > > On May 15, 2006, at 4:56 PM, Chris Fields wrote: > > > Okay, I see what you mean. Using the search term "Bio::Ont*" also > > explains > > why I didn't see it ;P. Yeah, the bug shows up here too (WinXP and > > Mac OS > > X), and those links are broken like you said. Could be something > > to do with > > indexing. > > > > Using the methods script in the FAQ > > (http://www.bioperl.org/wiki/FAQ#Why_can. 
> > 27t_I_easily_get_a_list_of_all_the_ > > methods_a_object_can_call.3F) I get this: > > > > C:\Perl\Scripts>methods.pl Bio::OntologyIO::simplehierarchy > > Bio::OntologyIO::simplehierarchy::Dumper > > Bio::OntologyIO::simplehierarchy::basename > > Bio::OntologyIO::simplehierarchy::dirname > > Bio::OntologyIO::simplehierarchy::fileparse > > Bio::OntologyIO::simplehierarchy::fileparse_set_fstype > > > > Chris > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > >> Sent: Monday, May 15, 2006 2:24 PM > >> To: Chris Fields > >> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' > >> Subject: Re: [Bioperl-l] Deobfuscator interface now available > >> > >> I wasn't using the search. It's in the scrollable table for browsing. > >> -hilmar > >> > >> On May 15, 2006, at 3:07 PM, Chris Fields wrote: > >> > >>> I'll have to give it a try on Mac OS X (we have an ancient G4 in > >>> the lab > >>> which I can try it on). I'll let you know what I find. 
From cjfields at uiuc.edu Mon May 15 20:14:28 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Mon, 15 May 2006 19:14:28 -0500
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
Message-ID:

---- Original message ----
>Date: Mon, 15 May 2006 15:40:15 -0400
>From: "Clarke, Wayne"
>Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
>To:
>
>Hey everyone,
>
>I have been developing some code to download and parse blast reports
>from a remote server using Soap::Lite as well as insert the results into
>a mysql database. The problem I am having is that my program seems to be
>taking up and huge amount of RAM. For a single job of 10000 queries it
>can consume as much as a couple hundred Mb inside an hour.

If you're parsing 10000 queries (10000 different BLAST reports, right?) then it's not necessarily a memory leak so much as object creation. Each report generates hit objects, which in turn generate hsp objects. I think Jason recommends using the tabular output option (-m8 or -m9) for huge reports as it cuts down considerably on this. If you are cycling through the reports one at a time it shouldn't be as much of a problem unless your BLAST reports are really huge. Have you tried parsing a single report to see if the problem persists?

Now, if you are using Bioperl 1.5.1 with BLAST 2.2.13 or newer, you'll likely run into an infinite loop caused by a change in NCBI's text output. You can try updating bioperl from CVS in either case to see if that helps any. Tabular output and XML output, AFAIK, are the same regardless of version; this bug only affected text output of BLAST reports.

> I realize
>that a lot of work is being done but this seems like way too much. This
>leads me to the subject of my post. I think I may have traced the source
>of the memory leak to Bio::SearchIO.
I have used Devel::Size to track >the size of my variables and done other debugging steps and have had no >luck with resolving this very frustrating problem. My code is as >follows: > > > > my $result = $connector->getQueryResult($query_id); > > > > my $FH; > > open $FH, "<", \$result; > > > > my $searchio = new Bio::SearchIO(-format => "blast", > > > > -fh => $FH); > > > > while (my $o_blast = $searchio->next_result()) { > > my $clone_id = $o_blast->query_name(); > > > > my $statement = $bdbi->form_push_SQL ($o_blast, >$clone_id, 5); > > > >this is just the leading and tailing code surrounding the use of >Bio::SearchIO since there is quite a lot. I am mostly just wondering if >anyone has ever had problems with SearchIO and its memory usage. I >looked at the source code for it but am afraid it is out of my league. >Any help/suggestions/questions would be great. Thanks > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l From torsten.seemann at infotech.monash.edu.au Mon May 15 20:18:44 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 16 May 2006 10:18:44 +1000 Subject: [Bioperl-l] Memory Leak in Bio::SearchIO In-Reply-To: <320530F83FA47047823E57F110DDEAADB159EC@onncrxms4.agr.gc.ca> References: <320530F83FA47047823E57F110DDEAADB159EC@onncrxms4.agr.gc.ca> Message-ID: <44691A64.8040607@infotech.monash.edu.au> > taking up and huge amount of RAM. For a single job of 10000 queries it > can consume as much as a couple hundred Mb inside an hour. I realize > my $result = $connector->getQueryResult($query_id); > my $searchio = new Bio::SearchIO(-format => "blast", > while (my $o_blast = $searchio->next_result()) { > my $clone_id = $o_blast->query_name(); > my $statement = $bdbi->form_push_SQL ($o_blast, $clone_id, 5); } Some comments: Have you considered that whatever class/module $bdbi belongs to is causing the problem? ie. 
is it keeping a reference to $o_blast around? Are you aware that Perl garbage collection does not necessarily return freed memory back to the OS? This may affect how you were measuring "memory usage". -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From kmdaily at indiana.edu Mon May 15 17:00:12 2006 From: kmdaily at indiana.edu (Daily, Kenneth Michael) Date: Mon, 15 May 2006 17:00:12 -0400 Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO Message-ID: <20528E699A515C499B80C222BDBEBC34043FF8@iu-mssg-mbx108.ads.iu.edu> I just installed Bioperl 1.4, and entrezgene.pm is not included (should be in Bio/SeqIO). How can I get this module? Kenny Daily IU School of Informatics kmdaily at indiana.edu From letondal at pasteur.fr Tue May 16 02:06:19 2006 From: letondal at pasteur.fr (Catherine Letondal) Date: Tue, 16 May 2006 08:06:19 +0200 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: References: <000901c67827$d99eabb0$15327e82@pyrimidine> Message-ID: <9c36140009c3d80bbb0d543376afa6e0@pasteur.fr> On May 15, 2006, at 9:34 PM, David Messina wrote: >>>> A couple of minor interface thoughts. >>>> >>>> 1)There's quite a lot of methods for many of the classes. As such, I >>>> think I'll often want to browse through what's available in a >>>> class. But >>>> 60% or so of the screen real estate is used for "Enter a search >>>> string... OR select a class from the list". IMO, it would be >>>> better to >>>> have two pages, a search page and a result page. It only takes >>>> a click >>>> on Back (or a "new search" button) to get to a new search, and >>>> now you >>>> can use your whole screen for reading your results. >>> >>> As the compromise it must be, I like the way it behaves. I don't like >>> lots of windows. I especially don't like pop up windows. 
>>> Right now when
>>> I'm using the bioperl docs I tend to have a whole bunch of tabs open to
>>> different class pages at once, so being able to see an overview all on
>>> one page in Deobfuscator is very nice.
>
> I think the current behavior makes sense as the default, but I like
> the idea of being able to view the search results in a separate
> window for easier browsing. Thanks for the suggestion; I'll add it to
> the list.
>

First, thanks for this very useful Web interface!

There are examples (quite ajaxian ones) that reach a compromise between
several windows for easily browsing large results, and composing
everything in one window to get an overview - the 2 examples that come
to my mind currently are (not biology related):
- http://montreal.mspace.fm/chi/sched/
- http://www.live.com/ (see the slider on the top right enabling to
  squeeze or enlarge the results area)

--
Catherine Letondal -- Institut Pasteur

From cjfields at uiuc.edu Tue May 16 07:38:42 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Tue, 16 May 2006 06:38:42 -0500
Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
Message-ID: <36b52ba4.b94c5b79.8198c00@expms6.cites.uiuc.edu>

You'll have to install from CVS. I believe Brian added Entrezgene.pm
after the last developer release (1.5.1):

http://www.bioperl.org/wiki/Installing_BioPerl

Chris

From bernd.web at gmail.com Tue May 16 07:37:46 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Tue, 16 May 2006 13:37:46 +0200
Subject: [Bioperl-l] Bio::DB::Query::GenBank checks
Message-ID: <716af09c0605160437tfcf824dxa514f38f6b94d423@mail.gmail.com>

Hi all,

I was using Bio::DB::Query::GenBank to obtain only IDs from Entrez and
found some issues and differences (bugs?) in behaviour wrt the pod.
Do these look familiar ?

Some example code:
  my $query = Bio::DB::Query::GenBank->new
                       (-query   => 'Lassa Virus[ORGN]',
                        -reldate => '30',
                        -db      => 'protein',
                        -ids     => [195052,2981014,11127914],
                        -maxids  => 30 );

  my $gb = new Bio::DB::GenBank(-format=>'fasta');
  my $seqio = $gb->get_Stream_by_query($query);
  while (my $seq = $seqio->next_seq) {
      print $seq->desc,"\n"; }

The module states that if we provide -ids that:
    If you provide an array reference of IDs in -ids, the query will be
    ignored and the list of IDs will be used when the query is passed
    to a Bio::DB::GenBank object's get_Stream_by_query() method.

In the above case actually the query is passed ('Lassa Virus[ORGN]'),
not the IDs. Also $query->query shows the original query. Am I doing
something wrong or is the pod not reflecting current behaviour of this
module?

I was also surprised that if internet is down no warning is thrown for
$query->query or $query->count at all. Only the get_Stream_by_query
above will warn us if the site is unreachable (500 Internal Server
Error).
$query->ids or $query->count will not throw a warning and
@ids=$query->ids will just be an empty array. (I realize $query->count
is not initialized, so I am using this now to check for success, but a
warning from WebDBSeqI would be more appropriate I think).

Last, the example from the pod is not working, but no warnings are raised:
  # initialize the list yourself
  my $query = Bio::DB::Query::GenBank->new(-ids=>[195052,2981014,11127914]);

$query->count returns zero w/o any warning. Of course this query did
not specify a DB. Only if we specify -db=>'nucleotide' $query->count
is 3.
However, why not any warning if we set -db=>'protein' or if we did not
set this?

On the NCBI website searching Protein DB returns for 19505:
  See Details. No items found.
  The following term(s) refer to a different DB:195052

But this is not reflected via Bio::DB::Query::GenBank.

Can I check for this situation in the code apart from checking on
$query->count == 0 ? Or would it indeed be better to check for these
situations in the module?

Regards,
Bernd

From chen_li3 at yahoo.com Tue May 16 10:55:51 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 16 May 2006 07:55:51 -0700 (PDT)
Subject: [Bioperl-l] module for 6 reading frames
Message-ID: <20060516145551.50370.qmail@web36802.mail.mud.yahoo.com>

Hi all,

I wonder which module is available for translating DNA
sequence into 6 reading frames.

Thank you,

Li

From smarkel at scitegic.com Tue May 16 11:10:35 2006
From: smarkel at scitegic.com (smarkel at scitegic.com)
Date: Tue, 16 May 2006 08:10:35 -0700
Subject: [Bioperl-l] module for 6 reading frames
In-Reply-To: <20060516145551.50370.qmail@web36802.mail.mud.yahoo.com>
Message-ID:

Li,

Use the translate() function in Bio::Tools::CodonTable.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com

From golharam at umdnj.edu Tue May 16 12:18:19 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 16 May 2006 12:18:19 -0400
Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene?
Message-ID: <001f01c67904$59b08ad0$2f01a8c0@GOLHARMOBILE1>

I just updated my local copy of bioperl from cvs. When I ran the
configure script, it says I need the external module
Bio::ASN1::EntrezGene. Which package contains this module?

--
Ryan Golhar - golharam at umdnj.edu
The Informatics Institute of UMDNJ

From golharam at umdnj.edu Tue May 16 12:24:03 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 16 May 2006 12:24:03 -0400
Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene?
Message-ID: <002001c67905$2622a580$2f01a8c0@GOLHARMOBILE1>

Never mind. I see it's in CPAN.

--
Ryan Golhar - golharam at umdnj.edu
The Informatics Institute of UMDNJ

From cjfields at uiuc.edu Tue May 16 13:27:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 May 2006 12:27:32 -0500
Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene?
In-Reply-To: <001f01c67904$59b08ad0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <002701c6790e$03d8f110$15327e82@pyrimidine>

It's actually not part of Bioperl currently; you can find it on CPAN:

http://search.cpan.org/~mingyiliu/Bio-ASN1-EntrezGene-1.091/lib/Bio/ASN1/EntrezGene.pm

Chris

From ClarkeW at AGR.GC.CA Tue May 16 16:57:13 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Tue, 16 May 2006 16:57:13 -0400
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
Message-ID: <320530F83FA47047823E57F110DDEAADB159FB@onncrxms4.agr.gc.ca>

With regards to the suggestions/comments made thank you. However I think
I should clear a few things up. I am running bioperl v1.4, I am cycling
through the blast reports which should not be of absurd size since they
only contain the top 5 hits, and I am using top to track (although I
realize fairly inaccurately) the memory usage. I have looked through the
code for both AAFCBLAST and BEAST_UPDATE but do not believe the
leak/problem to be contained within them since they are almost
exclusively using method calls and those variables should be destroyed
upon leaving the scope of the method. I have used Devel::Size to check
the size of the variables $bdbi and $searchio and $connector and on each
iteration these variables have the same size.
Any other suggestions would be greatly appreciated as I have nearly gone
insane trying to track this problem down.

Thanks, Wayne

From smarkel at scitegic.com Tue May 16 16:52:05 2006
From: smarkel at scitegic.com (smarkel at scitegic.com)
Date: Tue, 16 May 2006 13:52:05 -0700
Subject: [Bioperl-l] module for 6 reading frames
In-Reply-To: <20060516200436.34908.qmail@web36812.mail.mud.yahoo.com>
Message-ID:

Li,

You can either do the substring, and reverse complement, yourself
or you can use the translate() function in Bio::PrimarySeq. It
inherits from Bio::PrimarySeqI, so check there for the documentation.
That translate() function takes a "-frame" argument.

Scott

PS In future, please respond to the list. That way others see
the questions and answers.

chen li wrote on 16.05.2006 13:04:36:

> Dear Dr.
> Markel,
>
> I browse through the document of
> Bio:Tools::Codontable and find this line:
>
> my $translation= $CodonTable->translate($seq);
>
> I think this line is to do the translation. Here is my
> question: which line in the doc says how to translate
> the remaining frames 2,3, and -1, -2, -3?
>
> Thank you,
>
> Li

From cjfields at uiuc.edu Tue May 16 17:15:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 May 2006 16:15:10 -0500
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
In-Reply-To: <320530F83FA47047823E57F110DDEAADB159FB@onncrxms4.agr.gc.ca>
Message-ID: <000601c6792d$d0ab1500$15327e82@pyrimidine>

I mentioned two possibilities last time I posted: 1) that the BLAST file
was too large, or 2) that you are using an old version of bioperl that
SearchIO is broken. You seem to fit #2.

The issue is that NCBI does not consider text BLAST output sacrosanct and
routinely makes changes to it that break parsing. Due to this,
SearchIO::blast needs to be constantly updated, so much so that there are
normally a few updates a year to fix parsing issues in that module alone
compared to BioPerl as a whole.

And, BTW, although bioperl-1.4 is about 2 years old now, even
bioperl-1.5.1 SearchIO is broken when it comes to the latest NCBI BLAST
(2.2.14 now).

I seriously suggest updating your local bioperl distribution to the
latest bioperl-live (from CVS). Take one of those 10000 reports, just
one, and try parsing it. If you have the same problem (a CPU spike and
increasing memory usage) then it may be fixed in CVS.

Chris

From ClarkeW at AGR.GC.CA Tue May 16 17:24:51 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Tue, 16 May 2006 17:24:51 -0400
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
Message-ID: <320530F83FA47047823E57F110DDEAADB159FC@onncrxms4.agr.gc.ca>

Thanks Chris,

I did forget to mention however that I did parse one single report and
found no problems, it finished fast and with no noticeable memory usage.
I will consider getting my SA to update bioperl from CVS as a precaution
but he has already stated he prefers to wait for the release of v1.5.

Even a single job of 10000 will finish but the problem is that I am
trying to loop through many jobs of 10000 and it seems to be additive
for reasons I can not determine. During testing I noticed that the RSS
on top decreased around 80% MEM usage, but then the shared mem
increased. I am wondering if this is due to the perl garbage collector
freeing up memory but keeping it in its pool for use, if so that is fine
as long as the it does not then want to reach into swapped mem.

Thanks again, Wayne

From cjfields at uiuc.edu Tue May 16 17:45:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 May 2006 16:45:16 -0500
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
In-Reply-To: <320530F83FA47047823E57F110DDEAADB159FC@onncrxms4.agr.gc.ca>
Message-ID: <000801c67932$050dbd30$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne
> Sent: Tuesday, May 16, 2006 4:25 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO
>
> Thanks Chris,
>
> I did forget to mention however that I did parse one single report and
> found no problems, it finished fast and with no noticeable memory usage.
> I will consider getting my SA to update bioperl from CVS as a precaution
> but he has already stated he prefers to wait for the release of v1.5.

Um, you can tell him the last release was v.1.5.1 (last October). It's
considered a developer release but is pretty stable; well, except for
that whole SearchIO quibble, and that's not our fault.

You could also install a local version in case he doesn't budge; see here:

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_IN_A_PERSONAL_MODULE_AREA

Chris

> Even a single job of 10000 will finish but the problem is that I am
> trying to loop through many jobs of 10000 and it seems to be additive
> for reasons I can not determine. During testing I noticed that the RSS
> on top decreased around 80% MEM usage, but then the shared mem
> increased.
> I am wondering if this is due to the perl garbage collector
> freeing up memory but keeping it in its pool for use, if so that is fine
> as long as the it does not then want to reach into swapped mem.
>
> Thanks again, Wayne
>
...

From cjfields at uiuc.edu Tue May 16 18:20:29 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 May 2006 17:20:29 -0500
Subject: [Bioperl-l] Bio::DB::Query::GenBank checks
In-Reply-To: <716af09c0605160437tfcf824dxa514f38f6b94d423@mail.gmail.com>
Message-ID: <000901c67936$f0896990$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Bernd Web
> Sent: Tuesday, May 16, 2006 6:38 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::DB::Query::GenBank checks
>
> Hi all,
>
> I was using Bio::DB::Query::GenBank to obtain only IDs from Entrez and
> found some issues and differences (bugs?) in behaviour wrt the pod.
> Do these look familiar ?
>
> Some example code:
>   my $query = Bio::DB::Query::GenBank->new
>                        (-query   => 'Lassa Virus[ORGN]',
>                         -reldate => '30',
>                         -db      => 'protein',
>                         -ids     => [195052,2981014,11127914],
>                         -maxids  => 30 );
>
>   my $gb = new Bio::DB::GenBank(-format=>'fasta');
>   my $seqio = $gb->get_Stream_by_query($query);
>   while (my $seq = $seqio->next_seq) {
>       print $seq->desc,"\n"; }
>
> The module states that if we provide -ids that:
>     If you provide an array reference of IDs in -ids, the query will be
>     ignored and the list of IDs will be used when the query is passed
>     to a Bio::DB::GenBank object's get_Stream_by_query() method.
>
> In the above case actually the query is passed ('Lassa Virus[ORGN]'),
> not the IDs. Also $query->query shows the original query. Am I doing
> something wrong or is the pod not reflecting current behaviour of this
> module?
>
> I was also surprised that if internet is down no warning is thrown for
> $query->query or $query->count at all. Only the get_Stream_by_query
> above will warn us if the site is unreachable (500 Internal Server
> Error).

I believe this has to do with the difference in the objects and the way
they retrieve request data; Bio::DB::GenBank and Bio::DB::Query::GenBank
use different methods to retrieve ids, Bio::DB::GenBank's
get_Stream_by_query method just makes it a bit easier to retrieve a list
of uid's directly instead of saving them as an array then reposting them
using get_Stream_by_id. Not foolproof but it works okay.

> $query->ids or $query->count will not throw a warning and
> @ids=$query->ids will just be an empty array. (I realize $query->count
> is not initialized, so I am using this now to check for success, but a
> warning from WebDBSeqI would be more appropriate I think).

WebDBSeqI would be the place to make general warnings (it's supposed to
be an interface for any web seq DB), but not eutils-specific warnings.

> Last, the example from the pod is not working, but no warnings are raised:
>   # initialize the list yourself
>   my $query =
>     Bio::DB::Query::GenBank->new(-ids=>[195052,2981014,11127914]);
>
> $query->count returns zero w/o any warning. Of course this query did
> not specify a DB. Only if we specify -db=>'nucleotide' $query->count
> is 3.
> However, why not any warning if we set -db=>'protein' or if we did not
> set this?
>
> On the NCBI website searching Protein DB returns for 19505:
>   See Details. No items found.
>   The following term(s) refer to a different DB:195052
>
> But this is not reflected via Bio::DB::Query::GenBank.
>
> Can I check for this situation in the code apart from checking on
> $query->count == 0 ? Or would it indeed be better to check for these
> situations in the module?
>
> Regards,
> Bernd

I can probably play around with adding a few things in tomorrow and clean
up the POD somewhat. I'm planning a rewrite for EUtilities-based searches
but that's a ways off still...

Can't promise much; I'm pretty busy til next week.
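
For reference, a minimal sketch of the count-based check Bernd asks about
above. It uses only the constructor arguments and the count method
already shown in this thread. The helper names classify_count and
check_ids are made up for illustration and are not part of BioPerl, and
the 'partial' case is an assumption about how a mixed ID list would be
reported; it has not been checked against Entrez.

```perl
use strict;
use warnings;

# Classify the outcome of an Entrez query from its reported hit count.
#   $count    : value returned by $query->count (may be undef on failure)
#   $expected : number of IDs that were asked for
sub classify_count {
    my ($count, $expected) = @_;
    return 'no answer' unless defined $count;   # request never completed
    return 'empty'     if $count == 0;          # e.g. IDs belong to another db
    return 'partial'   if $count < $expected;   # assumption: some IDs missing
    return 'ok';
}

# Hypothetical helper: run an ID-only query against one database and
# report what came back. Needs BioPerl and network access, so the module
# is loaded only when this sub is actually called.
sub check_ids {
    my ($db, @ids) = @_;
    require Bio::DB::Query::GenBank;
    my $query = Bio::DB::Query::GenBank->new(-db => $db, -ids => \@ids);
    my $count = eval { $query->count };         # also trap thrown exceptions
    warn "Entrez request failed: $@" if $@;
    return classify_count($count, scalar @ids);
}
```

If the count for -db=>'nucleotide' is 3 as reported above, then
check_ids('nucleotide', 195052, 2981014, 11127914) would be expected to
return 'ok', and the same IDs with 'protein' would be expected to return
'empty'. This only makes the silent zero visible to the caller; it does
not say why the count is zero.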
Chris From chen_li3 at yahoo.com Tue May 16 20:53:17 2006 From: chen_li3 at yahoo.com (chen li) Date: Tue, 16 May 2006 17:53:17 -0700 (PDT) Subject: [Bioperl-l] module for formating sequence output on the screen Message-ID: <20060517005317.3976.qmail@web36815.mail.mud.yahoo.com> Hi all, Thank you very much for the help. I have some DNA sequences printed on the screen. But the default output is longer than I expect. I need 50 necleotides/line. I search CPAN but can not get the right module. Which bioperl module can do this job? Li __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From kmdaily at indiana.edu Tue May 16 09:57:52 2006 From: kmdaily at indiana.edu (Daily, Kenneth Michael) Date: Tue, 16 May 2006 09:57:52 -0400 Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO References: <36b52ba4.b94c5b79.8198c00@expms6.cites.uiuc.edu> Message-ID: <20528E699A515C499B80C222BDBEBC34043FFB@iu-mssg-mbx108.ads.iu.edu> OK, got that installed. But I still get an error: Can't locate object method "url" via package "Bio::Annotation::DBLink" at /home/kmdaily/src/bioperl/core/Bio/SeqIO/entrezgene.pm line 557. I am using this on a shared system, and an older version of Bioperl was installed by the admin. But the path to the one I downloaded via CVS is first in the list @INC, and PERL5LIB="/home/kmdaily/src/bioperl/core". Kenny Daily IU School of Informatics kmdaily at indiana.edu -----Original Message----- From: Christopher Fields [mailto:cjfields at uiuc.edu] Sent: Tue 5/16/2006 7:38 AM To: Daily, Kenneth Michael; bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] entrezgene.pm not in Bio::SeqIO You'll have to install from CVS. 
I believe Brian added Entrezgene.pm after the lst developer release (1.5.1): http://www.bioperl.org/wiki/Installing_BioPerl Chris ---- Original message ---- >Date: Mon, 15 May 2006 17:00:12 -0400 >From: "Daily, Kenneth Michael" >Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO >To: > >I just installed Bioperl 1.4, and entrezgene.pm is not included (should be in Bio/SeqIO). How can I get this module? > >Kenny Daily >IU School of Informatics >kmdaily at indiana.edu > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l From skirov at utk.edu Wed May 17 07:48:29 2006 From: skirov at utk.edu (Stefan Kirov) Date: Wed, 17 May 2006 07:48:29 -0400 Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO In-Reply-To: <20528E699A515C499B80C222BDBEBC34043FFB@iu-mssg-mbx108.ads.iu.edu> References: <36b52ba4.b94c5b79.8198c00@expms6.cites.uiuc.edu> <20528E699A515C499B80C222BDBEBC34043FFB@iu-mssg-mbx108.ads.iu.edu> Message-ID: <446B0D8D.40901@utk.edu> You are using an old Bio::Annotation::DBLink module. Did you download only entrezgene.pm or the whole bioperl? If yes, what does the tests tell you? Stefan Daily, Kenneth Michael wrote: >OK, got that installed. But I still get an error: > >Can't locate object method "url" via package "Bio::Annotation::DBLink" at /home/kmdaily/src/bioperl/core/Bio/SeqIO/entrezgene.pm line 557. > >I am using this on a shared system, and an older version of Bioperl was installed by the admin. But the path to the one I downloaded via CVS is first in the list @INC, and PERL5LIB="/home/kmdaily/src/bioperl/core". > >Kenny Daily >IU School of Informatics >kmdaily at indiana.edu > > > >-----Original Message----- >From: Christopher Fields [mailto:cjfields at uiuc.edu] >Sent: Tue 5/16/2006 7:38 AM >To: Daily, Kenneth Michael; bioperl-l at lists.open-bio.org >Subject: Re: [Bioperl-l] entrezgene.pm not in Bio::SeqIO > >You'll have to install from CVS. 
I believe Brian added Entrezgene.pm after the lst >developer release (1.5.1): > >http://www.bioperl.org/wiki/Installing_BioPerl > >Chris > >---- Original message ---- > > >>Date: Mon, 15 May 2006 17:00:12 -0400 >>From: "Daily, Kenneth Michael" >>Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO >>To: >> >>I just installed Bioperl 1.4, and entrezgene.pm is not included (should be in >> >> >Bio/SeqIO). How can I get this module? > > >>Kenny Daily >>IU School of Informatics >>kmdaily at indiana.edu >> >> >>_______________________________________________ >>Bioperl-l mailing list >>Bioperl-l at lists.open-bio.org >>http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From osborne1 at optonline.net Tue May 16 20:46:00 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Tue, 16 May 2006 20:46:00 -0400 Subject: [Bioperl-l] module for 6 reading frames In-Reply-To: Message-ID: Chen Li, There's some documentation on translate() in bptutorial: http://bioperl.org/Core/Latest/bptutorial.html You could also use the translate_6frames() method of Bio::SeqUtils. Brian O. On 5/16/06 4:52 PM, "smarkel at scitegic.com" wrote: > Li, > > You can either do the substring, and reverse complement, yourself > or you can use the translate() function in Bio::PrimarySeq. It > inherits from Bio::PrimarySeqI, so check there for the documentation. > That translate() function takes a "-frame" argument. > > Scott > > PS In future, please respond to the list. That way others see > the questions and answers. > > chen li wrote on 16.05.2006 13:04:36: > >> Dear Dr. Markel, >> >> I browse through the document of >> Bio:Tools::Codontable and find this line: >> >> my $translation= $CodonTable->translate($seq); >> >> I think this line is to do the translation. 
Here is my >> question: which line in the doc says how to translate >> the remaining frames 2,3, and -1, -2, -3? >> >> >> Thank you, >> >> Li >> >> --- smarkel at scitegic.com wrote: >> >>> Li, >>> >>> Use the translate() function in >>> Bio::Tools::CodonTable. >>> >>> Scott >>> >>> Scott Markel, Ph.D. >>> Principal Bioinformatics Architect email: >>> smarkel at scitegic.com >>> SciTegic Inc. mobile: +1 858 >>> 205 3653 >>> 10188 Telesis Court, Suite 100 voice: +1 858 >>> 799 5603 >>> San Diego, CA 92121 fax: +1 858 >>> 279 8804 >>> USA web: >>> http://www.scitegic.com >>> >>> >>> bioperl-l-bounces at lists.open-bio.org wrote on >>> 16.05.2006 07:55:51: >>> >>>> Hi all, >>>> >>>> I wonder which module is available for translating >>> DNA >>>> sequence into 6 reading frames. >>>> >>>> Thank you, >>>> >>>> Li
From e-just at northwestern.edu Wed May 17 11:03:41 2006 From: e-just at northwestern.edu (Eric Just) Date: Wed, 17 May 2006 10:03:41 -0500 Subject: [Bioperl-l] Modware: a BioPerl based API for Chado Message-ID: <6.1.1.1.2.20060517095821.13353920@hecky.it.northwestern.edu> Hi Everyone, We are announcing a new Sourceforge Project called Modware that may be of interest to you. It is an object-oriented API written in Perl that creates BioPerl object representations of biological features stored in a Chado database. It basically creates a Bio::Seq object for chromosomes in Chado and creates Bio::SeqFeature::Gene objects for protein coding transcripts stored in Chado. Things like contigs are represented as Bio::SeqFeature::Generic objects. We also provide many methods for manipulating these objects once they are in memory. For download please visit our Sourceforge project page: http://sourceforge.net/projects/gmod-ware For API documentation and some short examples of selected use cases visit our project home page: http://gmod-ware.sourceforge.net/ This software is adapted from the production middleware code that dictyBase uses. Modware 0.1 requires the latest stable GMOD release: 0.003 be installed. We are currently calling it a release candidate and if we get some feedback will call it an official release if there are no major install bugs (we've installed it only on two different machines). If you would like a version that works on the latest CVS version of GMOD, let me know and I'll expedite getting that out the door. Lastly, please use the direct download version, we have not fully recovered from the recent Sourceforge CVS issues. Please try the software out and let us know what you think!
Sincerely, Eric Just and Sohel Merchant e-just at northwestern.edu s-merchant at northwestern.edu ============================================ Eric Just e-just at northwestern.edu dictyBase Programmer Center for Genetic Medicine Northwestern University http://dictybase.org ============================================ From sb at mrc-dunn.cam.ac.uk Wed May 17 13:46:45 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Wed, 17 May 2006 18:46:45 +0100 Subject: [Bioperl-l] Bio::Map:: enhancements Message-ID: <446B6185.1000602@mrc-dunn.cam.ac.uk> I added bug http://bugzilla.bioperl.org/show_bug.cgi?id=1998 I'm interested in what people have to say about the secondary enhancement I talk about there. Is it a sane thing to do? What are the better ways of doing that? If it /is/ ok, I suppose I'd have to go back and alter Bio::Map::MappableI and Bio::Map::MarkerI as well, not just Marker. Oh, on a side note, you'll see I had to override RangeI's intersection method to work on multiple ranges. Why is RangeI limited to an intersection of only two ranges? Cheers, Sendu. From David_Waner/San_Diego/Accelrys at scitegic.com Thu May 18 15:30:46 2006 From: David_Waner/San_Diego/Accelrys at scitegic.com (David_Waner/San_Diego/Accelrys at scitegic.com) Date: Thu, 18 May 2006 12:30:46 -0700 Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8 on Windows Message-ID: BioPerl Users/Developers, In our testing we have found severe performance problems using BioPerl with Perl 5.8 on Windows (but not on Linux). They show up especially in SeqIO when reading or writing Fasta files containing large (~16 MB) sequences. The same files that can be read in 1 or 2 seconds with Windows Perl 5.6 or Linux Perl 5.8, take minutes in Windows Perl 5.8. Although the fault is clearly with Perl, not with BioPerl, I have identified a couple of places where BioPerl could be modified in order to save Windows Perl 5.8 users a lot of time, while not affecting other users. 
For example, in my testing the following excerpt from Bio::Root::IO::_readline() takes 50 seconds (!) to execute (when reading a 16 MB sequence): if( (!$param{-raw}) && (defined $line) ) { $line =~ s/\015?\012/\n/g; $line =~ s/\015/\n/g unless $ONMAC; } whereas the following replacement code should be equivalent: if( (!$param{-raw}) && (defined $line) ) { $line =~ s/\015\012/\012/g; # Change all CR/LF pairs to LF $line =~ tr/\015/\n/ unless $ONMAC; # Change all single CRs to NEWLINE } but executes in less than 1 second. In addition, changing: defined $sequence && $sequence =~ s/\s//g; # Remove whitespace to: defined $sequence && $sequence =~ tr/ \t\n\r//d; # Remove whitespace in Bio::SeqIO::fasta.pm saves an additional ~20 seconds. There are also problems in reading files with the <> operator when $/ is redefined to "\n>", where reading the first line of Fasta files containing large sequences takes ~50 seconds, but reading subsequent lines or files takes about 1 second. I don't have a work-around for this. I would like to ask the mailing list: 1. Has anyone else run into this problem? Any fixes? 2. Do you think BioPerl should incorporate these changes? I plan to submit a bug report to perlbug, but don't know when or if the problem will be fixed. - David From cjfields at uiuc.edu Thu May 18 16:07:14 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 18 May 2006 15:07:14 -0500 Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8 onWindows In-Reply-To: Message-ID: <002901c67ab6$a84c3140$15327e82@pyrimidine> David, I have seen some slowdowns with Bio::SeqIO associated with GenBank files, which this could be related to. I can't do anything about it (test or commit changes) until next week but someone else using Windows might (though we are few and far between, and I'm switching to Mac OS X in fall). Would be nice to try the changes and test it out on a few platforms. 
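A minimal sketch of how the two versions could be compared on small inputs before timing them on large files. The helper names and sample strings are made up for illustration and are not part of BioPerl. It assumes a platform where "\n" is \012 and leaves out the $ONMAC branch, so it says nothing about the old Mac case.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Original line-ending normalisation, as quoted from Bio::Root::IO::_readline()
# (the "unless $ONMAC" guard is dropped here; that is an assumption).
sub normalize_regex {
    my ($line) = @_;
    $line =~ s/\015?\012/\n/g;
    $line =~ s/\015/\n/g;
    return $line;
}

# Proposed replacement: one substitution for CR/LF pairs, then tr/// for lone CRs.
sub normalize_tr {
    my ($line) = @_;
    $line =~ s/\015\012/\012/g;
    $line =~ tr/\015/\n/;
    return $line;
}

# Whitespace removal, regex version and tr/// version.
sub strip_ws_regex { my ($s) = @_; $s =~ s/\s//g;       return $s }
sub strip_ws_tr    { my ($s) = @_; $s =~ tr/ \t\n\r//d; return $s }

# Hypothetical sample inputs: DOS, Unix and old-Mac line endings, mixed whitespace.
my @samples = (
    "ACGT\015\012TTGA\015\012",
    "ACGT\012TTGA\012",
    "ACGT\015TTGA\015",
    "AC GT\tTT\015\012GA",
    "",
);

my $failures = 0;
for my $s (@samples) {
    $failures++ if normalize_regex($s) ne normalize_tr($s);
    $failures++ if strip_ws_regex($s)  ne strip_ws_tr($s);
}
print $failures ? "MISMATCH ($failures)\n" : "all samples agree\n";
```

One caveat worth checking separately: \s also matches a form feed, which the tr/ \t\n\r//d list does not remove, so the two whitespace versions would differ on input containing that character.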
Chris
From osborne1 at optonline.net Thu May 18 16:27:57 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Thu, 18 May 2006 16:27:57 -0400 Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8 on Windows In-Reply-To: Message-ID: David, What are the results from the relevant t/*t files before and after these patches? Brian O.
From hubert.prielinger at gmx.at Thu May 18 16:41:27 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Thu, 18 May 2006 14:41:27 -0600 Subject: [Bioperl-l] parsing xml output Message-ID: <446CDBF7.10908@gmx.at> hi, what is the best way to parse NCBI- and WU- Blast XML output.... and is it possible to parse both with the same parser, or differ their XML output...
thanks From staffa at niehs.nih.gov Thu May 18 16:49:15 2006 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C]) Date: Thu, 18 May 2006 16:49:15 -0400 Subject: [Bioperl-l] Reading GenBank Genomic File Annotation Message-ID: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov> Would like a fairly simple way to extract certain information from Genbank Genomic File Annotations. Namely the six D.melanogaster sequences. Specifically to find gene entries and learn the gene name, begin and end and CDS. Please point me to appropriate modules and documentation. Nick Staffa Telephone: 919-316-4569 (NIEHS: 6-4569) Scientific Computing Support Group NIEHS Information Technology Support Services Contract (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov ) National Institute of Environmental Health Sciences National Institutes of Health Research Triangle Park, North Carolina From adamnkraut at gmail.com Thu May 18 17:07:42 2006 From: adamnkraut at gmail.com (Adam Kraut) Date: Thu, 18 May 2006 17:07:42 -0400 Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline C? Message-ID: <134ede0b0605181407l52d1c2c3x79dd7f177ae7b828@mail.gmail.com> I am currently using a pairwise alignment algorithm written in C (not by me). The program consists of a library of routines, structures, and definitions which I do not want to spend a lot of time abstracting. I already have a hack method of writing the parameters and inputs I want from perl, calling the c program with system( ), and then parsing the output in Perl. Any good programmer would probably smack me but I'm just an undergrad and I needed to show my boss that this works in order to spend more time on it. So on to my question, what is the preferred method of extending Bioperl to use this algorithm? I have just read the XS tutorial and a bit about Inline C. Can I put the main function in my script using Inline, and then just point Inline at the rest of the C library? 
The program has several C-structures that are semantically equivalent to Bioperl objects, so just need somewhere to start. I will spend some more time so that I have a more specific question, I just wanted a little feedback, this is my first post to the bioperl list. Thanks, Adam From osborne1 at optonline.net Thu May 18 17:54:01 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Thu, 18 May 2006 17:54:01 -0400 Subject: [Bioperl-l] Reading GenBank Genomic File Annotation In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov> Message-ID: Nick, Have you read the Feature-Annotation HOWTO? This would be a good starting point... Brian O. On 5/18/06 4:49 PM, "Staffa, Nick (NIH/NIEHS) [C]" wrote: > Would like a fairly simple way to extract certain information from Genbank > Genomic File Annotations. > Namely the six D.melanogaster sequences. > Specifically to find gene entries and learn the gene name, begin and end and > CDS. > Please point me to appropriate modules and documentation. > > > Nick Staffa > Telephone: 919-316-4569 (NIEHS: 6-4569) > Scientific Computing Support Group > NIEHS Information Technology Support Services Contract > (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov ) > National Institute of Environmental Health Sciences > National Institutes of Health > Research Triangle Park, North Carolina > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Thu May 18 18:22:32 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu, 18 May 2006 18:22:32 -0400 Subject: [Bioperl-l] parsing xml output In-Reply-To: <446CDBF7.10908@gmx.at> References: <446CDBF7.10908@gmx.at> Message-ID: we don't parse WU-BLAST XML at this time. We'd welcome someone contributing this. ncbi XML is parsed with blastxml format. 
-jason On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote: > hi, > what is the best way to parse NCBI- and WU- Blast XML output.... > and is it possible to parse both with the same parser, or differ their > XML output... > > thanks > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From MEC at stowers-institute.org Thu May 18 18:39:15 2006 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Thu, 18 May 2006 17:39:15 -0500 Subject: [Bioperl-l] module for formating sequence output on the screen Message-ID: Li, Here's a one-liner that uses bioperl's Bio::SeqIO module to reformat fasta on standard input to 50 char wide fasta on standard output. perl -MBio::SeqIO -e 'select Bio::SeqIO->newFh(-format => "fasta", -width => 50); $in = Bio::SeqIO->newFh(-format => "fasta", -fh => \*STDIN); print while <$in>' You can call it like this: perl -MBio::SeqIO -e 'select Bio::SeqIO->newFh(-format => "fasta", -width => 50); $in = Bio::SeqIO->newFh(-format => "fasta", -fh => \*STDIN); print while <$in>' inputfile.fasta > outputfile.fasta Does this help? --Malcolm Cook >-----Original Message----- >From: bioperl-l-bounces at lists.open-bio.org >[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chen li >Sent: Tuesday, May 16, 2006 7:53 PM >To: bioperl-l at bioperl.org >Subject: [Bioperl-l] module for formating sequence output on the screen > >Hi all, > >Thank you very much for the help. > >I have some DNA sequences printed on the screen. But >the default output is longer than I expect. I need 50 >necleotides/line. I search CPAN but can not get the >right module. Which bioperl module can do this job? > >Li > >__________________________________________________ >Do You Yahoo!? >Tired of spam? Yahoo! 
Mail has the best spam protection around >http://mail.yahoo.com >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > From gish at watson.wustl.edu Thu May 18 19:57:03 2006 From: gish at watson.wustl.edu (Warren Gish) Date: Thu, 18 May 2006 18:57:03 -0500 Subject: [Bioperl-l] parsing xml output In-Reply-To: Message-ID: <009f01c67ad6$c359a560$0d00a8c0@PM> Just to clarify, the XML output from WU-BLAST conforms to the standard NCBI_BlastOutput.dtd. Technically, contents of data fields could still be incompatible, but care was taken to ensure compatibility. If someone identifies a difference that prevents parsing or proper interpretation of the WU-BLAST output, please let me know. Regards, --Warren > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Jason Stajich > Sent: Thursday, May 18, 2006 5:23 PM > To: Hubert Prielinger > Cc: bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] parsing xml output > > we don't parse WU-BLAST XML at this time. We'd welcome someone > contributing this. > > ncbi XML is parsed with blastxml format. > > -jason > On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote: > > > hi, > > what is the best way to parse NCBI- and WU- Blast XML output.... > > and is it possible to parse both with the same parser, or > differ their > > XML output... 
> > > > thanks > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at uiuc.edu Thu May 18 21:10:50 2006 From: cjfields at uiuc.edu (Christopher Fields) Date: Thu, 18 May 2006 20:10:50 -0500 Subject: [Bioperl-l] parsing xml output Message-ID: Just to make sure everybody knows, if you use bioperl v1.5.1, SearchIO::blastxml uses XML::Parser which should come with most recent perl distributions. The bioperl-live version has switched over to XML::SAX for SAX2 parsing and it is recommended that you install XML::SAX::ExpatXS as well for faster parsing. Chris ---- Original message ---- >Date: Thu, 18 May 2006 18:57:03 -0500 >From: "Warren Gish" >Subject: Re: [Bioperl-l] parsing xml output >To: "'Hubert Prielinger'" >Cc: bioperl-l at bioperl.org > >Just to clarify, the XML output from WU-BLAST conforms to the standard >NCBI_BlastOutput.dtd. Technically, contents of data fields could still be >incompatible, but care was taken to ensure compatibility. If someone >identifies a difference that prevents parsing or proper interpretation of >the WU-BLAST output, please let me know. >Regards, >--Warren > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >> Jason Stajich >> Sent: Thursday, May 18, 2006 5:23 PM >> To: Hubert Prielinger >> Cc: bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] parsing xml output >> >> we don't parse WU-BLAST XML at this time. We'd welcome someone >> contributing this. >> >> ncbi XML is parsed with blastxml format. 
>> >> -jason >> On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote: >> >> > hi, >> > what is the best way to parse NCBI- and WU- Blast XML output.... >> > and is it possible to parse both with the same parser, or >> differ their >> > XML output... >> > >> > thanks >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Fri May 19 08:52:13 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri, 19 May 2006 08:52:13 -0400 Subject: [Bioperl-l] parsing xml output In-Reply-To: <009f01c67ad6$c359a560$0d00a8c0@PM> References: <009f01c67ad6$c359a560$0d00a8c0@PM> Message-ID: <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu> Whoops - sorry Warren - for some reason I had it in my mind that it was different. So the blastxml parser should work fine. The WUBLAST tab-delimited output is different than NCBI's -m8/9 though, right? -jason On May 18, 2006, at 7:57 PM, Warren Gish wrote: > Just to clarify, the XML output from WU-BLAST conforms to the standard > NCBI_BlastOutput.dtd. Technically, contents of data fields could > still be > incompatible, but care was taken to ensure compatibility. If someone > identifies a difference that prevents parsing or proper > interpretation of > the WU-BLAST output, please let me know. 
> Regards, > --Warren > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >> Jason Stajich >> Sent: Thursday, May 18, 2006 5:23 PM >> To: Hubert Prielinger >> Cc: bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] parsing xml output >> >> we don't parse WU-BLAST XML at this time. We'd welcome someone >> contributing this. >> >> ncbi XML is parsed with blastxml format. >> >> -jason >> On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote: >> >>> hi, >>> what is the best way to parse NCBI- and WU- Blast XML output.... >>> and is it possible to parse both with the same parser, or >> differ their >>> XML output... >>> >>> thanks >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From torsten.seemann at infotech.monash.edu.au Thu May 18 18:42:05 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 19 May 2006 08:42:05 +1000 Subject: [Bioperl-l] parsing xml output In-Reply-To: <446CDBF7.10908@gmx.at> References: <446CDBF7.10908@gmx.at> Message-ID: <446CF83D.60207@infotech.monash.edu.au> > what is the best way to parse NCBI- and WU- Blast XML output.... > and is it possible to parse both with the same parser, or differ their > XML output... For NCBI BLAST XML format, use Bio::SearchIO->new(-format=>'blastxml', ...) 
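A short sketch of what that could look like in a script, assuming BioPerl is installed and that the report file name is passed on the command line (the file name and the choice of printed columns are only for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SearchIO;

# Hypothetical input: path to a BLAST XML report given as the first argument.
my $file = shift @ARGV or die "usage: $0 report.xml\n";

my $in = Bio::SearchIO->new(-format => 'blastxml', -file => $file);

# Walk results (one per query), hits and HSPs; print one tab-separated row per HSP.
while (my $result = $in->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            print join("\t", $result->query_name, $hit->name, $hsp->evalue), "\n";
        }
    }
}
```

The same loop applies to other SearchIO formats; only the -format value changes.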
I don't know if 'blastxml' will load WU-BLAST XML format. http://www.bioperl.org/wiki/HOWTO:SearchIO does not mention it. Why not try it, and report back the results to the bioperl list? -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia
From torsten.seemann at infotech.monash.edu.au Thu May 18 18:37:17 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 19 May 2006 08:37:17 +1000 Subject: [Bioperl-l] Reading GenBank Genomic File Annotation In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov> References: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov> Message-ID: <446CF71D.2070207@infotech.monash.edu.au> Staffa, Nick (NIH/NIEHS) [C] wrote: > Would like a fairly simple way to extract certain information from Genbank Genomic File Annotations. > Namely the six D.melanogaster sequences. > Specifically to find gene entries and learn the gene name, begin and end and CDS. > Please point me to appropriate modules and documentation. http://www.bioperl.org/ -> http://www.bioperl.org/wiki/HOWTOs -> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation http://www.bioperl.org/ -> http://www.bioperl.org/wiki/FAQ -> http://www.bioperl.org/wiki/FAQ#Annotations_and_Features -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia
From gish at watson.wustl.edu Fri May 19 10:50:08 2006 From: gish at watson.wustl.edu (Warren Gish) Date: Fri, 19 May 2006 09:50:08 -0500 Subject: [Bioperl-l] parsing xml output In-Reply-To: <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu> References: <009f01c67ad6$c359a560$0d00a8c0@PM> <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu> Message-ID: Right, the WU-BLAST tabbed output contains more fields. (See http://blast.wustl.edu/blast/tabular.html). --Warren > Whoops - sorry Warren - for some reason I had it in my mind that it > was different. So the blastxml parser should work fine. The > WUBLAST tab-delimited output is different than NCBI's -m8/9 though, > right? > > -jason
From adamnkraut at gmail.com Fri May 19 11:04:01 2006 From: adamnkraut at gmail.com (Adam Kraut) Date: Fri, 19 May 2006 11:04:01 -0400 Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline C? In-Reply-To: References: <134ede0b0605181407l52d1c2c3x79dd7f177ae7b828@mail.gmail.com> Message-ID: <134ede0b0605190804i60ee5ce1v984a33e0c91adf52@mail.gmail.com> The program generates an ensemble of weighted suboptimal alignments by use of a partition function and stochastic backtracking. The algorithm is quite novel and it's really only part of a larger multi-scale comparative modeling project. The documentation is here: http://www.tbi.univie.ac.at/~ulim/probA/probA_lib.html While I think this would be useful to the bioperl community if it were fully abstracted/extended, I would at the least like to be able to pass in any two sequences and get back SimpleAlign objects for our internal uses first. I have a good idea on how to get started. I will be sure to post when I get into trouble. On 5/19/06, aaron.j.mackey at gsk.com wrote: > > bioperl-ext is the package in which alignment algorithms and/or BioPerl > "wrapped" external C libraries live.
Subprojects in bioperl-ext use both > XS and Inline::C, that's up to you. > > You'll need to get your C code compiled to a dynamically loaded library > (.so) to use either XS or Inline::C; this precludes any reuse of the C > main() function (although your Inline::C wrapper might recapitulate/copy > the main() function code). > > Out of curiosity, what pairwise alignment algorithm are you using? This > is a heavily beaten path, you might want to dig around first to see if > someone else already has what you need. > > -Aaron > >
From slenk at emich.edu Fri May 19 10:42:41 2006 From: slenk at emich.edu (Stephen Gordon Lenk) Date: Fri, 19 May 2006 10:42:41 -0400 Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline C? Message-ID: There is nothing wrong with a reasonable way that works - better not to put yourself down. Inline is good if you can get it to work for you - I have had issues with linking Inline to dynamic libraries. I believe Inline makes a file that has linkage characteristics specified. Try it and see, then tell people how you did it. My two cents. Another way to use external executables is open3 (from IPC::Open3), then reading and writing to the pipes. I use it (primer3 and local lab automation code) - snippet follows:

    use IPC::Open3;   # provides open3()
    use IO::Handle;   # provides autoflush() on filehandles

    my $pid    = 0;
    my $cancmd = 'cancmd.exe';
    my $write  = 0;
    my $read   = 0;

    sub new {
        my $c = {};
        # the child's stdout and stderr both arrive on RDRFH
        $pid   = open3(\*WTRFH, \*RDRFH, \*RDRFH, $cancmd);
        $write = \*WTRFH;
        $read  = \*RDRFH;
        $write->autoflush();
        bless $c;
        return $c;
    }

Just write your request, then read the reply back - I make sure that each request and each reply is a newline terminated text line - and be sure you reap the child pid (waitpid) when you are done. ----- Original Message ----- From: Adam Kraut Date: Thursday, May 18, 2006 5:07 pm Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline C? > I am currently using a pairwise alignment algorithm written in C > (not by > me).
The program consists of a library of routines, structures, and
> definitions which I do not want to spend a lot of time
> abstracting. I
> already have a hack method of writing the parameters and inputs I
> want from
> perl, calling the c program with system( ), and then parsing the
> output in
> Perl. Any good programmer would probably smack me but I'm just an
> undergrad and I needed to show my boss that this works in order to
> spend more time on
> it.
>
> So on to my question, what is the preferred method of extending
> Bioperl to
> use this algorithm? I have just read the XS tutorial and a bit
> about Inline
> C. Can I put the main function in my script using Inline, and
> then just
> point Inline at the rest of the C library? The program has several
> C-structures that are semantically equivalent to Bioperl objects,
> so just
> need somewhere to start. I will spend some more time so that I
> have a more
> specific question, I just wanted a little feedback, this is my
> first post to
> the bioperl list.
>
> Thanks,
> Adam

From hubert.prielinger at gmx.at Fri May 19 12:52:28 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 19 May 2006 10:52:28 -0600
Subject: [Bioperl-l] parsing xml output
In-Reply-To:
References: <009f01c67ad6$c359a560$0d00a8c0@PM> <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu>
Message-ID: <446DF7CC.5060509@gmx.at>

hi,
I wondered whether it is also possible in the xml output (either WU or NCBI - Blast) to get the species (taxonomy) for every hit, if I do a general search.
regards

Warren Gish wrote:
> Right, the WU-BLAST tabbed output contains more fields. (See
> http://blast.wustl.edu/blast/tabular.html).
> --Warren
>
>> Whoops - sorry Warren - for some reason I had it in my mind that it
>> was different. So the blastxml parser should work fine. The
>> WUBLAST tab-delimited output is different than NCBI's -m8/9 though,
>> right?
>>
>> -jason

From staffa at niehs.nih.gov Fri May 19 14:12:47 2006
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C])
Date: Fri, 19 May 2006 14:12:47 -0400
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To:
Message-ID: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov>

Specifically:
I have the document to which you refer,
but have not seen this one thing I need in the printout of tags etc.:
the values in this line;
mRNA join(380..509,578..1913,7784..8649,9439..10200)
Is that a location object?

Nick Staffa
Telephone: 919-316-4569 (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina

> ----------
> From: Brian Osborne
> Sent: Thursday, May 18, 2006 5:54 PM
> To: Staffa, Nick (NIH/NIEHS) [C]; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Reading GenBank Genomic File Annotation
>
> Nick,
>
> Have you read the Feature-Annotation HOWTO? This would be a good starting
> point...
>
> Brian O.
>
>
> On 5/18/06 4:49 PM, "Staffa, Nick (NIH/NIEHS) [C]"
> wrote:
>
> > Would like a fairly simple way to extract certain information from Genbank
> > Genomic File Annotations.
> > Namely the six D.melanogaster sequences.
> > Specifically to find gene entries and learn the gene name, begin and end and
> > CDS.
> > Please point me to appropriate modules and documentation.

From chandan.kr.singh at gmail.com Fri May 19 14:37:26 2006
From: chandan.kr.singh at gmail.com (CHANDAN SINGH)
Date: Sat, 20 May 2006 00:07:26 +0530
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov>
References: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov>
Message-ID: <2d4f320605191137n11017ec0xe41a632a3c7ea9a9@mail.gmail.com>

On 5/19/06, Staffa, Nick (NIH/NIEHS) [C] wrote:
>
> Specifically:
> I have the document to which you refer,
> but have not seen this one thing I need in the printout of tags etc.:
> the values in this line;
> mRNA join(380..509,578..1913,7784..8649,9439..10200)
> Is that a location object?

Yes, it is a location object. If you want it as a string (which is what it seems from your mail), you just have to do this:

    $loc = $fet->location();
    $loc_str = $loc->to_FTstring();

Hope it helps.
Chandan

From osborne1 at optonline.net Fri May 19 15:39:36 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 19 May 2006 15:39:36 -0400
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov>
Message-ID:

Nick,

This is from the HOWTO:

Another way of describing a feature in Genbank involves multiple start and end positions. These could be called "split" locations, and a very common example is the join statement in the CDS feature found in Genbank entries (e.g. join(45..122,233..267)). This calls for a specialized object, Bio::Location::SplitLocationI, which is a container for Location objects:

    for my $feature ($seqobj->top_SeqFeatures){
        if ( $feature->location->isa('Bio::Location::SplitLocationI') &&
             $feature->primary_tag eq 'CDS' ) {
            for my $location ( $feature->location->sub_Location ) {
                print $location->start . ".." . $location->end . "\n";
            }
        }
    }

Brian O.

From hubert.prielinger at gmx.at Fri May 19 16:42:09 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 19 May 2006 14:42:09 -0600
Subject: [Bioperl-l] parsing xml output
In-Reply-To:
References: <009f01c67ad6$c359a560$0d00a8c0@PM> <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu> <446DF7CC.5060509@gmx.at>
Message-ID: <446E2DA1.1050503@gmx.at>

hi warren,
that means if I alter the DTD (if that is possible) by adding the taxonomic id to the DTD..... then I should have the taxonomic id tag in the xml file (theoretically)
but I guess this is only possible with a local search (blastall) but not with an online search.

greetings

Warren Gish wrote:
>
> On May 19, 2006, at 11:52 AM, Hubert Prielinger wrote:
>
>> hi,
>> I wondered whether is it also possible in the xml output (either WU
>> or NCBI - Blast) to get the species (taxononmy) for every hit, if I
>> do a general search.
>> regards
>>
> The taxonomic id is not an entity in the NCBI XML DTD. If the
> information was embedded in deflines, one could conceivably parse for
> it, but I believe the NCBI only distributes taxids in their ASN.1 data
> and in their pre-formatted BLAST databases, and NCBI BLAST only reports
> taxids in its ASN.1 output format, where taxid is available as an entity.
>
> --Warren

From cjfields at uiuc.edu Fri May 19 16:56:56 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Fri, 19 May 2006 15:56:56 -0500
Subject: [Bioperl-l] parsing xml output
Message-ID: <5c1c5a79.bb0af5aa.8198d00@expms6.cites.uiuc.edu>

You'll have to pull the GI or accession from each hit and do a lookup, either by grabbing the sequence and using Bio::Species or by using Bio::DB::Taxonomy; there isn't any tax information directly incorporated into BLAST reports AFAIK.

Chris

From cjfields at uiuc.edu Fri May 19 16:59:35 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Fri, 19 May 2006 15:59:35 -0500
Subject: [Bioperl-l] parsing xml output
Message-ID: <65932c77.bb0b33b0.8253400@expms6.cites.uiuc.edu>

Um, I don't think it works that way. I'm pretty sure the XML is generated from the ASN.1 output. I don't think (like Warren says) that you can directly get to the tax information. Indirectly is another matter...

Chris

From hubert.prielinger at gmx.at Fri May 19 17:30:20 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 19 May 2006 15:30:20 -0600
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <446E3854.5010708@gmx.at>
References: <5c1c5a79.bb0af5aa.8198d00@expms6.cites.uiuc.edu> <446E3854.5010708@gmx.at>
Message-ID: <446E38EC.9020100@gmx.at>

ok, thanks,
it appears that I only need the species that the protein is derived from, so I guess Bio::Species would satisfy me, or?
and would it work if I just pull the accession from the blast output file, then pass in the accession code and get the species name as the return value?
is it possible to pass in just the accession code? I looked it up, but the documentation always talks about the entire file.

regards

From jason.stajich at duke.edu Fri May 19 18:40:54 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 19 May 2006 18:40:54 -0400
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <446E38EC.9020100@gmx.at>
References: <5c1c5a79.bb0af5aa.8198d00@expms6.cites.uiuc.edu> <446E3854.5010708@gmx.at> <446E38EC.9020100@gmx.at>
Message-ID:

There is a gi2taxid table in the /pub/taxonomy part of NCBI FTP site (ftp.ncbi.nih.gov) -- I have used this to take GI numbers from report and get taxonomy for overall classification. I think something like this exists in the scripts or examples directory in the bioperl distro. I know I posted about it when I wrote about it a while ago.

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/

From ewijaya at i2r.a-star.edu.sg Sat May 20 08:36:44 2006
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Sat, 20 May 2006 20:36:44 +0800
Subject: [Bioperl-l] Method for checking Sequence type of a file
Message-ID: <30362db229c.446f7ddc@i2r.a-star.edu.sg>

Dear expert,

Is there any Bioperl method that allows you to verify the sequence format of a file?

For example, given a file we wish to check (return true or false) whether it is in FASTA format, GENBANK format, etc.

This method is useful in web applications as a taint-checking procedure.

Regards,
Edward WIJAYA
SINGAPORE

From aaron.j.mackey at gsk.com Fri May 19 09:33:01 2006
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Fri, 19 May 2006 09:33:01 -0400
Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline C?
In-Reply-To: <134ede0b0605181407l52d1c2c3x79dd7f177ae7b828@mail.gmail.com>
Message-ID:

bioperl-ext is the package in which alignment algorithms and/or BioPerl "wrapped" external C libraries live. Subprojects in bioperl-ext use both XS and Inline::C, that's up to you.

You'll need to get your C code compiled to a dynamically loaded library (.so) to use either XS or Inline::C; this precludes any reuse of the C main() function (although your Inline::C wrapper might recapitulate/copy the main() function code).

Out of curiosity, what pairwise alignment algorithm are you using? This is a heavily beaten path, you might want to dig around first to see if someone else already has what you need.

-Aaron

From jason.stajich at duke.edu Sat May 20 10:50:17 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat, 20 May 2006 10:50:17 -0400
Subject: [Bioperl-l] Method for checking Sequence type of a file
In-Reply-To: <30362db229c.446f7ddc@i2r.a-star.edu.sg>
References: <30362db229c.446f7ddc@i2r.a-star.edu.sg>
Message-ID:

Try Bio::Tools::GuessSeqFormat

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From chen_li3 at yahoo.com Sat May 20 20:15:01 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sat, 20 May 2006 17:15:01 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
Message-ID: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>

Dear all,

I try one script from GraphicsHowTo under Cygwin environment(GD and libpng already installed). I type this line in Cygwin X window:

$ perl render_blast1.pl data1.txt | display -

And here is the result:

display: no decode delegate for this image format `/tmp/magick-qKiRPDRS'.

Any idea?

Thank you very much,
Li

From osborne1 at optonline.net Sat May 20 20:59:06 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Sat, 20 May 2006 20:59:06 -0400
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
Message-ID:

Chen,

Not sure. However, whenever I see a new or incomprehensible error message like "display: no decode delegate for this image format" I Google it.

Brian O.

From n.saunders at uq.edu.au Sun May 21 18:17:44 2006
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Mon, 22 May 2006 08:17:44 +1000
Subject: [Bioperl-l] problems with Bio::Graph
Message-ID: <4470E708.3070402@uq.edu.au>

dear all,

I am having some problems with the Bio::Graph modules. Running Bioperl 1.5.0 RC1 with Ubuntu 5.10 i686.

I would like to parse files in PSI MI XML 2.5 format and for selected proteins, get the Uniprot accession of interacting partners (this is outlined in the documentation for Bio::Graph::ProteinGraph). I wrote a very simple test script and ran it on a selection of XML files. The script is simply:

----------------------------------------------------------------
use strict;
use Bio::Graph::IO;

my $mifile = shift || die("Usage = biograph.pl \n");
my $graphio = Bio::Graph::IO->new('-file' => $mifile,
                                  '-format' => 'psi_xml');
my $gr = $graphio->next_network;
----------------------------------------------------------------

Here's a summary of the error messages with some sample files (I tried PSI MI XML versions 1 and 2.5):

1. MINT database 9707552_small.xml (PSI 2.5)
Can't call method "att" on an undefined value at /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173.

2. IntAct database yeast_small-11.xml (PSI 2.5)
Can't call method "att" on an undefined value at /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173.

3. IntAct database yeast_small-11.xml (PSI 1)
Use of uninitialized value in string eq at /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 126.

4. DIP files Scere20060402.mif, Ecoli20060402.mif (PSI 1)
These give no errors

5.
DIP file dip20060402.mif (PSI 1, complete dataset)
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Invalid species name 'immunodeficiency virus type 1, HIV-1'
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.7/Bio/Root/Root.pm:328
STACK: Bio::Species::validate_species_name /usr/local/share/perl/5.8.7/Bio/Species.pm:340
STACK: Bio::Species::classification /usr/local/share/perl/5.8.7/Bio/Species.pm:170
STACK: Bio::Species::new /usr/local/share/perl/5.8.7/Bio/Species.pm:118
STACK: Bio::Graph::IO::psi_xml::_proteinInteractor /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:105
STACK: XML::Twig::_twig_end /usr/share/perl5/XML/Twig.pm:1473
STACK: XML::Parser::Expat::parse /usr/lib/perl5/XML/Parser/Expat.pm:469
STACK: XML::Parser::parse /usr/lib/perl5/XML/Parser.pm:187
STACK: XML::Parser::parsefile /usr/lib/perl5/XML/Parser.pm:233
STACK: Bio::Graph::IO::psi_xml::next_network /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:79
STACK: ./biograph.pl:18
-----------------------------------------------------------

Looking at the module code, it seems that the first 2 errors relate to a parameter "proteinInteractorRef", found in PSI MI version 1 but not version 2.5. Error 3 I haven't yet figured out. DIP PSI MI XML version 1 for single species seems OK, but it seems there are species names in the complete dataset that cause problems (error 5).

Is the CVS version of Bio::Graph any better at handling PSI MI XML? Are there plans to get it to work with version 2.5 files from all sources (MINT and IntAct)?

Googling and checking the list archives didn't give a lot of hits which made me think it's not a widely-used module.

thanks,
Neil
--
School of Molecular and Microbial Sciences
University of Queensland
Brisbane 4072 Australia

http://psychro.bioinformatics.unsw.edu.au/neil

From torsten.seemann at infotech.monash.edu.au Sun May 21 21:31:56 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Mon, 22 May 2006 11:31:56 +1000
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
References: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
Message-ID: <4471148C.5090404@infotech.monash.edu.au>

> I try one script from GraphicsHowTo under Cygwin
> environment(GD and libpng already installed). I type
> this line in Cygwin X window:
> $ perl render_blast1.pl data1.txt | display -
> display: no decode delegate for this image format
> `/tmp/magick-qKiRPDRS'.

You are piping the output of the Perl script (which is a GIF/PNG image) into the input of a program called "display". This program is part of the ImageMagick toolkit, standard on most Linux installations. Because you are using Windows you probably don't have it installed!

Try this:

$ perl render_blast1.pl data1.txt > image.gif

Then load 'image.gif' into whatever your favourite image viewer is.

--
Dr Torsten Seemann http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia

From darin.london at duke.edu Mon May 22 11:29:45 2006
From: darin.london at duke.edu (Darin London)
Date: Mon, 22 May 2006 11:29:45 -0400
Subject: [Bioperl-l] BOSC 2006 2nd Call for Papers
In-Reply-To: <4471CE49.80109@duke.edu>
References: <44294B65.4050207@duke.edu> <4471CE49.80109@duke.edu>
Message-ID: <4471D8E9.8090109@duke.edu>

2nd CALL FOR SPEAKERS

This is the second and last official call for speakers to submit their abstracts to speak at BOSC 2006 in Fortaleza, Brasil. In order to be considered as a potential speaker, an abstract must be received by Monday, June 5th, 2006.

We look forward to a great conference this year. Please consult The Official BOSC 2006 Website at:

http://www.open-bio.org/wiki/BOSC_2006

for more details and information.

In addition, a BOSC weblog has been set up to make it easier to disseminate all BOSC related announcements:

http://wiki.open-bio.org/boscblog/

And if you have an ICAL compatible Calendar, there is an EventDB calendar set up with all BOSC related deadlines.

http://eventful.com/groups/G0-001-000014747-0

More information about ISMB can be found at the Official ISMB 2006 Website:

http://ismb2006.cbi.cnptia.embrapa.br/

Thank You, and we look forward to seeing you all,
The BOSC Organizing Committee.

From osborne1 at optonline.net Mon May 22 17:37:50 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 22 May 2006 17:37:50 -0400
Subject: [Bioperl-l] problems with Bio::Graph
In-Reply-To: <4470E708.3070402@uq.edu.au>
Message-ID:

Neil,

Let me propose an alternative. In the past few months I've been working on a Bioperl package for handling protein interaction networks, it is called bioperl-network. It's similar to the Bio::Graph modules, except for the following:

- It does not use Nat Goodman's SimpleGraph, it uses Perl's Graph. The advantage is that we are not responsible for maintaining the algorithm code, the disadvantage is that Graph has some bugs but Jarkko Hietaniemi has been working on these and has fixed some significant ones recently.

- It uses names and concepts from Graph. It also has separate notions of edge and interaction, where one edge can have one or more interactions.

- It uses more method names and conventions borrowed from interaction databases and PSI MI. For example, a node can be a protein complex composed of multiple Seq objects, not just a protein.

This package is a makeover of Bio::Graph, therefore Nat Goodman and Richard Adams are major contributors to it. It's also worth mentioning that it's not complete, meaning it won't parse all fields from PSI MI 2 or 2.5 but I think it should be able to handle the code you've shown (and if it cannot then I'll see that it's fixed). I don't know about PSI MI version 1 but if I'm not mistaken there's a version 1 -> version 2 converter.

I'm about to put this into CVS so you can take a look, should you choose to.

Brian O.

On 5/21/06 6:17 PM, "Neil Saunders" wrote:
> dear all,
>
> I am having some problems with the Bio::Graph modules. Running Bioperl 1.5.0
> RC1 with Ubuntu 5.10 i686.
> > I would like to parse files in PSI MI XML 2.5 format and for selected > proteins, > get the Uniprot accession of interacting partners (this is outlined in the > documentation for Bio::Graph::ProteinGraph). I wrote a very simple test > script > and ran it on a selection of XML files. The script is simply: > > ---------------------------------------------------------------- > use strict; > use Bio::Graph::IO; > > my $mifile = shift || die("Usage = biograph.pl \n"); > my $graphio = Bio::Graph::IO->new('-file' => $mifile, > '-format' => 'psi_xml'); > my $gr = $graphio->next_network; > ---------------------------------------------------------------- > > Here's a summary of the error messages with some sample files (I tried PSI MI > XML versions 1 and 2.5): > > 1. MINT database 9707552_small.xml (PSI 2.5) > Can't call method "att" on an undefined value at > /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173. > > 2. IntAct database yeast_small-11.xml (PSI 2.5) > Can't call method "att" on an undefined value at > /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173. > > 3. IntAct database yeast_small-11.xml (PSI 1) > Use of uninitialized value in string eq at > /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 126. > > 4. DIP files Scere20060402.mif, Ecoli20060402.mif (PSI 1) > These give no errors > > 5. 
DIP file dip20060402.mif (PSI 1, complete dataset) > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Invalid species name 'immunodeficiency virus type 1, HIV-1' > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.7/Bio/Root/Root.pm:328 > STACK: Bio::Species::validate_species_name > /usr/local/share/perl/5.8.7/Bio/Species.pm:340 > STACK: Bio::Species::classification > /usr/local/share/perl/5.8.7/Bio/Species.pm:170 > STACK: Bio::Species::new /usr/local/share/perl/5.8.7/Bio/Species.pm:118 > STACK: Bio::Graph::IO::psi_xml::_proteinInteractor > /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:105 > STACK: XML::Twig::_twig_end /usr/share/perl5/XML/Twig.pm:1473 > STACK: XML::Parser::Expat::parse /usr/lib/perl5/XML/Parser/Expat.pm:469 > STACK: XML::Parser::parse /usr/lib/perl5/XML/Parser.pm:187 > STACK: XML::Parser::parsefile /usr/lib/perl5/XML/Parser.pm:233 > STACK: Bio::Graph::IO::psi_xml::next_network > /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:79 > STACK: ./biograph.pl:18 > ----------------------------------------------------------- > > > Looking at the module code, it seems that the first 2 errors relate to a > parameter "proteinInteractorRef", found in PSI MI version 1 but not version > 2.5. > Error 3 I haven't yet figured out. DIP PSI MI XML version 1 for single > species seems OK, but it seems there are species names in the complete dataset > that cause problems (error 5). > > > Is the CVS version of Bio::Graph any better at handling PSI MI XML? Are there > plans to get it to work with version 2.5 files from all sources (MINT and > IntAct) ? Googling and checking the list archives didn't give a lot of hits > which made me think it's not a widely-used module. 
> > thanks, > Neil From torsten.seemann at infotech.monash.edu.au Mon May 22 17:53:02 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 23 May 2006 07:53:02 +1000 Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <20060522132553.21896.qmail@web36804.mail.mud.yahoo.com> References: <20060522132553.21896.qmail@web36804.mail.mud.yahoo.com> Message-ID: <447232BE.1080001@infotech.monash.edu.au> Chen Li > perl render_blast1.pl data1.txt >im.png Based on http://bioperl.org/wiki/HOWTO:Graphics I believe the example script is creating a PNG image. The last line is: print $panel->png; > and Perl runs without any problem. I use adobe > photoshop to open them and Adobe can't recognize them. > If I use ACDSee to open them I only get a black > background. If I issue this line under Cygwin X window > display im.png or display im.gif > Cygwin says: > display: Improper image header `im.png'. > It seems Perl can't produce an image with right > format. Are you sure Perl is producing a PNG file at all? How many bytes does im.png use? Zero? Did you notice this in http://bioperl.org/wiki/HOWTO:Graphics ? It says: "If you are on a Windows platform, you need to put STDOUT into binary mode so that the PNG file does not go through Window's carriage return/linefeed transformations. Before the final print statement, put the statement binmode(STDOUT)." ie. your script should have binmode(STDOUT); print $panel->png; as the last 2 lines. > Do you experience the same problem before? No. --Torsten From chen_li3 at yahoo.com Mon May 22 09:25:53 2006 From: chen_li3 at yahoo.com (chen li) Date: Mon, 22 May 2006 06:25:53 -0700 (PDT) Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <4471148C.5090404@infotech.monash.edu.au> Message-ID: <20060522132553.21896.qmail@web36804.mail.mud.yahoo.com> Dear Dr. Seemann, Thank you very much for the reply. 
I issue this line: perl render_blast1.pl data1.txt >im.gif or perl render_blast1.pl data1.txt >im.png and Perl runs without any problem. I use adobe photoshop to open them and Adobe can't recognize them. If I use ACDSee to open them I only get a black background. If I issue this line under Cygwin X window display im.png or display im.gif Cygwin says: display: Improper image header `im.png'. or display: Improper image header `im.gif'. It seems Perl can't produce an image with right format. Do you experience the same problem before? Li --- Torsten Seemann wrote: > > I try one script from GraphicsHowTo under Cygwin > > environment(GD and libpng already installed). I > type > > this line in Cygwin X window: > > $ perl render_blast1.pl data1.txt | display - > > display: no decode delegate for this image format > > `/tmp/magick-qKiRPDRS'. > > You are piping the output of the Perl script (which > is a GIF/PNG image) > into the input of a program called "display". This > program is part of > the ImageMagick toolkit, standard on most Linux > installations. Because > you are using Windows you probably don't have it > installed! Try this: > > $ perl render_blast1.pl data1.txt > image.gif > > Then load 'image.gif' into whatever your favourite > image viewer is. > > -- > Dr Torsten Seemann > http://www.vicbioinformatics.com > Victorian Bioinformatics Consortium, Monash > University, Australia > > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! 
Mail has the best spam protection around http://mail.yahoo.com From chen_li3 at yahoo.com Mon May 22 18:57:42 2006 From: chen_li3 at yahoo.com (chen li) Date: Mon, 22 May 2006 15:57:42 -0700 (PDT) Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <447232BE.1080001@infotech.monash.edu.au> Message-ID: <20060522225742.78245.qmail@web36804.mail.mud.yahoo.com> Hi, I tried both, with and without the statement binmode(STDOUT) before the last line, print $panel->png; but there is no difference. I get a file of 2432 bytes. Li > Chen Li > > > perl render_blast1.pl data1.txt >im.png > > Based on http://bioperl.org/wiki/HOWTO:Graphics I > believe the example > script is creating a PNG image. The last line is: > print $panel->png; > > > and Perl runs without any problem. I use adobe > > photoshop to open them and Adobe can't recognize > them. > > If I use ACDSee to open them I only get a black > > background. If I issue this line under Cygwin X > window > > display im.png or display im.gif > > Cygwin says: > > display: Improper image header `im.png'. > > It seems Perl can't produce an image with right > > format. > > Are you sure Perl is producing a PNG file at all? > How many bytes does im.png use? Zero? > > Did you notice this in > http://bioperl.org/wiki/HOWTO:Graphics ? > > It says: "If you are on a Windows platform, you need > to put STDOUT into > binary mode so that the PNG file does not go through > Window's carriage > return/linefeed transformations. Before the final > print statement, put > the statement binmode(STDOUT)." > > ie. your script should have > > binmode(STDOUT); > print $panel->png; > > as the last 2 lines. > > > Do you experience the same problem before? > > No. > > --Torsten
From barry.moore at genetics.utah.edu Mon May 22 21:00:06 2006 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Mon, 22 May 2006 19:00:06 -0600 Subject: [Bioperl-l] Problems with Unflattener.pm Message-ID: <729FFBBD-955B-4689-8A27-66733E81431C@genetics.utah.edu>
Hi All, NT_113910 appears to throw Bio::SeqFeature::Tools::Unflattener into an infinite recursive loop. The trouble occurs in the method find_best_matches between lines 2258 and 2281, and in particular the loop is perpetuated by line 2273. NT_113910 has a fairly complex features table, but I have as yet been unable to figure out why this loop is not exiting properly. This has been submitted to bugzilla, but I'll post here so it gets documented on the list also. Any suggestions from Chris or others would be greatly appreciated. This problem can be recreated as follows: Grab NT_113910 from genbank.
    bp_fetch.pl -fmt genbank net::genbank:NT_113910 > NT_113910.gbk
Pass NT_113910.gbk on the command line to the attached script.
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::SeqFeature::Tools::Unflattener;

my $file = shift;

# generate an Unflattener object
my $unflattener = Bio::SeqFeature::Tools::Unflattener->new;
#$unflattener->verbose(1);

# first fetch a genbank SeqI object
my $seqio = Bio::SeqIO->new(-file => $file, -format => 'GenBank');
my $out   = Bio::SeqIO->new(-format => 'asciitree');

while (my $seq = $seqio->next_seq()) {
    # get top level unflattened SeqFeatureI objects
    $unflattener->unflatten_seq(-seq => $seq, -use_magic => 1);
    $out->write_seq($seq);
}

From miker at biotiquesystems.com Mon May 22 19:56:52 2006 From: miker at biotiquesystems.com (Michael Rogoff) Date: Mon, 22 May 2006 16:56:52 -0700 Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version Message-ID: <002a01c67dfb$663cc600$c100a8c0@mike>
As best as I can tell, using Bio::SeqIO to parse a uniprot file ignores the sequence version, and calling seq_version() on the resulting RichSeq object returns undef. It looks like swiss.pm is trying to parse the version out of the SV line, which apparently doesn't exist any more? The sequence version(s) are now specified as part of the Date (DT) lines. Is this not a bug? Is swiss.pm not designed to parse uniprot files? Thanks for any help ...
From jason.stajich at duke.edu Mon May 22 21:37:13 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon, 22 May 2006 21:37:13 -0400 Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version In-Reply-To: <002a01c67dfb$663cc600$c100a8c0@mike> References: <002a01c67dfb$663cc600$c100a8c0@mike> Message-ID:
Sounds like a "missing feature" =) AFAIK the module was only written for swissprot files. It is possible there have been changes in the format that have not been tracked to the current code. We'd certainly appreciate someone testing it out as versions evolve. If you submit a bug to bugzilla with version of bioperl and example files you can track when a fix is in.
We of course appreciate anyone's efforts to provide a patch as most bugs get fixed of late when someone gets "itchy" enough to fix them. -jason On May 22, 2006, at 7:56 PM, Michael Rogoff wrote: > > As best as I can tell, using Bio::SeqIO to parse a uniprot file > ignores the > sequence version, and calling seq_version() on the resulting > RichSeq object > returns undef. > > It looks like swiss.pm is trying to parse the version out of the SV > line, which > apparently doesn't exist any more? The sequence version(s) are now > specified as > part of the Date (DT) lines. > > Is this not a bug? Is swiss.pm not designed to parse uniprot files? > > Thanks for any help ... > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From jason.stajich at duke.edu Mon May 22 22:04:17 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon, 22 May 2006 22:04:17 -0400 Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version In-Reply-To: <003301c67e0b$5dd44410$c100a8c0@mike> References: <003301c67e0b$5dd44410$c100a8c0@mike> Message-ID: <3607997C-DAD4-4E0E-A919-7D9212AC6D50@duke.edu> We ask that people post patches as attachments in bugzilla so we can track what the bug was and why the patch fixes it. I am not totally sure this patch works because it seems like we need to strip out more information now from the DT line if the $date actually contains more information than just the date. If you would go ahead and create a bug in bugzilla for this (http://bugzilla.open-bio.org) this sort of conversation can be tracked to the bug. If any of this is unclear please let us know - I thought we had put some pages up about this sort of thing on the wiki but maybe they need to be expanded.
-jason On May 22, 2006, at 9:51 PM, Michael Rogoff wrote: > I have a patch that seems to work but I'm not familiar with the > proper method to > "provide" it. How do I go about that? > > The patch is pretty simple, it just parses the sequence version out > of the date > line where it now hides: >
>   #date
>   elsif( /^DT\s+(.*)/ ) {
>       my $date = $1;
> +
> +     if ($date =~ /sequence version (\d+)/i) {
> +         $params{'-seq_version'} ||= $1;
> +     }
> +
>       $date =~ s/\;//;
>       $date =~ s/\s+$//;
>       push @{$params{'-dates'}}, $date;
>   }
> > By the way, what is the difference between Bio::Seq::version and > Bio::Seq::RichSeq::seq_version? > > >> -----Original Message----- >> From: Jason Stajich [mailto:jason.stajich at duke.edu] >> Sent: Monday, May 22, 2006 6:37 PM >> To: Michael Rogoff >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version >> >> >> Sounds like a "missing feature" =) >> >> AFAIK the module was only written for swissprot files. It is >> possible there have been changes in the format that have not been >> tracked to the current code. We'd certainly appreciate someone >> testing it out as versions evolve. If you submit a bug to bugzilla >> with version of bioperl and example files you can track when >> a fix is >> in. We of course appreciate anyone's efforts to provide a patch as >> most bugs get fixed of late when someone gets "itchy" enough to fix >> them. >> >> -jason >> >> On May 22, 2006, at 7:56 PM, Michael Rogoff wrote: >> >>> >>> As best as I can tell, using Bio::SeqIO to parse a uniprot file >>> ignores the >>> sequence version, and calling seq_version() on the resulting >>> RichSeq object >>> returns undef. >>> >>> It looks like swiss.pm is trying to parse the version out >> of the SV >>> line, which >>> apparently doesn't exist any more? The sequence version(s) >> are now >>> specified as >>> part of the Date (DT) lines. >>> >>> Is this not a bug? Is swiss.pm not designed to parse uniprot files?
>>> >>> Thanks for any help ... >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From Marc.Logghe at DEVGEN.com Tue May 23 03:08:37 2006 From: Marc.Logghe at DEVGEN.com (Marc Logghe) Date: Tue, 23 May 2006 09:08:37 +0200 Subject: [Bioperl-l] problems iwth Bio::graphics module Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746E4E@ANTARESIA.be.devgen.com> Hi Li, Did you check your script for any other print statements (to STDOUT, that is) that potentially could contaminate your png stream ? Marc > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chen li > Sent: Tuesday, May 23, 2006 12:58 AM > To: Torsten Seemann > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] problems iwth Bio::graphics module > > Hi, > > I try both: either with or without this statement > binmode(STDOUT) before the last line print $panel->png; But > there are no differenes. I get a file of 2432 bytes. > > Li > > > > > Chen Li > > > > > perl render_blast1.pl data1.txt >im.png > > > > Based on http://bioperl.org/wiki/HOWTO:Graphics I believe > the example > > script is creating a PNG image. The last line is: > > print $panel->png; > > > > > and Perl runs without any problem. I use adobe photoshop to open > > > them and Adobe can't recognize > > them. > > > If I use ACDSee to open them I only get a black background. If I > > > issue this line under Cygwin X > > window > > > display im.png or display im.gif > > > Cygwin says: > > > display: Improper image header `im.png'. > > > It seems Perl can't produce an image with right format. > > > > Are you sure Perl is producing a PNG file at all? > > How many bytes does im.png use? 
Zero? > > > > Did you notice this in > > http://bioperl.org/wiki/HOWTO:Graphics ? > > > > It says: "If you are on a Windows platform, you need to put STDOUT > > into binary mode so that the PNG file does not go through Window's > > carriage return/linefeed transformations. Before the final print > > statement, put the statement binmode(STDOUT)." > > > > ie. your script should have > > > > binmode(STDOUT); > > print $panel->png; > > > > as the last 2 lines. > > > > > Do you experience the same problem before? > > > > No. > > > > --Torsten > > > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection > around http://mail.yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From chen_li3 at yahoo.com Tue May 23 09:27:06 2006 From: chen_li3 at yahoo.com (chen li) Date: Tue, 23 May 2006 06:27:06 -0700 (PDT) Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746E4E@ANTARESIA.be.devgen.com> Message-ID: <20060523132706.57245.qmail@web36811.mail.mud.yahoo.com> Dear Dr. Logghe, Thank you so much. I have the script worked after getting your suggestion under Cygwin. Here are the last two lines: either binmode (STDOUT); print STDOUT $panel->png; or only print STDOUT $panel->png; They both work for me. I know the default output in perl to the screen. I don't why it works if STDOUT after print is added. Could you explain it? BTW I copy this script from GraphicsHowTo on Bioperl website and only one line contains print statement, which is 'print $panel->png'. Once again thank you so much, Li --- Marc Logghe wrote: > Hi Li, > Did you check your script for any other print > statements (to STDOUT, > that is) that potentially could contaminate your png > stream ? 
> > Marc > > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org > > [mailto:bioperl-l-bounces at lists.open-bio.org] On > Behalf Of chen li > > Sent: Tuesday, May 23, 2006 12:58 AM > > To: Torsten Seemann > > Cc: bioperl-l at lists.open-bio.org > > Subject: Re: [Bioperl-l] problems iwth > Bio::graphics module > > > > Hi, > > > > I try both: either with or without this statement > > binmode(STDOUT) before the last line print > $panel->png; But > > there are no differenes. I get a file of 2432 > bytes. > > > > Li > > > > > > > > > Chen Li > > > > > > > perl render_blast1.pl data1.txt >im.png > > > > > > Based on http://bioperl.org/wiki/HOWTO:Graphics > I believe > > the example > > > script is creating a PNG image. The last line > is: > > > print $panel->png; > > > > > > > and Perl runs without any problem. I use adobe > photoshop to open > > > > them and Adobe can't recognize > > > them. > > > > If I use ACDSee to open them I only get a > black background. If I > > > > issue this line under Cygwin X > > > window > > > > display im.png or display im.gif > > > > Cygwin says: > > > > display: Improper image header `im.png'. > > > > It seems Perl can't produce an image with > right format. > > > > > > Are you sure Perl is producing a PNG file at > all? > > > How many bytes does im.png use? Zero? > > > > > > Did you notice this in > > > http://bioperl.org/wiki/HOWTO:Graphics ? > > > > > > It says: "If you are on a Windows platform, you > need to put STDOUT > > > into binary mode so that the PNG file does not > go through Window's > > > carriage return/linefeed transformations. Before > the final print > > > statement, put the statement binmode(STDOUT)." > > > > > > ie. your script should have > > > > > > binmode(STDOUT); > > > print $panel->png; > > > > > > as the last 2 lines. > > > > > > > Do you experience the same problem before? > > > > > > No. 
> > > > > > --Torsten > > > > > > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! Mail has the best spam > protection > > around http://mail.yahoo.com > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From lstein at cshl.edu Tue May 23 10:06:27 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 23 May 2006 10:06:27 -0400 Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com> References: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com> Message-ID: <200605231006.28392.lstein@cshl.edu> Hi, It is possible that your version of display can't handle PNG images. Try saving the output as a file and then opening it in another image program: perl render_blast1.pl data1.txt > data1.png Another thing to watch out for is that, depending on what version of Perl you're using, you may have to insert this statement into the render_blast1.pl script (somewhere near the top): binmode STDOUT; Lincoln On Saturday 20 May 2006 20:15, chen li wrote: > Dear all, > > > I try one script from GraphicsHowTo under Cygwin > environment(GD and libpng already installed). I type > this line in Cygwin X window: > > > $ perl render_blast1.pl data1.txt | display - > > And here is the result: > > display: no decode delegate for this image format > `/tmp/magick-qKiRPDRS'. > > Any idea? > > > Thank you very much, > > Li > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! 
Mail has the best spam protection around > http://mail.yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu
From Derek.Fairley at bll.n-i.nhs.uk Tue May 23 10:39:16 2006 From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek) Date: Tue, 23 May 2006 15:39:16 +0100 Subject: [Bioperl-l] Bio::Restriction::IO query Message-ID:
Hi folks, I'm new to BioPerl, and struggling to make the Bio::Restriction::* modules work (using BioPerl-1.4; Perl-5.8.1; Linux-2.4). Specifically, I'm having some trouble understanding the behaviour of the Bio::Restriction::IO module. I'm trying to use this to create a Bio::Restriction::EnzymeCollection object from a local REBASE file (which is in bairoch-format); this will in turn be passed to a Bio::Restriction::Analysis object. The following test script (derived from the Bio::Restriction::IO perldoc) runs fine:

#! /usr/bin/perl -w
use strict;
use warnings;
use Bio::Restriction::IO;

my $in = Bio::Restriction::IO->new( -file   => "REBASE_file",
                                    -format => 'Bairoch');
my $collection = $in->read();
print "Number of REs in the collection: ", scalar $collection->each_enzyme, "\n";
#note that using -format=>'bairoch' without capitalisation (as shown in perldoc synopsis) throws an exception: Failed to load module Bio::Restriction::IO::bairoch...

However... the test script returns the number 532 - the number of enzymes in the default enzyme set - regardless of the number of enzymes in the file. A default Bio::Restriction::EnzymeCollection object has presumably been created (as the 'read()' and 'each_enzyme' methods are available) but it didn't come from the local file.
The result is the same if the Bio::Restriction::IO->new() method is called with no arguments - a default EnzymeCollection object is created. It's not clear to me where this has come from. My (mis?)understanding was that the default set of enzymes would be loaded on creation of a new Bio::Restriction::Analysis object (in the absence of a -enzymes=>... argument). Presumably this is down to my poor understanding of the BioPerl object model... ;-) So: how should I create an EnzymeCollection object from file? Any help or advice would be gratefully received. PS. Congratulations to the development team for creating a very impressive and useful open source toolkit. Derek. ----------------------------------------- Derek Fairley, Ph.D. Regional Virus Laboratory, Kelvin Building, Royal Victoria Hospital, Grosvenor Road, Belfast, N. Ireland. BT12 6BA Tel. +44 (0)2890 635303 From rowan.mitchell at bbsrc.ac.uk Tue May 23 10:53:42 2006 From: rowan.mitchell at bbsrc.ac.uk (rowan mitchell (RRes-Roth)) Date: Tue, 23 May 2006 15:53:42 +0100 Subject: [Bioperl-l] Assembly::IO ace output Message-ID: Hi I am very interested in writing ace format files and had assumed that I would be able to do this with Assembly::IO until I tried it! I see there has been some correspondence last year on this, but as far as I can see this is still not implemented in 1.5.1. Is this correct ? Is it planned to be included; are there modules under development available ? many thanks Rowan =============================================== Dr Rowan Mitchell Rothamsted Research Harpenden Herts AL5 2JQ UK Tel: +44 (0)1582 763133 x2469 Fax: +44 (0)1582 763010 E-mail: rowan.mitchell at bbsrc.ac.uk WWW: http://www.rothamsted.bbsrc.ac.uk/ =============================================== Rothamsted Research is a company limited by guarantee, registered in England under the registration number 2393175 and a not for profit charity number 802038. 
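For the ace-output question above, one possible stop-gap is to print the records by hand rather than going through Bio::Assembly::IO. What follows is only a minimal sketch, assuming the usual phrap/Consed record layout (AS, CO, AF, RD, QA). The data structure and the ace_text() function are hypothetical and are not part of BioPerl. The BQ (base quality) and BS (base segment) records are left out and the base segment count is written as 0, so a file produced this way may well need those records added before a given viewer will accept it.

```perl
use strict;
use warnings;

# Hypothetical input layout, one hash per contig:
#   { name => 'Contig1', consensus => 'ACGT...',
#     reads => [ { name => 'read1', seq => 'ACGT...', start => 1, strand => 'U' }, ... ] }
sub ace_text {
    my @contigs = @_;

    my $total_reads = 0;
    $total_reads += scalar( @{ $_->{reads} } ) for @contigs;

    # AS <number of contigs> <total number of reads>
    my $ace = sprintf( "AS %d %d\n\n", scalar(@contigs), $total_reads );

    for my $contig (@contigs) {
        my @reads = @{ $contig->{reads} };

        # CO <name> <padded bases> <reads> <base segments> <U|C>
        # (base segments are not written by this sketch, hence 0)
        $ace .= sprintf( "CO %s %d %d %d U\n%s\n\n",
            $contig->{name}, length( $contig->{consensus} ),
            scalar(@reads), 0, $contig->{consensus} );

        # AF <read> <U|C> <padded start position in the consensus>
        for my $read (@reads) {
            $ace .= sprintf( "AF %s %s %d\n",
                $read->{name}, $read->{strand}, $read->{start} );
        }
        $ace .= "\n";

        for my $read (@reads) {
            my $len = length( $read->{seq} );

            # RD <read> <padded bases> <info items> <read tags>
            $ace .= sprintf( "RD %s %d 0 0\n%s\n\n",
                $read->{name}, $len, $read->{seq} );

            # QA <qual start> <qual end> <align start> <align end>
            # (no clipping information here, so the whole read is used)
            $ace .= sprintf( "QA 1 %d 1 %d\n\n", $len, $len );
        }
    }
    return $ace;
}

# Made-up example data, for illustration only
print ace_text(
    {   name      => 'Contig1',
        consensus => 'ACGTACGT',
        reads     => [
            { name => 'read1', seq => 'ACGTACGT', start => 1, strand => 'U' },
            { name => 'read2', seq => 'GTACGT',   start => 3, strand => 'U' },
        ],
    }
);
```

The contig and read data would have to be pulled out of the Bio::Assembly objects first; how best to do that is not covered here.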
From rfsouza at cecm.usp.br Tue May 23 16:17:36 2006 From: rfsouza at cecm.usp.br (Robson Francisco de Souza {S}) Date: Tue, 23 May 2006 17:17:36 -0300 Subject: [Bioperl-l] Assembly::IO ace output In-Reply-To: References: Message-ID: <20060523201736.GA28401@cecm.usp.br> Hi Rowan, On Tue, May 23, 2006 at 03:53:42PM +0100, rowan mitchell (RRes-Roth) wrote: > Hi > > I am very interested in writing ace format files and had assumed that I > would be able to do this with Assembly::IO until I tried it! I see there > has been some correspondence last year on this, but as far as I can see > this is still not implemented in 1.5.1. Is this correct ? Is it planned > to be included; are there modules under development available ? As far as I know, there are no plans to add write support to Bio::Assembly::IO. When I wrote the original modules there was no need for this so I left it aside. Best regards, Robson > many thanks > > Rowan > > =============================================== > Dr Rowan Mitchell > Rothamsted Research > Harpenden > Herts AL5 2JQ UK > > Tel: +44 (0)1582 763133 x2469 > Fax: +44 (0)1582 763010 > E-mail: rowan.mitchell at bbsrc.ac.uk > WWW: http://www.rothamsted.bbsrc.ac.uk/ > =============================================== > Rothamsted Research is a company limited by guarantee, registered in > England under the registration number 2393175 and a not for profit > charity number 802038. 
> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From lstein at cshl.edu Tue May 23 16:53:34 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 23 May 2006 16:53:34 -0400 Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <200605231006.28392.lstein@cshl.edu> References: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com> <200605231006.28392.lstein@cshl.edu> Message-ID: <200605231653.36087.lstein@cshl.edu> Hi Chen, It looks to me like you cut and paste the data1.txt file from the web site, consequently replacing the tabs with spaces. Please get table1.txt from the BioPerl distribution, as instructed in the tutorial. Best, Lincoln On Tuesday 23 May 2006 10:06, Lincoln Stein wrote: > Hi, > > It is possible that your version of display can't handle PNG images. Try > saving the output as a file and then opening it in another image program: > > perl render_blast1.pl data1.txt > data1.png > > Another thing to watch out for is that, depending on what version of Perl > you're using, you may have to insert this statement into the > render_blast1.pl script (somewhere near the top): > > binmode STDOUT; > > Lincoln > > On Saturday 20 May 2006 20:15, chen li wrote: > > Dear all, > > > > > > I try one script from GraphicsHowTo under Cygwin > > environment(GD and libpng already installed). I type > > this line in Cygwin X window: > > > > > > $ perl render_blast1.pl data1.txt | display - > > > > And here is the result: > > > > display: no decode delegate for this image format > > `/tmp/magick-qKiRPDRS'. > > > > Any idea? > > > > > > Thank you very much, > > > > Li > > > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! 
Mail has the best spam protection around > > http://mail.yahoo.com > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From chen_li3 at yahoo.com Tue May 23 17:46:16 2006 From: chen_li3 at yahoo.com (chen li) Date: Tue, 23 May 2006 14:46:16 -0700 (PDT) Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <200605231653.36087.lstein@cshl.edu> Message-ID: <20060523214616.15131.qmail@web36813.mail.mud.yahoo.com> Dear Dr. Stein, Thank you so much. I follow your suggestions and download codes from the Bioperl CVS website. Now everything is working. Li --- Lincoln Stein wrote: > Hi Chen, > > It looks to me like you cut and paste the data1.txt > file from the web site, > consequently replacing the tabs with spaces. Please > get table1.txt from the > BioPerl distribution, as instructed in the tutorial. > > Best, > > Lincoln > > On Tuesday 23 May 2006 10:06, Lincoln Stein wrote: > > Hi, > > > > It is possible that your version of display can't > handle PNG images. Try > > saving the output as a file and then opening it in > another image program: > > > > perl render_blast1.pl data1.txt > data1.png > > > > Another thing to watch out for is that, depending > on what version of Perl > > you're using, you may have to insert this > statement into the > > render_blast1.pl script (somewhere near the top): > > > > binmode STDOUT; > > > > Lincoln > > > > On Saturday 20 May 2006 20:15, chen li wrote: > > > Dear all, > > > > > > > > > I try one script from GraphicsHowTo under Cygwin > > > environment(GD and libpng already installed). 
I > type > > > this line in Cygwin X window: > > > > > > > > > $ perl render_blast1.pl data1.txt | display - > > > > > > And here is the result: > > > > > > display: no decode delegate for this image > format > > > `/tmp/magick-qKiRPDRS'. > > > > > > Any idea? > > > > > > > > > Thank you very much, > > > > > > Li > > > > > > > > > > __________________________________________________ > > > Do You Yahoo!? > > > Tired of spam? Yahoo! Mail has the best spam > protection around > > > http://mail.yahoo.com > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From chen_li3 at yahoo.com Tue May 23 18:59:46 2006 From: chen_li3 at yahoo.com (chen li) Date: Tue, 23 May 2006 15:59:46 -0700 (PDT) Subject: [Bioperl-l] How to download sequence files either in EMBL format Message-ID: <20060523225946.2118.qmail@web36805.mail.mud.yahoo.com> Hi all, I need to download one sequence for a gene. I go to NCBI website,find the gene of interest,download the file in Genbank format(saved as sequence.genbank). But to my surprise this so-called genbank format file doesn't contain many features such as exons,compared to the one in Emsembl. My question: where can I download this sequence file in EMBL format? It looks like the one in EMBL might contain other information such exon. Thank you very much, Li __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! 
Mail has the best spam protection around http://mail.yahoo.com From osborne1 at optonline.net Wed May 24 10:33:16 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Wed, 24 May 2006 10:33:16 -0400 Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <20060523132706.57245.qmail@web36811.mail.mud.yahoo.com> Message-ID: Li, The Graphics HOWTO talks about this Windows workaround in _four_ different places, it's impossible to miss if you read it from start to finish. This is what one should do if one wants to use these modules and one is a novice. Example: Important! Remember that if you are on a Windows platform, you need to put STDOUT into binary mode so that the PNG file does not go through Window's carriage return/linefeed transformations. Before the final print statement, write binmode(STDOUT). Brian O. On 5/23/06 9:27 AM, "chen li" wrote: > BTW I copy this script from GraphicsHowTo on Bioperl > website and only one line contains print statement, > which is 'print $panel->png'. From chen_li3 at yahoo.com Wed May 24 12:17:15 2006 From: chen_li3 at yahoo.com (chen li) Date: Wed, 24 May 2006 09:17:15 -0700 (PDT) Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: Message-ID: <20060524161715.45141.qmail@web36807.mail.mud.yahoo.com> Thanks but Dr. Stein already helps me to figure out what is going on: I should have copied the source codes for the examples in CVS instead of "cut and paste" from the HOWTO tutorial. And sorry for any inconvience. Li --- Brian Osborne wrote: > Li, > > The Graphics HOWTO talks about this Windows > workaround in _four_ different > places, it's impossible to miss if you read it from > start to finish. This is > what one should do if one wants to use these modules > and one is a novice. > Example: > > Important! Remember that if you are on a Windows > platform, you need to put > STDOUT into binary mode so that the PNG file does > not go through Window's > carriage return/linefeed transformations. 
Before the final print statement, write binmode(STDOUT).
>
> Brian O.
>
> On 5/23/06 9:27 AM, "chen li" wrote:
>
> > BTW I copy this script from GraphicsHowTo on Bioperl
> > website and only one line contains print statement,
> > which is 'print $panel->png'.

From ULNJUJERYDIX at spammotel.com Wed May 24 21:59:36 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Thu, 25 May 2006 09:59:36 +0800
Subject: [Bioperl-l] URGENT: Bio::Graphics::Panel make the ruler have negative (-) position numbering imagemap making
Message-ID: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>

Hi,

Thanks for the help offered thus far!

I am trying to annotate TFBS on a -1000 to +50 bp promoter sequence using
Bioperl, so I was asked to number the ruler accordingly (starting at -1000).
Is there any way at all to do this in Bioperl without changing the .pm file?

Thanks guys,
Kevin

From cjfields at uiuc.edu Thu May 25 12:43:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 25 May 2006 11:43:37 -0500
Subject: [Bioperl-l] Problems with Unflattener.pm
In-Reply-To: <729FFBBD-955B-4689-8A27-66733E81431C@genetics.utah.edu>
Message-ID: <009d01c6801a$5f75d2a0$15327e82@pyrimidine>

I was able to reproduce this using WinXP and bioperl-live. It seems to get
caught up in the loop during recursion: debugging shows it is unable to get
past 'find_best_matches: (/15)'. There are lots of unmatched pairs with this
sequence, so could that be the problem? I'm not terribly familiar with
Unflattener...
Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Barry Moore
> Sent: Monday, May 22, 2006 8:00 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Problems with Unflattener.pm
>
> Hi All,
>
> NT_113910 appears to throw Bio::SeqFeature::Tools::Unflattener into
> an infinite recursive loop. The trouble occurs in the method
> find_best_matches between lines 2258 and 2281, and in particular the
> loop is perpetuated by line 2273. NT_113910 has a fairly complex
> features table, but I have as yet been unable to figure out why
> this loop is not exiting properly. This has been submitted to
> bugzilla, but I'll post here so it gets documented on the list also.
> Any suggestions from Chris or others would be greatly appreciated.
>
> This problem can be recreated as follows:
>
> Grab NT_113910 from genbank.
> bp_fetch.pl -fmt genbank net::genbank:NT_113910 > NT_113910.gbk
>
> Pass NT_113910.gbk on the command line to the attached script.
>
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> use Bio::SeqIO;
> use Bio::SeqFeature::Tools::Unflattener;
>
> my $file = shift;
>
> # generate an Unflattener object
> my $unflattener = Bio::SeqFeature::Tools::Unflattener->new;
> #$unflattener->verbose(1);
>
> # first fetch a genbank SeqI object
> my $seqio =
>   Bio::SeqIO->new(-file => $file,
>                   -format => 'GenBank');
> my $out =
>   Bio::SeqIO->new(-format => 'asciitree');
> while (my $seq = $seqio->next_seq()) {
>
>   # get top level unflattened SeqFeatureI objects
>   $unflattener->unflatten_seq(-seq => $seq,
>                               -use_magic => 1);
>   $out->write_seq($seq);
> }

From cjfields at uiuc.edu Thu May 25 15:44:01 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 25 May 2006 14:44:01 -0500
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To: <3607997C-DAD4-4E0E-A919-7D9212AC6D50@duke.edu>
Message-ID: <00a101c68033$95606dd0$15327e82@pyrimidine>

This is due to recent changes in the SwissProt/UniProt format (there
apparently are many other changes besides this). From UniProtKB news
(http://ca.expasy.org/sprot/relnotes/sp_news.html) is this tidbit:

----------------------------------------------------------
UniProtKB release 7.0 of 07-Feb-2006

Changes concerning dates and versions numbers (DT lines)

We changed from showing only the dates corresponding to full UniProtKB
releases in the DT lines to displaying the date of the biweekly release at
which an entry is integrated or updated. We dropped the information
concerning the release number and introduced entry and sequence version
numbers in the DT lines.

The new format of the three DT lines is:

DT   DD-MMM-YYYY, integrated into UniProtKB/database_name.
DT   DD-MMM-YYYY, sequence version version_number.
DT   DD-MMM-YYYY, entry version version_number.
Example for UniProtKB/Swiss-Prot: DT 01-JAN-1998, integrated into UniProtKB/Swiss-Prot. DT 15-OCT-2001, sequence version 3. DT 01-APR-2004, entry version 14. Example for UniProtKB/TrEMBL: DT 01-FEB-1999, integrated into UniProtKB/TrEMBL. DT 15-OCT-2000, sequence version 2. DT 15-DEC-2004, entry version 5. The sequence version number of an entry is incremented by one when its amino acid sequence is modified. The entry version number is incremented by one whenever any data in the flat file representation of the entry is modified. We retrofitted the entry and sequence version numbers, as well as all dates, using archived UniProtKB releases. ---------------------------------------------------------- Probably should explain on the swissprot wiki page that the format is in a state of flux at the moment. I've added this tidbit to the bug page (#2003) as well. Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Jason Stajich > Sent: Monday, May 22, 2006 9:04 PM > To: Michael Rogoff > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version > > We ask that people post patches to the bugzilla as an attachment to > the bugzilla so we can track what and why the bug was that the patch > fixes. > > I am not totally sure this patch works because it seems like we need > to strip out more information now from the DT line if the $date > actually contains more information than just the date. > > If you would go ahead and create a bug in bugzilla for this (http:// > bugzilla.open-bio.org) this sort of conversation can be tracked to > the bug. > > If any of this is unclear please let us know - I though we had put > some pages up about this sort of thing on the wiki but maybe they > need to be expanded. 
> > -jason > On May 22, 2006, at 9:51 PM, Michael Rogoff wrote: > > > I have a patch that seems to work but I'm not familiar with the > > proper method to > > "provide" it. How do I go about that? > > > > The patch is pretty simple, it just parses the sequence version out > > of the date > > line where it now hides: > > > > #date > > elsif( /^DT\s+(.*)/ ) { > > my $date = $1; > > + > > + if ($date =~ /sequence version (\d+)/i) { > > + $params{'-seq_version'} ||= $1; > > + } > > + > > $date =~ s/\;//; > > $date =~ s/\s+$//; > > push @{$params{'-dates'}}, $date; > > } > > > > By the way, what is the difference between Bio::Seq::version and > > Bio::Seq::RichSeq::seq_version? > > > > > >> -----Original Message----- > >> From: Jason Stajich [mailto:jason.stajich at duke.edu] > >> Sent: Monday, May 22, 2006 6:37 PM > >> To: Michael Rogoff > >> Cc: bioperl-l at lists.open-bio.org > >> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version > >> > >> > >> Sounds like a "missing feature" =) > >> > >> AFAIK the module was only written for swissprot files. It is > >> possible there have been changes in the format that have not been > >> tracked to the current code. We'd certainly appreciate someone > >> testing it out as versions evolve. If you submit a bug to bugzilla > >> with version of bioperl and example files you can track when > >> a fix is > >> in. We of course appreciate anyone's efforts to provide a patch as > >> most bugs get fixed of late when someone gets "itchy" enough to fix > >> them. > >> > >> -jason > >> > >> On May 22, 2006, at 7:56 PM, Michael Rogoff wrote: > >> > >>> > >>> As best as I can tell, using Bio::SeqIO to parse a uniprot file > >>> ignores the > >>> sequence version, and calling seq_version() on the resulting > >>> RichSeq object > >>> returns undef. > >>> > >>> It looks like swiss.pm is trying to parse the version out > >> of the SV > >>> line, which > >>> apparently doesn't exist any more? 
The sequence version(s) are now specified as
> >>> part of the Date (DT) lines.
> >>>
> >>> Is this not a bug? Is swiss.pm not designed to parse uniprot files?
> >>>
> >>> Thanks for any help ...
> >>
> >> --
> >> Jason Stajich
> >> Duke University
> >> http://www.duke.edu/~jes12
> >
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12

From miker at biotiquesystems.com Mon May 22 21:51:10 2006
From: miker at biotiquesystems.com (Michael Rogoff)
Date: Mon, 22 May 2006 18:51:10 -0700
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To:
Message-ID: <003301c67e0b$5dd44410$c100a8c0@mike>

I have a patch that seems to work, but I'm not familiar with the proper
method to "provide" it. How do I go about that?

The patch is pretty simple: it just parses the sequence version out of the
date line where it now hides:

      #date
      elsif( /^DT\s+(.*)/ ) {
          my $date = $1;
+
+         if ($date =~ /sequence version (\d+)/i) {
+             $params{'-seq_version'} ||= $1;
+         }
+
          $date =~ s/\;//;
          $date =~ s/\s+$//;
          push @{$params{'-dates'}}, $date;
      }

By the way, what is the difference between Bio::Seq::version and
Bio::Seq::RichSeq::seq_version?

> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at duke.edu]
> Sent: Monday, May 22, 2006 6:37 PM
> To: Michael Rogoff
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
>
>
> Sounds like a "missing feature" =)
>
> AFAIK the module was only written for swissprot files. It is
> possible there have been changes in the format that have not been
> tracked to the current code.
We'd certainly appreciate someone > testing it out as versions evolve. If you submit a bug to bugzilla > with version of bioperl and example files you can track when > a fix is > in. We of course appreciate anyone's efforts to provide a patch as > most bugs get fixed of late when someone gets "itchy" enough to fix > them. > > -jason > > On May 22, 2006, at 7:56 PM, Michael Rogoff wrote: > > > > > As best as I can tell, using Bio::SeqIO to parse a uniprot file > > ignores the > > sequence version, and calling seq_version() on the resulting > > RichSeq object > > returns undef. > > > > It looks like swiss.pm is trying to parse the version out > of the SV > > line, which > > apparently doesn't exist any more? The sequence version(s) > are now > > specified as > > part of the Date (DT) lines. > > > > Is this not a bug? Is swiss.pm not designed to parse uniprot files? > > > > Thanks for any help ... > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > From chen_li3 at yahoo.com Tue May 23 11:48:46 2006 From: chen_li3 at yahoo.com (chen li) Date: Tue, 23 May 2006 08:48:46 -0700 (PDT) Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <200605231006.28392.lstein@cshl.edu> Message-ID: <20060523154846.70831.qmail@web36815.mail.mud.yahoo.com> Dear Dr. Stein, I have the job partially done by adding this line (under Cygwin) print STDOUT $panel->png; It is done because I can produce the image to be viewed by other programs but it is only partially done because I don't get exactly the same image as that shown on the website. Enclosed is the image I get. Thank you, Li --- Lincoln Stein wrote: > Hi, > > It is possible that your version of display can't > handle PNG images. 
Try > saving the output as a file and then opening it in > another image program: > > perl render_blast1.pl data1.txt > data1.png > > Another thing to watch out for is that, depending on > what version of Perl > you're using, you may have to insert this statement > into the render_blast1.pl > script (somewhere near the top): > > binmode STDOUT; > > Lincoln > > > On Saturday 20 May 2006 20:15, chen li wrote: > > Dear all, > > > > > > I try one script from GraphicsHowTo under Cygwin > > environment(GD and libpng already installed). I > type > > this line in Cygwin X window: > > > > > > $ perl render_blast1.pl data1.txt | display - > > > > And here is the result: > > > > display: no decode delegate for this image format > > `/tmp/magick-qKiRPDRS'. > > > > Any idea? > > > > > > Thank you very much, > > > > Li > > > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! Mail has the best spam > protection around > > http://mail.yahoo.com > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com -------------- next part -------------- A non-text attachment was scrubbed... 
Name: im1 Type: image/x-png Size: 2423 bytes Desc: 2615755531-im1 URL: From cjfields at uiuc.edu Thu May 25 21:28:14 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 25 May 2006 20:28:14 -0500 Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version In-Reply-To: <003301c67e0b$5dd44410$c100a8c0@mike> References: <003301c67e0b$5dd44410$c100a8c0@mike> Message-ID: This patch works only for the recent change in swissprot seq format for sequence versions on the DT line. I checked it out vs the test data provided with bioperl (t\data\swiss.dat). I did manage to get it working for both old and new using a modification to your patch but there's another issue; using $seq->get_dates, which should only show dates, shows the entire line (date and version info). Jason mentioned that there needs to be a better way to address this which I'm looking into. Chris On May 22, 2006, at 8:51 PM, Michael Rogoff wrote: > I have a patch that seems to work but I'm not familiar with the > proper method to > "provide" it. How do I go about that? > > The patch is pretty simple, it just parses the sequence version out > of the date > line where it now hides: > > #date > elsif( /^DT\s+(.*)/ ) { > my $date = $1; > + > + if ($date =~ /sequence version (\d+)/i) { > + $params{'-seq_version'} ||= $1; > + } > + > $date =~ s/\;//; > $date =~ s/\s+$//; > push @{$params{'-dates'}}, $date; > } > > By the way, what is the difference between Bio::Seq::version and > Bio::Seq::RichSeq::seq_version? > > >> -----Original Message----- >> From: Jason Stajich [mailto:jason.stajich at duke.edu] >> Sent: Monday, May 22, 2006 6:37 PM >> To: Michael Rogoff >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version >> >> >> Sounds like a "missing feature" =) >> >> AFAIK the module was only written for swissprot files. It is >> possible there have been changes in the format that have not been >> tracked to the current code. 
We'd certainly appreciate someone >> testing it out as versions evolve. If you submit a bug to bugzilla >> with version of bioperl and example files you can track when >> a fix is >> in. We of course appreciate anyone's efforts to provide a patch as >> most bugs get fixed of late when someone gets "itchy" enough to fix >> them. >> >> -jason >> >> On May 22, 2006, at 7:56 PM, Michael Rogoff wrote: >> >>> >>> As best as I can tell, using Bio::SeqIO to parse a uniprot file >>> ignores the >>> sequence version, and calling seq_version() on the resulting >>> RichSeq object >>> returns undef. >>> >>> It looks like swiss.pm is trying to parse the version out >> of the SV >>> line, which >>> apparently doesn't exist any more? The sequence version(s) >> are now >>> specified as >>> part of the Date (DT) lines. >>> >>> Is this not a bug? Is swiss.pm not designed to parse uniprot files? >>> >>> Thanks for any help ... >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From lstein at cshl.edu Fri May 26 10:38:29 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 26 May 2006 10:38:29 -0400 Subject: [Bioperl-l] URGENT: Bio::Graphics::Panel make the ruler have negative (-) position numbering imagemap making In-Reply-To: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com> References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com> Message-ID: <200605261038.30380.lstein@cshl.edu> Hi, For some reason I didn't see the first posting on this. 
In current bioperl-live, the ruler can have negative numberings - I use this
routinely. You need to create a feature that starts in negative coordinates.
What is happening to you when you try this?

Lincoln

On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> Hi
> thanks for the help offered thus far!
> sigh I am trying to annotate TFBS on a -1000 to +50 bp promtoer seq using
> bioperl. therefore i was asked to make the numberings as such (-1000) is
> there any way at all to do this in bioperl without changing the .pm file?
>
> thanks guys..
> kevin

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From jelenaob at gmail.com Fri May 26 12:47:05 2006
From: jelenaob at gmail.com (Jelena Obradovic)
Date: Fri, 26 May 2006 09:47:05 -0700
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
Message-ID: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com>

Hi there,

I have tried loading an enzyme list from the REBASE file bairoch.605 using
Bio::Restriction::IO.

1. For some reason the number of enzymes in the list is always 532, which is
the default set of enzymes in the enzyme collection.

Is there any known issue with this module or a workaround?

And here is the code I have been using:

my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",-format=>"Bairoch")
    || die "can't load the file bairoch.605: $!";
my $enzymes = $re_in->read;
print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n";

2. The other problem is that when I use the lower-case format name it throws
an exception, but when the "B" is capitalized it is OK. I assume it cannot
load the file and does not initialize the enzyme collection properly.
Can't call method "each_enzyme" on an undefined value at
.../cgi-bin/seq-load.pl line 51.

Any thoughts?

Thanks in advance,

Jelena Obradovic
jelenaob at gmail.com

From cjfields at uiuc.edu Fri May 26 15:27:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 26 May 2006 14:27:13 -0500
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com>
Message-ID: <002601c680fa$644635a0$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jelena Obradovic
> Sent: Friday, May 26, 2006 11:47 AM
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
>
> Hi there,
>
> I have tried loading enzyme list from a file REBASE bairoch.605 using
> Bio::Restriction::IO;
>
> 1. But for some reason the number of enzymes in the list is always 532
> which is a default set of enzymes in enzyme collection.
>
> Is there any known issue with this module or a workaround?
>
> And here is the code I have been using:
>
> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",-format=>"Bairoch")
>     || die "can't load the file bairoch.605: $!";
> my $enzymes = $re_in->read;
> print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n";

my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",-format=>"Bairoch");

should be

my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",-format=>"bairoch");

Note the case change for the format; this is noted in the bug report you
submitted earlier. Bio::Restriction::IO works similarly to Bio::SeqIO (i.e.
requires a specific format, which I believe is case-sensitive).
Judging by the modules in the Bio/Restriction/IO directory, it looks like the
Bio::Restriction::IO format should match one of the following: bairoch,
itype2, withrefm. You can also build your own if needed, using those as
examples and implementing Bio::Restriction::IO::base.

> 2. The other problem is when trying to use format that is lower-case
> it throws an exception, but when "B" is capitalized it is ok.
> I assume it cannot load a file and does not initilize enzyme
> collection properly.
>
> Can't call method "each_enzyme" on an undefined value at
> .../cgi-bin/seq-load.pl line 51.

My guess? The reason it works with an uppercase ('Bairoch') is that it can't
find the module and uses the default set of enzymes as a fallback. The
exception that you reported when you use lowercase ('bairoch') is real and I
reported it as a bug (there are a few I found in that module). You might want
to try using one of the other formats if you can get the files in the right
format from REBASE. I'm looking into the bugs specifically associated with
Bio::Restriction::IO::bairoch.

> Any thoughts?
>
>
> Thanks in advance,
>
>
> Jelena Obradovic
> jelenaob at gmail.com

From osborne1 at optonline.net Fri May 26 15:43:18 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 26 May 2006 15:43:18 -0400
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <002601c680fa$644635a0$15327e82@pyrimidine>
Message-ID:

Chris,

SeqIO's format arguments are case-insensitive (e.g. 'fasta', 'Fasta', 'FASTA'
should work). This is what the documentation says and what the code seems to
suggest. This is probably what the Restriction modules should do as well.

Brian O.
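
A minimal sketch of the case-insensitive handling Brian describes, in plain
Perl with no BioPerl dependency. The helper name normalize_format and the
%KNOWN_FORMAT table are hypothetical; they are not part of
Bio::Restriction::IO or Bio::SeqIO. The three driver names are the ones Chris
lists from the Bio/Restriction/IO directory. The idea is to lower-case the
requested format once, then fail loudly on an unknown name rather than
quietly carrying on:

```perl
use strict;
use warnings;

# Driver names taken from Chris's listing of the Bio/Restriction/IO directory.
my %KNOWN_FORMAT = map { $_ => 1 } qw(bairoch itype2 withrefm);

# Hypothetical helper (an assumption for illustration, not BioPerl API):
# lower-case the requested format and die if no driver of that name is known.
sub normalize_format {
    my ($format) = @_;
    die "no format given\n" unless defined $format && length $format;
    my $lc = lc $format;
    die "unknown format '$format'\n" unless $KNOWN_FORMAT{$lc};
    return $lc;
}

# Every spelling of a known format maps to the same lower-case driver name.
for my $requested (qw(bairoch Bairoch BAIROCH withrefm)) {
    printf "%-10s -> %s\n", $requested, normalize_format($requested);
}
```

With a check of this kind, 'Bairoch' and 'bairoch' resolve to the same driver
name, and a misspelled format dies with a message instead of returning an
unexpected enzyme set.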
From cjfields at uiuc.edu Fri May 26 16:21:03 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 26 May 2006 15:21:03 -0500
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To:
Message-ID: <002701c68101$e9432540$15327e82@pyrimidine>

Okay, my bad. Having the format be case-insensitive makes sense and is
probably an easy fix, but there seem to be more serious issues with the
Bio::Restriction::IO modules at the moment. None have implemented write
methods, though the POD implies they work:

SYNOPSIS

    use Bio::Restriction::IO;

    $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
                                     -format => 'withrefm');
    $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
                                     -format => 'bairoch');
    my $res = $in->read; # a Bio::Restriction::EnzymeCollection
    $out->write($res);

and no tests exist for Bio::Restriction::IO::bairoch yet. In fact, the tests
are pretty confusing; when did we allow this syntax: '-format => 8'?

Anyway, I'm muddling my way through this and will probably write something up
for the project priority list if I can't work this bug out.

Chris

> -----Original Message-----
> From: Brian Osborne [mailto:osborne1 at optonline.net]
> Sent: Friday, May 26, 2006 2:43 PM
> To: Chris Fields; 'Jelena Obradovic'; Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::Restriction::IO and REBASE file
>
> Chris,
>
> SeqIO's arguments are case-insensitive (e.g. 'fasta', 'Fasta', 'FASTA'
> should work). This is what the documentation says and what the code seems
> to suggest. This is probably what the Restriction modules should do as well.
>
> Brian O.

From andreas.bender at complife.org Fri May 26 10:50:03 2006
From: andreas.bender at complife.org (Andreas Bender (CompLife'06))
Date: Fri, 26 May 2006 10:50:03 -0400
Subject: [Bioperl-l] Bioperl-based Applications for "Free Software" Session?
Message-ID:

Dear All,

Have any of you implemented some cool programs/tools using Bioperl?
Or is there someone from the Bioperl core team who wants to present Bioperl itself at our conference? We are holding a "free software" session (free at least as in free beer, ideally also open source, some GNU-type license) at our "Computational Life Sciences" Conference in Cambridge/UK later this year and you are warmly welcome to present your software there. Please contact me directly or visit the website in case of any questions. Enjoy the weekend, Andreas Call for Contributions ================================================== LIFE SCIENCE FREE SOFTWARE SESSION held at CompLife 2006 (http://www.complife.org) in Cambridge, United Kingdom, on September 27 - 29, 2006 ================================================== In the last years more and more free and open source software has been developed for chemo- and bioinformatics, molecular modelling or other Life Science applications, but many of the programs are not well known. During the CompLife 2006 conference we will organize a special session dedicated to this type of free software. The demo session will be preceeded by a short session having room for brief introductory presentations whereas the demo session itself will allow attendees to see the tools in action. Authors of free software will have the opportunity to present their program to the CompLife audience which will consist of researchers and users from computer science, biology, chemistry and everything in between. In case you are interested in the free software session, send us an email at fss at complife.org and briefly describe your program and how you intend to present it at the conference (1-2 pages max - please include URL to downloadable version where available). The only restrictions are that the program must be freely available for everyone or even open source and that it must be related to Life Science applications. The deadline for these proposals is June, 16th 2006. In mid July we will notify you if your software demo was accepted. 
************************ -- Computational Life Sciences '06 Cambridge/UK, 27-29 September 2006: Visit http://www.complife.org for more information! Andreas Kieron Patrick Bender - http://www.andreasbender.de Novartis Institutes for BioMedical Research, Cambridge/MA From cjfields at uiuc.edu Fri May 26 17:19:08 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 26 May 2006 16:19:08 -0500 Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file In-Reply-To: <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com> Message-ID: <002b01c6810a$06642400$15327e82@pyrimidine> The POD documentation is a bit misleading for Bio::Restriction::IO. Brian's right, there needs to be more flexibility with the case for the formats used. I found a few other odd things as well which I may file bug reports for. Looks like another post for the project priority list. Chris _____ From: Jelena Obradovic [mailto:jobradovic at gmail.com] Sent: Friday, May 26, 2006 3:56 PM To: Chris Fields Cc: Jelena Obradovic; Bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Bio::Restriction::IO and REBASE file Hi guys, I tried with the other formats, and it works fine with "withrefm" format but not with "withref". Thanks a lot for your reponse. Cheers, Jelena On 5/26/06, Chris Fields wrote: > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Jelena Obradovic > Sent: Friday, May 26, 2006 11:47 AM > To: Bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file > > Hi there, > > I have tried loading enzyme list from a file REBASE bairoch.605 using > Bio::Restriction::IO; > > 1. But for some reason the number of enzymes in the list is always 532 > which is a default set of enzymes in enzyme collection. > > Is there any known issue with this module or a workaround? 
> > And here is the code I have been using: > > my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- > format=>"Bairoch") > || die "can't load the file bairoch.605: $!"; > my $enzymes = $re_in->read; > print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n"; my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- format=>"Bairoch"); should be my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- format=>"bairoch"); Note the case change for the format; this is noted in the bug report you submitted earlier. Bio::Restriction::IO works similarly to Bio::SeqIO ( i.e. requires a specific format, which I believe is case-sensitive). Judging by the modules in Bio/Restriction/IO directory, looks like the Bio::Restriction::IO format should match one of the following formats: bairoch, itype2, withrefm, and you can also build your own if needed using the previous as examples and implementing Bio::Restriction::IO::base. > 2. The other problem is when trying to use format that is lower-case > it throws an exception, but when "B" is capitalized it is ok. > I assume it cannot load a file and does not initilize enzyme > collection properly. > > Can't call method "each_enzyme" on an undefined value at > .../cgi-bin/seq-load.pl line 51. My guess? The reason it works with an uppercase ('Bairoch') is that it can't find the module and uses the default set of enzymes as a fallback. The exception that you reported when you use lowercase ('bairoch') is real and I reported it as a bug (there are a few I found in that module). You might want to try using one of the other formats if you can get the files in the right format from REBASE. I'm looking into the bugs specifically associated with Bio::Restriction::IO::bairoch. > Any thoughts? 
> > > Thanks in advance, > > > Jelena Obradovic > jelenaob at gmail.com > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jelena Obradovic Email: jobradovic at gmail.com From jay at jays.net Sat May 27 12:47:27 2006 From: jay at jays.net (Jay Hannah) Date: Sat, 27 May 2006 11:47:27 -0500 Subject: [Bioperl-l] "Project OpenLab" (working title) Message-ID: <4478829F.5030508@jays.net> Hola -- We've been kicking around this idea for a few months now. I'm threatening to start coding. Once I do I might not sleep for a few weeks so I thought I'd solicit feedback now. :) "Project OpenLab": http://omaha.pm.org/kwiki/?BioPerl - Does any such project already exist? - If there's no other obvious choice already bent to BioPerl / BioPerl DB / BioSQL, I'll probably be writing the web framework in Perl's Template Toolkit. The server is Linux, Apache, mySQL (BioPerl DBs). - I'll be using BioPerl objects for the persistence layer as much as possible. Where not possible I'll ask this list about my patches/additions/ugly hackery. - I'll be discussing my back office tables like "users" that don't belong in bioperl-db; and my questions about new tables that might belong there on the BioSQL-l mailing list. - I'm not a computer language zealot (usually), so I'm open to out-of-the-box ideas from anyone. - I'm a biology newb with a long Perl/database/web/e-commerce background, so please feel free to point out any bio idiocy I engage in. Thanks for your time, j From fernan at iib.unsam.edu.ar Sat May 27 18:30:44 2006 From: fernan at iib.unsam.edu.ar (Fernan Aguero) Date: Sat, 27 May 2006 19:30:44 -0300 Subject: [Bioperl-l] "Project OpenLab" (working title) In-Reply-To: <4478829F.5030508@jays.net> References: <4478829F.5030508@jays.net> Message-ID: <20060527223044.GA40583@iib.unsam.edu.ar> +----[ Jay Hannah (27.May.2006 15:15): | | Hola -- Hola! 
| We've been kicking around this idea for a few months now. I'm threatening to start coding. Once I do I might not sleep for a few weeks so I thought I'd solicit feedback now. :) | | "Project OpenLab": | http://omaha.pm.org/kwiki/?BioPerl | | - Does any such project already exist? mmm ... maybe ... both GUS (Genomics Unified Schema: gusdb.org, though not developed around bioperl) and GMOD (Generic Model Organism Database: gmod.org) provide you with i) RDBMS storage ii) a Perl object layer iii) a web app framework Though certainly overkill for the needs you describe in the wiki, they can be customized to work in the way you describe or at least serve as a guide. | - If there's no other obvious choice already bent to BioPerl / BioPerl DB / BioSQL, I'll probably be writing the web framework in Perl's Template Toolkit. The server is Linux, Apache, mySQL (BioPerl DBs). Have you considered Perl Catalyst? It has the benefits of allowing you to work with bioperl modules naturally (it's Perl!) a choice of templating toolkits (Template Toolkit, Mason, among others) and will provide you with an almost ready to go controller/url dispatcher. | - I'll be using BioPerl objects for the persistence layer as much as possible. Where not possible I'll ask this list about my patches/additions/ugly hackery. | - I'll be discussing my back office tables like "users" that don't belong in bioperl-db; and my questions about new tables that might belong there on the BioSQL-l mailing list. | - I'm not a computer language zealot (usually), so I'm open to out-of-the-box ideas from anyone. | - I'm a biology newb with a long Perl/database/web/e-commerce background, so please feel free to point out any bio idiocy I engage in. 
| | Thanks for your time, | | j | +----] Good luck, Fernan From epsteinj at mail.nih.gov Fri May 26 14:46:32 2006 From: epsteinj at mail.nih.gov (Epstein, Jonathan A (NIH/NICHD) [E]) Date: Fri, 26 May 2006 14:46:32 -0400 Subject: [Bioperl-l] URGENT: Bio::Graphics::Panel make the ruler havenegative (-) position numbering imagemap making In-Reply-To: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com> References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com> Message-ID: <42504F69898FE546B3F0238C9BD032750915F8@NIHCESMLBX7.nih.gov> While this is being discussed and we have Lincoln's attention: in example 4 on the Biographics Howto: http://stein.cshl.org/genome_informatics/BioGraphics/Graphics-HOWTO.html how can one assign directional arrows to the graded segments which represent the BLAST hits? I.e., is there a glyph type which is both an 'arrow' and a 'graded_segment'? What other techniques do you recommend for associating directionality with these hits? Thanks & regards, Jonathan From jobradovic at gmail.com Fri May 26 16:55:35 2006 From: jobradovic at gmail.com (Jelena Obradovic) Date: Fri, 26 May 2006 13:55:35 -0700 Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file In-Reply-To: <002601c680fa$644635a0$15327e82@pyrimidine> References: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com> <002601c680fa$644635a0$15327e82@pyrimidine> Message-ID: <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com> Hi guys, I tried with the other formats, and it works fine with "withrefm" format but not with "withref". Thanks a lot for your response.
Cheers, Jelena On 5/26/06, Chris Fields wrote: > > > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Jelena Obradovic > > Sent: Friday, May 26, 2006 11:47 AM > > To: Bioperl-l at lists.open-bio.org > > Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file > > > > Hi there, > > > > I have tried loading enzyme list from a file REBASE bairoch.605 using > > Bio::Restriction::IO; > > > > 1. But for some reason the number of enzymes in the list is always 532 > > which is a default set of enzymes in enzyme collection. > > > > Is there any known issue with this module or a workaround? > > > > And here is the code I have been using: > > > > my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- > > format=>"Bairoch") > > || die "can't load the file bairoch.605: $!"; > > my $enzymes = $re_in->read; > > print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n"; > > my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- > format=>"Bairoch"); > > should be > > my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- > format=>"bairoch"); > > Note the case change for the format; this is noted in the bug report you > submitted earlier. Bio::Restriction::IO works similarly to Bio::SeqIO ( > i.e. > requires a specific format, which I believe is case-sensitive). Judging > by > the modules in Bio/Restriction/IO directory, looks like the > Bio::Restriction::IO format should match one of the following formats: > bairoch, itype2, withrefm, and you can also build your own if needed using > the previous as examples and implementing Bio::Restriction::IO::base. > > > 2. The other problem is when trying to use format that is lower-case > > it throws an exception, but when "B" is capitalized it is ok. > > I assume it cannot load a file and does not initilize enzyme > > collection properly. 
> > > > Can't call method "each_enzyme" on an undefined value at > > .../cgi-bin/seq-load.pl line 51. > > My guess? The reason it works with an uppercase ('Bairoch') is that it > can't find the module and uses the default set of enzymes as a fallback. > The exception that you reported when you use lowercase ('bairoch') is real > and I reported it as a bug (there are a few I found in that module). > > You might want to try using one of the other formats if you can get the > files in the right format from REBASE. I'm looking into the bugs > specifically associated with Bio::Restriction::IO::bairoch. > > > Any thoughts? > > > > > > Thanks in advance, > > > > > > Jelena Obradovic > > jelenaob at gmail.com > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- Jelena Obradovic Email: jobradovic at gmail.com From gad14 at cornell.edu Fri May 26 16:02:33 2006 From: gad14 at cornell.edu (Genevieve DeClerck) Date: Fri, 26 May 2006 16:02:33 -0400 Subject: [Bioperl-l] results problem with StandAloneBlast Message-ID: <44775ED9.4020208@cornell.edu> Hi, I'm running local blast with Bio::Tools::Run::StandAloneBlast. Everything seems to work ok up to the point of accessing the results. I am able to print the results but when I try to do more than one thing with the result, nothing is returned for the second activity.. I'd like to first sort the results into groups of results that hit the db seq once, twice, three times, etc - where the results are stored as SeqFeature objects in temporary arrays whose contents are printed sequentially to stdout when the whole sort is complete. Secondly, I need to print the results in Hit Table (i.e. -m 8) format to stdout. If I've sorted the results the sorted-results will print to screen, however when I try to print the Hit Table results nothing is returned, as if the blast results have evaporated.... 
and visa versa, if i comment out the part where i point my sorting subroutine to the blast results reference, my hit table results suddenly prints to screen. It's almost like the reference to the SearchIO obj that holds the StandAloneBlast results is lost after one use?? (I'm beginning to think there is something naive about the way I'm using references?..) Here's an abbreviated version of my code: my $ref_seq_objs; # ref to array of Sequence obj's my $genome_seq; # fasta containing 1 genomic sequence my @params = ('program' => 'blastn', 'database' => $genome_seq, ); my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); my $blast_report = $factory->blastall($ref_seq_objs); #OK ####### ### the following 2 actions seem to be mutually exclusive. # 1) sort results into 1-hitter, 2-hitter, etc. groups of # SeqFeature objs stored in arrays. arrays are then printed # to stdout &sort_results($blast_report); # 2) print blast results &print_blast_results($blast_report); ####### sub print_blast_results{ my $report = shift; while(my $result = $report->next_result()){ while(my $hit = $result->next_hit()){ while(my $hsp = $hit->next_hsp()){ my $q_name = $hsp_q_seq_obj->display_id; print join(", ",$q_name,$hit->name,$hsp->bits)."\n"; } } } } I'm about to lose my mind on this... any assistance appreciated! Thanks, Genevieve From rvosa at sfu.ca Sun May 28 03:43:23 2006 From: rvosa at sfu.ca (Rutger Vos) Date: Sun, 28 May 2006 00:43:23 -0700 Subject: [Bioperl-l] "Project OpenLab" (working title) In-Reply-To: <4478829F.5030508@jays.net> References: <4478829F.5030508@jays.net> Message-ID: <4479549B.5030202@sfu.ca> The TreeBaseII team (part of the cipres project: http://www.phylo.org) are working on a lab database system for storage of intermediate calculation results and data (sequence alignments, trees, taxon sets). I think what you're discussing is a bit more molecular and less phylogenetic, but it does sound similar in spirit. 
Rutger Jay Hannah wrote: > Hola -- > > We've been kicking around this idea for a few months now. I'm threatening to start coding. Once I do I might not sleep for a few weeks so I thought I'd solicit feedback now. :) > > "Project OpenLab": > http://omaha.pm.org/kwiki/?BioPerl > > - Does any such project already exist? > - If there's no other obvious choice already bent to BioPerl / BioPerl DB / BioSQL, I'll probably be writing the web framework in Perl's Template Toolkit. The server is Linux, Apache, mySQL (BioPerl DBs). > - I'll be using BioPerl objects for the persistence layer as much as possible. Where not possible I'll ask this list about my patches/additions/ugly hackery. > - I'll be discussing my back office tables like "users" that don't belong in bioperl-db; and my questions about new tables that might belong there on the BioSQL-l mailing list. > - I'm not a computer language zealot (usually), so I'm open to out-of-the-box ideas from anyone. > - I'm a biology newb with a long Perl/database/web/e-commerce background, so please feel free to point out any bio idiocy I engage in. > > Thanks for your time, > > j > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- ++++++++++++++++++++++++++++++++++++++++++++++++++++ Rutger Vos, PhD. 
candidate Department of Biological Sciences Simon Fraser University 8888 University Drive Burnaby, BC, V5A1S6 Phone: 604-291-5625 Fax: 604-291-3496 Personal site: http://www.sfu.ca/~rvosa FAB* lab: http://www.sfu.ca/~fabstar Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ ++++++++++++++++++++++++++++++++++++++++++++++++++++ From cjfields at uiuc.edu Sun May 28 09:55:47 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 28 May 2006 08:55:47 -0500 Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file In-Reply-To: <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com> References: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com> <002601c680fa$644635a0$15327e82@pyrimidine> <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com> Message-ID: Again, it's b/c 'withrefm' is a valid Restriction::IO module and 'withref' is not. Similar to the case issue you saw before with 'bairoch.' Making this more lenient would help but there are more serious issues with these modules that need to be addressed... http://www.bioperl.org/wiki/Project_priority_list#Restriction_Enzymes Chris On May 26, 2006, at 3:55 PM, Jelena Obradovic wrote: > Hi guys, I tried with the other formats, and it works fine with > "withrefm" > format but not with "withref". > > Thanks a lot for your reponse. > > Cheers, > > Jelena > > On 5/26/06, Chris Fields wrote: >> >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Jelena Obradovic >>> Sent: Friday, May 26, 2006 11:47 AM >>> To: Bioperl-l at lists.open-bio.org >>> Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file >>> >>> Hi there, >>> >>> I have tried loading enzyme list from a file REBASE bairoch.605 >>> using >>> Bio::Restriction::IO; >>> >>> 1. But for some reason the number of enzymes in the list is >>> always 532 >>> which is a default set of enzymes in enzyme collection. 
>>> >>> Is there any known issue with this module or a workaround? >>> >>> And here is the code I have been using: >>> >>> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- >>> format=>"Bairoch") >>> || die "can't load the file bairoch.605: $!"; >>> my $enzymes = $re_in->read; >>> print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n"; >> >> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- >> format=>"Bairoch"); >> >> should be >> >> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- >> format=>"bairoch"); >> >> Note the case change for the format; this is noted in the bug >> report you >> submitted earlier. Bio::Restriction::IO works similarly to >> Bio::SeqIO ( >> i.e. >> requires a specific format, which I believe is case-sensitive). >> Judging >> by >> the modules in Bio/Restriction/IO directory, looks like the >> Bio::Restriction::IO format should match one of the following >> formats: >> bairoch, itype2, withrefm, and you can also build your own if >> needed using >> the previous as examples and implementing Bio::Restriction::IO::base. >> >>> 2. The other problem is when trying to use format that is lower-case >>> it throws an exception, but when "B" is capitalized it is ok. >>> I assume it cannot load a file and does not initilize enzyme >>> collection properly. >>> >>> Can't call method "each_enzyme" on an undefined value at >>> .../cgi-bin/seq-load.pl line 51. >> >> My guess? The reason it works with an uppercase ('Bairoch') is >> that it >> can't find the module and uses the default set of enzymes as a >> fallback. >> The exception that you reported when you use lowercase ('bairoch') >> is real >> and I reported it as a bug (there are a few I found in that module). >> >> You might want to try using one of the other formats if you can >> get the >> files in the right format from REBASE. I'm looking into the bugs >> specifically associated with Bio::Restriction::IO::bairoch. >> >>> Any thoughts? 
>>> >>> >>> Thanks in advance, >>> >>> >>> Jelena Obradovic >>> jelenaob at gmail.com >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > -- > Jelena Obradovic > Email: jobradovic at gmail.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From osborne1 at optonline.net Sun May 28 11:03:37 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Sun, 28 May 2006 11:03:37 -0400 Subject: [Bioperl-l] results problem with StandAloneBlast In-Reply-To: <44775ED9.4020208@cornell.edu> Message-ID: Genevieve, Does this simplified code, without the &sort_results($blast_report) line, work? By the way, no one can really help you here because you haven't shown us all of the code. The code you are showing certainly looks OK. Brian O. On 5/26/06 4:02 PM, "Genevieve DeClerck" wrote: > &sort_results($blast_report); From simon.rayner.mlist at gmail.com Mon May 29 03:37:24 2006 From: simon.rayner.mlist at gmail.com (mailing lists) Date: Mon, 29 May 2006 15:37:24 +0800 Subject: [Bioperl-l] installation problems with bioperl-ext on x86_64 running SuSE linux Message-ID: Hello, i'm having a problem trying to install the bioperl-ext package on my system. 
biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align # perl Makefile.PL Writing Makefile for Bio::Ext::Align biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align # make cc -c -I./libs -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -fPIC -O2 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -g -Wall -pipe -DVERSION=\"0.1\" -DXS_VERSION= \"0.1\" -fPIC "-I/usr/lib/perl5/5.8.7/x86_64-linux-thread-multi/CORE" -DPOSIX -DNOERROR Align.c In file included from Align.xs:12: ./libs/sw.h:1360:1: warning: "/*" within comment . . . Running Mkbootstrap for Bio::Ext::Align () chmod 644 Align.bs rm -f blib/arch/auto/Bio/Ext/Align/Align.so LD_RUN_PATH="" cc -shared -L/usr/local/lib64 Align.o -o blib/arch/auto/Bio/Ext/Align/Align.so libs/libsw.a -lm /usr/lib64/gcc/x86_64-suse-linux/4.0.2/../../../../x86_64-suse-linux/bin/ld: libs/libsw.a(aln.o): relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC libs/libsw.a: could not read symbols: Bad value collect2: ld returned 1 exit status make: *** [blib/arch/auto/Bio/Ext/Align/Align.so] Error 1 biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align # biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align # biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align # the -fPIC flag is already set in the makefile. I found a similar problem in an earlier posting with the following suggestions.... From: Aaron J. Mackey pcbi.upenn.edu> Subject: Re: compiling bioperl-ext Newsgroups: gmane.comp.lang.perl.bio.general Date: 2004-06-09 20:46:05 GMT (1 year, 50 weeks, 3 days, 3 hours and 50 minutes ago) 1) Are you starting with a clean build directory? 2) Does installing other compiled Perl modules work for you (e.g. Data::Dumper or Storable)? That's a pretty arcane error, and if the answer to #2 is "no", then I don't think we can help you. -Aaron ....In my case, both 1) and 2) are true. I installed Data::Dumper without any problems. 
I've found plenty of similar incidents with other software and it seems to relate to 32/64bit issues. Does anyone have any suggestions about how to get around this? thanks Simon Rayner From ULNJUJERYDIX at spammotel.com Mon May 29 05:46:21 2006 From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau) Date: Mon, 29 May 2006 17:46:21 +0800 Subject: [Bioperl-l] **Fwd: Re: URGENT: Bio::Graphics::Panel make the ruler have In-Reply-To: <200605261038.30380.lstein@cshl.edu> References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com> <200605261038.30380.lstein@cshl.edu> Message-ID: <5b6410e0605290246p8875c78n286caa672a55b4de@mail.gmail.com> Hi! oh it was in a slightly different header asking about the create image map feature. I am using the stable version 1.4 of bioperl now. In any case I have not added the sequence as a feature annotated seq. as I already have the bp where the TF binds (in 1-1050 numberings) so what I did was to just add graded segments based on the position. I saw that there is a scale function for the arrow glyph; however, it is a multiply function. Can it be hacked to take in an offset value (i.e. subtract 1000 from the scale)? cheers kevin Hi, > > For some reason I didn't see the first posting on this. In current bioperl > live, the ruler can have negative numberings - I use this routinely. You > need > to create a feature that starts in negative coordinates. What is happening > to > you when you try this? > > Lincoln > > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote: > > Hi > > thanks for the help offered thus far! > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promoter seq > using > > bioperl. therefore i was asked to make the numberings as such (-1000) is > > there any way at all to do this in bioperl without changing the .pm > file? > > > > thanks guys..
> > kevin > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From shameer at ncbs.res.in Mon May 29 06:07:17 2006 From: shameer at ncbs.res.in (Shameer Khadar) Date: Mon, 29 May 2006 15:37:17 +0530 (IST) Subject: [Bioperl-l] Reg. Integrated Server / CGI to pass PDB to multiple Servers Message-ID: <49187.192.168.1.1.1148897237.squirrel@192.168.1.1> Dear All, My query may not be directly related to BioPERL, But am sure I will get some idea to move on. Some possibilities wil be available from Pise or related modules Query : --------- We have several public servers(say a,b,c). All of them will take a pdb-file as an input and process it and displays it. Now, I need to create a web page(a meta-server/integrated web-server) with three radio buttons(a,b,c) and a single input form(to accept pdb file from the users ...:( - File passing as an argument seems to be some what impossible to me). I need output as 3 links in next page. Is there any Bio-PERL module / CGI / Perl tricks to do it ? Thanks in advance, -- Shameer Khadar Prof. R. 
Sowdhamini's Lab (# 25) The Computational Biology Group National Centre for Biological Sciences (TIFR) UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India T - 91-080-23636420-32 EXT 4241 F - 91-080-23636662/23636675 W - http://caps.ncbs.res.in -------------------------------------------------- "Refrain from illusions, insist on work and not words, patiently seek divine and scientific truth." From torsten.seemann at infotech.monash.edu.au Tue May 30 02:41:31 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 30 May 2006 16:41:31 +1000 Subject: [Bioperl-l] results problem with StandAloneBlast In-Reply-To: <44775ED9.4020208@cornell.edu> References: <44775ED9.4020208@cornell.edu> Message-ID: <447BE91B.30001@infotech.monash.edu.au> > my $ref_seq_objs; # ref to array of Sequence obj's > my $genome_seq; # fasta containing 1 genomic sequence > my @params = ('program' => 'blastn', > 'database' => $genome_seq, > ); The database parameter needs to be the same thing you would pass to the "-d" option in "blastall". I don't think you can pass a perl string here. ie. there needs to be a properly formatted set of blast indices for your genome sequence on the disk in the appropriate place. See ftp://ftp.ncbi.nlm.nih.gov/blast/documents/blast.html > my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); > my $blast_report = $factory->blastall($ref_seq_objs); #OK But I could be wrong, and $blast_report here contains a valid BLAST report. 
-- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From sb at mrc-dunn.cam.ac.uk Tue May 30 03:59:28 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Tue, 30 May 2006 08:59:28 +0100 Subject: [Bioperl-l] results problem with StandAloneBlast In-Reply-To: <44775ED9.4020208@cornell.edu> References: <44775ED9.4020208@cornell.edu> Message-ID: <447BFB60.4000006@mrc-dunn.cam.ac.uk> Genevieve DeClerck wrote: > Hi, [snip] > If I've sorted the results the sorted-results will print to screen, > however when I try to print the Hit Table results nothing is returned, > as if the blast results have evaporated.... and visa versa, if i comment > out the part where i point my sorting subroutine to the blast results > reference, my hit table results suddenly prints to screen. [snip] > Here's an abbreviated version of my code: [snip] > ####### > ### the following 2 actions seem to be mutually exclusive. > # 1) sort results into 1-hitter, 2-hitter, etc. groups of > # SeqFeature objs stored in arrays. arrays are then printed > # to stdout > &sort_results($blast_report); > > # 2) print blast results > &print_blast_results($blast_report); > sub print_blast_results{ > my $report = shift; > while(my $result = $report->next_result()){ [snip] You didn't give us your sort_results subroutine, but is it as simple as they both use $report->next_result (and/or $result->next_hit), but you don't reset the internal counter back to the start, so the second subroutine tries to get the next_result and finds the first subroutine has already looked at the last result and so next_result returns false? From a quick look it wasn't obvious how to reset the counter. Hopefully this can be done and someone else knows how. 
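If the report object really is a one-pass iterator, as guessed above, one workaround that does not depend on any reset method is to drain it once into an array and hand that array to both subroutines. Here is a minimal sketch in plain Perl, under stated assumptions: OnePassReport is a made-up stand-in and NOT a BioPerl class, and collect_results, sort_results and print_results are hypothetical names used only for illustration. Whether the real SearchIO object behaves this way has not been confirmed in this thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a one-pass report object; NOT a BioPerl class.
package OnePassReport;

sub new {
    my ($class, @results) = @_;
    return bless { queue => [@results] }, $class;
}

# Hands out each stored result exactly once, then returns undef.
sub next_result {
    my $self = shift;
    return shift @{ $self->{queue} };
}

package main;

# Drain the iterator once and keep everything in an ordinary array.
sub collect_results {
    my $report = shift;
    my @results;
    while (defined(my $result = $report->next_result)) {
        push @results, $result;
    }
    return @results;
}

# Two independent consumers that work on the array, not on the iterator.
sub sort_results  { return sort { $b cmp $a } @_ }
sub print_results { print "$_\n" for @_ }

my $report  = OnePassReport->new('query1', 'query2', 'query3');
my @results = collect_results($report);

my @sorted = sort_results(@results);   # first use
print_results(@results);               # second use still sees all results

# The iterator itself is now exhausted, which mirrors the reported symptom.
my @second_pass = collect_results($report);
print "second pass over the report saw ", scalar(@second_pass), " results\n";
```

In the real script the same pattern would mean looping over next_result once, pushing each result object onto an array, and passing that array to both the sorting and the printing subroutines. Hit and HSP iterators inside each result may need the same treatment.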
From torsten.seemann at infotech.monash.edu.au Tue May 30 04:18:45 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 30 May 2006 18:18:45 +1000 Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef" Message-ID: <447BFFE5.8010508@infotech.monash.edu.au> FYI Bioperl developers: I just audited the bioperl-live CVS and found about 450 occurrences of "return undef". Page 199 of "Perl Best Practices" by Damian Conway, and this URL http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html suggest: "Use return; instead of return undef; if you want to return nothing. If someone assigns the return value to an array, the latter creates an array of one value (undef), which evaluates to true. The former will correctly handle all contexts." So I'm guessing at least some of these 450 occurrences *could* result in bugs and should probably be changed. Your opinion may differ :-) -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From cjfields at uiuc.edu Tue May 30 10:07:45 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 30 May 2006 09:07:45 -0500 Subject: [Bioperl-l] For CVS developers - potential pitfall with "returnundef" In-Reply-To: <447BFFE5.8010508@infotech.monash.edu.au> Message-ID: <000c01c683f2$6ca62570$15327e82@pyrimidine> Torsten, Any way you can post a list of some/all of the offending lines or modules? Sounds like something to consider, but if the list is as large as you say we may need something (bugzilla? wiki?) to track the changes and make sure they pass tests; I'm sure a large majority will. I'm guessing Jason would want this somewhere on the project priority list or bugzilla, with a link to the actual list, but I'm not sure. Maybe start a page on the wiki for proposed code changes?
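For anyone triaging that list, the list-context pitfall described in the quoted advice can be reproduced in a few lines of plain Perl. This is only an illustrative sketch: lookup_with_undef and lookup_with_bare are made-up names, not subroutines from bioperl-live.

```perl
use strict;
use warnings;

# Hypothetical subs for illustration only; not taken from bioperl-live.
sub lookup_with_undef { return undef; }   # explicit undef
sub lookup_with_bare  { return; }         # bare return

# List context: "return undef" yields a one-element list (undef),
# while a bare "return" yields an empty list.
my @from_undef = lookup_with_undef();
my @from_bare  = lookup_with_bare();

print "return undef gives ", scalar(@from_undef), " element(s); array is ",
      (@from_undef ? "true" : "false"), "\n";
print "bare return gives ", scalar(@from_bare), " element(s); array is ",
      (@from_bare ? "true" : "false"), "\n";

# Scalar context: both forms hand back undef, so scalar callers are unaffected.
my $scalar_from_undef = lookup_with_undef();
my $scalar_from_bare  = lookup_with_bare();
print "scalar context: both undefined\n"
    if !defined $scalar_from_undef && !defined $scalar_from_bare;
```

So only callers that assign the return value to an array or list, or test it in list context, would behave differently after a change. That may help narrow down which occurrences need a closer look before the tests are rerun.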
Chris From lstein at cshl.edu Tue May 30 10:47:48 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 30 May 2006 10:47:48 -0400 Subject: [Bioperl-l] **Fwd: Re: URGENT: Bio::Graphics::Panel make the ruler have In-Reply-To: <5b6410e0605290246p8875c78n286caa672a55b4de@mail.gmail.com> References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com> <200605261038.30380.lstein@cshl.edu> <5b6410e0605290246p8875c78n286caa672a55b4de@mail.gmail.com> Message-ID: <200605301047.49127.lstein@cshl.edu> Hi Kevin, I'm afraid that there is no offset value. You'll need the 1.51 version of bioperl to handle negative numbers properly. I understand your reluctance to upgrade just to get the Bio::Graphics functionality.
You might consider checking out just the Bio/Graphics subtree and installing that. It should work on top of 1.4. Lincoln On Monday 29 May 2006 05:46, Kevin Lam Koiyau wrote: > Hi! > oh it was in a slightly different header asking about the create image map > feature. > I am using the stable version 1.4 of bioperl now. In any case I have not > added the sequence as a feature annotated seq. as I already have the bp > where the TF binds (in 1-1050 numberings) so what I did was to just add > graded segments based on the position. > I saw that there is a scale function for the arrow glyph; however, it is a > multiply function. Can it be hacked to take in an offset value (i.e. minus the > scale by 1000)? > > cheers > kevin > > > Hi, > > > For some reason I didn't see the first posting on this. In current > > bioperl live, the ruler can have negative numberings - I use this > > routinely. You need to create a feature that starts in negative coordinates. > > What is happening to you when you try this? > > > > Lincoln > > > > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote: > > > Hi > > > thanks for the help offered thus far! > > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promoter seq using > > > bioperl. therefore i was asked to make the numberings as such (-1000) > > > is there any way at all to do this in bioperl without changing the .pm file? > > > thanks guys.. > > > kevin > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > Lincoln D.
Stein > > Cold Spring Harbor Laboratory > > 1 Bungtown Road > > Cold Spring Harbor, NY 11724 > > (516) 367-8380 (voice) > > (516) 367-8389 (fax) > > FOR URGENT MESSAGES & SCHEDULING, > > PLEASE CONTACT MY ASSISTANT, > > SANDRA MICHELSEN, AT michelse at cshl.edu > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Tue May 30 10:50:06 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 30 May 2006 09:50:06 -0500 Subject: [Bioperl-l] Bio::Restriction::IO issues Message-ID: <000f01c683f8$5771ed50$15327e82@pyrimidine> Jason, Brian, et al, I found several major issues with Bio::Restriction::IO (this popped up while bug squashing). In particular, the POD is pretty misleading. It states (directly from perldoc): SYNOPSIS use Bio::Restriction::IO; $in = Bio::Restriction::IO->new(-file => "inputfilename" , -format => 'withrefm'); $out = Bio::Restriction::IO->new(-file => ">outputfilename" , -format => 'bairoch'); my $res = $in->read; # a Bio::Restriction::EnzymeCollection $out->write($res); # or # use Bio::Restriction::IO; # # #input file format can be read from the file extension (dat|xml) # $in = Bio::Restriction::IO->newFh(-file => "inputfilename"); # $out = Bio::Restriction::IO->newFh('-format' => 'xml'); # # # World's shortest flat<->xml format converter: # print $out $_ while <$in>; So, I have found several problems with these modules. 
I really hate to criticize code here, as my own is pretty hacky, but I think these are things to seriously mull over: 1) Note that, though some of the lines above are commented out, they are still there in POD and thus present in perldoc/pod2html etc. So, judging from the above, the synopsis suggests that the script should read in from one format and write out to another (like SeqIO). However, NONE of the current write() methods are implemented for any of the IO modules (withrefm, base, itype2, bairoch), so this does not happen as expected. You get the nasty thrown 'method not implemented' error instead when writing. 2) The commented statements in POD above also suggest that REBASE XML format is supported when there is no XML module. 3) The Bio::Restriction::IO::bairoch module had multiple bugs which made it unusable until I added a few small changes; it still can't handle multisite/multicut enzymes properly, so in essence it is useless until that is addressed. 4) Bio::Restriction::IO inherits from Bio::SeqIO, though I'm not sure why. Shouldn't it just inherit from Bio::Root::Root/Bio::Root::IO and make up its own methods? I'm working on at least getting the 'bairoch' input format up and running (so at least it gets the enzymes into a Bio::Restriction::EnzymeCollection). From this point I'm not sure where to proceed. The POD obviously needs to be corrected to reflect that writing formats is not implemented (and the bit about XML should be taken out completely); that's the easy part, which I am working on and plan on committing today. However, these modules don't seem to be used too frequently so I'm not sure whether it's worth spending too much time getting these up to speed at the moment (adding write methods, switching to Bio::Root::Root, etc); I have other priorities at the moment (including a way overdue ListSummary).
I'm also not sure who else is (using|working) on these so I don't want to (make too many changes|step on someone else's toes), but these are, IMHO, pretty serious problems. Any thoughts? Chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Tue May 30 12:34:18 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 30 May 2006 11:34:18 -0500 Subject: [Bioperl-l] Bio::Restriction::IO changes Message-ID: <001401c68406$e71e9850$15327e82@pyrimidine> Jason, Brian, et al: I have made changes to the Bio::Restriction::IO POD to remove any reference to write functions since almost none have been implemented yet, so including this into POD is a bit misleading. At the moment, you can't write to any REBASE format except for 'base', which I found is the only one that works. And, upon further checking, even that one has issues: it looks like there are problems with multicut/multisite enzymes when writing in 'base' format which I'm not delving into ('TaqII' only displays one site when writing when it has two cut sites). I'll add this to the wiki and a bug report (enhancement) for this module. I am also removing mention of XML and 'bairoch' formats (the former isn't present and the latter is broken at the moment) and added a few things to the POD TO DO section. Rob (if you're out there somewhere in the ether), have you made any more changes to these modules that need to be committed? Didn't know if any of these issues have already been addressed/changed etc. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry University of Illinois Urbana-Champaign From jelenaob at gmail.com Tue May 30 00:58:35 2006 From: jelenaob at gmail.com (Jelena Obradovic) Date: Mon, 29 May 2006 21:58:35 -0700 Subject: [Bioperl-l] Bio::Graphic::Panel backgroud color Message-ID: <5042a62b0605292158g187f4855hd93f76e0086ac27d@mail.gmail.com> Hello everybody, does anybody know how to remove the background color of the Panel. Currently, I am not adding anything to it, so I can troubleshot the problem, and I have tried setting up all color attributes I could find to the panel, but no luck. Whatever I do, I get the BLUE border of the panel. Has anybody faced the same problem? Thanks in advance, Jelena And here is the code I am currently using: ----------------------------------------------------------------------------------------------------------- my $panel = Bio::Graphics::Panel->new(-length => $prim_seq->length() + 200, -width => 800, -pad_left => 10, -pad_right => 10, -key_color => 'white', -bgcolor => 'white', -gridcolor=>'black', -fgcolor => 'black', -grid => 0, ); my ($url,$map,$mapname) = $panel->image_and_map( #-root=>'$root_url' , -url => '/tmpimages'); #make clickable image print $cgi->img({-src=>$url,-usemap=>"#$mapname"}); print $map; ----------------------------------------------------------------------------------------------------------- From luciap at sas.upenn.edu Tue May 30 14:49:48 2006 From: luciap at sas.upenn.edu (Lucia Peixoto) Date: Tue, 30 May 2006 14:49:48 -0400 Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function Message-ID: <1149014988.447c93cc01761@128.91.55.38> Hi I am here again, I finally got to write the "collapse nodes" function and have a couple of questions. In order to collpase any node $node, I first have to get the parent which I can do as $parent=$node->ancestor and then the children as: @children=$node->get_all_Descendents (or should I use each descendent?) Then before deleting $node I have to assign all its children to $parent, and here is where I am kind of confussed. Can I use the add_Descendent function for this? I've been tryig to write something like this: foreach $child (@children){ $parent=add_Descendent->$child; } but this doesn't work and I think it is because I don't have any idea of what I am doing any suggestions?
thanks Lucia Peixoto Department of Biology,SAS University of Pennsylvania From rvosa at sfu.ca Tue May 30 14:52:52 2006 From: rvosa at sfu.ca (Rutger Vos) Date: Tue, 30 May 2006 11:52:52 -0700 Subject: [Bioperl-l] For CVS developers - potential pitfall with "returnundef" In-Reply-To: <000c01c683f2$6ca62570$15327e82@pyrimidine> References: <000c01c683f2$6ca62570$15327e82@pyrimidine> Message-ID: <447C9484.9030102@sfu.ca> Although I agree with the sentiment of following PBP, I'm not so sure changing 'return undef' to 'return' *now* will fix any bugs without introducing new, subtle ones. -- ++++++++++++++++++++++++++++++++++++++++++++++++++++ Rutger Vos, PhD. candidate Department of Biological Sciences Simon Fraser University 8888 University Drive Burnaby, BC, V5A1S6 Phone: 604-291-5625 Fax: 604-291-3496 Personal site: http://www.sfu.ca/~rvosa FAB* lab: http://www.sfu.ca/~fabstar Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ ++++++++++++++++++++++++++++++++++++++++++++++++++++ From luciap at sas.upenn.edu Tue May 30 16:11:52 2006 From: luciap at sas.upenn.edu (Lucia Peixoto) Date: Tue, 30 May 2006 16:11:52 -0400 Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function In-Reply-To: References: Message-ID: <1149019912.447ca7085124e@128.91.55.38> Hi OK that was silly, but what I have in my code is what you just wrote But the problem is that if I write $parent->add_Descendent($child) it tells me that I am calling the method "ass_Descendent" on an undefined value (but I did define $parent before??) So here it goes the code so far: use Bio::TreeIO; my $in = new Bio::TreeIO(-file => 'Test2.tre', -format => 'newick'); my $out = new Bio::TreeIO(-file => '>mytree.out', -format => 'newick'); while( my $tree = $in->next_tree ) { foreach my $node ( grep { !
$_->is_Leaf() } $tree->get_nodes() ) { my $bootstrap=$node->_creation_id; if ($bootstrap < 70 ){ my $parent = $node->ancestor; my @children=$node->get_all_Descendents; foreach my $child (@children){ $parent->add_Descendent($child); } ........ eventually I'll add (once I assigned the children to the parent succesfully): $tree->remove_Node($node); } } $out->write_tree($tree); } Quoting aaron.j.mackey at gsk.com: > > foreach $child (@children){ > > $parent=add_Descendent->$child; > > } > > I think what you want is $parent->add_Descendent($child) > > -Aaron > Lucia Peixoto Department of Biology,SAS University of Pennsylvania From jason.stajich at duke.edu Tue May 30 16:30:56 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue, 30 May 2006 16:30:56 -0400 Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function In-Reply-To: <1149019912.447ca7085124e@128.91.55.38> References: <1149019912.447ca7085124e@128.91.55.38> Message-ID: <6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu> you need to special case the root - it won't have an ancestor. just protect the my $parent = $node->ancestor with an if statement as I did below On May 30, 2006, at 4:11 PM, Lucia Peixoto wrote: > Hi > OK that was silly, but what I have in my code is what you just wrote > But the problem is that if I write > > $parent->add_Descendent($child) > > it tells me that I am calling the method "ass_Descendent" on an > undefined value > (but I did define $parent before??) > > So here it goes the code so far: > > use Bio::TreeIO; > my $in = new Bio::TreeIO(-file => 'Test2.tre', > -format => 'newick'); > my $out = new Bio::TreeIO(-file => '>mytree.out', > -format => 'newick'); > while( my $tree = $in->next_tree ) { > foreach my $node ( grep { ! 
$_->is_Leaf() } $tree->get_nodes() ) { > my $bootstrap=$node->_creation_id; > > if ($bootstrap < 70 ){ > >>> if( my $parent = $node->ancestor ) { > my @children=$node->get_all_Descendents; > foreach my $child (@children){ > $parent->add_Descendent($child); > } } > > ........ > > eventually I'll add (once I assigned the children to the parent > succesfully): > $tree->remove_Node($node); > > } > } > $out->write_tree($tree); > } > > Quoting aaron.j.mackey at gsk.com: > >>> foreach $child (@children){ >>> $parent=add_Descendent->$child; >>> } >> >> I think what you want is $parent->add_Descendent($child) >> >> -Aaron >> > > > Lucia Peixoto > Department of Biology,SAS > University of Pennsylvania > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From cjfields at uiuc.edu Tue May 30 17:40:18 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 30 May 2006 16:40:18 -0500 Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef" In-Reply-To: <447C9484.9030102@sfu.ca> Message-ID: <001801c68431$a586b2d0$15327e82@pyrimidine> Agreed, though I think these changes should be implemented at some point (Conway's argument here makes sense and it is nice for Torsten to check this out). If proper tests are written then any changes resulting in errors should be picked up by checking the appropriate test suite, though I know it doesn't absolutely guarantee it. 
; P Chris From rvosa at sfu.ca Tue May 30 17:58:25 2006 From: rvosa at sfu.ca (Rutger Vos) Date: Tue, 30 May 2006 14:58:25 -0700 Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef" In-Reply-To: <001901c68433$026b1ad0$15327e82@pyrimidine> References: <001901c68433$026b1ad0$15327e82@pyrimidine> Message-ID: <447CC001.4050000@sfu.ca> I've been following the perl6 mailing lists for a while now. I think this time around it won't really take that long (one year?) for pugs/perl6 stacks to become more than just toys. I think especially large projects, like bioperl, will really benefit from the improved OO implementation in perl6, so it might be of interest to at least fantasize about it.
Chris Fields wrote: > Ha! Or may be the 'nonexistent' bioperl-experimental. Wonder what'll > happen once Perl6 comes to term? > > -CJF > > >> -----Original Message----- >> From: Rutger Vos [mailto:rvosa at sfu.ca] >> Sent: Tuesday, May 30, 2006 4:48 PM >> To: Chris Fields >> Subject: Re: [Bioperl-l] For CVS developers - potential >> pitfallwith"returnundef" >> >> Surely this will all sort itself out in bioperl6 ;-) -- ++++++++++++++++++++++++++++++++++++++++++++++++++++ Rutger Vos, PhD. candidate Department of Biological Sciences Simon Fraser University 8888 University Drive Burnaby, BC, V5A1S6 Phone: 604-291-5625 Fax: 604-291-3496 Personal site: http://www.sfu.ca/~rvosa FAB* lab: http://www.sfu.ca/~fabstar Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ ++++++++++++++++++++++++++++++++++++++++++++++++++++ From cjfields at uiuc.edu Tue May 30 18:08:26 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 30 May 2006 17:08:26 -0500 Subject: [Bioperl-l] For CVS developers - potentialpitfallwith"returnundef" In-Reply-To: <447CC001.4050000@sfu.ca> Message-ID: <001a01c68435$93135a50$15327e82@pyrimidine> Agreed. I would say, probably 6-12 months time, might be a good idea to try getting something actually started, maybe under the 'bioperl-experimental' title Jason has mentioned.
One could always try getting a Bio::Root-like object going in Pugs/Perl6 as a starter and work up from there, with emphasis on key areas (seq. parsing, so on). CJF > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Rutger Vos > Sent: Tuesday, May 30, 2006 4:58 PM > To: bioperl list > Subject: Re: [Bioperl-l] For CVS developers - > potentialpitfallwith"returnundef" > > I've been following the perl6 mailing lists for a while now. I think > this time around it won't really take that long (one year?) for > pugs/perl6 stacks to become more than just toys. I think especially > large projects, like bioperl, will really benefit from the improved OO > implementation in perl6, so it might be of interest to at least > fantasize about it. > > Chris Fields wrote: > > Ha! Or may be the 'nonexistent' bioperl-experimental. Wonder what'll > > happen once Perl6 comes to term? > > > > -CJF > > > > > >> -----Original Message----- > >> From: Rutger Vos [mailto:rvosa at sfu.ca] > >> Sent: Tuesday, May 30, 2006 4:48 PM > >> To: Chris Fields > >> Subject: Re: [Bioperl-l] For CVS developers - potential > >> pitfallwith"returnundef" > >> > >> Surely this will all sort itself out in bioperl6 ;-) > >> > >> Chris Fields wrote: > >> > >>> Agreed, though I think these changes should be implemented at some > point > >>> (Conway's argument here makes sense and it is nice for Torsten to > check > >>> > >> this > >> > >>> out). If proper tests are written then any changes resulting in > errors > >>> should be picked up by checking the appropriate test suite, though I > >>> > >> know it > >> > >>> doesn't absolutely guarantee it. 
; P > >>> > >>> Chris > >>> > >>> > >>> > >>>> -----Original Message----- > >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>>> bounces at lists.open-bio.org] On Behalf Of Rutger Vos > >>>> Sent: Tuesday, May 30, 2006 1:53 PM > >>>> To: bioperl-l at lists.open-bio.org > >>>> Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith > >>>> "returnundef" > >>>> > >>>> Although I agree with the sentiment of following PBP, I'm not so sure > >>>> changing 'return undef' to 'return' *now* will fix any bugs without > >>>> introducing new, subtle ones. > >>>> > >>>> Chris Fields wrote: > >>>> > >>>> > >>>>> Torsten, > >>>>> > >>>>> Any way you can post a list of some/all of the offending lines or > >>>>> > >>>>> > >>>> modules? > >>>> > >>>> > >>>>> Sounds like something to consider, but if the list is as large as > you > >>>>> > >>>>> > >>>> say we > >>>> > >>>> > >>>>> made need something (bugzilla? wiki?) to track the changes and make > >>>>> > >> sure > >> > >>>>> they pass tests; I'm sure a large majority will. > >>>>> > >>>>> I'm guessing Jason would want this somewhere on the project priority > >>>>> > >>>>> > >>>> list or > >>>> > >>>> > >>>>> bugzilla, with a link to the actual list, but I'm not sure. Maybe > >>>>> > >> start > >> > >>>> a > >>>> > >>>> > >>>>> page on the wiki for proposed code changes? > >>>>> > >>>>> Chris > >>>>> > >>>>> > >>>>> > >>>>> > >>>>>> -----Original Message----- > >>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>>>>> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann > >>>>>> Sent: Tuesday, May 30, 2006 3:19 AM > >>>>>> To: bioperl-l at lists.open-bio.org > >>>>>> Subject: [Bioperl-l] For CVS developers - potential pitfall with > >>>>>> "returnundef" > >>>>>> > >>>>>> FYI Bioperl developers: > >>>>>> > >>>>>> I just audited the bioperl-live CVS and found about 450 occurrences > >>>>>> > >> of > >> > >>>>>> "return undef". 
> >>>>>> > >>>>>> Page 199 of "Perl Best Practices" by Damian Conway, and this URL > >>>>>> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html > >>>>>> > >> suggest: > >> > >>>>>> "Use return; instead of return undef; if you want to return > nothing. > >>>>>> > >> If > >> > >>>>>> someone assigns the return value to an array, the latter creates an > >>>>>> array of one value (undef), which evaluates to true. The former > will > >>>>>> correctly handle all contexts." > >>>>>> > >>>>>> So I'm guessing at least some of these 450 occurrences *could* > result > >>>>>> > >>>>>> > >>>> in > >>>> > >>>> > >>>>>> bugs and should probably be changed. > >>>>>> > >>>>>> Your opinion may differ :-) > >>>>>> > >>>>>> -- > >>>>>> Dr Torsten Seemann http://www.vicbioinformatics.com > >>>>>> Victorian Bioinformatics Consortium, Monash University, Australia > >>>>>> > >>>>>> _______________________________________________ > >>>>>> Bioperl-l mailing list > >>>>>> Bioperl-l at lists.open-bio.org > >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>> > >>>>>> > >>>>>> > >>>>> _______________________________________________ > >>>>> Bioperl-l mailing list > >>>>> Bioperl-l at lists.open-bio.org > >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>> -- > >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++ > >>>> Rutger Vos, PhD. 
candidate > >>>> Department of Biological Sciences > >>>> Simon Fraser University > >>>> 8888 University Drive > >>>> Burnaby, BC, V5A1S6 > >>>> Phone: 604-291-5625 > >>>> Fax: 604-291-3496 > >>>> Personal site: http://www.sfu.ca/~rvosa > >>>> FAB* lab: http://www.sfu.ca/~fabstar > >>>> Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ > >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++ > >>>> > >>>> > >>>> _______________________________________________ > >>>> Bioperl-l mailing list > >>>> Bioperl-l at lists.open-bio.org > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>> > >>>> > >>> > >>> > >>> > >>> > >> -- > >> ++++++++++++++++++++++++++++++++++++++++++++++++++++ > >> Rutger Vos, PhD. candidate > >> Department of Biological Sciences > >> Simon Fraser University > >> 8888 University Drive > >> Burnaby, BC, V5A1S6 > >> Phone: 604-291-5625 > >> Fax: 604-291-3496 > >> Personal site: http://www.sfu.ca/~rvosa > >> FAB* lab: http://www.sfu.ca/~fabstar > >> Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ > >> ++++++++++++++++++++++++++++++++++++++++++++++++++++ > >> > > > > > > > > > > > > > > -- > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > Rutger Vos, PhD. 
candidate
> Department of Biological Sciences
> Simon Fraser University
> 8888 University Drive
> Burnaby, BC, V5A1S6
> Phone: 604-291-5625
> Fax: 604-291-3496
> Personal site: http://www.sfu.ca/~rvosa
> FAB* lab: http://www.sfu.ca/~fabstar
> Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From ULNJUJERYDIX at spammotel.com Tue May 30 23:45:12 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Wed, 31 May 2006 11:45:12 +0800
Subject: [Bioperl-l] SOLVED Bio::Graphics::Panel make ruler have neg values
Message-ID: <5b6410e0605302045x5c420674x6f898a8a2973991a@mail.gmail.com>

I am so sorry for the truncated email; I accidentally hit reply.

If anyone is interested, I have opted to change line 161 of arrow.pm
(Perl/site/lib/Bio/Graphics/Glyph/arrow.pm; on Linux it is
/usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Glyph/arrow.pm) from

    $gd->string($font,$middle,$center+$a2-1,$label,$font_color)

to

    $gd->string($font,$middle,$center+$a2-1,$label-1000,$font_color)

just for this one-off use.

Strangely, at line 112 of arrow.pm in ver 1.51 of bioperl I found what looks
like a hidden option for a coords offset:

    my $relative_coords_offset = $self->option('relative_coords_offset');
    $relative_coords_offset = 1 unless defined $relative_coords_offset;

but entering the option -relative_coords_offset=>1000 in the arrow glyphs
didn't do anything...

Hi!
> oh it was in a slightly different header asking about the create image map
> feature.
> I am using the stable version 1.4 of bioperl now. In any case I have not
> added the sequence as a feature annotated seq. as I already have the bp
> where the TF binds (in 1-1050 numberings) so what I did was to just add
> graded segments based on the position.
> I saw that there is a scale function for the arrow glyp however, it is a > multiply function, can it be hacked to take in a offset value (ie minus > the > scale by 1000?) > > cheers > kevin > > > Hi, > > > > For some reason I didn't see the first posting on this. In current > bioperl > > live, the ruler can have negative numberings - I use this routinely. You > > need > > to create a feature that starts in negative coordinates. What is > happening > > to > > you when you try this? > > > > Lincoln > > > > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote: > > > Hi > > > thanks for the help offered thus far! > > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promtoer seq > > using > > > bioperl. therefore i was asked to make the numberings as such (-1000) > is > > > there any way at all to do this in bioperl without changing the .pm > > file? > > > > > > thanks guys.. > > > kevin > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > Lincoln D. 
Stein
> > Cold Spring Harbor Laboratory
> > 1 Bungtown Road
> > Cold Spring Harbor, NY 11724
> > (516) 367-8380 (voice)
> > (516) 367-8389 (fax)
> > FOR URGENT MESSAGES & SCHEDULING,
> > PLEASE CONTACT MY ASSISTANT,
> > SANDRA MICHELSEN, AT michelse at cshl.edu
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From sb at mrc-dunn.cam.ac.uk Wed May 31 04:40:08 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 31 May 2006 09:40:08 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447C7985.9000404@cornell.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk> <447C7985.9000404@cornell.edu>
Message-ID: <447D5668.7070500@mrc-dunn.cam.ac.uk>

Genevieve DeClerck wrote:
> Thanks for your comment Sendu, it was very helpful. I think this must be
> what's going on.. I am using $blast_report->next_result in both
> subroutines. It appears that analyzing the blast results first w/ my
> sort subroutine empties (?) the $blast_result object so that when I try
> to print, there is nothing left to print. (and visa-versa when I print
> first then try to sort).
> So, from the looks of things, using next_result has the effect of
> popping the Bio::Search::Result::ResultI objects off of the SearchIO
> blast report object??

Not quite. It's more or less exactly like opening a file and then trying
to read it all twice like this:

    open(FILE, "file");
    while (<FILE>) {
        print;  # prints each line in the file
    }
    while (<FILE>) {
        print;  # never happens, we never enter this while loop
    }

To get the second while loop to print anything we need to say
seek(FILE, 0, 0) before it.
Or in the first while loop store each line in an array, and then make the
second loop a foreach through that array.

> It seems I could get around this by making a copy of the blast report by
> setting it to another new variable...(not the most elegant solution) but
> I'm having trouble with this...
>
> If I do:
>
> my $blast_report_copy = $blast_report;
>
> I'm just copying the reference to the SearchIO blast result, so it
> doesn't help me. How can I make another physical copy of this blast
> result object? Seems like a simple thing but how to do it is escaping me.

Not really a good idea, and it may not work anyway if the object contains
a filehandle. But for a simple object you might recursively loop through
the data structure and copy each element out into a similar data structure.

> But better yet, the way to go is to 'reset the counter,' or to find a
> way to look at/print/sort the results without removing data from the
> blast result object. How is this done though??

It would be rather nice if this worked:

    my $blast_report = $factory->blastall($ref_seq_objs);
    my $blast_fh = $blast_report->fh();
    while (<$blast_fh>) {
        # $_ is a ResultI object, use as normal
    }
    seek($blast_fh, 0, 0); # this would be great, but does it work?
    while (<$blast_fh>) {
        # go through the results again in your second subroutine
    }

An alternative hacky way of doing it, which may also not work, would be to
go through your $blast_report as normal, but then before going through it
a second time, say

    my $fh = $blast_report->_fh;
    seek($fh, 0, 0);

Finally, the most sensible way (assuming bioperl provides no methods of
its own for this) of solving the problem is, the first time you go through
each next_result, next_hit and next_hsp, just store the returned objects
in an array of arrays of arrays. Then the second time get the objects from
your array structure instead of with the method calls.
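
[Editorial sketch] The file analogy above, and the "store it the first time
through" approach, can be tried in plain Perl without any BioPerl calls. This
is only a minimal sketch under stated assumptions: the in-memory "file" and
its three lines are made up for illustration, and whether a SearchIO report
object can be rewound the same way is not confirmed anywhere in this thread.

```perl
use strict;
use warnings;

# Made-up data standing in for a report; an in-memory filehandle keeps
# the example self-contained.
my $data = "result1\nresult2\nresult3\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

# First pass: read every line and keep a copy of each one.
my @cached;
while (my $line = <$fh>) {
    chomp $line;
    push @cached, $line;
}

# The handle is now at end-of-file, so reading again yields nothing.
my @second = <$fh>;

# Option 1: rewind the handle with seek, then read again.
seek($fh, 0, 0);
my @reread = <$fh>;
close $fh;

# Option 2: leave the handle alone and loop over the cached copy.
print "$_\n" for @cached;
```

Option 2 is the plain-file version of keeping the objects returned by
next_result, next_hit and next_hsp in your own arrays, so the second
subroutine never has to touch the report object again.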

From heikki at sanbi.ac.za Wed May 31 06:55:18 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 31 May 2006 12:55:18 +0200
Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef"
In-Reply-To: <001801c68431$a586b2d0$15327e82@pyrimidine>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
Message-ID: <200605311255.19166.heikki@sanbi.ac.za>

In my opinion the sooner the bugs get exposed the better. It is much more
likely that there is a well-hidden bug caused by accidentally assigning
undef into a one-element array than that someone is intentionally writing
code that expects that behaviour!

I removed (but did not commit yet) all undefs from my old Bio::Variation
code and could not see any differences in the test output.

Let's remove them!

-Heikki

On Tuesday 30 May 2006 23:40, Chris Fields wrote:
> Agreed, though I think these changes should be implemented at some point
> (Conway's argument here makes sense and it is nice for Torsten to check
> this out). If proper tests are written then any changes resulting in
> errors should be picked up by checking the appropriate test suite, though I
> know it doesn't absolutely guarantee it. ; P
>
> Chris
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Rutger Vos
> > Sent: Tuesday, May 30, 2006 1:53 PM
> > To: bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith
> > "returnundef"
> >
> > Although I agree with the sentiment of following PBP, I'm not so sure
> > changing 'return undef' to 'return' *now* will fix any bugs without
> > introducing new, subtle ones.
> >
> > Chris Fields wrote:
> > > Torsten,
> > >
> > > Any way you can post a list of some/all of the offending lines or
> >
> > modules?
> > > > > Sounds like something to consider, but if the list is as large as you > > > > say we > > > > > made need something (bugzilla? wiki?) to track the changes and make > > > sure they pass tests; I'm sure a large majority will. > > > > > > I'm guessing Jason would want this somewhere on the project priority > > > > list or > > > > > bugzilla, with a link to the actual list, but I'm not sure. Maybe > > > start > > > > a > > > > > page on the wiki for proposed code changes? > > > > > > Chris > > > > > >> -----Original Message----- > > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann > > >> Sent: Tuesday, May 30, 2006 3:19 AM > > >> To: bioperl-l at lists.open-bio.org > > >> Subject: [Bioperl-l] For CVS developers - potential pitfall with > > >> "returnundef" > > >> > > >> FYI Bioperl developers: > > >> > > >> I just audited the bioperl-live CVS and found about 450 occurrences of > > >> "return undef". > > >> > > >> Page 199 of "Perl Best Practices" by Damian Conway, and this URL > > >> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html > > >> suggest: > > >> > > >> "Use return; instead of return undef; if you want to return nothing. > > >> If someone assigns the return value to an array, the latter creates an > > >> array of one value (undef), which evaluates to true. The former will > > >> correctly handle all contexts." > > >> > > >> So I'm guessing at least some of these 450 occurrences *could* result > > > > in > > > > >> bugs and should probably be changed. 
> > >> > > >> Your opinion may differ :-) > > >> > > >> -- > > >> Dr Torsten Seemann http://www.vicbioinformatics.com > > >> Victorian Bioinformatics Consortium, Monash University, Australia > > >> > > >> _______________________________________________ > > >> Bioperl-l mailing list > > >> Bioperl-l at lists.open-bio.org > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > > Rutger Vos, PhD. candidate > > Department of Biological Sciences > > Simon Fraser University > > 8888 University Drive > > Burnaby, BC, V5A1S6 > > Phone: 604-291-5625 > > Fax: 604-291-3496 > > Personal site: http://www.sfu.ca/~rvosa > > FAB* lab: http://www.sfu.ca/~fabstar > > Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ > > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of the Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From heikki at sanbi.ac.za Wed May 31 06:44:28 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Wed, 31 May 2006 12:44:28 +0200 Subject: [Bioperl-l] Bio::Restriction::IO issues In-Reply-To: 
<000f01c683f8$5771ed50$15327e82@pyrimidine> References: <000f01c683f8$5771ed50$15327e82@pyrimidine> Message-ID: <200605311244.29187.heikki@sanbi.ac.za> Chris, Thanks for stepping in. I feel partly responsible here because I originally changed some of Rob's code but have not followed up since. There have not been active development on these modules so do not worry about stepping on anyone's toes. -Heikki On Tuesday 30 May 2006 16:50, Chris Fields wrote: > Jason, Brian, et al, > > I found several major issues with Bio::Restriction::IO (this popped up > while bug squashing). In particular, the POD is pretty misleading. It > states (directly from perldoc): > > SYNOPSIS > use Bio::Restriction::IO; > > $in = Bio::Restriction::IO->new(-file => "inputfilename" , > -format => 'withrefm'); > $out = Bio::Restriction::IO->new(-file => ">outputfilename" , > -format => 'bairoch'); > my $res = $in->read; # a Bio::Restriction::EnzymeCollection > $out->write($res); > > # or > > # use Bio::Restriction::IO; > # > # #input file format can be read from the file extension (dat|xml) > # $in = Bio::Restriction::IO->newFh(-file => "inputfilename"); > # $out = Bio::Restriction::IO->newFh('-format' => 'xml'); > # > # # World's shortest flat<->xml format converter: > # print $out $_ while <$in>; > > So, I have found several problems with these modules. I really hate to > criticize code here, as my own is pretty hacky, but I think these are > things to seriously mull over: > > 1) Note that, though some of the lines above are commented they are > still there in POD and thus present in perldoc/pod2html etc. So, judging > from the above, it suggests using the script above should read in from one > format and write out to another (like SeqIO). However, NONE of the current > write() methods are implemented for any of the IO modules (withref, base, > itype2, bairoch), so this does not happen as expected. You get the nasty > thrown 'method not implemented error' instead when writing. 
> 2) The commented statements in POD above also suggest that REBASE XML > format is supported when there is no XML module. > 3) The Bio::Restriction::IO::bairoch module had multiple bugs which > made it unusable until I added a few small changes; it still can't handle > multisite/multicut enzymes properly, so in essence it is useless until that > is addressed. > 4) Bio::Restriction::IO inherits from Bio::SeqIO, though I'm not sure > why. Shouldn't it just inherit from Bio::Root::Root/Bio::Root::IO and make > up it's own methods? > > I'm working on at least getting the 'bairoch' input format up and running > (so at least it gets the enzymes into a > Bio::Restriction::Enzyme::Collection). From this point I'm not sure where > to proceed. The POD obviously needs to be corrected to reflect that > writing formats is not implemented (and the bit about XML should be taken > out completely); that's the easy part which I am working on and plan > committing today. However, these modules don't seem to be used too > frequently so I'm not sure whether it's worth spending too much time > getting these up to speed at the moment (adding write methods, switching to > Bio::Root::Root, etc); I have other priorities at the moment (including a > way overdue ListSummary). I'm also not sure who else is (using|working) on > these so I don't want to (make too many changes|step on someone else's > toes), but these are, IMHO, pretty serious problems. > > Any thoughts? > > Chris > > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. 
of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
______ _/ _/_____________________________________________________
_/ _/
_/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za
_/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho
_/ _/ _/ SANBI, South African National Bioinformatics Institute
_/ _/ _/ University of the Western Cape, South Africa
_/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From cjfields at uiuc.edu Wed May 31 09:10:00 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 08:10:00 -0500
Subject: [Bioperl-l] Bio::Restriction::IO issues
In-Reply-To: <200605311244.29187.heikki@sanbi.ac.za>
References: <000f01c683f8$5771ed50$15327e82@pyrimidine> <200605311244.29187.heikki@sanbi.ac.za>
Message-ID:

Heikki,

I mainly just changed a few things so no one would get the wrong idea from
the POD (that the modules write formats as well) and added a few things to
the TO DO. I also added a warning to Bio::Restriction::IO::bairoch for the
multisite/multicut issue. Besides that I haven't done much to them.

I also added a bit to the Project Priority List in case someone wants to
take it up. I may tinker with it but it's not really high on my priority
list. I've been pretty busy getting the ListSummaries back up to speed
(very busy mail lists since the last one) and am writing/testing a new
interface to NCBI EUtilities which I may donate at some point in the next
few months or so.

Chris

On May 31, 2006, at 5:44 AM, Heikki Lehvaslaiho wrote:

>
> Chris,
>
> Thanks for stepping in. I feel partly responsible here because I
> originally
> changed some of Rob's code but have not followed up since.
>
> There have not been active development on these modules so do not
> worry about
> stepping on anyone's toes.
> > -Heikki > > On Tuesday 30 May 2006 16:50, Chris Fields wrote: >> Jason, Brian, et al, >> >> I found several major issues with Bio::Restriction::IO (this >> popped up >> while bug squashing). In particular, the POD is pretty >> misleading. It >> states (directly from perldoc): >> >> SYNOPSIS >> use Bio::Restriction::IO; >> >> $in = Bio::Restriction::IO->new(-file => "inputfilename" , >> -format => 'withrefm'); >> $out = Bio::Restriction::IO->new(-file => ">outputfilename" , >> -format => 'bairoch'); >> my $res = $in->read; # a Bio::Restriction::EnzymeCollection >> $out->write($res); >> >> # or >> >> # use Bio::Restriction::IO; >> # >> # #input file format can be read from the file extension >> (dat|xml) >> # $in = Bio::Restriction::IO->newFh(-file => >> "inputfilename"); >> # $out = Bio::Restriction::IO->newFh('-format' => 'xml'); >> # >> # # World's shortest flat<->xml format converter: >> # print $out $_ while <$in>; >> >> So, I have found several problems with these modules. I really >> hate to >> criticize code here, as my own is pretty hacky, but I think these are >> things to seriously mull over: >> >> 1) Note that, though some of the lines above are commented they are >> still there in POD and thus present in perldoc/pod2html etc. So, >> judging >> from the above, it suggests using the script above should read in >> from one >> format and write out to another (like SeqIO). However, NONE of >> the current >> write() methods are implemented for any of the IO modules >> (withref, base, >> itype2, bairoch), so this does not happen as expected. You get >> the nasty >> thrown 'method not implemented error' instead when writing. >> 2) The commented statements in POD above also suggest that REBASE XML >> format is supported when there is no XML module. 
>> 3) The Bio::Restriction::IO::bairoch module had multiple bugs which >> made it unusable until I added a few small changes; it still can't >> handle >> multisite/multicut enzymes properly, so in essence it is useless >> until that >> is addressed. >> 4) Bio::Restriction::IO inherits from Bio::SeqIO, though I'm not sure >> why. Shouldn't it just inherit from Bio::Root::Root/Bio::Root::IO >> and make >> up it's own methods? >> >> I'm working on at least getting the 'bairoch' input format up and >> running >> (so at least it gets the enzymes into a >> Bio::Restriction::Enzyme::Collection). From this point I'm not >> sure where >> to proceed. The POD obviously needs to be corrected to reflect that >> writing formats is not implemented (and the bit about XML should >> be taken >> out completely); that's the easy part which I am working on and plan >> committing today. However, these modules don't seem to be used too >> frequently so I'm not sure whether it's worth spending too much time >> getting these up to speed at the moment (adding write methods, >> switching to >> Bio::Root::Root, etc); I have other priorities at the moment >> (including a >> way overdue ListSummary). I'm also not sure who else is (using| >> working) on >> these so I don't want to (make too many changes|step on someone >> else's >> toes), but these are, IMHO, pretty serious problems. >> >> Any thoughts? >> >> Chris >> >> >> Christopher Fields >> Postdoctoral Researcher - Switzer Lab >> Dept. 
of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of the Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From jay at jays.net Wed May 31 09:07:10 2006 From: jay at jays.net (Jay Hannah) Date: Wed, 31 May 2006 08:07:10 -0500 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl Message-ID: <447D94FE.8090305@jays.net> http://www.bioperl.org/wiki/Bptutorial.pl I think I just partially fulfilled this TODO: TODO: check if the POD is in the Wiki yet, and if not, put it here? I used Pod::Simple::Wiki (format 'mediawiki') to burn bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it the wiki page via my web browser. (Is that proper procedure? Is the plan to just do that manually from time to time as the document changes?) Now what? Should there be a new link on the far left of bioperl.org called "Tutorial"? It's an amazing document. IMHO it should be listed prominently on bioperl.org. 
HTH, j From osborne1 at optonline.net Wed May 31 09:58:01 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Wed, 31 May 2006 09:58:01 -0400 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: <447D94FE.8090305@jays.net> Message-ID: Jay, Excellent! Now we need to answer a few more questions for ourselves: - Do we remove the file bptutorial.pl from the package now? I'd say yes, we don't want to have to maintain two bptutorials. - What do we do with the script part of bptutorial.pl? It certainly could be excised and put into the examples/ directory, for example, but this would break a few of the paths that are being used. - A link to bptutorial? Or a link to the existing tutorials page? http://www.bioperl.org/wiki/Tutorials. Any thoughts on these? Brian O. On 5/31/06 9:07 AM, "Jay Hannah" wrote: > http://www.bioperl.org/wiki/Bptutorial.pl > > I think I just partially fulfilled this TODO: > > TODO: check if the POD is in the Wiki yet, and if not, put it here? > > I used Pod::Simple::Wiki (format 'mediawiki') to burn > bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it the > wiki page via my web browser. (Is that proper procedure? Is the plan to just > do that manually from time to time as the document changes?) > > Now what? > > Should there be a new link on the far left of bioperl.org called "Tutorial"? > > It's an amazing document. IMHO it should be listed prominently on bioperl.org. 
> > HTH, > > j > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From luciap at sas.upenn.edu Wed May 31 10:06:13 2006 From: luciap at sas.upenn.edu (Lucia Peixoto) Date: Wed, 31 May 2006 10:06:13 -0400 Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function In-Reply-To: <6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu> References: <1149019912.447ca7085124e@128.91.55.38> <6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu> Message-ID: <1149084373.447da2d5c5339@128.91.55.38> Hi Thanks a couple more questions why is the bootstrap value stored as the node id? Is that right? also, in the add_descendant method, how do you set the $ignoreoverwrite parameter to true? Lucia Quoting Jason Stajich : > you need to special case the root - it won't have an ancestor. just > protect the my $parent = $node->ancestor with an if statement as I > did below > > On May 30, 2006, at 4:11 PM, Lucia Peixoto wrote: > > > Hi > > OK that was silly, but what I have in my code is what you just wrote > > But the problem is that if I write > > > > $parent->add_Descendent($child) > > > > it tells me that I am calling the method "ass_Descendent" on an > > undefined value > > (but I did define $parent before??) > > > > So here it goes the code so far: > > > > use Bio::TreeIO; > > my $in = new Bio::TreeIO(-file => 'Test2.tre', > > -format => 'newick'); > > my $out = new Bio::TreeIO(-file => '>mytree.out', > > -format => 'newick'); > > while( my $tree = $in->next_tree ) { > > foreach my $node ( grep { ! $_->is_Leaf() } $tree->get_nodes() ) { > > my $bootstrap=$node->_creation_id; > > > > if ($bootstrap < 70 ){ > > >>> if( my $parent = $node->ancestor ) { > > my @children=$node->get_all_Descendents; > > foreach my $child (@children){ > > $parent->add_Descendent($child); > > } > } > > > > ........ 
> > > > eventually I'll add (once I assigned the children to the parent > > succesfully): > > $tree->remove_Node($node); > > > > } > > } > > $out->write_tree($tree); > > } > > > > Quoting aaron.j.mackey at gsk.com: > > > >>> foreach $child (@children){ > >>> $parent=add_Descendent->$child; > >>> } > >> > >> I think what you want is $parent->add_Descendent($child) > >> > >> -Aaron > >> > > > > > > Lucia Peixoto > > Department of Biology,SAS > > University of Pennsylvania > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > Lucia Peixoto Department of Biology,SAS University of Pennsylvania From sb at mrc-dunn.cam.ac.uk Wed May 31 10:56:49 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Wed, 31 May 2006 15:56:49 +0100 Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef" In-Reply-To: <200605311255.19166.heikki@sanbi.ac.za> References: <001801c68431$a586b2d0$15327e82@pyrimidine> <200605311255.19166.heikki@sanbi.ac.za> Message-ID: <447DAEB1.4040509@mrc-dunn.cam.ac.uk> Heikki Lehvaslaiho wrote: > In my opinion the sooner the bugs get exposed the better. It is much more > likely that there is a well hidden bug caused by assigning accidentally undef > into an one element array that someone intentionally writing code that > expects that behaviour! > > I removed (but did not commit yet) all undefs from my old Bio::Variation code > and could not see any differences in the test output. > > Let's remove them! Just looking for all return undef;s isn't enough. It's entirely possible to do something like: my $return_value; { # do something that assigns to return_value on success # on failure, just do nothing } return $return_value; The bioperl docs will typically explicitly state that undef is returned, and under what circumstance. 
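
[Editorial sketch] To make the point above concrete, here is a minimal
plain-Perl comparison of the three return styles discussed in this thread.
The sub names are hypothetical and are not BioPerl methods; the sketch only
assumes standard Perl context rules.

```perl
use strict;
use warnings;

# Hypothetical subs for illustration only; none of these exist in BioPerl.
sub explicit_undef { return undef; }              # one-element list in list context
sub bare_return    { return; }                    # empty list in list context
sub via_variable   { my $value; return $value; }  # behaves like 'return undef'

my @from_undef    = explicit_undef();
my @from_bare     = bare_return();
my @from_variable = via_variable();
my $scalar        = bare_return();   # scalar context: undef

for my $case (
    [ 'return undef'  => \@from_undef ],
    [ 'return'        => \@from_bare ],
    [ 'return $value' => \@from_variable ],
) {
    my ($label, $list) = @$case;
    # An array is true in boolean context whenever it has any elements,
    # even if the only element is undef.
    printf "%-14s gives %d element(s); the array is %s\n",
        $label, scalar(@$list), (@$list ? 'true' : 'false');
}
print "bare return in scalar context is ",
    (defined $scalar ? 'defined' : 'undef'), "\n";
```

The third sub is the case a search for the literal text "return undef" would
miss: it never contains that text, yet a caller assigning its result to an
array still gets one element.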
If a user suffers from the undef-into-array-problem, yes it can be slightly unexpected, but lots of unexpected things will happen when you don't use a method correctly, as per the docs! Fixing the return of undef is either a job that shouldn't be done, or a much harder job than expected. From bernd.web at gmail.com Wed May 31 10:30:30 2006 From: bernd.web at gmail.com (Bernd Web) Date: Wed, 31 May 2006 16:30:30 +0200 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: References: <447D94FE.8090305@jays.net> Message-ID: <716af09c0605310730o7de20489m674a07b5a928039d@mail.gmail.com> Hi, I am not sure to what extent bptutorial will be removed, but I actually like having bptutorial.pl in my BioPerl base for reference. regards, Bernd On 5/31/06, Brian Osborne wrote: > Jay, > > Excellent! Now we need to answer a few more questions for ourselves: > > - Do we remove the file bptutorial.pl from the package now? I'd say yes, we > don't want to have to maintain two bptutorials. > > - What do we do with the script part of bptutorial.pl? It certainly could be > excised and put into the examples/ directory, for example, but this would > break a few of the paths that are being used. > > - A link to bptutorial? Or a link to the existing tutorials page? > http://www.bioperl.org/wiki/Tutorials. > > Any thoughts on these? > > > Brian O. > > > On 5/31/06 9:07 AM, "Jay Hannah" wrote: > > > http://www.bioperl.org/wiki/Bptutorial.pl > > > > I think I just partially fulfilled this TODO: > > > > TODO: check if the POD is in the Wiki yet, and if not, put it here? > > > > I used Pod::Simple::Wiki (format 'mediawiki') to burn > > bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it the > > wiki page via my web browser. (Is that proper procedure? Is the plan to just > > do that manually from time to time as the document changes?) > > > > Now what? > > > > Should there be a new link on the far left of bioperl.org called "Tutorial"? 
> >
> > It's an amazing document. IMHO it should be listed prominently on bioperl.org.
> >
> > HTH,
> >
> > j
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From lstein at cshl.edu Wed May 31 12:03:13 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 12:03:13 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef"
In-Reply-To: <200605311255.19166.heikki@sanbi.ac.za>
References: <001801c68431$a586b2d0$15327e82@pyrimidine> <200605311255.19166.heikki@sanbi.ac.za>
Message-ID: <200605311203.13922.lstein@cshl.edu>

I'm afraid that everything depends on the context. If the subroutine is
documented to return a single scalar, then returning undef is appropriate.
If the subroutine is documented to return "false" on failure, then one must
call return (or "return ()" ).

Changing all the return undefs to return is going to expose hidden bugs in
the code written by people who are using BioPerl. While I agree
wholeheartedly with the proposed audit, I think we need to expect that
people are going to complain.

Lincoln

On Wednesday 31 May 2006 06:55, Heikki Lehvaslaiho wrote:
> In my opinion the sooner the bugs get exposed the better. It is much more
> likely that there is a well hidden bug caused by assigning accidentally
> undef into an one element array that someone intentionally writing code
> that expects that behaviour!
>
> I removed (but did not commit yet) all undefs from my old Bio::Variation
> code and could not see any differences in the test output.
>
> Let's remove them!
>
> -Heikki
>
> On Tuesday 30 May 2006 23:40, Chris Fields wrote:
> > Agreed, though I think these changes should be implemented at some point
> > (Conway's argument here makes sense and it is nice for Torsten to check
> > this out). If proper tests are written then any changes resulting in
> > errors should be picked up by checking the appropriate test suite, though
> > I know it doesn't absolutely guarantee it. ; P
> >
> > Chris
> >
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org
> > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rutger Vos
> > > Sent: Tuesday, May 30, 2006 1:53 PM
> > > To: bioperl-l at lists.open-bio.org
> > > Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith
> > > "returnundef"
> > >
> > > Although I agree with the sentiment of following PBP, I'm not so sure
> > > changing 'return undef' to 'return' *now* will fix any bugs without
> > > introducing new, subtle ones.
> > >
> > > Chris Fields wrote:
> > > > Torsten,
> > > >
> > > > Any way you can post a list of some/all of the offending lines or
> > > > modules?
> > > >
> > > > Sounds like something to consider, but if the list is as large as you
> > > > say we made need something (bugzilla? wiki?) to track the changes and
> > > > make sure they pass tests; I'm sure a large majority will.
> > > >
> > > > I'm guessing Jason would want this somewhere on the project priority
> > > > list or bugzilla, with a link to the actual list, but I'm not sure.
> > > > Maybe start a page on the wiki for proposed code changes?
> > > >
> > > > Chris
> > > >
> > > >> -----Original Message-----
> > > >> From: bioperl-l-bounces at lists.open-bio.org
> > > >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Torsten Seemann
> > > >> Sent: Tuesday, May 30, 2006 3:19 AM
> > > >> To: bioperl-l at lists.open-bio.org
> > > >> Subject: [Bioperl-l] For CVS developers - potential pitfall with
> > > >> "returnundef"
> > > >>
> > > >> FYI Bioperl developers:
> > > >>
> > > >> I just audited the bioperl-live CVS and found about 450 occurrences
> > > >> of "return undef".
> > > >>
> > > >> Page 199 of "Perl Best Practices" by Damian Conway, and this URL
> > > >> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html
> > > >> suggest:
> > > >>
> > > >> "Use return; instead of return undef; if you want to return nothing.
> > > >> If someone assigns the return value to an array, the latter creates
> > > >> an array of one value (undef), which evaluates to true. The former
> > > >> will correctly handle all contexts."
> > > >>
> > > >> So I'm guessing at least some of these 450 occurrences *could*
> > > >> result in bugs and should probably be changed.
> > > >>
> > > >> Your opinion may differ :-)
> > > >>
> > > >> --
> > > >> Dr Torsten Seemann http://www.vicbioinformatics.com
> > > >> Victorian Bioinformatics Consortium, Monash University, Australia
> > > >>
> > > >> _______________________________________________
> > > >> Bioperl-l mailing list
> > > >> Bioperl-l at lists.open-bio.org
> > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > >
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at lists.open-bio.org
> > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > > --
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > Rutger Vos, PhD. candidate
> > > Department of Biological Sciences
> > > Simon Fraser University
> > > 8888 University Drive
> > > Burnaby, BC, V5A1S6
> > > Phone: 604-291-5625
> > > Fax: 604-291-3496
> > > Personal site: http://www.sfu.ca/~rvosa
> > > FAB* lab: http://www.sfu.ca/~fabstar
> > > Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From cjfields at uiuc.edu Wed May 31 12:34:54 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 11:34:54 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To:
Message-ID: <001201c684d0$263c5530$15327e82@pyrimidine>

Brian, Jay,

I think it would be nice to have the tutorial prominently displayed somehow
(Jay's suggestion), with a link provided via the tutorials page. Hopefully
this will help with the bioperl newbies.

Jay, looks like there are still some weird formatting issues with the
bptutorial wiki page, something which I ran into before when getting the
Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more
spaces preceding a line denotes code for some reason). Not much you can do
in these cases except remove the extra spaces in those spots.

Looking good though!
Chris

From cjfields at uiuc.edu Wed May 31 12:44:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 11:44:31 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef"
In-Reply-To: <200605311203.13922.lstein@cshl.edu>
Message-ID: <001301c684d1$7e849fd0$15327e82@pyrimidine>

My feeling is the test suite 'should' pick up a large majority of problems
if changes are made to these lines, the quotes there indicating the utopian
idea that the tests are all written well (I believe 99% of the tests are,
BTW). You can always try the changes (wholesale or on smaller chunks of
code), see if they pass tests on different OS's using 'make/nmake test',
revert the ones that didn't pass, etc. It's a matter of someone willing to
try it out.

I think the original argument proposed here (originating from Damian Conway
and 'Perl Best Practices') is maybe using 'return undef' is something we
shouldn't be doing since this can lead to subtle errors itself. Not that
everything we do is considered 'a good practice' by any means. If I
remember correctly from 'OOPerl', Conway doesn't like combined get/setters
either (he prefers separate getters and setters); we use the 'bad' combined
version predominately in Bioperl.
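
The list-context pitfall behind the 'return undef' audit can be shown in a few lines. This is only a minimal sketch, not BioPerl code: the subs find_undef and find_bare are hypothetical names made up for the illustration, and each one simply fails in one of the two styles under discussion.

```perl
use strict;
use warnings;

# Hypothetical lookups that always fail; only the return style differs.
sub find_undef { return undef }    # explicit undef
sub find_bare  { return }          # bare return

# In list context 'return undef' yields a one-element list (undef),
# so the receiving array is non-empty and tests as true.
my @from_undef = find_undef();
my @from_bare  = find_bare();

printf "return undef: %d element(s), array is %s\n",
    scalar(@from_undef), (@from_undef ? 'true' : 'false');
printf "bare return:  %d element(s), array is %s\n",
    scalar(@from_bare), (@from_bare ? 'true' : 'false');
```

A caller that writes "if (@result) { ... }" therefore takes the success branch after a failed call when the sub uses 'return undef', which is the hidden bug the audit is meant to flush out.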
Chris

From hlapp at gmx.net Wed May 31 10:59:53 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 10:59:53 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef"
In-Reply-To: <200605311255.19166.heikki@sanbi.ac.za>
References: <001801c68431$a586b2d0$15327e82@pyrimidine> <200605311255.19166.heikki@sanbi.ac.za>
Message-ID: <949F348A-391B-495D-ABCE-30BABC37FF05@gmx.net>

I agree. Thanks to Torsten for the audit and Chris for stepping up.

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Wed May 31 14:08:43 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 14:08:43 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef"
In-Reply-To: <200605311203.13922.lstein@cshl.edu>
References: <001801c68431$a586b2d0$15327e82@pyrimidine> <200605311255.19166.heikki@sanbi.ac.za> <200605311203.13922.lstein@cshl.edu>
Message-ID:

On May 31, 2006, at 12:03 PM, Lincoln Stein wrote:

> If the subroutine is documented to return "false" on failure, then
> one must call
> return (or "return ()" ).

The problem seems to be that 'a value that evaluates to either true or
false' and 'a [meaningful] value or undef' and 'a value or false' ('a
value or no value') are not the same in perl. And what would/should one
expect if the doc states 'true on success and false otherwise'?

Maybe the documentation should also be fixed to avoid any ambiguity.
I.e., avoid documenting 'a value or false' because it may be ambiguous
(not only) to the less proficient. 'True or false' should imply a value
being returned.

Comments?

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From lstein at cshl.edu Wed May 31 14:14:59 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 14:14:59 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef"
In-Reply-To:
References: <001801c68431$a586b2d0$15327e82@pyrimidine> <200605311203.13922.lstein@cshl.edu>
Message-ID: <200605311415.00414.lstein@cshl.edu>

If the documentation says "returns false" then I expect to be able to do
this:

    @result = foo();
    die "foo() failed" unless @result;

If the documentation says "returns undef" then I expect this:

    @result = foo();
    die "foo() failed" unless $result[0];

Lincoln

On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote:
> On May 31, 2006, at 12:03 PM, Lincoln Stein wrote:
> > If the subroutine is documented to return "false" on failure, then
> > one must call
> > return (or "return ()" ).
>
> The problem seems to be that 'a value that evaluates to either true
> or false' and 'a [meaningful] value or undef' and 'a value or
> false' ('a value or no value') are not the same in perl. And what
> would/should one expect if the doc states 'true on success and false
> otherwise'?
>
> Maybe the documentation should also be fixed to avoid any ambiguity.
> I.e., avoid documenting 'a value or false' because it may be
> ambiguous (not only) to the less proficient. 'True or false' should
> imply a value being returned.
>
> Comments?
>
> -hilmar

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From hlapp at gmx.net Wed May 31 14:31:21 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 14:31:21 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef"
In-Reply-To: <200605311415.00414.lstein@cshl.edu>
References: <001801c68431$a586b2d0$15327e82@pyrimidine> <200605311203.13922.lstein@cshl.edu> <200605311415.00414.lstein@cshl.edu>
Message-ID: <241E77AE-8D1E-4708-9C4C-8A9619822DB4@gmx.net>

On May 31, 2006, at 2:14 PM, Lincoln Stein wrote:

> If the documentation says "returns false" then I expect to be able
> to do this:
>
> @result = foo();
> die "foo() failed" unless @result;

Except if the alternative to 'false' would be a scalar, you normally
wouldn't assign it to an array, would you?

I.e., I wouldn't expect this strict of a behavior from an open-source
package written largely by people whose job is biological science, not
programming perl knowing and following DC to the letter ... I'd rather be
on the safe side and assign to a scalar.

Just my $0.02 ...
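
The caller-side conventions compared in this exchange can be checked side by side. What follows is a minimal sketch under stated assumptions: the %style table and its two anonymous subs are hypothetical stand-ins for a failing method written in each style, not anything taken from BioPerl.

```perl
use strict;
use warnings;

# Hypothetical failing methods, one per return style.
my %style = (
    'return undef' => sub { return undef },
    'bare return'  => sub { return },
);

my %seen;
for my $name (sort keys %style) {
    my $sub = $style{$name};

    my $scalar = $sub->();    # scalar assignment
    my @list   = $sub->();    # list assignment

    $seen{$name} = {
        scalar_defined => defined $scalar    ? 1 : 0,
        list_empty     => @list              ? 0 : 1,
        first_defined  => defined $list[0]   ? 1 : 0,
    };

    printf "%-12s scalar defined: %d, list empty: %d, first element defined: %d\n",
        $name, @{ $seen{$name} }{qw(scalar_defined list_empty first_defined)};
}
```

Under these assumptions the scalar assignment is undefined for both styles, so a scalar check does not depend on which style the author chose; only the "unless @result" test tells the two apart.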
-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at uiuc.edu Wed May 31 14:50:30 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 13:50:30 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef"
In-Reply-To: <447DAEB1.4040509@mrc-dunn.cam.ac.uk>
Message-ID: <001801c684e3$16e33730$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Wednesday, May 31, 2006 9:57 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers - potential
> pitfallwith"returnundef"
>
> Heikki Lehvaslaiho wrote:
> > In my opinion the sooner the bugs get exposed the better. It is much more
> > likely that there is a well hidden bug caused by assigning accidentally
> > undef into an one element array that someone intentionally writing code
> > that expects that behaviour!
> >
> > I removed (but did not commit yet) all undefs from my old Bio::Variation
> > code and could not see any differences in the test output.
> >
> > Let's remove them!
>
> Just looking for all return undef;s isn't enough. It's entirely possible
> to do something like:
>
> my $return_value;
> {
>     # do something that assigns to return_value on success
>     # on failure, just do nothing
> }
> return $return_value;

Agreed, though looking for these is obviously much harder. The way to get
around those is:

    return $return_value if $return_value;
    return;

which I've seen used in a number of get/set methods.

> The bioperl docs will typically explicitly state that undef is returned,
> and under what circumstance. If a user suffers from the
> undef-into-array-problem, yes it can be slightly unexpected, but lots of
> unexpected things will happen when you don't use a method correctly, as
> per the docs!

Right, but the argument you make is that code will always work as expected
from the perldoc examples. My recent experiences with the
Bio::Restriction::IO and Bio::Species classes show that the docs are not
always up-to-date and may indicate the unimplemented intent of the author
more than the actual implementation. Again, I believe a large majority of
the docs are fine, but it's those few errors that made a devil's advocate
of me...

> Fixing the return of undef is either a job that shouldn't be done, or a
> much harder job than expected.

I don't think ignoring the problem is the best answer here, though I agree
the problem is more complicated than at first glance. Judging from code
I've trawled through a bit lately, I've seen a lot of methods (mainly
get/setters) that are essentially copied multiple times in the same or
across similar modules to save time. You could see a scenario where, in
those instances, so-called 'bad code' would spread quite quickly.

I think adding a wiki page to address some of these issues would be nice,
something separate from the Project Priority List.

Chris

From forward at hongyu.org Wed May 31 14:03:46 2006
From: forward at hongyu.org (Hongyu Zhang)
Date: Wed, 31 May 2006 11:03:46 -0700
Subject: [Bioperl-l] New functions for SimpleAlign.pm
Message-ID: <20060531110346.78xod658td8o0w0w@hongyu.org>

Greetings,

I am a new member in this mailing list. Nice to be here.

I wrote two more functions for the alignment module SimpleAlign.pm that
calculate the percentage of identity based on the shortest and longest
sequence length, respectively.
I also found an error in the no_residues() function that calculates the
number of residues in the alignment.

I am wondering whether they can be added to the official bioperl package.
I've contacted the original author of this module, Heikki Lehvaslaiho, a
couple of weeks ago, but haven't heard from him yet.

Thanks.

--
Hongyu Zhang, Ph.D.
Computational biologist
Ceres Inc.

From cjfields at uiuc.edu Wed May 31 15:39:26 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 14:39:26 -0500
Subject: [Bioperl-l] New functions for SimpleAlign.pm
In-Reply-To: <20060531110346.78xod658td8o0w0w@hongyu.org>
Message-ID: <001901c684e9$ed4a1720$15327e82@pyrimidine>

I added a bit to the FAQ about this:

http://www.bioperl.org/wiki/FAQ#How_do_I_submit_a_patch_or_enhancement_to_BioPerl.3F

and the HOWTO explains things a bit more directly:

http://www.bioperl.org/wiki/HOWTO:SubmitPatch

In brief, these need to be submitted to Bugzilla as either code
enhancements (for your added methods) or bugs with the patch to the
relevant code. Code enhancements probably should include some code and
test cases to demonstrate usage. Patches to buggy code are checked to make
sure they pass relevant tests by the core developers.

Submitting it to the mail list is definitely the first step, though, so
you're on the right path.

Chris

From cjfields at uiuc.edu Wed May 31 16:40:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 15:40:19 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef"
In-Reply-To: <200605311415.00414.lstein@cshl.edu>
Message-ID: <002001c684f2$6fb7daf0$15327e82@pyrimidine>

What about modules that have 'throw_not_implemented' statements present?
Here's a list with the total for each. Some of these are interfaces (I got
rid of a number that ended in 'I' or 'IO' to remove the I/IO interfaces but
it misses a few).
There are a number here that are implementations, though (Bio::AlignIO::maf, Bio::Restriction::IO::*), so they are technically incomplete:

Instances:  1  Module : Bio::AlignIO::maf
Instances: 25  Module : Bio::Assembly::Contig
Instances:  2  Module : Bio::Assembly::ContigAnalysis
Instances:  2  Module : Bio::Biblio::BiblioBase
Instances:  4  Module : Bio::DB::Expression
Instances:  2  Module : Bio::DB::Expression::geo
Instances:  5  Module : Bio::DB::Flat
Instances:  2  Module : Bio::DB::Query::WebQuery
Instances: 17  Module : Bio::DB::SeqFeature::Store
Instances:  2  Module : Bio::DB::SeqVersion
Instances:  3  Module : Bio::DB::Taxonomy
Instances:  1  Module : Bio::FeatureIO::bed
Instances:  1  Module : Bio::Map::Marker
Instances:  1  Module : Bio::MapIO::fpc
Instances:  1  Module : Bio::MapIO::mapmaker
Instances:  1  Module : Bio::Restriction::IO::bairoch
Instances:  1  Module : Bio::Restriction::IO::itype2
Instances:  1  Module : Bio::Restriction::IO::withrefm
Instances:  1  Module : Bio::Tools::Analysis::SimpleAnalysisBase
Instances:  3  Module : Bio::Tools::Run::WrapperBase

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Lincoln Stein
> Sent: Wednesday, May 31, 2006 1:15 PM
> To: Hilmar Lapp
> Cc: bioperl-l at lists.open-bio.org; Heikki Lehvaslaiho
> Subject: Re: [Bioperl-l] For CVS developers - potential
> pitfallwith"returnundef"
>
> If the documentation says "returns false" then I expect to be able to do
> this:
>
>    @result = foo();
>    die "foo() failed" unless @result;
>
> If the documentation says "returns undef" then I expect this:
>
>    @result = foo();
>    die "foo() failed" unless $result[0];
>
> Lincoln
>
>
> On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote:
> > On May 31, 2006, at 12:03 PM, Lincoln Stein wrote:
> > > If the subroutine is documented to return "false" on failure, then
> > > one must call
> > > return (or "return ()" ).
> > > > The problem seems to be that 'a value that evaluates to either true > > or false' and 'a [meaningful] value or undef' and 'a value or > > false' ('a value or no value) are not the same in perl. And what > > would/should one expect if the doc states 'true on success and false > > otherwise'? > > > > Maybe the documentation should also be fixed to avoid any ambiguity. > > I.e., avoid documenting 'a value or false' because it may be > > ambiguous (not only) to the less proficient. 'True or false' should > > imply a value being returned. > > > > Comments? > > > > -hilmar > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From lstein at cshl.edu Wed May 31 17:07:06 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Wed, 31 May 2006 17:07:06 -0400 Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef" In-Reply-To: <002001c684f2$6fb7daf0$15327e82@pyrimidine> References: <002001c684f2$6fb7daf0$15327e82@pyrimidine> Message-ID: <200605311707.08196.lstein@cshl.edu> > Instances: 17 Module : Bio::DB::SeqFeature::Store This is intentional. Bio::DB::SeqFeature::Store is intended to be a virtual base class. The throw_not_implemented() calls are there to force developers to override the needed interface methods. If this is not the right way to do it, let me know and I'll fix it. 
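As a rough illustration of the pattern Lincoln describes, here is a minimal, self-contained sketch in plain Perl. The package names (My::Store, My::Store::Memory), the fetch() method, and the local throw_not_implemented() helper are all invented for this example; they only mimic the idea and are not the actual Bio::DB::SeqFeature::Store or Bio::Root code.

```perl
use strict;
use warnings;

# Hypothetical abstract base class; all names here are illustrative only.
package My::Store;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Stand-in for a "not implemented" helper: reports which method
# the concrete class forgot to override.
sub throw_not_implemented {
    my $self   = shift;
    my $method = (caller(1))[3];    # fully qualified name of the calling sub
    die "Abstract method '$method' is not implemented by " . ref($self) . "\n";
}

# Interface method that every concrete store is expected to override.
sub fetch { shift->throw_not_implemented }

# Hypothetical concrete subclass that does override fetch().
package My::Store::Memory;
our @ISA = ('My::Store');

sub fetch {
    my ($self, $id) = @_;
    return $self->{data}{$id};
}

package main;

# The subclass works because it overrides fetch().
my $store = My::Store::Memory->new(data => { f1 => 'gene' });
print $store->fetch('f1'), "\n";

# Calling the method on the base class itself throws.
my $base = My::Store->new;
my $ok   = eval { $base->fetch('f1'); 1 };
my $err  = $@;
print "base class: $err" unless $ok;
```

The point of the design is that a forgotten override fails loudly at the first call, with the method name in the message, instead of silently returning nothing.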
Lincoln > Instances: 2 Module : Bio::DB::SeqVersion > Instances: 3 Module : Bio::DB::Taxonomy > Instances: 1 Module : Bio::FeatureIO::bed > Instances: 1 Module : Bio::Map::Marker > Instances: 1 Module : Bio::MapIO::fpc > Instances: 1 Module : Bio::MapIO::mapmaker > Instances: 1 Module : Bio::Restriction::IO::bairoch > Instances: 1 Module : Bio::Restriction::IO::itype2 > Instances: 1 Module : Bio::Restriction::IO::withrefm > Instances: 1 Module : Bio::Tools::Analysis::SimpleAnalysisBase > Instances: 3 Module : Bio::Tools::Run::WrapperBase > > Chris > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Lincoln Stein > > Sent: Wednesday, May 31, 2006 1:15 PM > > To: Hilmar Lapp > > Cc: bioperl-l at lists.open-bio.org; Heikki Lehvaslaiho > > Subject: Re: [Bioperl-l] For CVS developers - potential > > pitfallwith"returnundef" > > > > If the documentation says "returns false" then I expect to be able to do > > this: > > > > @result = foo(); > > die "foo() failed" unless @result; > > > > If the documentation says "returns undef" then I expect this: > > > > @result = foo(); > > die "foo() failed" unless $result[0]; > > > > Lincoln > > > > On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote: > > > On May 31, 2006, at 12:03 PM, Lincoln Stein wrote: > > > > If the subroutine is documented to return "false" on failure, then > > > > one must call > > > > return (or "return ()" ). > > > > > > The problem seems to be that 'a value that evaluates to either true > > > or false' and 'a [meaningful] value or undef' and 'a value or > > > false' ('a value or no value) are not the same in perl. And what > > > would/should one expect if the doc states 'true on success and false > > > otherwise'? > > > > > > Maybe the documentation should also be fixed to avoid any ambiguity. > > > I.e., avoid documenting 'a value or false' because it may be > > > ambiguous (not only) to the less proficient. 
'True or false' should > > > imply a value being returned. > > > > > > Comments? > > > > > > -hilmar > > > > -- > > Lincoln D. Stein > > Cold Spring Harbor Laboratory > > 1 Bungtown Road > > Cold Spring Harbor, NY 11724 > > (516) 367-8380 (voice) > > (516) 367-8389 (fax) > > FOR URGENT MESSAGES & SCHEDULING, > > PLEASE CONTACT MY ASSISTANT, > > SANDRA MICHELSEN, AT michelse at cshl.edu > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From hlapp at gmx.net Wed May 31 17:21:57 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 31 May 2006 17:21:57 -0400 Subject: [Bioperl-l] For CVS developers - throw_not_implemented In-Reply-To: <002001c684f2$6fb7daf0$15327e82@pyrimidine> References: <002001c684f2$6fb7daf0$15327e82@pyrimidine> Message-ID: On May 31, 2006, at 4:40 PM, Chris Fields wrote: > What about modules that have 'throw_not_implemented' statements > present? Those are often if not always legitimate - the problem are those that don't have them but fail to override an inherited interface or abstract method. If something is not implemented what is the better way to express this other than throwing an exception? 
(and if it's not an interface or abstract base class, saying so in the documentation) -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Wed May 31 17:25:48 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 31 May 2006 17:25:48 -0400 Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef" In-Reply-To: <001801c684e3$16e33730$15327e82@pyrimidine> References: <001801c684e3$16e33730$15327e82@pyrimidine> Message-ID: <8AA04BF0-FA79-43CF-9FBB-310314FECD91@gmx.net> On May 31, 2006, at 2:50 PM, Chris Fields wrote: > I've seen a lot of methods (mainly get/setters) > that are essentially copied multiple times in the same or across > similar > modules to save time. You could see a scenario where, in those > instances, > so-called 'bad code' would spread quite quickly. This will usually be code generated by macros, e.g. the emacs macros for getter/setter generation for properties. If the macro generates wrong code, that's indeed pretty bad. (We've had that.) OTOH it should be spotted quickly as well. And macro changes or new macros should probably be scrutinized by all eyes watching ... -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Wed May 31 17:40:22 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 16:40:22 -0500 Subject: [Bioperl-l] For CVS developers - throw_not_implemented In-Reply-To: Message-ID: <002401c684fa$d28e7640$15327e82@pyrimidine> I think, as long as it's reflected in the docs that something doesn't work (hasn't been implemented) then there's no problem. It's when the docs are misleading that we run into problems. 
The sticking point lies with some classes, such as IO classes (like SeqIO, or Restriction::IO, with read and write methods) where the IO base class specifies that it is possible to read and write a particular format but the actual implementation varies according to whether or not the derived class overrides the base or interface method (in other words, 'doesn't work as advertised' only in specific circumstances). I don't know how to solve this issue except to add in the docs that specific formats don't implement write() methods.

Personally, I haven't had an issue with it and it probably makes no difference, but I think it needs to be pointed out. The most extreme case I ran into was Bio::Restriction::IO, which had 3 out of 4 plugin modules that didn't implement the write() method but left this in the synopsis in POD:

    use Bio::Restriction::IO;

    $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
                                     -format => 'withrefm');
    $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
                                     -format => 'bairoch');
    my $res = $in->read; # a Bio::Restriction::EnzymeCollection
    $out->write($res);

    # or

    #  use Bio::Restriction::IO;
    #
    #  #input file format can be read from the file extension (dat|xml)
    #  $in  = Bio::Restriction::IO->newFh(-file => "inputfilename");
    #  $out = Bio::Restriction::IO->newFh('-format' => 'xml');
    #
    #  # World's shortest flat<->xml format converter:
    #  print $out $_ while <$in>;

None of this code works; in fact, no XML parser even exists for these IO classes! Bio::AlignIO has a few as well (maf and Stockholm formats don't write).

Chris

> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> Sent: Wednesday, May 31, 2006 4:22 PM
> To: Chris Fields
> Cc: lstein at cshl.edu; bioperl-l at lists.open-bio.org; 'Heikki Lehvaslaiho'
> Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented
>
>
> On May 31, 2006, at 4:40 PM, Chris Fields wrote:
>
> > What about modules that have 'throw_not_implemented' statements
> > present?
> > Those are often if not always legitimate - the problem are those that > don't have them but fail to override an inherited interface or > abstract method. > > If something is not implemented what is the better way to express > this other than throwing an exception? (and if it's not an interface > or abstract base class, saying so in the documentation) > > -hilmar > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From hlapp at gmx.net Wed May 31 17:55:37 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 31 May 2006 17:55:37 -0400 Subject: [Bioperl-l] For CVS developers - throw_not_implemented In-Reply-To: <002401c684fa$d28e7640$15327e82@pyrimidine> References: <002401c684fa$d28e7640$15327e82@pyrimidine> Message-ID: This is documentation cruft resulting from copy&paste w/o later fixing it. (which isn't a justification) Note that not implementing the write is as legitimate as not implementing the read method ... It should be pointed out in the documentation though that it will depend on the actual implementation of the format whether it supports reading or writing or both. -hilmar On May 31, 2006, at 5:40 PM, Chris Fields wrote: > I think, as long as it's reflected in the docs that something > doesn't work > (hasn't been implemented) then there's no problem. It's when the > docs are > misleading that we run into problems. > > The sticking point lies with some classes, such as IO classes (like > SeqIO, > or Restrict::IO, with read and write methods) where the IO base class > specifies that it is possible to read and write a particular format > but the > actual implementation varies according to whether or not the > derived class > overrides the base or interface method (in other words, 'doesn't > work as > advertised' only in specific circumstances). 
I don't know how to > solve this > issue except to add in the docs that specific formats don't implement > write() methods. > > Personally, I haven't had an issue with it and it probably makes no > difference, but I think it needs to be pointed out. The most > extreme I ran > into was Bio::Restriction::IO, which had 3 out of 4 plugin modules > that > didn't implement the write() method but left this in the synopsis > in POD: > > use Bio::Restriction::IO; > > $in = Bio::Restriction::IO->new(-file => "inputfilename" , > -format => 'withrefm'); > $out = Bio::Restriction::IO->new(-file => ">outputfilename" , > -format => 'bairoch'); > my $res = $in->read; # a Bio::Restriction::EnzymeCollection > $out->write($res); > > # or > > # use Bio::Restriction::IO; > # > # #input file format can be read from the file extension (dat| > xml) > # $in = Bio::Restriction::IO->newFh(-file => "inputfilename"); > # $out = Bio::Restriction::IO->newFh('-format' => 'xml'); > # > # # World's shortest flat<->xml format converter: > # print $out $_ while <$in>; > > None of this code works; in fact, no XML parser even exists for > these IO > classes! Bio::AlignIO also has a few as well (maf and Stockholm > formats > don't write). > > Chris > > >> -----Original Message----- >> From: Hilmar Lapp [mailto:hlapp at gmx.net] >> Sent: Wednesday, May 31, 2006 4:22 PM >> To: Chris Fields >> Cc: lstein at cshl.edu; bioperl-l at lists.open-bio.org; 'Heikki >> Lehvaslaiho' >> Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented >> >> >> On May 31, 2006, at 4:40 PM, Chris Fields wrote: >> >>> What about modules that have 'throw_not_implemented' statements >>> present? >> >> Those are often if not always legitimate - the problem are those that >> don't have them but fail to override an inherited interface or >> abstract method. >> >> If something is not implemented what is the better way to express >> this other than throwing an exception? 
(and if it's not an interface >> or abstract base class, saying so in the documentation) >> >> -hilmar >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From slenk at emich.edu Wed May 31 17:52:13 2006 From: slenk at emich.edu (Stephen Gordon Lenk) Date: Wed, 31 May 2006 17:52:13 -0400 Subject: [Bioperl-l] For CVS developers - throw_not_implemented Message-ID: <100682f110067a83.10067a83100682f1@emich.edu> Isn't it fairly standard in OO schemes/languages to have an exception thrown if a method can't be found at the end of a search up the class hierarchy? I recall being very mad at Smalltalk because "method not found" kept biting me. C++ has pure virtual base classes that do not allow objects to be instantiated directly; they are meant to be inherited and then implemented. Perl 6 was mentioned a bit back. Is this issue addressed there? Should it be? Do the Bioperl people feed their needs into Perl 6 so that all the code effort to make Bio::Root is handled for them in the next effort by Perl 6 itself. Make the Perl 6 people solve these issues with your input, then you will not have to deal with implementing it yourselves. I'll just bet that you are not the only potential users of Perl 6 who will have to solve these issues eventually. ----- Original Message ----- From: Hilmar Lapp Date: Wednesday, May 31, 2006 5:21 pm Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented > > On May 31, 2006, at 4:40 PM, Chris Fields wrote: > > > What about modules that have 'throw_not_implemented' statements > > present? 
> > Those are often if not always legitimate - the problem are those > that > don't have them but fail to override an inherited interface or > abstract method. > > If something is not implemented what is the better way to express > this other than throwing an exception? (and if it's not an > interface > or abstract base class, saying so in the documentation) > > -hilmar > > -- > ========================================================= == > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > ========================================================= == > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From arareko at campus.iztacala.unam.mx Wed May 31 18:49:03 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Wed, 31 May 2006 17:49:03 -0500 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: <001201c684d0$263c5530$15327e82@pyrimidine> References: <001201c684d0$263c5530$15327e82@pyrimidine> Message-ID: <447E1D5F.1050807@campus.iztacala.unam.mx> Brian, Jay, Chris, I agree with what Bernd Web said in another reply. For some people will be nice to still be able to run the script from the codebase and interact with it. I don't think it should be a lot of problem to maintain both tutorials, as long as the 'main' one is the one in the CVS tree. By reading what Jay did in order to convert it into mediawiki format, I suppose this can be easily done again for each new change to the script (again, this is just my guessing). Besides, as far as I've seen, there aren't frequent commits to the script at all. I've added a link in the left menu of the wiki. If you think it should point to the Tutorials page instead of the Bptutorial.pl page please let me know. Regards, Mauricio. 
Chris Fields wrote: > Brian, Jay, > > I think it would be nice to have the tutorial prominently displayed somehow > (Jay's suggestion), with a link provided via the tutorials page. Hopefully > this will help with the bioperl newbies. > > Jay, looks like there are still some weird formatting issues with the > bptutorial wiki page, something which I ran into before when getting the > Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more > spaces preceding a line denotes code for some reason). Not much you can do > in these cases except remove the extra spaces in those spots. Looking good > though! > > Chris > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Brian Osborne >> Sent: Wednesday, May 31, 2006 8:58 AM >> To: Jay Hannah; bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl >> >> Jay, >> >> Excellent! Now we need to answer a few more questions for ourselves: >> >> - Do we remove the file bptutorial.pl from the package now? I'd say yes, >> we >> don't want to have to maintain two bptutorials. >> >> - What do we do with the script part of bptutorial.pl? It certainly could >> be >> excised and put into the examples/ directory, for example, but this would >> break a few of the paths that are being used. >> >> - A link to bptutorial? Or a link to the existing tutorials page? >> http://www.bioperl.org/wiki/Tutorials. >> >> Any thoughts on these? >> >> >> Brian O. >> >> >> On 5/31/06 9:07 AM, "Jay Hannah" wrote: >> >>> http://www.bioperl.org/wiki/Bptutorial.pl >>> >>> I think I just partially fulfilled this TODO: >>> >>> TODO: check if the POD is in the Wiki yet, and if not, put it here? >>> >>> I used Pod::Simple::Wiki (format 'mediawiki') to burn >>> bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it >> the >>> wiki page via my web browser. (Is that proper procedure? 
Is the plan to >> just >>> do that manually from time to time as the document changes?) >>> >>> Now what? >>> >>> Should there be a new link on the far left of bioperl.org called >> "Tutorial"? >>> It's an amazing document. IMHO it should be listed prominently on >> bioperl.org. >>> HTH, >>> >>> j >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From cjfields at uiuc.edu Wed May 31 20:43:48 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 19:43:48 -0500 Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef" In-Reply-To: <200605311707.08196.lstein@cshl.edu> Message-ID: <002801c68514$72f11480$15327e82@pyrimidine> > -----Original Message----- > From: Lincoln Stein [mailto:lstein at cshl.edu] > Sent: Wednesday, May 31, 2006 4:07 PM > To: Chris Fields > Cc: 'Hilmar Lapp'; bioperl-l at lists.open-bio.org; 'Heikki Lehvaslaiho' > Subject: Re: [Bioperl-l] For CVS developers - potential > pitfallwith"returnundef" > > > > Instances: 17 Module : Bio::DB::SeqFeature::Store > > This is intentional. Bio::DB::SeqFeature::Store is intended to be a > virtual > base class. The throw_not_implemented() calls are there to force > developers > to override the needed interface methods. > > If this is not the right way to do it, let me know and I'll fix it.
That's the right way, though I don't really know what the 'right way' is. Sorry Lincoln, I didn't mean to direct that at you specifically; I responded to your last post to stay in the thread, so to speak. It was meant to be a general statement that some classes haven't implemented methods specified by their abstract base or interface class. This is just output from a quickie script I wrote up to check on this and see how many of these statements are out there, and since there isn't a foolproof way to know what an abstract base class is, it pulls in a few abstract classes (such as yours) along with all the others. At least there aren't as many hits as Torsten's ~400-500 for 'return undef'! Anyway, I'm not sure what would be the best place to address code problems or issues like the unimplemented methods issue or Torsten's audits (list, wiki, etc); it's a delicate issue because it borders on code critiquing and what constitutes good vs. bad code. I remember some pretty heated arguments about the 'proper' way to do things a while back involving AUTOLOAD'ing methods, which I think is summarized somewhere in the wiki. Myself, I'm a microbiologist and not a programmer, so I'm prone to bouts of hackery, but I try to have the code at least do what the docs state.
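A scan of the kind mentioned above can be sketched in a few lines of core Perl. This is not the script that produced the list earlier in the thread (that script was not posted); the subroutine names count_calls, module_name, and scan_tree are invented for the illustration, and the sketch assumes a source tree in which Bio/Foo/Bar.pm corresponds to Bio::Foo::Bar. It also counts matches inside POD and comments, so the totals are only approximate.

```perl
use strict;
use warnings;
use File::Find;

# Count how many times $pattern occurs in a chunk of source text.
sub count_calls {
    my ($text, $pattern) = @_;
    my $n = () = $text =~ /$pattern/g;
    return $n;
}

# Turn "<root>/Bio/AlignIO/maf.pm" into "Bio::AlignIO::maf".
sub module_name {
    my ($path, $root) = @_;
    (my $rel = $path) =~ s{^\Q$root\E/?}{};
    $rel =~ s{\.pm$}{};
    $rel =~ s{/}{::}g;
    return $rel;
}

# Walk $root and return { module => count } for every module with a hit.
# Modules whose name ends in 'I' or 'IO' are skipped as a crude
# interface filter, so some interfaces will still slip through.
sub scan_tree {
    my ($root, $pattern) = @_;
    my %hits;
    find({ no_chdir => 1, wanted => sub {
        return unless -f $_ && /\.pm$/;
        my $module = module_name($File::Find::name, $root);
        return if $module =~ /(?:I|IO)$/;
        open my $fh, '<', $_ or return;
        my $text = do { local $/; <$fh> };   # slurp the whole file
        close $fh;
        my $n = count_calls($text, $pattern);
        $hits{$module} = $n if $n;
    }}, $root);
    return \%hits;
}

# Usage: perl scan.pl /path/to/source/tree
my $root = shift @ARGV;
if (defined $root) {
    my $hits = scan_tree($root, qr/throw_not_implemented/);
    printf "Instances: %d\tModule : %s\n", $hits->{$_}, $_
        for sort keys %$hits;
}
```

Passing a different pattern, for example qr/return\s+undef/, would give a comparable rough count for the 'return undef' audit, with the same caveat about matches in documentation.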
Chris > Lincoln > > > > Instances: 2 Module : Bio::DB::SeqVersion > > Instances: 3 Module : Bio::DB::Taxonomy > > Instances: 1 Module : Bio::FeatureIO::bed > > Instances: 1 Module : Bio::Map::Marker > > Instances: 1 Module : Bio::MapIO::fpc > > Instances: 1 Module : Bio::MapIO::mapmaker > > Instances: 1 Module : Bio::Restriction::IO::bairoch > > Instances: 1 Module : Bio::Restriction::IO::itype2 > > Instances: 1 Module : Bio::Restriction::IO::withrefm > > Instances: 1 Module : Bio::Tools::Analysis::SimpleAnalysisBase > > Instances: 3 Module : Bio::Tools::Run::WrapperBase > > > > Chris > > > > > -----Original Message----- > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > bounces at lists.open-bio.org] On Behalf Of Lincoln Stein > > > Sent: Wednesday, May 31, 2006 1:15 PM > > > To: Hilmar Lapp > > > Cc: bioperl-l at lists.open-bio.org; Heikki Lehvaslaiho > > > Subject: Re: [Bioperl-l] For CVS developers - potential > > > pitfallwith"returnundef" > > > > > > If the documentation says "returns false" then I expect to be able to > do > > > this: > > > > > > @result = foo(); > > > die "foo() failed" unless @result; > > > > > > If the documentation says "returns undef" then I expect this: > > > > > > @result = foo(); > > > die "foo() failed" unless $result[0]; > > > > > > Lincoln > > > > > > On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote: > > > > On May 31, 2006, at 12:03 PM, Lincoln Stein wrote: > > > > > If the subroutine is documented to return "false" on failure, then > > > > > one must call > > > > > return (or "return ()" ). > > > > > > > > The problem seems to be that 'a value that evaluates to either true > > > > or false' and 'a [meaningful] value or undef' and 'a value or > > > > false' ('a value or no value) are not the same in perl. And what > > > > would/should one expect if the doc states 'true on success and false > > > > otherwise'? > > > > > > > > Maybe the documentation should also be fixed to avoid any ambiguity. 
> > > > I.e., avoid documenting 'a value or false' because it may be > > > > ambiguous (not only) to the less proficient. 'True or false' should > > > > imply a value being returned. > > > > > > > > Comments? > > > > > > > > -hilmar > > > > > > -- > > > Lincoln D. Stein > > > Cold Spring Harbor Laboratory > > > 1 Bungtown Road > > > Cold Spring Harbor, NY 11724 > > > (516) 367-8380 (voice) > > > (516) 367-8389 (fax) > > > FOR URGENT MESSAGES & SCHEDULING, > > > PLEASE CONTACT MY ASSISTANT, > > > SANDRA MICHELSEN, AT michelse at cshl.edu > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Wed May 31 20:56:12 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 19:56:12 -0500 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: <447E1D5F.1050807@campus.iztacala.unam.mx> Message-ID: <002901c68516$316d4fe0$15327e82@pyrimidine> Mauricio et al, Sounds good, except that there are a few issues with the formatting done by Pod::Simple::Wiki, such as changing some things to code tags when they obviously aren't code; I don't know if there is a workaround for that (Jay?). It may not be anything too serious though. There was a similar issue with the INSTALL doc conversion to wiki that I ran into, in that I don't think it will be easy converting one way or the other (POD->wiki or wiki->POD or text), so syncing updates between the wiki and CVS docs could be an issue we'll have to face in the future.
We could strip the POD out of the script and have the docs on the wiki (Brian's idea), or have minimal POD in the tutorial and keep the wiki updated, just to simplify things, but this may not appeal to those who use perldoc frequently (I personally use browsable prettified HTML). cjf > -----Original Message----- > From: Mauricio Herrera Cuadra [mailto:arareko at campus.iztacala.unam.mx] > Sent: Wednesday, May 31, 2006 5:49 PM > To: Chris Fields > Cc: 'Brian Osborne'; 'Jay Hannah'; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl > > Brian, Jay, Chris, > > I agree with what Bernd Web said in another reply. For some people will > be nice to still be able to run the script from the codebase and > interact with it. > > I don't think it should be a lot of problem to maintain both tutorials, > as long as the 'main' one is the one in the CVS tree. By reading what > Jay did in order to convert it into mediawiki format, I suppose this can > be easily done again for each new change to the script (again, this is > just my guessing). Besides, as far as I've seen, there aren't frequent > commits to the script at all. > > I've added a link in the left menu of the wiki. If you think it should > point to the Tutorials page instead of the Bptutorial.pl page please let > me know. > > Regards, > Mauricio. > > Chris Fields wrote: > > Brian, Jay, > > > > I think it would be nice to have the tutorial prominently displayed > somehow > > (Jay's suggestion), with a link provided via the tutorials page. > Hopefully > > this will help with the bioperl newbies. > > > > Jay, looks like there are still some weird formatting issues with the > > bptutorial wiki page, something which I ran into before when getting the > > Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or > more > > spaces preceding a line denotes code for some reason). Not much you can > do > > in these cases except remove the extra spaces in those spots. 
Looking > good > > though! > > > > Chris > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of Brian Osborne > >> Sent: Wednesday, May 31, 2006 8:58 AM > >> To: Jay Hannah; bioperl-l at lists.open-bio.org > >> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl > >> > >> Jay, > >> > >> Excellent! Now we need to answer a few more questions for ourselves: > >> > >> - Do we remove the file bptutorial.pl from the package now? I'd say > yes, > >> we > >> don't want to have to maintain two bptutorials. > >> > >> - What do we do with the script part of bptutorial.pl? It certainly > could > >> be > >> excised and put into the examples/ directory, for example, but this > would > >> break a few of the paths that are being used. > >> > >> - A link to bptutorial? Or a link to the existing tutorials page? > >> http://www.bioperl.org/wiki/Tutorials. > >> > >> Any thoughts on these? > >> > >> > >> Brian O. > >> > >> > >> On 5/31/06 9:07 AM, "Jay Hannah" wrote: > >> > >>> http://www.bioperl.org/wiki/Bptutorial.pl > >>> > >>> I think I just partially fulfilled this TODO: > >>> > >>> TODO: check if the POD is in the Wiki yet, and if not, put it here? > >>> > >>> I used Pod::Simple::Wiki (format 'mediawiki') to burn > >>> bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it > >> the > >>> wiki page via my web browser. (Is that proper procedure? Is the plan > to > >> just > >>> do that manually from time to time as the document changes?) > >>> > >>> Now what? > >>> > >>> Should there be a new link on the far left of bioperl.org called > >> "Tutorial"? > >>> It's an amazing document. IMHO it should be listed prominently on > >> bioperl.org. 
> >>> HTH,
> >>>
> >>> j
>
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM

From osborne1 at optonline.net Wed May 31 21:37:15 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 31 May 2006 21:37:15 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E1D5F.1050807@campus.iztacala.unam.mx>
Message-ID:

Mauricio,

Bernd didn't say he wanted the _script_ in the package, he said he wanted bptutorial.pl in the package, not indicating whether it was the documentation or the script that was important. It's my suspicion that the documentation is more important than the script, and this is what my last letter was asking, in part: is the script important? Or can we focus on the text/POD part?

Brian O.

On 5/31/06 6:49 PM, "Mauricio Herrera Cuadra" wrote:

> I agree with what Bernd Web said in another reply. For some people will
> be nice to still be able to run the script from the codebase and
> interact with it.
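
For anyone following the conversion question in this thread: the Pod::Simple::Wiki step Jay mentions can probably be repeated with a very small script. What follows is only a sketch of how such a run might look, not a record of what Jay actually did. The helper name pod_file_to_mediawiki and the file names in the usage comment are made up for illustration, and Pod::Simple::Wiki has to be installed from CPAN.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Pod::Simple::Wiki;    # from CPAN; not a core module

# Hypothetical helper: convert the POD found in $pod_file to MediaWiki
# markup and write the result to the filehandle $out_fh.
sub pod_file_to_mediawiki {
    my ( $pod_file, $out_fh ) = @_;

    # 'mediawiki' is the output format named in Jay's message.
    my $parser = Pod::Simple::Wiki->new('mediawiki');
    $parser->output_fh($out_fh);
    $parser->parse_file($pod_file);
    return;
}

# Example (file names are placeholders):
#   perl pod2wiki.pl bptutorial.pl > Bptutorial.wiki
pod_file_to_mediawiki( $ARGV[0], \*STDOUT ) if @ARGV;
```

If something like this were kept next to the script in CVS, regenerating the wiki text after each POD change would be a one-line command, which is roughly the workflow Mauricio guesses at. The pasting into the wiki would still be manual.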

From cjfields at uiuc.edu Wed May 31 21:42:54 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 20:42:54 -0500
Subject: [Bioperl-l] For CVS developers - throw_not_implemented
In-Reply-To: <100682f110067a83.10067a83100682f1@emich.edu>
Message-ID: <002a01c6851c$b3b8a980$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Stephen Gordon Lenk
> Sent: Wednesday, May 31, 2006 4:52 PM
> To: Hilmar Lapp
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented
>
>
> Isn't it fairly standard in OO schemes/languages to have an exception
> thrown if a method can't be found at the end of a search up the class
> hierarchy? I recall being very mad at Smalltalk because "method not found"
> kept biting me. C++ has pure virtual base classes that do not allow objects
> to be instantiated directly; they are meant to be inherited and then
> implemented.

Perl will throw an error if it can't find a method in a class hierarchy. It will do a few things first before dying, like looking for AUTOLOAD, etc. AUTOLOAD has its supporters and detractors; I try to stay away from it as much as possible.

Not sure about C++-like pure virtual classes in Perl5, i.e. not allowing direct object instantiation, but Perl6 is supposed to have them, at least according to Apocalypse 12. From what Mr. Wall says about OOP in Perl5, it's essentially 'bolted on' but works with caveats (is 'private' really 'private'?). Perl6 is rebuilt from scratch (internals are OO).

> Perl 6 was mentioned a bit back. Is this issue addressed there? Should it
> be? Do the Bioperl people feed their needs into Perl 6 so that all the code
> effort to make Bio::Root is handled for them in the next effort by Perl 6
> itself. Make the Perl 6 people solve these issues with your input, then
> you will not have to deal with implementing it yourselves. I'll just bet
> that you are not the only potential users of Perl 6 who will have to solve
> these issues eventually.

I think Perl6 will solve most (if not all) these problems since it's a complete rebuild. In fact, it's pretty much a new language altogether from what I have seen (and the little I have played around with using Pugs). Parrot is supposed to handle mixes of Perl5/Perl6, so it may not be necessary to immediately convert all of bioperl to Perl6. Though I have also heard of a Perl5->6 converter in the works as well...

From an OO standpoint, I believe everything is considered an object in Perl6, though it's not supposed to force you into using objects according to the Apocalypses that I have read. I actually see a lot there that reminds me of C++ (but in a Perl-ish way, of course). Apocalypse 12 is a good primer, though you may want to go through the others first, they're heavy slogging:

http://dev.perl.org/perl6/doc/design/apo/A12.html

Not sure what you mean by 'feeding our needs into Perl6'. I have periodically checked on perl6 progress and they seem to have everything well under control.

Chris

> ----- Original Message -----
> From: Hilmar Lapp
> Date: Wednesday, May 31, 2006 5:21 pm
> Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented
>
> >
> > On May 31, 2006, at 4:40 PM, Chris Fields wrote:
> >
> > > What about modules that have 'throw_not_implemented' statements
> > > present?
> >
> > Those are often if not always legitimate - the problem are those that
> > don't have them but fail to override an inherited interface or
> > abstract method.
> >
> > If something is not implemented what is the better way to express
> > this other than throwing an exception? (and if it's not an interface
> > or abstract base class, saying so in the documentation)
> >
> > -hilmar
> >
> > --
> > ===========================================================
> > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> > ===========================================================

From jay at jays.net Wed May 31 21:54:01 2006
From: jay at jays.net (Jay Hannah)
Date: Wed, 31 May 2006 20:54:01 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To:
References:
Message-ID: <447E48B9.4080503@jays.net>

Brian Osborne wrote:
> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we
> don't want to have to maintain two bptutorials.

We certainly wouldn't want to try to maintain two copies, one POD one in wiki. That would be the worst of all options. One option that hasn't been mentioned yet is to keep maintenance of that in POD in the distro (leaving the cool runability alone), and then flag that document as unchangeable in the wiki with a note on top "Maintenance of this document is done in POD in the distro. Submit POD patches to bioperl-l and we'll re-post an updated copy to this wiki."

Just a thought.

> - What do we do with the script part of bptutorial.pl? It certainly could be
> excised and put into the examples/ directory, for example, but this would
> break a few of the paths that are being used.

/README says this:

scripts/ - Useful production-quality scripts with POD documentation
examples/ - Scripts demonstrating the many uses of Bioperl

I'm personally not clear on the difference.
Little stuff should start in examples/ and graduate to scripts/ once they've matured?

Is the doc/ tree being abandoned?

doc/faq (empty?)
doc/howto
doc/howto/examples
doc/howto/figs (empty?)
doc/howto/html (empty?)
doc/howto/pdf (empty?)
doc/howto/sgml (empty?)
doc/howto/txt (empty?)
doc/howto/xml (empty?)

Does all that stuff officially live in and is being changed in the wiki, never to return to the distro?

Any reason those empty dirs aren't nuked out of CVS?

Chris Fields wrote:
> Jay, looks like there are still some weird formatting issues with the
> bptutorial wiki page, something which I ran into before when getting the
> Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more
> spaces preceding a line denotes code for some reason). Not much you can do
> in these cases except remove the extra spaces in those spots. Looking good
> though!

Sorry, I spent zero time on the whole conversion. I'm not sure what parts didn't convert well. I've never done that conversion before, and know nothing about mediawiki. I just blindly let Pod::Simple::Wiki do its thing then ran off to work. :)

Mauricio Herrera Cuadra wrote:
> I've added a link in the left menu of the wiki. If you think it should
> point to the Tutorials page instead of the Bptutorial.pl page please let
> me know.

Instead of all these competing links on the left, maybe we should have a master "documentation" page linked on the left cascading like so?

Documentation (linked on the left menu)
- Quick start
- FAQ
- HOWTOs
- Tutorials

(What's the conceptual difference between a HOWTO and a tutorial?)

It's hard for me to dive into a wiki lifestyle for the huge documentation pillars since it can't ever get back into the distro... (can it?) Small, throw away stuff is great for the wiki, but huge, established, thoughtful, long documents should be left in the distro? Present (and searchable) on the wiki but static?

Why isn't the short "Current events" just listed on the top of the "News" page?

Sick of my endless questions yet? -grin-

j

From cjfields at uiuc.edu Wed May 31 23:09:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 22:09:38 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E48B9.4080503@jays.net>
Message-ID: <000001c68528$d1b6ec10$15327e82@pyrimidine>

...
> We certainly wouldn't want to try to maintain two copies, one POD one in
> wiki. That would be the worst of all options. One option that hasn't been
> mentioned yet is to keep maintenance of that in POD in the distro (leaving
> the cool runability alone), and then flag that document as unchangeable in
> the wiki with a note on top "Maintenance of this document is done in POD
> in the distro. Submit POD patches to bioperl-l and we'll re-post an
> updated copy to this wiki."
>
> Just a thought.

There are probably three schools of thought on docs: those that like nice docs with links within and beyond BioPerl (hence the wiki), those who like including docs with the distribution, and those that would like both. The last would be nice but isn't realistic unless we can come up with a way to sync changes between the wiki and CVS, for those docs we want to include with the distribution, w/o too much trouble.

I'm in the first school of thought since rich text with links is better and more informative than plain text any day. It might be a very small school though...

> > - What do we do with the script part of bptutorial.pl? It certainly could be
> > excised and put into the examples/ directory, for example, but this would
> > break a few of the paths that are being used.
>
> /README says this:
>
> scripts/ - Useful production-quality scripts with POD documentation
> examples/ - Scripts demonstrating the many uses of Bioperl
>
> I'm personally not clear on the difference. Little stuff should start in
> examples/ and graduate to scripts/ once they've matured?
>
> Is the doc/ tree being abandoned?

Most docs have been moved over to the wiki, which generates nicely formatted docs for printing.

...
> Does all that stuff officially live in and is being changed in the wiki,
> never to return to the distro?

It's easier to add changes in the wiki and add markup, links, etc. Much richer text, so on.

> Any reason those empty dirs aren't nuked out of CVS?
>
> Chris Fields wrote:
> > Jay, looks like there are still some weird formatting issues with the
> > bptutorial wiki page, something which I ran into before when getting the
> > Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more
> > spaces preceding a line denotes code for some reason). Not much you can do
> > in these cases except remove the extra spaces in those spots. Looking good
> > though!
>
> Sorry, I spent zero time on the whole conversion. I'm not sure what parts
> didn't convert well. I've never done that conversion before, and know
> nothing about mediawiki. I just blindly let Pod::Simple::Wiki do its thing
> then ran off to work. :)

No big deal.

> Mauricio Herrera Cuadra wrote:
> > I've added a link in the left menu of the wiki. If you think it should
> > point to the Tutorials page instead of the Bptutorial.pl page please let
> > me know.
>
> Instead of all these competing links on the left, maybe we should have a
> master "documentation" page linked on the left cascading like so?
>
> Documentation (linked on the left menu)
> - Quick start
> - FAQ
> - HOWTOs
> - Tutorials

Okay, though Mauricio may know a bit more on how/if this can be done. Mauricio?

> (What's the conceptual difference between a HOWTO and a tutorial?)

I believe the reasoning is along these lines: HOWTOs focus on specific areas (graphics, trees, BLAST report parsing, etc.) and thus usually have greater detail. The tutorials are more broadly based (sort of a general bioperl HOWTO). The only exception is the Beginner's HOWTO, but even that has additional information over the tutorial (at least it did the last time I looked at the tutorial, which has been a while).

> It's hard for me to dive into a wiki lifestyle for the huge documentation
> pillars since it can't ever get back into the distro... (can it?) Small,
> throw away stuff is great for the wiki, but huge, established, thoughtful,
> long documents should be left in the distro? Present (and searchable) on
> the wiki but static?

Hence the problem we face now. It is something we need to really look into before adding too much more to the wiki. IMHO, I think we should have very little information directly in the distribution itself since it's already quite large. It's almost as easy to have a bare-bones INSTALL file, which would point to the wiki for additional information. But I may be very much alone in that train of thought ;>

> Why isn't the short "Current events" just listed on the top of the "News"
> page?

Don't know.

> Sick of my endless questions yet? -grin-

Not really.

cjf

> j

From gad14 at cornell.edu Tue May 30 12:57:41 2006
From: gad14 at cornell.edu (Genevieve DeClerck)
Date: Tue, 30 May 2006 12:57:41 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447BFB20.40501@mrc-dunn.cam.ac.uk>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
Message-ID: <447C7985.9000404@cornell.edu>

Thanks for your comment Sendu, it was very helpful.

I think this must be what's going on.. I am using $blast_report->next_result in both subroutines. It appears that analyzing the blast results first w/ my sort subroutine empties (?) the $blast_report object so that when I try to print, there is nothing left to print. (and vice versa when I print first then try to sort).
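
A note on the symptom described in the paragraph above: it is what one would expect if the report object is a forward-only stream, where every call to next_result consumes one result. The sketch below does not use Bio::SearchIO at all. ToyReport is a made-up stand-in class and collect_results is a hypothetical helper, both introduced only to show the pattern. It demonstrates one common workaround under that assumption: drain the iterator into an array once, then hand the same array to both the sorting and the printing code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a SearchIO-style report: a forward-only
# iterator whose next_result() returns undef once the items run out.
package ToyReport;

sub new {
    my ( $class, @items ) = @_;
    return bless { items => [@items] }, $class;
}

sub next_result {
    my ($self) = @_;
    return shift @{ $self->{items} };    # undef when nothing is left
}

package main;

# Drain the iterator once and keep every result in an array.
sub collect_results {
    my ($report) = @_;
    my @results;
    while ( defined( my $result = $report->next_result ) ) {
        push @results, $result;
    }
    return \@results;
}

my $report  = ToyReport->new(qw(result_A result_B result_C));
my $results = collect_results($report);

# Both consumers can now walk the same array independently.
my @sorted  = sort { $b cmp $a } @$results;
my @printed = @$results;

print "sorted:  @sorted\n";
print "printed: @printed\n";

# The iterator itself is spent, which mirrors the symptom in this thread.
my $leftover = $report->next_result;
print "iterator exhausted\n" unless defined $leftover;
```

With real BLAST reports the array would hold result objects rather than strings, so memory use grows with the size of the report. Whether that matters depends on how many results the report contains.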

So, from the looks of things, using next_result has the effect of popping the Bio::Search::Result::ResultI objects off of the SearchIO blast report object??

It seems I could get around this by making a copy of the blast report by setting it to another new variable...(not the most elegant solution) but I'm having trouble with this... If I do:

my $blast_report_copy = $blast_report;

I'm just copying the reference to the SearchIO blast result, so it doesn't help me. How can I make another physical copy of this blast result object? Seems like a simple thing but how to do it is escaping me.

But better yet, the way to go is to 'reset the counter,' or to find a way to look at/print/sort the results without removing data from the blast result object. How is this done though??

Sendu and Brian, I didn't post the sort_results subroutine because it is sprawling, as is a lot of my code. The code I provided was more like an aid for my explanation of the problem.. it doesn't actually run - sorry for the confusion, I should have been more clear on that. The important thing to know perhaps is that both sort_results and print_blast_results contain a foreach loop where I am using the 'next_result' method to view blast results.

(And to clarify for Torsten, the blastall() is working just fine - the analysis/viewing of the results object is where I am encountering the problem.)

Any other ideas would be greatly appreciated...

Thank you,
Genevieve

Sendu Bala wrote:
> Genevieve DeClerck wrote:
>
>> Hi,
>
> [snip]
>
>> If I've sorted the results the sorted-results will print to screen,
>> however when I try to print the Hit Table results nothing is returned,
>> as if the blast results have evaporated.... and visa versa, if i
>> comment out the part where i point my sorting subroutine to the blast
>> results reference, my hit table results suddenly prints to screen.
>
> [snip]
>
>> Here's an abbreviated version of my code:
>
> [snip]
>
>> #######
>> ### the following 2 actions seem to be mutually exclusive.
>> # 1) sort results into 1-hitter, 2-hitter, etc. groups of
>> #    SeqFeature objs stored in arrays. arrays are then printed
>> #    to stdout
>> &sort_results($blast_report);
>>
>> # 2) print blast results
>> &print_blast_results($blast_report);
>
>
>> sub print_blast_results{
>>   my $report = shift;
>>   while(my $result = $report->next_result()){
>
> [snip]
>
> You didn't give us your sort_results subroutine, but is it as simple as
> they both use $report->next_result (and/or $result->next_hit), but you
> don't reset the internal counter back to the start, so the second
> subroutine tries to get the next_result and finds the first subroutine
> has already looked at the last result and so next_result returns false?
>
> From a quick look it wasn't obvious how to reset the counter. Hopefully
> this can be done and someone else knows how.
>

From lstein at cshl.edu Wed May 31 11:17:39 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 11:17:39 -0400
Subject: [Bioperl-l] SOLVED Bio::Graphics::Panel make ruler have neg values
In-Reply-To: <5b6410e0605302045x5c420674x6f898a8a2973991a@mail.gmail.com>
References: <5b6410e0605302045x5c420674x6f898a8a2973991a@mail.gmail.com>
Message-ID: <200605311117.41479.lstein@cshl.edu>

Hi Kevin,

Since you are modifying the Panel.pm source code, why don't you just go ahead and use the current Bio::Graphics development tree? Since 1.5.1 it supports negative coordinates. Here's an illustration:

#!/usr/bin/perl

use strict;
use Bio::Graphics;
use Bio::Graphics::Feature;

# a feature spanning the whole region; used below for the ruler track
my $whole   = Bio::Graphics::Feature->new(-start=>-200,-end=>+200);
# a feature that runs from negative into positive coordinates
my $feature = Bio::Graphics::Feature->new(-start=>-100,-end=>+100,-strand=>+1);

# the panel itself starts at a negative coordinate
my $panel = Bio::Graphics::Panel->new(-start=> -200,
                                      -end  => +200,
                                      -width=>800,
                                      -pad_left=>10,
                                      -pad_right=>10);
$panel->add_track($whole,
                  -glyph=>'arrow',
                  -double=>1,
                  -tick=>2);
$panel->add_track($feature,
                  -glyph=>'box',
                  -stranded=>1);
print $panel->png;

exit 0;

The resulting image is attached.

Lincoln

On Tuesday 30 May 2006 23:45, Kevin Lam Koiyau wrote:
> I am so sorry for the truncated email accidentally hit reply.
> if anyone is interested i have opted to change
>
> change line 161 of arrow.pm in Perl/site/lib/Bio/Graphics/Glyph/arrow.pm
> in linux its
> /usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Glyph/arrow.pm
>
>
> $gd->string($font,$middle,$center+$a2-1,$label,$font_color)
>
> to
>
> $gd->string($font,$middle,$center+$a2-1,$label-1000,$font_color)
>
> just for this one-off use.
>
>
>
> strangely I found at line 112 for ver 1.51 bioperl in arrow.pm a hidden
> option for coords offset?
> my $relative_coords_offset = $self->option('relative_coords_offset');
> $relative_coords_offset = 1 unless defined $relative_coords_offset;
> but entering the option -relative_coords_offset=>1000 in the arrow glyphs
> didn't do anything...
>
>
> > Hi!
> >
> > oh it was in a slightly different header asking about the create image
> > map feature.
> > I am using the stable version 1.4 of bioperl now. In any case I have not
> > added the sequence as a feature annotated seq. as I already have the bp
> > where the TF binds (in 1-1050 numberings) so what I did was to just add
> > graded segments based on the position.
> > I saw that there is a scale function for the arrow glyp however, it is a
> > multiply function, can it be hacked to take in a offset value (ie minus the
> > scale by 1000?)
> >
> > cheers
> > kevin
> >
> > > Hi,
> > >
> > > For some reason I didn't see the first posting on this. In current bioperl
> > > live, the ruler can have negative numberings - I use this routinely. You need
> > > to create a feature that starts in negative coordinates. What is happening to
> > > you when you try this?
> > >
> > > Lincoln
> > >
> > > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> > > > Hi
> > > > thanks for the help offered thus far!
> > > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promtoer seq using
> > > > bioperl. therefore i was asked to make the numberings as such (-1000) is
> > > > there any way at all to do this in bioperl without changing the .pm file?
> > > >
> > > > thanks guys..
> > > > kevin
> > >
> > > --
> > > Lincoln D. Stein
> > > Cold Spring Harbor Laboratory
> > > 1 Bungtown Road
> > > Cold Spring Harbor, NY 11724
> > > (516) 367-8380 (voice)
> > > (516) 367-8389 (fax)
> > > FOR URGENT MESSAGES & SCHEDULING,
> > > PLEASE CONTACT MY ASSISTANT,
> > > SANDRA MICHELSEN, AT michelse at cshl.edu

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: negatives.png
Type: image/png
Size: 1065 bytes
Desc: not available
URL:

From lstein at cshl.edu Wed May 31 12:05:47 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 12:05:47 -0400
Subject: [Bioperl-l] Fwd: Re: SOLVED Bio::Graphics::Panel make ruler have neg values
Message-ID: <200605311205.48122.lstein@cshl.edu>

Oddly, bioperl-l listserver is holding this mail because it has "a suspicious header". I took out Kevin's email address in case it is the "spammotel" header that is bothering it.

Lincoln

---------- Forwarded Message ----------

Subject: Re: [Bioperl-l] SOLVED Bio::Graphics::Panel make ruler have neg values
Date: Wednesday 31 May 2006 11:17
From: Lincoln Stein
To: bioperl-l at lists.open-bio.org
Cc: "Kevin Lam Koiyau"

From rvosa at sfu.ca Tue May 30 15:10:17 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Tue, 30 May 2006 12:10:17 -0700
Subject: [Bioperl-l] New mailing list for Bio::Phylo
Message-ID: <447C9899.5060102@sfu.ca>

Dear recipients,

the open bioinformatics foundation has been kind enough to host a mailing list for Bio::Phylo (http://search.cpan.org/~rvosa/Bio-Phylo/, the cpan distribution for phylogenetic analysis using perl). The scope of this list is at present fairly broad as it is both meant for user questions and development discussion on deeper integration with bioperl. You are invited to sign up at: http://lists.open-bio.org/mailman/listinfo/bio-phylo-l

Best wishes,

Rutger Vos

--
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++

From cjfields at uiuc.edu Tue May 2 12:19:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 2 May 2006 11:19:34 -0500
Subject: [Bioperl-l] Bio::DB::GenBank and complexity
Message-ID: <000901c66e04$33e07370$15327e82@pyrimidine>

I ran into some wonkiness with using extra parameters ('seq_start', 'seq_stop', 'strand', and 'complexity') with Bio::DB::GenBank that I have gone through, fixed, and committed. I also have added a few tests to DB.t for everything (all changes were in Bio::DB::WebDBSeqI and Bio::DB::NCBIHelper).

The 'complexity' tag is the strangest, though I did manage to get it added as well (with tests).
This is how NCBI defines complexity: complexity regulates the display: 0 - get the whole blob 1 - get the bioseq for gi of interest (default in Entrez) 2 - get the minimal bioseq-set containing the gi of interest 3 - get the minimal nuc-prot containing the gi of interest 4 - get the minimal pub-set containing the gi of interest Here's my quandary; when setting complexity to '0', you get a glob back (the main sequence as well as any subsequences, such as CDS); this is in essence a sequence stream with multiple alphabet types. So, I now have it set up to do this: my $factory = Bio::DB::GenBank->new(-format => 'fasta', -complexity => 0 ); my $seqin = $factory->get_Seq_by_acc($acc); while (my $seq = $seqin->next_seq) { $seqout->write_seq($seq); } since I thought returning an array would be horrendously expensive on memory, esp. with larger sequences. Currently this is only set up for sequences which are retrieved when complexity is set to '0' so it's a pretty unique case. Regardless, I'm worried that, since users expect a Bio::Seq object instead of a Bio::SeqIO object here, it will cause a lot of confusion with the API. Any suggestions/gripes? Chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From mamillerpa at yahoo.com Tue May 2 07:41:01 2006 From: mamillerpa at yahoo.com (Mark A. Miller) Date: Tue, 2 May 2006 04:41:01 -0700 (PDT) Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines Message-ID: <20060502114101.29745.qmail@web50409.mail.yahoo.com> Hello all. I have a recently donwloaded UniProt/TrEMBL flat file. I am trying to make FASTA subset files for some bacterial strains. I haven't been able to parse out the strain information from the OS or RC lines. These lines typically look like: OS Somegenus somespecies subsp. somesubspecies strain ABC123. RC STRAIN=ABC123. I'm not especiialy good with Perl, and I'm definitely weak when it comes to OOP. 
I have included some code I pasted together from various pages on the bioperl wiki. In addition to the wiki, I have been making use of www.pasteur.fr/recherche/unites/sis/formation/bioperl/ch02s02.html

The code I have so far reports the species but not the subspecies or variant. I have also tried to walk through all of the feature, annotation and reference objects but I still can't seem to parse out the information I need. (For brevity, the example I'm including below only lists the code I used for the annotation objects.) Also, this code only prints the information... I know that I'll have to write a FASTA sequence object separately.

Any suggestions?

Thanks,
Mark

--- --- ---

#!/usr/bin/perl

use Bio::SeqIO;

my $usage  = "getaccs.pl file format\n";
my $file   = shift or die $usage;
my $format = shift or die $usage;

my $inseq = Bio::SeqIO->new(-file   => "<$file",
                            -format => $format );

while (my $seq = $inseq->next_seq) {

    # taxonomy fields as parsed from the OS/OC lines
    my $species_object = $seq->species;
    my $species_string = $species_object->species;
    my $variant_string = $species_object->variant;
    my $common_string  = $species_object->common_name;
    my $sub_string     = $species_object->sub_species;
    my $binomial       = $species_object->binomial('FULL');

    print "display ",   $seq->display_id, "\n";
    print "accession ", $seq->accession_number, "\n";
    print "desc ",      $seq->desc, "\n";

    print "species ",  $species_string, "\n";
    print "variant ",  $variant_string, "\n";
    print "common ",   $common_string, "\n";
    print "sub ",      $sub_string, "\n";
    print "binomial ", $binomial, "\n";

    print $seq->seq, "\n";

    # dump every annotation attached to the entry
    my $anno_collection = $seq->annotation;
    for my $key ( $anno_collection->get_all_annotation_keys ) {
        my @annotations = $anno_collection->get_Annotations($key);
        for my $value ( @annotations ) {
            print "tagname : ", $value->tagname, "\n";
            # $value is a Bio::Annotation, and has an "as_text" method
            print " annotation value: ", $value->as_text, "\n";

            if ($value->tagname eq "reference") {
                my $hash_ref = $value->hash_tree;
                for my $key (keys %{$hash_ref}) {
                    print $key, ": ", $hash_ref->{$key}, "\n";
                }
            }
        }
    }
    print "\n";
}
exit;

--- --- --- --- --- --- --- ---

Mark A. Miller

From cjfields at uiuc.edu Tue May 2 14:01:58 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 2 May 2006 13:01:58 -0500
Subject: [Bioperl-l] Bio::DB::GenBank and complexity
In-Reply-To: <000901c66e04$33e07370$15327e82@pyrimidine>
Message-ID: <000a01c66e12$8131a960$15327e82@pyrimidine>

I hate responding to my own post! Just wanted to add that I'm adding a warning to the get_Seq* methods, telling users to use the appropriate get_Stream* method when complexity == 0, before returning the Bio::SeqIO object.

CJF

From jason.stajich at duke.edu Tue May 2 14:36:08 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 2 May 2006 14:36:08 -0400
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines
In-Reply-To: <20060502114101.29745.qmail@web50409.mail.yahoo.com>
References: <20060502114101.29745.qmail@web50409.mail.yahoo.com>
Message-ID: <7B49D031-9F74-43C3-AA4F-2AE115BB843D@duke.edu>

This is really a limitation of the EMBL/GenBank format.

See this thread:
http://lists.open-bio.org/pipermail/bioperl-l/2006-March/021068.html
or on GMANE:
http://comments.gmane.org/gmane.comp.lang.perl.bio.general/10557

I don't know if any of this has been resolved really, so hopefully James will speak up if he's implemented anything.

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From mblanche at berkeley.edu Tue May 2 15:30:49 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Tue, 02 May 2006 12:30:49 -0700
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
Message-ID:

Dear all--

I have been trying to use the intersection function to extract the overlapping region from alternatively spliced exons, as in the following script. The object returned by 'my $overlap = $exon1->intersection($exon2);' is actually losing the strand of $exon1 if $exon1 is from the negative strand. Is this behavior expected? Should I check the strand of $exon1 before working on the object returned by any Bio::RangeI function?

Many thanks

#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GFF;

MAIN:{

    my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
                                -dsn     => 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
                                -user    => 'guest');
    my $test_db = $db->segment('4');

    # Load up the exons into $exons_p
    for my $gene ($test_db->features(-types => 'gene')){

        my $exons_p = extractExons($gene);

        cluster($exons_p) unless ($#{$exons_p} == -1);

    }
}

# Collect the unique exons of every processed transcript of a gene
sub extractExons {
    my $gene = shift;
    my %ex_list;
    my @tcs = $gene->features( -type       => 'processed_transcript',
                               -attributes => {Gene => $gene->group});

    for my $tc (@tcs){
        my @exons = $tc->features( -type       => 'exon',
                                   -attributes => {Parent => $tc->group}
                                 );

        for (@exons){
            my $ex_id = $_->id;
            $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
        }
    }
    my @values = values %ex_list;
    return(\@values);
}

# Compare every pair of exons and print the overlap of those that intersect
sub cluster {
    my $exons_p = shift;

    for (my $s = 0; $s <= $#{$exons_p}; $s++){
        for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
            my $exon1 = $exons_p->[$s];
            my $exon2 = $exons_p->[$t];

            if (!($exon1->equals($exon2)) &&
$exon1->overlaps($exon2)){ my $overlap = $exon1->intersection($exon2); print "===\n";; print "ex1\n", $exon1->seq, "\n"; print "ex2\n", $exon2->seq, "\n"; print "overlap\n", $overlap->seq, "\n"; } } } } ______________________________ Marco Blanchette, Ph.D. mblanche at uclink.berkeley.edu Donald C. Rio's lab Department of Molecular and Cell Biology 16 Barker Hall University of California Berkeley, CA 94720-3204 Tel: (510) 642-1084 Cell: (510) 847-0996 Fax: (510) 642-6062 -- From osborne1 at optonline.net Tue May 2 16:17:29 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Tue, 02 May 2006 16:17:29 -0400 Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF In-Reply-To: Message-ID: Marco, Yes, this is how intersection() is supposed to work. If both of the Range objects have the same strand then the strand information is returned as part of the result but if they aren't on the same strand then no strand information is returned. Brian O. On 5/2/06 3:30 PM, "Marco Blanchette" wrote: > Dear all-- > > I have been trying to use the intersection function to extract overlapping > region from alternatively spliced exons as in the following script. The > returned object from the 'my $overlap = $exon1->intersection($exon2);' is > actually loosing the strand of $exon1 if $exon1 is from the negative strand. > Is this behavior expected? Should I check the strand of $exon1 before > working on the object return by any Bio::RangeI function? 
> > Many thanks > > #!/usr/bin/perl > use strict; > use warnings; > use Bio::DB::GFF; > > MAIN:{ > > my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql', > -dsn => > 'dbi:mysql:database=dmel_43_LS;host=riolab.net', > -user => 'guest'); > my $test_db = $db->segment('4'); > > # Load up the exons into $exons_p > for my $gene ($test_db->features(-types => 'gene')){ > > my $exons_p = extractExons($gene); > > cluster($exons_p) unless ($#{$exons_p} == -1); > > } > } > > sub extractExons { > my $gene = shift; > my %ex_list; > my @tcs = $gene->features( -type =>'processed_transcript', > -attributes =>{Gene => $gene->group}); > > for my $tc (@tcs){ > my @exons = $tc->features (-type => 'exon', > -attributes => {Parent => $tc->group} > ); > > for (@exons){ > my $ex_id = $_->id; > $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id}); > > } > > } > my @values = values %ex_list; > return(\@values); > } > > sub cluster { > my $exons_p = shift; > > for (my $s = 0; $s <= $#{$exons_p}; $s++){ > for (my $t = $s+1; $t <= $#{$exons_p}; $t++){ > my $exon1 = $exons_p->[$s]; > my $exon2 = $exons_p->[$t]; > > if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){ > > my $overlap = $exon1->intersection($exon2); > > print "===\n";; > print "ex1\n", $exon1->seq, "\n"; > print "ex2\n", $exon2->seq, "\n"; > print "overlap\n", $overlap->seq, "\n"; > } > } > } > } > ______________________________ > Marco Blanchette, Ph.D. > > mblanche at uclink.berkeley.edu > > Donald C. 
Rio's lab > Department of Molecular and Cell Biology > 16 Barker Hall > University of California > Berkeley, CA 94720-3204 > > Tel: (510) 642-1084 > Cell: (510) 847-0996 > Fax: (510) 642-6062 From mblanche at berkeley.edu Tue May 2 16:32:58 2006 From: mblanche at berkeley.edu (Marco Blanchette) Date: Tue, 02 May 2006 13:32:58 -0700 Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF In-Reply-To: Message-ID: Brian-- Even when both elements of intersection() are from the negative strand, the return object is from the positive strand and $overlap is actually the revervese complement of the intersection between the 2 exons. Here is part of the output from the script below: === ex1 Strand: -1 CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTG ex2 Strand: -1 CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTGGTACGATGTCAAAGCTCCGAATATGTTTCAAACCCGT CAAATCG overlap Strand: 1 CAGTCCTTGCGAGAAAACGGGTCCACCACCTTCTTCTTACCGCCCTTCTTACCACCCTTGGAAAGACCTTTATTTT TGCCGACTGCCATGTTCAACTAATAAACCGG AAAAGGTCGAATCACGTTGACGACGTATGTGGAAAAAAG ... If both are from the positive strand, the return object is positive as in: === ex1 Strand: 1 CAACGCAGACGTGGTACGGCGTTTTAAATCTGATAACATTTTGAACCGGGAATTATTTTAGAGTACCATTCTTTGT TTTGTGCCTGTTTCAGTATAAATTAATTATG CGCCTGATTTAAAGTACAAAATGTGTAAATATATCACCTTACCGTCGCGGGTGCACCCAATTGTGCTTTGATGAAT AAATATACATATATGCAACATATATAACTTC CTGTGTTAGTATAAGTGTATGTCAGCCAAAAACAAATATATATATGAGTGTTTATCGGCATTCGTGTGCTGGCAGA GCAGCGATCAAAGCTGCGTTCGGTACTCGTT GACTGGCCCAAGAATGAATTCTCGTGCAAGTGTGTTGATAAAAAGTATACGTATGTAT ex2 Strand: 1 ATCGACAGTTGCCATCGTCGTTATTCCAGCACTAATTTAAAAAAAATTCGATCAACGCAGACGTG overlap Strand: 1 CAACGCAGACGTG Is there something I am missing? Here is the script generating the output Many thanks all... 
Marco use strict; use warnings; use Bio::DB::GFF; MAIN:{ my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql', -dsn => 'dbi:mysql:database=dmel_43_LS;host=riolab.net', -user => 'guest'); my $test_db = $db->segment('4'); # Load up the exons into $exons_p for my $gene ($test_db->features(-types => 'gene')){ my $exons_p = extractExons($gene); cluster($exons_p) unless ($#{$exons_p} == -1); } } sub extractExons { my $gene = shift; my %ex_list; my @tcs = $gene->features( -type =>'processed_transcript', -attributes =>{Gene => $gene->group}); for my $tc (@tcs){ my @exons = $tc->features (-type => 'exon', -attributes => {Parent => $tc->group} ); for (@exons){ my $ex_id = $_->id; $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id}); } } my @values = values %ex_list; return(\@values); } sub cluster { my $exons_p = shift; for (my $s = 0; $s <= $#{$exons_p}; $s++){ for (my $t = $s+1; $t <= $#{$exons_p}; $t++){ my $exon1 = $exons_p->[$s]; my $exon2 = $exons_p->[$t]; if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){ my $overlap = $exon1->intersection($exon2); print "===\n";; print "ex1\tStrand: ", $exon1->strand, "\n", $exon1->seq, "\n"; print "ex2\tStrand: ", $exon2->strand, "\n", $exon2->seq, "\n"; print "overlap\tStrand: ", $overlap->strand, "\n", $overlap->seq, "\n"; } } } } On 5/2/06 13:17, "Brian Osborne" wrote: > Marco, > > Yes, this is how intersection() is supposed to work. If both of the Range > objects have the same strand then the strand information is returned as part > of the result but if they aren't on the same strand then no strand > information is returned. > > Brian O. > > > On 5/2/06 3:30 PM, "Marco Blanchette" wrote: > >> Dear all-- >> >> I have been trying to use the intersection function to extract overlapping >> region from alternatively spliced exons as in the following script. The >> returned object from the 'my $overlap = $exon1->intersection($exon2);' is >> actually loosing the strand of $exon1 if $exon1 is from the negative strand. 
>> Is this behavior expected? Should I check the strand of $exon1 before >> working on the object return by any Bio::RangeI function? >> >> Many thanks >> >> #!/usr/bin/perl >> use strict; >> use warnings; >> use Bio::DB::GFF; >> >> MAIN:{ >> >> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql', >> -dsn => >> 'dbi:mysql:database=dmel_43_LS;host=riolab.net', >> -user => 'guest'); >> my $test_db = $db->segment('4'); >> >> # Load up the exons into $exons_p >> for my $gene ($test_db->features(-types => 'gene')){ >> >> my $exons_p = extractExons($gene); >> >> cluster($exons_p) unless ($#{$exons_p} == -1); >> >> } >> } >> >> sub extractExons { >> my $gene = shift; >> my %ex_list; >> my @tcs = $gene->features( -type =>'processed_transcript', >> -attributes =>{Gene => $gene->group}); >> >> for my $tc (@tcs){ >> my @exons = $tc->features (-type => 'exon', >> -attributes => {Parent => $tc->group} >> ); >> >> for (@exons){ >> my $ex_id = $_->id; >> $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id}); >> >> } >> >> } >> my @values = values %ex_list; >> return(\@values); >> } >> >> sub cluster { >> my $exons_p = shift; >> >> for (my $s = 0; $s <= $#{$exons_p}; $s++){ >> for (my $t = $s+1; $t <= $#{$exons_p}; $t++){ >> my $exon1 = $exons_p->[$s]; >> my $exon2 = $exons_p->[$t]; >> >> if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){ >> >> my $overlap = $exon1->intersection($exon2); >> >> print "===\n";; >> print "ex1\n", $exon1->seq, "\n"; >> print "ex2\n", $exon2->seq, "\n"; >> print "overlap\n", $overlap->seq, "\n"; >> } >> } >> } >> } >> ______________________________ >> Marco Blanchette, Ph.D. >> >> mblanche at uclink.berkeley.edu >> >> Donald C. Rio's lab >> Department of Molecular and Cell Biology >> 16 Barker Hall >> University of California >> Berkeley, CA 94720-3204 >> >> Tel: (510) 642-1084 >> Cell: (510) 847-0996 >> Fax: (510) 642-6062 > > ______________________________ Marco Blanchette, Ph.D. mblanche at uclink.berkeley.edu Donald C. 
Rio's lab Department of Molecular and Cell Biology 16 Barker Hall University of California Berkeley, CA 94720-3204 Tel: (510) 642-1084 Cell: (510) 847-0996 Fax: (510) 642-6062 -- From osborne1 at optonline.net Tue May 2 17:49:49 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Tue, 02 May 2006 17:49:49 -0400 Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF In-Reply-To: Message-ID: Marco, Odd, because the intersection() code is quite simple and it's clear how it should behave. What version of Bioperl are you using? I'm looking at the latest, in bioperl-live... Brian O. On 5/2/06 4:32 PM, "Marco Blanchette" wrote: > Brian-- > > Even when both elements of intersection() are from the negative strand, the > return object is from the positive strand and $overlap is actually the > revervese complement of the intersection between the 2 exons. Here is part > of the output from the script below: > > === > ex1 Strand: -1 > CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA > AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG > TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTG > ex2 Strand: -1 > CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA > AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG > TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTGGTACGATGTCAAAGCTCCGAATATGTTTCAAACCCGT > CAAATCG > overlap Strand: 1 > CAGTCCTTGCGAGAAAACGGGTCCACCACCTTCTTCTTACCGCCCTTCTTACCACCCTTGGAAAGACCTTTATTTT > TGCCGACTGCCATGTTCAACTAATAAACCGG > AAAAGGTCGAATCACGTTGACGACGTATGTGGAAAAAAG > ... 
> > If both are from the positive strand, the return object is positive as in: > > === > ex1 Strand: 1 > CAACGCAGACGTGGTACGGCGTTTTAAATCTGATAACATTTTGAACCGGGAATTATTTTAGAGTACCATTCTTTGT > TTTGTGCCTGTTTCAGTATAAATTAATTATG > CGCCTGATTTAAAGTACAAAATGTGTAAATATATCACCTTACCGTCGCGGGTGCACCCAATTGTGCTTTGATGAAT > AAATATACATATATGCAACATATATAACTTC > CTGTGTTAGTATAAGTGTATGTCAGCCAAAAACAAATATATATATGAGTGTTTATCGGCATTCGTGTGCTGGCAGA > GCAGCGATCAAAGCTGCGTTCGGTACTCGTT > GACTGGCCCAAGAATGAATTCTCGTGCAAGTGTGTTGATAAAAAGTATACGTATGTAT > ex2 Strand: 1 > ATCGACAGTTGCCATCGTCGTTATTCCAGCACTAATTTAAAAAAAATTCGATCAACGCAGACGTG > overlap Strand: 1 > CAACGCAGACGTG > > Is there something I am missing? Here is the script generating the output > > Many thanks all... > > Marco > > > use strict; > use warnings; > use Bio::DB::GFF; > > MAIN:{ > > my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql', > -dsn => > 'dbi:mysql:database=dmel_43_LS;host=riolab.net', > -user => 'guest'); > my $test_db = $db->segment('4'); > > # Load up the exons into $exons_p > for my $gene ($test_db->features(-types => 'gene')){ > > my $exons_p = extractExons($gene); > > cluster($exons_p) unless ($#{$exons_p} == -1); > > } > } > > sub extractExons { > my $gene = shift; > my %ex_list; > my @tcs = $gene->features( -type =>'processed_transcript', > -attributes =>{Gene => $gene->group}); > > for my $tc (@tcs){ > my @exons = $tc->features (-type => 'exon', > -attributes => {Parent => $tc->group} > ); > > for (@exons){ > my $ex_id = $_->id; > $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id}); > > } > > } > my @values = values %ex_list; > return(\@values); > } > > sub cluster { > my $exons_p = shift; > > for (my $s = 0; $s <= $#{$exons_p}; $s++){ > for (my $t = $s+1; $t <= $#{$exons_p}; $t++){ > my $exon1 = $exons_p->[$s]; > my $exon2 = $exons_p->[$t]; > > if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){ > > my $overlap = $exon1->intersection($exon2); > > print "===\n";; > print "ex1\tStrand: ", $exon1->strand, "\n", > $exon1->seq, 
"\n"; > print "ex2\tStrand: ", $exon2->strand, "\n", > $exon2->seq, "\n"; > print "overlap\tStrand: ", $overlap->strand, "\n", > $overlap->seq, "\n"; > } > } > } > } > > On 5/2/06 13:17, "Brian Osborne" wrote: > >> Marco, >> >> Yes, this is how intersection() is supposed to work. If both of the Range >> objects have the same strand then the strand information is returned as part >> of the result but if they aren't on the same strand then no strand >> information is returned. >> >> Brian O. >> >> >> On 5/2/06 3:30 PM, "Marco Blanchette" wrote: >> >>> Dear all-- >>> >>> I have been trying to use the intersection function to extract overlapping >>> region from alternatively spliced exons as in the following script. The >>> returned object from the 'my $overlap = $exon1->intersection($exon2);' is >>> actually loosing the strand of $exon1 if $exon1 is from the negative strand. >>> Is this behavior expected? Should I check the strand of $exon1 before >>> working on the object return by any Bio::RangeI function? 
>>> >>> Many thanks >>> >>> #!/usr/bin/perl >>> use strict; >>> use warnings; >>> use Bio::DB::GFF; >>> >>> MAIN:{ >>> >>> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql', >>> -dsn => >>> 'dbi:mysql:database=dmel_43_LS;host=riolab.net', >>> -user => 'guest'); >>> my $test_db = $db->segment('4'); >>> >>> # Load up the exons into $exons_p >>> for my $gene ($test_db->features(-types => 'gene')){ >>> >>> my $exons_p = extractExons($gene); >>> >>> cluster($exons_p) unless ($#{$exons_p} == -1); >>> >>> } >>> } >>> >>> sub extractExons { >>> my $gene = shift; >>> my %ex_list; >>> my @tcs = $gene->features( -type =>'processed_transcript', >>> -attributes =>{Gene => $gene->group}); >>> >>> for my $tc (@tcs){ >>> my @exons = $tc->features (-type => 'exon', >>> -attributes => {Parent => $tc->group} >>> ); >>> >>> for (@exons){ >>> my $ex_id = $_->id; >>> $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id}); >>> >>> } >>> >>> } >>> my @values = values %ex_list; >>> return(\@values); >>> } >>> >>> sub cluster { >>> my $exons_p = shift; >>> >>> for (my $s = 0; $s <= $#{$exons_p}; $s++){ >>> for (my $t = $s+1; $t <= $#{$exons_p}; $t++){ >>> my $exon1 = $exons_p->[$s]; >>> my $exon2 = $exons_p->[$t]; >>> >>> if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){ >>> >>> my $overlap = $exon1->intersection($exon2); >>> >>> print "===\n";; >>> print "ex1\n", $exon1->seq, "\n"; >>> print "ex2\n", $exon2->seq, "\n"; >>> print "overlap\n", $overlap->seq, "\n"; >>> } >>> } >>> } >>> } >>> ______________________________ >>> Marco Blanchette, Ph.D. >>> >>> mblanche at uclink.berkeley.edu >>> >>> Donald C. Rio's lab >>> Department of Molecular and Cell Biology >>> 16 Barker Hall >>> University of California >>> Berkeley, CA 94720-3204 >>> >>> Tel: (510) 642-1084 >>> Cell: (510) 847-0996 >>> Fax: (510) 642-6062 >> >> > > ______________________________ > Marco Blanchette, Ph.D. > > mblanche at uclink.berkeley.edu > > Donald C. 
Rio's lab > Department of Molecular and Cell Biology > 16 Barker Hall > University of California > Berkeley, CA 94720-3204 > > Tel: (510) 642-1084 > Cell: (510) 847-0996 > Fax: (510) 642-6062 From mblanche at berkeley.edu Tue May 2 18:31:44 2006 From: mblanche at berkeley.edu (Marco Blanchette) Date: Tue, 02 May 2006 15:31:44 -0700 Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF In-Reply-To: Message-ID: Brian-- I checked out last week version from the CVS. Silly question: How do I get the version of BioPerl I am using... Never had to check a module/bundle version number before... Marco On 5/2/06 14:49, "Brian Osborne" wrote: > Marco, > > Odd, because the intersection() code is quite simple and it's clear how it > should behave. What version of Bioperl are you using? I'm looking at the > latest, in bioperl-live... > > Brian O. > > > On 5/2/06 4:32 PM, "Marco Blanchette" wrote: > >> Brian-- >> >> Even when both elements of intersection() are from the negative strand, the >> return object is from the positive strand and $overlap is actually the >> revervese complement of the intersection between the 2 exons. Here is part >> of the output from the script below: >> >> === >> ex1 Strand: -1 >> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA >> AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG >> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTG >> ex2 Strand: -1 >> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA >> AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG >> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTGGTACGATGTCAAAGCTCCGAATATGTTTCAAACCCGT >> CAAATCG >> overlap Strand: 1 >> CAGTCCTTGCGAGAAAACGGGTCCACCACCTTCTTCTTACCGCCCTTCTTACCACCCTTGGAAAGACCTTTATTTT >> TGCCGACTGCCATGTTCAACTAATAAACCGG >> AAAAGGTCGAATCACGTTGACGACGTATGTGGAAAAAAG >> ... 
>> >> If both are from the positive strand, the return object is positive as in: >> >> === >> ex1 Strand: 1 >> CAACGCAGACGTGGTACGGCGTTTTAAATCTGATAACATTTTGAACCGGGAATTATTTTAGAGTACCATTCTTTGT >> TTTGTGCCTGTTTCAGTATAAATTAATTATG >> CGCCTGATTTAAAGTACAAAATGTGTAAATATATCACCTTACCGTCGCGGGTGCACCCAATTGTGCTTTGATGAAT >> AAATATACATATATGCAACATATATAACTTC >> CTGTGTTAGTATAAGTGTATGTCAGCCAAAAACAAATATATATATGAGTGTTTATCGGCATTCGTGTGCTGGCAGA >> GCAGCGATCAAAGCTGCGTTCGGTACTCGTT >> GACTGGCCCAAGAATGAATTCTCGTGCAAGTGTGTTGATAAAAAGTATACGTATGTAT >> ex2 Strand: 1 >> ATCGACAGTTGCCATCGTCGTTATTCCAGCACTAATTTAAAAAAAATTCGATCAACGCAGACGTG >> overlap Strand: 1 >> CAACGCAGACGTG >> >> Is there something I am missing? Here is the script generating the output >> >> Many thanks all... >> >> Marco >> >> >> use strict; >> use warnings; >> use Bio::DB::GFF; >> >> MAIN:{ >> >> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql', >> -dsn => >> 'dbi:mysql:database=dmel_43_LS;host=riolab.net', >> -user => 'guest'); >> my $test_db = $db->segment('4'); >> >> # Load up the exons into $exons_p >> for my $gene ($test_db->features(-types => 'gene')){ >> >> my $exons_p = extractExons($gene); >> >> cluster($exons_p) unless ($#{$exons_p} == -1); >> >> } >> } >> >> sub extractExons { >> my $gene = shift; >> my %ex_list; >> my @tcs = $gene->features( -type =>'processed_transcript', >> -attributes =>{Gene => $gene->group}); >> >> for my $tc (@tcs){ >> my @exons = $tc->features (-type => 'exon', >> -attributes => {Parent => $tc->group} >> ); >> >> for (@exons){ >> my $ex_id = $_->id; >> $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id}); >> >> } >> >> } >> my @values = values %ex_list; >> return(\@values); >> } >> >> sub cluster { >> my $exons_p = shift; >> >> for (my $s = 0; $s <= $#{$exons_p}; $s++){ >> for (my $t = $s+1; $t <= $#{$exons_p}; $t++){ >> my $exon1 = $exons_p->[$s]; >> my $exon2 = $exons_p->[$t]; >> >> if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){ >> >> my $overlap = $exon1->intersection($exon2); >> >> 
print "===\n";; >> print "ex1\tStrand: ", $exon1->strand, "\n", >> $exon1->seq, "\n"; >> print "ex2\tStrand: ", $exon2->strand, "\n", >> $exon2->seq, "\n"; >> print "overlap\tStrand: ", $overlap->strand, "\n", >> $overlap->seq, "\n"; >> } >> } >> } >> } >> >> On 5/2/06 13:17, "Brian Osborne" wrote: >> >>> Marco, >>> >>> Yes, this is how intersection() is supposed to work. If both of the Range >>> objects have the same strand then the strand information is returned as part >>> of the result but if they aren't on the same strand then no strand >>> information is returned. >>> >>> Brian O. >>> >>> >>> On 5/2/06 3:30 PM, "Marco Blanchette" wrote: >>> >>>> Dear all-- >>>> >>>> I have been trying to use the intersection function to extract overlapping >>>> region from alternatively spliced exons as in the following script. The >>>> returned object from the 'my $overlap = $exon1->intersection($exon2);' is >>>> actually loosing the strand of $exon1 if $exon1 is from the negative >>>> strand. >>>> Is this behavior expected? Should I check the strand of $exon1 before >>>> working on the object return by any Bio::RangeI function? 
>>>> >>>> Many thanks >>>> >>>> #!/usr/bin/perl >>>> use strict; >>>> use warnings; >>>> use Bio::DB::GFF; >>>> >>>> MAIN:{ >>>> >>>> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql', >>>> -dsn => >>>> 'dbi:mysql:database=dmel_43_LS;host=riolab.net', >>>> -user => 'guest'); >>>> my $test_db = $db->segment('4'); >>>> >>>> # Load up the exons into $exons_p >>>> for my $gene ($test_db->features(-types => 'gene')){ >>>> >>>> my $exons_p = extractExons($gene); >>>> >>>> cluster($exons_p) unless ($#{$exons_p} == -1); >>>> >>>> } >>>> } >>>> >>>> sub extractExons { >>>> my $gene = shift; >>>> my %ex_list; >>>> my @tcs = $gene->features( -type =>'processed_transcript', >>>> -attributes =>{Gene => $gene->group}); >>>> >>>> for my $tc (@tcs){ >>>> my @exons = $tc->features (-type => 'exon', >>>> -attributes => {Parent => $tc->group} >>>> ); >>>> >>>> for (@exons){ >>>> my $ex_id = $_->id; >>>> $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id}); >>>> >>>> } >>>> >>>> } >>>> my @values = values %ex_list; >>>> return(\@values); >>>> } >>>> >>>> sub cluster { >>>> my $exons_p = shift; >>>> >>>> for (my $s = 0; $s <= $#{$exons_p}; $s++){ >>>> for (my $t = $s+1; $t <= $#{$exons_p}; $t++){ >>>> my $exon1 = $exons_p->[$s]; >>>> my $exon2 = $exons_p->[$t]; >>>> >>>> if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){ >>>> >>>> my $overlap = $exon1->intersection($exon2); >>>> >>>> print "===\n";; >>>> print "ex1\n", $exon1->seq, "\n"; >>>> print "ex2\n", $exon2->seq, "\n"; >>>> print "overlap\n", $overlap->seq, "\n"; >>>> } >>>> } >>>> } >>>> } >>>> ______________________________ >>>> Marco Blanchette, Ph.D. >>>> >>>> mblanche at uclink.berkeley.edu >>>> >>>> Donald C. Rio's lab >>>> Department of Molecular and Cell Biology >>>> 16 Barker Hall >>>> University of California >>>> Berkeley, CA 94720-3204 >>>> >>>> Tel: (510) 642-1084 >>>> Cell: (510) 847-0996 >>>> Fax: (510) 642-6062 >>> >>> >> >> ______________________________ >> Marco Blanchette, Ph.D. 
>>
>> mblanche at uclink.berkeley.edu
>>
>> Donald C. Rio's lab
>> Department of Molecular and Cell Biology
>> 16 Barker Hall
>> University of California
>> Berkeley, CA 94720-3204
>>
>> Tel: (510) 642-1084
>> Cell: (510) 847-0996
>> Fax: (510) 642-6062
>
>
______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
--

From arareko at campus.iztacala.unam.mx Tue May 2 18:32:24 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Tue, 02 May 2006 17:32:24 -0500
Subject: [Bioperl-l] BioPerl-run in FreeBSD
Message-ID: <4457DDF8.4050005@campus.iztacala.unam.mx>

It's my great pleasure to announce the availability of the BioPerl-run
packages (stable & developer releases) for the FreeBSD operating system.

For instructions on how to install BioPerl ports in FreeBSD, please take
a look at the Getting Bioperl section of the BioPerl Wiki.

Regards,
Mauricio.

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From heikki at sanbi.ac.za Wed May 3 02:51:12 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 3 May 2006 08:51:12 +0200
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To:
References:
Message-ID: <200605030851.13007.heikki@sanbi.ac.za>

On Wednesday 03 May 2006 00:31, Marco Blanchette wrote:
> Brian--
>
> I checked out last week version from the CVS.
>
> Silly question: How do I get the version of BioPerl I am using... Never had
> to check a module/bundle version number before...

It is not that silly. The syntax is not too easy:

    perl -MBio::Perl -le 'print Bio::Perl->VERSION;'

You can use any module in bioperl, of course.
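
[Editor's sketch] The one-liner above can also be written as a small script. This is only a minimal sketch: `module_version` is a hypothetical helper name (it is not part of BioPerl), and it relies on nothing but Perl's built-in `VERSION` method, so it should work for any installed module, Bio::Perl included.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the $VERSION of a module, or undef if the
# module cannot be loaded (or the name does not look like a package name).
sub module_version {
    my ($module) = @_;
    # Accept plain package names only, so the string eval below stays safe.
    return unless $module =~ /\A\w+(?:::\w+)*\z/;
    return unless eval "require $module; 1";
    return $module->VERSION;
}

# Check the modules named on the command line, or Bio::Perl by default.
for my $module (@ARGV ? @ARGV : ('Bio::Perl')) {
    my $version = module_version($module);
    print "$module: ",
          (defined $version ? $version : 'not installed or no version'), "\n";
}
```

Run it without arguments to check Bio::Perl, or pass other module names (for example Bio::SeqIO) to check those instead.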
-Heikki > Marco > > On 5/2/06 14:49, "Brian Osborne" wrote: > > Marco, > > > > Odd, because the intersection() code is quite simple and it's clear how > > it should behave. What version of Bioperl are you using? I'm looking at > > the latest, in bioperl-live... > > > > Brian O. > > > > On 5/2/06 4:32 PM, "Marco Blanchette" wrote: > >> Brian-- > >> > >> Even when both elements of intersection() are from the negative strand, > >> the return object is from the positive strand and $overlap is actually > >> the revervese complement of the intersection between the 2 exons. Here > >> is part of the output from the script below: > >> > >> === > >> ex1 Strand: -1 > >> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAA > >>AATA AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG > >> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTG > >> ex2 Strand: -1 > >> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAA > >>AATA AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG > >> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTGGTACGATGTCAAAGCTCCGAATATGTTTCAAAC > >>CCGT CAAATCG > >> overlap Strand: 1 > >> CAGTCCTTGCGAGAAAACGGGTCCACCACCTTCTTCTTACCGCCCTTCTTACCACCCTTGGAAAGACCTTTA > >>TTTT TGCCGACTGCCATGTTCAACTAATAAACCGG > >> AAAAGGTCGAATCACGTTGACGACGTATGTGGAAAAAAG > >> ... > >> > >> If both are from the positive strand, the return object is positive as > >> in: > >> > >> === > >> ex1 Strand: 1 > >> CAACGCAGACGTGGTACGGCGTTTTAAATCTGATAACATTTTGAACCGGGAATTATTTTAGAGTACCATTCT > >>TTGT TTTGTGCCTGTTTCAGTATAAATTAATTATG > >> CGCCTGATTTAAAGTACAAAATGTGTAAATATATCACCTTACCGTCGCGGGTGCACCCAATTGTGCTTTGAT > >>GAAT AAATATACATATATGCAACATATATAACTTC > >> CTGTGTTAGTATAAGTGTATGTCAGCCAAAAACAAATATATATATGAGTGTTTATCGGCATTCGTGTGCTGG > >>CAGA GCAGCGATCAAAGCTGCGTTCGGTACTCGTT > >> GACTGGCCCAAGAATGAATTCTCGTGCAAGTGTGTTGATAAAAAGTATACGTATGTAT > >> ex2 Strand: 1 > >> ATCGACAGTTGCCATCGTCGTTATTCCAGCACTAATTTAAAAAAAATTCGATCAACGCAGACGTG > >> overlap Strand: 1 > >> CAACGCAGACGTG > >> > >> Is there something I am missing? 
Here is the script generating the > >> output > >> > >> Many thanks all... > >> > >> Marco > >> > >> > >> use strict; > >> use warnings; > >> use Bio::DB::GFF; > >> > >> MAIN:{ > >> > >> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql', > >> -dsn => > >> 'dbi:mysql:database=dmel_43_LS;host=riolab.net', > >> -user => 'guest'); > >> my $test_db = $db->segment('4'); > >> > >> # Load up the exons into $exons_p > >> for my $gene ($test_db->features(-types => 'gene')){ > >> > >> my $exons_p = extractExons($gene); > >> > >> cluster($exons_p) unless ($#{$exons_p} == -1); > >> > >> } > >> } > >> > >> sub extractExons { > >> my $gene = shift; > >> my %ex_list; > >> my @tcs = $gene->features( -type =>'processed_transcript', > >> -attributes =>{Gene => > >> $gene->group}); > >> > >> for my $tc (@tcs){ > >> my @exons = $tc->features (-type => 'exon', > >> -attributes => {Parent => > >> $tc->group} ); > >> > >> for (@exons){ > >> my $ex_id = $_->id; > >> $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id}); > >> > >> } > >> > >> } > >> my @values = values %ex_list; > >> return(\@values); > >> } > >> > >> sub cluster { > >> my $exons_p = shift; > >> > >> for (my $s = 0; $s <= $#{$exons_p}; $s++){ > >> for (my $t = $s+1; $t <= $#{$exons_p}; $t++){ > >> my $exon1 = $exons_p->[$s]; > >> my $exon2 = $exons_p->[$t]; > >> > >> if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){ > >> > >> my $overlap = $exon1->intersection($exon2); > >> > >> print "===\n";; > >> print "ex1\tStrand: ", $exon1->strand, "\n", > >> $exon1->seq, "\n"; > >> print "ex2\tStrand: ", $exon2->strand, "\n", > >> $exon2->seq, "\n"; > >> print "overlap\tStrand: ", $overlap->strand, "\n", > >> $overlap->seq, "\n"; > >> } > >> } > >> } > >> } > >> > >> On 5/2/06 13:17, "Brian Osborne" wrote: > >>> Marco, > >>> > >>> Yes, this is how intersection() is supposed to work. 
If both of the > >>> Range objects have the same strand then the strand information is > >>> returned as part of the result but if they aren't on the same strand > >>> then no strand information is returned. > >>> > >>> Brian O. > >>> > >>> On 5/2/06 3:30 PM, "Marco Blanchette" wrote: > >>>> Dear all-- > >>>> > >>>> I have been trying to use the intersection function to extract > >>>> overlapping region from alternatively spliced exons as in the > >>>> following script. The returned object from the 'my $overlap = > >>>> $exon1->intersection($exon2);' is actually loosing the strand of > >>>> $exon1 if $exon1 is from the negative strand. > >>>> Is this behavior expected? Should I check the strand of $exon1 before > >>>> working on the object return by any Bio::RangeI function? > >>>> > >>>> Many thanks > >>>> > >>>> #!/usr/bin/perl > >>>> use strict; > >>>> use warnings; > >>>> use Bio::DB::GFF; > >>>> > >>>> MAIN:{ > >>>> > >>>> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql', > >>>> -dsn => > >>>> 'dbi:mysql:database=dmel_43_LS;host=riolab.net', > >>>> -user => 'guest'); > >>>> my $test_db = $db->segment('4'); > >>>> > >>>> # Load up the exons into $exons_p > >>>> for my $gene ($test_db->features(-types => 'gene')){ > >>>> > >>>> my $exons_p = extractExons($gene); > >>>> > >>>> cluster($exons_p) unless ($#{$exons_p} == -1); > >>>> > >>>> } > >>>> } > >>>> > >>>> sub extractExons { > >>>> my $gene = shift; > >>>> my %ex_list; > >>>> my @tcs = $gene->features( -type =>'processed_transcript', > >>>> -attributes =>{Gene => > >>>> $gene->group}); > >>>> > >>>> for my $tc (@tcs){ > >>>> my @exons = $tc->features (-type => 'exon', > >>>> -attributes => {Parent => > >>>> $tc->group} ); > >>>> > >>>> for (@exons){ > >>>> my $ex_id = $_->id; > >>>> $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id}); > >>>> > >>>> } > >>>> > >>>> } > >>>> my @values = values %ex_list; > >>>> return(\@values); > >>>> } > >>>> > >>>> sub cluster { > >>>> my $exons_p = shift; > >>>> > 
>>>> for (my $s = 0; $s <= $#{$exons_p}; $s++){ > >>>> for (my $t = $s+1; $t <= $#{$exons_p}; $t++){ > >>>> my $exon1 = $exons_p->[$s]; > >>>> my $exon2 = $exons_p->[$t]; > >>>> > >>>> if (!($exon1->equals($exon2)) && > >>>> $exon1->overlaps($exon2)){ > >>>> > >>>> my $overlap = $exon1->intersection($exon2); > >>>> > >>>> print "===\n";; > >>>> print "ex1\n", $exon1->seq, "\n"; > >>>> print "ex2\n", $exon2->seq, "\n"; > >>>> print "overlap\n", $overlap->seq, "\n"; > >>>> } > >>>> } > >>>> } > >>>> } > >>>> ______________________________ > >>>> Marco Blanchette, Ph.D. > >>>> > >>>> mblanche at uclink.berkeley.edu > >>>> > >>>> Donald C. Rio's lab > >>>> Department of Molecular and Cell Biology > >>>> 16 Barker Hall > >>>> University of California > >>>> Berkeley, CA 94720-3204 > >>>> > >>>> Tel: (510) 642-1084 > >>>> Cell: (510) 847-0996 > >>>> Fax: (510) 642-6062 > >> > >> ______________________________ > >> Marco Blanchette, Ph.D. > >> > >> mblanche at uclink.berkeley.edu > >> > >> Donald C. Rio's lab > >> Department of Molecular and Cell Biology > >> 16 Barker Hall > >> University of California > >> Berkeley, CA 94720-3204 > >> > >> Tel: (510) 642-1084 > >> Cell: (510) 847-0996 > >> Fax: (510) 642-6062 > > ______________________________ > Marco Blanchette, Ph.D. > > mblanche at uclink.berkeley.edu > > Donald C. 
Rio's lab
> Department of Molecular and Cell Biology
> 16 Barker Hall
> University of California
> Berkeley, CA 94720-3204
>
> Tel: (510) 642-1084
> Cell: (510) 847-0996
> Fax: (510) 642-6062

--
    ______ _/      _/_____________________________________________________
          _/      _/
         _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
        _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
       _/  _/  _/  SANBI, South African National Bioinformatics Institute
      _/  _/  _/  University of Western Cape, South Africa
         _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
  ___ _/_/_/_/_/________________________________________________________

From nuclearn at gmail.com Wed May 3 02:05:42 2006
From: nuclearn at gmail.com (Li Xiao)
Date: Wed, 3 May 2006 14:05:42 +0800
Subject: [Bioperl-l] about the frame and strand of a blastx report
Message-ID: <150864390605022305p5a04e743l24938386af12edf3@mail.gmail.com>

Hi, anybody,

I am trying to parse a blastx report using the BioPerl modules
(Bio::SearchIO). The blastx result was created by NCBI-BLAST. How can I
obtain the strand (+ or -) of the query sequence against the hit protein?
I tried the strand function, but nothing was reported. I also tried the
frame function, but the result is usually 0, 1 or 2, so it does not give
any information about the query strand (+ or -).
How do I obtain the strand of a query sequence?
--
*********************************************************************
Li Xiao
Sichuan Key Laboratory of Molecular Biology and Biotechnology
College of Life Science, Sichuan University
Chengdu, SiChuan, P.R.China
TEL:86-28-85470083 FAX:86-28-85412738
E-MAIL: nuclearn at gmail.com
URL: http://scbi.scu.edu.cn
**********************************************************************

From cjfields at uiuc.edu Wed May 3 09:38:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 3 May 2006 08:38:17 -0500
Subject: [Bioperl-l] about the frame and strand of a blastx report
In-Reply-To: <150864390605022305p5a04e743l24938386af12edf3@mail.gmail.com>
Message-ID: <000601c66eb6$d5d5f530$15327e82@pyrimidine>

Use $hsp->strand():

    use strict;
    use warnings;
    use Bio::SearchIO;

    # Pass the BLAST report file as the first command-line argument.
    my $parser = Bio::SearchIO->new(-file   => shift @ARGV,
                                    -format => 'blast');
    while (my $result = $parser->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                # Ask explicitly for the strand of the query sequence.
                print $hsp->strand('query'), "\n";
            }
        }
    }

This will give 1 or -1.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Li Xiao
> Sent: Wednesday, May 03, 2006 1:06 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] about the frame and strand of a blastx report
>
> Hi, anybody,
>
> I am working to parse a blastx report by using BioPerl modules
> (Bio::SearchIO).
> The blastx result was created by NCBI-BLAST. How i can obtain the strand (
> +
> or -)
> of query sequence against the hited protein? I tried to use the strand
> function, but
> nothing were reported. And i used the frame funtion, the result usually
> display 0,1,2,
> so, the result can not give any information about the query strand( + o r-
> ).
> How i obtain the strand of a query squence?
> -- > ********************************************************************* > Li Xiao > Sichuan Key Laboratory of Molecular Biology and Biotechnology > College of Life Science, Sichuan University > Chengdu, SiChuan, P.R.China > TEL:86-28-85470083 FAX:86-28-85412738 > E-MAIL: nuclearn at gmail.com > URL: http://scbi.scu.edu.cn > ********************************************************************** > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From osborne1 at optonline.net Wed May 3 11:22:27 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Wed, 03 May 2006 11:22:27 -0400 Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines In-Reply-To: <20060502114101.29745.qmail@web50409.mail.yahoo.com> Message-ID: Mark, So you're trying to get the information in the RC line from a Swissprot format file? Brian O. On 5/2/06 7:41 AM, "Mark A. Miller" wrote: > Hello all. > > I have a recently donwloaded UniProt/TrEMBL flat file. I am trying to > make FASTA subset files for some bacterial strains. I haven't been > able to parse out the strain information from the OS or RC lines. > These lines typically look like: > > OS Somegenus somespecies subsp. somesubspecies strain ABC123. > RC STRAIN=ABC123. > > I'm not especiialy good with Perl, and I'm definitely weak when it > comes to OOP. > > I have included some code I pasted together from various pages on the > bioperl wiki. In addition to the wiki, I have been making use of > www.pasteur.fr/recherche/unites/sis/formation/bioperl/ch02s02.html > > The code I have so far reports the species but not the subspecies or > variant. I have also tried to walk through all of the feature, > annotation and reference objects but I still can't seem to parse out > the information I need. (For brevity, the example I'm including below > only lists the code I used for the annotation objects.) 
Also, this > code only prints the information... I know that I'll have to write a > FASTA sequence object seperately. > > Any suggestions? > > Thanks, > Mark > > --- --- --- > > > #!/usr/bin/perl > > > > use Bio::SeqIO; > > > > my $usage = "getaccs.pl file format\n"; > > my $file = shift or die $usage; > > my $format = shift or die $usage; > > > > my $inseq = Bio::SeqIO->new(-file => "<$file", > > -format => $format ); > > > > while (my $seq = $inseq->next_seq) { > > > > my $species_object = $seq->species; > > my $species_string = $species_object->species; > > my $variant_string = $species_object->variant; > > my $common_string = $species_object->common_name; > > my $sub_string = $species_object->sub_species; > > my $binomial = $species_object->binomial('FULL'); > > > > print "display ",$seq->display_id,"\n"; > > print "accession ",$seq->accession_number,"\n"; > > print "desc ",$seq->desc,"\n"; > > > > print "species ",$species_string,"\n"; > > print "variant ",$variant_string,"\n"; > > print "common ",$common_string,"\n"; > > print "sub ",$sub_string,"\n"; > > print "binomial ",$binomial,"\n"; > > > > print $seq->seq,"\n"; > > > > my $anno_collection = $seq->annotation; > > for my $key ( $anno_collection->get_all_annotation_keys ) { > > my @annotations = $anno_collection->get_Annotations($key); > > for my $value ( @annotations ) { > > print "tagname : ", $value->tagname, "\n"; > > # $value is an Bio::Annotation, and has an "as_text" method > > print " annotation value: ", $value->as_text, "\n"; > > > > if ($value->tagname eq "reference") { > > my $hash_ref = $value->hash_tree; > > for my $key (keys %{$hash_ref}) { > > print $key,": ",$hash_ref->{$key},"\n"; > > } > > } > > } > > } > > print "\n"; > > } > > exit; > > > > > > --- --- --- --- --- --- --- --- > > Mark A. Miller > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! 
Mail has the best spam protection around
> http://mail.yahoo.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From MEC at stowers-institute.org Wed May 3 11:09:04 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Wed, 3 May 2006 10:09:04 -0500
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
Message-ID:

Marco,

It appears that your code assumes that the exons returned from the call
to Bio::DB::GFF features() are sorted by start; I don't think that is
guaranteed (at least not in the documentation I'm reading). Also, I
think your code will not report overlap between two exons that have an
intervening overlapping exon. Depending on what your application is,
you may care. For example, e1, e2, e3 all intersect pairwise, but your
code won't report on e1's overlap with e3.

e1 ---*******-------
e2 -----******------
e3 ------***--------

Out of curiosity, what is your application? Designing primers for gene
resequencing?

Cheers,

Malcolm Cook
Database Applications Manager, Bioinformatics
Stowers Institute for Medical Research

>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>Marco Blanchette
>Sent: Tuesday, May 02, 2006 2:31 PM
>To: bioperl-l at lists.open-bio.org
>Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
>
>Dear all--
>
>I have been trying to use the intersection function to extract
>overlapping
>region from alternatively spliced exons as in the following script. The
>returned object from the 'my $overlap =
>$exon1->intersection($exon2);' is
>actually loosing the strand of $exon1 if $exon1 is from the
>negative strand.
>Is this behavior expected? Should I check the strand of $exon1 before
>working on the object return by any Bio::RangeI function?
> >Many thanks > >#!/usr/bin/perl >use strict; >use warnings; >use Bio::DB::GFF; > >MAIN:{ > > my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql', > -dsn => >'dbi:mysql:database=dmel_43_LS;host=riolab.net', > -user => 'guest'); > my $test_db = $db->segment('4'); > > # Load up the exons into $exons_p > for my $gene ($test_db->features(-types => 'gene')){ > > my $exons_p = extractExons($gene); > > cluster($exons_p) unless ($#{$exons_p} == -1); > > } >} > >sub extractExons { > my $gene = shift; > my %ex_list; > my @tcs = $gene->features( -type =>'processed_transcript', > -attributes =>{Gene => >$gene->group}); > > for my $tc (@tcs){ > my @exons = $tc->features (-type => 'exon', > -attributes => {Parent => >$tc->group} >); > > for (@exons){ > my $ex_id = $_->id; > $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id}); > > } > > } > my @values = values %ex_list; > return(\@values); >} > >sub cluster { > my $exons_p = shift; > > for (my $s = 0; $s <= $#{$exons_p}; $s++){ > for (my $t = $s+1; $t <= $#{$exons_p}; $t++){ > my $exon1 = $exons_p->[$s]; > my $exon2 = $exons_p->[$t]; > > if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){ > > my $overlap = $exon1->intersection($exon2); > > print "===\n";; > print "ex1\n", $exon1->seq, "\n"; > print "ex2\n", $exon2->seq, "\n"; > print "overlap\n", $overlap->seq, "\n"; > } > } > } >} >______________________________ >Marco Blanchette, Ph.D. > >mblanche at uclink.berkeley.edu > >Donald C. 
Rio's lab
>Department of Molecular and Cell Biology
>16 Barker Hall
>University of California
>Berkeley, CA 94720-3204
>
>Tel: (510) 642-1084
>Cell: (510) 847-0996
>Fax: (510) 642-6062
>--
>
>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From sdavis2 at mail.nih.gov Wed May 3 12:18:48 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 03 May 2006 12:18:48 -0400
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To:
Message-ID:

On 5/3/06 11:09 AM, "Cook, Malcolm" wrote:

> Marco,
>
> It appears that your code assumes that the exons as returned from call
> to BIO::DB::GFF::features are sorted by start; I don't think is
> guaranteed (at least not in the documentation I'm reading). Also I
> think your code will not report overlap between two exons that have an
> intervening overlapping exon. Depending on what you're application is,
> you may care. For example, e1, e2, e3 all intersect pairwise, but your
> code won't report on e1's overlap with e3.
>
> e1 ---*******-------
> e2 -----******------
> e3 ------***--------

I think this can be done (looking for "superexons") via the UCSC table
browser or via Penn State University's Galaxy server (written in Python and
downloadable) in case you want a quick solution to what I think is your
problem....

Sean

From osborne1 at optonline.net Wed May 3 16:22:57 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 03 May 2006 16:22:57 -0400
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines
In-Reply-To: <20060503193446.92476.qmail@web50412.mail.yahoo.com>
Message-ID:

Mark,

The RC line is part of the description of a reference; I'm guessing 'RC'
stands for Reference Comment. In order to get the attributes of a reference
you'll first do something like:

    my $anno_collection = $seq->annotation;
    my @references = $anno_collection->get_Annotations('reference');

To get the comment field for a specific reference you can do:

    $references[0]->comment;

See the Feature-Annotation HOWTO for more information on Annotations; the
Reference object is a kind of Annotation object.

Brian O.

On 5/3/06 3:34 PM, "Mark A. Miller" wrote:

> Yeah. Do you have any experience with that?
>
> Mark
>
> --- Brian Osborne wrote:
>
>> Mark,
>>
>> So you're trying to get the information in the RC line from a
>> Swissprot
>> format file?
>>
>> Brian O.
>
>
> --- --- --- --- --- --- --- ---
>
> Mark A. Miller
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam? Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com

From cjfields at uiuc.edu Wed May 3 17:09:36 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 3 May 2006 16:09:36 -0500
Subject: [Bioperl-l] Batch retrieval partially implemented in Bio::DB::GenBank/GenPept
Message-ID: <000601c66ef5$e3066d90$15327e82@pyrimidine>

Just wanted to let you guys know I have added a few bits and pieces to
Bio::DB::Gen* and BioLLDB::NCBIHelper for batch retrieval using
epost/efetch. I didn't want to break anything too severely so you can only
use this at the moment using get_seq_stream (i.e. NOT through get_Stream*
methods yet). I also added tests to DB.t, a few each for protein and
nucleotide retrieval using batch mode and so far they all pass fine.

I haven't tested the upper sequence limit for this yet to see if it's at all
comparable to just using efetch but it seems a bit faster. The eutils
coursebook states that one should only post ~500 at a time (I think you can
get a bit higher though).

Also, at the moment it only works for GI's (NOT accessions,
which apparently epost does not accept).
If we want to continue using this method for retrieval then we may need a workaround for accs. CJF Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From torsten.seemann at infotech.monash.edu.au Wed May 3 17:44:48 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Thu, 04 May 2006 07:44:48 +1000 Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF In-Reply-To: References: Message-ID: <1146692688.12571.1.camel@chauvel.csse.monash.edu.au> Marco, > Silly question: How do I get the version of BioPerl I am using... Never had > to check a module/bundle version number before... http://bioperl.org/wiki/FAQ#How_can_I_tell_what_version_of_BioPerl_is_installed.3F -- Torsten Seemann Victorian Bioinformatics Consortium From cjfields at uiuc.edu Wed May 3 18:08:37 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 3 May 2006 17:08:37 -0500 Subject: [Bioperl-l] Batch retrieval partially implemented inBio::DB::GenBank/GenPept In-Reply-To: <000601c66ef5$e3066d90$15327e82@pyrimidine> Message-ID: <000001c66efe$21dbcf80$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Chris Fields > Sent: Wednesday, May 03, 2006 4:10 PM > To: 'Jason Stajich'; 'Brian Osborne'; bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Batch retrieval partially implemented > inBio::DB::GenBank/GenPept > > Just wanted to let you guys know I have added a few bits and pieces to > Bio::DB::Gen* and BioLLDB::NCBIHelper for batch retrieval using ^^^^^^^^^^^^^^^^^^^ Bio::DB::NCBIHelper Fat fingers! > epost/efetch. I didn't want to break anything too severely so you can > only > use this at the moment using get_seq_stream (i.e. NOT through get_Stream* > methods yet). I also added tests to DB.t, a few each for protein and > nucleotide retrieval using batch mode and so far they all pass fine. 
> > I haven't tested the upper sequence limit for this yet to see if it's at > all > comparable to just using efetch but it seems a bit faster. The eutils > coursebook states that one should only post ~500 at a time (I think you > can > get a bit higher though). > > Also, at the moment it only works at the moment for GI's (NOT accessions, > which apparently epost does not accept). If we want to continue using > this > method for retrieval then we may need a workaround for accs. > > CJF > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From arareko at campus.iztacala.unam.mx Wed May 3 18:24:23 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Wed, 03 May 2006 17:24:23 -0500 Subject: [Bioperl-l] Batch retrieval partially implemented inBio::DB::GenBank/GenPept In-Reply-To: <000001c66efe$21dbcf80$15327e82@pyrimidine> References: <000001c66efe$21dbcf80$15327e82@pyrimidine> Message-ID: <44592D97.6090906@campus.iztacala.unam.mx> hehehe :) Chris Fields wrote: > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Chris Fields >> Sent: Wednesday, May 03, 2006 4:10 PM >> To: 'Jason Stajich'; 'Brian Osborne'; bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] Batch retrieval partially implemented >> inBio::DB::GenBank/GenPept >> >> Just wanted to let you guys know I have added a few bits and pieces to >> Bio::DB::Gen* and BioLLDB::NCBIHelper for batch retrieval using > ^^^^^^^^^^^^^^^^^^^ > Bio::DB::NCBIHelper > Fat fingers! > >> epost/efetch. I didn't want to break anything too severely so you can >> only >> use this at the moment using get_seq_stream (i.e. NOT through get_Stream* >> methods yet). 
I also added tests to DB.t, a few each for protein and >> nucleotide retrieval using batch mode and so far they all pass fine. >> >> I haven't tested the upper sequence limit for this yet to see if it's at >> all >> comparable to just using efetch but it seems a bit faster. The eutils >> coursebook states that one should only post ~500 at a time (I think you >> can >> get a bit higher though). >> >> Also, at the moment it only works at the moment for GI's (NOT accessions, >> which apparently epost does not accept). If we want to continue using >> this >> method for retrieval then we may need a workaround for accs. >> >> CJF >> >> Christopher Fields >> Postdoctoral Researcher - Switzer Lab >> Dept. of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Gen?tica Unidad de Morfofisiolog?a y Funci?n Facultad de Estudios Superiores Iztacala, UNAM From fernan at iib.unsam.edu.ar Wed May 3 20:38:07 2006 From: fernan at iib.unsam.edu.ar (Fernan Aguero) Date: Wed, 3 May 2006 21:38:07 -0300 Subject: [Bioperl-l] BioPerl-run in FreeBSD In-Reply-To: <4457DDF8.4050005@campus.iztacala.unam.mx> References: <4457DDF8.4050005@campus.iztacala.unam.mx> Message-ID: <20060504003807.GA86447@iib.unsam.edu.ar> +----[ Mauricio Herrera Cuadra (02.May.2006 19:49): | | It?s my great pleasure to announce the availability of the BioPerl-run | packages (stable & developer releases) for the FreeBSD operating system. | | For instructions on how to install BioPerl ports in FreeBSD, please take | a look into the Getting Bioperl section of the BioPerl Wiki. 
|
+----]

Great job Mauricio, thanks for contributing this!

Fernan

From miker at biotiquesystems.com Tue May 2 23:31:59 2006
From: miker at biotiquesystems.com (Michael Rogoff)
Date: Tue, 2 May 2006 20:31:59 -0700
Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
Message-ID: <007b01c66e62$23161d20$c100a8c0@mike>

I've encountered a pretty serious bug in Bio::SeqIO when parsing certain
genbank files that contain CONTIG entries with gaps. One such record is
NW_925173. When I try to parse this file using Bio::SeqIO::genbank, it will
enter an infinite loop and spin until it runs out of memory.

I'm pretty certain it relates to this bug:
http://bugzilla.bioperl.org/show_bug.cgi?id=1319
which seems to indicate that genbank records with CONTIG gaps are not valid
and can't be parsed. But this bug actually claims to be fixed, which is
strange, since looking at the code for FTLocationFactory (where the loop is)
it's still right there. I assume that this may be fixed in other contexts
but is still not fixed in Bio::SeqIO::genbank? Or am I doing something
wrong?

I think that this should probably be filed as an open bug. I would think
that even if bioperl isn't interested in parsing this type of file via
SeqIO, certainly you'd want to ensure that no finite input file would send
the parser into an infinite loop.

Have others encountered this problem? Is there any plan to address it?

Thanks very much for any information or help!

-Mike

P.S. I've played around with my version of FTLocationFactory and it seems to
actually work and parse the gaps. I'm not sure if I've created other bugs or
if it works in all cases, but at least the parser doesn't die. I also don't
know that my hacky code is appropriate for putting back in to BioPerl, but
I'm happy to provide it if someone wants to check it out and/or consider it
for checkin.
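
[Editor's sketch] For anyone who wants to reproduce the hang described above without letting the parser run away, the following is a minimal sketch, not a fix for the bug. It assumes a Unix-like system where `alarm` is available; `run_with_timeout` is a hypothetical helper name (not part of BioPerl), the 30-second limit is arbitrary, and the file name is whatever local copy of the record (for example NW_925173) is passed on the command line. The only BioPerl calls used are `Bio::SeqIO->new` and `next_seq`. A timeout may not help if memory runs out before the alarm fires.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run $code, but give up after $seconds.
# Returns (1, result) on success and (0, error message) on a timeout
# or any other exception thrown by $code.
sub run_with_timeout {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timed out after ${seconds}s\n" };
        alarm $seconds;
        $result = $code->();
        alarm 0;
        1;
    };
    my $error = $@;
    alarm 0;    # make sure no alarm is left pending after a failure
    return $ok ? (1, $result) : (0, $error || "unknown error\n");
}

# BioPerl is only loaded when a GenBank file is given on the command line.
if (@ARGV) {
    require Bio::SeqIO;
    my $file = shift @ARGV;
    my ($ok, $out) = run_with_timeout(30, sub {
        my $in  = Bio::SeqIO->new(-file => $file, -format => 'genbank');
        my $seq = $in->next_seq;
        return defined $seq ? $seq->display_id : '(no sequence found)';
    });
    print $ok ? "parsed: $out\n" : "parse failed: $out";
}
```

A "parse failed: timed out" message on a freshly downloaded record is the kind of evidence that is useful to attach to a bug report.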

From ULNJUJERYDIX at spammotel.com Wed May 3 04:20:38 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Wed, 3 May 2006 16:20:38 +0800
Subject: [Bioperl-l] Bio::Graphics::Panel imagemap making with Bio::Graphics::Panel
Message-ID: <5b6410e0605030120q31d1f554mbc4bf104deca48bf@mail.gmail.com>

Help! I can't figure out the instructions in the docs.

I want to create an imagemap of short sequence matches against a longer one,
with clickable areas for the short sequences.
I figure I can do this easily enough using the example script for parsing
blast output, but I need an example script to understand how to produce the
html code for the imagemap.
I can find only rather cryptic references about how this can be done (see
below).

$boxes = $panel->boxes
@boxes = $panel->boxes

The boxes() method returns a list of arrayrefs containing the coordinates of
each glyph. The method is useful for constructing an image map. In a scalar
context, boxes() returns an arrayref. In a list context, the method returns
the list directly.

Each member of the list is an arrayref of the following format:

  [ $feature, $x1, $y1, $x2, $y2, $track ]

The first element is the feature object; either an Ace::Sequence::Feature, a
Das::Segment::Feature, or another Bioperl Bio::SeqFeatureI object. The
coordinates are the topleft and bottomright corners of the glyph, including
any space allocated for labels. The track is the Bio::Graphics::Glyph object
corresponding to the track that the feature is rendered inside.

$position = $panel->track_position($track)

After calling gd() or boxes(), you can learn the resulting Y coordinate of a
track by calling track_position() with the value returned by add_track() or
unshift_track(). This will return undef if called before gd() or boxes() or
with an invalid track.

@pixel_coords = $panel->location2pixel(@feature_coords)

Public routine to map feature coordinates (in base pairs) into pixel
coordinates relative to the left-hand edge of the picture.
If you define a -background callback, the callback may wish to invoke this
routine in order to translate base coordinates into pixel coordinates.

$left = $panel->left
$right = $panel->right
$top = $panel->top
$bottom = $panel->bottom

Return the pixel coordinates of the *drawing area* of the panel, that is,
exclusive of the padding.

Got it from http://docs.bioperl.org/bioperl-live/Bio/Graphics/Panel.html

From s.johri at imperial.ac.uk Thu May 4 08:50:34 2006
From: s.johri at imperial.ac.uk (Johri, Saurabh)
Date: Thu, 4 May 2006 13:50:34 +0100
Subject: [Bioperl-l] Fu and Li's D statistic - calculate
Message-ID: <4A98ACB8EC146149872BAC9A132A582C277AB3@icex5.ic.ac.uk>

Hi all,

I'm trying to calculate Fu and Li's D summary statistic for a group of
sequences.

The function fu_and_li_D(@ingroup,$extmutations) takes 2 args, the first
being the ingroup (population) and the second being the number of external
mutations, which is calculated from an outgroup sequence.

My question is, which function do I use to calculate the number of external
mutations? Would this be the singleton_count() function?

The singleton_count() function takes a PopGen object, which represents a
clustal alignment file. Would I include the outgroup in a multiple fasta
file for alignment with clustal?

Any suggestions as to how to calculate the number of external mutations
would be much appreciated.

Thanks for your help!

Saurabh Johri
Centre for Molecular Microbiology & Infection
Imperial College
London SW7 2AZ

From hlapp at gmx.net Thu May 4 12:30:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 4 May 2006 12:30:05 -0400
Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
In-Reply-To: <007b01c66e62$23161d20$c100a8c0@mike>
References: <007b01c66e62$23161d20$c100a8c0@mike>
Message-ID:

Infinite loop on a file you can download (i.e., as opposed to a file you
tinkered with) is never ok. Could you file this as a bug report? And ideally
attach your patch?
Thanks, -hilmar On May 2, 2006, at 11:31 PM, Michael Rogoff wrote: > > I've encountered a pretty serious bug in Bio::SeqIO when parsing > certain genbank > files that contain CONTIG entries with gaps. One such record is > NW_925173. > > When I try to parse this file using Bio::SeqIO::genbank, it will > enter an > infinite loop and spin until it runs out of memory. > > I'm pretty certain it relates to this bug: > http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to > indicate that > genbank records with CONTIG gaps are not valid and can't be > parsed. But this > bug actually claims to be fixed, which is strange, since looking at > the code for > FTLocationFactory (where the loop is) it's still right there. I > assume that > this may be fixed in other contexts but is still not fixed in > Bio::SeqIO::genbank? Or am I doing something wrong? > > I think that this should probably be filed as an open bug. I would > think that > even if bioperl isn't interested in parsing this type of file via > SeqIO, > certainly you'd want to ensure that no finite input file would send > the parser > into an infinite loop. Have others encountered this problem? Is > there any plan > to address it? > > Thanks very much for any information or help! > > -Mike > > P.S. I've played around with my version of FTLocationFactory and > it seems to > actually work and parse the gaps. I'm not sure if I've created > other bugs or if > it works in all cases, but at least the parser doesn't die. I also > don't know > that my hacky code is appropriate for putting back in to BioPerl, > but I'm happy > to provide it if someone wants to check it out and/or consider it > for checkin. 
> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From saldroubi at yahoo.com Thu May 4 13:03:00 2006 From: saldroubi at yahoo.com (Sam Al-Droubi) Date: Thu, 4 May 2006 10:03:00 -0700 (PDT) Subject: [Bioperl-l] Is webiste down? Message-ID: <20060504170300.12178.qmail@web34301.mail.mud.yahoo.com> All, Is the bioperl website down? I can't get to http://www.bioperl.org Thank you. Sincerely, Sam Al-Droubi, M.S. saldroubi at yahoo.com From arareko at campus.iztacala.unam.mx Thu May 4 14:22:52 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Thu, 04 May 2006 13:22:52 -0500 Subject: [Bioperl-l] Is webiste down? In-Reply-To: <20060504170300.12178.qmail@web34301.mail.mud.yahoo.com> References: <20060504170300.12178.qmail@web34301.mail.mud.yahoo.com> Message-ID: <445A467C.4070700@campus.iztacala.unam.mx> Website is ok, maybe your gateway can't lookup the bioperl server at the moment. Regards, Mauricio. Sam Al-Droubi wrote: > All, > > Is the bioperl website down? I can't get to http://www.bioperl.org > > > Thank you. > > > > Sincerely, > Sam Al-Droubi, M.S. 
> saldroubi at yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Gen?tica Unidad de Morfofisiolog?a y Funci?n Facultad de Estudios Superiores Iztacala, UNAM From cjfields at uiuc.edu Thu May 4 14:40:32 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 4 May 2006 13:40:32 -0500 Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps In-Reply-To: <007b01c66e62$23161d20$c100a8c0@mike> Message-ID: <000001c66faa$3a25b130$15327e82@pyrimidine> Are you using the CONTIG record or the full GenBank file? I see problems with both (using bioperl-live) which seem unrelated to one another. The full file seems to be running a bit slow b/c the full GenBank record is huge (~55 MB) but the CONTIG file does exactly what you said (runs out of memory). Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Michael Rogoff > Sent: Tuesday, May 02, 2006 10:32 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > > I've encountered a pretty serious bug in Bio::SeqIO when parsing certain > genbank > files that contain CONTIG entries with gaps. One such record is > NW_925173. > > When I try to parse this file using Bio::SeqIO::genbank, it will enter an > infinite loop and spin until it runs out of memory. > > I'm pretty certain it relates to this bug: > http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to indicate > that > genbank records with CONTIG gaps are not valid and can't be parsed. But > this > bug actually claims to be fixed, which is strange, since looking at the > code for > FTLocationFactory (where the loop is) it's still right there. 
I assume > that > this may be fixed in other contexts but is still not fixed in > Bio::SeqIO::genbank? Or am I doing something wrong? > > I think that this should probably be filed as an open bug. I would think > that > even if bioperl isn't interested in parsing this type of file via SeqIO, > certainly you'd want to ensure that no finite input file would send the > parser > into an infinite loop. Have others encountered this problem? Is there > any plan > to address it? > > Thanks very much for any information or help! > > -Mike > > P.S. I've played around with my version of FTLocationFactory and it seems > to > actually work and parse the gaps. I'm not sure if I've created other bugs > or if > it works in all cases, but at least the parser doesn't die. I also don't > know > that my hacky code is appropriate for putting back in to BioPerl, but I'm > happy > to provide it if someone wants to check it out and/or consider it for > checkin. > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From j.abbott at imperial.ac.uk Thu May 4 11:44:44 2006 From: j.abbott at imperial.ac.uk (James Abbott) Date: Thu, 04 May 2006 16:44:44 +0100 Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines In-Reply-To: <7B49D031-9F74-43C3-AA4F-2AE115BB843D@duke.edu> References: <20060502114101.29745.qmail@web50409.mail.yahoo.com> <7B49D031-9F74-43C3-AA4F-2AE115BB843D@duke.edu> Message-ID: <445A216C.7090108@imperial.ac.uk> Jason Stajich wrote: > I don't know if any of this has been resolved really so hopefully > James will speak up if he's implemented anything. Not as yet, I'm afraid - $job is keeping me overly busy at the moment, but it's on my todo list.... Cheers, James -- Dr. 
James Abbott Bioinformatics Software Developer, Bioinformatics Support Service Imperial College, London From hubert.prielinger at gmx.at Thu May 4 15:35:42 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Thu, 04 May 2006 13:35:42 -0600 Subject: [Bioperl-l] can't parse blast file anymore Message-ID: <445A578E.8050207@gmx.at> Hi, the following perl script worked fine until a few days ago.... ============================================================== #!/usr/bin/perl -w use Bio::SearchIO; use strict; use DBI; use Net::MySQL; #use lib qw(/usr/local/lib/perl5/site_perl/5.8.6/i686-linux); print "trying to connect to database \n"; my $database = 'antimicro_peptides'; my $host = 'ppc7.bio.ucalgary.ca'; my $user = 'Hubert'; my $password = 'Col00eng30'; my $mysql = Net::MySQL->new( hostname => $host, database => $database, user => $user, password => $password, ); print "Connection established \n"; my $selectID = 0; my $count = 0; ##output database results #while (my @row = $sth->fetchrow_array) # { print "@row\n" } print "start program\n"; my $directory = '/home/Hubert/test'; opendir(DIR, $directory) || die("Cannot open directory"); print "opened directory\n"; foreach my $file (readdir(DIR)) { if ($file =~ /txt$/) { $count++; print "read file $file \n"; $file = $directory . '/' . $file; my $search = new Bio::SearchIO (-format => 'blast', -file => $file); print "bioperl seems to work....\n"; my $cutoff_len = 10; #iterate over each query sequence print "try to enter while loop\n"; while (my $result = $search->next_result) { print "entered 1st while loop\n"; #iterate over each hit on the query sequence while (my $hit = $result->next_hit) { print "entered 2nd while loop\n"; #iterate over each HSP in the hit while (my $hsp = $hit->next_hsp) { print "entered 3rd while loop\n"; if ($hsp->length('sbjct') <= $cutoff_len) { #print $hsp->hit_string, "\n"; for ($hsp->hit_string) { #$hsp->hit_string print "count files....., $count ,\n"; ................. 
===================================================================

Output:

[Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl
trying to connect to database
Connection established
start program
opened directory
read file 40026.txt
bioperl seems to work....
try to enter while loop

but it doesn't enter the first while loop; it gets stuck there. At first I
thought it was a Linux problem, because I updated from FC4 to FC5, but it
isn't, because perl is working fine, and it seems bioperl is working fine
too, but it cannot parse the file anymore.....

regards
Hubert

From barry.moore at genetics.utah.edu Thu May 4 17:22:51 2006
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 4 May 2006 15:22:51 -0600
Subject: [Bioperl-l] [BULK] can't parse blast file anymore
In-Reply-To: <445A578E.8050207@gmx.at>
References: <445A578E.8050207@gmx.at>
Message-ID:

Hubert,

My first suggestion would be to log onto your Calgary server and change your
password real quick (unless you intended to post your password to the
world).

Well, this isn't an answer, but it may help you find one. Use perl -d
your_script.pl to run your script under the debugger. Type 'n' to step
forward to the line where you start the while loop. Type 'x $result' to see
that an object exists (it should or you'd have gotten an error). Type 's' to
step into the next_result call, and then continue to type 'n' and 's' as
needed to burrow down to see if you can find where you're hanging.

Barry

On May 4, 2006, at 1:35 PM, Hubert Prielinger wrote:

> Hi,
> the following perl script worked fine until a few days ago....
> > ============================================================== > #!/usr/bin/perl -w > > use Bio::SearchIO; > use strict; > use DBI; > use Net::MySQL; > > #use lib qw(/usr/local/lib/perl5/site_perl/5.8.6/i686-linux); > > print "trying to connect to database \n"; > my $database = 'antimicro_peptides'; > my $host = 'ppc7.bio.ucalgary.ca'; > my $user = 'Hubert'; > my $password = 'Col00eng30'; > > my $mysql = Net::MySQL->new( > hostname => $host, > database => $database, > user => $user, > password => $password, > ); > > > print "Connection established \n"; > > my $selectID = 0; > my $count = 0; > > > > ##output database results > #while (my @row = $sth->fetchrow_array) > # { print "@row\n" } > > > > print "start program\n"; > my $directory = '/home/Hubert/test'; > opendir(DIR, $directory) || die("Cannot open directory"); > print "opened directory\n"; > > foreach my $file (readdir(DIR)) { > if ($file =~ /txt$/) { > $count++; > print "read file $file \n"; > > > $file = $directory . '/' . $file; > > my $search = new Bio::SearchIO (-format => 'blast', > -file => $file); > print "bioperl seems to work....\n"; > my $cutoff_len = 10; > > #iterate over each query sequence > print "try to enter while loop\n"; > while (my $result = $search->next_result) { > print "entered 1st while loop\n"; > > #iterate over each hit on the query sequence > while (my $hit = $result->next_hit) { > print "entered 2nd while loop\n"; > > #iterate over each HSP in the hit > while (my $hsp = $hit->next_hsp) { > print "entered 3rd while loop\n"; > > if ($hsp->length('sbjct') <= $cutoff_len) { > #print $hsp->hit_string, "\n"; > > for ($hsp->hit_string) { #$hsp->hit_string > print "count files....., $count ,\n"; > ................. 
> > =================================================================== > > Output: > > [Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl > trying to connect to database > Connection established > start program > opened directory > read file 40026.txt > bioperl seems to work.... > try to enter while loop > > > but it doesn't enter the first while loop, it stuck there, first I > thought it is a linux problem, because I updated from FC4 to FC5, > but it > isn't because perl is working fine, and it seems bioperl is working > fine > too, but it cannot parse the file anymore..... > > regards > Hubert > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Thu May 4 18:27:57 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 4 May 2006 17:27:57 -0500 Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps In-Reply-To: <000001c66faa$3a25b130$15327e82@pyrimidine> Message-ID: <000001c66fc9$fe7e5680$15327e82@pyrimidine> Here's another odd bit. This is what I get for the CONTIG line when I passed a simple contig file (NW_925062, with one join) through Bio::SeqIO: ----------------------------------- .... FEATURES Location/Qualifiers source 1..8541 /db_xref="taxon:9606" /mol_type="genomic DNA" /chromosome="11" /organism="Homo sapiens" CONTIG AADB02014027.1:1..8541 // ----------------------------------- Here's the original: ----------------------------------- FEATURES Location/Qualifiers source 1..8541 /organism="Homo sapiens" /mol_type="genomic DNA" /db_xref="taxon:9606" /chromosome="11" CONTIG join(AADB02014027.1:1..8541) // ----------------------------------- Looks like it lopped out the 'join' here as well. 
Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Chris Fields > Sent: Thursday, May 04, 2006 1:41 PM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > Are you using the CONTIG record or the full GenBank file? I see > problems with both (using bioperl-live) which seem unrelated to one > another. > The full file seems to be running a bit slow b/c the full GenBank record > is > huge (~55 MB) but the CONTIG file does exactly what you said (runs out of > memory). > > Chris > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Michael Rogoff > > Sent: Tuesday, May 02, 2006 10:32 PM > > To: bioperl-l at lists.open-bio.org > > Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > > > > > I've encountered a pretty serious bug in Bio::SeqIO when parsing certain > > genbank > > files that contain CONTIG entries with gaps. One such record is > > NW_925173. > > > > When I try to parse this file using Bio::SeqIO::genbank, it will enter > an > > infinite loop and spin until it runs out of memory. > > > > I'm pretty certain it relates to this bug: > > http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to indicate > > that > > genbank records with CONTIG gaps are not valid and can't be parsed. But > > this > > bug actually claims to be fixed, which is strange, since looking at the > > code for > > FTLocationFactory (where the loop is) it's still right there. I assume > > that > > this may be fixed in other contexts but is still not fixed in > > Bio::SeqIO::genbank? Or am I doing something wrong? > > > > I think that this should probably be filed as an open bug. 
I would > think > > that > > even if bioperl isn't interested in parsing this type of file via SeqIO, > > certainly you'd want to ensure that no finite input file would send the > > parser > > into an infinite loop. Have others encountered this problem? Is there > > any plan > > to address it? > > > > Thanks very much for any information or help! > > > > -Mike > > > > P.S. I've played around with my version of FTLocationFactory and it > seems > > to > > actually work and parse the gaps. I'm not sure if I've created other > bugs > > or if > > it works in all cases, but at least the parser doesn't die. I also > don't > > know > > that my hacky code is appropriate for putting back in to BioPerl, but > I'm > > happy > > to provide it if someone wants to check it out and/or consider it for > > checkin. > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Thu May 4 18:39:05 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 4 May 2006 18:39:05 -0400 Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps In-Reply-To: <000001c66fc9$fe7e5680$15327e82@pyrimidine> References: <000001c66fc9$fe7e5680$15327e82@pyrimidine> Message-ID: <2E0D7723-FA6E-4812-8DBB-30FCD11FA85C@gmx.net> The two notations are equivalent and syntactically correct, or so I believe ... I don't think 100% verbatim preservation should be the goal. Or am I missing the point? On May 4, 2006, at 6:27 PM, Chris Fields wrote: > Here's another odd bit. This is what I get for the CONTIG line when I > passed a simple contig file (NW_925062, with one join) through > Bio::SeqIO: > > ----------------------------------- > .... 
> FEATURES Location/Qualifiers > source 1..8541 > /db_xref="taxon:9606" > /mol_type="genomic DNA" > /chromosome="11" > /organism="Homo sapiens" > CONTIG AADB02014027.1:1..8541 > > // > ----------------------------------- > Here's the original: > ----------------------------------- > FEATURES Location/Qualifiers > source 1..8541 > /organism="Homo sapiens" > /mol_type="genomic DNA" > /db_xref="taxon:9606" > /chromosome="11" > CONTIG join(AADB02014027.1:1..8541) > // > ----------------------------------- > > Looks like it lopped out the 'join' here as well. > > Chris > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Chris Fields >> Sent: Thursday, May 04, 2006 1:41 PM >> To: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps >> >> Are you using the CONTIG record or the full GenBank file? I see >> problems with both (using bioperl-live) which seem unrelated to one >> another. >> The full file seems to be running a bit slow b/c the full GenBank >> record >> is >> huge (~55 MB) but the CONTIG file does exactly what you said (runs >> out of >> memory). >> >> Chris >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Michael Rogoff >>> Sent: Tuesday, May 02, 2006 10:32 PM >>> To: bioperl-l at lists.open-bio.org >>> Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps >>> >>> >>> I've encountered a pretty serious bug in Bio::SeqIO when parsing >>> certain >>> genbank >>> files that contain CONTIG entries with gaps. One such record is >>> NW_925173. >>> >>> When I try to parse this file using Bio::SeqIO::genbank, it will >>> enter >> an >>> infinite loop and spin until it runs out of memory. 
>>> >>> I'm pretty certain it relates to this bug: >>> http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to >>> indicate >>> that >>> genbank records with CONTIG gaps are not valid and can't be >>> parsed. But >>> this >>> bug actually claims to be fixed, which is strange, since looking >>> at the >>> code for >>> FTLocationFactory (where the loop is) it's still right there. I >>> assume >>> that >>> this may be fixed in other contexts but is still not fixed in >>> Bio::SeqIO::genbank? Or am I doing something wrong? >>> >>> I think that this should probably be filed as an open bug. I would >> think >>> that >>> even if bioperl isn't interested in parsing this type of file via >>> SeqIO, >>> certainly you'd want to ensure that no finite input file would >>> send the >>> parser >>> into an infinite loop. Have others encountered this problem? Is >>> there >>> any plan >>> to address it? >>> >>> Thanks very much for any information or help! >>> >>> -Mike >>> >>> P.S. I've played around with my version of FTLocationFactory and it >> seems >>> to >>> actually work and parse the gaps. I'm not sure if I've created >>> other >> bugs >>> or if >>> it works in all cases, but at least the parser doesn't die. I also >> don't >>> know >>> that my hacky code is appropriate for putting back in to BioPerl, >>> but >> I'm >>> happy >>> to provide it if someone wants to check it out and/or consider it >>> for >>> checkin. 
>>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hubert.prielinger at gmx.at Thu May 4 19:57:44 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Thu, 04 May 2006 17:57:44 -0600 Subject: [Bioperl-l] can't parse blast file anymore In-Reply-To: <445A7449.1080607@infotech.monash.edu.au> References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au> Message-ID: <445A94F8.9000903@gmx.at> Torsten Seemann wrote: > Hubert > >> the following perl script worked fine until a few days ago.... >> >> #iterate over each query sequence >> print "try to enter while loop\n"; >> >> > die "Bad BLAST report" if not defined $search; > >> while (my $result = $search->next_result) { >> print "entered 1st while loop\n"; >> >> Output: >> >> [Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl >> try to enter while loop >> >> but it doesn't enter the first while loop, it stuck there, first I >> > What is the value of $search before you start the WHILE loop ? 
> > hi, $search is defined, like my $search = new Bio::SearchIO (-format => 'blast', -file => $file) if I try it with the debugger as barry has suggested than I get the following DB<1> n main::(Blast.pl:24): print "Connection established \n"; DB<1> n Connection established main::(Blast.pl:26): my $selectID = 0; DB<1> n main::(Blast.pl:27): my $count = 0; DB<1> n main::(Blast.pl:37): print "start program\n"; DB<1> n start program main::(Blast.pl:38): my $directory = '/home/Hubert/test'; DB<1> n main::(Blast.pl:39): opendir(DIR, $directory) || die("Cannot open directory"); DB<1> n main::(Blast.pl:40): print "opened directory\n"; DB<1> n opened directory main::(Blast.pl:42): foreach my $file (readdir(DIR)) { DB<1> n main::(Blast.pl:43): if ($file =~ /txt$/) { DB<1> n main::(Blast.pl:44): $count++; DB<1> n main::(Blast.pl:45): print "read file $file \n"; DB<1> n read file 40026.txt main::(Blast.pl:48): $file = $directory . '/' . $file; DB<1> n main::(Blast.pl:50): my $search = new Bio::SearchIO (-format => 'blast', main::(Blast.pl:51): -file => $file); DB<1> n main::(Blast.pl:52): print "bioperl seems to work....\n"; DB<1> s $search main::((eval 14)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3): 3: $search; DB<<2>> n DB<2> n bioperl seems to work.... main::(Blast.pl:53): my $cutoff_len = 10; DB<2> n main::(Blast.pl:56): print "try to enter while loop\n"; DB<2> n try to enter while loop main::(Blast.pl:57): while (my $result = $search->next_result) { DB<2> s $result main::((eval 15)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3): 3: $result; DB<<3>> From torsten.seemann at infotech.monash.edu.au Thu May 4 17:38:17 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 05 May 2006 07:38:17 +1000 Subject: [Bioperl-l] can't parse blast file anymore In-Reply-To: <445A578E.8050207@gmx.at> References: <445A578E.8050207@gmx.at> Message-ID: <445A7449.1080607@infotech.monash.edu.au> Hubert >the following perl script worked fine until a few days ago.... 
>
> #iterate over each query sequence
> print "try to enter while loop\n";
>
>
die "Bad BLAST report" if not defined $search;
> while (my $result = $search->next_result) {
> print "entered 1st while loop\n";
>
>Output:
>
>[Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl
>try to enter while loop
>
>but it doesn't enter the first while loop, it stuck there, first I
>
>
What is the value of $search before you start the WHILE loop ?

From barry.moore at genetics.utah.edu Thu May 4 20:39:57 2006
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 4 May 2006 18:39:57 -0600
Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore
In-Reply-To: <445A94F8.9000903@gmx.at>
References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au> <445A94F8.9000903@gmx.at>
Message-ID: <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>

That should be 'x $result', and you should see the object dumped to the
screen.

Or just type 's' by itself on the while line, which will step you into the
next_result sub, where you can look around and watch what's happening.
B > DB<2> s $result > main::((eval 15)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3): > 3: $result; > DB<<3>> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hubert.prielinger at gmx.at Thu May 4 22:04:20 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Thu, 04 May 2006 20:04:20 -0600 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu> References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au> <445A94F8.9000903@gmx.at> <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu> Message-ID: <445AB2A4.7020405@gmx.at> if I do so it returns: 0 undef Barry Moore wrote: > That should be 'x $resust' and you should see the object dumped to > the screen. > > or just 's' by itself which will step you into the sub on the while > line will step you into the next_result sub, and you can look around > and watch what's happening. 
> > B > > >> DB<2> s $result >> main::((eval 15)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3): >> 3: $result; >> DB<<3>> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From torsten.seemann at infotech.monash.edu.au Fri May 5 00:40:34 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 05 May 2006 14:40:34 +1000 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <445AB2A4.7020405@gmx.at> References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au> <445A94F8.9000903@gmx.at> <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu> <445AB2A4.7020405@gmx.at> Message-ID: <445AD742.4070408@infotech.monash.edu.au> Hubert Prielinger wrote: > if I do so it returns: > 0 undef That means the value of $search was undef. That means that it could not parse or open the BLAST report. I repeat the line that I put in my earlier email which you ignored. # your line my $search = Bio::SearchIO->new( ..... ); # then check if it was successful! die "could not open blast report" if not defined $search; --Torsten From jason.stajich at duke.edu Fri May 5 09:21:38 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri, 5 May 2006 09:21:38 -0400 Subject: [Bioperl-l] bioperl-AlignIO problems parsing fasta files In-Reply-To: <5d35ac4d.b35ce863.8198d00@expms1.cites.uiuc.edu> References: <5d35ac4d.b35ce863.8198d00@expms1.cites.uiuc.edu> Message-ID: Space after the > is causing the problem since we infer the ID as the everything after the '>' BEFORE the first whitespace. Get rid of the space. 
$ perl -i.backup -p -e 's/^>\s+/>/' YOURFASALNFILE On May 4, 2006, at 7:00 PM, Gloria Rendon wrote: > contents of the input file has a single sequence: > >> gi|90108701|pdb|2AHZ|B Chain B, K+ Complex Of The Nak Channel > MLSFLLTLKRMLRACLRAWKDKEFQVLFVLTILTLISGTIFYSTVEGLRPIDALYFSVVTLTTVGDGNFS > PQTDFGKIFTILYIFIGIGLVFGFIHKLAVNVQLPSILSN > ------------------------------------------ > this is the script that tries to parse it: > > use Bio::AlignIO; > my $inseq = Bio::AlignIO->new(-format => 'fasta', > -file => 'test.fasta'); > while( my $aln = $inseq->next_aln ) { > print "name: ", $aln->displayname; > print "length: ", $aln->length; > print "\n"; > } > > ------------------------------------------ > and this is the result of running that script on winxp > > D:\msa\NAK MUTANTS>perl parseFasta.pl > > > ------------- EXCEPTION ------------- > MSG: No sequence with name [] > STACK Bio::SimpleAlign::displayname > C:/Perl/site/lib/Bio/SimpleAlign.pm:2047 > STACK toplevel parseFasta.pl:11 > > -------------------------------------- > D:\msa\NAK MUTANTS> -- Jason Stajich Duke University http://www.duke.edu/~jes12/ From thoufek at pngg.org Thu May 4 12:50:44 2006 From: thoufek at pngg.org (T.D. Houfek) Date: Thu, 04 May 2006 12:50:44 -0400 Subject: [Bioperl-l] Bio::Seq::Quality description line problem In-Reply-To: References: Message-ID: <445A30E4.6070103@pngg.org> Using Bioperl 1.5, having trouble with writing FASTA-style quality files using Bio::Seq::Quality. I create the Bio::Seq::Quality object, giving its constructor an ID, a description, a nucleotide sequence, and a quality sequence. I then write the sequence FASTA and the quality FASTA. The description string will appear in the header line of the sequence FASTA, but not in the header line of the quality FASTA. Can anybody help me figure out how to fix this? I've attached a sample script and output. -T.D. 
------------------- sample script follows --------------------------------------- #!/usr/bin/perl use strict; use Bio::Seq::Quality; use Bio::SeqIO; my $id = "bogus_id"; my $desc = "bogus description"; my $seq = "ATTATTATTATTATT"; my $qual = "10 20 30 10 20 30 10 20 30 10 20 30 10 20 30"; my $sequal_obj = Bio::Seq::Quality->new( -display_id => $id, -desc => $desc, -seq => $seq, -qual => $qual ); my $qualout = Bio::SeqIO->new( -file => ">myfile.qual", -format => 'qual' ); my $seqout = Bio::SeqIO->new( -file => ">myfile.seq", -format => 'Fasta' ); $seqout->write_seq($sequal_obj); $qualout->write_seq($sequal_obj); ------------------ sample output follows --------------------------------------- tdhoufek at aether:~$ cat myfile.seq >bogus_id bogus description ATTATTATTATTATT tdhoufek at aether:~$ cat myfile.qual >bogus_id 10 20 30 10 20 30 10 20 30 10 20 30 10 20 30 -------------------------------------------------------------------------------------------------- -- T.D. Houfek senior bioinformatics developer plant nematode genetics group north carolina state university Email: thoufek at pngg.org ---------------------------------------------------------- use Bio::Seq; @a =qw/NNN CCT GAG CAT GCG TGT AAG AAC TAG/; $u=seq;$r=Bio::Seq;sub c{$c=$r->new(-$u=>"@_[0]")->revcom; $t=$c->$u;}map{m/\d/?$g=c($a[$_]):tr/a-i/1-9/&&($g=$a[$_]) ;$x[$i++]=$g;} split //,"dgh5cb40ab120cdefb4";$z=$r->new(- $u=>(join"", at x))->translate()->$u;$z =~s/X/ /g;print"$z\n" From jason.stajich at duke.edu Fri May 5 09:27:51 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri, 5 May 2006 09:27:51 -0400 Subject: [Bioperl-l] bioperl-AlignIO problems parsing fasta files In-Reply-To: References: <5d35ac4d.b35ce863.8198d00@expms1.cites.uiuc.edu> Message-ID: <0F79C9AD-DE36-4424-9E59-37ABE8B62A5E@duke.edu> [replying to myself] although if you are trying to just read a sequence not an alignment then you want to use Bio::SeqIO. 
See the copious help on the HOWTO page at bioperl website including a sequence and feature howto and beginner's guide. http://bioperl.org/wiki/HOWTOs -jason On May 5, 2006, at 9:21 AM, Jason Stajich wrote: > Space after the > is causing the problem since we infer the ID as the > everything after the '>' BEFORE the first whitespace. Get rid of the > space. > $ perl -i.backup -p -e 's/^>\s+/>/' YOURFASALNFILE > > On May 4, 2006, at 7:00 PM, Gloria Rendon wrote: > >> contents of the input file has a single sequence: >> >>> gi|90108701|pdb|2AHZ|B Chain B, K+ Complex Of The Nak Channel >> MLSFLLTLKRMLRACLRAWKDKEFQVLFVLTILTLISGTIFYSTVEGLRPIDALYFSVVTLTTVGDGNF >> S >> PQTDFGKIFTILYIFIGIGLVFGFIHKLAVNVQLPSILSN >> ------------------------------------------ >> this is the script that tries to parse it: >> >> use Bio::AlignIO; >> my $inseq = Bio::AlignIO->new(-format => 'fasta', >> -file => 'test.fasta'); >> while( my $aln = $inseq->next_aln ) { >> print "name: ", $aln->displayname; >> print "length: ", $aln->length; >> print "\n"; >> } >> >> ------------------------------------------ >> and this is the result of running that script on winxp >> >> D:\msa\NAK MUTANTS>perl parseFasta.pl >> >> >> ------------- EXCEPTION ------------- >> MSG: No sequence with name [] >> STACK Bio::SimpleAlign::displayname >> C:/Perl/site/lib/Bio/SimpleAlign.pm:2047 >> STACK toplevel parseFasta.pl:11 >> >> -------------------------------------- >> D:\msa\NAK MUTANTS> > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12/ From osborne1 at optonline.net Fri May 5 10:04:02 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Fri, 05 May 2006 10:04:02 -0400 Subject: [Bioperl-l] Bio::Seq::Quality description line problem In-Reply-To: 
<445A30E4.6070103@pngg.org> Message-ID: T.D., According to the documentation, http://www.bioperl.org/wiki/Qual_sequence_format, your *qual file looks right. What are you trying to create? Brian O. On 5/4/06 12:50 PM, "T.D. Houfek" wrote: > Using Bioperl 1.5, having trouble with writing FASTA-style quality files > using Bio::Seq::Quality. > > I create the Bio::Seq::Quality object, giving its constructor an ID, a > description, a nucleotide sequence, and a quality sequence. I then write > the sequence FASTA and the quality FASTA. The description string will > appear in the header line of the sequence FASTA, but not in the header > line of the quality FASTA. > > Can anybody help me figure out how to fix this? I've attached a sample > script and output. > > -T.D. > > ------------------- sample script follows > --------------------------------------- > > #!/usr/bin/perl > use strict; > use Bio::Seq::Quality; > use Bio::SeqIO; > > my $id = "bogus_id"; > my $desc = "bogus description"; > my $seq = "ATTATTATTATTATT"; > my $qual = "10 20 30 10 20 30 10 20 30 10 20 30 10 20 30"; > > my $sequal_obj = Bio::Seq::Quality->new( > -display_id => $id, > -desc => $desc, > -seq => $seq, > -qual => $qual > ); > > my $qualout = Bio::SeqIO->new( > -file => ">myfile.qual", > -format => 'qual' > ); > my $seqout = Bio::SeqIO->new( > -file => ">myfile.seq", > -format => 'Fasta' > ); > > $seqout->write_seq($sequal_obj); > $qualout->write_seq($sequal_obj); > > > ------------------ sample output follows > --------------------------------------- > > tdhoufek at aether:~$ cat myfile.seq >> bogus_id bogus description > ATTATTATTATTATT > tdhoufek at aether:~$ cat myfile.qual >> bogus_id > 10 20 30 10 20 30 10 20 30 10 20 30 10 20 30 > > ------------------------------------------------------------------------------ > -------------------- > > > From cjfields at uiuc.edu Fri May 5 10:24:05 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 5 May 2006 09:24:05 -0500 Subject: [Bioperl-l] Bug in 
genbank parsing: CONTIG gaps In-Reply-To: <2E0D7723-FA6E-4812-8DBB-30FCD11FA85C@gmx.net> Message-ID: <001701c6704f$90dbd090$15327e82@pyrimidine> I'm not sure it's a valid CONTIG file w/o the join(...). This is a chunk from the longer file Michael used as an example here (NW_925173). I believe the CONTIG line is currently handled like a feature so I think it goes through Bio::SeqIO::FTHelper, which is where Michael mentions his bugfix is; I think it's getting beaten up in there somehow. I may see what happens if it's treated like a WGS line (like a Bio::Annotation::SimpleValue object) and just glob the whole mess together as is. Chris ... FEATURES Location/Qualifiers source 1..44976370 /organism="Homo sapiens" /mol_type="genomic DNA" /db_xref="taxon:9606" /chromosome="11" CONTIG join(AADB02014316.1:1..1482320,gap(67),AADB02014317.1:1..577321, gap(441),AADB02014318.1:1..173584,gap(676), AADB02014319.1:1..377558,gap(20), complement(AADB02014320.1:1..431263),gap(20), AADB02014321.1:1..794957,gap(1241),AADB02014322.1:1..1366198, gap(6446),AADB02014323.1:1..3366,gap(20),AADB02014324.1:1..4771, gap(4611),AADB02014325.1:1..383881,gap(20), complement(AADB02014326.1:1..381633),gap(1930), complement(AADB02014327.1:1..460053),gap(20), AADB02014328.1:1..4186,gap(1587), ... > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > Sent: Thursday, May 04, 2006 5:39 PM > To: Chris Fields > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > The two notations are equivalent and syntactically correct, or so I > believe ... I don't think 100% verbatim preservation should be the > goal. Or am I missing the point? > > On May 4, 2006, at 6:27 PM, Chris Fields wrote: > > > Here's another odd bit. 
This is what I get for the CONTIG line when I > > passed a simple contig file (NW_925062, with one join) through > > Bio::SeqIO: > > > > ----------------------------------- > > .... > > FEATURES Location/Qualifiers > > source 1..8541 > > /db_xref="taxon:9606" > > /mol_type="genomic DNA" > > /chromosome="11" > > /organism="Homo sapiens" > > CONTIG AADB02014027.1:1..8541 > > > > // > > ----------------------------------- > > Here's the original: > > ----------------------------------- > > FEATURES Location/Qualifiers > > source 1..8541 > > /organism="Homo sapiens" > > /mol_type="genomic DNA" > > /db_xref="taxon:9606" > > /chromosome="11" > > CONTIG join(AADB02014027.1:1..8541) > > // > > ----------------------------------- > > > > Looks like it lopped out the 'join' here as well. > > > > Chris > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of Chris Fields > >> Sent: Thursday, May 04, 2006 1:41 PM > >> To: bioperl-l at lists.open-bio.org > >> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > >> > >> Are you using the CONTIG record or the full GenBank file? I see > >> problems with both (using bioperl-live) which seem unrelated to one > >> another. > >> The full file seems to be running a bit slow b/c the full GenBank > >> record > >> is > >> huge (~55 MB) but the CONTIG file does exactly what you said (runs > >> out of > >> memory). > >> > >> Chris > >> > >>> -----Original Message----- > >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>> bounces at lists.open-bio.org] On Behalf Of Michael Rogoff > >>> Sent: Tuesday, May 02, 2006 10:32 PM > >>> To: bioperl-l at lists.open-bio.org > >>> Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > >>> > >>> > >>> I've encountered a pretty serious bug in Bio::SeqIO when parsing > >>> certain > >>> genbank > >>> files that contain CONTIG entries with gaps. 
One such record is > >>> NW_925173. > >>> > >>> When I try to parse this file using Bio::SeqIO::genbank, it will > >>> enter > >> an > >>> infinite loop and spin until it runs out of memory. > >>> > >>> I'm pretty certain it relates to this bug: > >>> http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to > >>> indicate > >>> that > >>> genbank records with CONTIG gaps are not valid and can't be > >>> parsed. But > >>> this > >>> bug actually claims to be fixed, which is strange, since looking > >>> at the > >>> code for > >>> FTLocationFactory (where the loop is) it's still right there. I > >>> assume > >>> that > >>> this may be fixed in other contexts but is still not fixed in > >>> Bio::SeqIO::genbank? Or am I doing something wrong? > >>> > >>> I think that this should probably be filed as an open bug. I would > >> think > >>> that > >>> even if bioperl isn't interested in parsing this type of file via > >>> SeqIO, > >>> certainly you'd want to ensure that no finite input file would > >>> send the > >>> parser > >>> into an infinite loop. Have others encountered this problem? Is > >>> there > >>> any plan > >>> to address it? > >>> > >>> Thanks very much for any information or help! > >>> > >>> -Mike > >>> > >>> P.S. I've played around with my version of FTLocationFactory and it > >> seems > >>> to > >>> actually work and parse the gaps. I'm not sure if I've created > >>> other > >> bugs > >>> or if > >>> it works in all cases, but at least the parser doesn't die. I also > >> don't > >>> know > >>> that my hacky code is appropriate for putting back in to BioPerl, > >>> but > >> I'm > >>> happy > >>> to provide it if someone wants to check it out and/or consider it > >>> for > >>> checkin. 
> >>> > >>> > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Fri May 5 10:47:50 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 5 May 2006 10:47:50 -0400 Subject: [Bioperl-l] Bio::Seq::Quality description line problem In-Reply-To: References: Message-ID: <2E1683FE-57E4-4D97-A958-1B529973E89E@gmx.net> He wants the description on the description line, like for the sequence file. Thomas, my guess is the code doesn't print the description to the line although I haven't made sure. Do you want to volunteer and check, add that print statement and post the patch? -hilmar On May 5, 2006, at 10:04 AM, Brian Osborne wrote: > T.D., > > According to the documentation, > http://www.bioperl.org/wiki/Qual_sequence_format, your *qual file > looks > right. What are you trying to create? > > Brian O. > > > On 5/4/06 12:50 PM, "T.D. Houfek" wrote: > >> Using Bioperl 1.5, having trouble with writing FASTA-style quality >> files >> using Bio::Seq::Quality. >> >> I create the Bio::Seq::Quality object, giving its constructor an >> ID, a >> description, a nucleotide sequence, and a quality sequence. 
I then >> write >> the sequence FASTA and the quality FASTA. The description string will >> appear in the header line of the sequence FASTA, but not in the >> header >> line of the quality FASTA. >> >> Can anybody help me figure out how to fix this? I've attached a >> sample >> script and output. >> >> -T.D. >> >> ------------------- sample script follows >> --------------------------------------- >> >> #!/usr/bin/perl >> use strict; >> use Bio::Seq::Quality; >> use Bio::SeqIO; >> >> my $id = "bogus_id"; >> my $desc = "bogus description"; >> my $seq = "ATTATTATTATTATT"; >> my $qual = "10 20 30 10 20 30 10 20 30 10 20 30 10 20 30"; >> >> my $sequal_obj = Bio::Seq::Quality->new( >> -display_id => $id, >> -desc => $desc, >> -seq => $seq, >> -qual => $qual >> ); >> >> my $qualout = Bio::SeqIO->new( >> -file => ">myfile.qual", >> -format => 'qual' >> ); >> my $seqout = Bio::SeqIO->new( >> -file => ">myfile.seq", >> -format => 'Fasta' >> ); >> >> $seqout->write_seq($sequal_obj); >> $qualout->write_seq($sequal_obj); >> >> >> ------------------ sample output follows >> --------------------------------------- >> >> tdhoufek at aether:~$ cat myfile.seq >>> bogus_id bogus description >> ATTATTATTATTATT >> tdhoufek at aether:~$ cat myfile.qual >>> bogus_id >> 10 20 30 10 20 30 10 20 30 10 20 30 10 20 30 >> >> --------------------------------------------------------------------- >> --------- >> -------------------- >> >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From dmessina at wustl.edu Fri May 5 11:24:47 2006 From: dmessina at wustl.edu (David Messina) Date: Fri, 5 May 2006 10:24:47 -0500 Subject: [Bioperl-l] Bio::Seq::Quality description line problem In-Reply-To: 
<445A30E4.6070103@pngg.org> References: <445A30E4.6070103@pngg.org> Message-ID: <5A549C57-A310-4623-BC44-787AC8BFD6C2@wustl.edu> Apologies if this is a repost -- mail troubles this morning. Hilmar is correct. From a cursory walk through the code in a debugger, it looks like Bio::SeqIO::qual's write_seq method doesn't read the 'desc' out of the Bio::Seq::Quality object. I think there should be something like this: if ($source->can('desc') and my $desc = $source->desc()) { $desc =~ s/\n//g; $header .= " $desc"; } before line 218 in Bio::SeqIO::qual (where the header is printed): $self->_print (">$header \n"); Dave From dmessina at wustl.edu Fri May 5 10:53:15 2006 From: dmessina at wustl.edu (David Messina) Date: Fri, 5 May 2006 09:53:15 -0500 Subject: [Bioperl-l] Bio::Seq::Quality description line problem In-Reply-To: <445A30E4.6070103@pngg.org> References: <445A30E4.6070103@pngg.org> Message-ID: T.D., From a cursory walk through your code in a debugger, it looks like Bio::SeqIO::qual's write_seq method doesn't read the 'desc' out of the Bio::Seq::Quality object.
I think there should be something like this: if ($source->can('desc') and my $desc = $source->desc()) { $desc =~ s/\n//g; $header .= " $desc"; } before line 218 in Bio::SeqIO::qual (where the header is printed): $self->_print (">$header \n"); Dave From hubert.prielinger at gmx.at Fri May 5 14:30:24 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Fri, 05 May 2006 12:30:24 -0600 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <445AD742.4070408@infotech.monash.edu.au> References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au> <445A94F8.9000903@gmx.at> <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu> <445AB2A4.7020405@gmx.at> <445AD742.4070408@infotech.monash.edu.au> Message-ID: <445B99C0.6050407@gmx.at> Hi, I did as you suggested and got the error message: Can't call method "next_result" on an undefined value at.... Then I searched the internet and found a thread suggesting that adding use strict solves the problem, but I'm already using use strict. Thanks Torsten Seemann wrote: > Hubert Prielinger wrote: > >> if I do so it returns: >> 0 undef >> > > That means the value of $search was undef. > That means that it could not parse or open the BLAST report. > I repeat the line that I put in my earlier email which you ignored. > > # your line > my $search = Bio::SearchIO->new( ..... ); > > # then check if it was successful! 
> die "could not open blast report" if not defined $search; > > --Torsten > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From cjfields at uiuc.edu Fri May 5 15:18:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 5 May 2006 14:18:16 -0500 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <445B99C0.6050407@gmx.at> Message-ID: <000001c67078$a9a7ca10$15327e82@pyrimidine> What happens if you add the verbose flag? my $search = new Bio::SearchIO (-verbose => 1, -format => 'blast', -file => $file); Added thought : you might want to look at File::Find for stepping through your files and performing a task on each one, such as parsing output. It changes into the working directory each time; you should be able to do something like this: use File::Find; use Bio::SearchIO; Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger > Sent: Friday, May 05, 2006 1:30 PM > To: Torsten Seemann; bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore > > hi, > I have done, as you suggested and I got the error message: > > Can't call method "next_result" on an undefined value at.... > > then I looked up at the internet and found a thread which suggested to > use strict and then the problem is solved.... > but I'm already using use strict.. > > thanks > > Torsten Seemann wrote: > > Hubert Prielinger wrote: > > > >> if I do so it returns: > >> 0 undef > >> > > > > That means the value of $search was undef. > > That means that it could not parse or open the BLAST report. > > I repeat the line that I put in my earlier email which you ignored. > > > > # your line > > my $search = Bio::SearchIO->new( ..... ); > > > > # then check if it was successful! 
> > die "could not open blast report" if not defined $search; > > > > --Torsten > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri May 5 15:27:12 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 5 May 2006 14:27:12 -0500 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <445B99C0.6050407@gmx.at> Message-ID: <000101c67079$e8c86a00$15327e82@pyrimidine> Sorry, mail got sent before I finished it! Here I go again... What happens if you add the verbose flag? my $search = new Bio::SearchIO (-verbose => 1, -format => 'blast', -file => $file); Added thought : you might want to look at File::Find for stepping through your files and performing a task on each one, such as parsing output. It changes into the working directory each time; you should be able to do something like this: use File::Find; use Bio::SearchIO; my @dirlist = ("/home/Hubert/test"); find (\&dir, @dirlist); sub printdir { return unless /txt$/; return if (-d); my $parser = Bio::SearchIO->new(-file => $_, -format => 'blast'); while (my $result = $parser->next_result) { while (my $hit = $result->next_hit) { while (my $hsp = $hit->next_hsp) { # do stuff here } } } } Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger > Sent: Friday, May 05, 2006 1:30 PM > To: Torsten Seemann; bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore > > hi, > I have done, as you suggested and I got the error message: > > Can't call method "next_result" on an undefined value at.... 
> > then I looked up at the internet and found a thread which suggested to > use strict and then the problem is solved.... > but I'm already using use strict.. > > thanks > > Torsten Seemann wrote: > > Hubert Prielinger wrote: > > > >> if I do so it returns: > >> 0 undef > >> > > > > That means the value of $search was undef. > > That means that it could not parse or open the BLAST report. > > I repeat the line that I put in my earlier email which you ignored. > > > > # your line > > my $search = Bio::SearchIO->new( ..... ); > > > > # then check if it was successful! > > die "could not open blast report" if not defined $search; > > > > --Torsten > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From barry.moore at genetics.utah.edu Fri May 5 15:39:37 2006 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Fri, 5 May 2006 13:39:37 -0600 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <445B99C0.6050407@gmx.at> References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au> <445A94F8.9000903@gmx.at> <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu> <445AB2A4.7020405@gmx.at> <445AD742.4070408@infotech.monash.edu.au> <445B99C0.6050407@gmx.at> Message-ID: <7F3D73A6-392E-4728-ACB9-FD3BEDFD3C18@genetics.utah.edu> Hubert- If you want to send me your script and input file I'll try to have a look at it. Barry On May 5, 2006, at 12:30 PM, Hubert Prielinger wrote: > hi, > I have done, as you suggested and I got the error message: > > Can't call method "next_result" on an undefined value at.... > > then I looked up at the internet and found a thread which suggested to > use strict and then the problem is solved.... 
> but I'm already using use strict.. > > thanks > > Torsten Seemann wrote: >> Hubert Prielinger wrote: >> >>> if I do so it returns: >>> 0 undef >>> >> >> That means the value of $search was undef. >> That means that it could not parse or open the BLAST report. >> I repeat the line that I put in my earlier email which you ignored. >> >> # your line >> my $search = Bio::SearchIO->new( ..... ); >> >> # then check if it was successful! >> die "could not open blast report" if not defined $search; >> >> --Torsten >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri May 5 16:07:53 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 5 May 2006 15:07:53 -0500 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <000101c67079$e8c86a00$15327e82@pyrimidine> Message-ID: <000201c6707f$97aaaba0$15327e82@pyrimidine> Oops! This is what happens when I copy and paste in a hurry. > use File::Find; > use Bio::SearchIO; > > my @dirlist = ("/home/Hubert/test"); > > find (\&dir, @dirlist); > > sub printdir { ^^^^^^^^^^^ Should be: sub dir { > return unless /txt$/; > return if (-d); > my $parser = Bio::SearchIO->new(-file => $_, > -format => 'blast'); > while (my $result = $parser->next_result) { > while (my $hit = $result->next_hit) { > while (my $hsp = $hit->next_hsp) { > # do stuff here > } > } > } > } Hubert, if the file you are parsing looks fine (i.e. valid BLAST output), post it and your script on Bugzilla and let us take a look. 
Leave out your password though ; > Chris From golharam at umdnj.edu Fri May 5 15:58:03 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Fri, 05 May 2006 15:58:03 -0400 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <000001c67078$a9a7ca10$15327e82@pyrimidine> Message-ID: <02f101c6707e$39a03a30$2f01a8c0@GOLHARMOBILE1> I'm not sure how applicable this is, but I've seen a problem with Perl if the LANG environment variable contains UTF8 (e.g. LANG=en_US.UTF8). I changed mine to en_US and lots of Perl string parsing problems went away. Also, what happens if you run the BioPerl tests on your installation (make test)? -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields Sent: Friday, May 05, 2006 3:18 PM To: 'Hubert Prielinger'; 'Torsten Seemann'; bioperl-l at bioperl.org Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore What happens if you add the verbose flag? my $search = new Bio::SearchIO (-verbose => 1, -format => 'blast', -file => $file); Added thought : you might want to look at File::Find for stepping through your files and performing a task on each one, such as parsing output. It changes into the working directory each time; you should be able to do something like this: use File::Find; use Bio::SearchIO; Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger > Sent: Friday, May 05, 2006 1:30 PM > To: Torsten Seemann; bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore > > hi, > I have done, as you suggested and I got the error message: > > Can't call method "next_result" on an undefined value at.... > > then I looked up at the internet and found a thread which suggested to > use strict and then the problem is solved.... but I'm already using > use strict.. 
> > thanks > > Torsten Seemann wrote: > > Hubert Prielinger wrote: > > > >> if I do so it returns: > >> 0 undef > >> > > > > That means the value of $search was undef. > > That means that it could not parse or open the BLAST report. I > > repeat the line that I put in my earlier email which you ignored. > > > > # your line > > my $search = Bio::SearchIO->new( ..... ); > > > > # then check if it was successful! > > die "could not open blast report" if not defined $search; > > > > --Torsten > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri May 5 17:56:29 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 5 May 2006 16:56:29 -0500 Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps In-Reply-To: <001701c6704f$90dbd090$15327e82@pyrimidine> Message-ID: <000901c6708e$c77442b0$15327e82@pyrimidine> Okay, I have changed the way the CONTIG line is handled in Bio::SeqIO::genbank. It was handling it as a feature; I just changed it over to handling it as a Bio::Annotation::SimpleValue object with the value being the entire contig section. It seems to pass tests fine but I'm operating off Windows and my wife's IBook went to the great desktop in the sky (motherboard), so I can't test it there. Pulling the file off using Bio::DB::GenBank (using the no-redirect flag) works w/o crashing out. 
Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Chris Fields > Sent: Friday, May 05, 2006 9:24 AM > To: 'Hilmar Lapp' > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > I'm not sure it's a valid CONTIG file w/o the join(...). This is a chunk > from the longer file Michael used as an example here (NW_925173). I > believe > the CONTIG line is currently handled like a feature so I think it goes > through Bio::SeqIO::FTHelper, which is where Michael mentions his bugfix > is; > I think it's getting beaten up in there somehow. I may see what happens if > it's treated like a WGS line (like a Bio::Annotation::SimpleValue object) > and just glob the whole mess together as is. > > > Chris > > ... > FEATURES Location/Qualifiers > source 1..44976370 > /organism="Homo sapiens" > /mol_type="genomic DNA" > /db_xref="taxon:9606" > /chromosome="11" > CONTIG > join(AADB02014316.1:1..1482320,gap(67),AADB02014317.1:1..577321, > gap(441),AADB02014318.1:1..173584,gap(676), > AADB02014319.1:1..377558,gap(20), > complement(AADB02014320.1:1..431263),gap(20), > AADB02014321.1:1..794957,gap(1241),AADB02014322.1:1..1366198, > > gap(6446),AADB02014323.1:1..3366,gap(20),AADB02014324.1:1..4771, > gap(4611),AADB02014325.1:1..383881,gap(20), > complement(AADB02014326.1:1..381633),gap(1930), > complement(AADB02014327.1:1..460053),gap(20), > AADB02014328.1:1..4186,gap(1587), > ... > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > > Sent: Thursday, May 04, 2006 5:39 PM > > To: Chris Fields > > Cc: bioperl-l at lists.open-bio.org > > Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > > > The two notations are equivalent and syntactically correct, or so I > > believe ... 
I don't think 100% verbatim preservation should be the > > goal. Or am I missing the point? > > > > On May 4, 2006, at 6:27 PM, Chris Fields wrote: > > > > > Here's another odd bit. This is what I get for the CONTIG line when I > > > passed a simple contig file (NW_925062, with one join) through > > > Bio::SeqIO: > > > > > > ----------------------------------- > > > .... > > > FEATURES Location/Qualifiers > > > source 1..8541 > > > /db_xref="taxon:9606" > > > /mol_type="genomic DNA" > > > /chromosome="11" > > > /organism="Homo sapiens" > > > CONTIG AADB02014027.1:1..8541 > > > > > > // > > > ----------------------------------- > > > Here's the original: > > > ----------------------------------- > > > FEATURES Location/Qualifiers > > > source 1..8541 > > > /organism="Homo sapiens" > > > /mol_type="genomic DNA" > > > /db_xref="taxon:9606" > > > /chromosome="11" > > > CONTIG join(AADB02014027.1:1..8541) > > > // > > > ----------------------------------- > > > > > > Looks like it lopped out the 'join' here as well. > > > > > > Chris > > > > > >> -----Original Message----- > > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >> bounces at lists.open-bio.org] On Behalf Of Chris Fields > > >> Sent: Thursday, May 04, 2006 1:41 PM > > >> To: bioperl-l at lists.open-bio.org > > >> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > >> > > >> Are you using the CONTIG record or the full GenBank file? I see > > >> problems with both (using bioperl-live) which seem unrelated to one > > >> another. > > >> The full file seems to be running a bit slow b/c the full GenBank > > >> record > > >> is > > >> huge (~55 MB) but the CONTIG file does exactly what you said (runs > > >> out of > > >> memory). 
> > >> > > >> Chris > > >> > > >>> -----Original Message----- > > >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >>> bounces at lists.open-bio.org] On Behalf Of Michael Rogoff > > >>> Sent: Tuesday, May 02, 2006 10:32 PM > > >>> To: bioperl-l at lists.open-bio.org > > >>> Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > >>> > > >>> > > >>> I've encountered a pretty serious bug in Bio::SeqIO when parsing > > >>> certain > > >>> genbank > > >>> files that contain CONTIG entries with gaps. One such record is > > >>> NW_925173. > > >>> > > >>> When I try to parse this file using Bio::SeqIO::genbank, it will > > >>> enter > > >> an > > >>> infinite loop and spin until it runs out of memory. > > >>> > > >>> I'm pretty certain it relates to this bug: > > >>> http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to > > >>> indicate > > >>> that > > >>> genbank records with CONTIG gaps are not valid and can't be > > >>> parsed. But > > >>> this > > >>> bug actually claims to be fixed, which is strange, since looking > > >>> at the > > >>> code for > > >>> FTLocationFactory (where the loop is) it's still right there. I > > >>> assume > > >>> that > > >>> this may be fixed in other contexts but is still not fixed in > > >>> Bio::SeqIO::genbank? Or am I doing something wrong? > > >>> > > >>> I think that this should probably be filed as an open bug. I would > > >> think > > >>> that > > >>> even if bioperl isn't interested in parsing this type of file via > > >>> SeqIO, > > >>> certainly you'd want to ensure that no finite input file would > > >>> send the > > >>> parser > > >>> into an infinite loop. Have others encountered this problem? Is > > >>> there > > >>> any plan > > >>> to address it? > > >>> > > >>> Thanks very much for any information or help! > > >>> > > >>> -Mike > > >>> > > >>> P.S. I've played around with my version of FTLocationFactory and it > > >> seems > > >>> to > > >>> actually work and parse the gaps. 
I'm not sure if I've created > > >>> other > > >> bugs > > >>> or if > > >>> it works in all cases, but at least the parser doesn't die. I also > > >> don't > > >>> know > > >>> that my hacky code is appropriate for putting back in to BioPerl, > > >>> but > > >> I'm > > >>> happy > > >>> to provide it if someone wants to check it out and/or consider it > > >>> for > > >>> checkin. > > >>> > > >>> > > >>> > > >>> _______________________________________________ > > >>> Bioperl-l mailing list > > >>> Bioperl-l at lists.open-bio.org > > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > >> > > >> _______________________________________________ > > >> Bioperl-l mailing list > > >> Bioperl-l at lists.open-bio.org > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > -- > > =========================================================== > > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > > =========================================================== > > > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hubert.prielinger at gmx.at Fri May 5 19:54:55 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Fri, 05 May 2006 17:54:55 -0600 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <02f101c6707e$39a03a30$2f01a8c0@GOLHARMOBILE1> References: <02f101c6707e$39a03a30$2f01a8c0@GOLHARMOBILE1> Message-ID: <445BE5CF.2000007@gmx.at> hi ryan, nothing happend if I add the verbose flag and how can I test my bioperl 
installation..... Ryan Golhar wrote: > I'm not sure how applicable this is, but I've seen a problem with Perl > if the LANG environment variable contain UTF8 (ex LANG=en_US.UTF8). > I've changed mine to en_US and lots of perl string parsing problems went > away. > > Also, what about running the bioperl tests on your installation (make > test). What happens? > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields > Sent: Friday, May 05, 2006 3:18 PM > To: 'Hubert Prielinger'; 'Torsten Seemann'; bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore > > > What happens if you add the verbose flag? > > my $search = new Bio::SearchIO (-verbose => 1, > -format => 'blast', > -file => $file); > > Added thought : you might want to look at File::Find for stepping > through your files and performing a task on each one, such as parsing > output. It changes into the working directory each time; you should be > able to do something like this: > > use File::Find; > use Bio::SearchIO; > > > > > Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger >> Sent: Friday, May 05, 2006 1:30 PM >> To: Torsten Seemann; bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore >> >> hi, >> I have done, as you suggested and I got the error message: >> >> Can't call method "next_result" on an undefined value at.... >> >> then I looked up at the internet and found a thread which suggested to >> > > >> use strict and then the problem is solved.... but I'm already using >> use strict.. >> >> thanks >> >> Torsten Seemann wrote: >> >>> Hubert Prielinger wrote: >>> >>> >>>> if I do so it returns: >>>> 0 undef >>>> >>>> >>> That means the value of $search was undef. 
>>> That means that it could not parse or open the BLAST report. I >>> repeat the line that I put in my earlier email which you ignored. >>> >>> # your line >>> my $search = Bio::SearchIO->new( ..... ); >>> >>> # then check if it was successful! >>> die "could not open blast report" if not defined $search; >>> >>> --Torsten >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From hubert.prielinger at gmx.at Fri May 5 20:01:11 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Fri, 05 May 2006 18:01:11 -0600 Subject: [Bioperl-l] [BULK] can't parse blast file anymore Message-ID: <445BE747.5020202@gmx.at> hi I have posted my script and the blast file to bugzilla...... From hubert.prielinger at gmx.at Fri May 5 21:21:33 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Fri, 05 May 2006 19:21:33 -0600 Subject: [Bioperl-l] [BULK] can't parse blast file anymore In-Reply-To: <445BE747.5020202@gmx.at> References: <445BE747.5020202@gmx.at> Message-ID: <445BFA1D.5060008@gmx.at> they bugzilla posting didn't work, what is the exact email address for bugzilla Hubert Prielinger wrote: > hi > I have posted my script and the blast file to bugzilla...... 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From cjfields at uiuc.edu Fri May 5 21:38:47 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 5 May 2006 20:38:47 -0500 Subject: [Bioperl-l] [BULK] can't parse blast file anymore In-Reply-To: <445BFA1D.5060008@gmx.at> Message-ID: <000d01c670ad$d209f980$15327e82@pyrimidine> Hubert, Calm down. Breathe in, breathe out. Relax....... Okay, here is the place to start. Read the instructions there first. http://www.bioperl.org/wiki/Bugs Bugs are reported at this site: http://bugzilla.bioperl.org/ Again, follow the instructions. You will have to create a user name and password to submit. Once that is set up, click the "Submit a new bug" link on the main bugzilla page. On that page, fill out all the information and a description of the error, then hit 'commit'. Add the BLAST report and a sample script by clicking on the "Create a New Attachment" link (you'll have to do this for each file). Once you go back to the bug page you should see two attachments and the bug report. Any commits get sent through the bioperl-guts-l mail list which most developers subscribe to, so they'll know there's a new bug out there. I will not be able to get to it personally; our home computer died a slow painful death today (RIP 2002-2006) but I can get to it next week. If you post the bug, somebody might be able to get to it sooner! Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger > Sent: Friday, May 05, 2006 8:22 PM > To: bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] [BULK] can't parse blast file anymore > > they bugzilla posting didn't work, what is the exact email address for > bugzilla > > Hubert Prielinger wrote: > > hi > > I have posted my script and the blast file to bugzilla......
> > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri May 5 22:26:35 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 5 May 2006 21:26:35 -0500 Subject: [Bioperl-l] Changes to NCBIHelper (RE: CONTIG, genome files) Message-ID: <000f01c670b4$7f22f760$15327e82@pyrimidine> I committed a change to NCBIHelper that permits the downloading of CON (contig) files and corrects an issue where no sequence features were saved when rebuilding those files. If you use Bio::DB::GenBank regularly to download genome files, this likely will NOT affect your code unless you explicitly set the format type to 'genbank', like so: $factory = Bio::DB::GenBank->new(-format => 'gb'); # or 'genbank' I believe most will not have that setting since the default was already 'gb'. Now, the default is 'gbwithparts', which returns the full sequence regardless. If it is a file with a CONTIG line, the sequence is built on NCBI's end (and will include seq features if they are present). As Brian said, we'll let NCBI do the work for us! If you need the actual file w/o sequence, then you can set the format to 'genbank' (like above) and it will grab it for you. There was an unrelated problem with CONTIG line parsing that I also fixed, where I changed the format over to a Bio::Annotation::SimpleValue as a workaround for now; for some reason some CON files were misparsed and resulted in infinite loops or missing 'join' statements. Chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept.
of Biochemistry University of Illinois Urbana-Champaign From hubert.prielinger at gmx.at Sat May 6 18:22:05 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Sat, 06 May 2006 16:22:05 -0600 Subject: [Bioperl-l] [BULK] can't parse blast file anymore In-Reply-To: <000d01c670ad$d209f980$15327e82@pyrimidine> References: <000d01c670ad$d209f980$15327e82@pyrimidine> Message-ID: <445D218D.2030504@gmx.at> ok, thanks I have submitted the bug bug #1994 Chris Fields wrote: > Hubert, > > Calm down. Breathe in, breath out. Relax....... > > Okay, here is the place to start. Read the instructions there first. > > http://www.bioperl.org/wiki/Bugs > > Bugs are reported at this site: > > http://bugzilla.bioperl.org/ > > Again, follow the instructions. You will have to create a user name and > password to submit. Once that is set up, click the "Submit a new bug" link > on the main bugzilla page. On that page, fill out all information first and > a description of the error and hit 'commit'. Add the BLAST report and some > sample script by clicking on the "Create a New Attachment" link (you'll have > to do this for each file). Once you go back to the bug page you should see > two attachments and the bug report. Any commits get sent through the > bioperl-guts-l mail list which most developers subscribe to, so they'll know > there's a new bug out there. > > I will not be able to get to it personally; our home computer died a slow > painful death today (RIP 2002-2006) but I can get to it next week. If you > post the bug, somebody might be able to get to it sooner! 
> > Chris > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger >> Sent: Friday, May 05, 2006 8:22 PM >> To: bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] [BULK] can't parse blast file anymore >> >> they bugzilla posting didn't work, what is the exact email address for >> bugzilla >> >> Hubert Prielinger wrote: >> >>> hi >>> I have posted my script and the blast file to bugzilla...... >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From torsten.seemann at infotech.monash.edu.au Sat May 6 20:57:14 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Sun, 07 May 2006 10:57:14 +1000 Subject: [Bioperl-l] [BULK] can't parse blast file anymore In-Reply-To: <445D218D.2030504@gmx.at> References: <000d01c670ad$d209f980$15327e82@pyrimidine> <445D218D.2030504@gmx.at> Message-ID: <445D45EA.8020804@infotech.monash.edu.au> Hubert Prielinger wrote: > ok, thanks > I have submitted the bug > bug #1994 This is a line from the script you sent to Bugzilla: my $search = new Bio::SearchIO ( -verbose => 1,-format => 'blast', -file => $file) or die "could not open blast report" if not defined my $search; Althoygh syntactically correct, I don't think it is doing what you want. 
Please change it to this: my $search = new Bio::SearchIO(-format => 'blast', -file => $file) or die "could not open blast report"; or alternatively, this: my $search = new Bio::SearchIO(-format => 'blast', -file => $file); if (not defined $search) { die "could not open blast report"; } and let us know what happens. All the example output you have supplied still suggests that Bio::SearchIO cannot load or parse your blast report. -- Torsten Seemann Victorian Bioinformatics Consortium, Monash University, Australia From mamillerpa at yahoo.com Sat May 6 19:07:30 2006 From: mamillerpa at yahoo.com (Mark A. Miller) Date: Sat, 6 May 2006 16:07:30 -0700 (PDT) Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines In-Reply-To: Message-ID: <20060506230730.56480.qmail@web50410.mail.yahoo.com> Thanks for your responses, Jason and Brian. Brian, your suggestion works great. I had really hoped that by parsing the OS line as well, I could be sure I wasn't missing any sequences from my organisms. Well, I gave up on that and just obtained the NCBI taxonomy values. I find it pretty easy to work with them in bioperl. Unfortunately, walking through all of TrEMBL takes a while, and I'm getting this error: Can't call method "ncbi_taxid" on an undefined value at ./ga2.pl line 55, line 3253682. when I try to extract annotations, etc., from entries like: DHE4_UNKP with: my $species_object = $seq->species; my $taxid_string = $species_object->ncbi_taxid; I guess I have to write an error handler for incomplete taxonomy values. Bye for now, Mark --- Brian Osborne wrote: > Mark, > > The RC line is part of the description of a reference, I'm guessing > 'RC' > stands for Reference Comment.
In order to get the attributes of a > reference > you'll first do something like: > > my $anno_collection = $seq->annotation; > my @references = $anno_collection->get_Annotations('reference'); > > To get the comment field for a specific reference you can do: > > $references[0]->comment; > > See the Feature-Annotation HOWTO for more information on Annotations, > the > Reference object is a kind of Annotation object. > > Brian O. > > > On 5/3/06 3:34 PM, "Mark A. Miller" wrote: > > > Yeah. Do you have any experience with that? > > > > Mark > > > > --- Brian Osborne wrote: > > > >> Mark, > >> > >> So you're trying to get the information in the RC line from a > >> Swissprot > >> format file? > >> > >> Brian O. > > > > > > --- --- --- --- --- --- --- --- > > > > Mark A. Miller > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! Mail has the best spam protection around > > http://mail.yahoo.com > > > --- --- --- --- --- --- --- --- Mark A. Miller __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From cjfields at uiuc.edu Sat May 6 23:33:40 2006 From: cjfields at uiuc.edu (Christopher Fields) Date: Sat, 6 May 2006 22:33:40 -0500 Subject: [Bioperl-l] [BULK] can't parse blast file anymore Message-ID: <65109dc1.b47d779e.81acb00@expms6.cites.uiuc.edu> The -verbose flag was my suggestion; it should output a ton of debugging info from SearchIO::blast; if you see anything there, then it means that it's at least attempting to parse the report. Of course I can't test this myself at the moment since my wife's computer died (along with the bioperl setup); I'm using a loaner computer at the moment. 
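For anyone who wants to try the constructor check discussed in this thread without a BLAST report or BioPerl at hand, here is a minimal sketch in plain Perl. Demo::Reader and open_report() are hypothetical stand-ins, not BioPerl code; the only assumption is a constructor that throws when the file cannot be read. The point is the two-step form: declare the variable once, then test that same variable. In the one-line form from the bug report, the second "my $search" in the "if not defined" modifier appears to declare a separate variable that masks the first, which would explain the "Can't call method ... on an undefined value" message.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Demo::Reader is a hypothetical stand-in for a reader class whose
# constructor throws when the file cannot be read.
package Demo::Reader;

sub new {
    my ($class, %args) = @_;
    my $file = $args{-file};
    die "no -file given\n" unless defined $file;
    die "cannot read '$file'\n" unless -r $file;
    return bless { file => $file }, $class;
}

package main;

# Two-step form: construct first, then test the one variable that was declared.
# The eval catches a constructor that throws instead of returning undef.
sub open_report {
    my ($file) = @_;
    my $reader = eval { Demo::Reader->new(-file => $file) };
    die "could not open report '$file': $@" unless defined $reader;
    return $reader;
}

# One readable temporary file and one path that does not exist.
my ($fh, $path) = tempfile(UNLINK => 1);
my $ok  = eval { open_report($path) };
my $bad = eval { open_report("$path.missing") };

print "readable file: ", (defined $ok  ? "opened" : "failed"), "\n";
print "missing file:  ", (defined $bad ? "opened" : "failed"), "\n";
```

Running a script that contains the one-line form under "use warnings" should also print a '"my" variable ... masks earlier declaration' warning, which is a quick way to spot the problem.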
Chris ---- Original message ---- >Date: Sun, 07 May 2006 10:57:14 +1000 >From: Torsten Seemann >Subject: Re: [Bioperl-l] [BULK] ?can't parse blast file anymore >To: Hubert Prielinger >Cc: bioperl-l at bioperl.org > >Hubert Prielinger wrote: >> ok, thanks >> I have submitted the bug >> bug #1994 > >This is a line from the script you sent to Bugzilla: > >my $search = new Bio::SearchIO ( >-verbose => 1,-format => 'blast', -file => $file) >or die "could not open blast report" if not defined my $search; > >Althoygh syntactically correct, I don't think it is doing what you want. >Please change it to this: > >my $search = new Bio::SearchIO(-format => 'blast', -file => $file) or die >"could not open blast report"; > >or alternatively, this: > >my $search = new Bio::SearchIO(-format => 'blast', -file => $file); >if (not defined $search) { > die "could not open blast report"; >} > >and let us know what happens. > >all the example output you have supplied still suggests that Bio::SearchIO can >not load or parse your blast report. > >-- >Torsten Seemann >Victorian Bioinformatics Consortium, Monash University, Australia >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l From chen_li3 at yahoo.com Sun May 7 03:34:55 2006 From: chen_li3 at yahoo.com (chen li) Date: Sun, 7 May 2006 00:34:55 -0700 (PDT) Subject: [Bioperl-l] primer parameters using primer3 Message-ID: <20060507073455.11849.qmail@web36815.mail.mud.yahoo.com> Hi all, I use Bio::Tools::Run::Primer3 to design PCR primers. I want to change some default values, for example, to increase the PCR product size to 490-510 bp instead of using the default value of 100-300 bp. What should I do ? Thanks, Li __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! 
Mail has the best spam protection around http://mail.yahoo.com From jason.stajich at duke.edu Sun May 7 16:49:29 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Sun, 7 May 2006 16:49:29 -0400 Subject: [Bioperl-l] [BULK] can't parse blast file anymore In-Reply-To: <65109dc1.b47d779e.81acb00@expms6.cites.uiuc.edu> References: <65109dc1.b47d779e.81acb00@expms6.cites.uiuc.edu> Message-ID: The problem is in how SearchIO was being initialized. The code basically looked like this: my $x = new Foo() or die if not defined my $x; which is invalid for two reasons. 1) if not defined my $x; declares a second, brand-new $x (always undefined at that point), so it never tests the object you just created. 2) my $x = new Foo() or die ; Will cast the new object as a boolean. Whenever things aren't working, take a look at the code and try and walk through any shortcuts. For clarity make it a two-step process my $x = new Foo(); die "no valid \$x" unless defined $x; Please note that currently BioPerl WILL die (via throw) if you try and ask for an invalid file when you initialize a new IO object -- this is handled by code in Bio::Root::IO (line 313 in Bio/Root/IO.pm) which all the IO objects use, so you don't really need to do a test on the object after all. --jason On May 6, 2006, at 11:33 PM, Christopher Fields wrote: > The -verbose flag was my suggestion; it should output a ton of > debugging info > from SearchIO::blast; if you see anything there, then it means that > it's at least > attempting to parse the report. > > Of course I can't test this myself at the moment since my wife's > computer died > (along with the bioperl setup); I'm using a loaner computer at the > moment.
> > Chris > > ---- Original message ---- >> Date: Sun, 07 May 2006 10:57:14 +1000 >> From: Torsten Seemann >> Subject: Re: [Bioperl-l] [BULK] can't parse blast file anymore >> To: Hubert Prielinger >> Cc: bioperl-l at bioperl.org >> >> Hubert Prielinger wrote: >>> ok, thanks >>> I have submitted the bug >>> bug #1994 >> >> This is a line from the script you sent to Bugzilla: >> >> my $search = new Bio::SearchIO ( >> -verbose => 1,-format => 'blast', -file => $file) >> or die "could not open blast report" if not defined my $search; >> >> Althoygh syntactically correct, I don't think it is doing what you >> want. >> Please change it to this: >> >> my $search = new Bio::SearchIO(-format => 'blast', -file => $file) >> or die >> "could not open blast report"; >> >> or alternatively, this: >> >> my $search = new Bio::SearchIO(-format => 'blast', -file => $file); >> if (not defined $search) { >> die "could not open blast report"; >> } >> >> and let us know what happens. >> >> all the example output you have supplied still suggests that >> Bio::SearchIO can >> not load or parse your blast report. 
>> >> -- >> Torsten Seemann >> Victorian Bioinformatics Consortium, Monash University, Australia >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From jason.stajich at duke.edu Sun May 7 17:01:29 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Sun, 7 May 2006 17:01:29 -0400 Subject: [Bioperl-l] primer parameters using primer3 In-Reply-To: <20060507073455.11849.qmail@web36815.mail.mud.yahoo.com> References: <20060507073455.11849.qmail@web36815.mail.mud.yahoo.com> Message-ID: I put up some info on the wiki (and I encourage other people to do the same!) http://bioperl.org/wiki/Module:Bio::Tools::Run::Primer3 Set the command line parameters by just calling a function of the name of the parameter. To get a list of the available options, this perl code will report it to you: # what are the arguments, and what do they mean? my $args = $primer3->arguments; print "ARGUMENT\tMEANING\n"; foreach my $key (keys %{$args}) {print "$key\t", $$args{$key}, "\n"} The info for PRODUCT_SIZE_RANGE is: (size range list, default 100-300) space separated list of product sizes eg - - I believe you can set the PCR product size with $primer3->primer_product_size_range("490-510"); -jason On May 7, 2006, at 3:34 AM, chen li wrote: > Hi all, > > I use Bio::Tools::Run::Primer3 to design PCR primers. > I want to change some default values, for example, to > increase the PCR product size to 490-510 bp instead of > using the default value of 100-300 bp. What should I > do ? > > > Thanks, > > Li > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! 
Mail has the best spam protection around > http://mail.yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From chen_li3 at yahoo.com Sun May 7 21:18:17 2006 From: chen_li3 at yahoo.com (chen li) Date: Sun, 7 May 2006 18:18:17 -0700 (PDT) Subject: [Bioperl-l] primer parameters using primer3 In-Reply-To: Message-ID: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com> Hi Jason, I add the line code $primer3->primer_product_size_range("490-510"); to my script. But it doesn't work nor primer3 complains it. Li --- Jason Stajich wrote: > I put up some info on the wiki (and I encourage > other people to do > the same!) > http://bioperl.org/wiki/Module:Bio::Tools::Run::Primer3 > > Set the command line parameters by just calling a > function of the > name of the parameter. To get a list of the > available options, this > perl code will report it to you: > > # what are the arguments, and what do they mean? > my $args = $primer3->arguments; > > print "ARGUMENT\tMEANING\n"; > foreach my $key (keys %{$args}) {print "$key\t", > $$args{$key}, "\n"} > > The info for PRODUCT_SIZE_RANGE is: > (size range list, default 100-300) space > separated list of product > sizes eg - - > > I believe you can set the PCR product size with > $primer3->primer_product_size_range("490-510"); > > -jason > On May 7, 2006, at 3:34 AM, chen li wrote: > > > Hi all, > > > > I use Bio::Tools::Run::Primer3 to design PCR > primers. > > I want to change some default values, for example, > to > > increase the PCR product size to 490-510 bp > instead of > > using the default value of 100-300 bp. What should > I > > do ? > > > > > > Thanks, > > > > Li > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! 
Mail has the best spam
> protection around
> > http://mail.yahoo.com

From hubert.prielinger at gmx.at Sun May 7 21:41:14 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Sun, 07 May 2006 19:41:14 -0600
Subject: [Bioperl-l] [BULK] can't parse blast file anymore
In-Reply-To: <445D45EA.8020804@infotech.monash.edu.au>
References: <000d01c670ad$d209f980$15327e82@pyrimidine> <445D218D.2030504@gmx.at> <445D45EA.8020804@infotech.monash.edu.au>
Message-ID: <445EA1BA.9050301@gmx.at>

Hi,

I have corrected that, and now I finally got a few error messages:

blast.pm: unrecognized line Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer,
blast.pm: unrecognized line Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman
blast.pm: unrecognized line (1997), "Gapped BLAST and PSI-BLAST: a new generation of
blast.pm: unrecognized line protein database search programs", Nucleic Acids Res. 25:3389-3402.
blast.pm: unrecognized line RID: 1137529800-24476-151611170370.BLASTQ1

After that line it stops without terminating.

Torsten Seemann wrote:
> Hubert Prielinger wrote:
>> ok, thanks
>> I have submitted the bug
>> bug #1994
>
> This is a line from the script you sent to Bugzilla:
>
> my $search = new Bio::SearchIO (
> -verbose => 1,-format => 'blast', -file => $file)
> or die "could not open blast report" if not defined my $search;
>
> Although syntactically correct, I don't think it is doing what you want.
> Please change it to this:
>
> my $search = new Bio::SearchIO(-format => 'blast', -file => $file) or
> die "could not open blast report";
>
> or alternatively, this:
>
> my $search = new Bio::SearchIO(-format => 'blast', -file => $file);
> if (not defined $search) {
> die "could not open blast report";
> }
>
> and let us know what happens.
>
> all the example output you have supplied still suggests that
> Bio::SearchIO can not load or parse your blast report.

From cjfields at uiuc.edu Sun May 7 22:04:13 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Sun, 7 May 2006 21:04:13 -0500
Subject: [Bioperl-l] [BULK] can't parse blast file anymore
Message-ID: <42d52830.b4f91bfc.81e4600@expms6.cites.uiuc.edu>

These are debugging lines (not errors); you still have the -verbose flag set.

Did you follow Jason's advice? I believe he's right on the money about the issue at hand...

Chris

From jason.stajich at duke.edu Sun May 7 22:47:00 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun, 7 May 2006 22:47:00 -0400
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
References: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
Message-ID: <430DE892-8EE8-4FC9-8BAC-7D344C876B72@duke.edu>

I'm not familiar with the module beyond what the documentation says, so did you try using the add_targets method to add arguments instead? I had thought the AUTOLOAD method took care of access to the command-line arguments, as it does for the other Run modules, but I am not really sure.

Perhaps folks on the list who use this module can provide better advice.

-jason

On May 7, 2006, at 9:18 PM, chen li wrote:
> Hi Jason,
>
> I add the line code
> $primer3->primer_product_size_range("490-510");
> to my script. But it doesn't work nor primer3
> complains it.
>
> Li

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From osborne1 at optonline.net Mon May 8 10:49:22 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 08 May 2006 10:49:22 -0400
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
Message-ID:

Li,

Read the documentation for Bio::Tools::Run::Primer3. It shows examples of the correct syntax. Also look at bioperl-run/t/Primer3.t.

Brian O.

On 5/7/06 9:18 PM, "chen li" wrote:
> Hi Jason,
>
> I add the line code
> $primer3->primer_product_size_range("490-510");
> to my script. But it doesn't work nor primer3
> complains it.
>
> Li

From roy at colibase.bham.ac.uk Mon May 8 07:12:49 2006
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Mon, 08 May 2006 12:12:49 +0100
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
References: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
Message-ID: <445F27B1.40501@colibase.bham.ac.uk>

Hi Li,

I think the syntax you need is:

    $primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');

I guess you may also need to change the parameter PRIMER_PRODUCT_OPT_SIZE.

Incidentally, such a restricted product size range may mean that Primer3 is unable to design any suitable primers. If I recall correctly, this doesn't cause an error; you just get a Bio::Tools::Primer3 object with no primers in it. I have had some success with testing for this and, if necessary, relaxing some constraints on primer design and re-running Primer3.

Hope this helps.
Roy.

--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.

http://xbase.bham.ac.uk

From chen_li3 at yahoo.com Mon May 8 09:21:54 2006
From: chen_li3 at yahoo.com (chen li)
Date: Mon, 8 May 2006 06:21:54 -0700 (PDT)
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <445F27B1.40501@colibase.bham.ac.uk>
Message-ID: <20060508132154.71440.qmail@web36802.mail.mud.yahoo.com>

I think Dr. Chaudhuri is correct. I added the following lines to my script (actually copied from the documentation)

    $primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');
    $primer3->add_targets('PRIMER_MIN_TM'=>60, 'PRIMER_MAX_TM'=>64);

to design primers with a product size of 490-510 bp and a primer annealing Tm from 60 to 64 C.

Here is part of the output in the file called temp.out:

    .......... original sequence.....
    GTGGGCTGGTGTTGCTTGGAAAATTTCAAAATCCCAAAGTTTCAGGCTTCCCAAAGTTGGCTTGGAAAAATGTGATAGTCTCACCTGAGTCTAGACATGT
    .................
    PRIMER_PRODUCT_SIZE_RANGE=490-510
    PRIMER_MIN_TM=60
    PRIMER_MAX_TM=64
    PRIMER_PAIR_PENALTY=0.1544
    PRIMER_LEFT_PENALTY=0.081468
    PRIMER_RIGHT_PENALTY=0.072951
    PRIMER_LEFT_SEQUENCE=CCAAAGTTGGCTTGGAAAAA
    ...............................
    PRIMER_PRODUCT_SIZE=501
    ..............

This is what I want. If you don't set special parameters such as the annealing Tm, the program will use the default ones. If you set your own parameters, they will show up after the sequence (see this output example). If you need to set more parameters and want to know which ones are available, just browse the BEGIN section of the code.

Now I have another question: the program always prints out the original sequence at the beginning; is it possible not to do that?
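One way to hide the sequence is to post-process temp.out and drop the unwanted tag. The following is a minimal sketch in plain Perl, not a feature of Bio::Tools::Run::Primer3. It assumes the output file uses Primer3's KEY=VALUE line format, as the excerpt above suggests, and that the sequence is stored under a tag named SEQUENCE. The helper filter_primer3_output is hypothetical and is not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function).
# Takes the text of a Primer3 output file and an array ref of tag names to
# hide, and returns the remaining KEY=VALUE lines in their original order.
# Assumption: one KEY=VALUE pair per line; a lone "=" ends a record.
sub filter_primer3_output {
    my ($text, $skip) = @_;
    my %skip = map { $_ => 1 } @$skip;
    my @kept;
    for my $line (split /\r?\n/, $text) {
        # Skip blank lines and the lone "=" record terminator.
        next unless $line =~ /^([^=]+)=(.*)$/;
        next if $skip{$1};
        push @kept, "$1=$2";
    }
    return @kept;
}

# Made-up record for illustration; the sequence is shortened and the
# SEQUENCE tag name is an assumption about the real file.
my $example = join "\n",
    'SEQUENCE=GTGGGCTGGTGTTGCTTGG',
    'PRIMER_PRODUCT_SIZE_RANGE=490-510',
    'PRIMER_LEFT_SEQUENCE=CCAAAGTTGGCTTGGAAAAA',
    'PRIMER_PRODUCT_SIZE=501',
    '=';

print "$_\n" for filter_primer3_output($example, ['SEQUENCE']);
```

To use it on a real file, read temp.out into a string and pass it in place of $example. If the tag holding the sequence has a different name in your Primer3 version, change the skip list accordingly.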
Thanks to all for joining this topic,

Li

From hubert.prielinger at gmx.at Mon May 8 15:09:29 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 08 May 2006 13:09:29 -0600
Subject: [Bioperl-l] [BULK] can't parse blast file anymore
In-Reply-To: <42d52830.b4f91bfc.81e4600@expms6.cites.uiuc.edu>
References: <42d52830.b4f91bfc.81e4600@expms6.cites.uiuc.edu>
Message-ID: <445F9769.70500@gmx.at>

Hi all,

I have solved the problem. I'm parsing BLAST 2.2.13 output with an early BioPerl 1.5.1 install in which bug 1934 wasn't fixed yet, so I had to replace the blast.pm file, and now it works properly.

Thank you very much,
Hubert

Christopher Fields wrote:
> These are debugging lines (not errors); you still have the -verbose flag set.
>
> Did you follow Jason's advice? I believe he's right on the money about the issue
> at hand...
>
> Chris

From s.johri at imperial.ac.uk Mon May 8 11:38:13 2006
From: s.johri at imperial.ac.uk (Johri, Saurabh)
Date: Mon, 8 May 2006 16:38:13 +0100
Subject: [Bioperl-l] PAML + Codeml problem..
Message-ID: <4A98ACB8EC146149872BAC9A132A582C277AC4@icex5.ic.ac.uk>

Hi all,

I'm trying to use codeml from PAML to estimate Ka and Ks values from sequences within a multi-FASTA file. I'm using the code that has been posted on the BioPerl wiki.

However, when I run the code, I get the errors shown below.

I did a Google search to see if anyone had come across similar problems; in those cases the problem seems to have been that the sequences were not a multiple of 3 in length. In my code I check whether each sequence is a multiple of 3 and, if not, alter the sequence until it is, but I still get the same error messages.

Any suggestions as to why this could be happening?

Thanks!!!
Saurabh Johri
Tuberculosis Research Group
Centre for Molecular Microbiology & Infection
Imperial College London
SW7 2AZ

-------------------- WARNING ---------------------
MSG: There was an error - see error_string for the program output
---------------------------------------------------

------------- EXCEPTION Bio::Root::NotImplemented -------------
MSG: Unknown format of PAML output
STACK Bio::Tools::Phylo::PAML::_parse_summary /sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:359
STACK Bio::Tools::Phylo::PAML::next_result /sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:224
------------------------------------

>Rv3923c
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
>Rv3923c_mtb_cdc1551
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcac
>Rv3923c_mtb_f11
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
>Rv3923c_mtb_c1
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
>Rv3923c_mtb_210
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
>Rv3923c_mbovis
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa

------------------------------------

From chen_li3 at yahoo.com Mon May 8 20:21:42 2006
From: chen_li3 at yahoo.com (chen li)
Date: Mon, 8 May 2006 17:21:42 -0700 (PDT)
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
Message-ID: <20060509002142.94880.qmail@web36806.mail.mud.yahoo.com>

Dear all,

The following is the script I use to design primers for one sequence:

    #!/cygdrive/c/Perl/bin/perl.exe
    use warnings;
    use strict;
    use Bio::Tools::Run::Primer3;
    use Bio::SeqIO;

    my $file_in='piwil2.fa';
    my $file_out='temp.out';

    my $seqio=Bio::SeqIO->new(-file=>$file_in);
    my $seq=$seqio->next_seq;

    my $primer3=Bio::Tools::Run::Primer3->new(
        -seq=>$seq,
        -outfile=>$file_out,
        -path=>"c:/Perl/local/primer3_1.0.0/src/primer3.exe"
    );

    unless ($primer3->executable){
        print "primer3 can not be found. Is it installed?\n";
        exit(-1);
    }

    # set your own parameters for the primers or product
    $primer3->add_targets(
        'PRIMER_OPT_GC_PERCENT'=>' 50 ',
        'PRIMER_OPT_SIZE'=> '24 ',
        'PRIMER_OPT_TM'=> ' 60 ');

    my $result=$primer3->run;
    exit;

I tried to modify it for multiple sequences by using a while loop as follows:

    while ($seq=$seqio->next_seq){
        my $primer3=Bio::Tools::Run::Primer3->new();   # design the primer
        ....
    }

I get primers only for the last sequence. It seems the earlier ones are overwritten. Any ideas will be highly appreciated.

Li

From jason.stajich at duke.edu Mon May 8 20:59:26 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon, 8 May 2006 20:59:26 -0400
Subject: [Bioperl-l] PAML + Codeml problem..
In-Reply-To: <4A98ACB8EC146149872BAC9A132A582C277AC4@icex5.ic.ac.uk>
References: <4A98ACB8EC146149872BAC9A132A582C277AC4@icex5.ic.ac.uk>
Message-ID: <4796FE3D-9D14-4D93-B455-69EDFE2B2B62@duke.edu>

Saurabh -

a) These sequences are identical except for a difference in length, so there aren't going to be any interesting values from PAML, but maybe you are just providing an example?

b) I think you are missing the trailing gaps in the alignment of the Rv3923c_mtb_cdc1551 sequence, as it is shorter. PAML requires aligned sequences as input.

c) The sequences, in the reading frame you have provided (and using the standard translation table), have stop codons in them; this will cause failure as well.

Which code from the wiki are you running, the 'running PAML' part of the HOWTO?

Try looking at the actual output from PAML to figure out what is wrong. Add this when initializing the Run object:

    -save_tempfiles => 1,
    -verbose => 1,

then open up the tempdir that is reported and look at the output files (mlc file).
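Points (b) and (c) can also be checked before codeml is run. The following is a minimal sketch in plain Perl and is not part of BioPerl or PAML. The helper check_codon_alignment is hypothetical: it assumes the sequences are already loaded into a hash keyed by ID, reads them in frame 1, and treats TAA, TAG and TGA (the standard code) as stop codons.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stop codons of the standard genetic code.
my %STOP = map { $_ => 1 } qw(TAA TAG TGA);

# Hypothetical helper (not a BioPerl or PAML function).
# Takes a hash ref of id => nucleotide sequence and returns a list of
# human-readable problems; an empty list means no problem was found.
sub check_codon_alignment {
    my ($seqs) = @_;
    my (@problems, %lengths);

    for my $id (sort keys %$seqs) {
        my $seq = uc $seqs->{$id};
        my $len = length $seq;
        $lengths{$len}++;

        push @problems, "$id: length $len is not a multiple of 3" if $len % 3;

        # Walk the complete codons in frame 1 and report every stop codon.
        for (my $i = 0; $i + 3 <= $len; $i += 3) {
            my $codon = substr $seq, $i, 3;
            push @problems,
                sprintf("%s: stop codon %s at position %d", $id, $codon, $i + 1)
                if $STOP{$codon};
        }
    }

    # Aligned sequences all have the same length (gaps included).
    push @problems, "sequences differ in length; are they aligned?"
        if scalar(keys %lengths) > 1;

    return @problems;
}

# Toy input for illustration only.
my %toy = (
    seq_a => 'ATGAAATGA',
    seq_b => 'ATGAAA',
);
print "$_\n" for check_codon_alignment(\%toy);
```

A terminal stop codon is reported like any other here; whether it has to be removed before running codeml is worth checking against the PAML documentation.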
-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From osborne1 at optonline.net Mon May 8 21:17:22 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 08 May 2006 21:17:22 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
In-Reply-To: <20060509002142.94880.qmail@web36806.mail.mud.yahoo.com>
Message-ID:

Li,

If you're analyzing multiple input sequences you're going to have to create multiple output sequences.

Brian O.

On 5/8/06 8:21 PM, "chen li" wrote:
> I get primers only for the last sequence. It seems the
> earlier ones are overwritten.

From WiersmaP at AGR.GC.CA Mon May 8 21:28:27 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Mon, 8 May 2006 21:28:27 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C41@onncrxms5.agr.gc.ca>

Hi Li,

When you execute $primer3->run with a Bio::Tools::Run::Primer3 object, it opens -outfile=>"filename" for writing and then closes it. That's why putting it in a loop overwrites your output file each time, so you only see the last one. I suppose you could read in each output file before looping to the next seq and append it to another file.

If you're doing a fair bit of work with this module, it would be worth looking at the Bio::Tools::Primer3 module. The statement $result = $primer3->run produces a Bio::Tools::Primer3 object, which has all the methods you need for customizing your output.

Paul

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca

From simon_sask at yahoo.com Tue May 9 04:06:04 2006
From: simon_sask at yahoo.com (Simon K. Chan)
Date: Tue, 9 May 2006 01:06:04 -0700 (PDT)
Subject: [Bioperl-l] Raw Blast Alignment
Message-ID: <20060509080604.53621.qmail@web54104.mail.yahoo.com>

Hi Fellow Bioperl-ers,

bioperl-live/examples/searchio/rawwriter.pl is supposed to show the raw alignments using Bio::SearchIO. The script is written to parse a PSI-BLAST report. I found an old email in the archive from Jason stating that this should parse other flavors of BLAST reports as well.

What do I need to do to make this script parse non-PSI BLAST reports? I tried just specifying a file and setting -format to 'blast', but I get an error stating that the object method 'raw_hit_data' is not defined in Bio::Search::Hit::BlastHit.

Basically, I want to obtain the raw alignment because I'd like to get the size of the gaps, not just the number.

Any help will be much appreciated.
Many thanks

From cjfields at uiuc.edu Tue May 9 08:21:02 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Tue, 9 May 2006 07:21:02 -0500
Subject: [Bioperl-l] Raw Blast Alignment
Message-ID:

You need to read the SearchIO HOWTO, which gives several examples:

http://www.bioperl.org/wiki/HOWTO:SearchIO

Chris

From peterm at bioinf.uni-leipzig.de Tue May 9 08:44:25 2006
From: peterm at bioinf.uni-leipzig.de (Peter Menzel)
Date: Tue, 09 May 2006 14:44:25 +0200
Subject: [Bioperl-l] colorize features
Message-ID: <44608EA9.1030808@bioinf.uni-leipzig.de>

Hi all,

I am using the Bio::Graphics module to draw sequences and their features with Bio::SeqFeature::Generic. The features I want to highlight are occurrences of transcription binding factors. Therefore I want to give every factor its own color, but I didn't see how to manage it. I can only colorize complete tracks.

Is there a known workaround?

Thanks,
Peter

From Marc.Logghe at DEVGEN.com Tue May 9 10:13:24 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Tue, 9 May 2006 16:13:24 +0200
Subject: [Bioperl-l] colorize features
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746D88@ANTARESIA.be.devgen.com>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Peter Menzel
> Sent: Tuesday, May 09, 2006 2:44 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] colorize features
>
> Hi all,
> I am using the Bio::Graphics module to draw sequences and
> their features with Bio::SeqFeature::Generic.
> The features I want to highlight are occurrences of
> transcription binding factors. Therefore I want to give every
> factor its own color, but i didn't see how to manage it. I
> only can colorize complete tracks.
> Is there a known workaround?

Yes, instead of giving a hardcoded color value you can pass a subroutine to the option.

    -bgcolor => sub {
        my $feat = shift;
        # get your attribute on which you want to base your color
        my ($attr) = $feat->get_tag_values('my_attribute');

        return $attr > 10 ? 'red' : 'green'
    }

I'm not sure about the method calls I am making here (it could as well be get_attributes()), but you get the idea.

Cheers,
Marc

From Marc.Logghe at DEVGEN.com Tue May 9 10:47:06 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Tue, 9 May 2006 16:47:06 +0200
Subject: [Bioperl-l] colorize features
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746D89@ANTARESIA.be.devgen.com>

Hi Peter,

Actually, it is explained much better in this HOWTO:

http://bioperl.org/wiki/HOWTO:Graphics

The examples show the principle I mentioned in my previous post (e.g. Example 4), but for the -label or -description options. As said, you can apply this to (most of?) the other options as well.

Regards,
ML

From WiersmaP at AGR.GC.CA Tue May 9 11:49:33 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Tue, 9 May 2006 11:49:33 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C42@onncrxms5.agr.gc.ca>

Hi Li,

The line "my $result = $primer3->run" is already in the code you submitted. In the Bio::Tools::Primer3 module the author uses "$p3" for the object. If you change your line to "my $p3 = $primer3->run" you should be able to run the examples below. Process the results for each sequence and output them before looping to the next sequence.

From Bio::Tools::Primer3.pm:

    # how many results were there?
    my $num=$p3->number_of_results;
    print "There were $num results\n";

    # get all the results
    my $all_results=$p3->all_results;
    print "ALL the results\n";
    foreach my $key (keys %{$all_results}) {print "$key\t${$all_results}{$key}\n"}

    # get specific results
    my $result1=$p3->primer_results(1);
    print "The first primer is\n";
    foreach my $key (keys %{$result1}) {print "$key\t${$result1}{$key}\n"}

Paul

Paul A.
Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca

-----Original Message-----
From: chen li [mailto:chen_li3 at yahoo.com]
Sent: Monday, May 08, 2006 8:32 PM
To: Wiersma, Paul
Subject: Re: [Bioperl-l] use primer3 to design primers with multiple sequences

Hi Paul,

I read both documents. What I understand is that Bio::Tools::Run::Primer3 is for designing primers and Bio::Tools::Primer3 is for parsing the results. When I read the documents I do not see this line
$result = $primer3->run in Bio::Tools::Primer3. I wonder how you get this information.

Thanks,

Li

--- "Wiersma, Paul" wrote:

> Hi Li,
>
> When you execute $primer3->run with a Bio::Tools::Run::Primer3 object it
> opens -outfile=>"filename" for writing and then closes. That's why
> putting it in a loop will overwrite your output file each time so you
> only see the last one. I suppose you could read in each output file
> before looping to the next seq and append it to another file.
>
> If you're doing a fair bit of work with this module it would be worth
> looking at the Bio::Tools::Primer3 module. The statement $result =
> $primer3->run produces a Bio::Tools::Primer3 object which has all the
> methods you need for customizing your output.
>
> Paul

From chen_li3 at yahoo.com Tue May 9 13:32:32 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 9 May 2006 10:32:32 -0700 (PDT)
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C42@onncrxms5.agr.gc.ca>
Message-ID: <20060509173232.18843.qmail@web36802.mail.mud.yahoo.com>

Thanks Paul, it REALLY works. I have other questions:

1) When I run the script I use this line on the command prompt:

    perl primer.pl >test

When I check the default output file (temp.out) used by the script I only see the information about the last sequence, which is different from what is in the test file. In the test file I can get all the information for all the sequences.

2) Is it possible to use Bio::Tools::Primer3 directly to print out selective information such as the primer sequence and the size of the PCR product? Or do I have to parse the file by myself?

After I get all this information I would like to post the script for batch-designing PCR primers.

Thanks,

Li

From WiersmaP at AGR.GC.CA Tue May 9 13:59:20 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Tue, 9 May 2006 13:59:20 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C43@onncrxms5.agr.gc.ca>

Hi Li,

I've attached some code I used to explore basic functionality of Primer3.pm modules. Hopefully you can see how I've picked out parts of the results for printing. You can modify it as you need to output only some results.

>>>>>>>>
# design the primers. This runs primer3 and returns a
# Bio::Tools::Run::Primer3 object with the results
my $results = $primer3->run;

# see the Bio::Tools::Run::Primer3 pod for
# things that you can get from this. For example:
print "There were ", $results->number_of_results+1, " primers\n";

# suffixes of the per-primer keys to print, e.g. PRIMER_LEFT_TM
my @out_keys_part = qw(
    START LENGTH TM GC_PERCENT SELF_ANY SELF_END SEQUENCE
);

for (my $i = 0; $i <= $results->number_of_results; $i++) {
    # get specific results
    my $result1 = $results->primer_results($i);
    print "\n", $i+1;
    for my $key (qw(PRIMER_LEFT PRIMER_RIGHT)) {
        # PRIMER_LEFT and PRIMER_RIGHT hold "start,length" pairs
        my ($start, $length) = split /,/, ${$result1}{$key};
        ${$result1}{$key."_START"} = $start;
        ${$result1}{$key."_LENGTH"} = $length;
        foreach my $partkey (@out_keys_part) {
            print "\t", ${$result1}{$key."_".$partkey};
        }
        print "\n";
    }
    print "\tPRODUCT SIZE: ", ${$result1}{'PRIMER_PRODUCT_SIZE'},
        ", PAIR ANY COMPL: ", ${$result1}{'PRIMER_PAIR_COMPL_ANY'};
    print ", PAIR 3\' COMPL: ", ${$result1}{'PRIMER_PAIR_COMPL_END'}, "\n";
}
>>>>>>>>>>>>>>>

Paul A.
Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Telephone/Téléphone: 250-494-6388
Facsimile/Télécopieur: 250-494-0755
Box 5000, 4200 Hwy 97
Summerland, BC V0H 1Z0
wiersmap at agr.gc.ca

From cjfields at uiuc.edu Tue May 9 17:13:43 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 9 May 2006 16:13:43 -0500
Subject: [Bioperl-l] Oddness in Bio::SeqIO
Message-ID: <000601c673ad$74601c30$15327e82@pyrimidine>

I noticed an odd thing with SeqIO parsing of species lines (those problematic bacterial tax names again). I have a simple script that runs output to STDOUT to generate a list of hits. Here's what I get:

Bacterium: Corynebacterium glutamicum ATCC 13032
hits: 4
Bacterium: Corynebacterium jeikeium K411 K411 <--
hits: 1
Bacterium: Frankia sp. CcI3 CcI3 <--
hits: 1
Bacterium: Frankia sp. EAN1pec EAN1pec <--
hits: 1
Bacterium: Janibacter sp. HTCC2649 HTCC2649 <--
hits: 1
Bacterium: Kineococcus radiotolerans SRS30216 SRS30216 <--
hits: 1
Bacterium: Leifsonia xyli subsp. xyli str. CTCB07 xyli str. CTCB07 <--
hits: 1
Bacterium: Mycobacterium avium subsp. paratuberculosis K-10 paratuberculosis K-10 <--
...

Most (but not all) of the strain numbers get repeated (marked with arrows). This is actually in the GenBank file itself, downloaded via Bio::DB::GenBank (and thus passed through Bio::SeqIO). Anyone seen this before?
Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From torsten.seemann at infotech.monash.edu.au Tue May 9 19:42:29 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 10 May 2006 09:42:29 +1000
Subject: [Bioperl-l] Oddness in Bio::SeqIO
In-Reply-To: <000601c673ad$74601c30$15327e82@pyrimidine>
References: <000601c673ad$74601c30$15327e82@pyrimidine>
Message-ID: <446128E5.1000908@infotech.monash.edu.au>

Chris,

> I noticed an odd thing with SeqIO parsing of species lines (those
> problematic bacterial tax names again). I have a simple script that runs
> output to STDOUT to generate a list of hits. Here's what I get:

> Bacterium: Mycobacterium avium subsp. paratuberculosis K-10 paratuberculosis
> K-10 <--

In this case,

Genus = Mycobacterium
Species = avium
Subspecies = paratuberculosis
Strain = K-10

which suggests that BioPerl is trying to handle something special, because the 'subsp.' is gone?

Here are the pertinent parts of the GenBank file (apologies for the wrapping):

LOCUS       NC_002944   4829781 bp   DNA   circular   BCT   18-JAN-2006
DEFINITION  Mycobacterium avium subsp. paratuberculosis K-10, complete genome.
SOURCE      Mycobacterium avium subsp. paratuberculosis K-10
  ORGANISM  Mycobacterium avium subsp. paratuberculosis K-10
            Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
            Corynebacterineae; Mycobacteriaceae; Mycobacterium;
            Mycobacterium avium complex (MAC).

            /organism="Mycobacterium avium subsp. paratuberculosis K-10"
            /strain="K-10"
            /sub_species="paratuberculosis"

> Most (but not all) of the strain numbers get repeated (marked with arrows).
> This is actually in the GenBank file itself, downloaded via Bio::DB::GenBank
> (and thus passed through Bio::SeqIO). Anyone seen this before?

The problem is mentioned in the wiki so it must have come up before?
http://bioperl.org/wiki/Project_priority_list#Taxonomy_.2F_Species_data

I also deal with Bacteria mainly, and should also look into this. I haven't been using the GenBank headers directly, only the features, so I never came across this.

Another thing which may crop up is when no Species has been allocated yet but the genus is known (or something like that). In that case the name is written as "Genus spp.", e.g. Gallibacterium spp.

--Torsten

From chen_li3 at yahoo.com Tue May 9 21:04:08 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 9 May 2006 18:04:08 -0700 (PDT)
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C47@onncrxms5.agr.gc.ca>
Message-ID: <20060510010408.24494.qmail@web36804.mail.mud.yahoo.com>

Hi Paul,

Thank you very much. Just as you point out in your latest email, I have now figured out that the line "my $result1=$results->primer_results(1);" returns a hash reference containing all the information for the first pair of primers.

1) Since it is a hash I should be able to get the specific value for its corresponding key by telling Perl which key is the entry for the value.
2) Also, since it is a reference, I should dereference it to get the so-called true value.

I don't know too much OO and Perl and your code looks a little bit complicated to me.
But I get the job done by adding the following lines directly:

###############################################
#from Primer3 module to get all the information
#foreach my $key (sort keys %{$result1}) {
#print "$key\t${$result1}{$key}\n"}
##################################################
#get the value for the key in the hash reference
my $key_PRIMER_LEFT_SEQUENCE='PRIMER_LEFT_SEQUENCE';
print "$key_PRIMER_LEFT_SEQUENCE\t${$result1}{$key_PRIMER_LEFT_SEQUENCE}\n";

There is one point I don't understand: when I add these two lines into my code (line 49 in my code)

my $key_PRIMER_SEQUENCE_ID='PRIMER_SEQUENCE_ID';
print "$key_PRIMER_SEQUENCE_ID\t${$result1}{$key_PRIMER_SEQUENCE_ID}\n";

I don't get the PRIMER_SEQUENCE_ID. Perl complains and says "Use of uninitialized value in concatenation (.) or string at primer3-3 line 49."

Li

--- "Wiersma, Paul" wrote:

> Hi Li,
>
> Just a bit of clarification of the code that I sent earlier.
> The line "my $result1=$results->primer_results($i);" gives you a
> reference to a hash that contains all of the information for a primer
> pair.
> To access the entries you dereference the hash, i.e. the hash is
> %{$result1} and ${$result1}{'PRIMER_PRODUCT_SIZE'} gives you the entry
> for product size. The following are the available entries. All are
> single values or strings except PRIMER_RIGHT and PRIMER_LEFT which are
> start,length pairs (e.g. PRIMER_LEFT => '60,20') which can be pulled out
> with split.
>
> my ($start, $length) = split /,/, ${$result1}{'PRIMER_LEFT'};
> my $right_Tm = ${$result1}{'PRIMER_RIGHT_TM'};
>
> PRIMER_PRODUCT_SIZE
> PRIMER_PAIR_COMPL_ANY
> PRIMER_PAIR_COMPL_END
> PRIMER_PAIR_PENALTY
>
> PRIMER_LEFT
> PRIMER_LEFT_END_STABILITY
> PRIMER_LEFT_PENALTY
> PRIMER_LEFT_TM
> PRIMER_LEFT_GC_PERCENT
> PRIMER_LEFT_SELF_ANY
> PRIMER_LEFT_SELF_END
> PRIMER_LEFT_SEQUENCE
>
> PRIMER_RIGHT
> PRIMER_RIGHT_END_STABILITY
> PRIMER_RIGHT_PENALTY
> PRIMER_RIGHT_TM
> PRIMER_RIGHT_GC_PERCENT
> PRIMER_RIGHT_SELF_ANY
> PRIMER_RIGHT_SELF_END
> PRIMER_RIGHT_SEQUENCE
>
> Paul A. Wiersma
> Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
> Summerland, BC
> wiersmap at agr.gc.ca

From zhouyubio at gmail.com Tue May 9 21:35:01 2006
From: zhouyubio at gmail.com (Yu ZHOU)
Date: Wed, 10 May 2006 01:35:01 +0000 (UTC)
Subject: [Bioperl-l] pubmed
References: <6.1.2.0.2.20050331171052.03830ba8@qfdong.mail.iastate.edu>
Message-ID:

Qunfeng iastate.edu> writes:

> Hi there,
>
> http://bioperl.org/HOWTOs/Feature-Annotation/anno_from_genbank.html
>
> I am not very familiar with BioPerl. I tried to follow the example showing
> in the above page to retrieve pubmed ID under each Reference tag , i.e.,
> $value->pubmed(), but it doesn't work for me for the seq gi#56961711. The
> authors() works for me. Appreciate any suggestions.
>
> Qunfeng

Hi,

I have the same problem as you. Here is what I have done: use a regular expression to match the value of the 'location' tag, if there is one.
#------------------
my $ann = $seqobj->annotation(); # annotation object
foreach my $ref ( $ann->get_Annotations('reference') ) {
    print "Title: ", $ref->title, "\n";
    print "Location: ", $ref->location, "\n";
    # pull the PubMed id out of the location string, when present
    if ($ref->location =~ /PUBMED\s+(\d+)/) {
        my $pmid = $1;
        print "PMID: ", $pmid, "\n";
    }
    print "Authors: ", $ref->authors, "\n";
}
#------------------

From osborne1 at optonline.net Tue May 9 23:01:49 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 09 May 2006 23:01:49 -0400
Subject: [Bioperl-l] pubmed
In-Reply-To:
Message-ID:

Qunfeng,

I'm using bioperl-live, and I'm able to retrieve the single PubMed id found in the 56961711 entry using the pubmed() method. Note that there are 4 references, only one of which has a PubMed id. Also, the authors() method prints out the authors, not the PubMed id. If you have a problem please show your code and tell us which version of Bioperl you're using.

Brian O.

use strict;
use lib "/Users/bosborne/bioperl-live";
use Bio::DB::GenBank;

my $db = Bio::DB::GenBank->new;
my $seq = $db->get_Seq_by_id(56961711);
my $ann_coll = $seq->annotation;

foreach my $ann ($ann_coll->get_Annotations('reference')) {
    print "Author: ", $ann->authors, "\nPubmed id: ", $ann->pubmed, "\n";
}

From sb at mrc-dunn.cam.ac.uk Wed May 10 05:30:59 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 10 May 2006 10:30:59 +0100
Subject: [Bioperl-l] Bio::Taxonomy confusion
Message-ID: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>

Hi,

I'm a little confused as to how names are supposed to work in Bio::Taxonomy::Node. In the bioperl versions that I've looked at, a Node doesn't seem to store the most important information about itself - its scientific name - in an obvious place.

bioperl 1.5.1 puts it at the start of the classification list. I'd have thought sticking it in -name would make more sense, but this is used only for the GenBank common name.

The Bio::Taxonomy docs still suggest:

my $node_species_sapiens = Bio::Taxonomy::Node->new(
    -object_id => 9606, # or -ncbi_taxid. Required tag
    -names => {
        'scientific' => ['sapiens'],
        'common_name' => ['human']
    },
    -rank => 'species' # Required tag
);

and whilst Bio::Taxonomy::Node does not accept -names, it does have a 'name' method which claims to work like:

$obj->name('scientific', 'sapiens');

This kind of thing would be really nice, but afaics Bio::Taxonomy::Node->new takes the -name value and makes a common name out of it, whilst the name() method passes any 'scientific' name to the scientific_name() method which is unable to set any value (and warns about this), only get.
It seems like the need to have this classification array work the same way as Bio::Species is causing some unnecessary restrictions. Can't the more sensible idea of having a dedicated storage spot for the ScientificName and other parameters be used, with the classification array either being generated just-in-time from the hash-stored data, or indeed being generated from the Lineage field?

Also, why does a node store the complete hierarchy on itself in the classification array? If we're going that far, why don't the Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a get_taxonomy() method instead of a get_Taxonomy_Node() method? get_taxonomy() could, from a single efetch.fcgi lookup, create a complete Bio::Taxonomy with all the nodes. Whilst most nodes would only have a minimum of information, if you could simply ask a node what its rank and scientific name was you could easily build a classification array, or ask what Kingdom your species was in etc.

Are there good reasons for Taxonomy working the way it does in 1.5.1, or would I not be wasting my time re-writing things to make more sense (to me)?

Cheers,
Sendu.

From osborne1 at optonline.net Wed May 10 10:33:18 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 10 May 2006 10:33:18 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C43@onncrxms5.agr.gc.ca>
Message-ID:

Paul,

I took your code, added some "run" code and made it into a script and added this to CVS, examples/tools/run_primer3.pl. I hope this is OK with you.

Brian O.

From stoltzfu at umbi.umd.edu Tue May 9 16:22:43 2006
From: stoltzfu at umbi.umd.edu (Arlin Stoltzfus)
Date: Tue, 09 May 2006 16:22:43 -0400
Subject: [Bioperl-l] proposal: CDAT (character data and trees) integrative object
Message-ID:

Dear developers--

We propose a Bio::CDAT (Character Data And Trees) module to facilitate comparative analysis using evolutionary methods by 1) managing evolutionary relationships (by linking data to trees) and 2) allowing coordinated analysis of different types of data (by implementing a generic concept of "character-state" data). Bio::CDAT would take advantage of existing BioPerl objects and would include the functionality of Rutger Vos's Bio::Phylo. It would provide the framework to develop interfaces to analysis tools (phylogeny inference, evolutionary rate models, functional shift inference, etc), as well as to file formats and visualization methods appropriate for such analyses.

A proposal is attached. We would like to hear your thoughts (e.g., see the section on "Questions to consider")!

Thanks

Arlin Stoltzfus
WeiGang Qiu
Rutger Vos
(with thanks to Justin Reese and Aaron Mackey)

------------------
Arlin Stoltzfus (stoltzfu at umbi.umd.edu)
CARB, 9600 Gudelsky Drive, Rockville, Maryland 20850
tel 240 314 6208, fax 240 314 6255, www.molevol.org/camel
---------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CDAT-proposal.pdf
Type: application/pdf
Size: 193701 bytes
Desc: not available
URL:

From zhouyubio at gmail.com Wed May 10 04:55:46 2006
From: zhouyubio at gmail.com (Yu Zhou)
Date: Wed, 10 May 2006 16:55:46 +0800
Subject: [Bioperl-l] pubmed
In-Reply-To:
References:
Message-ID: <613ffb490605100155w43a9ea4sca23818bc7fa4e33@mail.gmail.com>

Thanks!
I am using Bioperl-1.4, not bioperl-live. That may be the reason why it does not work!
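
For anyone stuck on an older release, the two approaches in this thread (the pubmed() accessor and the regular-expression match on the location string) can be combined. What follows is only a sketch under stated assumptions: it accepts any object that offers location() and possibly pubmed(), as the reference annotations discussed above do, tries the accessor first, and falls back to the regex. The helper name pubmed_id_for and the MockRef class are hypothetical and exist only so the example runs without BioPerl; the ids are made up.

```perl
use strict;
use warnings;

# Return a PubMed id for a reference annotation, or undef if none is found.
# $ref is assumed to behave like the reference annotations in this thread:
# it may offer pubmed() and should offer location().
sub pubmed_id_for {
    my ($ref) = @_;

    # Preferred route: the dedicated accessor, when it exists and is filled.
    if ( $ref->can('pubmed') ) {
        my $pmid = $ref->pubmed;
        return $pmid if defined $pmid && $pmid =~ /^\d+$/;
    }

    # Fallback from the thread: dig the id out of the location string.
    my $loc = $ref->can('location') ? $ref->location : undef;
    return $1 if defined $loc && $loc =~ /PUBMED\s+(\d+)/;

    return;    # nothing found
}

# Hypothetical stand-in class so the sketch runs without BioPerl installed.
package MockRef;
sub new      { my ( $class, %args ) = @_; return bless {%args}, $class }
sub pubmed   { $_[0]{pubmed} }
sub location { $_[0]{location} }

package main;

# Made-up example references, not real records.
my @refs = (
    MockRef->new( pubmed => '11111111', location => 'J. Example 1 (2004)' ),
    MockRef->new( location => 'J. Example 2 (2004) PUBMED 22222222' ),
    MockRef->new( location => 'Unpublished' ),
);

for my $ref (@refs) {
    my $pmid = pubmed_id_for($ref);
    print defined $pmid ? "PMID: $pmid\n" : "PMID: (none)\n";
}
```

With a real sequence object, the loop would presumably run over $seq->annotation->get_Annotations('reference') instead of the mock list, as in the scripts earlier in the thread.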
--
Best Wishes!
Yu

From cjfields at uiuc.edu Wed May 10 11:46:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 May 2006 10:46:27 -0500
Subject: [Bioperl-l] Oddness in Bio::SeqIO
In-Reply-To: <446128E5.1000908@infotech.monash.edu.au>
Message-ID: <000f01c67448$e63973b0$15327e82@pyrimidine>

This actually pops up when using $seq->species->common_name; using $seq->species->binomial chops some of the strain designations off, so really neither one works optimally for bacterial genus-species-strain taxonomy. Hilmar made the suggestion that it's probably best to grab the NCBI TaxID and parse it out that way by looking it up in the taxonomy database (using Bio::DB::Taxonomy), but at the moment that's not what Bio::SeqIO::genbank does.

I wonder if we should be trying to shove most of this stuff into species objects directly from the beginning; in other words, maybe we should try to get the information in Bio::Annotation objects and then, after the parsing/IO is finished, have a method to get the information into Bio::Species objects when wanted/needed; a check could be added against the NCBI Taxonomy database there.

Anyway, I really haven't looked at how they are parsed out and don't have the time at the moment. I may look into this as well but not until I get back from conference (end of May).
Jason and Brian have been calling for a refactoring of Bio::SeqIO::genbank
for a while; maybe it's getting time to do something about it...

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Torsten Seemann
> Sent: Tuesday, May 09, 2006 6:42 PM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Oddness in Bio::SeqIO
>
> Chris,
>
> > I noticed an odd thing with SeqIO parsing of species lines (those
> > problematic bacterial tax names again). I have a simple script that runs
> > output to STDOUT to generate a list of hits. Here's what I get:
>
> > Bacterium: Mycobacterium avium subsp. paratuberculosis K-10 paratuberculosis
> > K-10 <--
>
> In this case,
>
> Genus = Mycobacterium
> Species = avium
> Subspecies = paratuberculosis
> Strain = K-10
>
> which suggests that BioPerl is trying to handle something special,
> because the 'subsp.' is gone?
>
> Here are the pertinent parts of the Genbank file
> (apologies for the wrapping):
>
> LOCUS       NC_002944 4829781 bp DNA circular BCT 18-JAN-2006
> DEFINITION  Mycobacterium avium subsp. paratuberculosis K-10, complete genome.
> SOURCE      Mycobacterium avium subsp. paratuberculosis K-10
>   ORGANISM  Mycobacterium avium subsp. paratuberculosis K-10
>             Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
>             Corynebacterineae; Mycobacteriaceae; Mycobacterium; Mycobacterium
>             avium complex (MAC).
>
>             /organism="Mycobacterium avium subsp. paratuberculosis K-10"
>             /strain="K-10"
>             /sub_species="paratuberculosis"
>
> > Most (but not all) of the strain numbers get repeated (marked with arrows).
> > This is actually in the GenBank file itself, downloaded via Bio::DB::GenBank
> > (and thus passed through Bio::SeqIO). Anyone seen this before?
>
> The problem is mentioned in the wiki so it must have come up before?
> http://bioperl.org/wiki/Project_priority_list#Taxonomy_.2F_Species_data
>
> I also deal with Bacteria mainly, and should also look into this. I
> haven't been using the genbank headers directly, only the features, so I
> never came across this.
>
> Another thing which may crop up is when no Species has been allocated
> yet but the genus is known (or something like that). In that case the
> name is written as "Genus spp.", e.g. Gallibacterium spp.
>
> --Torsten

From cuiw at mail.nih.gov Wed May 10 12:02:55 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Wed, 10 May 2006 12:02:55 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
In-Reply-To: <20060510010408.24494.qmail@web36804.mail.mud.yahoo.com>
Message-ID:

'PRIMER_SEQUENCE_ID' is not a key in the Bio::Tools::Primer3 output hash. You
can find all legal keys by "print keys %{$result1};"

There is one point I don't understand: When I add these two lines into my
code (line 49 in my code)

my $key_PRIMER_SEQUENCE_ID='PRIMER_SEQUENCE_ID';
print "$key_PRIMER_SEQUENCE_ID\t${$result1}{$key_PRIMER_SEQUENCE_ID}\n";

I don't get the PRIMER_SEQUENCE_ID. Perl complains about it and says "Use of
uninitialized value in concatenation (.) or string at primer3-3 line 49."

Li

From WiersmaP at AGR.GC.CA Wed May 10 12:08:37 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Wed, 10 May 2006 12:08:37 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C48@onncrxms5.agr.gc.ca>

Brian, no problem with the code, thanks for asking.

Li, PRIMER_SEQUENCE_ID and SEQUENCE are not part of the individual results
but only end up by default with $results->primer_results(0). If you try to
access them using $results->primer_results(1) (or anything but 0) you will
get an error.

Paul

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca

-----Original Message-----
From: chen li [mailto:chen_li3 at yahoo.com]
Sent: Tuesday, May 09, 2006 6:04 PM
To: Wiersma, Paul
Cc: bioperl-l at bioperl.org
Subject: RE: [Bioperl-l] use primer3 to design primers with multiple sequences

Hi Paul,

Thank you very much. Just as you point out in your latest email, I have now
figured out that the line "my $result1=$results->primer_results(1);" returns
a hash reference containing all the information for the first pair of
primers. 1) Since it is a hash I should be able to get the specific value
for its corresponding key by telling Perl which key is the entry for the
value. 2) Also, since it is a reference I should dereference it to get the
so-called true value. I don't know too much OO and Perl, and your code looks
a little bit complicated to me. But I get the job done by adding the
following lines directly:

###############################################
#from Primer3 module to get all the information
#foreach my $key (sort keys %{$result1}) {
#print "$key\t${$result1}{$key}\n"}
##################################################
#get the value for the key in the hash reference
my $key_PRIMER_LEFT_SEQUENCE='PRIMER_LEFT_SEQUENCE';
print "$key_PRIMER_LEFT_SEQUENCE\t${$result1}{$key_PRIMER_LEFT_SEQUENCE}\n";

There is one point I don't understand: When I add these two lines into my
code (line 49 in my code)

my $key_PRIMER_SEQUENCE_ID='PRIMER_SEQUENCE_ID';
print "$key_PRIMER_SEQUENCE_ID\t${$result1}{$key_PRIMER_SEQUENCE_ID}\n";

I don't get the PRIMER_SEQUENCE_ID. Perl complains about it and says "Use of
uninitialized value in concatenation (.) or string at primer3-3 line 49."

Li

--- "Wiersma, Paul" wrote:

> Hi Li,
>
> Just a bit of clarification of the code that I sent earlier.
> The line "my $result1=$results->primer_results($i);" gives you a
> reference to a hash that contains all of the information for a primer
> pair.
> To access the entries you dereference the hash, i.e. the hash is
> %{$result1} and ${$result1}{'PRIMER_PRODUCT_SIZE'} gives you the entry
> for product size. The following are the available entries. All are
> single values or strings except PRIMER_RIGHT and PRIMER_LEFT which are
> start,length pairs (e.g. PRIMER_LEFT => '60,20') which can be pulled out
> with split.
> my ($start, $length) = split /,/, ${$result1}{'PRIMER_LEFT'};
> my $right_Tm = ${$result1}{'PRIMER_RIGHT_TM'};
>
> PRIMER_PRODUCT_SIZE
> PRIMER_PAIR_COMPL_ANY
> PRIMER_PAIR_COMPL_END
> PRIMER_PAIR_PENALTY
>
> PRIMER_LEFT
> PRIMER_LEFT_END_STABILITY
> PRIMER_LEFT_PENALTY
> PRIMER_LEFT_TM
> PRIMER_LEFT_GC_PERCENT
> PRIMER_LEFT_SELF_ANY
> PRIMER_LEFT_SELF_END
> PRIMER_LEFT_SEQUENCE
>
> PRIMER_RIGHT
> PRIMER_RIGHT_END_STABILITY
> PRIMER_RIGHT_PENALTY
> PRIMER_RIGHT_TM
> PRIMER_RIGHT_GC_PERCENT
> PRIMER_RIGHT_SELF_ANY
> PRIMER_RIGHT_SELF_END
> PRIMER_RIGHT_SEQUENCE
>
> Paul A. Wiersma
> Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
> Summerland, BC
> wiersmap at agr.gc.ca

From cuiw at mail.nih.gov Wed May 10 14:42:36 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Wed, 10 May 2006 14:42:36 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences: bug in code!
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C48@onncrxms5.agr.gc.ca>
Message-ID:

Hope this works!
Bio::Tools::Primer3 line 264 should be:

$self->{seqobject}=Bio::Seq->new(-seq=>$value, -id=>$id);

Then you should be able to display PRIMER_SEQUENCE_ID by

####read primer3 output file############
my $p3=Bio::Tools::Primer3->new(-file=>"data/primer3_output.txt");
######## print id###############
print $p3->seqobject->id;

Wenwu Cui, PhD
NIH/NCI

From cjfields at uiuc.edu Wed May 10 14:58:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 May 2006 13:58:19 -0500
Subject: [Bioperl-l] ListSummaries for April 26-May 9
Message-ID: <001801c67463$b3c0a910$15327e82@pyrimidine>

ListSummaries for April 26-May 9 are up at the usual place:

http://www.bioperl.org/wiki/Mailing_list_summaries

Direct link:

http://www.bioperl.org/wiki/ListSummary:April_26-May_9%2C2006

It's a bit of a hurried one so don't be surprised to find a few spelling
errors here and there. I'm getting ready for a conference in a couple weeks
so I may be off the radar a bit here and there. The next ListSummary won't be
posted until May 26.

Enjoy!

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept.
of Biochemistry
University of Illinois Urbana-Champaign

From chen_li3 at yahoo.com Wed May 10 20:27:34 2006
From: chen_li3 at yahoo.com (chen li)
Date: Wed, 10 May 2006 17:27:34 -0700 (PDT)
Subject: [Bioperl-l] What is the relationship between primer3 module and run-primer3 module?
Message-ID: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>

First, thank you all for replying to my previous post about primer3.

But now I am a little confused even after reading the documents: What is the
relationship between these two modules? What is the correct/standard way to
use them to do batch primer design? What I do is use
Bio::Tools::Run::Primer3 to design primers. Based on Dr. Roy Chaudhuri's
information I can set the parameters using the following syntax:

$primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');

Based on Paul A. Wiersma's explanation I can also print out part of the
primer results (because I don't need all the information). But there is a
little trouble: PRIMER_SEQUENCE_ID can't be accessed using this method. And
Paul points out that "PRIMER_SEQUENCE_ID and SEQUENCE are not part of the
individual results but only end up by default with
$results->primer_results(0)". So it seems there is no way to get around this
problem using Bio::Tools::Run::Primer3. And others suggest using
Bio::Tools::Primer3 to parse the results. So is it true that
Bio::Tools::Run::Primer3 is for primer design and Bio::Tools::Primer3 is for
parsing the results from Bio::Tools::Run::Primer3? But what I find is that I
get almost all the results (except PRIMER_SEQUENCE_ID and SEQUENCE) without
providing the line

use Bio::Tools::Primer3

in the script. How to explain this? Is it because of the following line?

my $result=$primer3->run;

The last question: which line of code is used to invoke the program
primer3.exe? How does the Perl script call primer3.exe?

Once again thank you all very much,

Li

From jason.stajich at duke.edu Wed May 10 20:41:31 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 10 May 2006 20:41:31 -0400
Subject: [Bioperl-l] What is the relationship between primer3 module and run-primer3 module?
In-Reply-To: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>
References: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>
Message-ID:

Bio::Tools::Run::XXX modules are for running applications...

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From jason.stajich at duke.edu Wed May 10 20:53:43 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 10 May 2006 20:53:43 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>
References: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>
Message-ID: <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu>

I would use the implementation that talks to the flatfile db as the standard
here. Nodes are defined by the data from the taxonomy dump dbs from NCBI.
The eutils are pretty worthless except for taxid->name or the reverse; you
can't get the full taxonomy (or couldn't when that implementation was
written).

The "name" method refers to the name of the node - each level in the
taxonomy can have a "name".

The bits of hackiness relate to wrapping the node object as a Bio::Species
and/or being able to read a genbank file and the organism taxonomy data as a
list and instantiating. If we could rely on everything being in a DB of
course this would be simpler.

Another problem is that the depth of the taxonomy is not constant for every
node, so assuming that a fixed number of slots will be filled in to generate
the taxonomy leads to problems.
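[Editorial sketch, not part of the original message.] The variable-depth
point above can be illustrated in plain Perl without BioPerl. The node table
below is a hypothetical in-memory stand-in for a taxonomy dump; its ids,
names and ranks are taken from the Frankia sp. CcI3 record posted elsewhere
in this thread, while the table layout and the sub names lineage() and
name_at_rank() are assumptions made only for this illustration. The idea is
to look a rank up by walking parent links rather than by reading a fixed
slot in a classification array.

```perl
use strict;
use warnings;

# Hypothetical node table: taxid => { name, rank, parent }.
# Values come from the Frankia sp. CcI3 record quoted in this thread;
# the data layout is an assumption for illustration only.
my %nodes = (
    106370 => { name => 'Frankia sp. CcI3',       rank => 'species',      parent => 1854   },
    1854   => { name => 'Frankia',                rank => 'genus',        parent => 74712  },
    74712  => { name => 'Frankiaceae',            rank => 'family',       parent => 85013  },
    85013  => { name => 'Frankineae',             rank => 'suborder',     parent => 2037   },
    2037   => { name => 'Actinomycetales',        rank => 'order',        parent => 85003  },
    85003  => { name => 'Actinobacteridae',       rank => 'subclass',     parent => 1760   },
    1760   => { name => 'Actinobacteria (class)', rank => 'class',        parent => 201174 },
    201174 => { name => 'Actinobacteria',         rank => 'phylum',       parent => 2      },
    2      => { name => 'Bacteria',               rank => 'superkingdom', parent => 131567 },
    131567 => { name => 'cellular organisms',     rank => 'no rank',      parent => undef  },
);

# Walk from a node up to the root, collecting one entry per level.
# The result is ordered from the starting node to the root.
sub lineage {
    my ($nodes, $taxid) = @_;
    my (@path, %seen);
    while (defined $taxid && exists $nodes->{$taxid}) {
        last if $seen{$taxid}++;    # guard against cycles in bad data
        push @path, $nodes->{$taxid};
        $taxid = $nodes->{$taxid}{parent};
    }
    return @path;
}

# Find the name at a given rank by searching the lineage, so nothing
# depends on how many levels a particular organism happens to have.
sub name_at_rank {
    my ($nodes, $taxid, $rank) = @_;
    for my $node (lineage($nodes, $taxid)) {
        return $node->{name} if $node->{rank} eq $rank;
    }
    return;    # undef in scalar context when the lineage lacks that rank
}

my @path = lineage(\%nodes, 106370);
print scalar(@path), " levels\n";
print join('; ', map { $_->{name} } reverse @path), "\n";
print "superkingdom: ", name_at_rank(\%nodes, 106370, 'superkingdom'), "\n";
```

A rank that is absent from a lineage (for example 'kingdom' here) simply
yields undef, which is the case a fixed-slot layout tends to get wrong.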
Use the flatfile implementation (Bio::DB::Taxonomy::flatfile) as the best
example of working code, as this is how I really wanted it to work; the
Bio::Species hacks are only there to shoehorn in data retrieved from genbank
files. With the flatfile implementation you have to walk all the way up the
db hierarchy to get the kingdom for a node, so you do have to build up the
classification hierarchy, as each node only stores data about itself.

I'm not exactly sure what you are proposing to do, but would definitely
enjoy another pair of hands; I don't really have time to mess with it any
time soon.

-jason

On May 10, 2006, at 5:30 AM, Sendu Bala wrote:

> Hi,
> I'm a little confused as to how names are supposed to work in
> Bio::Taxonomy::Node.
>
> In the bioperl versions that I've looked at a Node doesn't seem to store
> the most important information about itself - its scientific name - in
> an obvious place. bioperl 1.5.1 puts it at the start of the
> classification list. I'd have thought sticking it in -name would make
> more sense, but this is used only for the GenBank common name.
>
> The Bio::Taxonomy docs still suggest:
>
> my $node_species_sapiens = Bio::Taxonomy::Node->new(
>     -object_id => 9606, # or -ncbi_taxid. Required tag
>     -names => {
>         'scientific' => ['sapiens'],
>         'common_name' => ['human']
>     },
>     -rank => 'species' # Required tag
> );
>
> and whilst Bio::Taxonomy::Node does not accept -names, it does have a
> 'name' method which claims to work like:
>
> $obj->name('scientific', 'sapiens');
>
> This kind of thing would be really nice, but afaics
> Bio::Taxonomy::Node->new takes the -name value and makes a common name
> out of it, whilst the name() method passes any 'scientific' name to the
> scientific_name() method which is unable to set any value (and warns
> about this), only get.
>
> It seems like the need to have this classification array work the same
> way as Bio::Species is causing some unnecessary restrictions. Can't the
> more sensible idea of having a dedicated storage spot for the
> ScientificName and other parameters be used, with the classification
> array either being generated just-in-time from the hash-stored data, or
> indeed being generated from the Lineage field?
>
> Also, why does a node store the complete hierarchy on itself in the
> classification array? If we're going that far, why don't the
> Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a
> get_taxonomy() method instead of a get_Taxonomy_Node() method?
> get_taxonomy() could, from a single efetch.fcgi lookup, create a
> complete Bio::Taxonomy with all the nodes. Whilst most nodes would only
> have a minimum of information, if you could simply ask a node what its
> rank and scientific name was you could easily build a classification
> array, or ask what Kingdom your species was in etc.
>
> Are there good reasons for Taxonomy working the way it does in 1.5.1, or
> would I not be wasting my time re-writing things to make more sense
> (to me)?
>
> Cheers,
> Sendu.

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From cuiw at mail.nih.gov Wed May 10 21:46:00 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Wed, 10 May 2006 21:46:00 -0400
Subject: [Bioperl-l] What is the relationship between primer3 module and run-primer3 module?
References: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>
Message-ID:

1. Bio::Tools::Primer3 is already included in the Bio::Tools::Run::Primer3
   module, so you can parse the result file.
2. There is a bug in Bio::Tools::Primer3.pm line 264, as I mentioned. Once
   fixed, it can output PRIMER_SEQUENCE_ID.
3. primer3.exe is called in the Bio::Tools::Run::Primer3 "run" function;
   please read the function definition.
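[Editorial sketch, not part of the original message.] For the "Use of
uninitialized value" warning discussed in this thread, a minimal plain-Perl
illustration follows. The hash below is a hand-made stand-in for one hash
reference of the kind primer_results($i) is described as returning; the key
names come from the list posted in the thread, the values are invented, and
the helper names value_or() and start_and_length() are hypothetical. It
shows checking for a key before printing it and splitting a "start,length"
entry.

```perl
use strict;
use warnings;

# Hand-made stand-in for a single primer result hash reference.
# Key names are from the thread; the values are invented for illustration.
my $result = {
    PRIMER_LEFT          => '60,20',
    PRIMER_LEFT_SEQUENCE => 'ACGTACGTACGTACGTACGT',
    PRIMER_RIGHT         => '560,20',
    PRIMER_RIGHT_TM      => '59.8',
    PRIMER_PRODUCT_SIZE  => '501',
};

# Return the value stored under a key, or a caller-supplied placeholder when
# the key is missing, so that printing never touches an undefined value.
sub value_or {
    my ($hashref, $key, $default) = @_;
    return defined($hashref->{$key}) ? $hashref->{$key} : $default;
}

# Split a "start,length" entry such as PRIMER_LEFT into its two numbers.
# Returns an empty list when the key is missing.
sub start_and_length {
    my ($hashref, $key) = @_;
    return unless defined $hashref->{$key};
    return split /,/, $hashref->{$key};
}

# PRIMER_SEQUENCE_ID is absent from this result, as reported in the thread.
for my $key (qw(PRIMER_LEFT_SEQUENCE PRIMER_SEQUENCE_ID)) {
    print "$key\t", value_or($result, $key, '(not in this result)'), "\n";
}

my ($start, $length) = start_and_length($result, 'PRIMER_LEFT');
print "left primer starts at $start, length $length\n";
```

With real data the same checks would be applied to whatever
$results->primer_results($i) returns, instead of the stand-in hash.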
From cjfields at uiuc.edu Wed May 10 23:36:39 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 May 2006 22:36:39 -0500
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu>
Message-ID: <000301c674ac$1d40f0f0$15327e82@pyrimidine>

I think you can get pretty much everything now, though I can definitely see
the use of a local database. I ran a few tests, really unrelated to this,
using the powerscripting test page at NCBI for eutils (for the curious, at
http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/eu.cgi) and was able to
retrieve XML-formatted taxonomic information; here's the bacterium Frankia
sp. CcI3 TaxID info, which looks like they have everything set up by rank.
It gives quite a bit of information.

106370
Frankia sp. CcI3
1854
species
Bacteria
11 Bacterial and Plant Plastid
0 Unspecified
cellular organisms; Bacteria; Actinobacteria; Actinobacteria (class);
Actinobacteridae; Actinomycetales; Frankineae; Frankiaceae; Frankia
131567 cellular organisms no rank
2 Bacteria superkingdom
201174 Actinobacteria phylum
1760 Actinobacteria (class) class
85003 Actinobacteridae subclass
2037 Actinomycetales order
85013 Frankineae suborder
74712 Frankiaceae family
1854 Frankia genus
1999/10/22
2005/01/19
2000/02/02

Chris

From jason.stajich at duke.edu Thu May 11 08:04:54 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 11 May 2006 08:04:54 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <000301c674ac$1d40f0f0$15327e82@pyrimidine>
References: <000301c674ac$1d40f0f0$15327e82@pyrimidine>
Message-ID:

Great - now we just need someone to volunteer to actually work on this. The
current code grabs most of this but I believe expects a different XML

On May 10, 2006, at 11:36 PM, Chris Fields wrote:

> I think you can get pretty much everything now, though I can
> definitely see the use of a local database. I ran a few tests, really
> unrelated to this, using the powerscripting test page at NCBI for eutils
> (for the curious, at
> http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/eu.cgi) and was able to
> retrieve XML-formatted taxonomic information; here's the bacterium
> Frankia sp. CcI3 TaxID info, which looks like they have everything set up
> by rank. It gives quite a bit of information.
>>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From sb at mrc-dunn.cam.ac.uk Thu May 11 07:51:44 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Thu, 11 May 2006 12:51:44 +0100 Subject: [Bioperl-l] Bio::Taxonomy confusion In-Reply-To: <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu> References: <4461B2D3.7010603@mrc-dunn.cam.ac.uk> <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu> Message-ID: <44632550.3040603@mrc-dunn.cam.ac.uk> Jason Stajich wrote: > I would use the implementation that talks to the flatfile db as the > standard here. nodes are defined by the data in from taxonomy dump > dbs from ncbi. the eutils is pretty worthless except for taxid->name > or reverse, you can't get the full taxonomy (or couldn't when that > implementation was written). I'm not sure what you mean. In 1.5.1 you have access to the full taxonomy because you're using efetch.fcgi. Indeed, you parse the full taxonomy already to get the classification. > The "name" method refers to the name of the node - each level in the > taxonomy can have a "name". Yes, and to me the 'name of the node' is its scientific name (something like 'sapiens'), not a 'common' name. So why is it stored as a 'common' name in the object? Why don't the DB::Taxonomy modules store the actual common names (something like 'human')? > The bits of hackiness relate to wrapping the node object as a > Bio::Species and/or being able to read a genbank file and the > organism taxonomy data as a list and instantiating. 
> If we could rely on everything being in a DB of course this would be
> simpler.

I think that Taxonomy stuff could be done in a 'pure' way, with a new
Bio::Species made as a wrapper around an appropriate Taxonomy module(s)
that cheated and made fake nodes from a genbank list and then made a
proper Bio::Taxonomy.

> With the flatfile implementation you have to walk all the way up the
> db hierarchy to get the kingdom for a node so you do have to build up
> the classification hierarchy as each node only stores data about
> itself.

I'm still actually using bioperl 1.4 but I'm looking at 1.5.1 assuming
it is the latest available and I see that the flatfile implementation
works the same way as the entrez one. The requested node is fetched, but
then internally it walks the hierarchy purely so it can build a
classification list which is then stored on the object. If you're
already retrieving every node above the requested node, why not just
return every node? Why not just return a whole Bio::Taxonomy?

> I'm not exactly sure what you are proposing to do, but would
> definitely enjoy another pair of hands, I don't really have time to
> mess with it any time soon.

I shouldn't really be spending any time on it either, but I knocked up a
quick implementation for myself yesterday/today. I'm working on a bunch
of modules that inherit from bioperl and then add/alter to suit my
needs. In this regard they're a bit limited and kind of hard-coded to my
way of thinking, but hopefully you can see my intent and perhaps use
some of my implementation.

In my implementation:
# DB::Taxonomy::* return a Bio::Taxonomy equivalent with a single
  database lookup.
# The Taxonomy is implicitly a tree.
# The Taxonomy can have branches of different length from root to the
  same rank level.
# The Taxonomy isn't told what ranks it has (isn't limited by some
  supplied rank list); it has the ranks that its Nodes have and knows
  (without being told) what order those ranks should be in.
# The Taxonomy is made of Nodes that truly only contain information
  about themselves and have no classification array or anything like
  that.
# A Node can still be classified.
# We can have Nodes of rank 'no rank' that will be correctly ordered in
  the classification.
# Nodes have a scientific name and common names.
# You get parent and all children nodes without database lookups.
# There is a Bio::Species like thing that wraps around this and gives
  easy access to what I really want to do:

    my $human = TFBS::Species->new(-common_name => 'human');
    # returns the array you'd expect from a normally created, fully
    # classified Bio::Species
    my @classification = $human->classification;
    my $kingdom = $human->kingdom; # returns 'Metazoa'

# For genbank, we can still supply TFBS::Species a classification array

http://bix.sendu.me.uk/files/taxonomy_the_tfbs_way.tar.gz
(only tested inheriting from bioperl 1.4, but ideally that shouldn't
make any difference!)

Is there any scope for bioperl Taxonomy becoming more like this? Or are
there problems with my design (quite likely!)? Or are there good reasons
for maintaining the current way of working? Please feel free to shoot me
down / discuss.

Cheers,
Sendu.

From sb at mrc-dunn.cam.ac.uk Thu May 11 08:22:53 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 11 May 2006 13:22:53 +0100
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To:
References: <000301c674ac$1d40f0f0$15327e82@pyrimidine>
Message-ID: <44632C9D.4010408@mrc-dunn.cam.ac.uk>

Jason Stajich wrote:
> Great - now we just need someone to volunteer to actually work on this.

Now I'm really confused...

> The current code grabs most of this but I believe expects a different XML

No, I think the code in bioperl 1.5.1 Bio::DB::Taxonomy::entrez expects
that XML, and parses it as fully as flatfile.pm does. Nothing more to
do. Weren't you the person that wrote that parser?

I parse the same XML in my version of entrez.pm (see my previous email);
the main difference being I make Nodes out of each Taxon instead of just
adding each Taxon's ScientificName to the classification array.

From jason.stajich at duke.edu Thu May 11 09:53:56 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 11 May 2006 09:53:56 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <44632C9D.4010408@mrc-dunn.cam.ac.uk>
References: <000301c674ac$1d40f0f0$15327e82@pyrimidine> <44632C9D.4010408@mrc-dunn.cam.ac.uk>
Message-ID:

I guess so - I've long since forgotten what it supports, though, since I
don't regularly use it. Sorry.

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From cjfields at uiuc.edu Thu May 11 10:57:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 11 May 2006 09:57:20 -0500
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To:
Message-ID: <000b01c6750b$33e95ea0$15327e82@pyrimidine>

Heh...

To tell the truth, I haven't looked at Bio::DB::Taxonomy in any depth
yet, but I myself have seen issues with the way Bio::Species treats
bacterial strains (I guess this also involves Bio::Taxonomy::Node since
that's what Bio::Species delegates to). Seems it likes to repeat some
strain names when using $seq->species->common_name. Not a killer problem
but annoying since the correct name is in the source tag in the feature
table! I 'could' take a look at it but I can't guarantee quick results.

Jason, I could add Taxonomy to the EUtilities overhaul I mentioned to
you previously but it'll take a while to get going. I'm really more
interested in getting epost-esearch-efetch sequence retrieval up and
running first with the same API as Bio::DB::GenBank/Genpept and
Bio::DB::Query::GenBank, then donate the code (late summer/fall???)
after working out namespace issues so it doesn't conflict with current
Bio::DB::WebDBSeqI inheritance. I suppose I could also look at
Bio::DB::Taxonomy to see what's up in the next couple of weeks (after
conference), unless someone gets to it sooner.

Chris

From jason.stajich at duke.edu Thu May 11 11:42:07 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 11 May 2006 11:42:07 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <000b01c6750b$33e95ea0$15327e82@pyrimidine>
References: <000b01c6750b$33e95ea0$15327e82@pyrimidine>
Message-ID: <0C1C2DAC-F388-465E-B6C2-7217A3B4CC6C@duke.edu>

I think you'll see it is different and mostly a limitation of the
genbank format and the Bio::Species objects that you get from a
genbank parse do represent the full capabilities of a Taxonomy::Node.

I am happy for someone to overhaul things, but it all boils down to
inferring which part of a list of names is the species versus
sub-species versus strain when none of the members of the list are
labeled. These are some of the same problems we have for swissprot as
well. I just don't think we can do it right only from the genbank
file data so I don't see a lot of point in expecting Bio::Species to
provide more than a representation of what is in the file and just
return that array.

It has seemed like we need to special case things pretty heavily or
do a lookup in the taxonomydb for something.

Can you guess what value is the strain versus sub-species?
What happens when there is a two part strain name (space separated) and
a sub-species or variety designation?

SOURCE      Staphylococcus haemolyticus JCSC1435
  ORGANISM  Staphylococcus haemolyticus JCSC1435
            Bacteria; Firmicutes; Bacillales; Staphylococcus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=279808
strain is JCSC1435

versus

SOURCE      Muntiacus muntjak vaginalis
  ORGANISM  Muntiacus muntjak vaginalis
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
            Euteleostomi; Mammalia; Eutheria; Laurasiatheria;
            Cetartiodactyla; Ruminantia; Pecora; Cervidae; Muntiacinae;
            Muntiacus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9887
species is muntjak, sub-species vaginalis ?

versus

SOURCE      Aspergillus nidulans FGSC A4
  ORGANISM  Aspergillus nidulans FGSC A4
            Eukaryota; Fungi; Ascomycota; Pezizomycotina; Eurotiomycetes;
            Eurotiales; Trichocomaceae; Emericella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=227321
Genus should be Aspergillus or Emericella ?

Strain and subspecies/variety in the same entry

SOURCE      Cryptococcus neoformans var. grubii H99
  ORGANISM  Cryptococcus neoformans var. grubii H99
            Eukaryota; Fungi; Basidiomycota; Hymenomycetes;
            Heterobasidiomycetes; Tremellomycetidae; Tremellales;
            Tremellaceae; Filobasidiella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=235443

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From WiersmaP at AGR.GC.CA Thu May 11 13:04:01 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Thu, 11 May 2006 13:04:01 -0400
Subject: [Bioperl-l] What is the relationship between primer3 moduleandrun-primer3 module?
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C4D@onncrxms5.agr.gc.ca>

The bug that Wenwu referred to should only occur when reading a Primer3
output file; the Bio::Tools::Run::Primer3->run method takes the results
and directly transfers them to a Bio::Tools::Primer3 object without an
intermediate file. A Data::Dumper look at the Bio::Tools::Primer3 object
shows the keys and results for PRIMER_SEQUENCE_ID and SEQUENCE in
'results' and then again in the 'results_by_number' hash but only in the
'0' hash.

All of this doesn't really matter for Li's original concern. If you want
to include the id of the sequence along with the primer3 results just
take it from the seq object (i.e. $seq->display_id() ).
Since you are in a loop taking one sequence at a time this $seq will be
the one that was sent to primer3.

PAW

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Cui, Wenwu (NIH/NCI) [F]
Sent: Wednesday, May 10, 2006 6:46 PM
To: chen li; bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] What is the relationship between primer3 moduleandrun-primer3 module?

1. Bio::Tools::Primer3 is already included in Bio::Tools::Run::Primer3
   module so that you can parse the result file.
2. There is a bug in Bio::Tools::Primer3.pm line 264 as I mentioned. Once
   fixed, it can output
3. primer3.exe is called in the Bio::Tools::Run::Primer3 "run" function,
   please read the function definition.

From cjfields at uiuc.edu Thu May 11 13:16:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 11 May 2006 12:16:19 -0500
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <0C1C2DAC-F388-465E-B6C2-7217A3B4CC6C@duke.edu>
Message-ID: <000f01c6751e$9e89d6a0$15327e82@pyrimidine>

> I think you'll see it is different and mostly a limitation of the
> genbank format and the Bio::Species objects that you get from a
> genbank parse do represent the full capabilities of a Taxonomy::Node.

I definitely see the rationale for using a TaxID lookup (I think Hilmar
said so as well), especially for local databases. I wonder, though, if
there is a way that RichSeqs like GenBank, when passed through SeqIO,
can just be 'short-circuited' using the sequence builder to just accept
what's on the SOURCE or ORGANISM line of a file as is, without forcing
it into Bio::Species/Bio::Taxonomy::Node.
Or maybe diminish the role of the SOURCE/ORGANISM lines altogether to just simple Annotation objects and place much greater emphasis on the TaxID itself, in effect decoupling the TaxID (taxonomic information) from SOURCE/ORGANISM (annotation information).

In other words, have GenBank/EMBL classification lines and organism lines essentially stay as they are in the input file (use simple objects). Then, if one were really intent on getting the full name, classification, etc., or wanted to store sequences in bioperl-db, one would be required to have either a local db of NCBI Taxonomy or remote access to a similar database (NCBI or something else) so a lookup could be accomplished using the TaxID. If they use BioSQL, then require them to preload their BioSQL database with NCBI's taxonomy, something Hilmar already strongly suggests. If anyone isn't interested in the taxonomic information or doesn't want to bother grabbing the database or setting up remote access, tough luck; just grab the Bio::Annotation/Bio::Species object and use that.

As the saying goes, "you can't be all things to all people." At some point you have to throw your arms in the air, do the best you can, and give up trying to please everyone.

> I am happy for someone to overhaul things, but it all boils down to
> inferring which part of a list of names is the species versus sub-
> species versus strain when none of the members of the list are
> labeled. This is some of the same problems we have for swissprot as
> well. I just don't think we can do it right only from the genbank
> file data so I don't see a lot of point of expecting Bio::Species to
> provide more than a representation of what is in the file and just
> return that array.
>
> It has seemed like we need to special case things pretty heavily or
> do a lookup in the taxonomydb for something.
>
> Can you guess what value is the strain versus sub-species?
> What happens when there is a two part strain name (space separated) and a
> sub-species or variety designation?
>
> SOURCE      Staphylococcus haemolyticus JCSC1435
>   ORGANISM  Staphylococcus haemolyticus JCSC1435
>             Bacteria; Firmicutes; Bacillales; Staphylococcus.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=279808
> strain is JCSC1435
>
> versus
> SOURCE      Muntiacus muntjak vaginalis
>   ORGANISM  Muntiacus muntjak vaginalis
>             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
>             Euteleostomi; Mammalia; Eutheria; Laurasiatheria;
>             Cetartiodactyla; Ruminantia; Pecora; Cervidae;
>             Muntiacinae; Muntiacus.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9887
> species is muntjak, sub-species vaginalis ?
>
> versus
> SOURCE      Aspergillus nidulans FGSC A4
>   ORGANISM  Aspergillus nidulans FGSC A4
>             Eukaryota; Fungi; Ascomycota; Pezizomycotina;
>             Eurotiomycetes; Eurotiales; Trichocomaceae; Emericella.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=227321
>
> Genus should be Aspergillus or Emericella ?
>
> Strain and subspecies/variety in the same entry
> SOURCE      Cryptococcus neoformans var. grubii H99
>   ORGANISM  Cryptococcus neoformans var. grubii H99
>             Eukaryota; Fungi; Basidiomycota; Hymenomycetes;
>             Heterobasidiomycetes; Tremellomycetidae; Tremellales;
>             Tremellaceae; Filobasidiella.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=235443

Definitely tricky! This really points out the problem here. It used to be a problem for only a few cases, but with so many bacterial and fungal genomes that's changed. The Frankia XML example has the scientific name set to "Frankia sp. CcI3", which matches the SOURCE/ORGANISM line in NCBI's GenBank files and the OS line in EMBL files. It looks like the lines are parsed and then rebuilt from the ground up in Bio::SeqIO::genbank using Bio::Species objects, which, in my case with the strain designation, is where the problem lies.
They could be placed in annotation objects with (-tagname => 'SOURCE', -value => 'Frankia sp. CcI3') or similar settings. Or simplify Bio::Species to only represent the information in the GenBank SOURCE/ORGANISM/CLASSIFICATION or EMBL OS/OC lines and nothing more complex than that (no complex taxonomy; for that you use the TaxID and local database).

Okay, I need to lay off the coffee now...

Chris

From WiersmaP at AGR.GC.CA Thu May 11 20:13:12 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Thu, 11 May 2006 20:13:12 -0400
Subject: [Bioperl-l] What is the relationship between primer3 module andrun-primer3 module?
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C52@onncrxms5.agr.gc.ca>

Li,

If you are only "a little confused" by the OO concepts in the primer3 modules then you are doing well. To expand a little on Wenwu's explanations:

A Bio::Tools::Run::Primer3 object is a "wrapper" around the Primer3 program. All the commands and parameters that Primer3 needs in order to run are collected inside the object. This includes a sequence (which you must supply as a sequence object) and parameters (most of which are already supplied by default but can be changed using the $primer3_object->add_targets method).
Then, when everything is set the way you want it, you 'run' the Primer3 program by using $primer3_object->run. The "wrapper" collects all the run parameters and sends them off to the Primer3 executable. Primer3 does the analysis and outputs the results to "stdout" in boulder-io format. By redirecting the output (i.e. perl p3run_script.pl > out.txt) you will get the Primer3 output directly in the boulder-io format ('tag'='value') stored in out.txt. Because out.txt is not being closed between each sequence called in the script, you get all of the results concatenated in out.txt. However, if you supplied an output filename (-outfile=>$file_out) in the "wrapper", each line of output from Primer3 will be written to $file_out and at the end of Primer3 output the file will be closed. Now if your script loops to another sequence it will open the same outfile again and overwrite it.

One last important detail for the "wrapper" object. When Primer3 is executed the $primer3_object is designed to return a Bio::Tools::Primer3 object (the code is: my $results_object = $primer3_object->run). $results_object is a Bio::Tools::Primer3 object and contains the results of your Primer3 run as well as having methods for getting at that information. This includes finding out how many primer sets were found and the means to access the primer set results one at a time. It does work as advertised.

Because all of the primer sets are based on the same sequence, Primer3 only outputs the SEQUENCE and PRIMER_SEQUENCE_ID one time instead of for each primer set. That is why they only show up in $results_object as if they belonged with the first primer set (set '0') and they are not available for the other primer sets.

PAW

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca
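
[Editorial sketch] The boulder-io layout described above ('tag'='value' lines, with each record closed by a line holding a single "=") can be read back with a few lines of plain Perl. This is only a minimal illustration of the file format, not how Bio::Tools::Primer3 parses it; the helper name parse_boulder_records and the sample records are hypothetical, and the tag names are assumed to follow the Primer3 output of that era.

```perl
use strict;
use warnings;

# Split boulder-io text into records. Each record is a hash of TAG => VALUE.
# A line consisting only of "=" terminates a record.
sub parse_boulder_records {
    my ($text) = @_;
    my (@records, %current);
    for my $line (split /\n/, $text) {
        $line =~ s/\r$//;                 # tolerate DOS line endings
        next if $line eq '';
        if ($line eq '=') {               # record terminator
            push @records, +{ %current };
            %current = ();
            next;
        }
        my ($tag, $value) = split /=/, $line, 2;
        $current{$tag} = $value;
    }
    push @records, +{ %current } if %current;   # file without a final "="
    return @records;
}

# Invented example: two input sequences concatenated in one out.txt.
# Note that PRIMER_SEQUENCE_ID and SEQUENCE occur once per record,
# however many primer pairs the record holds.
my $out = join "\n",
    'PRIMER_SEQUENCE_ID=seq1',
    'SEQUENCE=ACGTACGTACGT',
    'PRIMER_LEFT_SEQUENCE=ACGTAC',
    'PRIMER_RIGHT_SEQUENCE=GTACGT',
    'PRIMER_LEFT_1_SEQUENCE=CGTACG',
    'PRIMER_RIGHT_1_SEQUENCE=TACGTA',
    '=',
    'PRIMER_SEQUENCE_ID=seq2',
    'SEQUENCE=TTGGCCAATTGG',
    'PRIMER_LEFT_SEQUENCE=TTGGCC',
    'PRIMER_RIGHT_SEQUENCE=AATTGG',
    '=',
    '';

for my $rec (parse_boulder_records($out)) {
    print join("\t", $rec->{PRIMER_SEQUENCE_ID},
                     $rec->{PRIMER_LEFT_SEQUENCE},
                     $rec->{PRIMER_RIGHT_SEQUENCE}), "\n";
}
```

Keeping one hash per record makes it easy to print the sequence ID next to any primer pair, which is the part that is awkward through primer_results() for sets other than '0'.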

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chen li
Sent: Wednesday, May 10, 2006 5:28 PM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] What is the relationship between primer3 module andrun-primer3 module?

First, thank you all for replying to my previous post about primer3. But now I am a little confused even after reading the documents: What is the relationship between these two modules? What is the correct/standard way to use them to do batch primer design?

What I do is use Bio::Tools::Run::Primer3 to design primers. Based on Dr. Roy Chaudhuri's information I can set the parameters using the following syntax:

    $primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');

Based on Paul A. Wiersma's explanation I can also print out part of the primer results (because I don't need all the information). But there is a little trouble: PRIMER_SEQUENCE_ID can't be accessed using this method. And Paul points out that "PRIMER_SEQUENCE_ID and SEQUENCE are not part of the individual results but only end up by default with $results->primer_results(0)". So it seems there is no way to get around this problem using Bio::Tools::Run::Primer3. And others suggest using Bio::Tools::Primer3 to parse the results.

So is it true that Bio::Tools::Run::Primer3 is for primer design and Bio::Tools::Primer3 is for parsing the results from Bio::Tools::Run::Primer3? But what I find is that I get almost all the results (except PRIMER_SEQUENCE_ID and SEQUENCE) without having a "use Bio::Tools::Primer3" line in the script. How to explain this? Is it because of the following line of code?

    my $result=$primer3->run;

The last question: which line of code is used to invoke the program primer3.exe? How does the Perl script call primer3.exe?

Once again thank you all very much,

Li

From torsten.seemann at infotech.monash.edu.au Fri May 12 00:29:37 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 12 May 2006 14:29:37 +1000
Subject: [Bioperl-l] Using bioperl to convert gene predictions to gff
In-Reply-To: <000301c6698f$b17a4d20$0202a8c0@GosinkFranklin>
References: <000301c6698f$b17a4d20$0202a8c0@GosinkFranklin>
Message-ID: <44640F31.6090702@infotech.monash.edu.au>

Mark,

> I'd like to reformat gene predictions from several different programs
> (genscan, glimmerhmm, fgenesh) to gff format. I know bioperl can parse the
> output from these and other predictors and that it can export into GFF. But
> I'm not clear on how to string the two together.
> Can anyone point me at any example code?

The parser modules for the gene predictions generally allow you to iterate through the predicted genes. Each prediction is usually returned as a Bio::SeqFeatureI-derived object. Those objects have a gff_string() method to print them as GFF.

So something as simple as this *may* work:

    use Bio::Tools::Glimmer;
    my $parser = Bio::Tools::Glimmer->new(-file => 'glimmer.out');
    while (my $gene = $parser->next_prediction) {
        # gff_string() returns the line without a trailing newline
        print $gene->gff_string, "\n";
    }

If you want separate GFF lines for each exon, you'll have to do another loop over $gene->exons() etc., each of which are luckily also Bio::SeqFeatures!

Or if you want to modify some of the GFF columns first, e.g. the source tag, just do $gene->source_tag('mynewtag') before printing it.

Hope this helps,

--
Dr Torsten Seemann http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia

From torsten.seemann at infotech.monash.edu.au Fri May 12 00:36:46 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 12 May 2006 14:36:46 +1000
Subject: [Bioperl-l] Bio::Graphics::Panel imagemap making with Bio::Graphics::Panel
In-Reply-To: <5b6410e0605030120q31d1f554mbc4bf104deca48bf@mail.gmail.com>
References: <5b6410e0605030120q31d1f554mbc4bf104deca48bf@mail.gmail.com>
Message-ID: <446410DE.7070305@infotech.monash.edu.au>

Kevin,

> I want to create an imagemap of short sequence matches with a longer one
> with clickable imagemaps for the short sequences. I figure I can do this
> easily enough using the example script for parsing blast output but I need
> an example script to understand how to produce the html code for the
> imagemap. I can find only rather cryptic references about how this can be
> done (see below).

The "blastGraphic" project probably has Perl code that could help you.

http://www.gmod.org/blastGraphic.shtml

It is/was part of the GMOD project. It produces pretty clickable image maps from BLAST reports.

Hope it helps,

--
Dr Torsten Seemann http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia

From brianjgilmartin at hotmail.com Fri May 12 05:29:15 2006
From: brianjgilmartin at hotmail.com (brian gilmartin)
Date: Fri, 12 May 2006 10:29:15 +0100
Subject: [Bioperl-l] (no subject)
Message-ID:

please remove me from the list

From sb at mrc-dunn.cam.ac.uk Fri May 12 06:24:39 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Fri, 12 May 2006 11:24:39 +0100
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names
Message-ID: <44646267.2000802@mrc-dunn.cam.ac.uk>

In bioperl up to at least 1.5.1, when one of the database modules comes across a species rank it does:

    if ($rank eq 'species') {
        # get rid of genus from species name
        (undef,$taxon_name) = split(/\s+/,$taxon_name,2);
    }

However, even though the true scientific name is usually 'Genus species' in the database, note the 'usually': sometimes the species is a multiword item that does not include the Genus, so we can't do some simple split and take the second word. The same applies to levels below species, e.g. 'Avian erythroblastosis virus' is a variant of the species 'Avian leukosis virus' but 'Avian erythroblastosis virus (strain ES4)' is a variant of that variant...

My solution is to just remove whatever is the same between the current rank and the previous rank. Maybe even that's not so perfect, but it must be a lot better than turning the species 'Avian leukosis virus' into the species 'virus' (especially given that the genus here is 'Alpharetrovirus')!

    # we need to be going root(kingdom) -> leaf (species or lower) order
    #
    # we need to be storing untouched versions of the scientific name of
    # the previous rank ($self->{_last_raw})
    #
    # probably only bother start doing this when we get to genus
    my $last_raw = $self->{_last_raw} || undef;
    $self->{_last_raw} = $sci_name;
    if ($last_raw) {
        # \Q...\E so that characters such as '(' in a name match literally
        $sci_name =~ s/\Q$last_raw\E//;
        $sci_name =~ s/^\s+//;
    }

Are there even more strange species (and lower) names that would still not work well with the above solution?

Cheers,
Sendu.
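
[Editorial sketch] The idea in the message above, removing the parent rank's name from the child's name, can be tried out in isolation. The helper strip_parent_name below is hypothetical and is not the code in Bio::DB::Taxonomy. As an assumption that differs slightly from the snippet above, it only strips the parent name when it appears at the very start of the child name, and it quotes the parent with \Q...\E so that names containing parentheses are matched literally. The ranks in the sample lineage are illustrative only.

```perl
use strict;
use warnings;

# Return $name with a leading copy of $parent (plus following whitespace)
# removed; names that do not start with the parent are returned unchanged.
sub strip_parent_name {
    my ($name, $parent) = @_;
    return $name unless defined $parent && length $parent;
    (my $short = $name) =~ s/^\Q$parent\E\s+//;
    return $short;
}

# Lineage ordered root -> leaf, using the names quoted in the message.
my @lineage = (
    [ 'genus'   => 'Alpharetrovirus' ],
    [ 'species' => 'Avian leukosis virus' ],
    [ 'no rank' => 'Avian erythroblastosis virus' ],
    [ 'no rank' => 'Avian erythroblastosis virus (strain ES4)' ],
);

my $last_raw;    # untouched scientific name of the previous rank
for my $node (@lineage) {
    my ($rank, $sci_name) = @$node;
    printf "%-8s %s\n", $rank, strip_parent_name($sci_name, $last_raw);
    $last_raw = $sci_name;
}
```

With this approach 'Avian leukosis virus' is left intact because it does not begin with 'Alpharetrovirus', while the strain entry is reduced to '(strain ES4)'.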

From s_maheshwari84 at rediffmail.com Fri May 12 09:55:49 2006
From: s_maheshwari84 at rediffmail.com (saurabh maheshwari)
Date: 12 May 2006 13:55:49 -0000
Subject: [Bioperl-l] problem help me...........please
Message-ID: <20060512135549.27106.qmail@webmail9.rediffmail.com>

hello

I am a student at the Center for DNA Finger Printing and Diagnostics (CDFD). I am working on protein-protein interaction but I am unable to use the protein interaction module, i.e. ProteinGraph.pm. Actually I am facing lots of problems in the program I have written. Please help me; for the last four months I have not been able to solve the same problem. I am pasting my program here and I am also attaching it.

    #!usr/bin/perl
    use lib "/usr/local/bioxapps/bioperl/library/";
    use strict;
    use Bio::Graph::SimpleGraph;
    use Bio::Graph::IO;
    our @ISA=qw( Bio::SeqI);
    use Bio::Graph::Edge;
    use Bio::Graph::IO::dip;
    use Bio::Graph::IO::psi_xml;
    use Clone qw(clone);
    use vars qw(@ISA);
    use Bio::AnnotatableI;
    use Bio::IdentifiableI;
    our @ISA = qw(Bio::Graph::SimpleGraph);
    @ISA = qw(Bio::Graph::IO);
    our @ISA=qw(Expoerter);
    use Bio::Graph::ProteinGraph;
    use Class::AutoClass;
    use Bio::Graph::SimpleGraph::Traversal;

    my $graphio = Bio::Graph::IO->new(-file => '/users/saurabh/perl_program/sample1.txt',-format => 'dip');
    print "$graphio";
    my $graph = $graphio->next_network();
    print "$graph->nodes\t";
    $graph->remove_dup_edges();
    my @un=$graph->unconnected_nodes();
    print "\nthe unconnected nodes are =@un";
    my @n=$graph->subgraph();
    print "\subgraph=@n\n";
    #print "Please the protein-id whose clusering coefficient is to be detemined\n";
    #my $v=<STDIN>;
    my $density = $graph->density();
    print "\ngraph density=$density\n";
    my @graphs = $graph->components();
    print "\nno of Connected components=$#graphs\n";
    print "\nplease enter the protein-id whom you want to remove from the network\n";
    my $no=<STDIN>;
    $graph->remove_nodes($graph->nodes_by_id($no));
    my $count = $graph->edge_count();
    print "\nno of edges=$count\n ";
    my $ncount = $graph->node_count();
    print "\nno of nodes=$ncount\n ";

    print"\nenter the protein whose interactions is to be find ";
    my $x=<STDIN>;
    my $node = $graph->nodes_by_id($x);
    #print " this is $node\n";
    my @neighbors = $graph->neighbors($node);
    print "to check";
    print join",",map{$_->object_id()} @neighbors;
    my @nodes = $graph->nodes();
    print "\nno of nodes = @nodes\t\n";
    my @hubs;
    foreach my $nodi (@nodes) {
        if ($graph->neighbor_count($node) > 10)
        {
            push @hubs, $nodi;
        }
    }

    foreach my $r(@hubs)
    {
        my @y=@$r;
        print "the following proteins have > 10 interactors=@y\n";
    }
    #siblingual protein

    my @edgeref = $graph->articulation_points();
    print "no of articulation points=$#edgeref\n";
    print "please enter the protein whom you want to check for articulation point \n ";
    my $nod=<STDIN>;
    # make pathgen graph
    my $grap = Bio::Graph::IO->new(-file => 'org.txt',-format => 'dip');
    my $gra = $grap->next_network();
    $graph->remove_dup_edges();
    $graph->union($gra);
    my @duplicates = $graph->dup_edges();
    print "these interactions exist in cere and c.elegan\n=@duplicates";
    print "please enter the first protein for identifiaction of shortest path\n";
    my $p1=<STDIN>;
    print "please enter the second protein for identifiaction of shortest path\n";
    my $p2=<STDIN>;

    my @a=$graph->shortest_paths();
    print "shortest path=@a\t\n";

with Regards

SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI

-------------- next part --------------
A non-text attachment was scrubbed...
Name: from.pl
Type: application/octet-stream
Size: 2723 bytes
Desc: not available
URL:

From chen_li3 at yahoo.com Thu May 11 13:47:33 2006
From: chen_li3 at yahoo.com (chen li)
Date: Thu, 11 May 2006 10:47:33 -0700 (PDT)
Subject: [Bioperl-l] script for batch-primer design using primer3 module
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C4D@onncrxms5.agr.gc.ca>
Message-ID: <20060511174733.68836.qmail@web36812.mail.mud.yahoo.com>

Hi all,

With the valuable input from many of you I finally came up with a script for my personal needs:

1) batch primer design
2) set some of the parameters instead of using all the default values
3) output only part of the information for the first pair of primers but not all of them (but you can choose)
4) the results can be exported into Excel for my convenience.

Enclosed are the script and the results tested. I also include some lines about how I figure out which keys/entries are available for change. If you don't want the sequence part just add # to comment it out.

Any comments are welcome. BTW the solution suggested by Dr. Cui and Paul doesn't work for me.

Once again thank you very much,

Li

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: primer3-5
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: result1.txt
URL:

From Marc.Logghe at DEVGEN.com Fri May 12 11:28:55 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Fri, 12 May 2006 17:28:55 +0200
Subject: [Bioperl-l] problem help me...........please
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746DAB@ANTARESIA.be.devgen.com>

Hi,

What is actually the problem? Do you have errors? Is the script not behaving as you expect? You also might attach the input file sample1.txt so that people can try it.

Regards,
Marc
(BIOINFORMATICS)
>
> JAMIA MILLIA ISLAMIA
>
> NEW DELHI
>

From stoltzfu at umbi.umd.edu Fri May 12 11:56:06 2006
From: stoltzfu at umbi.umd.edu (Arlin Stoltzfus)
Date: Fri, 12 May 2006 11:56:06 -0400
Subject: [Bioperl-l] proposal: Bio::CDAT (character data and trees)
Message-ID:

Dear developers--

We propose a Bio::CDAT (Character Data And Trees) module to facilitate comparative analysis using evolutionary methods by 1) managing evolutionary relationships (by linking data to trees) and 2) allowing coordinated analysis of different types of data (by implementing a generic concept of "character-state" data).

Bio::CDAT would leverage existing BioPerl objects and include the functionality of Rutger Vos's Bio::Phylo. It would provide the framework to develop interfaces to analysis tools (phylogeny inference, evolutionary rate models, functional shift inference, etc), as well as to file formats and visualization methods appropriate for such analyses.

A proposal is available at
http://www.molevol.org/camel/projects/CDAT-proposal.pdf

We would like to hear your thoughts (e.g., see the section on "Questions to consider")!

Thanks

Arlin Stoltzfus
WeiGang Qiu
Rutger Vos
(with thanks to Justin Reese and Aaron Mackey)

------------------
Arlin Stoltzfus (stoltzfu at umbi.umd.edu)
CARB, 9600 Gudelsky Drive, Rockville, Maryland 20850
tel 240 314 6208, fax 240 314 6255, www.molevol.org/camel

From sdavis2 at mail.nih.gov Fri May 12 11:54:57 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Fri, 12 May 2006 11:54:57 -0400
Subject: [Bioperl-l] problem help me...........please
In-Reply-To: <20060512135549.27106.qmail@webmail9.rediffmail.com>
Message-ID:

On 5/12/06 9:55 AM, "saurabh maheshwari" wrote:

>
> hello
> I am a studnt at Center for DNA Finger Printing and Diagnostics(CDFD).
> I am working on protein protein interaction but I am unable to use the protein
> interaction module i.e. ProteinGraph.pm..
> Actially I am facing lots of problem in the programme I have written Please
> help me since last four months I am not able to solve the same problem..
> I am pasting my programe here also I am attaching it also. ......

You haven't really told us what you are trying to do or what problems you are having.

Sean

From cjfields at uiuc.edu Fri May 12 13:08:11 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 12 May 2006 12:08:11 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names
In-Reply-To: <44646267.2000802@mrc-dunn.cam.ac.uk>
Message-ID: <000f01c675e6$a61bde90$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Friday, May 12, 2006 5:25 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles
> species,subspecies/variant names
>
> In bioperl up to at least 1.5.1, when one of the database modules comes
> across a species rank it does:
>
> if ($rank eq 'species') {
> # get rid of genus from species name
> (undef,$taxon_name) = split(/\s+/,$taxon_name,2);
> }

The XML example from NCBI Taxonomy I mentioned previously seems to have everything in the classification, from superkingdom down to species (no strain unfortunately, and I'm not sure about subspecies); if it's missing the rank then the designation doesn't exist or is tagged as 'no rank'. Like I mentioned before, I'm not intimately familiar with Bio::Taxonomy, Bio::DB::Taxonomy, or Bio::Species, so I don't have a clue as to how everything is parsed and plugged in to Bio::Taxonomy objects. I do know that XML::Twig is used for parsing through the data so it shouldn't be too hard to change what you want.
I haven't tried using Bio::DB::Taxonomy directly yet, but I would have thought that the binomial is just built from the XML twig 'LineageEx' Rank=Genus + Rank=Species, that the genus comes from the tag 'Genus' and species from 'Species', and that the scientific name is from the tag 'ScientificName'. Guess not.

> However even though true scientific name is usually 'Genus species' in
> the database, note the 'usually' - sometimes the species is a multiword
> item that does not include the Genus, so we can't do some simple split
> and take the second word.
> The same applies to levels below species, eg. 'Avian erythroblastosis
> virus' is a variant of the species 'Avian leukosis virus' but 'Avian
> erythroblastosis virus (strain ES4)' is a variant of that variant...
>
> My solution is to just remove whatever is the same between the current
> rank and the previous rank. Maybe even that's not so perfect, but it
> must be a lot better than turning the species 'Avian leukosis virus'
> into the species 'virus' (especially given that the genus here is
> 'Alpharetrovirus')!
>
> # we need to be going root(kingdom) -> leaf (species or lower) order
> #
> # we need to be storing untouched versions of the scientific name of
> # the previous rank ($self->{_last_raw})
> #
> # probably only bother start doing this when we get to genus
> my $last_raw = $self->{_last_raw} || undef;
> $self->{_last_raw} = $sci_name;
> if ($last_raw) {
> $sci_name =~ s/$last_raw//;
> $sci_name =~ s/^\s+//;
> }
>
> Are there even more strange species (and lower) names that would still
> not work well with the above solution?

I don't think taking Genus/Species directly from the scientific name (normally what is in the SOURCE or ORGANISM annotation for GenBank or OS for EMBL) is the best way to go about it since it's really a best guess using regex; Jason pointed out several examples where this falls apart, and being a bacterial man I have found many examples myself.
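
For reference, the prefix-stripping idea in the quoted snippet can be tried in isolation. What follows is only a rough sketch and not the Bio::DB::Taxonomy code: the helper names strip_parent_name and shorten_lineage are made up for this illustration, the lineage is passed in as a plain list of names, and the \Q...\E quoting is an addition (an assumption that names containing parentheses should be matched literally rather than as regex groups).

```perl
use strict;
use warnings;

# Hypothetical helper: remove the parent rank's raw name from the current
# name, in the spirit of the quoted snippet. \Q...\E is an addition so that
# characters such as '(' and ')' in a name are matched literally.
sub strip_parent_name {
    my ($name, $parent_raw) = @_;
    return $name unless defined $parent_raw && length $parent_raw;
    (my $short = $name) =~ s/\Q$parent_raw\E//;
    $short =~ s/^\s+//;
    return $short;
}

# Hypothetical helper: walk a lineage from genus towards the leaf,
# always remembering the untouched name of the previous rank.
sub shorten_lineage {
    my @raw_names = @_;
    my ($last_raw, @short);
    for my $raw (@raw_names) {
        push @short, strip_parent_name($raw, $last_raw);
        $last_raw = $raw;
    }
    return @short;
}

# A binomial that does start with the genus.
print join(' | ', shorten_lineage('Homo', 'Homo sapiens')), "\n";

# The virus names from the message above: the species does not contain
# the genus, so it is left alone; only the strain loses its parent's name.
print join(' | ', shorten_lineage(
    'Alpharetrovirus',
    'Avian leukosis virus',
    'Avian erythroblastosis virus',
    'Avian erythroblastosis virus (strain ES4)',
)), "\n";
```

With this sketch 'Homo sapiens' is shortened to 'sapiens', 'Avian leukosis virus' is kept whole, and the strain is reduced to '(strain ES4)'. Whether that last result is what one wants for a strain name is a separate question.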
I'm also not sure that forcing a lookup for every TaxID in every sequence every time it's passed through SeqIO is the best way to go either, though I think it should be required for storing sequences. It's a tricky balance.

I still think that maybe we should absolve ourselves from using SOURCE/ORGANISM or OS/OC information in GenBank files as anything more than strictly annotation, or reconstruct Bio::Species to maybe a Bio::Annotation::Species object to handle that annotation and either deprecate Bio::Species or separate it completely from any Bio::Taxonomy objects. It would really simplify things. Then, if anyone is interested in taxonomy, either install a local database or use Entrez efetch, and then use Bio::DB::Taxonomy (fixed of course) to grab the TaxID info. Seems like we're running more and more into exceptions to the rule as more genomes are made available.

Anyway, using Bio::Species for GenBank is really screwy for bacterial names, so currently I get around BioPerl issues with bacterial names by grabbing the 'source' seqfeature and pulling the 'organism' tag out. But it really shouldn't be that obfuscated, right?

Chris

> Cheers,
> Sendu.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From sdavis2 at mail.nih.gov Sat May 13 08:19:21 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Sat, 13 May 2006 08:19:21 -0400
Subject: [Bioperl-l] problem help me...........please
In-Reply-To: <20060513041853.16091.qmail@webmail31.rediffmail.com>
References: <20060513041853.16091.qmail@webmail31.rediffmail.com>
Message-ID: <4465CEC9.2010909@mail.nih.gov>

saurabh maheshwari wrote:
>
> hello
> Thanks for your prompt reply.
> Actaully I am trying to make a protein interaction graph from a dip
> file.But I am not able to do so.In my last mail I have already attached
> my program which is giving some error and I am not able troble shot
> them.Please help
> Thanks

I meant that since we don't know what error(s) you are getting, it is really not possible to determine what the problem is. Also, someone else on the list offered to look at your code if you were to provide the input file.

I find it helpful to look at this webpage every now and then to remind myself what constitutes a useful question to email lists:

http://www.catb.org/~esr/faqs/smart-questions.html

Sean

From s_maheshwari84 at rediffmail.com Sat May 13 01:17:58 2006
From: s_maheshwari84 at rediffmail.com (saurabh maheshwari)
Date: 13 May 2006 05:17:58 -0000
Subject: [Bioperl-l] problem help me...........please
Message-ID: <20060513051758.4610.qmail@webmail31.rediffmail.com>

hello
I am very happy to see the prompt reply from the group members..
As you all suggested attaching the required files, I have attached all three files: first the input file, second an error file in which I saved the errors I was getting, and third the programme file.
Actually there is something in the error file I want to understand.
I am putting one error line here:
## no of nodes = Bio::Seq::RichSeq=HASH(0x11aa700) ##
What does this stand for?
Second, I want to get the connected graph for my data.
Let me explain which type of connected graph I mean by example.
Let there be five objects related in this way:
A connected to B
A connected to C
B connected to C
D connected to C
E connected to A
I want to create a whole link in between all five.

Please help me, I am not getting the result.

with Regards
SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sample.dip
Type: application/octet-stream
Size: 5794 bytes
Desc: not available
URL:
-------------- next part --------------
bash-2.05b$ perl from.pl
Bio::Graph::ProteinGraph=HASH(0x1182e70) Bio::Graph::ProteinGraph=HASH(0x1182e70)->nodes
the unconnected nodes are =subgraph=Bio::Graph::SimpleGraph=HASH(0x11e2160)
graph density=0.00826446280991736
no of Connected components=60
please enter the protein-id whom you want to remove from the network
XMECF2
no of edges=61
no of nodes=122
enter the protein whose interactions is to be find XMECF2
XMECF2 interacts with map{->object_id()}
no of nodes = Bio::Seq::RichSeq=HASH(0x11aa700) Bio::Seq::RichSeq=HASH(0x11d1850) Bio::Seq::RichSeq=HASH(0x11bd4c0) Bio::Seq::RichSeq=HASH(0x11c2fd0) Bio::Seq::RichSeq=HASH(0x11aa7f0) Bio::Seq::RichSeq=HASH(0x1198340) Bio::Seq::RichSeq=HASH(0x11d81a0) Bio::Seq::RichSeq=HASH(0x11ca320) Bio::Seq::RichSeq=HASH(0x11b5e40) Bio::Seq::RichSeq=HASH(0x1190e00) Bio::Seq::RichSeq=HASH(0x11c1350) Bio::Seq::RichSeq=HASH(0x11b2e20) Bio::Seq::RichSeq=HASH(0x11cb360) Bio::Seq::RichSeq=HASH(0x1198250) Bio::Seq::RichSeq=HASH(0x11d0240)
Bio::Seq::RichSeq=HASH(0x11c8f20) Bi o::Seq::RichSeq=HASH(0x11b4ef0) Bio::Seq::RichSeq=HASH(0x119f7a0) Bio::Seq::Rich Seq=HASH(0x11c2ee0) Bio::Seq::RichSeq=HASH(0x11dba20) Bio::Seq::RichSeq=HASH(0x1 1e2300) Bio::Seq::RichSeq=HASH(0x11b2f10) Bio::Seq::RichSeq=HASH(0x11b4b90) Bio: :Seq::RichSeq=HASH(0x11d4df0) Bio::Seq::RichSeq=HASH(0x11d4b80) Bio::Seq::RichSe q=HASH(0x11d8e70) Bio::Seq::RichSeq=HASH(0x11a1270) Bio::Seq::RichSeq=HASH(0x11c b5d0) Bio::Seq::RichSeq=HASH(0x11d5cc0) Bio::Seq::RichSeq=HASH(0x11d32a0) Bio::S eq::RichSeq=HASH(0x11b4c80) Bio::Seq::RichSeq=HASH(0x119e0c0) Bio::Seq::RichSeq= HASH(0x11b7ed0) Bio::Seq::RichSeq=HASH(0x11ad490) Bio::Seq::RichSeq=HASH(0x1196e 60) Bio::Seq::RichSeq=HASH(0x119b7f0) Bio::Seq::RichSeq=HASH(0x11cef60) Bio::Seq ::RichSeq=HASH(0x11b7b70) Bio::Seq::RichSeq=HASH(0x11dd330) Bio::Seq::RichSeq=HA SH(0x11da8c0) Bio::Seq::RichSeq=HASH(0x11a9f70) Bio::Seq::RichSeq=HASH(0x119b700 ) Bio::Seq::RichSeq=HASH(0x119a550) Bio::Seq::RichSeq=HASH(0x11ba910) Bio::Seq:: RichSeq=HASH(0x11e0b30) Bio::Seq::RichSeq=HASH(0x11d3030) Bio::Seq::RichSeq=HASH (0x11c62d0) Bio::Seq::RichSeq=HASH(0x11abb20) Bio::Seq::RichSeq=HASH(0x11d5bd0) Bio::Seq::RichSeq=HASH(0x11b03c0) Bio::Seq::RichSeq=HASH(0x119e1b0) Bio::Seq::Ri chSeq=HASH(0x11aa060) Bio::Seq::RichSeq=HASH(0x11a5700) Bio::Seq::RichSeq=HASH(0 x11a81e0) Bio::Seq::RichSeq=HASH(0x1196b00) Bio::Seq::RichSeq=HASH(0x11c1260) Bi o::Seq::RichSeq=HASH(0x11a2800) Bio::Seq::RichSeq=HASH(0x11c63c0) Bio::Seq::Rich Seq=HASH(0x11b60b0) Bio::Seq::RichSeq=HASH(0x11b93b0) Bio::Seq::RichSeq=HASH(0x1 1a4490) Bio::Seq::RichSeq=HASH(0x11ded50) Bio::Seq::RichSeq=HASH(0x11bbcd0) Bio: :Seq::RichSeq=HASH(0x1194780) Bio::Seq::RichSeq=HASH(0x11aedd0) Bio::Seq::RichSe q=HASH(0x11cd300) Bio::Seq::RichSeq=HASH(0x11a14e0) Bio::Seq::RichSeq=HASH(0x11c 4630) Bio::Seq::RichSeq=HASH(0x11a43a0) Bio::Seq::RichSeq=HASH(0x11a80f0) Bio::S eq::RichSeq=HASH(0x11bbbe0) Bio::Seq::RichSeq=HASH(0x11d5960) Bio::Seq::RichSeq= HASH(0x11c8e30) 
Bio::Seq::RichSeq=HASH(0x11cd3f0) Bio::Seq::RichSeq=HASH(0x11dd4 20) Bio::Seq::RichSeq=HASH(0x11cee70) Bio::Seq::RichSeq=HASH(0x11dbb10) Bio::Seq ::RichSeq=HASH(0x119a460) Bio::Seq::RichSeq=HASH(0x11aaa60) Bio::Seq::RichSeq=HA SH(0x11d1760) Bio::Seq::RichSeq=HASH(0x11cb6c0) Bio::Seq::RichSeq=HASH(0x11c7530 ) Bio::Seq::RichSeq=HASH(0x11deae0) Bio::Seq::RichSeq=HASH(0x11c4720) Bio::Seq:: RichSeq=HASH(0x119f890) Bio::Seq::RichSeq=HASH(0x11a6c40) Bio::Seq::RichSeq=HASH (0x11ad130) Bio::Seq::RichSeq=HASH(0x11e23f0) Bio::Seq::RichSeq=HASH(0x11d2f40) Bio::Seq::RichSeq=HASH(0x1194640) Bio::Seq::RichSeq=HASH(0x11d8f60) Bio::Seq::Ri chSeq=HASH(0x11d0150) Bio::Seq::RichSeq=HASH(0x119d070) Bio::Seq::RichSeq=HASH(0 x11a5610) Bio::Seq::RichSeq=HASH(0x11aa2d0) Bio::Seq::RichSeq=HASH(0x11b94a0) Bi o::Seq::RichSeq=HASH(0x11bd5b0) Bio::Seq::RichSeq=HASH(0x11c0ff0) Bio::Seq::Rich Seq=HASH(0x11a6b50) Bio::Seq::RichSeq=HASH(0x119cf80) Bio::Seq::RichSeq=HASH(0x1 1baa00) Bio::Seq::RichSeq=HASH(0x11c7620) Bio::Seq::RichSeq=HASH(0x119fb00) Bio: :Seq::RichSeq=HASH(0x11a2a70) Bio::Seq::RichSeq=HASH(0x11b1960) Bio::Seq::RichSe q=HASH(0x11ab8b0) Bio::Seq::RichSeq=HASH(0x11e0c20) Bio::Seq::RichSeq=HASH(0x11a d3a0) Bio::Seq::RichSeq=HASH(0x1197fe0) Bio::Seq::RichSeq=HASH(0x11b1870) Bio::S eq::RichSeq=HASH(0x11a2b60) Bio::Seq::RichSeq=HASH(0x1192750) Bio::Seq::RichSeq= HASH(0x11c9190) Bio::Seq::RichSeq=HASH(0x11e08c0) Bio::Seq::RichSeq=HASH(0x11dd6 90) Bio::Seq::RichSeq=HASH(0x11da7d0) Bio::Seq::RichSeq=HASH(0x11aece0) Bio::Seq ::RichSeq=HASH(0x11d80b0) Bio::Seq::RichSeq=HASH(0x11ca0b0) Bio::Seq::RichSeq=HA SH(0x1196bf0) Bio::Seq::RichSeq=HASH(0x11b7de0) Bio::Seq::RichSeq=HASH(0x11b02d0 ) Can't call method "isa" on an undefined value at /usr/local/bioxapps/bioperl/lib rary//Bio/Graph/ProteinGraph.pm line 477, line 2. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: from.pl
Type: application/octet-stream
Size: 2723 bytes
Desc: not available
URL:

From cjfields at uiuc.edu Sat May 13 14:18:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 13 May 2006 13:18:53 -0500
Subject: [Bioperl-l] problem help me...........please
In-Reply-To: <20060513051758.4610.qmail@webmail31.rediffmail.com>
Message-ID: <000901c676b9$b14479c0$15327e82@pyrimidine>

I really hate to break the bad news here, but I'm going to be brutally honest.

I have not looked at any of the Bio::Graph modules and have no idea how they are implemented, and I haven't looked at your input file, but I can tell right off the bat your script has major logic problems. I can also pretty much tell that you don't understand the object model we use here, at all. This is why I say that (from your last response):

> ## no of nodes = Bio::Seq::RichSeq=HASH(0x11aa700) ##
> what this stand for

Did you cut and paste from several other scripts hoping that it would work? I say that b/c you mix styles quite frequently here, using objects correctly (deref'ing with '->') and incorrectly (print "$object"). You also declare (and redeclare) @ISA four times for a script (not needed unless you're declaring a class and inheriting methods from other modules). You also use @ISA once with a misspelled module name (I don't think there is a module named 'Expoerter'). So, I'm actually stunned that the script doesn't crash at all. Yikes!

Okay, brutal honesty time over.

Any time you see something like this:

Bio::Graph::ProteinGraph=HASH(0x1182e70)

it means that what you are printing out is a reference to an object (it refers to the object class and the location in memory) and is NOT what you want. You should be doing something along the lines of $object->method, not 'print $object', to get at the object data and methods. You use this several times in your script already; that should be a big hint as the areas where it doesn't work do not use this syntax.
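
The difference between printing an object and calling a method on it can be seen without BioPerl at all. The sketch below is only an illustration: Demo::Node is a made-up stand-in class, not a BioPerl module, and its object_id accessor merely imitates the method name used in the script under discussion.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a graph node; NOT a BioPerl class.
package Demo::Node;

sub new {
    my ($class, %args) = @_;
    return bless { id => $args{-id} }, $class;
}

# Accessor named after the object_id() call used in the script.
sub object_id {
    my ($self) = @_;
    return $self->{id};
}

package main;

my @nodes = map { Demo::Node->new(-id => $_) } qw(A B C);

# Printing the object itself only stringifies the reference:
# the class name plus a memory address, never the data inside.
print "object printed directly: $nodes[0]\n";

# A method call inside double quotes is not interpolated either:
# Perl prints the stringified reference followed by the literal text.
print "inside quotes: $nodes[0]->object_id\n";

# Calling the method outside the string gets at the data.
print "method call: ", $nodes[0]->object_id, "\n";

# The same for a whole list of objects, plus a count of them.
my @ids = map { $_->object_id } @nodes;
print "ids: ", join(',', @ids), "\n";
print "number of nodes: ", scalar(@nodes), "\n";
```

The first two print statements produce output of the form Demo::Node=HASH(0x...), which is the same pattern as the Bio::Seq::RichSeq=HASH(...) lines in the posted output; the last two print A,B,C and 3.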
Read the documentation for the many varied modules you use in your script. Look at script examples. Start simply, then work your way up.

Also, calling a method with the '->' operator inside double quotes doesn't work; you have to do something like:

print $graph->nodes,"\t";

not

print "$graph->nodes\t";

That's why you get this in your output:

Bio::Graph::ProteinGraph=HASH(0x1182e70)->nodes

Which just prints the object reference with the string '->nodes'.

If any of what I just said doesn't make any sense, you really need to pick up 'Learning Perl' and 'Intermediate Perl' by Schwartz et al and 'Programming Perl' by Wall et al.

I don't know if anyone can really help at this point w/o completely writing the script for you. We will fix problems to a point but we, for the most part, will not do your work for you.

Chris

From hubert.prielinger at gmx.at Sat May 13 23:45:58 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Sat, 13 May 2006 21:45:58 -0600
Subject: [Bioperl-l] parsing output files from other tools
Message-ID: <4466A7F6.30204@gmx.at>

hi,
Is it possible to parse text output files other than blast output files, like the text output files from the search tool mpSrch that is offered by EBI? I ask because the WU Blast output files can be parsed with bioperl.

thanks
Hubert

From arareko at campus.iztacala.unam.mx Sun May 14 00:09:35 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Sat, 13 May 2006 23:09:35 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
Message-ID: <4466AD7F.6050700@campus.iztacala.unam.mx>

I'm glad to announce the availability of the Deobfuscator interface at the BioPerl website. You can use it at the following URL:

http://bioperl.org/cgi-bin/deob_interface.cgi

Many thanks to Laura Kavanaugh and David Messina for this great contribution to the BioPerl project!

Mauricio.

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From cjfields at uiuc.edu Sun May 14 12:18:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 14 May 2006 11:18:10 -0500
Subject: [Bioperl-l] parsing output files from other tools
In-Reply-To: <4466A7F6.30204@gmx.at>
Message-ID: <000301c67772$00b4e4f0$15327e82@pyrimidine>

These are the current report types parsed through SearchIO:

http://www.bioperl.org/wiki/Module:Bio::SearchIO

I don't see mpsrch among them. If you want you could create a new plugin module to parse those reports; the SearchIO HOWTO gives some pointers:

http://www.bioperl.org/wiki/HOWTO:SearchIO

You can always look at some of the current modules like blast, blastxml, or fasta to get an idea of how it works.
Judging by the mpsrch output I'm pretty sure you would have to build a custom plugin for it.

A viable alternative: looking through the mail list it looks like mpsrch is a multiprocessor implementation of ssearch, itself an implementation of the Smith-Waterman algorithm for local alignments in the FASTA package of programs:

http://www.bioperl.org/wiki/SSEARCH

You might be able to use SearchIO::fasta there...

Chris

From chen_li3 at yahoo.com Sun May 14 13:14:30 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sun, 14 May 2006 10:14:30 -0700 (PDT)
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
Message-ID: <20060514171430.74846.qmail@web36802.mail.mud.yahoo.com>

Hi all,

I need to get a reverse-complementary sequence out of a fasta sequence file. The Synopsis of Bio::Seq points out that I can do it this way:

$revcom=$seqobj->revcom();

I use the following script trying to get the job done but it doesn't work. Then I read the documentation of Bio::Seq and it looks like it doesn't contain a revcom method.

Any idea will be appreciated.
Li

###############################
Here is the code:

#!c:/perl/bin/perl.exe
use strict;
use warnings;

use Bio::Seq;
use Bio::SeqIO;

my $file='c:/perl/local/primer3_1.0.0/src/est.txt';


my $seqIO=Bio::SeqIO->new(-file=>"<$file",
                          -format=>'fasta' );

my $seqobj=$seqIO->next_seq();#create object

print "what attributes/keys are available:\n";
for my $key (sort keys %$seqobj){
    my $value=$seqobj->{$key};
    print "$key\t=>\t$value\n"
}
# These are the output on the screen
#primary_id => gi|54093|emb|X61809.1|
#primary_seq => Bio::PrimarySeq=HASH(0x10492848)

#based on these results primary_id can get
#access right away
# as to primary_seq it is an object in
#Bio::Primaryseq and it provides the following
#methods after reading the documentaion:
#new
#seq
#validate_seq
#subseq
#length
#display_id
#accession_number
#primary_id
#alphabet
#desc
#can_call_new
#id
#is_circular
#object_id
#version
#authority
#namespace
#display_name
#description

print "primary_id=",$seqobj->primary_id, "\n\n";
print "id=",$seqobj->id, "\n\n";
print "revcom=",$seqobj->revcom,"\n\n";

my $now_time=localtime;
print $now_time, "\n\n";
exit;

#These are the output on the screen
#primary_id=gi|54093|emb|X61809.1|
#id=gi|54093|emb|X61809.1
#revcom=Bio::Seq=HASH(0x10493304)
#Sun May 14 12:45:20 2006

From cjfields at uiuc.edu Sun May 14 13:39:50 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 14 May 2006 12:39:50 -0500
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
In-Reply-To: <20060514171430.74846.qmail@web36802.mail.mud.yahoo.com>
Message-ID: <000401c6777d$66ddb120$15327e82@pyrimidine>

This line should give you the hint:

#revcom=Bio::Seq=HASH(0x10493304)

You're getting an object ref here. The actual way to get the rev. comp on the wiki states '$seq->revcom->seq', not '$seq->revcom'.
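
A minimal sketch of that two-step call, assuming BioPerl is installed so that Bio::Seq can be loaded. The six-base sequence and the id 'demo' are made up for the illustration and have nothing to do with the est.txt file in the script above.

```perl
use strict;
use warnings;
use Bio::Seq;

# A short made-up DNA sequence, used only for illustration.
my $seqobj = Bio::Seq->new(
    -id       => 'demo',
    -seq      => 'ATGCCG',
    -alphabet => 'dna',
);

# revcom() returns a new sequence object, not a string, so printing
# it directly would show something like Bio::Seq=HASH(0x...).
my $rc_obj = $seqobj->revcom;
print "class of the returned value: ", ref($rc_obj), "\n";

# Call seq() on the returned object to get the actual string.
my $rc_str = $rc_obj->seq;
print "reverse complement: $rc_str\n";

# The same thing written as one chained call.
print "chained: ", $seqobj->revcom->seq, "\n";
```

For the sequence ATGCCG the reverse complement is CGGCAT. Keeping the intermediate object in $rc_obj is useful when other methods, such as length or subseq, are also needed on the reverse strand.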
When I ran your script and change your line to the wiki version I get (using my test seq):

what attributes/keys are available:
primary_id => test,
primary_seq => Bio::PrimarySeq=HASH(0x1d47fe0)
primary_id=test,

id=test,

revcom=GGAACGAGATCTCCATGCCGCGCACCATCGGCCCGGGATGCAGCACGATCGCGCGGTCCGGCAGCATCG
CCTGGCGCTTCTCGGACAATCCGTAGCGCACCGAGTACTCACGCGCGGA
CGGGAAGAAACTGCCGTTCATGCGTTCGGCCTGCACGCGCAGCATGAGCACCGCGTCGGCCGCGGGCAGTTCGGCG
TCCAGGTCATAGGACACGGTCACCGGCCAGTTCTCGACGCCCCTGGGGA
GCAGCGTCGGTGGGGACACCAGCACCACCTCGGCCCCGAGGGTGTGCAGCAGCGTCACGTTGGAGCGGGCCACGCG
GCTGTGCAGCACGTCGCCGACGATCACCACGCGCTTGCCCTCGACGCTG

Sun May 14 17:34:45 2006

Chris

From chen_li3 at yahoo.com Sun May 14 14:08:49 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sun, 14 May 2006 11:08:49 -0700 (PDT)
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
In-Reply-To: <000401c6777d$66ddb120$15327e82@pyrimidine>
Message-ID: <20060514180849.55423.qmail@web36808.mail.mud.yahoo.com>

Hi Chris,

Thank you very much. But could you please give me the link for this syntax: $seq->revcom->seq?

Li

From cjfields at uiuc.edu Sun May 14 14:28:14 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Sun, 14 May 2006 13:28:14 -0500
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
Message-ID:

I think the confusion lies in what revcom returns. This page

http://www.bioperl.org/wiki/Getting_Started

show a quick way of using revcom, (which I mentioned previously) while this page

http://www.bioperl.org/wiki/HOWTO:Beginners

explains what is returned when you use revcom. '$seq_obj->revcom' returns a sequence object (not a sequence string):

http://www.bioperl.org/wiki/HOWTO:Beginners#The_Sequence_Object

which is why you need to use the 'seq' method to get the string. Hence, '$seq_obj->revcom->seq'.

Chris

---- Original message ----
>Date: Sun, 14 May 2006 11:08:49 -0700 (PDT)
>From: chen li
>Subject: RE: [Bioperl-l] no revcom method in Bio::Seq module?
>To: Chris Fields
>Cc: bioperl-l at bioperl.org
>
>Hi Chris,
>
>Thank you very much. But could you please give me the
>link for this syntax: $seq->revcom->seq?
>
>Li
>
>
>
>--- Chris Fields wrote:
>
>> This line should give you the hint:
>>
>> #revcom=Bio::Seq=HASH(0x10493304)
>>
>> You're getting an object ref here. The actual way
>> to get the rev. comp on
>> the wiki states '$seq->revcom->seq', not
>> '$seq->revcom'.
>> >> When I ran your script and change your line to the >> wiki version I get (using >> my test seq): >> >> what attributes/keys are available: >> primary_id => test, >> primary_seq => >> Bio::PrimarySeq=HASH(0x1d47fe0) >> primary_id=test, >> >> id=test, >> >> >revcom=GGAACGAGATCTCCATGCCGCGCACCATCGGCCCGGGATGCAGCACGAT CGCGCGGTCCGGCAGCATCG >> CCTGGCGCTTCTCGGACAATCCGTAGCGCACCGAGTACTCACGCGCGGA >> >CGGGAAGAAACTGCCGTTCATGCGTTCGGCCTGCACGCGCAGCATGAGCACCGCG TCGGCCGCGGGCAGTTCGGCG >> TCCAGGTCATAGGACACGGTCACCGGCCAGTTCTCGACGCCCCTGGGGA >> >GCAGCGTCGGTGGGGACACCAGCACCACCTCGGCCCCGAGGGTGTGCAGCAGCGT CACGTTGGAGCGGGCCACGCG >> GCTGTGCAGCACGTCGCCGACGATCACCACGCGCTTGCCCTCGACGCTG >> >> Sun May 14 17:34:45 2006 >> >> Chris >> >> > -----Original Message----- >> > From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l- >> > bounces at lists.open-bio.org] On Behalf Of chen li >> > Sent: Sunday, May 14, 2006 12:15 PM >> > To: bioperl-l at bioperl.org >> > Subject: [Bioperl-l] no revcom method in Bio::Seq >> module? >> > >> > Hi all, >> > >> > I need to get a reverse-complemenary sequence out >> of a >> > fasta sequence file. And the Synopsis of Bio::Seq >> > points out I can do like this way: >> > >> > $revcom=$seqobj->revcom(); >> > >> > I use the following script trying to get the job >> done >> > but it doesn't work. Then I read documentation of >> > Bio::Seq and it looks like it doesn't contain >> revcom >> > method. >> > >> > Any idea will be appreciated. 
From Marc.Logghe at DEVGEN.com Sun May 14 16:28:34 2006 From: Marc.Logghe at DEVGEN.com (Marc Logghe) Date: Sun, 14 May 2006 22:28:34 +0200 Subject: [Bioperl-l] no revcom method in Bio::Seq module? Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746DAC@ANTARESIA.be.devgen.com> Hi Li, > doesn't work. Then I read documentation of Bio::Seq and it > looks like it doesn't contain revcom method. Here, the Deobfuscator interface that Mauricio announced earlier, comes in handy. http://bioperl.org/cgi-bin/deob_interface.cgi?Search=Search&module=Bio%3A%3ASeq&sort_order=by+method&search_string= If you look in the methods table, you will find out that the revcom method is inherited from, and implemented by Bio::PrimarySeqI.
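
The two points made in this thread (revcom comes from Bio::PrimarySeqI, and it returns a sequence object, not a string) can be checked with a minimal sketch like the one below. It assumes BioPerl is installed; the toy sequence 'AAGTC' and the ID 'demo' are made up for illustration and do not come from Li's est.txt file.

```perl
use strict;
use warnings;
use Bio::Seq;

# Toy sequence for illustration only (an assumption, not data from the thread).
my $seqobj = Bio::Seq->new(-id => 'demo', -seq => 'AAGTC', -alphabet => 'dna');

# revcom is inherited from Bio::PrimarySeqI and returns a NEW sequence object.
my $rc_obj = $seqobj->revcom;

# Printing the object itself gives something like Bio::Seq=HASH(0x...);
# call seq on it to get the reverse-complemented string.
my $rc_str = $rc_obj->seq;

print "object class: ", ref($rc_obj), "\n";
print "revcom=$rc_str\n";    # prints revcom=GACTT
```

Chaining the calls as $seqobj->revcom->seq gives the same string without the intermediate variable.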
HTH, Marc From sb at mrc-dunn.cam.ac.uk Mon May 15 04:18:11 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Mon, 15 May 2006 09:18:11 +0100 Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names In-Reply-To: <000f01c675e6$a61bde90$15327e82@pyrimidine> References: <000f01c675e6$a61bde90$15327e82@pyrimidine> Message-ID: <44683943.5020307@mrc-dunn.cam.ac.uk> Chris Fields wrote: > Sendu Bala wrote: >> In bioperl up to at least 1.5.1, when one of the database modules >> comes across a species rank it does: >> >> if ($rank eq 'species') { # get rid of genus from species name >> (undef,$taxon_name) = split(/\s+/,$taxon_name,2); } > > The XML example from NCBI Taxonomy I mentioned previously seems to > have everything in the classification, from superkingdom down to > species (no strain unfortunately, and I'm nit sure about subspecies); > if it's missing the rank then the designation doesn't exist or is > tagged as 'no rank'. Like I mentioned before I'm not intimately > familiar Bio::Taxonomy, Bio::DB::Taxonomy, or Bio::Species, so I > don't have a clue as to how everything is parsed and plugged in to > Bio::Taxonomy objects. I do know that XML::Twig is used for parsing > through the data so it shouldn't be too hard to change what you > want. Yes, that's all true, but I'm not sure what it has to do with what I was saying. FYI, you do get a 'subspecies' rank but no 'variant' rank. In my own implementation I change the rank of all 'no rank' Nodes below species to 'variant'. > I haven't tried using Bio::DB::Taxonomy directly yet, but I would > have thought that the binomial is just built from the XML twig > 'LineageEx' Rank=Genus + Rank=Species, that the genus comes from the > tag 'Genus' and species from 'Species', and that the scientific name > is from the tag 'ScientificName'. Guess not. No. See above for what it actually does. That is a copy/paste from the code (there, $taxon_name == ScientificName). 
When it finds a species rank it does that split because in the ncbi taxonomy database the 'genus' rank for a human has a ScientificName of 'Homo', whilst the 'species' rank has a ScientificName of 'Homo sapiens', and the bioperl model (quite rightly, I think) wants the 'species' node to not have information of other nodes (well, except for the classification array). So it removes the 'Homo' from 'Homo sapiens' giving a species name of 'sapiens'. This then allows the binomial method to return 'Homo sapiens' instead of 'Homo Homo sapiens'. (though in a bizarre twist, and this is one of my problems with how names are currently represented in the Taxonomy modules, 'Scientific Name' and 'binomial' are synonymous) [snip] >> My solution is to just remove whatever is the same between the >> current rank and the previous rank. Maybe even that's not so >> perfect, but it must be a lot better than turning the species >> 'Avian leukosis virus' into the species 'virus' (especially given >> that the genus here is 'Alpharetrovirus')! > > I'm don't think taking Genus/Species directly from the scientific > name (normally what is in the SOURCE or ORGANISM annotation for > GenBank or OS for EMBL) is the best way to go about it [snip] Perhaps, but again I'm not sure what this has to do with what I was saying. If you don't want your species name to contain your genus name you have to do some kind of parsing. My post merely pointed out that the parsing currently in bioperl does not work for viruses and possibly other species. I'd like to think that someone cares about this error and would do the simple fix I offered, or that they already know about the problem and have done their own fix. > I'm also not sure that forcing a lookup for every TaxID in every > sequence every time it's passed through SeqIO is the best way to go > either, though I think it should be required for storing sequences. > It's a tricky balance. 
In my own implementation any database lookups are cached, and you have the option of not doing any database lookup at all and 'faking' a taxonomy from the supplied list of names (so it works just like normal Bio::Seq). > I still think that maybe we should absolve ourselves from using > SOURCE/ORGANISM or OS/OC information in GenBank files as anything > more than strictly annotation, or reconstruct Bio::Species to maybe a > Bio::Annotation::Species object to handle that annotation and either > deprecate Bio::Species or separate it completely from any > Bio::Taxonomy objects. It would really simplify things. Then, if > anyone is interested in taxonomy, either install a local database or > use Entrez efetch, and then use Bio::DB::Taxonomy (fixed of course) > to grab the TaxID info. My personal view is that having it as an annotation would serve no real purpose. For me the whole point of any kind of species representation in bioperl is to allow you to compare species in a biologically meaningful way. If it's just some annotation then that means it's basically free-form text and you have no guarantee that two sequences from the same species are annotated exactly the same - no guarantee that your code would identify that those sequences are from the same species. The only other useful thing that a species object needs to do it let you know how related two different species are - you need to be able to ask what a species' class, kingdom etc. are. Again, not viable with an annotation - you need something strict like a properly constructed Taxonomy. I guess it comes down to the philosophy of parsing a file. Do you try and reflect exactly what the file contains, letter for letter, so that your resulting object can recreate that file letter for letter, or do you parse the file and extract the correct /meaning/ in order to be more useful? 
I think there can be a choice by the user, and this is best done by making Bio::Species a clever wrapper around an improved Bio::Taxonomy, as in my own implementation.

From s_maheshwari84 at rediffmail.com Mon May 15 04:15:26 2006
From: s_maheshwari84 at rediffmail.com (saurabh maheshwari)
Date: 15 May 2006 08:15:26 -0000
Subject: [Bioperl-l] please help
Message-ID: <20060515081526.27270.qmail@webmail7.rediffmail.com>

Hello All,

I sent this problem to the list earlier, but it is still unsolved, so I have restated it. Can anybody give me code to build a graph of the interactions between items that are listed in a text file in the following format?

Example: item1 interacts with item2. I want to build the graph, then give any item as input and ask for all interactions of that item.

item 1   item 2
A        B
A        C
C        B
D        B
D        E
A        F
G        A

With regards,
SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI

From sdavis2 at mail.nih.gov Mon May 15 06:26:53 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 15 May 2006 06:26:53 -0400
Subject: [Bioperl-l] please help
In-Reply-To: <20060515081526.27270.qmail@webmail7.rediffmail.com>
Message-ID: 

On 5/15/06 4:15 AM, "saurabh maheshwari" wrote:

>
> Hello All
> I have sent a problem to the earlier also but my problem is still unsolve so i
> have modified the problem in another way please can any body give me code to
> make a graph between some items which are in a text file in the following
> formate:
> Example
> item1 interacts with item2 and i want to make graph by giving any item as
> input and asking all interactions of that item.
>
> item 1   item 2
> A        B
> A        C
> C        B
> D        B
> D        E
> A        F
> G        A

Not a bioperl answer, but in your case, I would suggest looking at using cytoscape to do this.
Look here for details: http://www.cytoscape.org/ Sean From sdavis2 at mail.nih.gov Mon May 15 07:03:28 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Mon, 15 May 2006 07:03:28 -0400 Subject: [Bioperl-l] please help In-Reply-To: Message-ID: On 5/15/06 6:26 AM, "Sean Davis" wrote: > > > > On 5/15/06 4:15 AM, "saurabh maheshwari" > wrote: > >> >> Hello All >> I have sent a problem to the earlier also but my problem is still unsolve so >> i >> have modified the problem in another way please can any body give me code to >> make a graph between some items which are in a text file in the following >> formate: >> Example >> item1 interacts with item2 and i want to make graph by giving any item as >> input and asking all interactions of that item. >> >> item 1 item 2 >> A B >> A C >> C B >> D B >> D E >> A F >> G A > > Not a bioperl answer, but in your case, I would suggest looking at using > cytoscape to do this. Look here for details: > > http://www.cytoscape.org/ I forgot to mention, if you are looking for a perl solution, I would look at the Graph module. http://search.cpan.org/~jhi/Graph-0.69/lib/Graph.pod You can create the graph according to the docs and then use the neighbors() method (if I remember correctly) to get the nodes connected to the query node. Sean From akarger at CGR.Harvard.edu Mon May 15 08:20:11 2006 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Mon, 15 May 2006 08:20:11 -0400 Subject: [Bioperl-l] Deobfuscator interface now available Message-ID: This tool is quite nice, and may save me a lot of perdoc'ing. A couple of minor interface thoughts. 1)There's quite a lot of methods for many of the classes. As such, I think I'll often want to browse through what's available in a class. But 60% or so of the screen real estate is used for "Enter a search string... OR select a class from the list". IMO, it would be better to have two pages, a search page and a result page. 
It only takes a click on Back (or a "new search" button) to get to a new search, and now you can use your whole screen for reading your results. 2) Please sort the "select a class from the list" alphabetically. I guess I can enter a search term to get the right classes, but it would be nice to be able to browse. 2a) if you want to be really fancy, make a javascript nested menu with expandable submenus. OK, maybe not. 3) Minimalist is nice, but documentation is even nicer. It wasn't clear to me that the search searches within class names rather than function names. What I really want to know sometimes is which module has, say, the revcom method in it. So, if it's not easy to include that within this search, then at least tell me what my search space is. 4) When I search for something that's not found, I get a screen that looks pretty familiar, with the extra text "No match to string found" down at the bottom. It took me a while to even notice it. (Studies show that most users don't read most of the text on a page.) Bold might be nice here. Or put the error at the top of the screen. Or both. 5) I'll save my stupidest comment for last - please make the page title "Bioperl Deobfuscator", so that when I bookmark it I'll know what the bookmark stands for. Thanks, Laura Kavanaugh and David Messina, for a neat AND useful tool. - Amir Karger Computational Biology Group Bauer Center for Genomics Research Harvard University 617-496-0626 From sb at mrc-dunn.cam.ac.uk Mon May 15 09:08:32 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Mon, 15 May 2006 14:08:32 +0100 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: References: Message-ID: <44687D50.6080306@mrc-dunn.cam.ac.uk> Amir Karger wrote: > This tool is quite nice, and may save me a lot of perdoc'ing. Yes, many thanks to everyone involved. > A couple of minor interface thoughts. > > 1)There's quite a lot of methods for many of the classes. 
As such, I > think I'll often want to browse through what's available in a class. But > 60% or so of the screen real estate is used for "Enter a search > string... OR select a class from the list". IMO, it would be better to > have two pages, a search page and a result page. It only takes a click > on Back (or a "new search" button) to get to a new search, and now you > can use your whole screen for reading your results. As the compromise it must be, I like the way it behaves. I don't like lots of windows. I especially don't like pop up windows. Right now when I'm using the bioperl docs I tend to have a whole bunch of tabs open to different class pages at once, so being able to see an overview all on one page in Deobfuscator is very nice. Further to that, I'd love it if clicking on a method name caused an in-place css(&|javascript) reveal (similar to how a well implemented drop down menu works in a website) rather than a new window opened. Alternatively, just have more columns in the results table, ie. usage, function, returns, args columns. I feel that opening a window for each method you want to understand is far too slow. I'd also really like a link to the code for the method as well. The bioperl docs are rarely complete enough that you can really understand what every method is supposed to do without looking at the code. > 3) Minimalist is nice, but documentation is even nicer. It wasn't clear > to me that the search searches within class names rather than function > names. What I really want to know sometimes is which module has, say, > the revcom method in it. This would be a great feature to add. Another minor interface thought: 6) Have a little more cell padding in all the tables. Things are just a little too cramped and things start to look messy/ run into each other. 
From cjfields at uiuc.edu Mon May 15 09:59:57 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 15 May 2006 08:59:57 -0500 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: <44687D50.6080306@mrc-dunn.cam.ac.uk> Message-ID: <000901c67827$d99eabb0$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Sendu Bala > Sent: Monday, May 15, 2006 8:09 AM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Deobfuscator interface now available > > Amir Karger wrote: > > This tool is quite nice, and may save me a lot of perdoc'ing. > > Yes, many thanks to everyone involved. The Deobfuscator currently indexes bioperl-1.4, so it's not completely up-to-date. I believe Mauricio and Dave may be working on updating to the newer versions and maybe bioperl-live, as well as getting the other bioperl packages up and running. For modules added after v1.4 I use the script in the FAQ question mentioned on the Deobfuscator wiki page to get up-to-date methods, then grab the that ActiveState HTML'd perldocs pumped out when installing using PPM (I make a custom PPM/PPD file and install myself every once in a while): #!/usr/bin/perl -w use Class::Inspector; $class = shift || die "Usage: methods perl_class_name\n"; eval "require $class"; print join ("\n", sort @{Class::Inspector- > > A couple of minor interface thoughts. > > > > 1)There's quite a lot of methods for many of the classes. As such, I > > think I'll often want to browse through what's available in a class. But > > 60% or so of the screen real estate is used for "Enter a search > > string... OR select a class from the list". IMO, it would be better to > > have two pages, a search page and a result page. It only takes a click > > on Back (or a "new search" button) to get to a new search, and now you > > can use your whole screen for reading your results. 
> > As the compromise it must be, I like the way it behaves. I don't like > lots of windows. I especially don't like pop up windows. Right now when > I'm using the bioperl docs I tend to have a whole bunch of tabs open to > different class pages at once, so being able to see an overview all on > one page in Deobfuscator is very nice. > > Further to that, I'd love it if clicking on a method name caused an > in-place css(&|javascript) reveal (similar to how a well implemented > drop down menu works in a website) rather than a new window opened. > Alternatively, just have more columns in the results table, ie. usage, > function, returns, args columns. I feel that opening a window for each > method you want to understand is far too slow. Agreed. > I'd also really like a link to the code for the method as well. The > bioperl docs are rarely complete enough that you can really understand > what every method is supposed to do without looking at the code. The methods that pop up are in columns along with the class module that implements the method. If you click on that link you get PDOC documentation for the module which includes most of the code (strangely, though Deobfuscator indexes bioperl 1.4, the PDOC corresponds to bioperl-live). Is that what you meant, or something a bit more detailed? > > 3) Minimalist is nice, but documentation is even nicer. It wasn't clear > > to me that the search searches within class names rather than function > > names. What I really want to know sometimes is which module has, say, > > the revcom method in it. That's listed in the method results table (the next column has the module with a link to the module's online docs). Chris > This would be a great feature to add. > > > Another minor interface thought: > 6) Have a little more cell padding in all the tables. Things are just a > little too cramped and things start to look messy/ run into each other. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Mon May 15 12:08:30 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 15 May 2006 11:08:30 -0500 Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names In-Reply-To: <44683943.5020307@mrc-dunn.cam.ac.uk> Message-ID: <001601c67839$cf289490$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Sendu Bala > Sent: Monday, May 15, 2006 3:18 AM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, > subspecies/variant names > > Chris Fields wrote: > > Sendu Bala wrote: > >> In bioperl up to at least 1.5.1, when one of the database modules > >> comes across a species rank it does: > >> > >> if ($rank eq 'species') { # get rid of genus from species name > >> (undef,$taxon_name) = split(/\s+/,$taxon_name,2); } > > > > The XML example from NCBI Taxonomy I mentioned previously seems to > > have everything in the classification, from superkingdom down to > > species (no strain unfortunately, and I'm nit sure about subspecies); > > if it's missing the rank then the designation doesn't exist or is > > tagged as 'no rank'. Like I mentioned before I'm not intimately > > familiar Bio::Taxonomy, Bio::DB::Taxonomy, or Bio::Species, so I > > don't have a clue as to how everything is parsed and plugged in to > > Bio::Taxonomy objects. I do know that XML::Twig is used for parsing > > through the data so it shouldn't be too hard to change what you > > want. > > Yes, that's all true, but I'm not sure what it has to do with what I was > saying. FYI, you do get a 'subspecies' rank but no 'variant' rank. In my > own implementation I change the rank of all 'no rank' Nodes below > species to 'variant'. 
Sorry; wandered a bit off topic there. > > I haven't tried using Bio::DB::Taxonomy directly yet, but I would > > have thought that the binomial is just built from the XML twig > > 'LineageEx' Rank=Genus + Rank=Species, that the genus comes from the > > tag 'Genus' and species from 'Species', and that the scientific name > > is from the tag 'ScientificName'. Guess not. > > No. See above for what it actually does. That is a copy/paste from the > code (there, $taxon_name == ScientificName). When it finds a species > rank it does that split because in the > ncbi taxonomy database the 'genus' rank for a human has a ScientificName > of 'Homo', whilst the 'species' rank has a ScientificName of 'Homo > sapiens', and the bioperl model (quite rightly, I think) wants the > 'species' node to not have information of other nodes (well, except for > the classification array). So it removes the 'Homo' from 'Homo sapiens' > giving a species name of 'sapiens'. This then allows the binomial method > to return 'Homo sapiens' instead of 'Homo Homo sapiens'. > > (though in a bizarre twist, and this is one of my problems with how > names are currently represented in the Taxonomy modules, 'Scientific > Name' and 'binomial' are synonymous) Ah, now I see. That's a bit screwy, but it's not on our end so we have to deal with it. I also noticed that subspecies also contains the entire string: 135461 Bacillus subtilis subsp. subtilis subspecies As for the 'scientific_name' method when accessed through Bio::DB::Taxonomy, I don't get the actual scientific name for the node (from the GenBank ORGANISM line) almost every time; I get the name with the strain chopped off instead and a number of times the names get mangled. The regexes below only grab from the topmost tags: Script: --------------------------------- #! 
perl
use strict;
use warnings;
use Bio::DB::Taxonomy;

my $file = shift @ARGV;

print "\nNCBI XML output ScientificName tag for each node:\n";

my @taxid = ();

# Read the NCBI Taxonomy XML file; only the topmost (two-space indented) tags are matched
open (TAXFILE, "<$file") or die "Cannot open $file: $!";
while (<TAXFILE>){
    if (/^\s{2}<TaxId>(\d+)<\/TaxId>/) {
        print "$1\t";
        push @taxid, $1;
    }
    print "$1\n" if /^\s{2}<ScientificName>(.*)<\/ScientificName>/;
}
close TAXFILE;

print "\nBio::DB::Taxonomy scientific_name:\n";

# Look up each TaxID remotely and print what Bio::DB::Taxonomy reports
for my $id (@taxid){
    my $factory = Bio::DB::Taxonomy->new(-source => 'entrez');
    my $node = $factory->get_Taxonomy_Node(-taxonid => $id);
    print $node->ncbi_taxid,"\t",$node->scientific_name,"\n";
}
---------------------------------

Output:
---------------------------------
NCBI XML output ScientificName tag for each node:
191218  Bacillus anthracis str. A2012
198094  Bacillus anthracis str. Ames
222523  Bacillus cereus ATCC 10987
224308  Bacillus subtilis subsp. subtilis str. 168
226186  Bacteroides thetaiotaomicron VPI-5482
226900  Bacillus cereus ATCC 14579
246194  Carboxydothermus hydrogenoformans Z-2901
260799  Bacillus anthracis str. Sterne
261594  Bacillus anthracis str. 'Ames Ancestor'
264462  Bdellovibrio bacteriovorus HD100
272558  Bacillus halodurans C-125
272559  Bacteroides fragilis NCTC 9343
279010  Bacillus licheniformis ATCC 14580
281309  Bacillus thuringiensis serovar konkukian str. 97-27
288681  Bacillus cereus E33L
295405  Bacteroides fragilis YCH46
66692   Bacillus clausii KSM-K16
76114   Azoarcus sp. EbN1

Bio::DB::Taxonomy scientific_name:
191218  Bacillus cereus group anthracis
198094  Bacillus cereus group anthracis
222523  Bacillus cereus group cereus
224308  subtilis Bacillus subtilis subsp. subtilis
226186  Bacteroides thetaiotaomicron
226900  Bacillus cereus group cereus
246194  Carboxydothermus hydrogenoformans
260799  Bacillus cereus group anthracis
261594  Bacillus cereus group anthracis
264462  Bdellovibrio bacteriovorus
272558  Bacillus halodurans
272559  Bacteroides fragilis
279010  Bacillus licheniformis
281309  Bacillus cereus group thuringiensis
288681  Bacillus cereus group cereus
295405  Bacteroides fragilis
66692   Bacillus clausii
76114   Azoarcus sp.
---------------------------------

Note Bacillus subtilis in the Bio::Tax output above. Not one of those is the scientific name as defined by NCBI (and most taxonomists, for that matter). So, in a nutshell, there's a problem here. I don't know if your fix works for that, but I definitely think the 'scientific name' should not be assembled ad hoc; it should be taken from the tag for that node. I am currently reduced to grabbing the feature with primary tag 'source' and getting the 'organism' tag value from that. I cannot stress enough that it should NOT be that way.

As for 'binomial' == 'scientific_name', I agree; I see it as well and that should be fixed.

...

> Perhaps, but again I'm not sure what this has to do with what I was
> saying. If you don't want your species name to contain your genus name
> you have to do some kind of parsing. My post merely pointed out that the
> parsing currently in bioperl does not work for viruses and possibly
> other species. I'd like to think that someone cares about this error and
> would do the simple fix I offered, or that they already know about the
> problem and have done their own fix.

Again, that was me going off-topic, so my apologies; it has more to do with my frustrations with Bio::Species (not Bio::DB::Taxonomy). My point here was that, since there is no real way to surmise from a GenBank flatfile what the taxonomic ranks are w/o guessing (which seems to break more often than not when dealing with complex names), there shouldn't be any tie to Bio::Tax objects, at least directly.
I guess methods could be incorporated into Bio::Species for those who want to give it a try, but I would like to get a GenBank file, for once, in which the scientific name/binomial name isn't mangled by Bio::Species. Back to Bio::DB::Taxonomy; I don't have a problem with implementing your methods here; on the contrary, if they fix my problem above then I'll be more than glad to. I can't get to it immediately but maybe later today/tomorrow. > > I'm also not sure that forcing a lookup for every TaxID in every > > sequence every time it's passed through SeqIO is the best way to go > > either, though I think it should be required for storing sequences. > > It's a tricky balance. > > In my own implementation any database lookups are cached, and you have > the option of not doing any database lookup at all and 'faking' a > taxonomy from the supplied list of names (so it works just like normal > Bio::Seq). > > > > I still think that maybe we should absolve ourselves from using > > SOURCE/ORGANISM or OS/OC information in GenBank files as anything > > more than strictly annotation, or reconstruct Bio::Species to maybe a > > Bio::Annotation::Species object to handle that annotation and either > > deprecate Bio::Species or separate it completely from any > > Bio::Taxonomy objects. It would really simplify things. Then, if > > anyone is interested in taxonomy, either install a local database or > > use Entrez efetch, and then use Bio::DB::Taxonomy (fixed of course) > > to grab the TaxID info. > > My personal view is that having it as an annotation would serve no real > purpose. For me the whole point of any kind of species representation in > bioperl is to allow you to compare species in a biologically meaningful > way. 
If it's just some annotation then that means it's basically > free-form text and you have no guarantee that two sequences from the > same species are annotated exactly the same - no guarantee that your > code would identify that those sequences are from the same species. > The only other useful thing that a species object needs to do it let you > know how related two different species are - you need to be able to ask > what a species' class, kingdom etc. are. Again, not viable with an > annotation - you need something strict like a properly constructed > Taxonomy. My point is, a large number of users do NOT use, nor care about, taxonomic information to the degree they need to know the entire classification of the organism; many are just as happy about getting the scientific name only, which is in the GenBank/EMBL file itself. To take one extreme, it is not productive to force every user to download the NCBI tax database and use lookups just to convert sequences from EMBL format to GenBank format. It's not productive to allow users to spam the NCBI tax database remotely either, so hardcoding lookups is, IMHO, a big mistake. > I guess it comes down to the philosophy of parsing a file. Do you try > and reflect exactly what the file contains, letter for letter, so that > your resulting object can recreate that file letter for letter, or do > you parse the file and extract the correct /meaning/ in order to be more > useful? > I think there can be a choice by the user, and this is best done by > making Bio::Species a clever wrapper around an improved Bio::Taxonomy, > as in my own implementation. I understand both philosophies, but the latter implies that you know the intention of the ones submitting the sequence. 99.9% of the time that's fine, something I can live with. 
However, when we mess up something as simple as getting the scientific name for an organism when the information is directly in the flat file (ORGANISM line) by trying to 'imply' what the classification is, yes, I get frustrated. Even more frustrating to me is that Bio::DB::Taxonomy, which should return accurate information directly from the Taxonomy database, still manages to screw up the scientific name. The NCBI definition in the sample record: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html states that the ORGANISM line contains the formal scientific name and its lineage (no ranking). If the lineage is very long it is abbreviated so you don't get the same thing as you would through using TaxID. So, in essence, I believe you are correct, that Bio::Species can be used as a 'wrapper' for Bio::Taxonomy objects, but only up to a certain degree with caveats or warnings for possible inaccuracies. I also believe that lookups should be allowed but optional, not required (i.e. left up to the user, as you state). I just feel that it's somewhat misleading to imply, by delegating to Bio::Taxonomy, that Bio::Species contains accurate taxonomic information when NCBI themselves state that the GenBank flatfile classification can be incomplete and does not supply rankings (genus, species) in the file. It's our best guess in most cases, and a best guess by definition is not very accurate. If you want taxonomic accuracy, use the TaxID and a local tax database. I feel that we shouldn't punish those who don't worry/care about taxonomy by implementing Bio::Species with methods that mangle data that's directly in the flat file they're parsing. Okay, not to cut short this discussion, but I have to get back to $job. I'll try adding your fixes in a bit later today/tomorrow; if they pass tests I'll commit them in.
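The "allowed but optional" lookup idea, combined with the caching mentioned in the quoted reply, might be sketched roughly as follows. This is only an illustration under stated assumptions: make_species_resolver, the lookup callback and the %fake_db stand-in are hypothetical names and are not part of Bio::Species, Bio::Taxonomy or Bio::DB::Taxonomy.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch: none of these names exist in BioPerl.
# $lookup is an optional code ref (e.g. wrapping a local taxonomy database)
# that maps a TaxID to a scientific name. Without it, the name parsed from
# the flat file's ORGANISM line is returned untouched.
sub make_species_resolver {
    my ($lookup) = @_;
    my %cache;    # TaxID => name, so each TaxID is looked up at most once
    return sub {
        my ($taxid, $flatfile_name) = @_;
        return $flatfile_name unless $lookup && defined $taxid;
        $cache{$taxid} = $lookup->($taxid) unless exists $cache{$taxid};
        # Fall back to the flat-file name if the lookup found nothing.
        return defined $cache{$taxid} ? $cache{$taxid} : $flatfile_name;
    };
}

# Stand-in for a real database; counts how often it is actually queried.
my $queries = 0;
my %fake_db = (224308 => 'Bacillus subtilis subsp. subtilis str. 168');
my $with_db = make_species_resolver(sub { $queries++; $fake_db{ $_[0] } });
my $no_db   = make_species_resolver();

print $no_db->(224308, 'Bacillus subtilis'), "\n";      # flat-file name as-is
print $with_db->(224308, 'Bacillus subtilis'), "\n";    # name from the lookup
print $with_db->(224308, 'Bacillus subtilis'), "\n";    # served from the cache
print "lookups performed: $queries\n";
```

With a resolver built without a callback, format conversion never touches a taxonomy database; with one, repeated TaxIDs cost a single query each.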
Chris From hlapp at gmx.net Mon May 15 12:59:06 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 15 May 2006 12:59:06 -0400 Subject: [Bioperl-l] error loading uniprot release 49.6 into mysql In-Reply-To: <051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net> References: <051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net> Message-ID: You found the right instance. Unfortunately with the way the bioperl swissprot parser works the group (RG) isn't promoted to author if there is no author in addition (in fact you may debate whether that would even be the best way of doing things), so it doesn't find it on second occurrence by unique key. If you can live without this entry, or any other entry that causes a hiccup, just supply the flag --safe and it will gracefully move on to the next entry. Fixing the issue would require either to fix the bioperl swissprot parser (or Bio::Annotation::Reference) to stick the RG group into the author slot if there is no author, or to fix Bioperl Bio::Annotation::Reference to also feature a group and biosql to use it in place of a missing author. Actually there is $reference->rg. Maybe Bioperl-db (and hence Biosql) should just use that in place of a missing author? The downside is that upon round-tripping an entry, the RG annotation line will become an RA annotation line. How bad would that be? Any thoughts from anyone? -hilmar On May 15, 2006, at 8:34 AM, s.rayner at att.net wrote: > I found where the script is hiccuping.... > > The Uniprot release contains lines with identical annotation for > the RL keyword for two different sequences. > > ___________________ > > First occurence... > ___________________ > > ID 1433T_PONPY STANDARD; PRT; 245 AA. > AC Q5RFJ2; Q5RDK2; > DT 05-JUL-2005, integrated into UniProtKB/Swiss-Prot. > DT 05-JUL-2005, sequence version 2. > DT 18-APR-2006, entry version 13. > DE 14-3-3 protein theta. > GN Name=YWHAQ; > OS Pongo pygmaeus (Orangutan). 
> OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; > OC Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; > OC Catarrhini; Hominidae; Pongo. > OX NCBI_TaxID=9600; > RN [1] > RP NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA]. > RC TISSUE=Brain cortex, and Kidney; > RG The German cDNA consortium; > RL Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases. > <====== Not Unique > > > ___________________ > > Second occurence... > ___________________ > > > ID 1433G_PONPY STANDARD; PRT; 246 AA. > AC Q5RC20; > DT 05-JUL-2005, integrated into UniProtKB/Swiss-Prot. > DT 05-JUL-2005, sequence version 2. > DT 18-APR-2006, entry version 13. > DE 14-3-3 protein gamma. > GN Name=YWHAG; > OS Pongo pygmaeus (Orangutan). > OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; > OC Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; > OC Catarrhini; Hominidae; Pongo. > OX NCBI_TaxID=9600; > RN [1] > RP NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA]. > RC TISSUE=Heart; > RG The German cDNA consortium; > RL Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases. > <====== Not Unique > > > > in these two cases the generated CRC key is identical and so MySQL > throws a wobbly. > > if i look at the MySQL entry in the REFERENCE table for the first > sequence > ------+-------+---------+----------------------+ > | 139 | NULL | Submitted (NOV-2004) to the EMBL/ > GenBank/DDBJ databases. | NULL | NULL | CRC-E7973FEA4B5611DC | > +--------------+----------- > +---------------------------------------------------- > > and the error when the script choked was > > MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, > values were > ("","","Submitted (NOV-2004) to the EMBL/GenBank/DDBJ > databases.","CRC-E7973FEA4B5611DC","","","") FKs ( Duplicate entry 'CRC-E7973FEA4B5611DC' for key 3 > > hence the problem. > > I'm guessing i'm not the first person to encounter this, but dont > see any hints for an easy way around this. > > any suggestions....? 
> > ta > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Mon May 15 13:01:14 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 15 May 2006 13:01:14 -0400 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: <4466AD7F.6050700@campus.iztacala.unam.mx> References: <4466AD7F.6050700@campus.iztacala.unam.mx> Message-ID: <068E49BD-2DE4-47BA-BD7C-D6FD487DF095@gmx.net> Hey, thanks to Laura & David for this interface. Any idea why most of the Bio::Ontology::* modules show up without their leading Bio::Ontology? And clicking on those hyperlinks doesn't go anywhere either ... Anything different with those modules that I can fix? -hilmar On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote: > I'm glad to announce the availability of the Deobfuscator interface at > the BioPerl website. You can use it at the following URL: > > http://bioperl.org/cgi-bin/deob_interface.cgi > > Many thanks to Laura Kavanaugh and David Messina for this great > contribution to the BioPerl project! > > Mauricio. 
> > -- > MAURICIO HERRERA CUADRA > arareko at campus.iztacala.unam.mx > Laboratorio de Genética > Unidad de Morfofisiología y Función > Facultad de Estudios Superiores Iztacala, UNAM > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Mon May 15 13:22:13 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 15 May 2006 12:22:13 -0500 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: <068E49BD-2DE4-47BA-BD7C-D6FD487DF095@gmx.net> Message-ID: <000301c67844$1b506280$15327e82@pyrimidine> That's strange. Clicking on the list gives me the results for that module. When I click on the hyperlinks in the results section they open fine; the method column links opens a new page containing usage-function-returns-args and the class column links opens pdoc (same page) for bioperl-live. I'm using Firefox 1.5 on WinXP. Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > Sent: Monday, May 15, 2006 12:01 PM > To: Mauricio Herrera Cuadra > Cc: bioperl-l > Subject: Re: [Bioperl-l] Deobfuscator interface now available > > Hey, thanks to Laura & David for this interface. > > Any idea why most of the Bio::Ontology::* modules show up without > their leading Bio::Ontology? And clicking on those hyperlinks doesn't > go anywhere either ... Anything different with those modules that I > can fix? > > -hilmar > > On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote: > > > I'm glad to announce the availability of the Deobfuscator interface at > > the BioPerl website.
You can use it at the following URL: > > > > http://bioperl.org/cgi-bin/deob_interface.cgi > > > > Many thanks to Laura Kavanaugh and David Messina for this great > > contribution to the BioPerl project! > > > > Mauricio. > > > > -- > > MAURICIO HERRERA CUADRA > > arareko at campus.iztacala.unam.mx > > Laboratorio de Gen?tica > > Unidad de Morfofisiolog?a y Funci?n > > Facultad de Estudios Superiores Iztacala, UNAM > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sb at mrc-dunn.cam.ac.uk Mon May 15 14:00:15 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Mon, 15 May 2006 19:00:15 +0100 Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names In-Reply-To: <001601c67839$cf289490$15327e82@pyrimidine> References: <001601c67839$cf289490$15327e82@pyrimidine> Message-ID: <4468C1AF.9080400@mrc-dunn.cam.ac.uk> Chris Fields wrote: > > Ah, now I see. That's a bit screwy, but it's not on our end so we have to > deal with it. I also noticed that subspecies also contains the entire > string: > > > 135461 > Bacillus subtilis subsp. subtilis > subspecies > Yes, this is one of the problems I mentioned in the first post to this thread. > As for the 'scientific_name' method when accessed through Bio::DB::Taxonomy, > I don't get the actual scientific name for the node (from the GenBank > ORGANISM line) almost every time; I get the name with the strain chopped off > instead and a number of times the names get mangled. 
[snip, should be:] > 224308 Bacillus subtilis subsp. subtilis str. 168 > 281309 Bacillus thuringiensis serovar konkukian str. 97-27 [snip, but Bio::DB::Taxonomy gives:] > 224308 subtilis Bacillus subtilis subsp. subtilis > 281309 Bacillus cereus group thuringiensis [snip] > So, in a nutshell, there's a problem here. I don't know if your fix works > for that, but I definitely don't think the 'scientific name' should be > assembled ad hoc but should be taken from the tagname for that node. Yes, my implementation will get you the correct answer, but not quite as you say. My solution was to munge the actual ScientificName but 'ensure' that the binomial would give you back the actual binomial name you wanted - which is the intent of current Bio::DB::Taxonomy code. my $species0 = TFBS::Species->new(-ncbi_taxid => 224308); my $leaf_node = $species0->taxonomy->get_leaves(); print "sci_name of Node = '", $leaf_node->scientific_name, "'\n"; print "Species0 subspecies = '", $species0->subspecies, "'\n"; print "Species0 variants = '", scalar($species0->variant), "'\n"; print "Species0 binomial = '", $species0->binomial('FULL'), "'\n"; gives: sci_name of Node = 'str. 168' Species0 subspecies = 'subsp. subtilis' Species0 variants = 'str. 168' Species0 binomial = 'Bacillus subtilis subsp. subtilis str. 168' and the same again for id 281309: sci_name of Node = 'str. 97-27' Species0 subspecies = '' Species0 variants = 'serovar konkukian str. 97-27' Species0 binomial = 'Bacillus thuringiensis serovar konkukian str. 97-27' I've done it this way because even though strictly speaking the ScientificName for 224308 (a 'no rank') is 'Bacillus subtilis subsp. subtilis str. 168', when I ask for the variant I don't want that whole string. I just want the bit that will be different when comparing other strains of this subspecies of this species of Bacillus. I want 'str. 168'. 
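A rough sketch of the idea described above: keep only the part of each name that differs from its parent, and rebuild the full name by joining the parts. This is not the TFBS::Species code (which is not shown in this thread); name_suffixes is a hypothetical helper, and the lineage names are typed in by hand here rather than fetched from any taxonomy database.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl or TFBS::Species): given the full
# scientific names along a lineage, ordered from species down to the leaf,
# return the part of each name that is new relative to its parent.
sub name_suffixes {
    my @full_names = @_;
    my @suffixes;
    my $parent = '';
    for my $name (@full_names) {
        my $suffix = $name;
        # Strip the parent's full name only when it is a true prefix.
        if (length $parent && index($name, "$parent ") == 0) {
            $suffix = substr($name, length($parent) + 1);
        }
        push @suffixes, $suffix;
        $parent = $name;
    }
    return @suffixes;
}

# Names for TaxID 224308 and its ancestors, as quoted in this thread.
my @lineage = (
    'Bacillus subtilis',
    'Bacillus subtilis subsp. subtilis',
    'Bacillus subtilis subsp. subtilis str. 168',
);
my @parts = name_suffixes(@lineage);
print "$_\n" for @parts;

# Joining the parts reconstructs the leaf's full name.
print join(' ', @parts), "\n";
```

For 281309 the same helper, given 'Bacillus thuringiensis' followed by 'Bacillus thuringiensis serovar konkukian str. 97-27', yields 'serovar konkukian str. 97-27' as the second part. Names that do not begin with their parent's name are left whole, which is one reason a real implementation needs more care than this.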
Note that my objects never store the original ScientificName; it is due to 'luck' (or as I like to think, a good implementation) that the binomial method is able to reconstruct a string that is identical to what the original ScientificName was. If you'd like to see my code let me know. You can't just drop the code snippet I posted in this thread into existing bioperl modules; quite a bit else has to change as well. I'll have to make an updated taxonomy_the_tfbs_way.tar.gz file available if you want an example implementation; the current version of that file is now out of date - it doesn't do any of what I describe above. From hlapp at gmx.net Mon May 15 14:08:49 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 15 May 2006 14:08:49 -0400 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: <000301c67844$1b506280$15327e82@pyrimidine> References: <000301c67844$1b506280$15327e82@pyrimidine> Message-ID: Safari or Firefox on MacOSX don't do this. Note that the appearance in the browsable list is already different (the prefix is missing), and the JavaScript link also lacks the prefix in the module name in contrast to others, e.g., Bio::Ontology::Ontology (which is one of the few Bio::Ontology exceptions that do work and do display correctly). I suppose there is something peculiar about the code formatting of those modules? Some of the modules under Bio::OntologyIO are also affected BTW. What happens is that after you click on the link the page appears to reload (i.e., gets submitted) but the second table that is supposed to open underneath the first doesn't appear. However, the sort-by drop down selector does appear. -hilmar On May 15, 2006, at 1:22 PM, Chris Fields wrote: > That's strange. Clicking on the list gives me the results for that > module.
> When I click on the hyperlinks in the results section they open > fine; the > method column links opens a new page containing usage-function- > returns-args > and the class column links opens pdoc (same page) for bioperl- > live. I'm > using Firefox 1.5 on WinXP. > > Chris > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp >> Sent: Monday, May 15, 2006 12:01 PM >> To: Mauricio Herrera Cuadra >> Cc: bioperl-l >> Subject: Re: [Bioperl-l] Deobfuscator interface now available >> >> Hey, thanks to Laura & David for this interface. >> >> Any idea why most of the Bio::Ontology::* modules show up without >> their leading Bio::Ontology? And clicking on those hyperlinks doesn't >> go anywhere either ... Anything different with those modules that I >> can fix? >> >> -hilmar >> >> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote: >> >>> I'm glad to announce the availability of the Deobfuscator >>> interface at >>> the BioPerl website. You can use it at the following URL: >>> >>> http://bioperl.org/cgi-bin/deob_interface.cgi >>> >>> Many thanks to Laura Kavanaugh and David Messina for this great >>> contribution to the BioPerl project! >>> >>> Mauricio. 
>>> >>> -- >>> MAURICIO HERRERA CUADRA >>> arareko at campus.iztacala.unam.mx >>> Laboratorio de Genética >>> Unidad de Morfofisiología y Función >>> Facultad de Estudios Superiores Iztacala, UNAM >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Mon May 15 15:07:59 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 15 May 2006 14:07:59 -0500 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: Message-ID: <000501c67852$e1bb55c0$15327e82@pyrimidine> I'll have to give it a try on Mac OS X (we have an ancient G4 in the lab which I can try it on). I'll let you know what I find.
This is what I get when I do a search for 'Bio::Ont*' using Firefox on WinXP and this Deobfuscator link (http://bioperl.org/cgi-bin/deob_interface.cgi?); all the classes have links that work (I added newline and tab to make it a bit more readable) : Bio::OntologyIO Parser factory for Ontology formats Bio::OntologyIO::Handlers::BaseSAXHandler no short description available Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler no short description available Bio::Ontology::OntologyI Interface for an ontology implementation Bio::Ontology::TermFactory Instantiates a new Bio::Ontology::TermI (or derived class) through a factory Bio::Ontology::OntologyStore A repository of ontologies Bio::Ontology::RelationshipFactory Instantiates a new Bio::Ontology::RelationshipI (or derived class) through a factory Bio::Ontology::Ontology standard implementation of an Ontology So the names seem fine here. When I click on a class (Bio::Ontology::Ontology) I get in the results section: Method Class Returns Usage add_relationship Bio::Ontology::Ontology Its argument. add_relationship(RelationshipI relationship): RelationshipI add_relationship_type Bio::Ontology::OntologyEngineI not documented not documented add_term Bio::Ontology::Ontology its argument. add_term(TermI term): TermI ....and so on Where each method is clickable and opens a new page containing a table: Bio::Ontology::Ontology::add_relationship Usage add_relationship(RelationshipI relationship): RelationshipI Function Adds a relationship object to the ontology engine. Returns Its argument. Args A RelationshipI object. Each class is also linked to the bioperl-live PDOC. 
Clicking on class Bio::Ontology::Ontology in the results table gets me this page (no new page): http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html Chris > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: Monday, May 15, 2006 1:09 PM > To: Chris Fields > Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' > Subject: Re: [Bioperl-l] Deobfuscator interface now available > > Safari or Firefox on MacOSX don't do this. Note that the appearance > in the browsable list is already different (the prefix is missing), > and the JavaScript link also lacks the prefix in the module name in > contrast to others, e.g., Bio::Ontology::Ontology (which is one of > the few Bio::Ontology exceptions that do work and do display correctly). > > I suppose there is something peculiar about the code formatting of > those modules? Some of the modules under Bio::OntologyIO are also > affected BTW. > > What happens is after you click on the link the page apppears to > reload (i.e., gets submitted) but the second table that is supposed > open underneath the first doesn't appear. However, the sort-by drop > down selector does appear. > > -hilmar > > On May 15, 2006, at 1:22 PM, Chris Fields wrote: > > > That's strange. Clicking on the list gives me the results for that > > module. > > When I click on the hyperlinks in the results section they open > > fine; the > > method column links opens a new page containing usage-function- > > returns-args > > and the class column links opens pdoc (same page) for bioperl- > > live. I'm > > using Firefox 1.5 on WinXP. > > > > Chris > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > >> Sent: Monday, May 15, 2006 12:01 PM > >> To: Mauricio Herrera Cuadra > >> Cc: bioperl-l > >> Subject: Re: [Bioperl-l] Deobfuscator interface now available > >> > >> Hey, thanks to Laura & David for this interface. 
> >> > >> Any idea why most of the Bio::Ontology::* modules show up without > >> their leading Bio::Ontology? And clicking on those hyperlinks doesn't > >> go anywhere either ... Anything different with those modules that I > >> can fix? > >> > >> -hilmar > >> > >> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote: > >> > >>> I'm glad to announce the availability of the Deobfuscator > >>> interface at > >>> the BioPerl website. You can use it at the following URL: > >>> > >>> http://bioperl.org/cgi-bin/deob_interface.cgi > >>> > >>> Many thanks to Laura Kavanaugh and David Messina for this great > >>> contribution to the BioPerl project! > >>> > >>> Mauricio. > >>> > >>> -- > >>> MAURICIO HERRERA CUADRA > >>> arareko at campus.iztacala.unam.mx > >>> Laboratorio de Gen?tica > >>> Unidad de Morfofisiolog?a y Funci?n > >>> Facultad de Estudios Superiores Iztacala, UNAM > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >> > >> -- > >> =========================================================== > >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > >> =========================================================== > >> > >> > >> > >> > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From cjfields at uiuc.edu Mon May 15 15:12:34 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 15 May 2006 14:12:34 -0500 Subject: [Bioperl-l] Deobfuscator interface now available Message-ID: <000601c67853$85d49cc0$15327e82@pyrimidine> I just tried the same thing (links, search, etc) with Mac OS X v 
10.3.9 and Safari (no Firefox sorry) and it worked fine as well (all links, no missing Bio::Ontology, etc). Not sure what it could be... Chris > -----Original Message----- > From: Chris Fields [mailto:cjfields at uiuc.edu] > Sent: Monday, May 15, 2006 2:08 PM > To: 'Hilmar Lapp' > Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' > Subject: RE: [Bioperl-l] Deobfuscator interface now available > > I'll have to give it a try on Mac OS X (we have an ancient G4 in the lab > which I can try it on). I'll let you know what I find. > > This is what I get when I do a search for 'Bio::Ont*' using Firefox on > WinXP and this Deobfuscator link (http://bioperl.org/cgi- > bin/deob_interface.cgi?); all the classes have links that work (I added > newline and tab to make it a bit more readable) : > > Bio::OntologyIO > Parser factory for Ontology formats > Bio::OntologyIO::Handlers::BaseSAXHandler > no short description available > Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler > no short description available > Bio::Ontology::OntologyI > Interface for an ontology implementation > Bio::Ontology::TermFactory > Instantiates a new Bio::Ontology::TermI (or derived class) through a > factory > Bio::Ontology::OntologyStore > A repository of ontologies > Bio::Ontology::RelationshipFactory > Instantiates a new Bio::Ontology::RelationshipI (or derived class) > through a factory > Bio::Ontology::Ontology > standard implementation of an Ontology > > So the names seem fine here. > > When I click on a class (Bio::Ontology::Ontology) I get in the results > section: > > Method Class Returns > Usage > add_relationship Bio::Ontology::Ontology Its > argument. add_relationship(RelationshipI relationship): RelationshipI > add_relationship_type Bio::Ontology::OntologyEngineI not > documented not documented > add_term Bio::Ontology::Ontology its > argument. 
add_term(TermI term): TermI > > ....and so on > > Where each method is clickable and opens a new page containing a table: > > Bio::Ontology::Ontology::add_relationship > Usage add_relationship(RelationshipI relationship): RelationshipI > Function Adds a relationship object to the ontology engine. > Returns Its argument. > Args A RelationshipI object. > > > Each class is also linked to the bioperl-live PDOC. Clicking on class > Bio::Ontology::Ontology in the results table gets me this page (no new > page): > > http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html > > > Chris > > > -----Original Message----- > > From: Hilmar Lapp [mailto:hlapp at gmx.net] > > Sent: Monday, May 15, 2006 1:09 PM > > To: Chris Fields > > Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' > > Subject: Re: [Bioperl-l] Deobfuscator interface now available > > > > Safari or Firefox on MacOSX don't do this. Note that the appearance > > in the browsable list is already different (the prefix is missing), > > and the JavaScript link also lacks the prefix in the module name in > > contrast to others, e.g., Bio::Ontology::Ontology (which is one of > > the few Bio::Ontology exceptions that do work and do display correctly). > > > > I suppose there is something peculiar about the code formatting of > > those modules? Some of the modules under Bio::OntologyIO are also > > affected BTW. > > > > What happens is after you click on the link the page apppears to > > reload (i.e., gets submitted) but the second table that is supposed > > open underneath the first doesn't appear. However, the sort-by drop > > down selector does appear. > > > > -hilmar > > > > On May 15, 2006, at 1:22 PM, Chris Fields wrote: > > > > > That's strange. Clicking on the list gives me the results for that > > > module. 
> > > When I click on the hyperlinks in the results section they open > > > fine; the > > > method column links opens a new page containing usage-function- > > > returns-args > > > and the class column links opens pdoc (same page) for bioperl- > > > live. I'm > > > using Firefox 1.5 on WinXP. > > > > > > Chris > > > > > >> -----Original Message----- > > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > > >> Sent: Monday, May 15, 2006 12:01 PM > > >> To: Mauricio Herrera Cuadra > > >> Cc: bioperl-l > > >> Subject: Re: [Bioperl-l] Deobfuscator interface now available > > >> > > >> Hey, thanks to Laura & David for this interface. > > >> > > >> Any idea why most of the Bio::Ontology::* modules show up without > > >> their leading Bio::Ontology? And clicking on those hyperlinks doesn't > > >> go anywhere either ... Anything different with those modules that I > > >> can fix? > > >> > > >> -hilmar > > >> > > >> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote: > > >> > > >>> I'm glad to announce the availability of the Deobfuscator > > >>> interface at > > >>> the BioPerl website. You can use it at the following URL: > > >>> > > >>> http://bioperl.org/cgi-bin/deob_interface.cgi > > >>> > > >>> Many thanks to Laura Kavanaugh and David Messina for this great > > >>> contribution to the BioPerl project! > > >>> > > >>> Mauricio. 
> > >>> > > >>> -- > > >>> MAURICIO HERRERA CUADRA > > >>> arareko at campus.iztacala.unam.mx > > >>> Laboratorio de Gen?tica > > >>> Unidad de Morfofisiolog?a y Funci?n > > >>> Facultad de Estudios Superiores Iztacala, UNAM > > >>> > > >>> _______________________________________________ > > >>> Bioperl-l mailing list > > >>> Bioperl-l at lists.open-bio.org > > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > >>> > > >> > > >> -- > > >> =========================================================== > > >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > > >> =========================================================== > > >> > > >> > > >> > > >> > > >> > > >> > > >> _______________________________________________ > > >> Bioperl-l mailing list > > >> Bioperl-l at lists.open-bio.org > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > -- > > =========================================================== > > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > > =========================================================== > > > > > > From arareko at campus.iztacala.unam.mx Mon May 15 15:20:10 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Mon, 15 May 2006 14:20:10 -0500 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: <000901c67827$d99eabb0$15327e82@pyrimidine> References: <000901c67827$d99eabb0$15327e82@pyrimidine> Message-ID: <4468D46A.8070203@campus.iztacala.unam.mx> Laura and Dave would be very happy to see all of your comments/suggestions/enhancements/complaints summarized in the appropriate wiki page. Just be sure to sign them properly with your name and date: http://bioperl.org/wiki/Deobfuscator I think they'll have to discuss which features will be nice to implement and which don't, depending on the direction they want their project to go. But don't worry, they're extremely nice people who are open to all kind of ideas. 
The best of all: the Deobfuscator is open-source so everyone is invited to contribute to it, just ask them for the code :) On my side, I'm working on tweaking the code so it would be able of browsing different BioPerl packages (core, run, ext) and their respective releases (stable, developer, cvs). Regards, Mauricio. Chris Fields wrote: >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Sendu Bala >> Sent: Monday, May 15, 2006 8:09 AM >> To: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Deobfuscator interface now available >> >> Amir Karger wrote: >>> This tool is quite nice, and may save me a lot of perdoc'ing. >> Yes, many thanks to everyone involved. > > The Deobfuscator currently indexes bioperl-1.4, so it's not completely > up-to-date. I believe Mauricio and Dave may be working on updating to the > newer versions and maybe bioperl-live, as well as getting the other bioperl > packages up and running. > > For modules added after v1.4 I use the script in the FAQ question mentioned > on the Deobfuscator wiki page to get up-to-date methods, then grab the that > ActiveState HTML'd perldocs pumped out when installing using PPM (I make a > custom PPM/PPD file and install myself every once in a while): > > #!/usr/bin/perl -w > use Class::Inspector; > $class = shift || die "Usage: methods perl_class_name\n"; > eval "require $class"; > print join ("\n", sort @{Class::Inspector- > >>> A couple of minor interface thoughts. >>> >>> 1)There's quite a lot of methods for many of the classes. As such, I >>> think I'll often want to browse through what's available in a class. But >>> 60% or so of the screen real estate is used for "Enter a search >>> string... OR select a class from the list". IMO, it would be better to >>> have two pages, a search page and a result page. 
It only takes a click >>> on Back (or a "new search" button) to get to a new search, and now you >>> can use your whole screen for reading your results. >> As the compromise it must be, I like the way it behaves. I don't like >> lots of windows. I especially don't like pop up windows. Right now when >> I'm using the bioperl docs I tend to have a whole bunch of tabs open to >> different class pages at once, so being able to see an overview all on >> one page in Deobfuscator is very nice. >> >> Further to that, I'd love it if clicking on a method name caused an >> in-place css(&|javascript) reveal (similar to how a well implemented >> drop down menu works in a website) rather than a new window opened. >> Alternatively, just have more columns in the results table, ie. usage, >> function, returns, args columns. I feel that opening a window for each >> method you want to understand is far too slow. > > Agreed. > >> I'd also really like a link to the code for the method as well. The >> bioperl docs are rarely complete enough that you can really understand >> what every method is supposed to do without looking at the code. > > The methods that pop up are in columns along with the class module that > implements the method. > > > If you click on that link you get PDOC documentation for the module which > includes most of the code (strangely, though Deobfuscator indexes bioperl > 1.4, the PDOC corresponds to bioperl-live). Is that what you meant, or > something a bit more detailed? > >>> 3) Minimalist is nice, but documentation is even nicer. It wasn't clear >>> to me that the search searches within class names rather than function >>> names. What I really want to know sometimes is which module has, say, >>> the revcom method in it. > > That's listed in the method results table (the next column has the module > with a link to the module's online docs). > > > Chris > > >> This would be a great feature to add. 
>> >> >> Another minor interface thought: >> 6) Have a little more cell padding in all the tables. Things are just a >> little too cramped and things start to look messy/ run into each other. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From hlapp at gmx.net Mon May 15 15:23:55 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 15 May 2006 15:23:55 -0400 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: <000501c67852$e1bb55c0$15327e82@pyrimidine> References: <000501c67852$e1bb55c0$15327e82@pyrimidine> Message-ID: <57326DCD-D72B-4CED-801D-9E25609BF57C@gmx.net> I wasn't using the search. It's in the scrollable table for browsing. -hilmar On May 15, 2006, at 3:07 PM, Chris Fields wrote: > I'll have to give it a try on Mac OS X (we have an ancient G4 in > the lab > which I can try it on). I'll let you know what I find.
> > This is what I get when I do a search for 'Bio::Ont*' using Firefox > on WinXP > and this Deobfuscator link (http://bioperl.org/cgi-bin/ > deob_interface.cgi?); > all the classes have links that work (I added newline and tab to > make it a > bit more readable) : > > Bio::OntologyIO > Parser factory for Ontology formats > Bio::OntologyIO::Handlers::BaseSAXHandler > no short description available > Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler > no short description available > Bio::Ontology::OntologyI > Interface for an ontology implementation > Bio::Ontology::TermFactory > Instantiates a new Bio::Ontology::TermI (or derived class) through a > factory > Bio::Ontology::OntologyStore > A repository of ontologies > Bio::Ontology::RelationshipFactory > Instantiates a new Bio::Ontology::RelationshipI (or derived class) > through a factory > Bio::Ontology::Ontology > standard implementation of an Ontology > > So the names seem fine here. > > When I click on a class (Bio::Ontology::Ontology) I get in the results > section: > > Method Class > Returns > Usage > add_relationship Bio::Ontology::Ontology Its > argument. add_relationship(RelationshipI relationship): > RelationshipI > add_relationship_type Bio::Ontology::OntologyEngineI not > documented not documented > add_term Bio::Ontology::Ontology its > argument. add_term(TermI term): TermI > > ....and so on > > Where each method is clickable and opens a new page containing a > table: > > Bio::Ontology::Ontology::add_relationship > Usage add_relationship(RelationshipI relationship): RelationshipI > Function Adds a relationship object to the ontology engine. > Returns Its argument. > Args A RelationshipI object. > > > Each class is also linked to the bioperl-live PDOC. 
Clicking on class > Bio::Ontology::Ontology in the results table gets me this page (no new > page): > > http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html > > > Chris > >> -----Original Message----- >> From: Hilmar Lapp [mailto:hlapp at gmx.net] >> Sent: Monday, May 15, 2006 1:09 PM >> To: Chris Fields >> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' >> Subject: Re: [Bioperl-l] Deobfuscator interface now available >> >> Safari or Firefox on MacOSX don't do this. Note that the appearance >> in the browsable list is already different (the prefix is missing), >> and the JavaScript link also lacks the prefix in the module name in >> contrast to others, e.g., Bio::Ontology::Ontology (which is one of >> the few Bio::Ontology exceptions that do work and do display >> correctly). >> >> I suppose there is something peculiar about the code formatting of >> those modules? Some of the modules under Bio::OntologyIO are also >> affected BTW. >> >> What happens is after you click on the link the page appears to >> reload (i.e., gets submitted) but the second table that is supposed >> to open underneath the first doesn't appear. However, the sort-by drop >> down selector does appear. >> >> -hilmar >> >> On May 15, 2006, at 1:22 PM, Chris Fields wrote: >> >>> That's strange. Clicking on the list gives me the results for that >>> module. >>> When I click on the hyperlinks in the results section they open >>> fine; the >>> method column links open a new page containing usage-function- >>> returns-args >>> and the class column links open pdoc (same page) for bioperl- >>> live. I'm >>> using Firefox 1.5 on WinXP.
>>> >>> Chris >>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp >>>> Sent: Monday, May 15, 2006 12:01 PM >>>> To: Mauricio Herrera Cuadra >>>> Cc: bioperl-l >>>> Subject: Re: [Bioperl-l] Deobfuscator interface now available >>>> >>>> Hey, thanks to Laura & David for this interface. >>>> >>>> Any idea why most of the Bio::Ontology::* modules show up without >>>> their leading Bio::Ontology? And clicking on those hyperlinks >>>> doesn't >>>> go anywhere either ... Anything different with those modules that I >>>> can fix? >>>> >>>> -hilmar >>>> >>>> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote: >>>> >>>>> I'm glad to announce the availability of the Deobfuscator >>>>> interface at >>>>> the BioPerl website. You can use it at the following URL: >>>>> >>>>> http://bioperl.org/cgi-bin/deob_interface.cgi >>>>> >>>>> Many thanks to Laura Kavanaugh and David Messina for this great >>>>> contribution to the BioPerl project! >>>>> >>>>> Mauricio. 
>>>>> >>>>> -- >>>>> MAURICIO HERRERA CUADRA >>>>> arareko at campus.iztacala.unam.mx >>>>> Laboratorio de Genética >>>>> Unidad de Morfofisiología y Función >>>>> Facultad de Estudios Superiores Iztacala, UNAM >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> >>>> -- >>>> =========================================================== >>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>>> =========================================================== >>>> >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From ClarkeW at AGR.GC.CA Mon May 15 15:40:15 2006 From: ClarkeW at AGR.GC.CA (Clarke, Wayne) Date: Mon, 15 May 2006 15:40:15 -0400 Subject: [Bioperl-l] Memory Leak in Bio::SearchIO Message-ID: <320530F83FA47047823E57F110DDEAADB159EC@onncrxms4.agr.gc.ca> Hey everyone, I have been developing some code to download and parse blast reports from a remote server using SOAP::Lite as well as insert the results into a MySQL database. The problem I am having is that my program seems to be taking up a huge amount of RAM. For a single job of 10000 queries it can consume as much as a couple hundred MB inside an hour.
I realize that a lot of work is being done but this seems like way too
much. This leads me to the subject of my post. I think I may have traced
the source of the memory leak to Bio::SearchIO. I have used Devel::Size to
track the size of my variables and done other debugging steps and have had
no luck with resolving this very frustrating problem. My code is as
follows:

    my $result = $connector->getQueryResult($query_id);
    # Read the report through an in-memory filehandle
    open my $FH, "<", \$result or die "Cannot open in-memory filehandle: $!";
    my $searchio = Bio::SearchIO->new(-format => "blast", -fh => $FH);
    while (my $o_blast = $searchio->next_result()) {
        my $clone_id  = $o_blast->query_name();
        my $statement = $bdbi->form_push_SQL($o_blast, $clone_id, 5);
        # ... rest of the loop omitted
    }

This is just the leading and trailing code surrounding the use of
Bio::SearchIO, since there is quite a lot. I am mostly just wondering if
anyone has ever had problems with SearchIO and its memory usage. I looked
at the source code for it but am afraid it is out of my league. Any
help/suggestions/questions would be great.

Thanks

From dmessina at wustl.edu Mon May 15 15:34:10 2006
From: dmessina at wustl.edu (David Messina)
Date: Mon, 15 May 2006 14:34:10 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000901c67827$d99eabb0$15327e82@pyrimidine>
References: <000901c67827$d99eabb0$15327e82@pyrimidine>
Message-ID:

Responding to:
>>> Amir Karger
>> Sendu Bala
> Chris Fields

> The Deobfuscator currently indexes bioperl-1.4, so it's not completely
> up-to-date. I believe Mauricio and Dave may be working on updating to the
> newer versions and maybe bioperl-live, as well as getting the other
> bioperl packages up and running.

That's correct -- Mauricio is currently working on a version that will
allow you to search 1.4, 1.5.1, or bioperl-live. The Deobfuscator indexes
will be updated (daily?) to keep them in sync with the CVS repository.

>>> A couple of minor interface thoughts.
>>>
>>> 1)There's quite a lot of methods for many of the classes.
As such, I >>> think I'll often want to browse through what's available in a >>> class. But >>> 60% or so of the screen real estate is used for "Enter a search >>> string... OR select a class from the list". IMO, it would be >>> better to >>> have two pages, a search page and a result page. It only takes >>> a click >>> on Back (or a "new search" button) to get to a new search, and >>> now you >>> can use your whole screen for reading your results. >> >> As the compromise it must be, I like the way it behaves. I don't like >> lots of windows. I especially don't like pop up windows. Right now >> when >> I'm using the bioperl docs I tend to have a whole bunch of tabs >> open to >> different class pages at once, so being able to see an overview >> all on >> one page in Deobfuscator is very nice. I think the current behavior makes sense as the default, but I like the idea of being able to view the search results in a separate window for easier browsing. Thanks for the suggestion; I'll add it to the list. >> Further to that, I'd love it if clicking on a method name caused an >> in-place css(&|javascript) reveal (similar to how a well implemented >> drop down menu works in a website) rather than a new window opened. >> Alternatively, just have more columns in the results table, ie. >> usage, >> function, returns, args columns. I feel that opening a window for >> each >> method you want to understand is far too slow. > > Agreed. Yeah, the way it currently works is admittedly lame, and was done as a placeholder until we figured out a better way to do it. An in-place reveal sounds like a good solution. >>> 2) Please sort the "select a class from the list" alphabetically. I >>> guess I can enter a search term to get the right classes, but it >>> would >>> be nice to be able to browse. Agreed. I think we were doing this in an earlier test version, but I must have left it out of the release I handed off to Mauricio. >>> 3) Minimalist is nice, but documentation is even nicer. 
It wasn't >>> clear >>> to me that the search searches within class names rather than >>> function >>> names. What I really want to know sometimes is which module has, >>> say, >>> the revcom method in it. >> >> This would be a great feature to add. That's a great idea. >>> 4) When I search for something that's not found, I get a screen that >>> looks pretty familiar, with the extra text "No match to string >>> found" >>> down at the bottom. It took me a while to even notice it. >>> (Studies show >>> that most users don't read most of the text on a page.) Bold >>> might be >>> nice here. Or put the error at the top of the screen. Or both. Added to the list. >>> 5) I'll save my stupidest comment for last - please make the page >>> title >>> "Bioperl Deobfuscator", so that when I bookmark it I'll know what >>> the >>> bookmark stands for. Added to the list. Not stupid, by the way -- much to my surprise, there are at least 2 or 3 other (obviously inferior :) ) deobfuscators floating around out there. >> Another minor interface thought: >> 6) Have a little more cell padding in all the tables. Things are >> just a >> little too cramped and things start to look messy/ run into each >> other. Added to the list. Thanks to all of you for taking the time to give such detailed feedback -- it's really helpful. There is a wiki page on the BioPerl site for this project (http:// www.bioperl.org/wiki/Deobfuscator), so I'll be putting your comments there for tracking and further discussion. Please feel free to add to it. Dave -- Dave Messina WashU Genome Sequencing Center dmessina at wustl.edu 314-286-1825 From faruque at ebi.ac.uk Mon May 15 15:47:27 2006 From: faruque at ebi.ac.uk (Nadeem Faruque) Date: Mon, 15 May 2006 20:47:27 +0100 Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names Message-ID: <809AE0C7-A9B6-48A4-BF11-BA392C770CA9@ebi.ac.uk> >> My personal view is that having it as an annotation would serve no >> real >> purpose. 
For me the whole point of any kind of species >> representation in >> bioperl is to allow you to compare species in a biologically >> meaningful >> way. If it's just some annotation then that means it's basically I understand the need to find the species name of entries, especially now that so many complete genomes have been given their own strain- specific tax nodes, and I also think it is a shame that the ncbi tax dump does not give a rank to entries such as these (they cannot easily be distinguished from unofficial ranks higher in the tree without ascending the tree). Would it be useful for the species name to be included within EMBL file headers, eg in a line called OB (OB is a terrible suggestion based on 'Organism Binomial' since OS is already in use)? eg two examples of the species 'Apple stem grooving virus', where the second one would appear to be a different species without delving into the tax tree or the inclusion of an OB line. AC D14995; S47260; DE Apple stem grooving virus genome, complete sequence. OS Apple stem grooving virus OB Apple stem grooving virus OC Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae; OC Capillovirus. AC AY646511; DE Citrus tatter leaf virus strain Kumquat 1, complete genome. OS Citrus tatter leaf virus OB Apple stem grooving virus OC Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae; OC Capillovirus. > My point is, a large number of users do NOT use, nor care about, > taxonomic > information to the degree they need to know the entire > classification of the > organism; many are just as happy about getting the scientific name > only, > which is in the GenBank/EMBL file itself. To take one extreme, it > is not > productive to force every user to download the NCBI tax database > and use > lookups just to convert sequences from EMBL format to GenBank > format. It's > not productive to allow users to spam the NCBI tax database > remotely either, > so hardcoding lookups is, IMHO, a big mistake. 
I don't think you need to add any information to turn an embl-format file into a Genbank flatfile, but maybe I'm missing something obvious. Nadeem -- Dr S.M. Nadeem N. Faruque 9 Barley Court Saffron Walden Essex CB11 3HG 01799 500 120 From dmessina at wustl.edu Mon May 15 16:12:48 2006 From: dmessina at wustl.edu (David Messina) Date: Mon, 15 May 2006 15:12:48 -0500 Subject: [Bioperl-l] Deobfuscator interface now available Message-ID: <5A2309FD-8C6E-4349-99CC-B3EDA8B2F499@wustl.edu> On May 15, 2006, at 2:23 PM, Hilmar Lapp wrote: > I wasn't using the search. It's in the scrollable table for browsing. > -hilmar I'm seeing this too on OS X with Safari 2.0.3. If you type 'goflat' (without the quotes) into the search box, you'll see the behavior. Chris, can you try it again this way just to confirm it's an OS/browser-specific thing? Not sure what's going on, Hilmar -- I'll take a look. Dave From cjfields at uiuc.edu Mon May 15 16:56:29 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 15 May 2006 15:56:29 -0500 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: <57326DCD-D72B-4CED-801D-9E25609BF57C@gmx.net> Message-ID: <000a01c67862$0a00cab0$15327e82@pyrimidine> Okay, I see what you mean. Using the search term "Bio::Ont*" also explains why I didn't see it ;P. Yeah, the bug shows up here too (WinXP and Mac OS X), and those links are broken like you said. Could be something to do with indexing. 
Using the methods script in the FAQ
(http://www.bioperl.org/wiki/FAQ#Why_can.27t_I_easily_get_a_list_of_all_the_methods_a_object_can_call.3F)
I get this:

C:\Perl\Scripts>methods.pl Bio::OntologyIO::simplehierarchy
Bio::OntologyIO::simplehierarchy::Dumper
Bio::OntologyIO::simplehierarchy::basename
Bio::OntologyIO::simplehierarchy::dirname
Bio::OntologyIO::simplehierarchy::fileparse
Bio::OntologyIO::simplehierarchy::fileparse_set_fstype

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> Sent: Monday, May 15, 2006 2:24 PM
> To: Chris Fields
> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> Subject: Re: [Bioperl-l] Deobfuscator interface now available
>
> I wasn't using the search. It's in the scrollable table for browsing.
> -hilmar
>
> On May 15, 2006, at 3:07 PM, Chris Fields wrote:
>
> > I'll have to give it a try on Mac OS X (we have an ancient G4 in the lab
> > which I can try it on). I'll let you know what I find.
From cjfields at uiuc.edu Mon May 15 17:29:14 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 15 May 2006 16:29:14 -0500 Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names In-Reply-To: <809AE0C7-A9B6-48A4-BF11-BA392C770CA9@ebi.ac.uk> Message-ID: <000b01c67866$9dac2620$15327e82@pyrimidine> > -----Original
Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Nadeem Faruque > Sent: Monday, May 15, 2006 2:47 PM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bio::DB::Taxonomy:: mishandles > species,subspecies/variant names > > >> My personal view is that having it as an annotation would serve no > >> real > >> purpose. For me the whole point of any kind of species > >> representation in > >> bioperl is to allow you to compare species in a biologically > >> meaningful > >> way. If it's just some annotation then that means it's basically > > I understand the need to find the species name of entries, especially > now that so many complete genomes have been given their own strain- > specific tax nodes, and I also think it is a shame that the ncbi tax > dump does not give a rank to entries such as these (they cannot > easily be distinguished from unofficial ranks higher in the tree > without ascending the tree). > Would it be useful for the species name to be included within EMBL > file headers, eg in a line called OB (OB is a terrible suggestion > based on 'Organism Binomial' since OS is already in use)? > > eg two examples of the species 'Apple stem grooving virus', where the > second one would appear to be a different species without delving > into the tax tree or the inclusion of an OB line. > > AC D14995; S47260; > DE Apple stem grooving virus genome, complete sequence. > OS Apple stem grooving virus > OB Apple stem grooving virus > OC Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae; > OC Capillovirus. > > AC AY646511; > DE Citrus tatter leaf virus strain Kumquat 1, complete genome. > OS Citrus tatter leaf virus > OB Apple stem grooving virus > OC Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae; > OC Capillovirus. Jason also mentions a few examples (see below). 
The problem lies in the fact that EMBL and GenBank flatfiles do not give
hierarchy ranking for taxonomy, so it's a best guess. What I'm seeing is
that the guess is wrong more often than not when it comes to complex
scientific names (viruses, bacteria, etc). Notice the doubling of the
strain in the following GenBank files passed through SeqIO
(genbank->genbank conversion, BTW; haven't tried EMBL):

SOURCE      Azoarcus sp. EbN1 EbN1
  ORGANISM  Azoarcus sp.
            Bacteria; Proteobacteria; Betaproteobacteria; Rhodocyclales;
            Rhodocyclaceae; Azoarcus.

SOURCE      Mycobacterium sp. KMS KMS
  ORGANISM  Mycobacterium sp.
            Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
            Corynebacterineae; Mycobacteriaceae; Mycobacterium.

SOURCE      Mycobacterium tuberculosis C C
  ORGANISM  Mycobacterium tuberculosis
            Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
            Corynebacterineae; Mycobacteriaceae; Mycobacterium;
            Mycobacterium; tuberculosis complex; Mycobacterium.

SOURCE      Bacillus subtilis subsp. subtilis str. 168 subtilis str. 168
  ORGANISM  Bacillus subtilis subsp.
            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus.

Here are Jason's examples, for posterity:

Can you guess what value is the strain versus sub-species? What happens
when there is a two part strain name (space separated) and a sub-species
or variety designation?

SOURCE      Staphylococcus haemolyticus JCSC1435
  ORGANISM  Staphylococcus haemolyticus JCSC1435
            Bacteria; Firmicutes; Bacillales; Staphylococcus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=279808
strain is JCSC1435

versus

SOURCE      Muntiacus muntjak vaginalis
  ORGANISM  Muntiacus muntjak vaginalis
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
            Euteleostomi; Mammalia; Eutheria; Laurasiatheria;
            Cetartiodactyla; Ruminantia; Pecora; Cervidae; Muntiacinae;
            Muntiacus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9887
species is muntjak, sub-species vaginalis ?

versus

SOURCE      Aspergillus nidulans FGSC A4
  ORGANISM  Aspergillus nidulans FGSC A4
            Eukaryota; Fungi; Ascomycota; Pezizomycotina; Eurotiomycetes;
            Eurotiales; Trichocomaceae; Emericella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=227321
Genus should be Aspergillus or Emericella ?

Strain and subspecies/variety in the same entry

SOURCE      Cryptococcus neoformans var. grubii H99
  ORGANISM  Cryptococcus neoformans var. grubii H99
            Eukaryota; Fungi; Basidiomycota; Hymenomycetes;
            Heterobasidiomycetes; Tremellomycetidae; Tremellales;
            Tremellaceae; Filobasidiella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=235443

> > My point is, a large number of users do NOT use, nor care about,
> > taxonomic information to the degree they need to know the entire
> > classification of the organism; many are just as happy about getting
> > the scientific name only, which is in the GenBank/EMBL file itself.
> > To take one extreme, it is not productive to force every user to
> > download the NCBI tax database and use lookups just to convert
> > sequences from EMBL format to GenBank format. It's not productive to
> > allow users to spam the NCBI tax database remotely either, so
> > hardcoding lookups is, IMHO, a big mistake.
>
> I don't think you need to add any information to turn an embl-format
> file into a Genbank flatfile, but maybe I'm missing something obvious.

The issue is the way the SOURCE and ORGANISM lines are handled (OS/OC lines
in EMBL, I believe), which is using a Bio::Species object. The problem is,
like I mentioned above, no hierarchal ranking is in the flat file, just the
order of the ranking. We can try to make a best guess based on that but
it's obviously very tricky, particularly when dealing with subspecies,
strains, etc.
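
To make the ambiguity concrete, here is a minimal sketch in plain Perl. It is not how Bio::Species or the SeqIO parsers actually work; naive_split_organism is a made-up helper that simply splits an ORGANISM line on whitespace, and the names fed to it are the ones from the examples above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): treat word 1 as the genus,
# word 2 as the species, and everything left over as an unclassified
# remainder. The flat file gives no rank for that remainder.
sub naive_split_organism {
    my ($line) = @_;
    my ($genus, $species, @rest) = split ' ', $line;
    return {
        genus     => $genus,
        species   => $species,
        remainder => join(' ', @rest),
    };
}

# ORGANISM lines taken from the examples in this thread.
my @organisms = (
    'Staphylococcus haemolyticus JCSC1435',     # remainder is a strain
    'Muntiacus muntjak vaginalis',              # remainder is a subspecies
    'Cryptococcus neoformans var. grubii H99',  # variety plus strain
    'Azoarcus sp. EbN1',                        # "sp." is not a species name
);

for my $line (@organisms) {
    my $guess = naive_split_organism($line);
    printf "%-42s genus=%s species=%s remainder=[%s]\n",
        $line, @{$guess}{qw(genus species remainder)};
}
```

The remainder comes back as a strain in one case, a subspecies in another, and a variety plus strain in a third, and nothing in the flat file says which is which. That is the part a lookup would have to settle.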
NCBI also states that many times the classification can be too long for a file so may be incomplete (I think they leave out nodes which have 'no rank' tags, but I can't be completely sure), so there's another issue. Anyway, this is where the lookup would come in, which would require a local taxonomy database (we can't spam the NCBI remote database, that would just be rude) which would give the complete taxonomic classification if it worked properly. So now we have three possible situations: 1) One extreme : We require a lookup to get it right (which, BTW, it currently doesn't); this by default requires a local database. 2) Middle of the road : we try and guess the information as best as we can with the information given (the current situation); this is breaking more and more often now, so is becoming more unreliable. 3) Other extreme : we punt and absolve ourselves of even trying to parse the data and just have a strict tagname->value or similar simple construct to handle the data. #3 as default with option to do #1 is probably best (least error prone with option for most information), with caching to speed up lookups as Sendu Bala does now. Chris > Nadeem > > > -- > Dr S.M. Nadeem N. 
Faruque > 9 Barley Court > Saffron Walden > Essex CB11 3HG > 01799 500 120 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Mon May 15 17:37:56 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 15 May 2006 17:37:56 -0400 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: <000a01c67862$0a00cab0$15327e82@pyrimidine> References: <000a01c67862$0a00cab0$15327e82@pyrimidine> Message-ID: <6CCA8112-651D-4154-94AE-88FE8EFBCD27@gmx.net> It does have the following line though (and a 'use' statement for OntologyIO); @ISA = qw( Bio::OntologyIO ); So what is it doing 'wrong' (there aren't any tests or so in which anything erroneous would show)? -hilmar On May 15, 2006, at 4:56 PM, Chris Fields wrote: > Okay, I see what you mean. Using the search term "Bio::Ont*" also > explains > why I didn't see it ;P. Yeah, the bug shows up here too (WinXP and > Mac OS > X), and those links are broken like you said. Could be something > to do with > indexing. > > Using the methods script in the FAQ > (http://www.bioperl.org/wiki/FAQ#Why_can. > 27t_I_easily_get_a_list_of_all_the_ > methods_a_object_can_call.3F) I get this: > > C:\Perl\Scripts>methods.pl Bio::OntologyIO::simplehierarchy > Bio::OntologyIO::simplehierarchy::Dumper > Bio::OntologyIO::simplehierarchy::basename > Bio::OntologyIO::simplehierarchy::dirname > Bio::OntologyIO::simplehierarchy::fileparse > Bio::OntologyIO::simplehierarchy::fileparse_set_fstype > > Chris > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp >> Sent: Monday, May 15, 2006 2:24 PM >> To: Chris Fields >> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' >> Subject: Re: [Bioperl-l] Deobfuscator interface now available >> >> I wasn't using the search. It's in the scrollable table for browsing. 
>> -hilmar >> >> On May 15, 2006, at 3:07 PM, Chris Fields wrote: >> >>> I'll have to give it a try on Mac OS X (we have an ancient G4 in >>> the lab >>> which I can try it on). I'll let you know what I find. >>> >>> This is what I get when I do a search for 'Bio::Ont*' using Firefox >>> on WinXP >>> and this Deobfuscator link (http://bioperl.org/cgi-bin/ >>> deob_interface.cgi?); >>> all the classes have links that work (I added newline and tab to >>> make it a >>> bit more readable) : >>> >>> Bio::OntologyIO >>> Parser factory for Ontology formats >>> Bio::OntologyIO::Handlers::BaseSAXHandler >>> no short description available >>> Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler >>> no short description available >>> Bio::Ontology::OntologyI >>> Interface for an ontology implementation >>> Bio::Ontology::TermFactory >>> Instantiates a new Bio::Ontology::TermI (or derived class) >>> through a >>> factory >>> Bio::Ontology::OntologyStore >>> A repository of ontologies >>> Bio::Ontology::RelationshipFactory >>> Instantiates a new Bio::Ontology::RelationshipI (or derived class) >>> through a factory >>> Bio::Ontology::Ontology >>> standard implementation of an Ontology >>> >>> So the names seem fine here. >>> >>> When I click on a class (Bio::Ontology::Ontology) I get in the >>> results >>> section: >>> >>> Method Class >>> Returns >>> Usage >>> add_relationship Bio::Ontology::Ontology >> Its >>> argument. add_relationship(RelationshipI relationship): >>> RelationshipI >>> add_relationship_type Bio::Ontology::OntologyEngineI >>> not >>> documented not documented >>> add_term Bio::Ontology::Ontology >>> its >>> argument. add_term(TermI term): TermI >>> >>> ....and so on >>> >>> Where each method is clickable and opens a new page containing a >>> table: >>> >>> Bio::Ontology::Ontology::add_relationship >>> Usage add_relationship(RelationshipI relationship): RelationshipI >>> Function Adds a relationship object to the ontology engine. >>> Returns Its argument. 
>>> Args A RelationshipI object. >>> >>> >>> Each class is also linked to the bioperl-live PDOC. Clicking on >>> class >>> Bio::Ontology::Ontology in the results table gets me this page >>> (no new >>> page): >>> >>> http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html >>> >>> >>> Chris >>> >>>> -----Original Message----- >>>> From: Hilmar Lapp [mailto:hlapp at gmx.net] >>>> Sent: Monday, May 15, 2006 1:09 PM >>>> To: Chris Fields >>>> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' >>>> Subject: Re: [Bioperl-l] Deobfuscator interface now available >>>> >>>> Safari or Firefox on MacOSX don't do this. Note that the appearance >>>> in the browsable list is already different (the prefix is missing), >>>> and the JavaScript link also lacks the prefix in the module name in >>>> contrast to others, e.g., Bio::Ontology::Ontology (which is one of >>>> the few Bio::Ontology exceptions that do work and do display >>>> correctly). >>>> >>>> I suppose there is something peculiar about the code formatting of >>>> those modules? Some of the modules under Bio::OntologyIO are also >>>> affected BTW. >>>> >>>> What happens is after you click on the link the page apppears to >>>> reload (i.e., gets submitted) but the second table that is supposed >>>> open underneath the first doesn't appear. However, the sort-by drop >>>> down selector does appear. >>>> >>>> -hilmar >>>> >>>> On May 15, 2006, at 1:22 PM, Chris Fields wrote: >>>> >>>>> That's strange. Clicking on the list gives me the results for >>>>> that >>>>> module. >>>>> When I click on the hyperlinks in the results section they open >>>>> fine; the >>>>> method column links opens a new page containing usage-function- >>>>> returns-args >>>>> and the class column links opens pdoc (same page) for bioperl- >>>>> live. I'm >>>>> using Firefox 1.5 on WinXP. 
>>>>> >>>>> Chris >>>>> >>>>>> -----Original Message----- >>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp >>>>>> Sent: Monday, May 15, 2006 12:01 PM >>>>>> To: Mauricio Herrera Cuadra >>>>>> Cc: bioperl-l >>>>>> Subject: Re: [Bioperl-l] Deobfuscator interface now available >>>>>> >>>>>> Hey, thanks to Laura & David for this interface. >>>>>> >>>>>> Any idea why most of the Bio::Ontology::* modules show up without >>>>>> their leading Bio::Ontology? And clicking on those hyperlinks >>>>>> doesn't >>>>>> go anywhere either ... Anything different with those modules >>>>>> that I >>>>>> can fix? >>>>>> >>>>>> -hilmar >>>>>> >>>>>> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote: >>>>>> >>>>>>> I'm glad to announce the availability of the Deobfuscator >>>>>>> interface at >>>>>>> the BioPerl website. You can use it at the following URL: >>>>>>> >>>>>>> http://bioperl.org/cgi-bin/deob_interface.cgi >>>>>>> >>>>>>> Many thanks to Laura Kavanaugh and David Messina for this great >>>>>>> contribution to the BioPerl project! >>>>>>> >>>>>>> Mauricio. 
>>>>>>> >>>>>>> -- >>>>>>> MAURICIO HERRERA CUADRA >>>>>>> arareko at campus.iztacala.unam.mx >>>>>>> Laboratorio de Gen?tica >>>>>>> Unidad de Morfofisiolog?a y Funci?n >>>>>>> Facultad de Estudios Superiores Iztacala, UNAM >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>>>> >>>>>> -- >>>>>> =========================================================== >>>>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>>>>> =========================================================== >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> >>>> -- >>>> =========================================================== >>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>>> =========================================================== >>>> >>>> >>>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Mon May 15 18:03:48 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 15 May 2006 17:03:48 -0500 Subject: [Bioperl-l] 
Deobfuscator interface now available In-Reply-To: <6CCA8112-651D-4154-94AE-88FE8EFBCD27@gmx.net> Message-ID: <000d01c6786b$71c04e60$15327e82@pyrimidine> And Bio::OntologyIO works on it's own: C:\Perl\Scripts>methods.pl Bio::OntologyIO Bio::OntologyIO::DESTROY Bio::OntologyIO::new Bio::OntologyIO::next_ontology Bio::OntologyIO::term_factory Bio::OntologyIO::unescape Bio::Root::IO::catfile Bio::Root::IO::close Bio::Root::IO::dup Bio::Root::IO::exists_exe Bio::Root::IO::file Bio::Root::IO::flush Bio::Root::IO::gensym Bio::Root::IO::mode Bio::Root::IO::noclose Bio::Root::IO::qualify Bio::Root::IO::qualify_to_ref Bio::Root::IO::rmtree Bio::Root::IO::tempdir Bio::Root::IO::tempfile Bio::Root::IO::ungensym Bio::Root::Root::confess Bio::Root::Root::debug Bio::Root::Root::throw Bio::Root::Root::verbose Bio::Root::RootI::carp Bio::Root::RootI::deprecated Bio::Root::RootI::stack_trace Bio::Root::RootI::stack_trace_dump Bio::Root::RootI::throw_not_implemented Bio::Root::RootI::warn Bio::Root::RootI::warn_not_implemented But when I try these: C:\Perl\Scripts>methods.pl Bio::OntologyIO::goflat C:\Perl\Scripts>methods.pl Bio::OntologyIO::dagflat I get nada. It could be related to the way the methods are parsed using Class::Inspector : print join ("\n", sort @{Class::Inspector->methods($class,'full','public')}), "\n"; I haven't tried it on all the weird Bio::Ontology-missing modules (don't have time today). 
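For anyone following along without Class::Inspector installed, here is a rough, core-only approximation of that kind of method listing. It is not the FAQ's methods.pl and does not call Class::Inspector; it walks the symbol table of each package in the inheritance chain. Demo::Base and Demo::Child are made-up packages for illustration only. The sketch also shows why imported functions such as basename appear in a listing: once imported, they sit in the package's symbol table just like real methods.

```perl
use strict;
use warnings;
use mro;    # core since Perl 5.10; provides mro::get_linear_isa

# Hypothetical demo classes, standing in for a parser base class and a subclass.
{
    package Demo::Base;
    sub new           { return bless {}, shift }
    sub next_ontology { return }
}
{
    package Demo::Child;
    our @ISA = ('Demo::Base');
    use File::Basename qw(basename);    # imported sub lands in this package
    sub parse { return }
}

# Return the fully qualified names of all subs reachable through inheritance,
# skipping names that start with an underscore and overridden parent subs.
sub list_methods {
    my ($class) = @_;
    my (%seen, @found);
    for my $pkg (@{ mro::get_linear_isa($class) }) {
        no strict 'refs';
        for my $name (sort keys %{"${pkg}::"}) {
            next if $name =~ /^_/;
            next unless defined &{"${pkg}::$name"};
            next if $seen{$name}++;
            push @found, "${pkg}::$name";
        }
    }
    return sort @found;
}

print "$_\n" for list_methods('Demo::Child');
```

If a real module's listing shows only imported helpers and none of its own or inherited methods, that would hint that the package's subs or @ISA were not in place when the listing ran, but that is a guess to check against the module itself.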
It's not common to all of those modules though: C:\Perl\Scripts>methods.pl Bio::OntologyIO::InterProParser Bio::OntologyIO::DESTROY Bio::OntologyIO::InterProParser::next_ontology Bio::OntologyIO::InterProParser::parse Bio::OntologyIO::InterProParser::secondary_accessions_map Bio::OntologyIO::new Bio::OntologyIO::term_factory Bio::OntologyIO::unescape Bio::Root::IO::catfile Bio::Root::IO::close Bio::Root::IO::dup Bio::Root::IO::exists_exe Bio::Root::IO::file Bio::Root::IO::flush Bio::Root::IO::gensym Bio::Root::IO::mode Bio::Root::IO::noclose Bio::Root::IO::qualify Bio::Root::IO::qualify_to_ref Bio::Root::IO::rmtree Bio::Root::IO::tempdir Bio::Root::IO::tempfile Bio::Root::IO::ungensym Bio::Root::Root::confess Bio::Root::Root::debug Bio::Root::Root::throw Bio::Root::Root::verbose Bio::Root::RootI::carp Bio::Root::RootI::deprecated Bio::Root::RootI::stack_trace Bio::Root::RootI::stack_trace_dump Bio::Root::RootI::throw_not_implemented Bio::Root::RootI::warn Bio::Root::RootI::warn_not_implemented Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > Sent: Monday, May 15, 2006 4:38 PM > To: Chris Fields > Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' > Subject: Re: [Bioperl-l] Deobfuscator interface now available > > It does have the following line though (and a 'use' statement for > OntologyIO); > > @ISA = qw( Bio::OntologyIO ); > > So what is it doing 'wrong' (there aren't any tests or so in which > anything erroneous would show)? > > -hilmar > > On May 15, 2006, at 4:56 PM, Chris Fields wrote: > > > Okay, I see what you mean. Using the search term "Bio::Ont*" also > > explains > > why I didn't see it ;P. Yeah, the bug shows up here too (WinXP and > > Mac OS > > X), and those links are broken like you said. Could be something > > to do with > > indexing. > > > > Using the methods script in the FAQ > > (http://www.bioperl.org/wiki/FAQ#Why_can. 
> > 27t_I_easily_get_a_list_of_all_the_ > > methods_a_object_can_call.3F) I get this: > > > > C:\Perl\Scripts>methods.pl Bio::OntologyIO::simplehierarchy > > Bio::OntologyIO::simplehierarchy::Dumper > > Bio::OntologyIO::simplehierarchy::basename > > Bio::OntologyIO::simplehierarchy::dirname > > Bio::OntologyIO::simplehierarchy::fileparse > > Bio::OntologyIO::simplehierarchy::fileparse_set_fstype > > > > Chris > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > >> Sent: Monday, May 15, 2006 2:24 PM > >> To: Chris Fields > >> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' > >> Subject: Re: [Bioperl-l] Deobfuscator interface now available > >> > >> I wasn't using the search. It's in the scrollable table for browsing. > >> -hilmar > >> > >> On May 15, 2006, at 3:07 PM, Chris Fields wrote: > >> > >>> I'll have to give it a try on Mac OS X (we have an ancient G4 in > >>> the lab > >>> which I can try it on). I'll let you know what I find. 
From cjfields at uiuc.edu Mon May 15 20:14:28 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Mon, 15 May 2006 19:14:28 -0500
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
Message-ID:

---- Original message ----
>Date: Mon, 15 May 2006 15:40:15 -0400
>From: "Clarke, Wayne"
>Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
>To:
>
>Hey everyone,
>
>I have been developing some code to download and parse blast reports
>from a remote server using SOAP::Lite as well as insert the results into
>a mysql database. The problem I am having is that my program seems to be
>taking up a huge amount of RAM. For a single job of 10000 queries it
>can consume as much as a couple hundred Mb inside an hour.

If you're parsing 10000 queries (10000 different BLAST reports, right?)
then it's not necessarily a memory leak as much as it is object
creation. Each report generates hit objects, which in turn generate hsp
objects. I think Jason recommends using the tabular output option (-m8
or -m9) for huge reports, as it cuts down considerably on this. If you
are cycling through each report it shouldn't be as much of a problem
unless your BLAST reports are really huge. Have you tried parsing a
single report to see if the problem persists?

Now, if you are using Bioperl 1.5.1 with BLAST 2.2.13 or newer, you'll
likely run into a problem with an infinite loop that occurs due to a
change in NCBI's text output. You can try updating bioperl from CVS in
either case to see if that helps any. Tabular output and XML output,
AFAIK, are the same regardless of version; this bug only affected text
output of BLAST reports.

> I realize
>that a lot of work is being done but this seems like way too much. This
>leads me to the subject of my post. I think I may have traced the source
>of the memory leak to Bio::SearchIO.
I have used Devel::Size to track >the size of my variables and done other debugging steps and have had no >luck with resolving this very frustrating problem. My code is as >follows: > > > > my $result = $connector->getQueryResult($query_id); > > > > my $FH; > > open $FH, "<", \$result; > > > > my $searchio = new Bio::SearchIO(-format => "blast", > > > > -fh => $FH); > > > > while (my $o_blast = $searchio->next_result()) { > > my $clone_id = $o_blast->query_name(); > > > > my $statement = $bdbi->form_push_SQL ($o_blast, >$clone_id, 5); > > > >this is just the leading and tailing code surrounding the use of >Bio::SearchIO since there is quite a lot. I am mostly just wondering if >anyone has ever had problems with SearchIO and its memory usage. I >looked at the source code for it but am afraid it is out of my league. >Any help/suggestions/questions would be great. Thanks > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l From torsten.seemann at infotech.monash.edu.au Mon May 15 20:18:44 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 16 May 2006 10:18:44 +1000 Subject: [Bioperl-l] Memory Leak in Bio::SearchIO In-Reply-To: <320530F83FA47047823E57F110DDEAADB159EC@onncrxms4.agr.gc.ca> References: <320530F83FA47047823E57F110DDEAADB159EC@onncrxms4.agr.gc.ca> Message-ID: <44691A64.8040607@infotech.monash.edu.au> > taking up and huge amount of RAM. For a single job of 10000 queries it > can consume as much as a couple hundred Mb inside an hour. I realize > my $result = $connector->getQueryResult($query_id); > my $searchio = new Bio::SearchIO(-format => "blast", > while (my $o_blast = $searchio->next_result()) { > my $clone_id = $o_blast->query_name(); > my $statement = $bdbi->form_push_SQL ($o_blast, $clone_id, 5); } Some comments: Have you considered that whatever class/module $bdbi belongs to is causing the problem? ie. 
is it keeping a reference to $o_blast around? Are you aware that Perl garbage collection does not necessarily return freed memory back to the OS? This may affect how you were measuring "memory usage". -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From kmdaily at indiana.edu Mon May 15 17:00:12 2006 From: kmdaily at indiana.edu (Daily, Kenneth Michael) Date: Mon, 15 May 2006 17:00:12 -0400 Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO Message-ID: <20528E699A515C499B80C222BDBEBC34043FF8@iu-mssg-mbx108.ads.iu.edu> I just installed Bioperl 1.4, and entrezgene.pm is not included (should be in Bio/SeqIO). How can I get this module? Kenny Daily IU School of Informatics kmdaily at indiana.edu From letondal at pasteur.fr Tue May 16 02:06:19 2006 From: letondal at pasteur.fr (Catherine Letondal) Date: Tue, 16 May 2006 08:06:19 +0200 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: References: <000901c67827$d99eabb0$15327e82@pyrimidine> Message-ID: <9c36140009c3d80bbb0d543376afa6e0@pasteur.fr> On May 15, 2006, at 9:34 PM, David Messina wrote: >>>> A couple of minor interface thoughts. >>>> >>>> 1)There's quite a lot of methods for many of the classes. As such, I >>>> think I'll often want to browse through what's available in a >>>> class. But >>>> 60% or so of the screen real estate is used for "Enter a search >>>> string... OR select a class from the list". IMO, it would be >>>> better to >>>> have two pages, a search page and a result page. It only takes >>>> a click >>>> on Back (or a "new search" button) to get to a new search, and >>>> now you >>>> can use your whole screen for reading your results. >>> >>> As the compromise it must be, I like the way it behaves. I don't like >>> lots of windows. I especially don't like pop up windows. 
Right now when I'm using the bioperl docs I tend to have a whole
>>> bunch of tabs open to different class pages at once, so being able
>>> to see an overview all on one page in Deobfuscator is very nice.
>
> I think the current behavior makes sense as the default, but I like
> the idea of being able to view the search results in a separate
> window for easier browsing. Thanks for the suggestion; I'll add it to
> the list.
>

First, thanks for this very useful Web interface!

There are examples (quite ajaxian ones) that reach a compromise between
using several windows, which makes large results easy to browse, and
composing everything in one window, which gives an overview. The two
examples that come to mind (not biology related) are:

- http://montreal.mspace.fm/chi/sched/
- http://www.live.com/
  (see the slider on the top right, which lets you squeeze or enlarge
  the results area)

--
Catherine Letondal -- Institut Pasteur

From cjfields at uiuc.edu Tue May 16 07:38:42 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Tue, 16 May 2006 06:38:42 -0500
Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
Message-ID: <36b52ba4.b94c5b79.8198c00@expms6.cites.uiuc.edu>

You'll have to install from CVS. I believe Brian added Entrezgene.pm
after the last developer release (1.5.1):

http://www.bioperl.org/wiki/Installing_BioPerl

Chris

---- Original message ----
>Date: Mon, 15 May 2006 17:00:12 -0400
>From: "Daily, Kenneth Michael"
>Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
>To:
>
>I just installed Bioperl 1.4, and entrezgene.pm is not included (should be in Bio/SeqIO). How can I get this module?
>
>Kenny Daily
>IU School of Informatics
>kmdaily at indiana.edu

From bernd.web at gmail.com Tue May 16 07:37:46 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Tue, 16 May 2006 13:37:46 +0200
Subject: [Bioperl-l] Bio::DB::Query::GenBank checks
Message-ID: <716af09c0605160437tfcf824dxa514f38f6b94d423@mail.gmail.com>

Hi all,

I was using Bio::DB::Query::GenBank to obtain only IDs from Entrez and
found some issues and differences (bugs?) in behaviour with respect to
the pod. Do these look familiar?

Some example code:

    use strict;
    use Bio::DB::GenBank;
    use Bio::DB::Query::GenBank;

    my $query = Bio::DB::Query::GenBank->new(
        -query   => 'Lassa Virus[ORGN]',
        -reldate => '30',
        -db      => 'protein',
        -ids     => [195052, 2981014, 11127914],
        -maxids  => 30,
    );
    # named arguments take a leading dash (-format, not format)
    my $gb    = Bio::DB::GenBank->new(-format => 'fasta');
    my $seqio = $gb->get_Stream_by_query($query);
    while (my $seq = $seqio->next_seq) {
        print $seq->desc, "\n";
    }

The module documentation says this about -ids:

    If you provide an array reference of IDs in -ids, the query will be
    ignored and the list of IDs will be used when the query is passed to
    a Bio::DB::GenBank object's get_Stream_by_query() method.

In the above case it is actually the query ('Lassa Virus[ORGN]') that is
passed, not the IDs. Also, $query->query shows the original query. Am I
doing something wrong, or does the pod not reflect the current behaviour
of this module?

I was also surprised that if the internet connection is down, no warning
is thrown for $query->query or $query->count at all. Only the
get_Stream_by_query call above will warn us if the site is unreachable
(500 Internal Server Error). $query->ids or $query->count will not throw
a warning, and @ids = $query->ids will just be an empty array. (I realize
$query->count is not initialized in that case, so I am using this now to
check for success, but a warning from WebDBSeqI would be more
appropriate, I think.)
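The "check the count yourself" workaround described above could be wrapped up roughly as follows. This is only a sketch under stated assumptions: checked_ids is a hypothetical helper name, and Mock::Query is a made-up stand-in so the example runs offline. In real use the argument would be a Bio::DB::Query::GenBank object, of which only the count and ids methods mentioned in this thread are assumed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Bio::DB::Query::GenBank object, so the
# sketch runs without a network connection.
{
    package Mock::Query;
    sub new   { my ($class, %args) = @_; return bless {%args}, $class }
    sub count { return $_[0]{count} }
    sub ids   { return @{ $_[0]{ids} || [] } }
}

# Return the IDs, or die with a message naming the failure case that was hit.
sub checked_ids {
    my ($query) = @_;
    my $count = eval { $query->count };
    die "query raised an error: $@" if $@;
    die "count is undefined (server unreachable?)\n" unless defined $count;
    die "query matched no records in this database\n" if $count == 0;
    return $query->ids;
}

# A query that found three records.
my @ids = checked_ids(
    Mock::Query->new(count => 3, ids => [195052, 2981014, 11127914]));
print "$_\n" for @ids;

# A query whose count was never filled in, as when the site is unreachable.
eval { checked_ids(Mock::Query->new(count => undef)) };
print "failed: $@" if $@;
```

Separating "undefined" from "zero" keeps the unreachable-server case apart from the wrong-database case; whether the real module leaves count undefined on a network failure is taken from the report above, not verified here.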
Last, the example from the pod is not working, but no warnings are raised: # initialize the list yourself my $query = Bio::DB::Query::GenBank->new(-ids=>[195052,2981014,11127914]); $query->count returns zero w/o any warning. Of course this query did not specify a DB. Only if we specify -db=>'nucleotide' $query->count is 3. However, why not any warning if we set -db->'protein' or if we did not set this? On the NCBI website searching Protein DB returns for 19505: See Details. No items found. The following term(s) refer to a different DB:195052 But this is not reflected via Bio::DB::Query::GenBank. Can I check for this situation in the code apart from checking on $query->count == 0 ? Or would it indeed be better to check for these situations in the module? Regards, Bernd From chen_li3 at yahoo.com Tue May 16 10:55:51 2006 From: chen_li3 at yahoo.com (chen li) Date: Tue, 16 May 2006 07:55:51 -0700 (PDT) Subject: [Bioperl-l] module for 6 reading frames Message-ID: <20060516145551.50370.qmail@web36802.mail.mud.yahoo.com> Hi all, I wonder which module is available for translating DNA sequence into 6 reading frames. Thank you, Li __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From smarkel at scitegic.com Tue May 16 11:10:35 2006 From: smarkel at scitegic.com (smarkel at scitegic.com) Date: Tue, 16 May 2006 08:10:35 -0700 Subject: [Bioperl-l] module for 6 reading frames In-Reply-To: <20060516145551.50370.qmail@web36802.mail.mud.yahoo.com> Message-ID: Li, Use the translate() function in Bio::Tools::CodonTable. Scott Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel at scitegic.com SciTegic Inc. 
mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 279 8804 USA web: http://www.scitegic.com bioperl-l-bounces at lists.open-bio.org wrote on 16.05.2006 07:55:51: > Hi all, > > I wonder which module is available for translating DNA > sequence into 6 reading frames. > > Thank you, > > Li From golharam at umdnj.edu Tue May 16 12:18:19 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Tue, 16 May 2006 12:18:19 -0400 Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene? Message-ID: <001f01c67904$59b08ad0$2f01a8c0@GOLHARMOBILE1> I just updated my local copy of bioperl from cvs. When I ran the configure script, it says I need the external module Bio::ASN1::EntrezGene. Which package contains this module? -- Ryan Golhar - golharam at umdnj.edu The Informatics Institute of UMDNJ From golharam at umdnj.edu Tue May 16 12:24:03 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Tue, 16 May 2006 12:24:03 -0400 Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene? Message-ID: <002001c67905$2622a580$2f01a8c0@GOLHARMOBILE1> Never mind. I see its in CPAN. -----Original Message----- From: Ryan Golhar [mailto:golharam at umdnj.edu] Sent: Tuesday, May 16, 2006 12:18 PM To: 'bioperl-l at bioperl.org' Subject: Where is Bio::ASN1::EntrezGene? I just updated my local copy of bioperl from cvs. When I ran the configure script, it says I need the external module Bio::ASN1::EntrezGene. Which package contains this module? -- Ryan Golhar - golharam at umdnj.edu The Informatics Institute of UMDNJ From cjfields at uiuc.edu Tue May 16 13:27:32 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 16 May 2006 12:27:32 -0500 Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene? 
In-Reply-To: <001f01c67904$59b08ad0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <002701c6790e$03d8f110$15327e82@pyrimidine>

It's actually not part of Bioperl currently; you can find it on CPAN:

http://search.cpan.org/~mingyiliu/Bio-ASN1-EntrezGene-1.091/lib/Bio/ASN1/EntrezGene.pm

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Ryan Golhar
> Sent: Tuesday, May 16, 2006 11:18 AM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene?
>
> I just updated my local copy of bioperl from cvs. When I ran the
> configure script, it says I need the external module
> Bio::ASN1::EntrezGene. Which package contains this module?
>
> --
> Ryan Golhar - golharam at umdnj.edu
> The Informatics Institute of UMDNJ

From ClarkeW at AGR.GC.CA Tue May 16 16:57:13 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Tue, 16 May 2006 16:57:13 -0400
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
Message-ID: <320530F83FA47047823E57F110DDEAADB159FB@onncrxms4.agr.gc.ca>

Thank you for the suggestions and comments. However, I think I should
clear a few things up. I am running bioperl v1.4, I am cycling through
the blast reports, which should not be of absurd size since they only
contain the top 5 hits, and I am using top to track the memory usage
(although I realize this is fairly inaccurate). I have looked through
the code for both AAFCBLAST and BEAST_UPDATE but do not believe the
leak/problem to be contained within them, since they are almost
exclusively using method calls and those variables should be destroyed
upon leaving the scope of the method. I have used Devel::Size to check
the size of the variables $bdbi and $searchio and $connector, and on
each iteration these variables have the same size.
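One further check that is sometimes useful alongside Devel::Size is to ask directly whether an object is freed once the loop body is done with it, using a weak reference as a probe. The sketch below is an illustration only, not a diagnosis of the code in this thread: Demo::Hoarder and Demo::Polite are made-up classes standing in for whatever receives each result, and is_released is a hypothetical helper name. Scalar::Util is a core module.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical consumers standing in for whatever receives each result.
{
    package Demo::Hoarder;    # keeps every object it is handed
    sub new  { return bless { seen => [] }, shift }
    sub take { my ($self, $obj) = @_; push @{ $self->{seen} }, $obj; return }
}
{
    package Demo::Polite;     # uses the object and lets it go
    sub new  { return bless {}, shift }
    sub take { my ($self, $obj) = @_; return scalar keys %$obj }
}

# Hand a fresh object to $consumer, then report whether it was freed once
# our own reference went out of scope.
sub is_released {
    my ($consumer) = @_;
    my $probe;
    {
        my $result = { query_name => 'demo' };
        $probe = $result;
        weaken($probe);       # $probe no longer keeps $result alive
        $consumer->take($result);
    }
    return !defined $probe;   # undef means nobody else held a reference
}

printf "%-14s %s\n", ref $_, is_released($_) ? 'released' : 'still referenced'
    for Demo::Hoarder->new, Demo::Polite->new;
```

In a real loop the probe would be a weakened copy of each result object, checked at the top of the next iteration. A probe that stays defined means something still holds the object; a probe that goes undef while top still shows growth would fit the point made earlier in the thread that freed memory is not necessarily returned to the OS.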
Any other suggestions would be greatly appreciated as I have nearly gone insane trying to track this problem down. Thanks, Wayne -----Original Message----- From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au] Sent: Monday, May 15, 2006 6:19 PM To: Clarke, Wayne Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO > taking up and huge amount of RAM. For a single job of 10000 queries it > can consume as much as a couple hundred Mb inside an hour. I realize > my $result = $connector->getQueryResult($query_id); > my $searchio = new Bio::SearchIO(-format => "blast", > while (my $o_blast = $searchio->next_result()) { > my $clone_id = $o_blast->query_name(); > my $statement = $bdbi->form_push_SQL ($o_blast, $clone_id, 5); } Some comments: Have you considered that whatever class/module $bdbi belongs to is causing the problem? ie. is it keeping a reference to $o_blast around? Are you aware that Perl garbage collection does not necessarily return freed memory back to the OS? This may affect how you were measuring "memory usage". -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From smarkel at scitegic.com Tue May 16 16:52:05 2006 From: smarkel at scitegic.com (smarkel at scitegic.com) Date: Tue, 16 May 2006 13:52:05 -0700 Subject: [Bioperl-l] module for 6 reading frames In-Reply-To: <20060516200436.34908.qmail@web36812.mail.mud.yahoo.com> Message-ID: Li, You can either do the substring, and reverse complement, yourself or you can use the translate() function in Bio::PrimarySeq. It inherits from Bio::PrimarySeqI, so check there for the documentation. That translate() function takes a "-frame" argument. Scott PS In future, please respond to the list. That way others see the questions and answers. chen li wrote on 16.05.2006 13:04:36: > Dear Dr. 
Markel, > > I browsed through the documentation of > Bio::Tools::CodonTable and found this line: > > my $translation= $CodonTable->translate($seq); > > I think this line does the translation. Here is my > question: which line in the doc says how to translate > the remaining frames 2, 3, and -1, -2, -3? > > > Thank you, > > Li > > --- smarkel at scitegic.com wrote: > > > Li, > > > > Use the translate() function in > > Bio::Tools::CodonTable. > > > > Scott > > > > Scott Markel, Ph.D. > > Principal Bioinformatics Architect email: smarkel at scitegic.com > > SciTegic Inc. mobile: +1 858 205 3653 > > 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > > San Diego, CA 92121 fax: +1 858 279 8804 > > USA web: http://www.scitegic.com > > > > > > bioperl-l-bounces at lists.open-bio.org wrote on > > 16.05.2006 07:55:51: > > > > > Hi all, > > > > > > I wonder which module is available for translating DNA > > > sequence into 6 reading frames. > > > > > > Thank you, > > > > > > Li > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
From cjfields at uiuc.edu Tue May 16 17:15:10 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 16 May 2006 16:15:10 -0500 Subject: [Bioperl-l] Memory Leak in Bio::SearchIO In-Reply-To: <320530F83FA47047823E57F110DDEAADB159FB@onncrxms4.agr.gc.ca> Message-ID: <000601c6792d$d0ab1500$15327e82@pyrimidine> I mentioned two possibilities last time I posted: 1) that the BLAST file was too large, or 2) that you are using an old version of bioperl in which SearchIO is broken. You seem to fit #2. The issue is that NCBI does not consider text BLAST output sacrosanct and routinely makes changes to it that break parsing. Because of this, SearchIO::blast needs to be constantly updated, so much so that there are normally a few updates a year just to fix parsing issues in that one module. And, BTW, although bioperl-1.4 is about 2 years old now, even bioperl-1.5.1 SearchIO is broken when it comes to the latest NCBI BLAST (2.2.14 now). I seriously suggest updating your local bioperl distribution to the latest bioperl-live (from CVS). Take one of those 10000 reports, just one, and try parsing it. If you have the same problem (a CPU spike and increasing memory usage) then it may be fixed in CVS. Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne > Sent: Tuesday, May 16, 2006 3:57 PM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO > > > With regards to the suggestions/comments made thank you. However I think > I should clear a few things up. I am running bioperl v1.4, I am cycling > through the blast reports which should not be of absurd size since they > only contain the top 5 hits, and I am using top to track (although I > realize fairly inaccurately) the memory usage. 
I have looked through the > code for both AAFCBLAST and BEAST_UPDATE but do not believe the > leak/problem to be contained within them since they are almost > exclusively using method calls and those variables should be destroyed > upon leaving the scope of the method. I have used Devel::Size to check > the size of the variables $bdbi and $searchio and $connector and on each > iteration these variables have the same size. Any other suggestions > would be greatly appreciated as I have nearly gone insane trying to > track this problem down. > > Thanks, Wayne > > > -----Original Message----- > From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au] > Sent: Monday, May 15, 2006 6:19 PM > To: Clarke, Wayne > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO > > > taking up and huge amount of RAM. For a single job of 10000 queries it > > can consume as much as a couple hundred Mb inside an hour. I realize > > > my $result = $connector->getQueryResult($query_id); > > my $searchio = new Bio::SearchIO(-format => "blast", > > while (my $o_blast = $searchio->next_result()) { > > my $clone_id = $o_blast->query_name(); > > my $statement = $bdbi->form_push_SQL > ($o_blast, $clone_id, 5); } > > Some comments: > > Have you considered that whatever class/module $bdbi belongs to is > causing the problem? ie. is it keeping a reference to $o_blast around? > > Are you aware that Perl garbage collection does not necessarily return > freed memory back to the OS? This may affect how you were measuring > "memory usage". 
> > -- > Dr Torsten Seemann http://www.vicbioinformatics.com > Victorian Bioinformatics Consortium, Monash University, Australia > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From ClarkeW at AGR.GC.CA Tue May 16 17:24:51 2006 From: ClarkeW at AGR.GC.CA (Clarke, Wayne) Date: Tue, 16 May 2006 17:24:51 -0400 Subject: [Bioperl-l] Memory Leak in Bio::SearchIO Message-ID: <320530F83FA47047823E57F110DDEAADB159FC@onncrxms4.agr.gc.ca> Thanks Chris, I did forget to mention however that I did parse one single report and found no problems, it finished fast and with no noticeable memory usage. I will consider getting my SA to update bioperl from CVS as a precaution but he has already stated he prefers to wait for the release of v1.5. Even a single job of 10000 will finish but the problem is that I am trying to loop through many jobs of 10000 and it seems to be additive for reasons I can not determine. During testing I noticed that the RSS on top decreased around 80% MEM usage, but then the shared mem increased. I am wondering if this is due to the perl garbage collector freeing up memory but keeping it in its pool for use, if so that is fine as long as the it does not then want to reach into swapped mem. Thanks again, Wayne -----Original Message----- From: Chris Fields [mailto:cjfields at uiuc.edu] Sent: Tuesday, May 16, 2006 3:15 PM To: Clarke, Wayne; bioperl-l at lists.open-bio.org Subject: RE: [Bioperl-l] Memory Leak in Bio::SearchIO I mentioned two possibilities last time I posted: 1) that the BLAST file was too large, or 2) that you are using an old version of bioperl that SearchIO is broken. You seem to fit #2. The issue is that NCBI does not consider text BLAST output sacrosanct and routinely makes changes to it that break parsing. 
Due to this, SearchIO::blast needs to be constantly updated, so much so that there are normally a few updates a year to fix parsing issues in that module alone compared to BioPerl as a whole. And, BTW, although bioperl-1.4 is about 2 years old now, even bioperl-1.5.1 SearchIO is broken when it comes to the latest NCBI BLAST (2.2.14 now). I seriously suggest updating your local bioperl distribution to the latest bioperl-live (from CVS). Take one of those 10000 reports, just one, and try parsing it. If you have the same problem (a CPU spike and increasing memory usage) then it may be fixed in CVS. Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne > Sent: Tuesday, May 16, 2006 3:57 PM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO > > > With regards to the suggestions/comments made thank you. However I think > I should clear a few things up. I am running bioperl v1.4, I am cycling > through the blast reports which should not be of absurd size since they > only contain the top 5 hits, and I am using top to track(although I > realize fairly inacuately) the memory usage. I have looked through the > code for both AAFCBLAST and BEAST_UPDATE but do not believe the > leak/problem to be contained within them since they are almost > exclusively using method calls and those variables should be destroyed > upon leaving the scope of the method. I have used Devel::Size to check > the size of the variables $bdbi and $searchio and $connector and on each > iteration these variables have the same size. Any other suggestions > would be greatly appreciated as I have nearly gone insane trying to > track this problem down. 
> > Thanks, Wayne > > > -----Original Message----- > From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au] > Sent: Monday, May 15, 2006 6:19 PM > To: Clarke, Wayne > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO > > > taking up and huge amount of RAM. For a single job of 10000 queries it > > can consume as much as a couple hundred Mb inside an hour. I realize > > > my $result = $connector->getQueryResult($query_id); > > my $searchio = new Bio::SearchIO(-format => "blast", > > while (my $o_blast = $searchio->next_result()) { > > my $clone_id = $o_blast->query_name(); > > my $statement = $bdbi->form_push_SQL > ($o_blast, $clone_id, 5); } > > Some comments: > > Have you considered that whatever class/module $bdbi belongs to is > causing the problem? ie. is it keeping a reference to $o_blast around? > > Are you aware that Perl garbage collection does not necessarily return > freed memory back to the OS? This may affect how you were measuring > "memory usage". 
> > -- > Dr Torsten Seemann http://www.vicbioinformatics.com > Victorian Bioinformatics Consortium, Monash University, Australia > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Tue May 16 17:45:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 16 May 2006 16:45:16 -0500 Subject: [Bioperl-l] Memory Leak in Bio::SearchIO In-Reply-To: <320530F83FA47047823E57F110DDEAADB159FC@onncrxms4.agr.gc.ca> Message-ID: <000801c67932$050dbd30$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne > Sent: Tuesday, May 16, 2006 4:25 PM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO > > > Thanks Chris, > > I did forget to mention however that I did parse one single report and > found no problems, it finished fast and with no noticeable memory usage. > I will consider getting my SA to update bioperl from CVS as a precaution > but he has already stated he prefers to wait for the release of v1.5. Um, you can tell him the last release was v.1.5.1 (last October). It's considered a developer release but is pretty stable; well, except for that whole SearchIO quibble, and that's not our fault. You could also install a local version in case he doesn't budge; see here: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_IN_A_PERSONAL_MODULE_AREA Chris > Even a single job of 10000 will finish but the problem is that I am > trying to loop through many jobs of 10000 and it seems to be additive > for reasons I can not determine. During testing I noticed that the RSS > on top decreased around 80% MEM usage, but then the shared mem > increased. 
I am wondering if this is due to the perl garbage collector > freeing up memory but keeping it in its pool for use, if so that is fine > as long as the it does not then want to reach into swapped mem. > > Thanks again, Wayne > ... From cjfields at uiuc.edu Tue May 16 18:20:29 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 16 May 2006 17:20:29 -0500 Subject: [Bioperl-l] Bio::DB::Query::GenBank checks In-Reply-To: <716af09c0605160437tfcf824dxa514f38f6b94d423@mail.gmail.com> Message-ID: <000901c67936$f0896990$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Bernd Web > Sent: Tuesday, May 16, 2006 6:38 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Bio::DB::Query::GenBank checks > > Hi all, > > I was using Bio::DB::Query::GenBank to obtain only IDs from Entrez and > found some issues and differences (bugs?) in behaviour wrt the pod. > Do these look familiar ? > > Some example code: > my $query = Bio::DB::Query::GenBank->new > (-query =>'Lassa Virus[ORGN]', > -reldate => '30', > -db => 'protein', > -ids => [195052,2981014,11127914], > -maxids => 30 ); > > $gb = new Bio::DB::GenBank(format=>'fasta'); > my $seqio = $gb->get_Stream_by_query($query); > while (my $seq = $seqio->next_seq) { > print $seq->desc,"\n"; } > > The module states that if we provide -ids that: > If you provide an array reference of IDs in -ids, the query will be > ignored and the list of IDs will be used when the query is passed > to a > Bio::DB::GenBank object's get_Stream_by_query() method. > > In the above case actually the query is passed ('Lassa Virus[ORGN]), > not the IDs. Also $query->query shows the original query. Am I doing > something wrong or is the pod not reflecting current behaviour of this > module? > > I was also surprised that if internet is down no warning is thrown for > $query->query or $query->count at all. 
Only the get_Stream_by_query > above will warn us if the site is unreachable (500 Internal Server > Error). I believe this has to do with the difference in the objects and the way they retrieve request data. Bio::DB::GenBank and Bio::DB::Query::GenBank use different methods to retrieve ids; Bio::DB::GenBank's get_Stream_by_query method just makes it a bit easier to retrieve a list of uids directly instead of saving them as an array and then reposting them using get_Stream_by_id. Not foolproof, but it works okay. > $query->ids or $query->count will not throw a warning and > @ids=$query->ids will just be an empty array. (I realize $query->count > is not initialized, so I am using this now to check for success, but a > warning from WebDBSeqI would be more appropriate I think). WebDBSeqI would be the place to make general warnings (it's supposed to be an interface for any web seq DB), but not eutils-specific warnings. > Last, the example from the pod is not working, but no warnings are raised: > # initialize the list yourself > my $query = > Bio::DB::Query::GenBank->new(-ids=>[195052,2981014,11127914]); > > $query->count returns zero w/o any warning. Of course this query did > not specify a DB. Only if we specify -db=>'nucleotide' $query->count > is 3. > However, why not any warning if we set -db=>'protein' or if we did not set > this? > > > On the NCBI website searching Protein DB returns for 19505: > See Details. No items found. > The following term(s) refer to a different DB:195052 > > But this is not reflected via Bio::DB::Query::GenBank. > > Can I check for this situation in the code apart from checking on > $query->count == 0 ? Or would it indeed be better to check for these > situations in the module? > > Regards, > Bernd I can probably play around with adding a few things tomorrow and clean up the POD somewhat. I'm planning a rewrite for EUtilities-based searches but that's a ways off still... Can't promise much; I'm pretty busy til next week. 
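Until the module itself reports these cases, a caller-side check along the lines discussed above might look like the sketch below. This is only an illustration under stated assumptions: the helper names classify_result and run_id_query are hypothetical, only new(), count() and ids() are taken from the Bio::DB::Query::GenBank usage quoted in this thread, and the idea that count() comes back undefined (or dies) when the server is unreachable follows Bernd's description and has not been verified against the module.

```perl
use strict;
use warnings;

# Hypothetical helper: map a count and an id list to a status string.
sub classify_result {
    my ($count, $ids) = @_;
    return 'no response' unless defined $count;              # count never initialized
    return 'no hits'     if $count == 0 || !@{ $ids || [] }; # empty result set
    return 'ok';
}

# Hypothetical wrapper around the query. BioPerl is loaded only when this
# sub is called, so classify_result() can be exercised without a network.
sub run_id_query {
    my ($db, @uids) = @_;
    require Bio::DB::Query::GenBank;
    my ($count, @ids);
    my $ok = eval {
        my $query = Bio::DB::Query::GenBank->new(-db => $db, -ids => \@uids);
        $count = $query->count;
        @ids   = $query->ids;
        1;
    };
    warn "query failed: $@" unless $ok;   # surface exceptions as a warning
    return classify_result($count, \@ids);
}
```

A call such as run_id_query('nucleotide', 195052, 2981014, 11127914) would then give the caller one status string to test, instead of relying on a silent zero count.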
Chris From chen_li3 at yahoo.com Tue May 16 20:53:17 2006 From: chen_li3 at yahoo.com (chen li) Date: Tue, 16 May 2006 17:53:17 -0700 (PDT) Subject: [Bioperl-l] module for formating sequence output on the screen Message-ID: <20060517005317.3976.qmail@web36815.mail.mud.yahoo.com> Hi all, Thank you very much for the help. I have some DNA sequences printed on the screen, but the default line length is longer than I want. I need 50 nucleotides/line. I searched CPAN but could not find the right module. Which bioperl module can do this job? Li From kmdaily at indiana.edu Tue May 16 09:57:52 2006 From: kmdaily at indiana.edu (Daily, Kenneth Michael) Date: Tue, 16 May 2006 09:57:52 -0400 Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO References: <36b52ba4.b94c5b79.8198c00@expms6.cites.uiuc.edu> Message-ID: <20528E699A515C499B80C222BDBEBC34043FFB@iu-mssg-mbx108.ads.iu.edu> OK, got that installed. But I still get an error: Can't locate object method "url" via package "Bio::Annotation::DBLink" at /home/kmdaily/src/bioperl/core/Bio/SeqIO/entrezgene.pm line 557. I am using this on a shared system, and an older version of Bioperl was installed by the admin. But the path to the one I downloaded via CVS is first in the list @INC, and PERL5LIB="/home/kmdaily/src/bioperl/core". Kenny Daily IU School of Informatics kmdaily at indiana.edu -----Original Message----- From: Christopher Fields [mailto:cjfields at uiuc.edu] Sent: Tue 5/16/2006 7:38 AM To: Daily, Kenneth Michael; bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] entrezgene.pm not in Bio::SeqIO You'll have to install from CVS. 
I believe Brian added entrezgene.pm after the last developer release (1.5.1): http://www.bioperl.org/wiki/Installing_BioPerl Chris ---- Original message ---- >Date: Mon, 15 May 2006 17:00:12 -0400 >From: "Daily, Kenneth Michael" >Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO >To: > >I just installed Bioperl 1.4, and entrezgene.pm is not included (should be in Bio/SeqIO). How can I get this module? > >Kenny Daily >IU School of Informatics >kmdaily at indiana.edu > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l From skirov at utk.edu Wed May 17 07:48:29 2006 From: skirov at utk.edu (Stefan Kirov) Date: Wed, 17 May 2006 07:48:29 -0400 Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO In-Reply-To: <20528E699A515C499B80C222BDBEBC34043FFB@iu-mssg-mbx108.ads.iu.edu> References: <36b52ba4.b94c5b79.8198c00@expms6.cites.uiuc.edu> <20528E699A515C499B80C222BDBEBC34043FFB@iu-mssg-mbx108.ads.iu.edu> Message-ID: <446B0D8D.40901@utk.edu> You are using an old Bio::Annotation::DBLink module. Did you download only entrezgene.pm or the whole bioperl? If you downloaded the whole distribution, what do the tests tell you? Stefan Daily, Kenneth Michael wrote: >OK, got that installed. But I still get an error: > >Can't locate object method "url" via package "Bio::Annotation::DBLink" at /home/kmdaily/src/bioperl/core/Bio/SeqIO/entrezgene.pm line 557. > >I am using this on a shared system, and an older version of Bioperl was installed by the admin. But the path to the one I downloaded via CVS is first in the list @INC, and PERL5LIB="/home/kmdaily/src/bioperl/core". > >Kenny Daily >IU School of Informatics >kmdaily at indiana.edu > > > >-----Original Message----- >From: Christopher Fields [mailto:cjfields at uiuc.edu] >Sent: Tue 5/16/2006 7:38 AM >To: Daily, Kenneth Michael; bioperl-l at lists.open-bio.org >Subject: Re: [Bioperl-l] entrezgene.pm not in Bio::SeqIO > >You'll have to install from CVS. 
I believe Brian added Entrezgene.pm after the lst >developer release (1.5.1): > >http://www.bioperl.org/wiki/Installing_BioPerl > >Chris > >---- Original message ---- > > >>Date: Mon, 15 May 2006 17:00:12 -0400 >>From: "Daily, Kenneth Michael" >>Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO >>To: >> >>I just installed Bioperl 1.4, and entrezgene.pm is not included (should be in >> >> >Bio/SeqIO). How can I get this module? > > >>Kenny Daily >>IU School of Informatics >>kmdaily at indiana.edu >> >> >>_______________________________________________ >>Bioperl-l mailing list >>Bioperl-l at lists.open-bio.org >>http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From osborne1 at optonline.net Tue May 16 20:46:00 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Tue, 16 May 2006 20:46:00 -0400 Subject: [Bioperl-l] module for 6 reading frames In-Reply-To: Message-ID: Chen Li, There's some documentation on translate() in bptutorial: http://bioperl.org/Core/Latest/bptutorial.html You could also use the translate_6frames() method of Bio::SeqUtils. Brian O. On 5/16/06 4:52 PM, "smarkel at scitegic.com" wrote: > Li, > > You can either do the substring, and reverse complement, yourself > or you can use the translate() function in Bio::PrimarySeq. It > inherits from Bio::PrimarySeqI, so check there for the documentation. > That translate() function takes a "-frame" argument. > > Scott > > PS In future, please respond to the list. That way others see > the questions and answers. > > chen li wrote on 16.05.2006 13:04:36: > >> Dear Dr. Markel, >> >> I browse through the document of >> Bio:Tools::Codontable and find this line: >> >> my $translation= $CodonTable->translate($seq); >> >> I think this line is to do the translation. 
Here is my >> question: which line in the doc says how to translate >> the remaining frames 2,3, and -1, -2, -3? >> >> >> Thank you, >> >> Li >> >> --- smarkel at scitegic.com wrote: >> >>> Li, >>> >>> Use the translate() function in >>> Bio::Tools::CodonTable. >>> >>> Scott >>> >>> Scott Markel, Ph.D. >>> Principal Bioinformatics Architect email: >>> smarkel at scitegic.com >>> SciTegic Inc. mobile: +1 858 >>> 205 3653 >>> 10188 Telesis Court, Suite 100 voice: +1 858 >>> 799 5603 >>> San Diego, CA 92121 fax: +1 858 >>> 279 8804 >>> USA web: >>> http://www.scitegic.com >>> >>> >>> bioperl-l-bounces at lists.open-bio.org wrote on >>> 16.05.2006 07:55:51: >>> >>>> Hi all, >>>> >>>> I wonder which module is available for translating >>> DNA >>>> sequence into 6 reading frames. >>>> >>>> Thank you, >>>> >>>> Li >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> >> __________________________________________________ >> Do You Yahoo!? >> Tired of spam? Yahoo! Mail has the best spam protection around >> http://mail.yahoo.com >> >> >> -- >> Click on the link below to report this email as spam >> https://www.mailcontrol. >> com/sr/YWaRnXqa+nSyeG1Z34OqL4dC5eYKMoJmYLQSBonkiAgNVwARwO! >> frAkRrVu9wDE5L8wrIaSzXTpcs3mxX9Ufx7LAO0PQl77O8HiAh50c4TI! >> ysIW++WTn79gM0HS11zvKPuUVANsGXCZT! >> LRAY3PyyLo6NzoChgLXk6YfX05ndLG3vE+GH2aUSTxvV3pwd2! 
>> JlBh9ARAt+OXXsyYtG6VgFNOO9GFnNxV > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From e-just at northwestern.edu Wed May 17 11:03:41 2006 From: e-just at northwestern.edu (Eric Just) Date: Wed, 17 May 2006 10:03:41 -0500 Subject: [Bioperl-l] Modware: a BioPerl based API for Chado Message-ID: <6.1.1.1.2.20060517095821.13353920@hecky.it.northwestern.edu> Hi Everyone, We are announcing a new Sourceforge Project called Modware that may be of interest to you. It is an object-oriented API written in Perl that creates BioPerl object representations of biological features stored in a Chado database. It basically creates a Bio::Seq object for chromosomes in Chado and creates Bio::SeqFeature::Gene objects for protein coding transcripts stored in Chado. Things like contigs are represented as Bio::SeqFeature::Generic objects. We also provide many methods for manipulating these objects once they are in memory. For download please visit our Sourceforge project page: http://sourceforge.net/projects/gmod-ware For API documentation and some short examples of selected use cases visit our project home page: http://gmod-ware.sourceforge.net/ This software is adapted from the production middleware code that dictyBase uses. Modware 0.1 requires the latest stable GMOD release: 0.003 be installed. We are currently calling it a release candidate and if we get some feedback will call it an official release if there are no major install bugs (we've installed it only on two different machines). If you would like a version that works on the latest CVS version of GMOD, let me know and I'll expedite getting that out the door. Lastly, please use the direct download version, we have not fully recovered from the recent Sourceforge CVS issues. Please try the software out and let us know what you think! 
Sincerely, Eric Just and Sohel Merchant e-just at northwestern.edu s-merchant at northwestern.edu ============================================ Eric Just e-just at northwestern.edu dictyBase Programmer Center for Genetic Medicine Northwestern University http://dictybase.org ============================================ From sb at mrc-dunn.cam.ac.uk Wed May 17 13:46:45 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Wed, 17 May 2006 18:46:45 +0100 Subject: [Bioperl-l] Bio::Map:: enhancements Message-ID: <446B6185.1000602@mrc-dunn.cam.ac.uk> I added bug http://bugzilla.bioperl.org/show_bug.cgi?id=1998 I'm interested in what people have to say about the secondary enhancement I talk about there. Is it a sane thing to do? What are the better ways of doing that? If it /is/ ok, I suppose I'd have to go back and alter Bio::Map::MappableI and Bio::Map::MarkerI as well, not just Marker. Oh, on a side note, you'll see I had to override RangeI's intersection method to work on multiple ranges. Why is RangeI limited to an intersection of only two ranges? Cheers, Sendu. From David_Waner/San_Diego/Accelrys at scitegic.com Thu May 18 15:30:46 2006 From: David_Waner/San_Diego/Accelrys at scitegic.com (David_Waner/San_Diego/Accelrys at scitegic.com) Date: Thu, 18 May 2006 12:30:46 -0700 Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8 on Windows Message-ID: BioPerl Users/Developers, In our testing we have found severe performance problems using BioPerl with Perl 5.8 on Windows (but not on Linux). They show up especially in SeqIO when reading or writing Fasta files containing large (~16 MB) sequences. The same files that can be read in 1 or 2 seconds with Windows Perl 5.6 or Linux Perl 5.8, take minutes in Windows Perl 5.8. Although the fault is clearly with Perl, not with BioPerl, I have identified a couple of places where BioPerl could be modified in order to save Windows Perl 5.8 users a lot of time, while not affecting other users. 
For example, in my testing the following excerpt from Bio::Root::IO::_readline() takes 50 seconds (!) to execute (when reading a 16 MB sequence): if( (!$param{-raw}) && (defined $line) ) { $line =~ s/\015?\012/\n/g; $line =~ s/\015/\n/g unless $ONMAC; } whereas the following replacement code should be equivalent: if( (!$param{-raw}) && (defined $line) ) { $line =~ s/\015\012/\012/g; # Change all CR/LF pairs to LF $line =~ tr/\015/\n/ unless $ONMAC; # Change all single CRs to NEWLINE } but executes in less than 1 second. In addition, changing: defined $sequence && $sequence =~ s/\s//g; # Remove whitespace to: defined $sequence && $sequence =~ tr/ \t\n\r//d; # Remove whitespace in Bio::SeqIO::fasta.pm saves an additional ~20 seconds. There are also problems in reading files with the <> operator when $/ is redefined to "\n>", where reading the first line of Fasta files containing large sequences takes ~50 seconds, but reading subsequent lines or files takes about 1 second. I don't have a work-around for this. I would like to ask the mailing list: 1. Has anyone else run into this problem? Any fixes? 2. Do you think BioPerl should incorporate these changes? I plan to submit a bug report to perlbug, but don't know when or if the problem will be fixed. - David From cjfields at uiuc.edu Thu May 18 16:07:14 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 18 May 2006 15:07:14 -0500 Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8 onWindows In-Reply-To: Message-ID: <002901c67ab6$a84c3140$15327e82@pyrimidine> David, I have seen some slowdowns with Bio::SeqIO associated with GenBank files, which this could be related to. I can't do anything about it (test or commit changes) until next week but someone else using Windows might (though we are few and far between, and I'm switching to Mac OS X in fall). Would be nice to try the changes and test it out on a few platforms. 
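Before timing the proposed substitutions on several platforms, it may help to confirm that they give the same output as the current code. The script below is only a sketch under stated assumptions, not the actual Bio::Root::IO or Bio::SeqIO::fasta code: the subroutine names are made up for illustration, the $ONMAC branch is left out (so it assumes a platform where "\n" is \012), and the whitespace comparison assumes the sequence holds no form feeds, since \s matches \f while the proposed tr/// list does not.

```perl
use strict;
use warnings;

# Current substitutions quoted from Bio::Root::IO::_readline (non-Mac case).
sub normalize_with_regex {
    my ($line) = @_;
    $line =~ s/\015?\012/\n/g;
    $line =~ s/\015/\n/g;
    return $line;
}

# Proposed replacement: one fixed-string substitution plus tr///.
sub normalize_with_tr {
    my ($line) = @_;
    $line =~ s/\015\012/\012/g;   # CR/LF pairs become LF
    $line =~ tr/\015/\n/;         # remaining single CRs become newline
    return $line;
}

# Current and proposed whitespace removal from Bio::SeqIO::fasta.
sub strip_ws_regex { my ($s) = @_; $s =~ s/\s//g;          return $s; }
sub strip_ws_tr    { my ($s) = @_; $s =~ tr/ \t\n\r//d;    return $s; }

# Small hand-made samples covering DOS, Unix, old Mac and mixed endings.
my @samples = (
    "ACGT\015\012TTGA\015\012",
    "ACGT\012TTGA\012",
    "ACGT\015TTGA\015",
    "AC\015\015\012GT",
    "A C\tG\015\012T\012",
);

for my $sample (@samples) {
    die "line-ending results differ\n"
        unless normalize_with_regex($sample) eq normalize_with_tr($sample);
    die "whitespace results differ\n"
        unless strip_ws_regex($sample) eq strip_ws_tr($sample);
}
print "all ", scalar(@samples), " samples agree\n";
```

Wrapping each pair of subroutines in the core Benchmark module on a large string would be one way to reproduce the timing difference on a given Perl build.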
Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of > David_Waner/San_Diego/Accelrys at scitegic.com > Sent: Thursday, May 18, 2006 2:31 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8 > onWindows > > BioPerl Users/Developers, > > In our testing we have found severe performance problems using BioPerl > with Perl 5.8 on Windows (but not on Linux). They show up especially in > SeqIO when reading or writing Fasta files containing large (~16 MB) > sequences. The same files that can be read in 1 or 2 seconds with Windows > Perl 5.6 or Linux Perl 5.8, take minutes in Windows Perl 5.8. > > Although the fault is clearly with Perl, not with BioPerl, I have > identified a couple of places where BioPerl could be modified in order to > save Windows Perl 5.8 users a lot of time, while not affecting other > users. > > For example, in my testing the following excerpt from > Bio::Root::IO::_readline() takes 50 seconds (!) to execute (when reading a > 16 MB sequence): > > if( (!$param{-raw}) && (defined $line) ) { > $line =~ s/\015?\012/\n/g; > $line =~ s/\015/\n/g unless $ONMAC; > } > > whereas the following replacement code should be equivalent: > > if( (!$param{-raw}) && (defined $line) ) { > $line =~ s/\015\012/\012/g; # Change all > CR/LF pairs to LF > $line =~ tr/\015/\n/ unless $ONMAC; # Change all single CRs to > NEWLINE > } > > but executes in less than 1 second. > > In addition, changing: > > defined $sequence && $sequence =~ s/\s//g; # Remove whitespace > > to: > > defined $sequence && $sequence =~ tr/ \t\n\r//d; # Remove > whitespace > > in Bio::SeqIO::fasta.pm saves an additional ~20 seconds. 
> > There are also problems in reading files with the <> operator when $/ is > redefined to "\n>", where reading the first line of Fasta files containing > large sequences takes ~50 seconds, but reading subsequent lines or files > takes about 1 second. I don't have a work-around for this. > > I would like to ask the mailing list: > > 1. Has anyone else run into this problem? Any fixes? > 2. Do you think BioPerl should incorporate these changes? > > I plan to submit a bug report to perlbug, but don't know when or if the > problem will be fixed. > > - David > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From osborne1 at optonline.net Thu May 18 16:27:57 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Thu, 18 May 2006 16:27:57 -0400 Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8 on Windows In-Reply-To: Message-ID: David, What are the results from the relevant t/*t files before and after these patches? Brian O. On 5/18/06 3:30 PM, "David_Waner/San_Diego/Accelrys at scitegic.com" wrote: > BioPerl Users/Developers, > > In our testing we have found severe performance problems using BioPerl > with Perl 5.8 on Windows (but not on Linux). They show up especially in > SeqIO when reading or writing Fasta files containing large (~16 MB) > sequences. The same files that can be read in 1 or 2 seconds with Windows > Perl 5.6 or Linux Perl 5.8, take minutes in Windows Perl 5.8. > > Although the fault is clearly with Perl, not with BioPerl, I have > identified a couple of places where BioPerl could be modified in order to > save Windows Perl 5.8 users a lot of time, while not affecting other > users. > > For example, in my testing the following excerpt from > Bio::Root::IO::_readline() takes 50 seconds (!) 
to execute (when reading a > 16 MB sequence): > > if( (!$param{-raw}) && (defined $line) ) { > $line =~ s/\015?\012/\n/g; > $line =~ s/\015/\n/g unless $ONMAC; > } > > whereas the following replacement code should be equivalent: > > if( (!$param{-raw}) && (defined $line) ) { > $line =~ s/\015\012/\012/g; # Change all > CR/LF pairs to LF > $line =~ tr/\015/\n/ unless $ONMAC; # Change all single CRs to > NEWLINE > } > > but executes in less than 1 second. > > In addition, changing: > > defined $sequence && $sequence =~ s/\s//g; # Remove whitespace > > to: > > defined $sequence && $sequence =~ tr/ \t\n\r//d; # Remove > whitespace > > in Bio::SeqIO::fasta.pm saves an additional ~20 seconds. > > There are also problems in reading files with the <> operator when $/ is > redefined to "\n>", where reading the first line of Fasta files containing > large sequences takes ~50 seconds, but reading subsequent lines or files > takes about 1 second. I don't have a work-around for this. > > I would like to ask the mailing list: > > 1. Has anyone else run into this problem? Any fixes? > 2. Do you think BioPerl should incorporate these changes? > > I plan to submit a bug report to perlbug, but don't know when or if the > problem will be fixed. > > - David > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hubert.prielinger at gmx.at Thu May 18 16:41:27 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Thu, 18 May 2006 14:41:27 -0600 Subject: [Bioperl-l] parsing xml output Message-ID: <446CDBF7.10908@gmx.at> hi, what is the best way to parse NCBI- and WU- Blast XML output.... and is it possible to parse both with the same parser, or differ their XML output... 
thanks From staffa at niehs.nih.gov Thu May 18 16:49:15 2006 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C]) Date: Thu, 18 May 2006 16:49:15 -0400 Subject: [Bioperl-l] Reading GenBank Genomic File Annotation Message-ID: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov> Would like a fairly simple way to extract certain information from Genbank Genomic File Annotations. Namely the six D.melanogaster sequences. Specifically to find gene entries and learn the gene name, begin and end and CDS. Please point me to appropriate modules and documentation. Nick Staffa Telephone: 919-316-4569 (NIEHS: 6-4569) Scientific Computing Support Group NIEHS Information Technology Support Services Contract (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov ) National Institute of Environmental Health Sciences National Institutes of Health Research Triangle Park, North Carolina From adamnkraut at gmail.com Thu May 18 17:07:42 2006 From: adamnkraut at gmail.com (Adam Kraut) Date: Thu, 18 May 2006 17:07:42 -0400 Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline C? Message-ID: <134ede0b0605181407l52d1c2c3x79dd7f177ae7b828@mail.gmail.com> I am currently using a pairwise alignment algorithm written in C (not by me). The program consists of a library of routines, structures, and definitions which I do not want to spend a lot of time abstracting. I already have a hack method of writing the parameters and inputs I want from perl, calling the c program with system( ), and then parsing the output in Perl. Any good programmer would probably smack me but I'm just an undergrad and I needed to show my boss that this works in order to spend more time on it. So on to my question, what is the preferred method of extending Bioperl to use this algorithm? I have just read the XS tutorial and a bit about Inline C. Can I put the main function in my script using Inline, and then just point Inline at the rest of the C library? 
The program has several C-structures that are semantically equivalent to Bioperl objects, so just need somewhere to start. I will spend some more time so that I have a more specific question, I just wanted a little feedback, this is my first post to the bioperl list. Thanks, Adam From osborne1 at optonline.net Thu May 18 17:54:01 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Thu, 18 May 2006 17:54:01 -0400 Subject: [Bioperl-l] Reading GenBank Genomic File Annotation In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov> Message-ID: Nick, Have you read the Feature-Annotation HOWTO? This would be a good starting point... Brian O. On 5/18/06 4:49 PM, "Staffa, Nick (NIH/NIEHS) [C]" wrote: > Would like a fairly simple way to extract certain information from Genbank > Genomic File Annotations. > Namely the six D.melanogaster sequences. > Specifically to find gene entries and learn the gene name, begin and end and > CDS. > Please point me to appropriate modules and documentation. > > > Nick Staffa > Telephone: 919-316-4569 (NIEHS: 6-4569) > Scientific Computing Support Group > NIEHS Information Technology Support Services Contract > (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov ) > National Institute of Environmental Health Sciences > National Institutes of Health > Research Triangle Park, North Carolina > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Thu May 18 18:22:32 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu, 18 May 2006 18:22:32 -0400 Subject: [Bioperl-l] parsing xml output In-Reply-To: <446CDBF7.10908@gmx.at> References: <446CDBF7.10908@gmx.at> Message-ID: we don't parse WU-BLAST XML at this time. We'd welcome someone contributing this. ncbi XML is parsed with blastxml format. 
-jason On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote: > hi, > what is the best way to parse NCBI- and WU- Blast XML output.... > and is it possible to parse both with the same parser, or differ their > XML output... > > thanks > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From MEC at stowers-institute.org Thu May 18 18:39:15 2006 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Thu, 18 May 2006 17:39:15 -0500 Subject: [Bioperl-l] module for formating sequence output on the screen Message-ID: Li, Here's a one-liner that uses bioperl's Bio::SeqIO module to reformat fasta on standard input to 50 char wide fasta on standard output. perl -MBio::SeqIO -e 'select Bio::SeqIO->newFh(-format => "fasta", -width => 50); $in = Bio::SeqIO->newFh(-format => "fasta", -fh => \*STDIN); print while <$in>' You can call it like this: perl -MBio::SeqIO -e 'select Bio::SeqIO->newFh(-format => "fasta", -width => 50); $in = Bio::SeqIO->newFh(-format => "fasta", -fh => \*STDIN); print while <$in>' inputfile.fasta > outputfile.fasta Does this help? --Malcolm Cook >-----Original Message----- >From: bioperl-l-bounces at lists.open-bio.org >[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chen li >Sent: Tuesday, May 16, 2006 7:53 PM >To: bioperl-l at bioperl.org >Subject: [Bioperl-l] module for formating sequence output on the screen > >Hi all, > >Thank you very much for the help. > >I have some DNA sequences printed on the screen. But >the default output is longer than I expect. I need 50 >necleotides/line. I search CPAN but can not get the >right module. Which bioperl module can do this job? > >Li > >__________________________________________________ >Do You Yahoo!? >Tired of spam? Yahoo! 
Mail has the best spam protection around >http://mail.yahoo.com >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > From gish at watson.wustl.edu Thu May 18 19:57:03 2006 From: gish at watson.wustl.edu (Warren Gish) Date: Thu, 18 May 2006 18:57:03 -0500 Subject: [Bioperl-l] parsing xml output In-Reply-To: Message-ID: <009f01c67ad6$c359a560$0d00a8c0@PM> Just to clarify, the XML output from WU-BLAST conforms to the standard NCBI_BlastOutput.dtd. Technically, contents of data fields could still be incompatible, but care was taken to ensure compatibility. If someone identifies a difference that prevents parsing or proper interpretation of the WU-BLAST output, please let me know. Regards, --Warren > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Jason Stajich > Sent: Thursday, May 18, 2006 5:23 PM > To: Hubert Prielinger > Cc: bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] parsing xml output > > we don't parse WU-BLAST XML at this time. We'd welcome someone > contributing this. > > ncbi XML is parsed with blastxml format. > > -jason > On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote: > > > hi, > > what is the best way to parse NCBI- and WU- Blast XML output.... > > and is it possible to parse both with the same parser, or > differ their > > XML output... 
> > > > thanks > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at uiuc.edu Thu May 18 21:10:50 2006 From: cjfields at uiuc.edu (Christopher Fields) Date: Thu, 18 May 2006 20:10:50 -0500 Subject: [Bioperl-l] parsing xml output Message-ID: Just to make sure everybody knows, if you use bioperl v1.5.1, SearchIO::blastxml uses XML::Parser which should come with most recent perl distributions. The bioperl-live version has switched over to XML::SAX for SAX2 parsing and it is recommended that you install XML::SAX::ExpatXS as well for faster parsing. Chris ---- Original message ---- >Date: Thu, 18 May 2006 18:57:03 -0500 >From: "Warren Gish" >Subject: Re: [Bioperl-l] parsing xml output >To: "'Hubert Prielinger'" >Cc: bioperl-l at bioperl.org > >Just to clarify, the XML output from WU-BLAST conforms to the standard >NCBI_BlastOutput.dtd. Technically, contents of data fields could still be >incompatible, but care was taken to ensure compatibility. If someone >identifies a difference that prevents parsing or proper interpretation of >the WU-BLAST output, please let me know. >Regards, >--Warren > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >> Jason Stajich >> Sent: Thursday, May 18, 2006 5:23 PM >> To: Hubert Prielinger >> Cc: bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] parsing xml output >> >> we don't parse WU-BLAST XML at this time. We'd welcome someone >> contributing this. >> >> ncbi XML is parsed with blastxml format. 
>> >> -jason >> On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote: >> >> > hi, >> > what is the best way to parse NCBI- and WU- Blast XML output.... >> > and is it possible to parse both with the same parser, or >> differ their >> > XML output... >> > >> > thanks >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Fri May 19 08:52:13 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri, 19 May 2006 08:52:13 -0400 Subject: [Bioperl-l] parsing xml output In-Reply-To: <009f01c67ad6$c359a560$0d00a8c0@PM> References: <009f01c67ad6$c359a560$0d00a8c0@PM> Message-ID: <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu> Whoops - sorry Warren - for some reason I had it in my mind that it was different. So the blastxml parser should work fine. The WUBLAST tab-delimited output is different than NCBI's -m8/9 though, right? -jason On May 18, 2006, at 7:57 PM, Warren Gish wrote: > Just to clarify, the XML output from WU-BLAST conforms to the standard > NCBI_BlastOutput.dtd. Technically, contents of data fields could > still be > incompatible, but care was taken to ensure compatibility. If someone > identifies a difference that prevents parsing or proper > interpretation of > the WU-BLAST output, please let me know. 
> Regards, > --Warren > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >> Jason Stajich >> Sent: Thursday, May 18, 2006 5:23 PM >> To: Hubert Prielinger >> Cc: bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] parsing xml output >> >> we don't parse WU-BLAST XML at this time. We'd welcome someone >> contributing this. >> >> ncbi XML is parsed with blastxml format. >> >> -jason >> On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote: >> >>> hi, >>> what is the best way to parse NCBI- and WU- Blast XML output.... >>> and is it possible to parse both with the same parser, or >> differ their >>> XML output... >>> >>> thanks >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From torsten.seemann at infotech.monash.edu.au Thu May 18 18:42:05 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 19 May 2006 08:42:05 +1000 Subject: [Bioperl-l] parsing xml output In-Reply-To: <446CDBF7.10908@gmx.at> References: <446CDBF7.10908@gmx.at> Message-ID: <446CF83D.60207@infotech.monash.edu.au> > what is the best way to parse NCBI- and WU- Blast XML output.... > and is it possible to parse both with the same parser, or differ their > XML output... For NCBI BLAST XML format, use Bio::SearchIO->new(-format=>'blastxml', ...) 
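A minimal sketch of that call in use, assuming a report saved in BLAST XML (for example from NCBI blastall with -m 7). The sub name print_hits and the choice of columns are only for illustration; the file name comes from the command line.

```perl
use strict;
use warnings;
use Bio::SearchIO;

# Hypothetical helper: print one tab-separated line per HSP and
# return the number of HSPs seen. $file is a BLAST XML report.
sub print_hits {
    my ($file) = @_;
    my $in = Bio::SearchIO->new(-format => 'blastxml', -file => $file);
    my $count = 0;
    while (my $result = $in->next_result) {        # one per query
        while (my $hit = $result->next_hit) {      # one per database hit
            while (my $hsp = $hit->next_hsp) {     # one per alignment
                print join("\t",
                    $result->query_name,
                    $hit->name,
                    $hsp->evalue,
                    $hsp->percent_identity), "\n";
                $count++;
            }
        }
    }
    return $count;
}

# Usage: perl thisscript.pl report.xml
print_hits($ARGV[0]) if @ARGV;
```

The same loop is what one would point at a WU-BLAST XML file to see whether the parser accepts it.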
I don't know if 'blastxml' will load WU-BLAST XML format. http://www.bioperl.org/wiki/HOWTO:SearchIO does not mention it. Why not try it, and report back the results to the bioperl list? -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia -------------- next part -------------- A non-text attachment was scrubbed... Name: torsten.seemann.vcf Type: text/x-vcard Size: 348 bytes Desc: not available URL: From torsten.seemann at infotech.monash.edu.au Thu May 18 18:37:17 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 19 May 2006 08:37:17 +1000 Subject: [Bioperl-l] Reading GenBank Genomic File Annotation In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov> References: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov> Message-ID: <446CF71D.2070207@infotech.monash.edu.au> Staffa, Nick (NIH/NIEHS) [C] wrote: > Would like a fairly simple way to extract certain information from Genbank Genomic File Annotations. > Namely the six D.melanogaster sequences. > Specifically to find gene entries and learn the gene name, begin and end and CDS. > Please point me to appropriate modules and documentation. http://www.bioperl.org/ -> http://www.bioperl.org/wiki/HOWTOs -> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation http://www.bioperl.org/ -> http://www.bioperl.org/wiki/FAQ -> http://www.bioperl.org/wiki/FAQ#Annotations_and_Features -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia -------------- next part -------------- A non-text attachment was scrubbed... 
Name: torsten.seemann.vcf Type: text/x-vcard Size: 348 bytes Desc: not available URL: From gish at watson.wustl.edu Fri May 19 10:50:08 2006 From: gish at watson.wustl.edu (Warren Gish) Date: Fri, 19 May 2006 09:50:08 -0500 Subject: [Bioperl-l] parsing xml output In-Reply-To: <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu> References: <009f01c67ad6$c359a560$0d00a8c0@PM> <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu> Message-ID: Right, the WU-BLAST tabbed output contains more fields. (See http:// blast.wustl.edu/blast/tabular.html). --Warren > Whoops - sorry Warren - for some reason I had it in my mind that it > was different. So the blastxml parser should work fine. The > WUBLAST tab-delimited output is different than NCBI's -m8/9 though, > right? > > -jason From adamnkraut at gmail.com Fri May 19 11:04:01 2006 From: adamnkraut at gmail.com (Adam Kraut) Date: Fri, 19 May 2006 11:04:01 -0400 Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline C? In-Reply-To: References: <134ede0b0605181407l52d1c2c3x79dd7f177ae7b828@mail.gmail.com> Message-ID: <134ede0b0605190804i60ee5ce1v984a33e0c91adf52@mail.gmail.com> The program generates an ensemble of weighted suboptimal alignments by use of a partition function and stochastic backtracking. The algorithm is quite novel and it's really only part of a larger multi-scale comparative modeling project. There documentation is here: http://www.tbi.univie.ac.at/~ulim/probA/probA_lib.html While I think this would be useful to the bioperl community if it were fully abstracted/extended, I would at the least like to be able to pass in any two sequences and get back SimpleAlign objects for our internal uses first. I have a good idea on how to get started. I will be sure to post when I get into trouble. On 5/19/06, aaron.j.mackey at gsk.com wrote: > > bioperl-ext is the package in which alignment algorithms and/or BioPerl > "wrapped" external C libraries live. 
Subprojects in bioperl-ext use both
> XS and Inline::C, that's up to you.
>
> You'll need to get your C code compiled to a dynamically loaded library
> (.so) to use either XS or Inline::C; this precludes any reuse of the C
> main() function (although your Inline::C wrapper might recapitulate/copy
> the main() function code).
>
> Out of curiosity, what pairwise alignment algorithm are you using? This
> is a heavily beaten path, you might want to dig around first to see if
> someone else already has what you need.
>
> -Aaron
>
>

From slenk at emich.edu Fri May 19 10:42:41 2006
From: slenk at emich.edu (Stephen Gordon Lenk)
Date: Fri, 19 May 2006 10:42:41 -0400
Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline C?
Message-ID:

There is nothing wrong with a reasonable way that works - better not to
put yourself down.

Inline is good if you can get it to work for you - I have had issues with
linking Inline to dynamic libraries. I believe Inline makes a file that
has linkage characteristics specified. Try it and see, then tell people
how you did it. My two cents.

Another way to use external executables is open3 (from IPC::Open3), then
reading from and writing to the pipes. I use it (primer3 and local lab
automation code) - snippet follows:

    use IPC::Open3;    # provides open3()
    use IO::Handle;    # provides autoflush() on the write handle

    my $pid    = 0;
    my $cancmd = 'cancmd.exe';
    my $write  = 0;
    my $read   = 0;

    sub new {
        my $c = {};
        # The child's stdout and stderr both come back on RDRFH.
        $pid   = open3(\*WTRFH, \*RDRFH, \*RDRFH, $cancmd);
        $write = *WTRFH;
        $read  = *RDRFH;
        $write->autoflush();
        bless $c;
        return $c;
    }

Just write your request, then read it back - I make sure that each request
and reply is a newline-terminated text line - and be sure you reap the
child (waitpid on $pid) when you are done.

----- Original Message -----
From: Adam Kraut
Date: Thursday, May 18, 2006 5:07 pm
Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline C?

> I am currently using a pairwise alignment algorithm written in C
> (not by
> me).
The program consists of a library of routines, structures, and > definitions which I do not want to spend a lot of time > abstracting. I > already have a hack method of writing the parameters and inputs I > want from > perl, calling the c program with system( ), and then parsing the > output in > Perl. Any good programmer would probably smack me but I'm just an > undergradand I needed to show my boss that this works in order to > spend more time on > it. > > So on to my question, what is the preferred method of extending > Bioperl to > use this algorithm? I have just read the XS tutorial and a bit > about Inline > C. Can I put the main function in my script using Inline, and > then just > point Inline at the rest of the C library? The program has several > C-structures that are semantically equivalent to Bioperl objects, > so just > need somewhere to start. I will spend some more time so that I > have a more > specific question, I just wanted a little feedback, this is my > first post to > the bioperl list. > > Thanks, > Adam > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hubert.prielinger at gmx.at Fri May 19 12:52:28 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Fri, 19 May 2006 10:52:28 -0600 Subject: [Bioperl-l] parsing xml output In-Reply-To: References: <009f01c67ad6$c359a560$0d00a8c0@PM> <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu> Message-ID: <446DF7CC.5060509@gmx.at> hi, I wondered whether is it also possible in the xml output (either WU or NCBI - Blast) to get the species (taxononmy) for every hit, if I do a general search. regards Warren Gish wrote: > Right, the WU-BLAST tabbed output contains more fields. (See http:// > blast.wustl.edu/blast/tabular.html). > --Warren > > >> Whoops - sorry Warren - for some reason I had it in my mind that it >> was different. So the blastxml parser should work fine. 
The >> WUBLAST tab-delimited output is different than NCBI's -m8/9 though, >> right? >> >> -jason >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From staffa at niehs.nih.gov Fri May 19 14:12:47 2006 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C]) Date: Fri, 19 May 2006 14:12:47 -0400 Subject: [Bioperl-l] Reading GenBank Genomic File Annotation In-Reply-To: Message-ID: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov> Specifically: I have the document to which you refer, but have not seen this one thing I need in the printout of tags etc.: the values in this line; mRNA join(380..509,578..1913,7784..8649,9439..10200) Is that a location object? Nick Staffa Telephone: 919-316-4569 (NIEHS: 6-4569) Scientific Computing Support Group NIEHS Information Technology Support Services Contract (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov ) National Institute of Environmental Health Sciences National Institutes of Health Research Triangle Park, North Carolina > ---------- > From: Brian Osborne > Sent: Thursday, May 18, 2006 5:54 PM > To: Staffa, Nick (NIH/NIEHS) [C]; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Reading GenBank Genomic File Annotation > > Nick, > > Have you read the Feature-Annotation HOWTO? This would be a good starting > point... > > Brian O. > > > On 5/18/06 4:49 PM, "Staffa, Nick (NIH/NIEHS) [C]" > wrote: > > > Would like a fairly simple way to extract certain information from Genbank > > Genomic File Annotations. > > Namely the six D.melanogaster sequences. > > Specifically to find gene entries and learn the gene name, begin and end and > > CDS. > > Please point me to appropriate modules and documentation. 
> > > > > > Nick Staffa > > Telephone: 919-316-4569 (NIEHS: 6-4569) > > Scientific Computing Support Group > > NIEHS Information Technology Support Services Contract > > (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov ) > > National Institute of Environmental Health Sciences > > National Institutes of Health > > Research Triangle Park, North Carolina > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From chandan.kr.singh at gmail.com Fri May 19 14:37:26 2006 From: chandan.kr.singh at gmail.com (CHANDAN SINGH) Date: Sat, 20 May 2006 00:07:26 +0530 Subject: [Bioperl-l] Reading GenBank Genomic File Annotation In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov> References: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov> Message-ID: <2d4f320605191137n11017ec0xe41a632a3c7ea9a9@mail.gmail.com> On 5/19/06, Staffa, Nick (NIH/NIEHS) [C] wrote: > > Specifically: > I have the document to which you refer, > but have not seen this one thing I need in the printout of tags etc.: > the values in this line; > mRNA join(380..509,578..1913,7784..8649,9439..10200) > Is that a location object? Yes it is a location object . If you want that as a string (this is what seems from ur mail ) , u just have to do this : $loc = $fet->location(); $loc_str = $loc->to_FTstring() ; Hope it helps. Chandan Nick Staffa > Telephone: 919-316-4569 (NIEHS: 6-4569) > Scientific Computing Support Group > NIEHS Information Technology Support Services Contract > (Science Task Monitor: Jack L. 
Field( field1 at niehs.nih.gov ) > National Institute of Environmental Health Sciences > National Institutes of Health > Research Triangle Park, North Carolina > > > > ---------- > > From: Brian Osborne > > Sent: Thursday, May 18, 2006 5:54 PM > > To: Staffa, Nick (NIH/NIEHS) [C]; bioperl-l at lists.open-bio.org > > Subject: Re: [Bioperl-l] Reading GenBank Genomic File Annotation > > > > Nick, > > > > Have you read the Feature-Annotation HOWTO? This would be a good > starting > > point... > > > > Brian O. > > > > > > On 5/18/06 4:49 PM, "Staffa, Nick (NIH/NIEHS) [C]" > > > wrote: > > > > > Would like a fairly simple way to extract certain information from > Genbank > > > Genomic File Annotations. > > > Namely the six D.melanogaster sequences. > > > Specifically to find gene entries and learn the gene name, begin and > end and > > > CDS. > > > Please point me to appropriate modules and documentation. > > > > > > > > > Nick Staffa > > > Telephone: 919-316-4569 (NIEHS: 6-4569) > > > Scientific Computing Support Group > > > NIEHS Information Technology Support Services Contract > > > (Science Task Monitor: Jack L. 
Field( field1 at niehs.nih.gov )
> > > National Institute of Environmental Health Sciences
> > > National Institutes of Health
> > > Research Triangle Park, North Carolina
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> >
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From osborne1 at optonline.net Fri May 19 15:39:36 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 19 May 2006 15:39:36 -0400
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov>
Message-ID:

Nick,

This is from the HOWTO:

Another way of describing a feature in Genbank involves multiple start and
end positions. These could be called "split" locations, and a very common
example is the join statement in the CDS feature found in Genbank entries
(e.g. join(45..122,233..267)). This calls for a specialized object,
Bio::Location::SplitLocationI, which is a container for Location objects:

    for my $feature ($seqobj->top_SeqFeatures){
        if ( $feature->location->isa('Bio::Location::SplitLocationI')
             && $feature->primary_tag eq 'CDS' ) {
            for my $location ( $feature->location->sub_Location ) {
                print $location->start . ".." . $location->end . "\n";
            }
        }
    }

Brian O.

On 5/19/06 2:12 PM, "Staffa, Nick (NIH/NIEHS) [C]" wrote:

> Specifically:
> I have the document to which you refer,
> but have not seen this one thing I need in the printout of tags etc.:
> the values in this line;
> mRNA join(380..509,578..1913,7784..8649,9439..10200)
> Is that a location object?
>
>
>
> Nick Staffa
> Telephone: 919-316-4569 (NIEHS: 6-4569)
> Scientific Computing Support Group
> NIEHS Information Technology Support Services Contract
> (Science Task Monitor: Jack L.
Field( field1 at niehs.nih.gov ) > National Institute of Environmental Health Sciences > National Institutes of Health > Research Triangle Park, North Carolina > > >> ---------- >> From: Brian Osborne >> Sent: Thursday, May 18, 2006 5:54 PM >> To: Staffa, Nick (NIH/NIEHS) [C]; bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Reading GenBank Genomic File Annotation >> >> Nick, >> >> Have you read the Feature-Annotation HOWTO? This would be a good starting >> point... >> >> Brian O. >> >> >> On 5/18/06 4:49 PM, "Staffa, Nick (NIH/NIEHS) [C]" >> wrote: >> >>> Would like a fairly simple way to extract certain information from Genbank >>> Genomic File Annotations. >>> Namely the six D.melanogaster sequences. >>> Specifically to find gene entries and learn the gene name, begin and end and >>> CDS. >>> Please point me to appropriate modules and documentation. >>> >>> >>> Nick Staffa >>> Telephone: 919-316-4569 (NIEHS: 6-4569) >>> Scientific Computing Support Group >>> NIEHS Information Technology Support Services Contract >>> (Science Task Monitor: Jack L. 
Field( field1 at niehs.nih.gov ) >>> National Institute of Environmental Health Sciences >>> National Institutes of Health >>> Research Triangle Park, North Carolina >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hubert.prielinger at gmx.at Fri May 19 16:42:09 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Fri, 19 May 2006 14:42:09 -0600 Subject: [Bioperl-l] parsing xml output In-Reply-To: References: <009f01c67ad6$c359a560$0d00a8c0@PM> <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu> <446DF7CC.5060509@gmx.at> Message-ID: <446E2DA1.1050503@gmx.at> hi warren, that means if I alter the DTD (if that is possible) by adding the taxonomic id to the DTD..... then I should have the taxonomic id tag in the xml file (theoretically) but I guess this is only possible with a local search (blastall) but not with an online search. greetings Warren Gish wrote: > > On May 19, 2006, at 11:52 AM, Hubert Prielinger wrote: > >> hi, >> I wondered whether is it also possible in the xml output (either WU >> or NCBI - Blast) to get the species (taxononmy) for every hit, if I >> do a general search. >> regards >> > The taxonomic id is not an entity in the NCBI XML DTD. If the > information was embedded in deflines, one could conceivably parse for > it, but I believe the NCBI only distributes taxids in their ASN.1 data > and in their pre-formated BLAST databases, and NCBI BLAST only reports > taxids in its ASN.1 output format, where taxid is available as an entity. 
> > --Warren > > From cjfields at uiuc.edu Fri May 19 16:56:56 2006 From: cjfields at uiuc.edu (Christopher Fields) Date: Fri, 19 May 2006 15:56:56 -0500 Subject: [Bioperl-l] parsing xml output Message-ID: <5c1c5a79.bb0af5aa.8198d00@expms6.cites.uiuc.edu> You'll have to pull the GI or accession from each hit and do a lookup by either grabbing the sequence and using Bio::Species or use Bio::DB::Taxonomy; there isn't any tax information directly incorporated into BLAST reports AFAIK. Chris ---- Original message ---- >Date: Fri, 19 May 2006 10:52:28 -0600 >From: Hubert Prielinger >Subject: Re: [Bioperl-l] parsing xml output >To: Warren Gish , bioperl-l at bioperl.org > >hi, >I wondered whether is it also possible in the xml output (either WU or >NCBI - Blast) to get the species (taxononmy) for every hit, if I do a >general search. >regards > >Warren Gish wrote: >> Right, the WU-BLAST tabbed output contains more fields. (See http:// >> blast.wustl.edu/blast/tabular.html). >> --Warren >> >> >>> Whoops - sorry Warren - for some reason I had it in my mind that it >>> was different. So the blastxml parser should work fine. The >>> WUBLAST tab-delimited output is different than NCBI's -m8/9 though, >>> right? >>> >>> -jason >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri May 19 16:59:35 2006 From: cjfields at uiuc.edu (Christopher Fields) Date: Fri, 19 May 2006 15:59:35 -0500 Subject: [Bioperl-l] parsing xml output Message-ID: <65932c77.bb0b33b0.8253400@expms6.cites.uiuc.edu> Um, I don't think it works that way. I'm pretty sure the XML is generated from the ASN1 output. I don't think (like Warren says) that you can directly get to the tax information. 
Indirectly is another matter... Chris ---- Original message ---- >Date: Fri, 19 May 2006 14:42:09 -0600 >From: Hubert Prielinger >Subject: Re: [Bioperl-l] parsing xml output >To: Warren Gish , bioperl-l at bioperl.org > >hi warren, >that means if I alter the DTD (if that is possible) by adding the >taxonomic id to the DTD..... then I should have the taxonomic id tag in >the xml file (theoretically) >but I guess this is only possible with a local search (blastall) but not >with an online search. > >greetings > >Warren Gish wrote: >> >> On May 19, 2006, at 11:52 AM, Hubert Prielinger wrote: >> >>> hi, >>> I wondered whether is it also possible in the xml output (either WU >>> or NCBI - Blast) to get the species (taxononmy) for every hit, if I >>> do a general search. >>> regards >>> >> The taxonomic id is not an entity in the NCBI XML DTD. If the >> information was embedded in deflines, one could conceivably parse for >> it, but I believe the NCBI only distributes taxids in their ASN.1 data >> and in their pre-formated BLAST databases, and NCBI BLAST only reports >> taxids in its ASN.1 output format, where taxid is available as an entity. >> >> --Warren >> >> > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l From hubert.prielinger at gmx.at Fri May 19 17:30:20 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Fri, 19 May 2006 15:30:20 -0600 Subject: [Bioperl-l] parsing xml output In-Reply-To: <446E3854.5010708@gmx.at> References: <5c1c5a79.bb0af5aa.8198d00@expms6.cites.uiuc.edu> <446E3854.5010708@gmx.at> Message-ID: <446E38EC.9020100@gmx.at> ok, thanks, it appears that I only need the species where the Protein is derived from, so I guess Bio:Species would satisfy me, or? and it would work that I just pull off the accession from the blast output file and then assign the accession code and get as return value the species name. 
is it possible to just assign the accession code, because I looked up but they were always talking of the entire file. regards > > > Christopher Fields wrote: >> You'll have to pull the GI or accession from each hit and do a lookup >> by either grabbing the sequence and using Bio::Species or use >> Bio::DB::Taxonomy; there isn't any tax information directly >> incorporated into BLAST reports AFAIK. >> >> Chris >> >> ---- Original message ---- >> >>> Date: Fri, 19 May 2006 10:52:28 -0600 >>> From: Hubert Prielinger Subject: Re: >>> [Bioperl-l] parsing xml output To: Warren Gish >>> , bioperl-l at bioperl.org >>> >>> hi, >>> I wondered whether is it also possible in the xml output (either WU >>> or NCBI - Blast) to get the species (taxononmy) for every hit, if I >>> do a general search. >>> regards >>> >>> Warren Gish wrote: >>> >>>> Right, the WU-BLAST tabbed output contains more fields. (See >>>> http:// blast.wustl.edu/blast/tabular.html). >>>> --Warren >>>> >>>> >>>>> Whoops - sorry Warren - for some reason I had it in my mind that >>>>> it was different. So the blastxml parser should work fine. The >>>>> WUBLAST tab-delimited output is different than NCBI's -m8/9 >>>>> though, right? 
>>>>> >>>>> -jason >>>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > > From jason.stajich at duke.edu Fri May 19 18:40:54 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri, 19 May 2006 18:40:54 -0400 Subject: [Bioperl-l] parsing xml output In-Reply-To: <446E38EC.9020100@gmx.at> References: <5c1c5a79.bb0af5aa.8198d00@expms6.cites.uiuc.edu> <446E3854.5010708@gmx.at> <446E38EC.9020100@gmx.at> Message-ID: There is a gi2taxid table in the /pub/taxonomy part of NCBI FTP site (ftp.ncbi.nih.gov) -- I have used this to take GI numbers from report and get taxonomy for overall classification. I think something like this exists in the scripts or examples directory in the bioperl distro. I know I posted about it when I wrote about it a while ago. -jason On May 19, 2006, at 5:30 PM, Hubert Prielinger wrote: > ok, thanks, > it appears that I only need the species where the Protein is derived > from, so I guess Bio:Species would satisfy me, or? > and it would work that I just pull off the accession from the blast > output file and then assign the accession code and get as return value > the species name. > is it possible to just assign the accession code, because I looked up > but they were always talking of the entire file. 
> > regards >> >> >> Christopher Fields wrote: >>> You'll have to pull the GI or accession from each hit and do a >>> lookup >>> by either grabbing the sequence and using Bio::Species or use >>> Bio::DB::Taxonomy; there isn't any tax information directly >>> incorporated into BLAST reports AFAIK. >>> >>> Chris >>> >>> ---- Original message ---- >>> >>>> Date: Fri, 19 May 2006 10:52:28 -0600 >>>> From: Hubert Prielinger Subject: Re: >>>> [Bioperl-l] parsing xml output To: Warren Gish >>>> , bioperl-l at bioperl.org >>>> >>>> hi, >>>> I wondered whether is it also possible in the xml output (either WU >>>> or NCBI - Blast) to get the species (taxononmy) for every hit, if I >>>> do a general search. >>>> regards >>>> >>>> Warren Gish wrote: >>>> >>>>> Right, the WU-BLAST tabbed output contains more fields. (See >>>>> http:// blast.wustl.edu/blast/tabular.html). >>>>> --Warren >>>>> >>>>> >>>>>> Whoops - sorry Warren - for some reason I had it in my mind that >>>>>> it was different. So the blastxml parser should work fine. The >>>>>> WUBLAST tab-delimited output is different than NCBI's -m8/9 >>>>>> though, right? 
>>>>>> >>>>>> -jason >>>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12/ From ewijaya at i2r.a-star.edu.sg Sat May 20 08:36:44 2006 From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward) Date: Sat, 20 May 2006 20:36:44 +0800 Subject: [Bioperl-l] Method for checking Sequence type of a file Message-ID: <30362db229c.446f7ddc@i2r.a-star.edu.sg> Dear expert, Is there any Bioperl method that allows you to check verify sequence type in a file? For example, given a file we wish to check (return true or false) whether it is in FASTA format, GENBANK format, etc. This method is useful in web application as taint checking procedure. Regards, Edward WIJAYA SINGAPORE ------------ Institute For Infocomm Research - Disclaimer ------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person. Thank you. 
-------------------------------------------------------- From aaron.j.mackey at gsk.com Fri May 19 09:33:01 2006 From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com) Date: Fri, 19 May 2006 09:33:01 -0400 Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline C? In-Reply-To: <134ede0b0605181407l52d1c2c3x79dd7f177ae7b828@mail.gmail.com> Message-ID: bioperl-ext is the package in which alignment algorithms and/or BioPerl "wrapped" external C libraries live. Subprojects in bioperl-ext use both XS and Inline::C, that's up to you. You'll need to get your C code compiled to a dynamically loaded library (.so) to use either XS or Inline::C; this precludes any reuse of the C main() function (although your Inline::C wrapper might recapitulate/copy the main() function code). Out of curiosity, what pairwise alignment algorithm are you using? This is a heavily beaten path, you might want to dig around first to see if someone else already has what you need. -Aaron bioperl-l-bounces at lists.open-bio.org wrote on 05/18/2006 05:07:42 PM: > I am currently using a pairwise alignment algorithm written in C (not by > me). The program consists of a library of routines, structures, and > definitions which I do not want to spend a lot of time abstracting. I > already have a hack method of writing the parameters and inputs I want from > perl, calling the c program with system( ), and then parsing the output in > Perl. Any good programmer would probably smack me but I'm just an undergrad > and I needed to show my boss that this works in order to spend more time on > it. > > So on to my question, what is the preferred method of extending Bioperl to > use this algorithm? I have just read the XS tutorial and a bit about Inline > C. Can I put the main function in my script using Inline, and then just > point Inline at the rest of the C library? The program has several > C-structures that are semantically equivalent to Bioperl objects, so just > need somewhere to start. 
I will spend some more time so that I have a more > specific question, I just wanted a little feedback, this is my first post to > the bioperl list. > > Thanks, > Adam > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason.stajich at duke.edu Sat May 20 10:50:17 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Sat, 20 May 2006 10:50:17 -0400 Subject: [Bioperl-l] Method for checking Sequence type of a file In-Reply-To: <30362db229c.446f7ddc@i2r.a-star.edu.sg> References: <30362db229c.446f7ddc@i2r.a-star.edu.sg> Message-ID: Try Bio::Tools::GuessSeqFormat On May 20, 2006, at 8:36 AM, Wijaya Edward wrote: > > Dear expert, > > Is there any Bioperl method that allows > you to check verify sequence type in a file? > > For example, given a file we wish > to check (return true or false) whether > it is in FASTA format, GENBANK format, etc. > > This method is useful in web application > as taint checking procedure. > > Regards, > Edward WIJAYA > SINGAPORE > > > ------------ Institute For Infocomm Research - Disclaimer > ------------- > This email is confidential and may be privileged. If you are not > the intended recipient, please delete it and notify us immediately. > Please do not copy or use it for any purpose, or disclose its > contents to any other person. Thank you. 
> -------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From chen_li3 at yahoo.com Sat May 20 20:15:01 2006 From: chen_li3 at yahoo.com (chen li) Date: Sat, 20 May 2006 17:15:01 -0700 (PDT) Subject: [Bioperl-l] problems iwth Bio::graphics module Message-ID: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com> Dear all, I try one script from GraphicsHowTo under Cygwin environment(GD and libpng already installed). I type this line in Cygwin X window: $ perl render_blast1.pl data1.txt | display - And here is the result: display: no decode delegate for this image format `/tmp/magick-qKiRPDRS'. Any idea? Thank you very much, Li __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From osborne1 at optonline.net Sat May 20 20:59:06 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Sat, 20 May 2006 20:59:06 -0400 Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com> Message-ID: Chen, Not sure. However, whenever I see a new or incomprehensible error message like "display: no decode delegate for this image format" I Google it. Brian O. On 5/20/06 8:15 PM, "chen li" wrote: > Dear all, > > > I try one script from GraphicsHowTo under Cygwin > environment(GD and libpng already installed). I type > this line in Cygwin X window: > > > $ perl render_blast1.pl data1.txt | display - > > And here is the result: > > display: no decode delegate for this image format > `/tmp/magick-qKiRPDRS'. > > Any idea? > > > Thank you very much, > > Li > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! 
Mail has the best spam protection around
> http://mail.yahoo.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From n.saunders at uq.edu.au Sun May 21 18:17:44 2006
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Mon, 22 May 2006 08:17:44 +1000
Subject: [Bioperl-l] problems with Bio::Graph
Message-ID: <4470E708.3070402@uq.edu.au>

dear all,

I am having some problems with the Bio::Graph modules. Running Bioperl 1.5.0 RC1 with Ubuntu 5.10 i686.

I would like to parse files in PSI MI XML 2.5 format and for selected proteins, get the Uniprot accession of interacting partners (this is outlined in the documentation for Bio::Graph::ProteinGraph). I wrote a very simple test script and ran it on a selection of XML files. The script is simply:

----------------------------------------------------------------
use strict;
use Bio::Graph::IO;

my $mifile = shift || die("Usage = biograph.pl \n");
my $graphio = Bio::Graph::IO->new('-file' => $mifile,
                                  '-format' => 'psi_xml');
my $gr = $graphio->next_network;
----------------------------------------------------------------

Here's a summary of the error messages with some sample files (I tried PSI MI XML versions 1 and 2.5):

1. MINT database 9707552_small.xml (PSI 2.5)
Can't call method "att" on an undefined value at /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173.

2. IntAct database yeast_small-11.xml (PSI 2.5)
Can't call method "att" on an undefined value at /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173.

3. IntAct database yeast_small-11.xml (PSI 1)
Use of uninitialized value in string eq at /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 126.

4. DIP files Scere20060402.mif, Ecoli20060402.mif (PSI 1)
These give no errors

5. DIP file dip20060402.mif (PSI 1, complete dataset)
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Invalid species name 'immunodeficiency virus type 1, HIV-1'
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.7/Bio/Root/Root.pm:328
STACK: Bio::Species::validate_species_name /usr/local/share/perl/5.8.7/Bio/Species.pm:340
STACK: Bio::Species::classification /usr/local/share/perl/5.8.7/Bio/Species.pm:170
STACK: Bio::Species::new /usr/local/share/perl/5.8.7/Bio/Species.pm:118
STACK: Bio::Graph::IO::psi_xml::_proteinInteractor /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:105
STACK: XML::Twig::_twig_end /usr/share/perl5/XML/Twig.pm:1473
STACK: XML::Parser::Expat::parse /usr/lib/perl5/XML/Parser/Expat.pm:469
STACK: XML::Parser::parse /usr/lib/perl5/XML/Parser.pm:187
STACK: XML::Parser::parsefile /usr/lib/perl5/XML/Parser.pm:233
STACK: Bio::Graph::IO::psi_xml::next_network /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:79
STACK: ./biograph.pl:18
-----------------------------------------------------------

Looking at the module code, it seems that the first 2 errors relate to a parameter "proteinInteractorRef", found in PSI MI version 1 but not version 2.5. Error 3 I haven't yet figured out. DIP PSI MI XML version 1 for single species seems OK, but it seems there are species names in the complete dataset that cause problems (error 5).

Is the CVS version of Bio::Graph any better at handling PSI MI XML? Are there plans to get it to work with version 2.5 files from all sources (MINT and IntAct) ? Googling and checking the list archives didn't give a lot of hits which made me think it's not a widely-used module.
thanks,
Neil
--
School of Molecular and Microbial Sciences
University of Queensland
Brisbane 4072 Australia
http://psychro.bioinformatics.unsw.edu.au/neil

From torsten.seemann at infotech.monash.edu.au Sun May 21 21:31:56 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Mon, 22 May 2006 11:31:56 +1000
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
References: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
Message-ID: <4471148C.5090404@infotech.monash.edu.au>

> I try one script from GraphicsHowTo under Cygwin
> environment(GD and libpng already installed). I type
> this line in Cygwin X window:
> $ perl render_blast1.pl data1.txt | display -
> display: no decode delegate for this image format
> `/tmp/magick-qKiRPDRS'.

You are piping the output of the Perl script (which is a GIF/PNG image) into the input of a program called "display". This program is part of the ImageMagick toolkit, standard on most Linux installations. Because you are using Windows you probably don't have it installed! Try this:

$ perl render_blast1.pl data1.txt > image.gif

Then load 'image.gif' into whatever your favourite image viewer is.

--
Dr Torsten Seemann http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia

From darin.london at duke.edu Mon May 22 11:29:45 2006
From: darin.london at duke.edu (Darin London)
Date: Mon, 22 May 2006 11:29:45 -0400
Subject: [Bioperl-l] BOSC 2006 2nd Call for Papers
In-Reply-To: <4471CE49.80109@duke.edu>
References: <44294B65.4050207@duke.edu> <4471CE49.80109@duke.edu>
Message-ID: <4471D8E9.8090109@duke.edu>

2nd CALL FOR SPEAKERS

This is the second and last official call for speakers to submit their abstracts to speak at BOSC 2006 in Fortaleza, Brasil. In order to be considered as a potential speaker, an abstract must be received by Monday, June 5th, 2006. We look forward to a great conference this year.

Please consult The Official BOSC 2006 Website at:
http://www.open-bio.org/wiki/BOSC_2006
for more details and information.

In addition, a BOSC weblog has been set up to make it easier to disseminate all BOSC related announcements:
http://wiki.open-bio.org/boscblog/

And if you have an ICAL compatible Calendar, there is an EventDB calendar set up with all BOSC related deadlines.
http://eventful.com/groups/G0-001-000014747-0

More information about ISMB can be found at the Official ISMB 2006 Website:
http://ismb2006.cbi.cnptia.embrapa.br/

Thank You, and we look forward to seeing you all,
The BOSC Organizing Committee.

From darin.london at duke.edu Mon May 22 12:00:55 2006
From: darin.london at duke.edu (Darin London)
Date: Mon, 22 May 2006 09:00:55 -0700
Subject: [Bioperl-l] [Bioperl-announce-l] BOSC 2006 2nd Call for Papers
In-Reply-To: <4471CE49.80109@duke.edu>
References: <44294B65.4050207@duke.edu> <4471CE49.80109@duke.edu>
Message-ID: <000301c67db8$e8391f70$6400a8c0@CodonSolutions.local>

2nd CALL FOR SPEAKERS

This is the second and last official call for speakers to submit their abstracts to speak at BOSC 2006 in Fortaleza, Brasil. In order to be considered as a potential speaker, an abstract must be received by Monday, June 5th, 2006. We look forward to a great conference this year.

Please consult The Official BOSC 2006 Website at:
http://www.open-bio.org/wiki/BOSC_2006
for more details and information.

In addition, a BOSC weblog has been set up to make it easier to disseminate all BOSC related announcements:
http://wiki.open-bio.org/boscblog/

And if you have an ICAL compatible Calendar, there is an EventDB calendar set up with all BOSC related deadlines.
http://eventful.com/groups/G0-001-000014747-0

More information about ISMB can be found at the Official ISMB 2006 Website:
http://ismb2006.cbi.cnptia.embrapa.br/

Thank You, and we look forward to seeing you all,
The BOSC Organizing Committee.
_______________________________________________ Bioperl-announce-l mailing list Bioperl-announce-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l From osborne1 at optonline.net Mon May 22 17:37:50 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Mon, 22 May 2006 17:37:50 -0400 Subject: [Bioperl-l] problems with Bio::Graph In-Reply-To: <4470E708.3070402@uq.edu.au> Message-ID: Neil, Let me propose an alternative. In the past few months I've been working on a Bioperl package for handling protein interaction networks, it is called bioperl-network. It's similar to the Bio::Graph modules, except for the following: - It does not use Nat Goodman's SimpleGraph, it uses Perl's Graph. The advantage is that we are not responsible for maintaining the algorithm code, the disadvantage is that Graph has some bugs but Jarkko Hietaniemi has been working on these and has fixed some significant ones recently. - It uses names and concepts from Graph. It also has separate notions of edge and interaction, where one edge can have one or more interactions. - It uses more method names and conventions borrowed from interaction databases and PSI MI. For example, a node can be a protein complex composed of multiple Seq objects, not just a protein. This package is a makeover of Bio::Graph, therefore Nat Goodman and Richard Adams are major contributors to it. It's also worth mentioning that it's not complete, meaning it won't parse all fields from PSI MI 2 or 2.5 but I think it should be able to handle the code you've shown (and if it cannot then I'll see that it's fixed). I don't know about PSI MI version 1 but if I'm not mistaken there's a version 1 -> version 2 converter. I'm about to put this into CVS so you can take a look, should you choose to. Brian O. On 5/21/06 6:17 PM, "Neil Saunders" wrote: > dear all, > > I am having some problems with the Bio::Graph modules. Running Bioperl 1.5.0 > RC1 with Ubuntu 5.10 i686. 
> > I would like to parse files in PSI MI XML 2.5 format and for selected > proteins, > get the Uniprot accession of interacting partners (this is outlined in the > documentation for Bio::Graph::ProteinGraph). I wrote a very simple test > script > and ran it on a selection of XML files. The script is simply: > > ---------------------------------------------------------------- > use strict; > use Bio::Graph::IO; > > my $mifile = shift || die("Usage = biograph.pl \n"); > my $graphio = Bio::Graph::IO->new('-file' => $mifile, > '-format' => 'psi_xml'); > my $gr = $graphio->next_network; > ---------------------------------------------------------------- > > Here's a summary of the error messages with some sample files (I tried PSI MI > XML versions 1 and 2.5): > > 1. MINT database 9707552_small.xml (PSI 2.5) > Can't call method "att" on an undefined value at > /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173. > > 2. IntAct database yeast_small-11.xml (PSI 2.5) > Can't call method "att" on an undefined value at > /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173. > > 3. IntAct database yeast_small-11.xml (PSI 1) > Use of uninitialized value in string eq at > /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 126. > > 4. DIP files Scere20060402.mif, Ecoli20060402.mif (PSI 1) > These give no errors > > 5. 
DIP file dip20060402.mif (PSI 1, complete dataset) > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Invalid species name 'immunodeficiency virus type 1, HIV-1' > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.7/Bio/Root/Root.pm:328 > STACK: Bio::Species::validate_species_name > /usr/local/share/perl/5.8.7/Bio/Species.pm:340 > STACK: Bio::Species::classification > /usr/local/share/perl/5.8.7/Bio/Species.pm:170 > STACK: Bio::Species::new /usr/local/share/perl/5.8.7/Bio/Species.pm:118 > STACK: Bio::Graph::IO::psi_xml::_proteinInteractor > /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:105 > STACK: XML::Twig::_twig_end /usr/share/perl5/XML/Twig.pm:1473 > STACK: XML::Parser::Expat::parse /usr/lib/perl5/XML/Parser/Expat.pm:469 > STACK: XML::Parser::parse /usr/lib/perl5/XML/Parser.pm:187 > STACK: XML::Parser::parsefile /usr/lib/perl5/XML/Parser.pm:233 > STACK: Bio::Graph::IO::psi_xml::next_network > /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:79 > STACK: ./biograph.pl:18 > ----------------------------------------------------------- > > > Looking at the module code, it seems that the first 2 errors relate to a > parameter "proteinInteractorRef", found in PSI MI version 1 but not version > 2.5. > Error 3 I haven't yet figured out. DIP PSI MI XML version 1 for single > species seems OK, but it seems there are species names in the complete dataset > that cause problems (error 5). > > > Is the CVS version of Bio::Graph any better at handling PSI MI XML? Are there > plans to get it to work with version 2.5 files from all sources (MINT and > IntAct) ? Googling and checking the list archives didn't give a lot of hits > which made me think it's not a widely-used module. 
> > thanks, > Neil From torsten.seemann at infotech.monash.edu.au Mon May 22 17:53:02 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 23 May 2006 07:53:02 +1000 Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <20060522132553.21896.qmail@web36804.mail.mud.yahoo.com> References: <20060522132553.21896.qmail@web36804.mail.mud.yahoo.com> Message-ID: <447232BE.1080001@infotech.monash.edu.au> Chen Li > perl render_blast1.pl data1.txt >im.png Based on http://bioperl.org/wiki/HOWTO:Graphics I believe the example script is creating a PNG image. The last line is: print $panel->png; > and Perl runs without any problem. I use adobe > photoshop to open them and Adobe can't recognize them. > If I use ACDSee to open them I only get a black > background. If I issue this line under Cygwin X window > display im.png or display im.gif > Cygwin says: > display: Improper image header `im.png'. > It seems Perl can't produce an image with right > format. Are you sure Perl is producing a PNG file at all? How many bytes does im.png use? Zero? Did you notice this in http://bioperl.org/wiki/HOWTO:Graphics ? It says: "If you are on a Windows platform, you need to put STDOUT into binary mode so that the PNG file does not go through Window's carriage return/linefeed transformations. Before the final print statement, put the statement binmode(STDOUT)." ie. your script should have binmode(STDOUT); print $panel->png; as the last 2 lines. > Do you experience the same problem before? No. --Torsten From chen_li3 at yahoo.com Mon May 22 09:25:53 2006 From: chen_li3 at yahoo.com (chen li) Date: Mon, 22 May 2006 06:25:53 -0700 (PDT) Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <4471148C.5090404@infotech.monash.edu.au> Message-ID: <20060522132553.21896.qmail@web36804.mail.mud.yahoo.com> Dear Dr. Seemann, Thank you very much for the reply. 
I issue this line: perl render_blast1.pl data1.txt >im.gif or perl render_blast1.pl data1.txt >im.png and Perl runs without any problem. I use adobe photoshop to open them and Adobe can't recognize them. If I use ACDSee to open them I only get a black background. If I issue this line under Cygwin X window display im.png or display im.gif Cygwin says: display: Improper image header `im.png'. or display: Improper image header `im.gif'. It seems Perl can't produce an image with right format. Do you experience the same problem before? Li --- Torsten Seemann wrote: > > I try one script from GraphicsHowTo under Cygwin > > environment(GD and libpng already installed). I > type > > this line in Cygwin X window: > > $ perl render_blast1.pl data1.txt | display - > > display: no decode delegate for this image format > > `/tmp/magick-qKiRPDRS'. > > You are piping the output of the Perl script (which > is a GIF/PNG image) > into the input of a program called "display". This > program is part of > the ImageMagick toolkit, standard on most Linux > installations. Because > you are using Windows you probably don't have it > installed! Try this: > > $ perl render_blast1.pl data1.txt > image.gif > > Then load 'image.gif' into whatever your favourite > image viewer is. > > -- > Dr Torsten Seemann > http://www.vicbioinformatics.com > Victorian Bioinformatics Consortium, Monash > University, Australia > > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! 
Mail has the best spam protection around http://mail.yahoo.com From chen_li3 at yahoo.com Mon May 22 18:57:42 2006 From: chen_li3 at yahoo.com (chen li) Date: Mon, 22 May 2006 15:57:42 -0700 (PDT) Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <447232BE.1080001@infotech.monash.edu.au> Message-ID: <20060522225742.78245.qmail@web36804.mail.mud.yahoo.com> Hi, I try both: either with or without this statement binmode(STDOUT) before the last line print $panel->png; But there are no differenes. I get a file of 2432 bytes. Li > Chen Li > > > perl render_blast1.pl data1.txt >im.png > > Based on http://bioperl.org/wiki/HOWTO:Graphics I > believe the example > script is creating a PNG image. The last line is: > print $panel->png; > > > and Perl runs without any problem. I use adobe > > photoshop to open them and Adobe can't recognize > them. > > If I use ACDSee to open them I only get a black > > background. If I issue this line under Cygwin X > window > > display im.png or display im.gif > > Cygwin says: > > display: Improper image header `im.png'. > > It seems Perl can't produce an image with right > > format. > > Are you sure Perl is producing a PNG file at all? > How many bytes does im.png use? Zero? > > Did you notice this in > http://bioperl.org/wiki/HOWTO:Graphics ? > > It says: "If you are on a Windows platform, you need > to put STDOUT into > binary mode so that the PNG file does not go through > Window's carriage > return/linefeed transformations. Before the final > print statement, put > the statement binmode(STDOUT)." > > ie. your script should have > > binmode(STDOUT); > print $panel->png; > > as the last 2 lines. > > > Do you experience the same problem before? > > No. > > --Torsten > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! 
Mail has the best spam protection around http://mail.yahoo.com From barry.moore at genetics.utah.edu Mon May 22 21:00:06 2006 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Mon, 22 May 2006 19:00:06 -0600 Subject: [Bioperl-l] Problems with Unflattener.pm Message-ID: <729FFBBD-955B-4689-8A27-66733E81431C@genetics.utah.edu> Hi All, NT_113910 appears to throw Bio::SeqFeature::Tools::Unflattener into an infinite recursive loop. The trouble occurs in the method find_best_matches between lines 2258 and 2281, and in particular the loop is perpetuated by line 2273. NT_113910 has a fairly complex features table, but I have as yet been unable to figure out why this loop is not exiting properly. This has been submitted to Bugzilla, but I'll post here so it gets documented on the list also. Any suggestions from Chris or others would be greatly appreciated. This problem can be recreated as follows: Grab NT_113910 from GenBank. bp_fetch.pl -fmt genbank net::genbank:NT_113910 > NT_113910.gbk Pass NT_113910.gbk on the command line to the attached script.
#!/usr/bin/perl
use strict;
use warnings;

use Bio::SeqIO;
use Bio::SeqFeature::Tools::Unflattener;

my $file = shift;

# generate an Unflattener object
my $unflattener = Bio::SeqFeature::Tools::Unflattener->new;
#$unflattener->verbose(1);

# first fetch a genbank SeqI object
my $seqio = Bio::SeqIO->new(-file => $file, -format => 'GenBank');
my $out = Bio::SeqIO->new(-format => 'asciitree');
while (my $seq = $seqio->next_seq()) {
    # get top level unflattened SeqFeatureI objects
    $unflattener->unflatten_seq(-seq => $seq, -use_magic => 1);
    $out->write_seq($seq);
}

From miker at biotiquesystems.com Mon May 22 19:56:52 2006 From: miker at biotiquesystems.com (Michael Rogoff) Date: Mon, 22 May 2006 16:56:52 -0700 Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version Message-ID: <002a01c67dfb$663cc600$c100a8c0@mike> As best as I can tell, using Bio::SeqIO to parse a UniProt file ignores the sequence version, and calling seq_version() on the resulting RichSeq object returns undef. It looks like swiss.pm is trying to parse the version out of the SV line, which apparently doesn't exist any more? The sequence version(s) are now specified as part of the Date (DT) lines. Is this not a bug? Is swiss.pm not designed to parse UniProt files? Thanks for any help ... From jason.stajich at duke.edu Mon May 22 21:37:13 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon, 22 May 2006 21:37:13 -0400 Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version In-Reply-To: <002a01c67dfb$663cc600$c100a8c0@mike> References: <002a01c67dfb$663cc600$c100a8c0@mike> Message-ID: Sounds like a "missing feature" =) AFAIK the module was only written for swissprot files. It is possible there have been changes in the format that have not been tracked in the current code. We'd certainly appreciate someone testing it out as versions evolve. If you submit a bug to Bugzilla with the version of BioPerl and example files you can track when a fix is in.
We of course appreciate anyone's efforts to provide a patch as most bugs get fixed of late when someone gets "itchy" enough to fix them. -jason On May 22, 2006, at 7:56 PM, Michael Rogoff wrote: > > As best as I can tell, using Bio::SeqIO to parse a uniprot file > ignores the > sequence version, and calling seq_version() on the resulting > RichSeq object > returns undef. > > It looks like swiss.pm is trying to parse the version out of the SV > line, which > apparently doesn't exist any more? The sequence version(s) are now > specified as > part of the Date (DT) lines. > > Is this not a bug? Is swiss.pm not designed to parse uniprot files? > > Thanks for any help ... -- Jason Stajich Duke University http://www.duke.edu/~jes12 From jason.stajich at duke.edu Mon May 22 22:04:17 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon, 22 May 2006 22:04:17 -0400 Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version In-Reply-To: <003301c67e0b$5dd44410$c100a8c0@mike> References: <003301c67e0b$5dd44410$c100a8c0@mike> Message-ID: <3607997C-DAD4-4E0E-A919-7D9212AC6D50@duke.edu> We ask that people post patches to Bugzilla as an attachment to the bug report so we can track what the bug was and why the patch fixes it. I am not totally sure this patch works because it seems like we now need to strip out more information from the DT line if $date actually contains more than just the date. If you would go ahead and create a bug in Bugzilla for this (http://bugzilla.open-bio.org), this sort of conversation can be tracked with the bug. If any of this is unclear please let us know - I thought we had put some pages up about this sort of thing on the wiki, but maybe they need to be expanded.
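To make the DT-line concern concrete, here is a minimal standalone sketch. It assumes the three-line DT layout from the UniProtKB release 7.0 notes quoted elsewhere in this thread: a date, a comma, then "integrated into ...", "sequence version N" or "entry version N". The helper name parse_dt_line and the hash keys it returns are hypothetical and are not part of Bio::SeqIO::swiss; this only illustrates separating the bare date from the trailing text, not how the parser should be patched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (NOT part of Bio::SeqIO::swiss): split one new-style
# DT line into the bare date, the free-text note after the comma, and a
# sequence or entry version number when the note carries one.
sub parse_dt_line {
    my ($line) = @_;

    # DT, whitespace, DD-MMM-YYYY, optional comma, note, optional final dot
    return unless $line =~ /^DT\s+(\d{2}-[A-Z]{3}-\d{4}),?\s*(.*?)\.?\s*$/;
    my %dt = ( date => $1, note => $2 );

    # "sequence version 3" -> sequence_version => 3 (likewise for "entry")
    if ( $dt{note} =~ /^(sequence|entry) version (\d+)$/i ) {
        $dt{ lc($1) . '_version' } = $2;
    }
    return \%dt;
}

# Example lines copied from the UniProtKB release notes quoted in this thread
my @lines = (
    'DT   01-JAN-1998, integrated into UniProtKB/Swiss-Prot.',
    'DT   15-OCT-2001, sequence version 3.',
    'DT   01-APR-2004, entry version 14.',
);

for my $line (@lines) {
    my $dt = parse_dt_line($line) or next;
    my $seq   = defined $dt->{sequence_version} ? $dt->{sequence_version} : '-';
    my $entry = defined $dt->{entry_version}    ? $dt->{entry_version}    : '-';
    print "$dt->{date}  seq_version=$seq  entry_version=$entry\n";
}
```

Keeping the date and the note as separate fields means the dates list can stay date-only while the version is picked up independently, which seems to be the split this message is asking for.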
-jason On May 22, 2006, at 9:51 PM, Michael Rogoff wrote: > I have a patch that seems to work but I'm not familiar with the > proper method to > "provide" it. How do I go about that? > > The patch is pretty simple, it just parses the sequence version out > of the date > line where it now hides: > > #date > elsif( /^DT\s+(.*)/ ) { > my $date = $1; > + > + if ($date =~ /sequence version (\d+)/i) { > + $params{'-seq_version'} ||= $1; > + } > + > $date =~ s/\;//; > $date =~ s/\s+$//; > push @{$params{'-dates'}}, $date; > } > > By the way, what is the difference between Bio::Seq::version and > Bio::Seq::RichSeq::seq_version? > > >> -----Original Message----- >> From: Jason Stajich [mailto:jason.stajich at duke.edu] >> Sent: Monday, May 22, 2006 6:37 PM >> To: Michael Rogoff >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version >> >> >> Sounds like a "missing feature" =) >> >> AFAIK the module was only written for swissprot files. It is >> possible there have been changes in the format that have not been >> tracked to the current code. We'd certainly appreciate someone >> testing it out as versions evolve. If you submit a bug to bugzilla >> with version of bioperl and example files you can track when >> a fix is >> in. We of course appreciate anyone's efforts to provide a patch as >> most bugs get fixed of late when someone gets "itchy" enough to fix >> them. >> >> -jason >> >> On May 22, 2006, at 7:56 PM, Michael Rogoff wrote: >> >>> >>> As best as I can tell, using Bio::SeqIO to parse a uniprot file >>> ignores the >>> sequence version, and calling seq_version() on the resulting >>> RichSeq object >>> returns undef. >>> >>> It looks like swiss.pm is trying to parse the version out >> of the SV >>> line, which >>> apparently doesn't exist any more? The sequence version(s) >> are now >>> specified as >>> part of the Date (DT) lines. >>> >>> Is this not a bug? Is swiss.pm not designed to parse uniprot files? 
>>> >>> Thanks for any help ... >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From Marc.Logghe at DEVGEN.com Tue May 23 03:08:37 2006 From: Marc.Logghe at DEVGEN.com (Marc Logghe) Date: Tue, 23 May 2006 09:08:37 +0200 Subject: [Bioperl-l] problems iwth Bio::graphics module Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746E4E@ANTARESIA.be.devgen.com> Hi Li, Did you check your script for any other print statements (to STDOUT, that is) that potentially could contaminate your png stream ? Marc > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chen li > Sent: Tuesday, May 23, 2006 12:58 AM > To: Torsten Seemann > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] problems iwth Bio::graphics module > > Hi, > > I try both: either with or without this statement > binmode(STDOUT) before the last line print $panel->png; But > there are no differenes. I get a file of 2432 bytes. > > Li > > > > > Chen Li > > > > > perl render_blast1.pl data1.txt >im.png > > > > Based on http://bioperl.org/wiki/HOWTO:Graphics I believe > the example > > script is creating a PNG image. The last line is: > > print $panel->png; > > > > > and Perl runs without any problem. I use adobe photoshop to open > > > them and Adobe can't recognize > > them. > > > If I use ACDSee to open them I only get a black background. If I > > > issue this line under Cygwin X > > window > > > display im.png or display im.gif > > > Cygwin says: > > > display: Improper image header `im.png'. > > > It seems Perl can't produce an image with right format. > > > > Are you sure Perl is producing a PNG file at all? > > How many bytes does im.png use? 
Zero? > > > > Did you notice this in > > http://bioperl.org/wiki/HOWTO:Graphics ? > > > > It says: "If you are on a Windows platform, you need to put STDOUT > > into binary mode so that the PNG file does not go through Window's > > carriage return/linefeed transformations. Before the final print > > statement, put the statement binmode(STDOUT)." > > > > ie. your script should have > > > > binmode(STDOUT); > > print $panel->png; > > > > as the last 2 lines. > > > > > Do you experience the same problem before? > > > > No. > > > > --Torsten > > > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection > around http://mail.yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From chen_li3 at yahoo.com Tue May 23 09:27:06 2006 From: chen_li3 at yahoo.com (chen li) Date: Tue, 23 May 2006 06:27:06 -0700 (PDT) Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746E4E@ANTARESIA.be.devgen.com> Message-ID: <20060523132706.57245.qmail@web36811.mail.mud.yahoo.com> Dear Dr. Logghe, Thank you so much. I have the script worked after getting your suggestion under Cygwin. Here are the last two lines: either binmode (STDOUT); print STDOUT $panel->png; or only print STDOUT $panel->png; They both work for me. I know the default output in perl to the screen. I don't why it works if STDOUT after print is added. Could you explain it? BTW I copy this script from GraphicsHowTo on Bioperl website and only one line contains print statement, which is 'print $panel->png'. Once again thank you so much, Li --- Marc Logghe wrote: > Hi Li, > Did you check your script for any other print > statements (to STDOUT, > that is) that potentially could contaminate your png > stream ? 
> > Marc > > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org > > [mailto:bioperl-l-bounces at lists.open-bio.org] On > Behalf Of chen li > > Sent: Tuesday, May 23, 2006 12:58 AM > > To: Torsten Seemann > > Cc: bioperl-l at lists.open-bio.org > > Subject: Re: [Bioperl-l] problems iwth > Bio::graphics module > > > > Hi, > > > > I try both: either with or without this statement > > binmode(STDOUT) before the last line print > $panel->png; But > > there are no differenes. I get a file of 2432 > bytes. > > > > Li > > > > > > > > > Chen Li > > > > > > > perl render_blast1.pl data1.txt >im.png > > > > > > Based on http://bioperl.org/wiki/HOWTO:Graphics > I believe > > the example > > > script is creating a PNG image. The last line > is: > > > print $panel->png; > > > > > > > and Perl runs without any problem. I use adobe > photoshop to open > > > > them and Adobe can't recognize > > > them. > > > > If I use ACDSee to open them I only get a > black background. If I > > > > issue this line under Cygwin X > > > window > > > > display im.png or display im.gif > > > > Cygwin says: > > > > display: Improper image header `im.png'. > > > > It seems Perl can't produce an image with > right format. > > > > > > Are you sure Perl is producing a PNG file at > all? > > > How many bytes does im.png use? Zero? > > > > > > Did you notice this in > > > http://bioperl.org/wiki/HOWTO:Graphics ? > > > > > > It says: "If you are on a Windows platform, you > need to put STDOUT > > > into binary mode so that the PNG file does not > go through Window's > > > carriage return/linefeed transformations. Before > the final print > > > statement, put the statement binmode(STDOUT)." > > > > > > ie. your script should have > > > > > > binmode(STDOUT); > > > print $panel->png; > > > > > > as the last 2 lines. > > > > > > > Do you experience the same problem before? > > > > > > No. 
> > > > > > --Torsten > > > > > > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! Mail has the best spam > protection > > around http://mail.yahoo.com > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From lstein at cshl.edu Tue May 23 10:06:27 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 23 May 2006 10:06:27 -0400 Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com> References: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com> Message-ID: <200605231006.28392.lstein@cshl.edu> Hi, It is possible that your version of display can't handle PNG images. Try saving the output as a file and then opening it in another image program: perl render_blast1.pl data1.txt > data1.png Another thing to watch out for is that, depending on what version of Perl you're using, you may have to insert this statement into the render_blast1.pl script (somewhere near the top): binmode STDOUT; Lincoln On Saturday 20 May 2006 20:15, chen li wrote: > Dear all, > > > I try one script from GraphicsHowTo under Cygwin > environment(GD and libpng already installed). I type > this line in Cygwin X window: > > > $ perl render_blast1.pl data1.txt | display - > > And here is the result: > > display: no decode delegate for this image format > `/tmp/magick-qKiRPDRS'. > > Any idea? > > > Thank you very much, > > Li > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! 
Mail has the best spam protection around > http://mail.yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From Derek.Fairley at bll.n-i.nhs.uk Tue May 23 10:39:16 2006 From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek) Date: Tue, 23 May 2006 15:39:16 +0100 Subject: [Bioperl-l] Bio::Restriction::IO query Message-ID: Hi folks, I'm new to BioPerl, and struggling to make the Bio::Restriction::* modules work (using BioPerl-1.4; Perl-5.8.1; Linux-2.4). Specifically, I'm having some trouble understanding the behaviour of the Bio::Restriction::IO module. I'm trying to use this to create a Bio::Restriction::EnzymeCollection object from a local REBASE file (which is in bairoch-format); this will in turn be passed to a Bio::Restriction::Analysis object. The following test script (derived from the Bio::Restriction::IO perldoc) runs fine:

#! /usr/bin/perl -w
use strict;
use warnings;
use Bio::Restriction::IO;

my $in = Bio::Restriction::IO->new( -file => "REBASE_file", -format =>'Bairoch');
my $collection = $in->read();
print "Number of REs in the collection: ", scalar $collection->each_enzyme, "\n";

#note that using -format=>'bairoch' without capitalisation (as shown in perldoc synopsis) throws an exception: Failed to load module Bio::Restriction::IO::bairoch...

However... the test script returns the number 532 - the number of enzymes in the default enzyme set - regardless of the number of enzymes in the file. A default Bio::Restriction::EnzymeCollection object has presumably been created (as the 'read()' and 'each_enzyme' methods are available) but it didn't come from the local file.
The result is the same if the Bio::Restriction::IO->new() method is called with no arguments - a default EnzymeCollection object is created. It's not clear to me where this has come from. My (mis?)understanding was that the default set of enzymes would be loaded on creation of a new Bio::Restriction::Analysis object (in the absence of a -enzymes=>... argument). Presumably this is down to my poor understanding of the BioPerl object model... ;-) So: how should I create an EnzymeCollection object from file? Any help or advice would be gratefully received. PS. Congratulations to the development team for creating a very impressive and useful open source toolkit. Derek. ----------------------------------------- Derek Fairley, Ph.D. Regional Virus Laboratory, Kelvin Building, Royal Victoria Hospital, Grosvenor Road, Belfast, N. Ireland. BT12 6BA Tel. +44 (0)2890 635303 From rowan.mitchell at bbsrc.ac.uk Tue May 23 10:53:42 2006 From: rowan.mitchell at bbsrc.ac.uk (rowan mitchell (RRes-Roth)) Date: Tue, 23 May 2006 15:53:42 +0100 Subject: [Bioperl-l] Assembly::IO ace output Message-ID: Hi I am very interested in writing ace format files and had assumed that I would be able to do this with Assembly::IO until I tried it! I see there has been some correspondence last year on this, but as far as I can see this is still not implemented in 1.5.1. Is this correct ? Is it planned to be included; are there modules under development available ? many thanks Rowan =============================================== Dr Rowan Mitchell Rothamsted Research Harpenden Herts AL5 2JQ UK Tel: +44 (0)1582 763133 x2469 Fax: +44 (0)1582 763010 E-mail: rowan.mitchell at bbsrc.ac.uk WWW: http://www.rothamsted.bbsrc.ac.uk/ =============================================== Rothamsted Research is a company limited by guarantee, registered in England under the registration number 2393175 and a not for profit charity number 802038. 
From rfsouza at cecm.usp.br Tue May 23 16:17:36 2006 From: rfsouza at cecm.usp.br (Robson Francisco de Souza {S}) Date: Tue, 23 May 2006 17:17:36 -0300 Subject: [Bioperl-l] Assembly::IO ace output In-Reply-To: References: Message-ID: <20060523201736.GA28401@cecm.usp.br> Hi Rowan, On Tue, May 23, 2006 at 03:53:42PM +0100, rowan mitchell (RRes-Roth) wrote: > Hi > > I am very interested in writing ace format files and had assumed that I > would be able to do this with Assembly::IO until I tried it! I see there > has been some correspondence last year on this, but as far as I can see > this is still not implemented in 1.5.1. Is this correct ? Is it planned > to be included; are there modules under development available ? As far as I know, there are no plans to add write support to Bio::Assembly::IO. When I wrote the original modules there was no need for this so I left it aside. Best regards, Robson > many thanks > > Rowan > > =============================================== > Dr Rowan Mitchell > Rothamsted Research > Harpenden > Herts AL5 2JQ UK > > Tel: +44 (0)1582 763133 x2469 > Fax: +44 (0)1582 763010 > E-mail: rowan.mitchell at bbsrc.ac.uk > WWW: http://www.rothamsted.bbsrc.ac.uk/ > =============================================== > Rothamsted Research is a company limited by guarantee, registered in > England under the registration number 2393175 and a not for profit > charity number 802038. 
> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From lstein at cshl.edu Tue May 23 16:53:34 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 23 May 2006 16:53:34 -0400 Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <200605231006.28392.lstein@cshl.edu> References: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com> <200605231006.28392.lstein@cshl.edu> Message-ID: <200605231653.36087.lstein@cshl.edu> Hi Chen, It looks to me like you cut and paste the data1.txt file from the web site, consequently replacing the tabs with spaces. Please get table1.txt from the BioPerl distribution, as instructed in the tutorial. Best, Lincoln On Tuesday 23 May 2006 10:06, Lincoln Stein wrote: > Hi, > > It is possible that your version of display can't handle PNG images. Try > saving the output as a file and then opening it in another image program: > > perl render_blast1.pl data1.txt > data1.png > > Another thing to watch out for is that, depending on what version of Perl > you're using, you may have to insert this statement into the > render_blast1.pl script (somewhere near the top): > > binmode STDOUT; > > Lincoln > > On Saturday 20 May 2006 20:15, chen li wrote: > > Dear all, > > > > > > I try one script from GraphicsHowTo under Cygwin > > environment(GD and libpng already installed). I type > > this line in Cygwin X window: > > > > > > $ perl render_blast1.pl data1.txt | display - > > > > And here is the result: > > > > display: no decode delegate for this image format > > `/tmp/magick-qKiRPDRS'. > > > > Any idea? > > > > > > Thank you very much, > > > > Li > > > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! 
Mail has the best spam protection around > > http://mail.yahoo.com > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From chen_li3 at yahoo.com Tue May 23 17:46:16 2006 From: chen_li3 at yahoo.com (chen li) Date: Tue, 23 May 2006 14:46:16 -0700 (PDT) Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <200605231653.36087.lstein@cshl.edu> Message-ID: <20060523214616.15131.qmail@web36813.mail.mud.yahoo.com> Dear Dr. Stein, Thank you so much. I follow your suggestions and download codes from the Bioperl CVS website. Now everything is working. Li --- Lincoln Stein wrote: > Hi Chen, > > It looks to me like you cut and paste the data1.txt > file from the web site, > consequently replacing the tabs with spaces. Please > get table1.txt from the > BioPerl distribution, as instructed in the tutorial. > > Best, > > Lincoln > > On Tuesday 23 May 2006 10:06, Lincoln Stein wrote: > > Hi, > > > > It is possible that your version of display can't > handle PNG images. Try > > saving the output as a file and then opening it in > another image program: > > > > perl render_blast1.pl data1.txt > data1.png > > > > Another thing to watch out for is that, depending > on what version of Perl > > you're using, you may have to insert this > statement into the > > render_blast1.pl script (somewhere near the top): > > > > binmode STDOUT; > > > > Lincoln > > > > On Saturday 20 May 2006 20:15, chen li wrote: > > > Dear all, > > > > > > > > > I try one script from GraphicsHowTo under Cygwin > > > environment(GD and libpng already installed). 
I > type > > > this line in Cygwin X window: > > > > > > > > > $ perl render_blast1.pl data1.txt | display - > > > > > > And here is the result: > > > > > > display: no decode delegate for this image > format > > > `/tmp/magick-qKiRPDRS'. > > > > > > Any idea? > > > > > > > > > Thank you very much, > > > > > > Li > > > > > > > > > > __________________________________________________ > > > Do You Yahoo!? > > > Tired of spam? Yahoo! Mail has the best spam > protection around > > > http://mail.yahoo.com > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From chen_li3 at yahoo.com Tue May 23 18:59:46 2006 From: chen_li3 at yahoo.com (chen li) Date: Tue, 23 May 2006 15:59:46 -0700 (PDT) Subject: [Bioperl-l] How to download sequence files either in EMBL format Message-ID: <20060523225946.2118.qmail@web36805.mail.mud.yahoo.com> Hi all, I need to download one sequence for a gene. I go to NCBI website,find the gene of interest,download the file in Genbank format(saved as sequence.genbank). But to my surprise this so-called genbank format file doesn't contain many features such as exons,compared to the one in Emsembl. My question: where can I download this sequence file in EMBL format? It looks like the one in EMBL might contain other information such exon. Thank you very much, Li __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! 
Mail has the best spam protection around http://mail.yahoo.com From osborne1 at optonline.net Wed May 24 10:33:16 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Wed, 24 May 2006 10:33:16 -0400 Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <20060523132706.57245.qmail@web36811.mail.mud.yahoo.com> Message-ID: Li, The Graphics HOWTO talks about this Windows workaround in _four_ different places, it's impossible to miss if you read it from start to finish. This is what one should do if one wants to use these modules and one is a novice. Example: Important! Remember that if you are on a Windows platform, you need to put STDOUT into binary mode so that the PNG file does not go through Window's carriage return/linefeed transformations. Before the final print statement, write binmode(STDOUT). Brian O. On 5/23/06 9:27 AM, "chen li" wrote: > BTW I copy this script from GraphicsHowTo on Bioperl > website and only one line contains print statement, > which is 'print $panel->png'. From chen_li3 at yahoo.com Wed May 24 12:17:15 2006 From: chen_li3 at yahoo.com (chen li) Date: Wed, 24 May 2006 09:17:15 -0700 (PDT) Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: Message-ID: <20060524161715.45141.qmail@web36807.mail.mud.yahoo.com> Thanks but Dr. Stein already helps me to figure out what is going on: I should have copied the source codes for the examples in CVS instead of "cut and paste" from the HOWTO tutorial. And sorry for any inconvience. Li --- Brian Osborne wrote: > Li, > > The Graphics HOWTO talks about this Windows > workaround in _four_ different > places, it's impossible to miss if you read it from > start to finish. This is > what one should do if one wants to use these modules > and one is a novice. > Example: > > Important! Remember that if you are on a Windows > platform, you need to put > STDOUT into binary mode so that the PNG file does > not go through Window's > carriage return/linefeed transformations. 
Before the > final print statement, > write binmode(STDOUT). > > Brian O. > > > On 5/23/06 9:27 AM, "chen li" > wrote: > > > BTW I copy this script from GraphicsHowTo on > Bioperl > > website and only one line contains print > statement, > > which is 'print $panel->png'. From ULNJUJERYDIX at spammotel.com Wed May 24 21:59:36 2006 From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau) Date: Thu, 25 May 2006 09:59:36 +0800 Subject: [Bioperl-l] URGENT: Bio::Graphics::Panel make the ruler have negative (-) position numbering imagemap making Message-ID: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com> Hi, thanks for the help offered thus far! Sigh. I am trying to annotate TFBS on a -1000 to +50 bp promoter sequence using BioPerl, so I was asked to number the ruler accordingly (starting at -1000). Is there any way at all to do this in BioPerl without changing the .pm file? Thanks guys.. kevin From cjfields at uiuc.edu Thu May 25 12:43:37 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 25 May 2006 11:43:37 -0500 Subject: [Bioperl-l] Problems with Unflattener.pm In-Reply-To: <729FFBBD-955B-4689-8A27-66733E81431C@genetics.utah.edu> Message-ID: <009d01c6801a$5f75d2a0$15327e82@pyrimidine> I was able to reproduce this using WinXP and bioperl-live. It seems to get caught up in the loop during recursion: debugging shows it is unable to get past 'find_best_matches: (/15)'. There are lots of unmatched pairs with this sequence, so could that be the problem? I'm not terribly familiar with Unflattener...
Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Barry Moore > Sent: Monday, May 22, 2006 8:00 PM > To: bioperl-l at bioperl.org > Subject: [Bioperl-l] Problems with Unflattener.pm > > Hi All, > > NT_113910 appears to throw Bio::SeqFeatures::Tools::Unflattener into > an infinite recursive loop. The trouble occurs in the method > find_best_matches between lines 2258 and 2281, and in particular the > loop is perpetuated by line 2273. NT_113910 has a fairly complex > features table, and but I have as yet been unable to figure out why > this loop is not exiting properly. This has been submitted to > bugzilla, but I'll post here so it gets documented on the list also. > Any suggestions from Chris or others would be greatly appreciated. > > This problem can be recreated as follows: > > Grab NT_113910 from genbank. > bp_fetch.pl -fmt genbank net::genbank:NT_113910 > NT_113910.gbk > > Pass NT_113910.gbk on the command line to the attached script. 
> > > > #!/usr/bin/perl; > > use strict; > use warnings; > > use Bio::SeqIO; > use Bio::SeqFeature::Tools::Unflattener; > > my $file = shift; > > # generate an Unflattener object > my $unflattener = Bio::SeqFeature::Tools::Unflattener->new; > #$unflattener->verbose(1); > > # first fetch a genbank SeqI object > my $seqio = > Bio::SeqIO->new(-file => $file, > -format => 'GenBank'); > my $out = > Bio::SeqIO->new(-format => 'asciitree'); > while (my $seq = $seqio->next_seq()) { > > # get top level unflattended SeqFeatureI objects > $unflattener->unflatten_seq(-seq => $seq, > -use_magic => 1); > $out->write_seq($seq); > } > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Thu May 25 15:44:01 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 25 May 2006 14:44:01 -0500 Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version In-Reply-To: <3607997C-DAD4-4E0E-A919-7D9212AC6D50@duke.edu> Message-ID: <00a101c68033$95606dd0$15327e82@pyrimidine> This is due to recent changes in the SwissProt/UniProt format (there apparently are many other changes besides this). >From UniProtKB news (http://ca.expasy.org/sprot/relnotes/sp_news.html) is this tidbit: ---------------------------------------------------------- UniProtKB release 7.0 of 07-Feb-2006 Changes concerning dates and versions numbers (DT lines) We changed from showing only the dates corresponding to full UniProtKB releases in the DT lines to displaying the date of the biweekly release at which an entry is integrated or updated. We dropped the information concerning the release number and introduced entry and sequence version numbers in the DT lines. The new format of the three DT lines is: DT DD-MMM-YYYY, integrated into UniProtKB/database_name. DT DD-MMM-YYYY, sequence version version_number. DT DD-MMM-YYYY, entry version version_number. 
Example for UniProtKB/Swiss-Prot: DT 01-JAN-1998, integrated into UniProtKB/Swiss-Prot. DT 15-OCT-2001, sequence version 3. DT 01-APR-2004, entry version 14. Example for UniProtKB/TrEMBL: DT 01-FEB-1999, integrated into UniProtKB/TrEMBL. DT 15-OCT-2000, sequence version 2. DT 15-DEC-2004, entry version 5. The sequence version number of an entry is incremented by one when its amino acid sequence is modified. The entry version number is incremented by one whenever any data in the flat file representation of the entry is modified. We retrofitted the entry and sequence version numbers, as well as all dates, using archived UniProtKB releases. ---------------------------------------------------------- Probably should explain on the swissprot wiki page that the format is in a state of flux at the moment. I've added this tidbit to the bug page (#2003) as well. Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Jason Stajich > Sent: Monday, May 22, 2006 9:04 PM > To: Michael Rogoff > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version > > We ask that people post patches to the bugzilla as an attachment to > the bugzilla so we can track what and why the bug was that the patch > fixes. > > I am not totally sure this patch works because it seems like we need > to strip out more information now from the DT line if the $date > actually contains more information than just the date. > > If you would go ahead and create a bug in bugzilla for this (http:// > bugzilla.open-bio.org) this sort of conversation can be tracked to > the bug. > > If any of this is unclear please let us know - I though we had put > some pages up about this sort of thing on the wiki but maybe they > need to be expanded. 
> > -jason > On May 22, 2006, at 9:51 PM, Michael Rogoff wrote: > > > I have a patch that seems to work but I'm not familiar with the > > proper method to > > "provide" it. How do I go about that? > > > > The patch is pretty simple, it just parses the sequence version out > > of the date > > line where it now hides: > > > > #date > > elsif( /^DT\s+(.*)/ ) { > > my $date = $1; > > + > > + if ($date =~ /sequence version (\d+)/i) { > > + $params{'-seq_version'} ||= $1; > > + } > > + > > $date =~ s/\;//; > > $date =~ s/\s+$//; > > push @{$params{'-dates'}}, $date; > > } > > > > By the way, what is the difference between Bio::Seq::version and > > Bio::Seq::RichSeq::seq_version? > > > > > >> -----Original Message----- > >> From: Jason Stajich [mailto:jason.stajich at duke.edu] > >> Sent: Monday, May 22, 2006 6:37 PM > >> To: Michael Rogoff > >> Cc: bioperl-l at lists.open-bio.org > >> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version > >> > >> > >> Sounds like a "missing feature" =) > >> > >> AFAIK the module was only written for swissprot files. It is > >> possible there have been changes in the format that have not been > >> tracked to the current code. We'd certainly appreciate someone > >> testing it out as versions evolve. If you submit a bug to bugzilla > >> with version of bioperl and example files you can track when > >> a fix is > >> in. We of course appreciate anyone's efforts to provide a patch as > >> most bugs get fixed of late when someone gets "itchy" enough to fix > >> them. > >> > >> -jason > >> > >> On May 22, 2006, at 7:56 PM, Michael Rogoff wrote: > >> > >>> > >>> As best as I can tell, using Bio::SeqIO to parse a uniprot file > >>> ignores the > >>> sequence version, and calling seq_version() on the resulting > >>> RichSeq object > >>> returns undef. > >>> > >>> It looks like swiss.pm is trying to parse the version out > >> of the SV > >>> line, which > >>> apparently doesn't exist any more? 
The sequence version(s) > >> are now > >>> specified as > >>> part of the Date (DT) lines. > >>> > >>> Is this not a bug? Is swiss.pm not designed to parse uniprot files? > >>> > >>> Thanks for any help ... > >>> > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> -- > >> Jason Stajich > >> Duke University > >> http://www.duke.edu/~jes12 > >> > >> > >> > > > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From miker at biotiquesystems.com Mon May 22 21:51:10 2006 From: miker at biotiquesystems.com (Michael Rogoff) Date: Mon, 22 May 2006 18:51:10 -0700 Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version In-Reply-To: Message-ID: <003301c67e0b$5dd44410$c100a8c0@mike> I have a patch that seems to work but I'm not familiar with the proper method to "provide" it. How do I go about that? The patch is pretty simple, it just parses the sequence version out of the date line where it now hides: #date elsif( /^DT\s+(.*)/ ) { my $date = $1; + + if ($date =~ /sequence version (\d+)/i) { + $params{'-seq_version'} ||= $1; + } + $date =~ s/\;//; $date =~ s/\s+$//; push @{$params{'-dates'}}, $date; } By the way, what is the difference between Bio::Seq::version and Bio::Seq::RichSeq::seq_version? > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich at duke.edu] > Sent: Monday, May 22, 2006 6:37 PM > To: Michael Rogoff > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version > > > Sounds like a "missing feature" =) > > AFAIK the module was only written for swissprot files. It is > possible there have been changes in the format that have not been > tracked to the current code. 
We'd certainly appreciate someone > testing it out as versions evolve. If you submit a bug to bugzilla > with version of bioperl and example files you can track when > a fix is > in. We of course appreciate anyone's efforts to provide a patch as > most bugs get fixed of late when someone gets "itchy" enough to fix > them. > > -jason > > On May 22, 2006, at 7:56 PM, Michael Rogoff wrote: > > > > > As best as I can tell, using Bio::SeqIO to parse a uniprot file > > ignores the > > sequence version, and calling seq_version() on the resulting > > RichSeq object > > returns undef. > > > > It looks like swiss.pm is trying to parse the version out > of the SV > > line, which > > apparently doesn't exist any more? The sequence version(s) > are now > > specified as > > part of the Date (DT) lines. > > > > Is this not a bug? Is swiss.pm not designed to parse uniprot files? > > > > Thanks for any help ... > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > From chen_li3 at yahoo.com Tue May 23 11:48:46 2006 From: chen_li3 at yahoo.com (chen li) Date: Tue, 23 May 2006 08:48:46 -0700 (PDT) Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <200605231006.28392.lstein@cshl.edu> Message-ID: <20060523154846.70831.qmail@web36815.mail.mud.yahoo.com> Dear Dr. Stein, I have the job partially done by adding this line (under Cygwin) print STDOUT $panel->png; It is done because I can produce the image to be viewed by other programs but it is only partially done because I don't get exactly the same image as that shown on the website. Enclosed is the image I get. Thank you, Li --- Lincoln Stein wrote: > Hi, > > It is possible that your version of display can't > handle PNG images. 
Try
> saving the output as a file and then opening it in
> another image program:
>
>    perl render_blast1.pl data1.txt > data1.png
>
> Another thing to watch out for is that, depending on
> what version of Perl you're using, you may have to
> insert this statement into the render_blast1.pl
> script (somewhere near the top):
>
>    binmode STDOUT;
>
> Lincoln
>
> On Saturday 20 May 2006 20:15, chen li wrote:
> > Dear all,
> >
> > I try one script from GraphicsHowTo under Cygwin
> > environment (GD and libpng already installed). I type
> > this line in Cygwin X window:
> >
> > $ perl render_blast1.pl data1.txt | display -
> >
> > And here is the result:
> >
> > display: no decode delegate for this image format
> > `/tmp/magick-qKiRPDRS'.
> >
> > Any idea?
> >
> > Thank you very much,
> >
> > Li
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu

-------------- next part --------------
A non-text attachment was scrubbed...
Name: im1 Type: image/x-png Size: 2423 bytes Desc: 2615755531-im1 URL: From cjfields at uiuc.edu Thu May 25 21:28:14 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 25 May 2006 20:28:14 -0500 Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version In-Reply-To: <003301c67e0b$5dd44410$c100a8c0@mike> References: <003301c67e0b$5dd44410$c100a8c0@mike> Message-ID: This patch works only for the recent change in swissprot seq format for sequence versions on the DT line. I checked it out vs the test data provided with bioperl (t\data\swiss.dat). I did manage to get it working for both old and new using a modification to your patch but there's another issue; using $seq->get_dates, which should only show dates, shows the entire line (date and version info). Jason mentioned that there needs to be a better way to address this which I'm looking into. Chris On May 22, 2006, at 8:51 PM, Michael Rogoff wrote: > I have a patch that seems to work but I'm not familiar with the > proper method to > "provide" it. How do I go about that? > > The patch is pretty simple, it just parses the sequence version out > of the date > line where it now hides: > > #date > elsif( /^DT\s+(.*)/ ) { > my $date = $1; > + > + if ($date =~ /sequence version (\d+)/i) { > + $params{'-seq_version'} ||= $1; > + } > + > $date =~ s/\;//; > $date =~ s/\s+$//; > push @{$params{'-dates'}}, $date; > } > > By the way, what is the difference between Bio::Seq::version and > Bio::Seq::RichSeq::seq_version? > > >> -----Original Message----- >> From: Jason Stajich [mailto:jason.stajich at duke.edu] >> Sent: Monday, May 22, 2006 6:37 PM >> To: Michael Rogoff >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version >> >> >> Sounds like a "missing feature" =) >> >> AFAIK the module was only written for swissprot files. It is >> possible there have been changes in the format that have not been >> tracked to the current code. 
We'd certainly appreciate someone >> testing it out as versions evolve. If you submit a bug to bugzilla >> with version of bioperl and example files you can track when >> a fix is >> in. We of course appreciate anyone's efforts to provide a patch as >> most bugs get fixed of late when someone gets "itchy" enough to fix >> them. >> >> -jason >> >> On May 22, 2006, at 7:56 PM, Michael Rogoff wrote: >> >>> >>> As best as I can tell, using Bio::SeqIO to parse a uniprot file >>> ignores the >>> sequence version, and calling seq_version() on the resulting >>> RichSeq object >>> returns undef. >>> >>> It looks like swiss.pm is trying to parse the version out >> of the SV >>> line, which >>> apparently doesn't exist any more? The sequence version(s) >> are now >>> specified as >>> part of the Date (DT) lines. >>> >>> Is this not a bug? Is swiss.pm not designed to parse uniprot files? >>> >>> Thanks for any help ... >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From lstein at cshl.edu Fri May 26 10:38:29 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 26 May 2006 10:38:29 -0400 Subject: [Bioperl-l] URGENT: Bio::Graphics::Panel make the ruler have negative (-) position numbering imagemap making In-Reply-To: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com> References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com> Message-ID: <200605261038.30380.lstein@cshl.edu> Hi, For some reason I didn't see the first posting on this. 
In current bioperl live, the ruler can have negative numberings - I use this routinely. You need to create a feature that starts in negative coordinates. What is happening to you when you try this?

Lincoln

On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> Hi
> thanks for the help offered thus far!
> sigh I am trying to annotate TFBS on a -1000 to +50 bp promoter seq using
> bioperl. therefore i was asked to make the numberings as such (-1000) is
> there any way at all to do this in bioperl without changing the .pm file?
>
> thanks guys..
> kevin

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From jelenaob at gmail.com Fri May 26 12:47:05 2006
From: jelenaob at gmail.com (Jelena Obradovic)
Date: Fri, 26 May 2006 09:47:05 -0700
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
Message-ID: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com>

Hi there,

I have tried loading an enzyme list from the REBASE file bairoch.605 using Bio::Restriction::IO.

1. For some reason the number of enzymes in the list is always 532, which is the default set of enzymes in the enzyme collection.

Is there any known issue with this module or a workaround?

And here is the code I have been using:

my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",-format=>"Bairoch")
    || die "can't load the file bairoch.605: $!";
my $enzymes = $re_in->read;
print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n";

2. The other problem is that using the lower-case format name throws an exception, but when "B" is capitalized it is ok. I assume it cannot load the file and does not initialize the enzyme collection properly.
Can't call method "each_enzyme" on an undefined value at
.../cgi-bin/seq-load.pl line 51.

Any thoughts?

Thanks in advance,

Jelena Obradovic
jelenaob at gmail.com

From cjfields at uiuc.edu Fri May 26 15:27:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 26 May 2006 14:27:13 -0500
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com>
Message-ID: <002601c680fa$644635a0$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jelena Obradovic
> Sent: Friday, May 26, 2006 11:47 AM
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
>
> Hi there,
>
> I have tried loading enzyme list from a file REBASE bairoch.605 using
> Bio::Restriction::IO;
>
> 1. But for some reason the number of enzymes in the list is always 532
> which is a default set of enzymes in enzyme collection.
>
> Is there any known issue with this module or a workaround?
>
> And here is the code I have been using:
>
> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",-format=>"Bairoch")
>     || die "can't load the file bairoch.605: $!";
> my $enzymes = $re_in->read;
> print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n";

my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",-format=>"Bairoch");

should be

my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",-format=>"bairoch");

Note the case change for the format; this is noted in the bug report you submitted earlier. Bio::Restriction::IO works similarly to Bio::SeqIO (i.e. requires a specific format, which I believe is case-sensitive).
Judging by the modules in Bio/Restriction/IO directory, looks like the Bio::Restriction::IO format should match one of the following formats: bairoch, itype2, withrefm, and you can also build your own if needed using the previous as examples and implementing Bio::Restriction::IO::base. > 2. The other problem is when trying to use format that is lower-case > it throws an exception, but when "B" is capitalized it is ok. > I assume it cannot load a file and does not initilize enzyme > collection properly. > > Can't call method "each_enzyme" on an undefined value at > .../cgi-bin/seq-load.pl line 51. My guess? The reason it works with an uppercase ('Bairoch') is that it can't find the module and uses the default set of enzymes as a fallback. The exception that you reported when you use lowercase ('bairoch') is real and I reported it as a bug (there are a few I found in that module). You might want to try using one of the other formats if you can get the files in the right format from REBASE. I'm looking into the bugs specifically associated with Bio::Restriction::IO::bairoch. > Any thoughts? > > > Thanks in advance, > > > Jelena Obradovic > jelenaob at gmail.com > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From osborne1 at optonline.net Fri May 26 15:43:18 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Fri, 26 May 2006 15:43:18 -0400 Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file In-Reply-To: <002601c680fa$644635a0$15327e82@pyrimidine> Message-ID: Chris, SeqIO's arguments are case-insensitive (e.g. 'fasta', 'Fasta', 'FASTA' should work). This is what the documentation says and what the code seems to suggest. This is probably what the Restriction modules should do as well. Brian O. 
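
A minimal sketch of the case-insensitive lookup Brian suggests, assuming the requested format name is simply lower-cased before it is mapped to a module under Bio/Restriction/IO. The helper normalize_format and the %KNOWN_FORMATS table are hypothetical and are not part of BioPerl; the three format names are the ones Chris lists above.

```perl
use strict;
use warnings;

# Hypothetical table of supported formats (the names mentioned in this thread).
my %KNOWN_FORMATS = map { $_ => 1 } qw(bairoch itype2 withrefm);

# Hypothetical helper, not a BioPerl function: lower-case the requested
# format and die early if no matching module name is known.
sub normalize_format {
    my ($format) = @_;
    die "No format given\n" unless defined $format && length $format;
    my $name = lc $format;
    die "Unknown format '$format'\n" unless $KNOWN_FORMATS{$name};
    return $name;
}

# Show which module name each spelling would resolve to.
for my $requested (qw(Bairoch BAIROCH withrefm)) {
    my $name = normalize_format($requested);
    printf "%s resolves to Bio::Restriction::IO::%s\n", $requested, $name;
}
```

With a helper along these lines, 'Bairoch' and 'bairoch' would resolve to the same module name, and an unsupported name such as 'withref' would fail immediately instead of leaving the caller with an undefined enzyme collection.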
From cjfields at uiuc.edu Fri May 26 16:21:03 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 26 May 2006 15:21:03 -0500
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To:
Message-ID: <002701c68101$e9432540$15327e82@pyrimidine>

Okay, my bad. Having the format be case-insensitive makes sense and is probably an easy fix, but there seem to be more serious issues with the Bio::Restriction::IO modules at the moment. None have implemented write methods though POD implies they work:

SYNOPSIS

    use Bio::Restriction::IO;

    $in  = Bio::Restriction::IO->new(-file => "inputfilename",
                                     -format => 'withrefm');
    $out = Bio::Restriction::IO->new(-file => ">outputfilename",
                                     -format => 'bairoch');
    my $res = $in->read; # a Bio::Restriction::EnzymeCollection
    $out->write($res);

and no tests exist for Bio::Restriction::IO::bairoch yet. In fact, the tests are pretty confusing; when did we allow this syntax: '-format => 8'?

Anyway, I'm muddling my way through this and will probably write something up for the project priority list if I can't work this bug out.

Chris

> -----Original Message-----
> From: Brian Osborne [mailto:osborne1 at optonline.net]
> Sent: Friday, May 26, 2006 2:43 PM
> To: Chris Fields; 'Jelena Obradovic'; Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::Restriction::IO and REBASE file
>
> Chris,
>
> SeqIO's arguments are case-insensitive (e.g. 'fasta', 'Fasta', 'FASTA'
> should work). This is what the documentation says and what the code seems
> to suggest. This is probably what the Restriction modules should do as well.
>
> Brian O.

From andreas.bender at complife.org Fri May 26 10:50:03 2006
From: andreas.bender at complife.org (Andreas Bender (CompLife'06))
Date: Fri, 26 May 2006 10:50:03 -0400
Subject: [Bioperl-l] Bioperl-based Applications for "Free Software" Session?
Message-ID:

Dear All,

Did anyone of you implement some cool programs/tools using Bioperl?
Or is there someone from the Bioperl core team who wants to present Bioperl itself at our conference?

We are holding a "free software" session (free at least as in free beer, ideally also open source, some GNU-type license) at our "Computational Life Sciences" Conference in Cambridge/UK later this year and you are warmly welcome to present your software there.

Please contact me directly or visit the website in case of any questions.

Enjoy the weekend,
Andreas

Call for Contributions

==================================================
LIFE SCIENCE FREE SOFTWARE SESSION
held at CompLife 2006 (http://www.complife.org)
in Cambridge, United Kingdom, on September 27 - 29, 2006
==================================================

In recent years more and more free and open source software has been developed for chemo- and bioinformatics, molecular modelling or other Life Science applications, but many of the programs are not well known. During the CompLife 2006 conference we will organize a special session dedicated to this type of free software. The demo session will be preceded by a short session with room for brief introductory presentations, whereas the demo session itself will allow attendees to see the tools in action. Authors of free software will have the opportunity to present their program to the CompLife audience, which will consist of researchers and users from computer science, biology, chemistry and everything in between.

If you are interested in the free software session, send us an email at fss at complife.org and briefly describe your program and how you intend to present it at the conference (1-2 pages max - please include URL to downloadable version where available). The only restrictions are that the program must be freely available for everyone or even open source and that it must be related to Life Science applications.

The deadline for these proposals is June 16th, 2006. In mid July we will notify you if your software demo was accepted.
************************ -- Computational Life Sciences '06 Cambridge/UK, 27-29 September 2006: Visit http://www.complife.org for more information! Andreas Kieron Patrick Bender - http://www.andreasbender.de Novartis Institutes for BioMedical Research, Cambridge/MA From cjfields at uiuc.edu Fri May 26 17:19:08 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 26 May 2006 16:19:08 -0500 Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file In-Reply-To: <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com> Message-ID: <002b01c6810a$06642400$15327e82@pyrimidine> The POD documentation is a bit misleading for Bio::Restriction::IO. Brian's right, there needs to be more flexibility with the case for the formats used. I found a few other odd things as well which I may file bug reports for. Looks like another post for the project priority list. Chris _____ From: Jelena Obradovic [mailto:jobradovic at gmail.com] Sent: Friday, May 26, 2006 3:56 PM To: Chris Fields Cc: Jelena Obradovic; Bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Bio::Restriction::IO and REBASE file Hi guys, I tried with the other formats, and it works fine with "withrefm" format but not with "withref". Thanks a lot for your reponse. Cheers, Jelena On 5/26/06, Chris Fields wrote: > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Jelena Obradovic > Sent: Friday, May 26, 2006 11:47 AM > To: Bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file > > Hi there, > > I have tried loading enzyme list from a file REBASE bairoch.605 using > Bio::Restriction::IO; > > 1. But for some reason the number of enzymes in the list is always 532 > which is a default set of enzymes in enzyme collection. > > Is there any known issue with this module or a workaround? 
> > And here is the code I have been using: > > my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- > format=>"Bairoch") > || die "can't load the file bairoch.605: $!"; > my $enzymes = $re_in->read; > print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n"; my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- format=>"Bairoch"); should be my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- format=>"bairoch"); Note the case change for the format; this is noted in the bug report you submitted earlier. Bio::Restriction::IO works similarly to Bio::SeqIO ( i.e. requires a specific format, which I believe is case-sensitive). Judging by the modules in Bio/Restriction/IO directory, looks like the Bio::Restriction::IO format should match one of the following formats: bairoch, itype2, withrefm, and you can also build your own if needed using the previous as examples and implementing Bio::Restriction::IO::base. > 2. The other problem is when trying to use format that is lower-case > it throws an exception, but when "B" is capitalized it is ok. > I assume it cannot load a file and does not initilize enzyme > collection properly. > > Can't call method "each_enzyme" on an undefined value at > .../cgi-bin/seq-load.pl line 51. My guess? The reason it works with an uppercase ('Bairoch') is that it can't find the module and uses the default set of enzymes as a fallback. The exception that you reported when you use lowercase ('bairoch') is real and I reported it as a bug (there are a few I found in that module). You might want to try using one of the other formats if you can get the files in the right format from REBASE. I'm looking into the bugs specifically associated with Bio::Restriction::IO::bairoch. > Any thoughts? 
> > > Thanks in advance, > > > Jelena Obradovic > jelenaob at gmail.com > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jelena Obradovic Email: jobradovic at gmail.com From jay at jays.net Sat May 27 12:47:27 2006 From: jay at jays.net (Jay Hannah) Date: Sat, 27 May 2006 11:47:27 -0500 Subject: [Bioperl-l] "Project OpenLab" (working title) Message-ID: <4478829F.5030508@jays.net> Hola -- We've been kicking around this idea for a few months now. I'm threatening to start coding. Once I do I might not sleep for a few weeks so I thought I'd solicit feedback now. :) "Project OpenLab": http://omaha.pm.org/kwiki/?BioPerl - Does any such project already exist? - If there's no other obvious choice already bent to BioPerl / BioPerl DB / BioSQL, I'll probably be writing the web framework in Perl's Template Toolkit. The server is Linux, Apache, mySQL (BioPerl DBs). - I'll be using BioPerl objects for the persistence layer as much as possible. Where not possible I'll ask this list about my patches/additions/ugly hackery. - I'll be discussing my back office tables like "users" that don't belong in bioperl-db; and my questions about new tables that might belong there on the BioSQL-l mailing list. - I'm not a computer language zealot (usually), so I'm open to out-of-the-box ideas from anyone. - I'm a biology newb with a long Perl/database/web/e-commerce background, so please feel free to point out any bio idiocy I engage in. Thanks for your time, j From fernan at iib.unsam.edu.ar Sat May 27 18:30:44 2006 From: fernan at iib.unsam.edu.ar (Fernan Aguero) Date: Sat, 27 May 2006 19:30:44 -0300 Subject: [Bioperl-l] "Project OpenLab" (working title) In-Reply-To: <4478829F.5030508@jays.net> References: <4478829F.5030508@jays.net> Message-ID: <20060527223044.GA40583@iib.unsam.edu.ar> +----[ Jay Hannah (27.May.2006 15:15): | | Hola -- Hola! 
| We've been kicking around this idea for a few months now. I'm threatening to start coding. Once I do I might not sleep for a few weeks so I thought I'd solicit feedback now. :) | | "Project OpenLab": | http://omaha.pm.org/kwiki/?BioPerl | | - Does any such project already exist? mmm ... maybe ... both GUS (Genomics Unified Schema: gusdb.org, though not developed around bioperl) and GMOD (Generic Model Organism Database: gmod.org) provide you with i) RDBMS storage ii) a Perl object layer iii) a web app framework Though certainly overkill for the needs you describe in the wiki, they can be customized to work in the way you describe or at least serve as a guide. | - If there's no other obvious choice already bent to BioPerl / BioPerl DB / BioSQL, I'll probably be writing the web framework in Perl's Template Toolkit. The server is Linux, Apache, mySQL (BioPerl DBs). Have you considered Perl Catalyst? It has the benefits of allowing you to work with bioperl modules naturally (it's Perl!) a choice of templating toolkits (Template Toolkit, Mason, among others) and will provide you with an almost ready to go controller/url dispatcher. | - I'll be using BioPerl objects for the persistence layer as much as possible. Where not possible I'll ask this list about my patches/additions/ugly hackery. | - I'll be discussing my back office tables like "users" that don't belong in bioperl-db; and my questions about new tables that might belong there on the BioSQL-l mailing list. | - I'm not a computer language zealot (usually), so I'm open to out-of-the-box ideas from anyone. | - I'm a biology newb with a long Perl/database/web/e-commerce background, so please feel free to point out any bio idiocy I engage in. 
|
| Thanks for your time,
|
| j
|
+----]

Good luck,

Fernan

From epsteinj at mail.nih.gov Fri May 26 14:46:32 2006
From: epsteinj at mail.nih.gov (Epstein, Jonathan A (NIH/NICHD) [E])
Date: Fri, 26 May 2006 14:46:32 -0400
Subject: [Bioperl-l] URGENT: Bio::Graphics::Panel make the ruler have negative (-) position numbering imagemap making
In-Reply-To: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
Message-ID: <42504F69898FE546B3F0238C9BD032750915F8@NIHCESMLBX7.nih.gov>

While this is being discussed and we have Lincoln's attention: in example 4 on the Biographics Howto:
http://stein.cshl.org/genome_informatics/BioGraphics/Graphics-HOWTO.html
how can one assign directional arrows to the graded segments which represent the BLAST hits? I.e., is there a glyph type which is both an 'arrow' and a 'graded_segment'? What other techniques do you recommend for associating directionality with these hits?

Thanks & regards,

Jonathan

From jobradovic at gmail.com Fri May 26 16:55:35 2006
From: jobradovic at gmail.com (Jelena Obradovic)
Date: Fri, 26 May 2006 13:55:35 -0700
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <002601c680fa$644635a0$15327e82@pyrimidine>
References: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com> <002601c680fa$644635a0$15327e82@pyrimidine>
Message-ID: <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com>

Hi guys,

I tried with the other formats, and it works fine with "withrefm" format but not with "withref".

Thanks a lot for your response.
Cheers, Jelena On 5/26/06, Chris Fields wrote: > > > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Jelena Obradovic > > Sent: Friday, May 26, 2006 11:47 AM > > To: Bioperl-l at lists.open-bio.org > > Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file > > > > Hi there, > > > > I have tried loading enzyme list from a file REBASE bairoch.605 using > > Bio::Restriction::IO; > > > > 1. But for some reason the number of enzymes in the list is always 532 > > which is a default set of enzymes in enzyme collection. > > > > Is there any known issue with this module or a workaround? > > > > And here is the code I have been using: > > > > my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- > > format=>"Bairoch") > > || die "can't load the file bairoch.605: $!"; > > my $enzymes = $re_in->read; > > print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n"; > > my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- > format=>"Bairoch"); > > should be > > my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- > format=>"bairoch"); > > Note the case change for the format; this is noted in the bug report you > submitted earlier. Bio::Restriction::IO works similarly to Bio::SeqIO ( > i.e. > requires a specific format, which I believe is case-sensitive). Judging > by > the modules in Bio/Restriction/IO directory, looks like the > Bio::Restriction::IO format should match one of the following formats: > bairoch, itype2, withrefm, and you can also build your own if needed using > the previous as examples and implementing Bio::Restriction::IO::base. > > > 2. The other problem is when trying to use format that is lower-case > > it throws an exception, but when "B" is capitalized it is ok. > > I assume it cannot load a file and does not initilize enzyme > > collection properly. 
> > > > Can't call method "each_enzyme" on an undefined value at > > .../cgi-bin/seq-load.pl line 51. > > My guess? The reason it works with an uppercase ('Bairoch') is that it > can't find the module and uses the default set of enzymes as a fallback. > The exception that you reported when you use lowercase ('bairoch') is real > and I reported it as a bug (there are a few I found in that module). > > You might want to try using one of the other formats if you can get the > files in the right format from REBASE. I'm looking into the bugs > specifically associated with Bio::Restriction::IO::bairoch. > > > Any thoughts? > > > > > > Thanks in advance, > > > > > > Jelena Obradovic > > jelenaob at gmail.com > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- Jelena Obradovic Email: jobradovic at gmail.com From gad14 at cornell.edu Fri May 26 16:02:33 2006 From: gad14 at cornell.edu (Genevieve DeClerck) Date: Fri, 26 May 2006 16:02:33 -0400 Subject: [Bioperl-l] results problem with StandAloneBlast Message-ID: <44775ED9.4020208@cornell.edu> Hi, I'm running local blast with Bio::Tools::Run::StandAloneBlast. Everything seems to work ok up to the point of accessing the results. I am able to print the results but when I try to do more than one thing with the result, nothing is returned for the second activity.. I'd like to first sort the results into groups of results that hit the db seq once, twice, three times, etc - where the results are stored as SeqFeature objects in temporary arrays whose contents are printed sequentially to stdout when the whole sort is complete. Secondly, I need to print the results in Hit Table (i.e. -m 8) format to stdout. If I've sorted the results the sorted-results will print to screen, however when I try to print the Hit Table results nothing is returned, as if the blast results have evaporated.... 
and visa versa, if i comment out the part where i point my sorting subroutine to the blast results reference, my hit table results suddenly prints to screen. It's almost like the reference to the SearchIO obj that holds the StandAloneBlast results is lost after one use?? (I'm beginning to think there is something naive about the way I'm using references?..) Here's an abbreviated version of my code: my $ref_seq_objs; # ref to array of Sequence obj's my $genome_seq; # fasta containing 1 genomic sequence my @params = ('program' => 'blastn', 'database' => $genome_seq, ); my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); my $blast_report = $factory->blastall($ref_seq_objs); #OK ####### ### the following 2 actions seem to be mutually exclusive. # 1) sort results into 1-hitter, 2-hitter, etc. groups of # SeqFeature objs stored in arrays. arrays are then printed # to stdout &sort_results($blast_report); # 2) print blast results &print_blast_results($blast_report); ####### sub print_blast_results{ my $report = shift; while(my $result = $report->next_result()){ while(my $hit = $result->next_hit()){ while(my $hsp = $hit->next_hsp()){ my $q_name = $hsp_q_seq_obj->display_id; print join(", ",$q_name,$hit->name,$hsp->bits)."\n"; } } } } I'm about to lose my mind on this... any assistance appreciated! Thanks, Genevieve From rvosa at sfu.ca Sun May 28 03:43:23 2006 From: rvosa at sfu.ca (Rutger Vos) Date: Sun, 28 May 2006 00:43:23 -0700 Subject: [Bioperl-l] "Project OpenLab" (working title) In-Reply-To: <4478829F.5030508@jays.net> References: <4478829F.5030508@jays.net> Message-ID: <4479549B.5030202@sfu.ca> The TreeBaseII team (part of the cipres project: http://www.phylo.org) are working on a lab database system for storage of intermediate calculation results and data (sequence alignments, trees, taxon sets). I think what you're discussing is a bit more molecular and less phylogenetic, but it does sound similar in spirit. 
Rutger Jay Hannah wrote: > Hola -- > > We've been kicking around this idea for a few months now. I'm threatening to start coding. Once I do I might not sleep for a few weeks so I thought I'd solicit feedback now. :) > > "Project OpenLab": > http://omaha.pm.org/kwiki/?BioPerl > > - Does any such project already exist? > - If there's no other obvious choice already bent to BioPerl / BioPerl DB / BioSQL, I'll probably be writing the web framework in Perl's Template Toolkit. The server is Linux, Apache, mySQL (BioPerl DBs). > - I'll be using BioPerl objects for the persistence layer as much as possible. Where not possible I'll ask this list about my patches/additions/ugly hackery. > - I'll be discussing my back office tables like "users" that don't belong in bioperl-db; and my questions about new tables that might belong there on the BioSQL-l mailing list. > - I'm not a computer language zealot (usually), so I'm open to out-of-the-box ideas from anyone. > - I'm a biology newb with a long Perl/database/web/e-commerce background, so please feel free to point out any bio idiocy I engage in. > > Thanks for your time, > > j > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- ++++++++++++++++++++++++++++++++++++++++++++++++++++ Rutger Vos, PhD. 
candidate Department of Biological Sciences Simon Fraser University 8888 University Drive Burnaby, BC, V5A1S6 Phone: 604-291-5625 Fax: 604-291-3496 Personal site: http://www.sfu.ca/~rvosa FAB* lab: http://www.sfu.ca/~fabstar Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ ++++++++++++++++++++++++++++++++++++++++++++++++++++ From cjfields at uiuc.edu Sun May 28 09:55:47 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 28 May 2006 08:55:47 -0500 Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file In-Reply-To: <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com> References: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com> <002601c680fa$644635a0$15327e82@pyrimidine> <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com> Message-ID: Again, it's b/c 'withrefm' is a valid Restriction::IO module and 'withref' is not. Similar to the case issue you saw before with 'bairoch.' Making this more lenient would help but there are more serious issues with these modules that need to be addressed... http://www.bioperl.org/wiki/Project_priority_list#Restriction_Enzymes Chris On May 26, 2006, at 3:55 PM, Jelena Obradovic wrote: > Hi guys, I tried with the other formats, and it works fine with > "withrefm" > format but not with "withref". > > Thanks a lot for your reponse. > > Cheers, > > Jelena > > On 5/26/06, Chris Fields wrote: >> >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Jelena Obradovic >>> Sent: Friday, May 26, 2006 11:47 AM >>> To: Bioperl-l at lists.open-bio.org >>> Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file >>> >>> Hi there, >>> >>> I have tried loading enzyme list from a file REBASE bairoch.605 >>> using >>> Bio::Restriction::IO; >>> >>> 1. But for some reason the number of enzymes in the list is >>> always 532 >>> which is a default set of enzymes in enzyme collection. 
>>> >>> Is there any known issue with this module or a workaround? >>> >>> And here is the code I have been using: >>> >>> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- >>> format=>"Bairoch") >>> || die "can't load the file bairoch.605: $!"; >>> my $enzymes = $re_in->read; >>> print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n"; >> >> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- >> format=>"Bairoch"); >> >> should be >> >> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- >> format=>"bairoch"); >> >> Note the case change for the format; this is noted in the bug >> report you >> submitted earlier. Bio::Restriction::IO works similarly to >> Bio::SeqIO ( >> i.e. >> requires a specific format, which I believe is case-sensitive). >> Judging >> by >> the modules in Bio/Restriction/IO directory, looks like the >> Bio::Restriction::IO format should match one of the following >> formats: >> bairoch, itype2, withrefm, and you can also build your own if >> needed using >> the previous as examples and implementing Bio::Restriction::IO::base. >> >>> 2. The other problem is when trying to use format that is lower-case >>> it throws an exception, but when "B" is capitalized it is ok. >>> I assume it cannot load a file and does not initilize enzyme >>> collection properly. >>> >>> Can't call method "each_enzyme" on an undefined value at >>> .../cgi-bin/seq-load.pl line 51. >> >> My guess? The reason it works with an uppercase ('Bairoch') is >> that it >> can't find the module and uses the default set of enzymes as a >> fallback. >> The exception that you reported when you use lowercase ('bairoch') >> is real >> and I reported it as a bug (there are a few I found in that module). >> >> You might want to try using one of the other formats if you can >> get the >> files in the right format from REBASE. I'm looking into the bugs >> specifically associated with Bio::Restriction::IO::bairoch. >> >>> Any thoughts? 
>>> >>> >>> Thanks in advance, >>> >>> >>> Jelena Obradovic >>> jelenaob at gmail.com >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > -- > Jelena Obradovic > Email: jobradovic at gmail.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From osborne1 at optonline.net Sun May 28 11:03:37 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Sun, 28 May 2006 11:03:37 -0400 Subject: [Bioperl-l] results problem with StandAloneBlast In-Reply-To: <44775ED9.4020208@cornell.edu> Message-ID: Genevieve, Does this simplified code, without the &sort_results($blast_report) line, work? By the way, no one can really help you here because you haven't shown us all of the code. The code you are showing certainly looks OK. Brian O. On 5/26/06 4:02 PM, "Genevieve DeClerck" wrote: > &sort_results($blast_report); From simon.rayner.mlist at gmail.com Mon May 29 03:37:24 2006 From: simon.rayner.mlist at gmail.com (mailing lists) Date: Mon, 29 May 2006 15:37:24 +0800 Subject: [Bioperl-l] installation problems with bioperl-ext on x86_64 running SuSE linux Message-ID: Hello, i'm having a problem trying to install the bioperl-ext package on my system. 
biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align # perl Makefile.PL Writing Makefile for Bio::Ext::Align biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align # make cc -c -I./libs -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -fPIC -O2 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -g -Wall -pipe -DVERSION=\"0.1\" -DXS_VERSION= \"0.1\" -fPIC "-I/usr/lib/perl5/5.8.7/x86_64-linux-thread-multi/CORE" -DPOSIX -DNOERROR Align.c In file included from Align.xs:12: ./libs/sw.h:1360:1: warning: "/*" within comment . . . Running Mkbootstrap for Bio::Ext::Align () chmod 644 Align.bs rm -f blib/arch/auto/Bio/Ext/Align/Align.so LD_RUN_PATH="" cc -shared -L/usr/local/lib64 Align.o -o blib/arch/auto/Bio/Ext/Align/Align.so libs/libsw.a -lm /usr/lib64/gcc/x86_64-suse-linux/4.0.2/../../../../x86_64-suse-linux/bin/ld: libs/libsw.a(aln.o): relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC libs/libsw.a: could not read symbols: Bad value collect2: ld returned 1 exit status make: *** [blib/arch/auto/Bio/Ext/Align/Align.so] Error 1 biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align # biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align # biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align # the -fPIC flag is already set in the makefile. I found a similar problem in an earlier posting with the following suggestions.... From: Aaron J. Mackey pcbi.upenn.edu> Subject: Re: compiling bioperl-ext Newsgroups: gmane.comp.lang.perl.bio.general Date: 2004-06-09 20:46:05 GMT (1 year, 50 weeks, 3 days, 3 hours and 50 minutes ago) 1) Are you starting with a clean build directory? 2) Does installing other compiled Perl modules work for you (e.g. Data::Dumper or Storable)? That's a pretty arcane error, and if the answer to #2 is "no", then I don't think we can help you. -Aaron ....In my case, both 1) and 2) are true. I installed Data::Dumper without any problems. 
I've found plenty of similar incidences for other sofware and it seems to relate to 32/64bit issues. Does anyone have any suggestions about how to get around this? thanks Simon Rayner From ULNJUJERYDIX at spammotel.com Mon May 29 05:46:21 2006 From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau) Date: Mon, 29 May 2006 17:46:21 +0800 Subject: [Bioperl-l] **Fwd: Re: URGENT: Bio::Graphics::Panel make the ruler have In-Reply-To: <200605261038.30380.lstein@cshl.edu> References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com> <200605261038.30380.lstein@cshl.edu> Message-ID: <5b6410e0605290246p8875c78n286caa672a55b4de@mail.gmail.com> Hi! oh it was in a slightly different header asking about the create image map feature. I am using the stable version 1.4 of bioperl now. In any case I have not added the sequence as a feature annotated seq. as I already have the bp where the TF binds (in 1-1050 numberings) so what I did was to just add graded segments based on the position. I saw that there is a scale function for the arrow glyp however, it is a multiply function, can it be hacked to take in a offset value (ie minus the scale by 1000?) cheers kevin Hi, > > For some reason I didn't see the first posting on this. In current bioperl > live, the ruler can have negative numberings - I use this routinely. You > need > to create a feature that starts in negative coordinates. What is happening > to > you when you try this? > > Lincoln > > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote: > > Hi > > thanks for the help offered thus far! > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promtoer seq > using > > bioperl. therefore i was asked to make the numberings as such (-1000) is > > there any way at all to do this in bioperl without changing the .pm > file? > > > > thanks guys.. 
> > kevin > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From shameer at ncbs.res.in Mon May 29 06:07:17 2006 From: shameer at ncbs.res.in (Shameer Khadar) Date: Mon, 29 May 2006 15:37:17 +0530 (IST) Subject: [Bioperl-l] Reg. Integrated Server / CGI to pass PDB to multiple Servers Message-ID: <49187.192.168.1.1.1148897237.squirrel@192.168.1.1> Dear All, My query may not be directly related to BioPERL, But am sure I will get some idea to move on. Some possibilities wil be available from Pise or related modules Query : --------- We have several public servers(say a,b,c). All of them will take a pdb-file as an input and process it and displays it. Now, I need to create a web page(a meta-server/integrated web-server) with three radio buttons(a,b,c) and a single input form(to accept pdb file from the users ...:( - File passing as an argument seems to be some what impossible to me). I need output as 3 links in next page. Is there any Bio-PERL module / CGI / Perl tricks to do it ? Thanks in advance, -- Shameer Khadar Prof. R. 
Sowdhamini's Lab (# 25) The Computational Biology Group National Centre for Biological Sciences (TIFR) UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India T - 91-080-23636420-32 EXT 4241 F - 91-080-23636662/23636675 W - http://caps.ncbs.res.in -------------------------------------------------- "Refrain from illusions, insist on work and not words, patiently seek divine and scientific truth." From torsten.seemann at infotech.monash.edu.au Tue May 30 02:41:31 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 30 May 2006 16:41:31 +1000 Subject: [Bioperl-l] results problem with StandAloneBlast In-Reply-To: <44775ED9.4020208@cornell.edu> References: <44775ED9.4020208@cornell.edu> Message-ID: <447BE91B.30001@infotech.monash.edu.au> > my $ref_seq_objs; # ref to array of Sequence obj's > my $genome_seq; # fasta containing 1 genomic sequence > my @params = ('program' => 'blastn', > 'database' => $genome_seq, > ); The database parameter needs to be the same thing you would pass to the "-d" option in "blastall". I don't think you can pass a perl string here. ie. there needs to be a properly formatted set of blast indices for your genome sequence on the disk in the appropriate place. See ftp://ftp.ncbi.nlm.nih.gov/blast/documents/blast.html > my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); > my $blast_report = $factory->blastall($ref_seq_objs); #OK But I could be wrong, and $blast_report here contains a valid BLAST report. 
-- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From sb at mrc-dunn.cam.ac.uk Tue May 30 03:59:28 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Tue, 30 May 2006 08:59:28 +0100 Subject: [Bioperl-l] results problem with StandAloneBlast In-Reply-To: <44775ED9.4020208@cornell.edu> References: <44775ED9.4020208@cornell.edu> Message-ID: <447BFB60.4000006@mrc-dunn.cam.ac.uk> Genevieve DeClerck wrote: > Hi, [snip] > If I've sorted the results the sorted-results will print to screen, > however when I try to print the Hit Table results nothing is returned, > as if the blast results have evaporated.... and visa versa, if i comment > out the part where i point my sorting subroutine to the blast results > reference, my hit table results suddenly prints to screen. [snip] > Here's an abbreviated version of my code: [snip] > ####### > ### the following 2 actions seem to be mutually exclusive. > # 1) sort results into 1-hitter, 2-hitter, etc. groups of > # SeqFeature objs stored in arrays. arrays are then printed > # to stdout > &sort_results($blast_report); > > # 2) print blast results > &print_blast_results($blast_report); > sub print_blast_results{ > my $report = shift; > while(my $result = $report->next_result()){ [snip] You didn't give us your sort_results subroutine, but is it as simple as they both use $report->next_result (and/or $result->next_hit), but you don't reset the internal counter back to the start, so the second subroutine tries to get the next_result and finds the first subroutine has already looked at the last result and so next_result returns false? From a quick look it wasn't obvious how to reset the counter. Hopefully this can be done and someone else knows how. 
From torsten.seemann at infotech.monash.edu.au Tue May 30 04:18:45 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 30 May 2006 18:18:45 +1000 Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef" Message-ID: <447BFFE5.8010508@infotech.monash.edu.au> FYI Bioperl developers: I just audited the bioperl-live CVS and found about 450 occurrences of "return undef". Page 199 of "Perl Best Practices" by Damian Conway, and this URL http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html suggest: "Use return; instead of return undef; if you want to return nothing. If someone assigns the return value to an array, the latter creates an array of one value (undef), which evaluates to true. The former will correctly handle all contexts." So I'm guessing at least some of these 450 occurrences *could* result in bugs and should probably be changed. Your opinion may differ :-) -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From cjfields at uiuc.edu Tue May 30 10:07:45 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 30 May 2006 09:07:45 -0500 Subject: [Bioperl-l] For CVS developers - potential pitfall with "returnundef" In-Reply-To: <447BFFE5.8010508@infotech.monash.edu.au> Message-ID: <000c01c683f2$6ca62570$15327e82@pyrimidine> Torsten, Any way you can post a list of some/all of the offending lines or modules? Sounds like something to consider, but if the list is as large as you say we may need something (bugzilla? wiki?) to track the changes and make sure they pass tests; I'm sure a large majority will. I'm guessing Jason would want this somewhere on the project priority list or bugzilla, with a link to the actual list, but I'm not sure. Maybe start a page on the wiki for proposed code changes?
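The pitfall quoted from "Perl Best Practices" is easy to reproduce in plain Perl. This is a minimal sketch; the sub names `find_undef` and `find_bare` are made up for illustration and do not come from the BioPerl code base.

```perl
use strict;
use warnings;

sub find_undef { return undef }    # one-element list (undef) in list context
sub find_bare  { return }          # empty list in list context, undef in scalar

my @from_undef = find_undef();
my @from_bare  = find_bare();

print "return undef: ", scalar(@from_undef), " element(s), array is ",
    ( @from_undef ? "true" : "false" ), "\n";
print "return:       ", scalar(@from_bare), " element(s), array is ",
    ( @from_bare ? "true" : "false" ), "\n";

# In scalar context the two forms are indistinguishable.
my $scalar_undef = find_undef();
my $scalar_bare  = find_bare();
print "scalar context: both undefined\n"
    if !defined $scalar_undef && !defined $scalar_bare;
```

One caveat when changing existing code: a bare `return` inside a list, such as a hash constructor `( key => some_sub(), other => 1 )`, contributes no element at all, which shifts the keys and values that follow. Each call site would therefore need checking rather than a blind search-and-replace.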
Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Torsten Seemann > Sent: Tuesday, May 30, 2006 3:19 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] For CVS developers - potential pitfall with > "returnundef" > > FYI Bioperl developers: > > I just audited the bioperl-live CVS and found about 450 occurrences of > "return undef". > > Page 199 of "Perl Best Practices" by Damian Conway, and this URL > http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html suggest: > > "Use return; instead of return undef; if you want to return nothing. If > someone assigns the return value to an array, the latter creates an > array of one value (undef), which evaluates to true. The former will > correctly handle all contexts." > > So I'm guessing at least some of these 450 occurrences *could* result in > bugs and should probably be changed. > > Your opinion may differ :-) > > -- > Dr Torsten Seemann http://www.vicbioinformatics.com > Victorian Bioinformatics Consortium, Monash University, Australia > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From lstein at cshl.edu Tue May 30 10:47:48 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 30 May 2006 10:47:48 -0400 Subject: [Bioperl-l] **Fwd: Re: URGENT: Bio::Graphics::Panel make the ruler have In-Reply-To: <5b6410e0605290246p8875c78n286caa672a55b4de@mail.gmail.com> References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com> <200605261038.30380.lstein@cshl.edu> <5b6410e0605290246p8875c78n286caa672a55b4de@mail.gmail.com> Message-ID: <200605301047.49127.lstein@cshl.edu> Hi Kevin, I'm afraid that there is no offset value. You'll need the 1.51 version of bioperl to handle negative numbers properly. I understand your reluctance to upgrade just to get the Bio::Graphics functionality. 
You might consider checking out just the Bio/Graphics subtree and installing that. It should work on top of 1.4 Lincoln On Monday 29 May 2006 05:46, Kevin Lam Koiyau wrote: > Hi! > oh it was in a slightly different header asking about the create image map > feature. > I am using the stable version 1.4 of bioperl now. In any case I have not > added the sequence as a feature annotated seq. as I already have the bp > where the TF binds (in 1-1050 numberings) so what I did was to just add > graded segments based on the position. > I saw that there is a scale function for the arrow glyp however, it is a > multiply function, can it be hacked to take in a offset value (ie minus the > scale by 1000?) > > cheers > kevin > > > Hi, > > > For some reason I didn't see the first posting on this. In current > > bioperl live, the ruler can have negative numberings - I use this > > routinely. You need > > to create a feature that starts in negative coordinates. What is > > happening to > > you when you try this? > > > > Lincoln > > > > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote: > > > Hi > > > thanks for the help offered thus far! > > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promtoer seq > > > > using > > > > > bioperl. therefore i was asked to make the numberings as such (-1000) > > > is there any way at all to do this in bioperl without changing the .pm > > > > file? > > > > > thanks guys.. > > > kevin > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > Lincoln D. 
Stein > > Cold Spring Harbor Laboratory > > 1 Bungtown Road > > Cold Spring Harbor, NY 11724 > > (516) 367-8380 (voice) > > (516) 367-8389 (fax) > > FOR URGENT MESSAGES & SCHEDULING, > > PLEASE CONTACT MY ASSISTANT, > > SANDRA MICHELSEN, AT michelse at cshl.edu > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Tue May 30 10:50:06 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 30 May 2006 09:50:06 -0500 Subject: [Bioperl-l] Bio::Restriction::IO issues Message-ID: <000f01c683f8$5771ed50$15327e82@pyrimidine> Jason, Brian, et al, I found several major issues with Bio::Restriction::IO (this popped up while bug squashing). In particular, the POD is pretty misleading. It states (directly from perldoc): SYNOPSIS use Bio::Restriction::IO; $in = Bio::Restriction::IO->new(-file => "inputfilename" , -format => 'withrefm'); $out = Bio::Restriction::IO->new(-file => ">outputfilename" , -format => 'bairoch'); my $res = $in->read; # a Bio::Restriction::EnzymeCollection $out->write($res); # or # use Bio::Restriction::IO; # # #input file format can be read from the file extension (dat|xml) # $in = Bio::Restriction::IO->newFh(-file => "inputfilename"); # $out = Bio::Restriction::IO->newFh('-format' => 'xml'); # # # World's shortest flat<->xml format converter: # print $out $_ while <$in>; So, I have found several problems with these modules. 
I really hate to criticize code here, as my own is pretty hacky, but I think these are things to seriously mull over: 1) Note that, though some of the lines above are commented they are still there in POD and thus present in perldoc/pod2html etc. So, judging from the above, it suggests using the script above should read in from one format and write out to another (like SeqIO). However, NONE of the current write() methods are implemented for any of the IO modules (withref, base, itype2, bairoch), so this does not happen as expected. You get the nasty thrown 'method not implemented error' instead when writing. 2) The commented statements in POD above also suggest that REBASE XML format is supported when there is no XML module. 3) The Bio::Restriction::IO::bairoch module had multiple bugs which made it unusable until I added a few small changes; it still can't handle multisite/multicut enzymes properly, so in essence it is useless until that is addressed. 4) Bio::Restriction::IO inherits from Bio::SeqIO, though I'm not sure why. Shouldn't it just inherit from Bio::Root::Root/Bio::Root::IO and make up it's own methods? I'm working on at least getting the 'bairoch' input format up and running (so at least it gets the enzymes into a Bio::Restriction::Enzyme::Collection). From this point I'm not sure where to proceed. The POD obviously needs to be corrected to reflect that writing formats is not implemented (and the bit about XML should be taken out completely); that's the easy part which I am working on and plan committing today. However, these modules don't seem to be used too frequently so I'm not sure whether it's worth spending too much time getting these up to speed at the moment (adding write methods, switching to Bio::Root::Root, etc); I have other priorities at the moment (including a way overdue ListSummary). 
I'm also not sure who else is (using|working) on these so I don't want to (make too many changes|step on someone else's toes), but these are, IMHO, pretty serious problems. Any thoughts? Chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Tue May 30 12:34:18 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 30 May 2006 11:34:18 -0500 Subject: [Bioperl-l] Bio::Restriction::IO changes Message-ID: <001401c68406$e71e9850$15327e82@pyrimidine> Jason, Brian, et al: I have made changes to the Bio::Restriction::IO POD to remove any reference to write functions since almost none have been implemented yet, so including this into POD is a bit misleading. At the moment, you can't write to any REBASE format except for 'base', which I found is the only one that works. And, upon further checking, even that one has issues: it looks like there are problems with multicut/multisite enzymes when writing in 'base' format which I'm not delving into ('TaqII' only displays one site when writing when it has two cut sites). I'll add this to the wiki and a bug report (enhancement) for this module. I am also removing mention of XML and 'bairoch' formats (the former isn't present and the latter is broken at the moment) and added a few things to the POD TO DO section. Rob (if you're out there somewhere in the ether), have you made any more changes to these modules that need to be committed? Didn't know if any of these issues have already been addressed/changed etc. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry University of Illinois Urbana-Champaign

From jelenaob at gmail.com Tue May 30 00:58:35 2006
From: jelenaob at gmail.com (Jelena Obradovic)
Date: Mon, 29 May 2006 21:58:35 -0700
Subject: [Bioperl-l] Bio::Graphic::Panel backgroud color
Message-ID: <5042a62b0605292158g187f4855hd93f76e0086ac27d@mail.gmail.com>

Hello everybody,

Does anybody know how to remove the background color of the Panel? Currently I am not adding anything to it, so that I can troubleshoot the problem, and I have tried setting every color attribute I could find on the panel, but no luck. Whatever I do, I get the BLUE border of the panel. Has anybody faced the same problem?

Thanks in advance,
Jelena

And here is the code I am currently using:
-----------------------------------------------------------------------------------------------------------
my $panel = Bio::Graphics::Panel->new(-length    => $prim_seq->length() + 200,
                                      -width     => 800,
                                      -pad_left  => 10,
                                      -pad_right => 10,
                                      -key_color => 'white',
                                      -bgcolor   => 'white',
                                      -gridcolor => 'black',
                                      -fgcolor   => 'black',
                                      -grid      => 0,
                                     );
my ($url,$map,$mapname) = $panel->image_and_map(
                              #-root=>'$root_url' ,
                              -url => '/tmpimages');
#make clickable image
print $cgi->img({-src=>$url,-usemap=>"#$mapname"});
print $map;
-----------------------------------------------------------------------------------------------------------

From luciap at sas.upenn.edu Tue May 30 14:49:48 2006
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Tue, 30 May 2006 14:49:48 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
Message-ID: <1149014988.447c93cc01761@128.91.55.38>

Hi

I am here again, I finally got to write the "collapse nodes" function and have a couple of questions. In order to collapse any node $node, I first have to get the parent, which I can do as

$parent=$node->ancestor

and then the children as:

@children=$node->get_all_Descendents

(or should I use each descendent?)

Then before deleting $node I have to assign all its children to $parent, and here is where I am kind of confused. Can I use the add_Descendent function for this? I've been trying to write something like this:

foreach $child (@children){
$parent=add_Descendent->$child;
}

but this doesn't work and I think it is because I don't have any idea of what I am doing

any suggestions?
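The order of operations asked about above can be sketched without BioPerl. The toy Node package below is a hypothetical stand-in, not the real Bio::Tree::Node; it only borrows the method names ancestor, each_Descendent, add_Descendent and remove_Descendent to show one plausible sequence: hand the direct children to the parent, then detach the collapsed node, and leave the root alone because it has no ancestor.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a tree node; NOT the real Bio::Tree::Node.
package Node;

sub new {
    my ($class, %args) = @_;
    return bless { id => $args{id}, parent => undef, children => [] }, $class;
}
sub id              { $_[0]{id} }
sub ancestor        { $_[0]{parent} }
sub each_Descendent { @{ $_[0]{children} } }    # direct children only

sub add_Descendent {
    my ($self, $child) = @_;
    $child->{parent} = $self;
    push @{ $self->{children} }, $child;
    return scalar @{ $self->{children} };
}

sub remove_Descendent {
    my ($self, $child) = @_;
    $self->{children} = [ grep { $_ != $child } @{ $self->{children} } ];
    $child->{parent} = undef;
    return;
}

package main;

# Collapse $node: give its direct children to its parent, then drop it.
# Returns 0 for the root, which has no ancestor to receive the children.
sub collapse_node {
    my ($node) = @_;
    my $parent = $node->ancestor or return 0;
    for my $child ($node->each_Descendent) {
        $parent->add_Descendent($child);
    }
    $node->{children} = [];              # the node no longer owns them
    $parent->remove_Descendent($node);
    return 1;
}

# Build ((A,B)low,C)root and collapse the internal node "low".
my $root = Node->new(id => 'root');
my $low  = Node->new(id => 'low');
$root->add_Descendent($low);
$root->add_Descendent(Node->new(id => 'C'));
$low->add_Descendent(Node->new(id => $_)) for qw(A B);

collapse_node($low);
my @ids = sort map { $_->id } $root->each_Descendent;
print "@ids\n";    # children of root after the collapse
```

The sketch moves only direct children. In BioPerl itself, get_all_Descendents is documented as recursive, so each_Descendent is probably the closer match for this step, but that is worth checking against the installed version.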
thanks

Lucia Peixoto
Department of Biology,SAS
University of Pennsylvania

From rvosa at sfu.ca Tue May 30 14:52:52 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Tue, 30 May 2006 11:52:52 -0700
Subject: [Bioperl-l] For CVS developers - potential pitfall with "returnundef"
In-Reply-To: <000c01c683f2$6ca62570$15327e82@pyrimidine>
References: <000c01c683f2$6ca62570$15327e82@pyrimidine>
Message-ID: <447C9484.9030102@sfu.ca>

Although I agree with the sentiment of following PBP, I'm not so sure changing 'return undef' to 'return' *now* will fix any bugs without introducing new, subtle ones.

Chris Fields wrote:
> Torsten,
>
> Any way you can post a list of some/all of the offending lines or modules?
> Sounds like something to consider, but if the list is as large as you say we
> may need something (bugzilla? wiki?) to track the changes and make sure
> they pass tests; I'm sure a large majority will.
>
> I'm guessing Jason would want this somewhere on the project priority list or
> bugzilla, with a link to the actual list, but I'm not sure. Maybe start a
> page on the wiki for proposed code changes?
>
> Chris
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann
>> Sent: Tuesday, May 30, 2006 3:19 AM
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] For CVS developers - potential pitfall with
>> "returnundef"
>>
>> FYI Bioperl developers:
>>
>> I just audited the bioperl-live CVS and found about 450 occurrences of
>> "return undef".
>>
>> Page 199 of "Perl Best Practices" by Damian Conway, and this URL
>> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html suggest:
>>
>> "Use return; instead of return undef; if you want to return nothing. If
>> someone assigns the return value to an array, the latter creates an
>> array of one value (undef), which evaluates to true. The former will
>> correctly handle all contexts."
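The quoted advice can be checked with a few lines of plain Perl. The two subroutine names below are made up for illustration and do not come from bioperl-live. The sketch shows the list-context pitfall described above, and also one way the change could alter existing behaviour: a bare return inside a larger list contributes no element at all.

```perl
use strict;
use warnings;

# Hypothetical lookups that both want to signal "nothing found".
sub find_with_undef { return undef }    # explicit undef
sub find_with_bare  { return }          # bare return

# List context: the explicit undef becomes a one-element list.
my @from_undef = find_with_undef();
my @from_bare  = find_with_bare();

printf "return undef: %d element(s), array is %s\n",
    scalar @from_undef, ( @from_undef ? 'true' : 'false' );
printf "return      : %d element(s), array is %s\n",
    scalar @from_bare, ( @from_bare ? 'true' : 'false' );

# Scalar context: both forms yield undef, so scalar callers see no change.
my $s_undef = find_with_undef();
my $s_bare  = find_with_bare();
print "scalar results both undefined\n"
    if !defined $s_undef && !defined $s_bare;

# Inside a larger list the two forms differ: the bare return vanishes,
# which shifts every following element by one position.
my @args_undef = ( 'key', find_with_undef(), 'next' );
my @args_bare  = ( 'key', find_with_bare(),  'next' );
printf "in a list: %d vs %d elements\n",
    scalar @args_undef, scalar @args_bare;
```

The last case is the kind of caller that a blanket replacement could break, for example a call placed directly inside a hash constructor or an argument list.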
>>
>> So I'm guessing at least some of these 450 occurrences *could* result in
>> bugs and should probably be changed.
>>
>> Your opinion may differ :-)
>>
>> --
>> Dr Torsten Seemann http://www.vicbioinformatics.com
>> Victorian Bioinformatics Consortium, Monash University, Australia

--
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++

From luciap at sas.upenn.edu Tue May 30 16:11:52 2006
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Tue, 30 May 2006 16:11:52 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
In-Reply-To:
References:
Message-ID: <1149019912.447ca7085124e@128.91.55.38>

Hi

OK that was silly, but what I have in my code is what you just wrote. But the problem is that if I write

$parent->add_Descendent($child)

it tells me that I am calling the method "add_Descendent" on an undefined value (but I did define $parent before??)

So here goes the code so far:

use Bio::TreeIO;
my $in = new Bio::TreeIO(-file => 'Test2.tre',
                         -format => 'newick');
my $out = new Bio::TreeIO(-file => '>mytree.out',
                          -format => 'newick');
while( my $tree = $in->next_tree ) {
    foreach my $node ( grep { ! $_->is_Leaf() } $tree->get_nodes() ) {
        my $bootstrap=$node->_creation_id;

        if ($bootstrap < 70 ){
            my $parent = $node->ancestor;
            my @children=$node->get_all_Descendents;
            foreach my $child (@children){
                $parent->add_Descendent($child);
            }

........

eventually I'll add (once I have assigned the children to the parent successfully):
            $tree->remove_Node($node);

        }
    }
    $out->write_tree($tree);
}

Quoting aaron.j.mackey at gsk.com:

> > foreach $child (@children){
> > $parent=add_Descendent->$child;
> > }
>
> I think what you want is $parent->add_Descendent($child)
>
> -Aaron
>

Lucia Peixoto
Department of Biology,SAS
University of Pennsylvania

From jason.stajich at duke.edu Tue May 30 16:30:56 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 30 May 2006 16:30:56 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
In-Reply-To: <1149019912.447ca7085124e@128.91.55.38>
References: <1149019912.447ca7085124e@128.91.55.38>
Message-ID: <6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu>

You need to special-case the root: it won't have an ancestor. Just protect the my $parent = $node->ancestor with an if statement, as I did below.

On May 30, 2006, at 4:11 PM, Lucia Peixoto wrote:

> So here it goes the code so far:
>
> use Bio::TreeIO;
> my $in = new Bio::TreeIO(-file => 'Test2.tre',
> -format => 'newick');
> my $out = new Bio::TreeIO(-file => '>mytree.out',
> -format => 'newick');
> while( my $tree = $in->next_tree ) {
> foreach my $node ( grep { ! $_->is_Leaf() } $tree->get_nodes() ) {
> my $bootstrap=$node->_creation_id;
>
> if ($bootstrap < 70 ){
>
>>> if( my $parent = $node->ancestor ) {
> my @children=$node->get_all_Descendents;
> foreach my $child (@children){
> $parent->add_Descendent($child);
> } }
>
> ........
>
> eventually I'll add (once I assigned the children to the parent
> succesfully):
> $tree->remove_Node($node);
>
> }
> }
> $out->write_tree($tree);
> }

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From cjfields at uiuc.edu Tue May 30 17:40:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 30 May 2006 16:40:18 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef"
In-Reply-To: <447C9484.9030102@sfu.ca>
Message-ID: <001801c68431$a586b2d0$15327e82@pyrimidine>

Agreed, though I think these changes should be implemented at some point (Conway's argument here makes sense and it is nice for Torsten to check this out). If proper tests are written then any changes resulting in errors should be picked up by checking the appropriate test suite, though I know it doesn't absolutely guarantee it.
; P

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Rutger Vos
> Sent: Tuesday, May 30, 2006 1:53 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith
> "returnundef"
>
> Although I agree with the sentiment of following PBP, I'm not so sure
> changing 'return undef' to 'return' *now* will fix any bugs without
> introducing new, subtle ones.

From rvosa at sfu.ca Tue May 30 17:58:25 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Tue, 30 May 2006 14:58:25 -0700
Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef"
In-Reply-To: <001901c68433$026b1ad0$15327e82@pyrimidine>
References: <001901c68433$026b1ad0$15327e82@pyrimidine>
Message-ID: <447CC001.4050000@sfu.ca>

I've been following the perl6 mailing lists for a while now. I think this time around it won't really take that long (one year?) for pugs/perl6 stacks to become more than just toys. I think especially large projects, like bioperl, will really benefit from the improved OO implementation in perl6, so it might be of interest to at least fantasize about it.
Chris Fields wrote:
> Ha! Or may be the 'nonexistent' bioperl-experimental. Wonder what'll
> happen once Perl6 comes to term?
>
> -CJF
>
>> -----Original Message-----
>> From: Rutger Vos [mailto:rvosa at sfu.ca]
>> Sent: Tuesday, May 30, 2006 4:48 PM
>> To: Chris Fields
>> Subject: Re: [Bioperl-l] For CVS developers - potential
>> pitfallwith"returnundef"
>>
>> Surely this will all sort itself out in bioperl6 ;-)

--
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++

From cjfields at uiuc.edu Tue May 30 18:08:26 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 30 May 2006 17:08:26 -0500
Subject: [Bioperl-l] For CVS developers - potentialpitfallwith"returnundef"
In-Reply-To: <447CC001.4050000@sfu.ca>
Message-ID: <001a01c68435$93135a50$15327e82@pyrimidine>

Agreed. I would say, probably 6-12 months time, might be a good idea to try getting something actually started, maybe under the 'bioperl-experimental' title Jason has mentioned.
One could always try getting a Bio::Root-like object going in Pugs/Perl6 as a starter and work up from there, with emphasis on key areas (seq. parsing, so on).

CJF

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Rutger Vos
> Sent: Tuesday, May 30, 2006 4:58 PM
> To: bioperl list
> Subject: Re: [Bioperl-l] For CVS developers -
> potentialpitfallwith"returnundef"
>
> I've been following the perl6 mailing lists for a while now. I think
> this time around it won't really take that long (one year?) for
> pugs/perl6 stacks to become more than just toys. I think especially
> large projects, like bioperl, will really benefit from the improved OO
> implementation in perl6, so it might be of interest to at least
> fantasize about it.

From ULNJUJERYDIX at spammotel.com Tue May 30 23:45:12 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Wed, 31 May 2006 11:45:12 +0800
Subject: [Bioperl-l] SOLVED Bio::Graphics::Panel make ruler have neg values
Message-ID: <5b6410e0605302045x5c420674x6f898a8a2973991a@mail.gmail.com>

I am so sorry for the truncated email; I accidentally hit reply.

If anyone is interested, I have opted to change line 161 of arrow.pm in Perl/site/lib/Bio/Graphics/Glyph/arrow.pm (in Linux it's /usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Glyph/arrow.pm) from

$gd->string($font,$middle,$center+$a2-1,$label,$font_color)

to

$gd->string($font,$middle,$center+$a2-1,$label-1000,$font_color)

just for this one-off use.

Strangely, at line 112 of arrow.pm in bioperl ver 1.51 I found a hidden option for a coords offset?

my $relative_coords_offset = $self->option('relative_coords_offset');
$relative_coords_offset = 1 unless defined $relative_coords_offset;

but entering the option -relative_coords_offset=>1000 in the arrow glyphs didn't do anything...

Hi!
> oh it was in a slightly different header asking about the create image map
> feature.
> I am using the stable version 1.4 of bioperl now. In any case I have not
> added the sequence as a feature annotated seq. as I already have the bp
> where the TF binds (in 1-1050 numberings) so what I did was to just add
> graded segments based on the position.
> I saw that there is a scale function for the arrow glyph; however, it is a
> multiply function. Can it be hacked to take in an offset value (ie minus the
> scale by 1000?)
>
> cheers
> kevin
>
> > Hi,
> >
> > For some reason I didn't see the first posting on this. In current bioperl
> > live, the ruler can have negative numberings - I use this routinely. You need
> > to create a feature that starts in negative coordinates. What is happening to
> > you when you try this?
> >
> > Lincoln
> >
> > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> > > Hi
> > > thanks for the help offered thus far!
> > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promoter seq using
> > > bioperl. therefore i was asked to make the numberings as such (-1000) is
> > > there any way at all to do this in bioperl without changing the .pm file?
> > >
> > > thanks guys..
> > > kevin
> >
> > --
> > Lincoln D. Stein
> > Cold Spring Harbor Laboratory
> > 1 Bungtown Road
> > Cold Spring Harbor, NY 11724
> > (516) 367-8380 (voice)
> > (516) 367-8389 (fax)
> > FOR URGENT MESSAGES & SCHEDULING,
> > PLEASE CONTACT MY ASSISTANT,
> > SANDRA MICHELSEN, AT michelse at cshl.edu

From sb at mrc-dunn.cam.ac.uk Wed May 31 04:40:08 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 31 May 2006 09:40:08 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447C7985.9000404@cornell.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk> <447C7985.9000404@cornell.edu>
Message-ID: <447D5668.7070500@mrc-dunn.cam.ac.uk>

Genevieve DeClerck wrote:
> Thanks for your comment Sendu, it was very helpful. I think this must be
> what's going on.. I am using $blast_report->next_result in both
> subroutines. It appears that analyzing the blast results first w/ my
> sort subroutine empties (?) the $blast_result object so that when I try
> to print, there is nothing left to print. (and vice-versa when I print
> first then try to sort).
> So, from the looks of things, using next_result has the effect of
> popping the Bio::Search::Result::ResultI objects off of the SearchIO
> blast report object??

Not quite. It's more or less exactly like opening a file and then trying to read it all twice like this:

    open(FILE, "file");
    while (<FILE>) {
        print; # prints each line in the file
    }
    while (<FILE>) {
        print; # never happens, we never enter this while loop
    }

To get the second while loop to print anything we need to say seek(FILE, 0, 0) before it. Or in the first while loop store each line in an array, and then make the second loop a foreach through that array.

> It seems I could get around this by making a copy of the blast report by
> setting it to another new variable...(not the most elegant solution) but
> I'm having trouble with this...
>
> If I do:
>
> my $blast_report_copy = $blast_report;
>
> I'm just copying the reference to the SearchIO blast result, so it
> doesn't help me. How can I make another physical copy of this blast
> result object? Seems like a simple thing but how to do it is escaping me.

Not really a good idea, and it may not work anyway if the object contains a filehandle. But for a simple object you might recursively loop through the data structure and copy each element out into a similar data structure.

> But better yet, the way to go is to 'reset the counter,' or to find a
> way to look at/print/sort the results without removing data from the
> blast result object. How is this done though??

It would be rather nice if this worked:

    my $blast_report = $factory->blastall($ref_seq_objs);
    my $blast_fh = $blast_report->fh();
    while (<$blast_fh>) {
        # $_ is a ResultI object, use as normal
    }
    seek($blast_fh, 0, 0); # this would be great, but does it work?
    while (<$blast_fh>) {
        # go through the results again in your second subroutine
    }

An alternative hacky way of doing it, which may also not work, would be to go through your $blast_report as normal, but then before going through it a second time, say

    my $fh = $blast_report->_fh;
    seek($fh, 0, 0);

Finally, the most sensible way (assuming bioperl provides no methods of its own for this) of solving the problem is, the first time you go through each next_result, next_hit and next_hsp, just store the returned objects in an array of arrays of arrays. Then the second time get the objects from your array structure instead of with the method calls.
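The store-then-reuse approach described above can be sketched in plain Perl. OnePassReport below is a hypothetical stand-in for a SearchIO report and the hash layout of each result is invented for the example; the only property borrowed from the real thing is that next_result can be walked once. A real script would push the objects returned by next_result, next_hit and next_hsp in the same way.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a SearchIO report: a one-pass iterator whose
# next_result() returns undef once everything has been handed out.
package OnePassReport;

sub new {
    my ($class, @items) = @_;
    return bless { queue => [@items] }, $class;
}
sub next_result { my ($self) = @_; return shift @{ $self->{queue} } }

package main;

my $report = OnePassReport->new(
    { query => 'seqA',
      hits  => [ { name => 'hit1', score => 50 },
                 { name => 'hit2', score => 90 } ] },
    { query => 'seqB',
      hits  => [ { name => 'hit3', score => 70 } ] },
);

# First (and only) pass over the iterator: keep every result.
my @results;
while ( my $result = $report->next_result ) {
    push @results, $result;
}

# The iterator is now exhausted, like a filehandle read to end of file ...
my $leftover = $report->next_result;

# ... but the cached array can be walked as many times as needed.
sub hits_by_score {
    my ($cached) = @_;
    return map  { $_->{name} }
           sort { $b->{score} <=> $a->{score} }
           map  { @{ $_->{hits} } } @$cached;
}

my @by_score = hits_by_score( \@results );     # "sort" subroutine
my @queries  = map { $_->{query} } @results;   # "print" subroutine

print "by score: @by_score\n";
print "queries:  @queries\n";
```

Caching trades memory for convenience: every result stays alive until @results goes out of scope, which may matter for very large reports.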
From heikki at sanbi.ac.za Wed May 31 06:55:18 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 31 May 2006 12:55:18 +0200
Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef"
In-Reply-To: <001801c68431$a586b2d0$15327e82@pyrimidine>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
Message-ID: <200605311255.19166.heikki@sanbi.ac.za>

In my opinion the sooner the bugs get exposed the better. It is much more likely that there is a well hidden bug caused by accidentally assigning undef into a one-element array than someone intentionally writing code that expects that behaviour!

I removed (but did not commit yet) all undefs from my old Bio::Variation code and could not see any differences in the test output.

Let's remove them!

-Heikki

On Tuesday 30 May 2006 23:40, Chris Fields wrote:
> Agreed, though I think these changes should be implemented at some point
> (Conway's argument here makes sense and it is nice for Torsten to check
> this out). If proper tests are written then any changes resulting in
> errors should be picked up by checking the appropriate test suite, though I
> know it doesn't absolutely guarantee it. ; P
>
> Chris

--
______ _/ _/_____________________________________________________
_/ _/
_/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za
_/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho
_/ _/ _/ SANBI, South African National Bioinformatics Institute
_/ _/ _/ University of the Western Cape, South Africa
_/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From heikki at sanbi.ac.za Wed May 31 06:44:28 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 31 May 2006 12:44:28 +0200
Subject: [Bioperl-l] Bio::Restriction::IO issues
In-Reply-To:
<000f01c683f8$5771ed50$15327e82@pyrimidine> References: <000f01c683f8$5771ed50$15327e82@pyrimidine> Message-ID: <200605311244.29187.heikki@sanbi.ac.za> Chris, Thanks for stepping in. I feel partly responsible here because I originally changed some of Rob's code but have not followed up since. There have not been active development on these modules so do not worry about stepping on anyone's toes. -Heikki On Tuesday 30 May 2006 16:50, Chris Fields wrote: > Jason, Brian, et al, > > I found several major issues with Bio::Restriction::IO (this popped up > while bug squashing). In particular, the POD is pretty misleading. It > states (directly from perldoc): > > SYNOPSIS > use Bio::Restriction::IO; > > $in = Bio::Restriction::IO->new(-file => "inputfilename" , > -format => 'withrefm'); > $out = Bio::Restriction::IO->new(-file => ">outputfilename" , > -format => 'bairoch'); > my $res = $in->read; # a Bio::Restriction::EnzymeCollection > $out->write($res); > > # or > > # use Bio::Restriction::IO; > # > # #input file format can be read from the file extension (dat|xml) > # $in = Bio::Restriction::IO->newFh(-file => "inputfilename"); > # $out = Bio::Restriction::IO->newFh('-format' => 'xml'); > # > # # World's shortest flat<->xml format converter: > # print $out $_ while <$in>; > > So, I have found several problems with these modules. I really hate to > criticize code here, as my own is pretty hacky, but I think these are > things to seriously mull over: > > 1) Note that, though some of the lines above are commented they are > still there in POD and thus present in perldoc/pod2html etc. So, judging > from the above, it suggests using the script above should read in from one > format and write out to another (like SeqIO). However, NONE of the current > write() methods are implemented for any of the IO modules (withref, base, > itype2, bairoch), so this does not happen as expected. You get the nasty > thrown 'method not implemented error' instead when writing. 
> 2) The commented statements in POD above also suggest that REBASE XML > format is supported when there is no XML module. > 3) The Bio::Restriction::IO::bairoch module had multiple bugs which > made it unusable until I added a few small changes; it still can't handle > multisite/multicut enzymes properly, so in essence it is useless until that > is addressed. > 4) Bio::Restriction::IO inherits from Bio::SeqIO, though I'm not sure > why. Shouldn't it just inherit from Bio::Root::Root/Bio::Root::IO and make > up it's own methods? > > I'm working on at least getting the 'bairoch' input format up and running > (so at least it gets the enzymes into a > Bio::Restriction::Enzyme::Collection). From this point I'm not sure where > to proceed. The POD obviously needs to be corrected to reflect that > writing formats is not implemented (and the bit about XML should be taken > out completely); that's the easy part which I am working on and plan > committing today. However, these modules don't seem to be used too > frequently so I'm not sure whether it's worth spending too much time > getting these up to speed at the moment (adding write methods, switching to > Bio::Root::Root, etc); I have other priorities at the moment (including a > way overdue ListSummary). I'm also not sure who else is (using|working) on > these so I don't want to (make too many changes|step on someone else's > toes), but these are, IMHO, pretty serious problems. > > Any thoughts? > > Chris > > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. 
of Biochemistry > University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Wed May 31 09:10:00 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 08:10:00 -0500 Subject: [Bioperl-l] Bio::Restriction::IO issues In-Reply-To: <200605311244.29187.heikki@sanbi.ac.za> References: <000f01c683f8$5771ed50$15327e82@pyrimidine> <200605311244.29187.heikki@sanbi.ac.za> Message-ID:

Heikki,

I mainly just changed a few things so no one would get the wrong idea from the POD (that the modules write formats as well) and added a few things to the TO DO. I also added a warning to Bio::Restriction::IO::bairoch for the multisite/multicut issue. Besides that I haven't done much to them.

I also added a bit to the Project Priority List in case someone wants to take it up. I may tinker with it but it's not really high on my priority list. I've been pretty busy getting the ListSummaries back up to speed (very busy mail lists since the last one) and am writing/testing a new interface to NCBI EUtilities which I may donate at some point in the next few months or so.

Chris

Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign

From jay at jays.net Wed May 31 09:07:10 2006 From: jay at jays.net (Jay Hannah) Date: Wed, 31 May 2006 08:07:10 -0500 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl Message-ID: <447D94FE.8090305@jays.net>

http://www.bioperl.org/wiki/Bptutorial.pl

I think I just partially fulfilled this TODO:

TODO: check if the POD is in the Wiki yet, and if not, put it here?

I used Pod::Simple::Wiki (format 'mediawiki') to burn bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it into the wiki page via my web browser. (Is that proper procedure? Is the plan to just do that manually from time to time as the document changes?)

Now what?

Should there be a new link on the far left of bioperl.org called "Tutorial"?

It's an amazing document. IMHO it should be listed prominently on bioperl.org.
HTH, j From osborne1 at optonline.net Wed May 31 09:58:01 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Wed, 31 May 2006 09:58:01 -0400 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: <447D94FE.8090305@jays.net> Message-ID: Jay, Excellent! Now we need to answer a few more questions for ourselves: - Do we remove the file bptutorial.pl from the package now? I'd say yes, we don't want to have to maintain two bptutorials. - What do we do with the script part of bptutorial.pl? It certainly could be excised and put into the examples/ directory, for example, but this would break a few of the paths that are being used. - A link to bptutorial? Or a link to the existing tutorials page? http://www.bioperl.org/wiki/Tutorials. Any thoughts on these? Brian O. On 5/31/06 9:07 AM, "Jay Hannah" wrote: > http://www.bioperl.org/wiki/Bptutorial.pl > > I think I just partially fulfilled this TODO: > > TODO: check if the POD is in the Wiki yet, and if not, put it here? > > I used Pod::Simple::Wiki (format 'mediawiki') to burn > bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it the > wiki page via my web browser. (Is that proper procedure? Is the plan to just > do that manually from time to time as the document changes?) > > Now what? > > Should there be a new link on the far left of bioperl.org called "Tutorial"? > > It's an amazing document. IMHO it should be listed prominently on bioperl.org. 
> > HTH, > > j > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From luciap at sas.upenn.edu Wed May 31 10:06:13 2006 From: luciap at sas.upenn.edu (Lucia Peixoto) Date: Wed, 31 May 2006 10:06:13 -0400 Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function In-Reply-To: <6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu> References: <1149019912.447ca7085124e@128.91.55.38> <6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu> Message-ID: <1149084373.447da2d5c5339@128.91.55.38> Hi Thanks a couple more questions why is the bootstrap value stored as the node id? Is that right? also, in the add_descendant method, how do you set the $ignoreoverwrite parameter to true? Lucia Quoting Jason Stajich : > you need to special case the root - it won't have an ancestor. just > protect the my $parent = $node->ancestor with an if statement as I > did below > > On May 30, 2006, at 4:11 PM, Lucia Peixoto wrote: > > > Hi > > OK that was silly, but what I have in my code is what you just wrote > > But the problem is that if I write > > > > $parent->add_Descendent($child) > > > > it tells me that I am calling the method "ass_Descendent" on an > > undefined value > > (but I did define $parent before??) > > > > So here it goes the code so far: > > > > use Bio::TreeIO; > > my $in = new Bio::TreeIO(-file => 'Test2.tre', > > -format => 'newick'); > > my $out = new Bio::TreeIO(-file => '>mytree.out', > > -format => 'newick'); > > while( my $tree = $in->next_tree ) { > > foreach my $node ( grep { ! $_->is_Leaf() } $tree->get_nodes() ) { > > my $bootstrap=$node->_creation_id; > > > > if ($bootstrap < 70 ){ > > >>> if( my $parent = $node->ancestor ) { > > my @children=$node->get_all_Descendents; > > foreach my $child (@children){ > > $parent->add_Descendent($child); > > } > } > > > > ........ 
> > > > eventually I'll add (once I assigned the children to the parent > > succesfully): > > $tree->remove_Node($node); > > > > } > > } > > $out->write_tree($tree); > > } > > > > Quoting aaron.j.mackey at gsk.com: > > > >>> foreach $child (@children){ > >>> $parent=add_Descendent->$child; > >>> } > >> > >> I think what you want is $parent->add_Descendent($child) > >> > >> -Aaron > >> > > > > > > Lucia Peixoto > > Department of Biology,SAS > > University of Pennsylvania > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > Lucia Peixoto Department of Biology,SAS University of Pennsylvania From sb at mrc-dunn.cam.ac.uk Wed May 31 10:56:49 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Wed, 31 May 2006 15:56:49 +0100 Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef" In-Reply-To: <200605311255.19166.heikki@sanbi.ac.za> References: <001801c68431$a586b2d0$15327e82@pyrimidine> <200605311255.19166.heikki@sanbi.ac.za> Message-ID: <447DAEB1.4040509@mrc-dunn.cam.ac.uk> Heikki Lehvaslaiho wrote: > In my opinion the sooner the bugs get exposed the better. It is much more > likely that there is a well hidden bug caused by assigning accidentally undef > into an one element array that someone intentionally writing code that > expects that behaviour! > > I removed (but did not commit yet) all undefs from my old Bio::Variation code > and could not see any differences in the test output. > > Let's remove them! Just looking for all return undef;s isn't enough. It's entirely possible to do something like: my $return_value; { # do something that assigns to return_value on success # on failure, just do nothing } return $return_value; The bioperl docs will typically explicitly state that undef is returned, and under what circumstance. 
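The point about implicit undef returns can be checked with a few lines of plain Perl. The sketch below is illustrative only: the three subroutine names are made up and nothing here is BioPerl code. It compares a literal "return undef", a return of an unassigned variable (the pattern shown above, which a search for "return undef" would miss), and a bare "return", each called in list context.

```perl
use strict;
use warnings;

# Hypothetical subs, named only for this illustration.
sub explicit_undef { return undef }                  # one value: undef
sub via_variable   { my $value; return $value }      # also one value: undef
sub bare_return    { return }                        # empty list in list context

my @from_explicit = explicit_undef();
my @from_variable = via_variable();
my @from_bare     = bare_return();

printf "explicit undef: %d element(s)\n", scalar @from_explicit;
printf "via variable:   %d element(s)\n", scalar @from_variable;
printf "bare return:    %d element(s)\n", scalar @from_bare;

# An array holding a single undef is still true in boolean context.
print "explicit undef tests true as an array\n" if @from_explicit;
print "bare return tests false as an array\n" unless @from_bare;

# In scalar context a bare return still yields undef.
my $scalar = bare_return();
print "bare return in scalar context is ",
    ( defined $scalar ? "defined" : "undef" ), "\n";
```

So an audit that only searches for the literal text would not catch the second form, although both behave the same way for a caller that assigns to an array.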
If a user suffers from the undef-into-array-problem, yes it can be slightly unexpected, but lots of unexpected things will happen when you don't use a method correctly, as per the docs! Fixing the return of undef is either a job that shouldn't be done, or a much harder job than expected. From bernd.web at gmail.com Wed May 31 10:30:30 2006 From: bernd.web at gmail.com (Bernd Web) Date: Wed, 31 May 2006 16:30:30 +0200 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: References: <447D94FE.8090305@jays.net> Message-ID: <716af09c0605310730o7de20489m674a07b5a928039d@mail.gmail.com> Hi, I am not sure to what extent bptutorial will be removed, but I actually like having bptutorial.pl in my BioPerl base for reference. regards, Bernd On 5/31/06, Brian Osborne wrote: > Jay, > > Excellent! Now we need to answer a few more questions for ourselves: > > - Do we remove the file bptutorial.pl from the package now? I'd say yes, we > don't want to have to maintain two bptutorials. > > - What do we do with the script part of bptutorial.pl? It certainly could be > excised and put into the examples/ directory, for example, but this would > break a few of the paths that are being used. > > - A link to bptutorial? Or a link to the existing tutorials page? > http://www.bioperl.org/wiki/Tutorials. > > Any thoughts on these? > > > Brian O. > > > On 5/31/06 9:07 AM, "Jay Hannah" wrote: > > > http://www.bioperl.org/wiki/Bptutorial.pl > > > > I think I just partially fulfilled this TODO: > > > > TODO: check if the POD is in the Wiki yet, and if not, put it here? > > > > I used Pod::Simple::Wiki (format 'mediawiki') to burn > > bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it the > > wiki page via my web browser. (Is that proper procedure? Is the plan to just > > do that manually from time to time as the document changes?) > > > > Now what? > > > > Should there be a new link on the far left of bioperl.org called "Tutorial"? 
> > > > It's an amazing document. IMHO it should be listed prominently on bioperl.org. > > > > HTH, > > > > j

From lstein at cshl.edu Wed May 31 12:03:13 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Wed, 31 May 2006 12:03:13 -0400 Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef" In-Reply-To: <200605311255.19166.heikki@sanbi.ac.za> References: <001801c68431$a586b2d0$15327e82@pyrimidine> <200605311255.19166.heikki@sanbi.ac.za> Message-ID: <200605311203.13922.lstein@cshl.edu>

I'm afraid that everything depends on the context. If the subroutine is documented to return a single scalar, then returning undef is appropriate. If the subroutine is documented to return "false" on failure, then one must call return (or "return ()" ).

Changing all the return undefs to return is going to expose hidden bugs in the code written by people who are using BioPerl. While I agree wholeheartedly with the proposed audit, I think we need to expect that people are going to complain.

Lincoln

-- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu

From cjfields at uiuc.edu Wed May 31 12:34:54 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 11:34:54 -0500 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: Message-ID: <001201c684d0$263c5530$15327e82@pyrimidine>

Brian, Jay,

I think it would be nice to have the tutorial prominently displayed somehow (Jay's suggestion), with a link provided via the tutorials page. Hopefully this will help with the bioperl newbies.

Jay, looks like there are still some weird formatting issues with the bptutorial wiki page, something which I ran into before when getting the Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more spaces preceding a line denotes code for some reason). Not much you can do in these cases except remove the extra spaces in those spots. Looking good though!
Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Brian Osborne > Sent: Wednesday, May 31, 2006 8:58 AM > To: Jay Hannah; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl > > Jay, > > Excellent! Now we need to answer a few more questions for ourselves: > > - Do we remove the file bptutorial.pl from the package now? I'd say yes, > we > don't want to have to maintain two bptutorials. > > - What do we do with the script part of bptutorial.pl? It certainly could > be > excised and put into the examples/ directory, for example, but this would > break a few of the paths that are being used. > > - A link to bptutorial? Or a link to the existing tutorials page? > http://www.bioperl.org/wiki/Tutorials. > > Any thoughts on these? > > > Brian O. > > > On 5/31/06 9:07 AM, "Jay Hannah" wrote: > > > http://www.bioperl.org/wiki/Bptutorial.pl > > > > I think I just partially fulfilled this TODO: > > > > TODO: check if the POD is in the Wiki yet, and if not, put it here? > > > > I used Pod::Simple::Wiki (format 'mediawiki') to burn > > bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it > the > > wiki page via my web browser. (Is that proper procedure? Is the plan to > just > > do that manually from time to time as the document changes?) > > > > Now what? > > > > Should there be a new link on the far left of bioperl.org called > "Tutorial"? > > > > It's an amazing document. IMHO it should be listed prominently on > bioperl.org. 
> > > > HTH, > > > > j > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Wed May 31 12:44:31 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 11:44:31 -0500 Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef" In-Reply-To: <200605311203.13922.lstein@cshl.edu> Message-ID: <001301c684d1$7e849fd0$15327e82@pyrimidine> My feeling is the test suite 'should' pick up a large majority of problems if changes are made to these lines, the quotes there indicating the utopian idea that the tests are all written well (I believe 99% of the tests are, BTW). You can always try the changes (wholesale or on smaller chunks of code), see if they pass tests on different OS's using 'make/nmake test', revert the ones that didn't pass, etc. It's a matter of someone willing to try it out. I think the original argument proposed here (originating from Damian Conway and 'Perl Best Practices') is maybe using 'return undef' is something we shouldn't be doing since this can lead to subtle errors itself. Not that everything we do is considered 'a good practice' by any means. If I remember correctly from 'OOPerl', Conway doesn't like combined get/setters either (he prefers separate getters and setters); we use the 'bad' combined version predominately in Bioperl. 
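For readers unfamiliar with the two accessor styles mentioned here, a minimal sketch follows. Demo::Combined and Demo::Separate are hypothetical classes written for this illustration, not BioPerl modules, and the sketch only shows the shape of each style; it does not try to reproduce Conway's arguments for preferring one over the other.

```perl
use strict;
use warnings;

package Demo::Combined;    # hypothetical class, not a BioPerl module

sub new { my ($class) = @_; return bless {}, $class }

# Combined get/setter: stores a value when one is passed, and always
# returns the current value.
sub name {
    my $self = shift;
    $self->{name} = shift if @_;
    return $self->{name};
}

package Demo::Separate;    # hypothetical class, not a BioPerl module

sub new { my ($class) = @_; return bless {}, $class }

# Separate getter: only reads.
sub get_name { my ($self) = @_; return $self->{name} }

# Separate setter: only writes.
sub set_name {
    my ( $self, $value ) = @_;
    $self->{name} = $value;
    return;    # bare return, the setter has nothing to hand back
}

package main;

my $combined = Demo::Combined->new;
$combined->name('EcoRI');
print $combined->name, "\n";

my $separate = Demo::Separate->new;
$separate->set_name('BamHI');
print $separate->get_name, "\n";
```

Note that the setter uses a bare return, which ties back to the thread's main topic: it gives an empty list in list context and undef in scalar context.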
Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Lincoln Stein > Sent: Wednesday, May 31, 2006 11:03 AM > To: bioperl-l at lists.open-bio.org > Cc: Heikki Lehvaslaiho > Subject: Re: [Bioperl-l] For CVS developers - potential > pitfallwith"returnundef" > > I'm afraid that everything depends on the context. If the subroutine is > documented to return a single scalar, then returning undef is appropriate. > If > the subroutine is documented to return "false" on failure, then one must > call > return (or "return ()" ). > > Changing all the return undefs to return is going to expose hidden bugs in > the > code written by people who are using BioPerl. While I agree wholeheartedly > with the proposed audit, I think we need to expect that people are going > to > complain. > > Lincoln > > > On Wednesday 31 May 2006 06:55, Heikki Lehvaslaiho wrote: > > In my opinion the sooner the bugs get exposed the better. It is much > more > > likely that there is a well hidden bug caused by assigning accidentally > > undef into an one element array that someone intentionally writing code > > that expects that behaviour! > > > > I removed (but did not commit yet) all undefs from my old Bio::Variation > > code and could not see any differences in the test output. > > > > Let's remove them! > > > > -Heikki > > > > On Tuesday 30 May 2006 23:40, Chris Fields wrote: > > > Agreed, though I think these changes should be implemented at some > point > > > (Conway's argument here makes sense and it is nice for Torsten to > check > > > this out). If proper tests are written then any changes resulting in > > > errors should be picked up by checking the appropriate test suite, > though > > > I know it doesn't absolutely guarantee it. 
; P > > > > > > Chris > > > > > > > -----Original Message----- > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > > bounces at lists.open-bio.org] On Behalf Of Rutger Vos > > > > Sent: Tuesday, May 30, 2006 1:53 PM > > > > To: bioperl-l at lists.open-bio.org > > > > Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith > > > > "returnundef" > > > > > > > > Although I agree with the sentiment of following PBP, I'm not so > sure > > > > changing 'return undef' to 'return' *now* will fix any bugs without > > > > introducing new, subtle ones. > > > > > > > > Chris Fields wrote: > > > > > Torsten, > > > > > > > > > > Any way you can post a list of some/all of the offending lines or > > > > > > > > modules? > > > > > > > > > Sounds like something to consider, but if the list is as large as > you > > > > > > > > say we > > > > > > > > > made need something (bugzilla? wiki?) to track the changes and > make > > > > > sure they pass tests; I'm sure a large majority will. > > > > > > > > > > I'm guessing Jason would want this somewhere on the project > priority > > > > > > > > list or > > > > > > > > > bugzilla, with a link to the actual list, but I'm not sure. Maybe > > > > > start > > > > > > > > a > > > > > > > > > page on the wiki for proposed code changes? > > > > > > > > > > Chris > > > > > > > > > >> -----Original Message----- > > > > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > > >> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann > > > > >> Sent: Tuesday, May 30, 2006 3:19 AM > > > > >> To: bioperl-l at lists.open-bio.org > > > > >> Subject: [Bioperl-l] For CVS developers - potential pitfall with > > > > >> "returnundef" > > > > >> > > > > >> FYI Bioperl developers: > > > > >> > > > > >> I just audited the bioperl-live CVS and found about 450 > occurrences > > > > >> of "return undef". 
> > > > >> > > > > >> Page 199 of "Perl Best Practices" by Damian Conway, and this URL > > > > >> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html > > > > >> suggest: > > > > >> > > > > >> "Use return; instead of return undef; if you want to return > nothing. > > > > >> If someone assigns the return value to an array, the latter > creates > > > > >> an array of one value (undef), which evaluates to true. The > former > > > > >> will correctly handle all contexts." > > > > >> > > > > >> So I'm guessing at least some of these 450 occurrences *could* > > > > >> result > > > > > > > > in > > > > > > > > >> bugs and should probably be changed. > > > > >> > > > > >> Your opinion may differ :-) > > > > >> > > > > >> -- > > > > >> Dr Torsten Seemann http://www.vicbioinformatics.com > > > > >> Victorian Bioinformatics Consortium, Monash University, Australia > > > > >> > > > > >> _______________________________________________ > > > > >> Bioperl-l mailing list > > > > >> Bioperl-l at lists.open-bio.org > > > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > _______________________________________________ > > > > > Bioperl-l mailing list > > > > > Bioperl-l at lists.open-bio.org > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > -- > > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > Rutger Vos, PhD. 
candidate > > > > Department of Biological Sciences > > > > Simon Fraser University > > > > 8888 University Drive > > > > Burnaby, BC, V5A1S6 > > > > Phone: 604-291-5625 > > > > Fax: 604-291-3496 > > > > Personal site: http://www.sfu.ca/~rvosa > > > > FAB* lab: http://www.sfu.ca/~fabstar > > > > Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ > > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > > > > > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l at lists.open-bio.org > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Wed May 31 10:59:53 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 31 May 2006 10:59:53 -0400 Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef" In-Reply-To: <200605311255.19166.heikki@sanbi.ac.za> References: <001801c68431$a586b2d0$15327e82@pyrimidine> <200605311255.19166.heikki@sanbi.ac.za> Message-ID: <949F348A-391B-495D-ABCE-30BABC37FF05@gmx.net> I agree. Thanks to Torsten for the audit and Chris for stepping up. -hilmar On May 31, 2006, at 6:55 AM, Heikki Lehvaslaiho wrote: > In my opinion the sooner the bugs get exposed the better. 
It is > much more > likely that there is a well hidden bug caused by assigning > accidentally undef > into an one element array that someone intentionally writing code that > expects that behaviour! > > I removed (but did not commit yet) all undefs from my old > Bio::Variation code > and could not see any differences in the test output. > > Let's remove them! > > -Heikki > > On Tuesday 30 May 2006 23:40, Chris Fields wrote: >> Agreed, though I think these changes should be implemented at some >> point >> (Conway's argument here makes sense and it is nice for Torsten to >> check >> this out). If proper tests are written then any changes resulting in >> errors should be picked up by checking the appropriate test suite, >> though I >> know it doesn't absolutely guarantee it. ; P >> >> Chris >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Rutger Vos >>> Sent: Tuesday, May 30, 2006 1:53 PM >>> To: bioperl-l at lists.open-bio.org >>> Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith >>> "returnundef" >>> >>> Although I agree with the sentiment of following PBP, I'm not so >>> sure >>> changing 'return undef' to 'return' *now* will fix any bugs without >>> introducing new, subtle ones. >>> >>> Chris Fields wrote: >>>> Torsten, >>>> >>>> Any way you can post a list of some/all of the offending lines or >>> >>> modules? >>> >>>> Sounds like something to consider, but if the list is as large >>>> as you >>> >>> say we >>> >>>> made need something (bugzilla? wiki?) to track the changes and make >>>> sure they pass tests; I'm sure a large majority will. >>>> >>>> I'm guessing Jason would want this somewhere on the project >>>> priority >>> >>> list or >>> >>>> bugzilla, with a link to the actual list, but I'm not sure. Maybe >>>> start >>> >>> a >>> >>>> page on the wiki for proposed code changes? 
>>>> >>>> Chris >>>> >>>>> -----Original Message----- >>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann >>>>> Sent: Tuesday, May 30, 2006 3:19 AM >>>>> To: bioperl-l at lists.open-bio.org >>>>> Subject: [Bioperl-l] For CVS developers - potential pitfall with >>>>> "returnundef" >>>>> >>>>> FYI Bioperl developers: >>>>> >>>>> I just audited the bioperl-live CVS and found about 450 >>>>> occurrences of >>>>> "return undef". >>>>> >>>>> Page 199 of "Perl Best Practices" by Damian Conway, and this URL >>>>> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html >>>>> suggest: >>>>> >>>>> "Use return; instead of return undef; if you want to return >>>>> nothing. >>>>> If someone assigns the return value to an array, the latter >>>>> creates an >>>>> array of one value (undef), which evaluates to true. The former >>>>> will >>>>> correctly handle all contexts." >>>>> >>>>> So I'm guessing at least some of these 450 occurrences *could* >>>>> result >>> >>> in >>> >>>>> bugs and should probably be changed. >>>>> >>>>> Your opinion may differ :-) >>>>> >>>>> -- >>>>> Dr Torsten Seemann http://www.vicbioinformatics.com >>>>> Victorian Bioinformatics Consortium, Monash University, Australia >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++ >>> Rutger Vos, PhD. 
candidate >>> Department of Biological Sciences >>> Simon Fraser University >>> 8888 University Drive >>> Burnaby, BC, V5A1S6 >>> Phone: 604-291-5625 >>> Fax: 604-291-3496 >>> Personal site: http://www.sfu.ca/~rvosa >>> FAB* lab: http://www.sfu.ca/~fabstar >>> Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++ >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of the Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Wed May 31 14:08:43 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 31 May 2006 14:08:43 -0400 Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef" In-Reply-To: <200605311203.13922.lstein@cshl.edu> References: <001801c68431$a586b2d0$15327e82@pyrimidine> <200605311255.19166.heikki@sanbi.ac.za> <200605311203.13922.lstein@cshl.edu> Message-ID: On May 31, 2006, at 12:03 PM, Lincoln Stein wrote: > If the subroutine is documented to return 
"false" on failure, then > one must call > return (or "return ()" ). The problem seems to be that 'a value that evaluates to either true or false' and 'a [meaningful] value or undef' and 'a value or false' ('a value or no value) are not the same in perl. And what would/should one expect if the doc states 'true on success and false otherwise'? Maybe the documentation should also be fixed to avoid any ambiguity. I.e., avoid documenting 'a value or false' because it may be ambiguous (not only) to the less proficient. 'True or false' should imply a value being returned. Comments? -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From lstein at cshl.edu Wed May 31 14:14:59 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Wed, 31 May 2006 14:14:59 -0400 Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef" In-Reply-To: References: <001801c68431$a586b2d0$15327e82@pyrimidine> <200605311203.13922.lstein@cshl.edu> Message-ID: <200605311415.00414.lstein@cshl.edu> If the documentation says "returns false" then I expect to be able to do this: @result = foo(); die "foo() failed" unless @result; If the documentation says "returns undef" then I expect this: @result = foo(); die "foo() failed" unless $result[0]; Lincoln On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote: > On May 31, 2006, at 12:03 PM, Lincoln Stein wrote: > > If the subroutine is documented to return "false" on failure, then > > one must call > > return (or "return ()" ). > > The problem seems to be that 'a value that evaluates to either true > or false' and 'a [meaningful] value or undef' and 'a value or > false' ('a value or no value) are not the same in perl. And what > would/should one expect if the doc states 'true on success and false > otherwise'? > > Maybe the documentation should also be fixed to avoid any ambiguity. 
> I.e., avoid documenting 'a value or false' because it may be > ambiguous (not only) to the less proficient. 'True or false' should > imply a value being returned. > > Comments? > > -hilmar -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From hlapp at gmx.net Wed May 31 14:31:21 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 31 May 2006 14:31:21 -0400 Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef" In-Reply-To: <200605311415.00414.lstein@cshl.edu> References: <001801c68431$a586b2d0$15327e82@pyrimidine> <200605311203.13922.lstein@cshl.edu> <200605311415.00414.lstein@cshl.edu> Message-ID: <241E77AE-8D1E-4708-9C4C-8A9619822DB4@gmx.net> On May 31, 2006, at 2:14 PM, Lincoln Stein wrote: > If the documentation says "returns false" then I expect to be able > to do this: > > @result = foo(); > die "foo() failed" unless @result; Except if the alternative to 'false' would be a scalar, you normally wouldn't assign it to an array, would you? I.e., I wouldn't expect this strict of a behavior from an open-source package written largely from people whose job is biological science, not programming perl knowing and following DC to the letter ... I'd rather be on the safe side and assign to a scalar. Just my $0.02 ... 
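A small sketch of the scalar-assignment case, assuming two hypothetical subs (find_undef and find_bare, not BioPerl code). In scalar context both styles of return produce undef, so a caller who assigns to a scalar sees the same result either way.

```perl
use strict;
use warnings;

# Hypothetical subs for illustration only.
sub find_undef { return undef; }
sub find_bare  { return; }

# Scalar assignment puts both calls in scalar context.
my $from_undef = find_undef();
my $from_bare  = find_bare();

print "return undef; gives ", (defined $from_undef ? 'a defined value' : 'undef'), "\n";
print "return;       gives ", (defined $from_bare  ? 'a defined value' : 'undef'), "\n";
```

Only callers that assign to an array or otherwise call the sub in list context can tell the two apart.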
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Wed May 31 14:50:30 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 13:50:30 -0500 Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef" In-Reply-To: <447DAEB1.4040509@mrc-dunn.cam.ac.uk> Message-ID: <001801c684e3$16e33730$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Sendu Bala > Sent: Wednesday, May 31, 2006 9:57 AM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] For CVS developers - potential > pitfallwith"returnundef" > > Heikki Lehvaslaiho wrote: > > In my opinion the sooner the bugs get exposed the better. It is much > more > > likely that there is a well hidden bug caused by assigning accidentally > undef > > into an one element array that someone intentionally writing code that > > expects that behaviour! > > > > I removed (but did not commit yet) all undefs from my old Bio::Variation > code > > and could not see any differences in the test output. > > > > Let's remove them! > > Just looking for all return undef;s isn't enough. It's entirely possible > to do something like: > > my $return_value; > { > # do something that assigns to return_value on success > # on failure, just do nothing > } > return $return_value; Agreed, though looking for these is obviously much harder. The way to get around those is: return $return_value if $return_value; return; which I've seen used in a number of get/set methods. > The bioperl docs will typically explicitly state that undef is returned, > and under what circumstance. 
If a user suffers from the > undef-into-array-problem, yes it can be slightly unexpected, but lots of > unexpected things will happen when you don't use a method correctly, as > per the docs! Right, but the argument you make is that code will always work as expected from the perldoc examples. My recent experiences with the Bio::Restriction::IO and Bio::Species classes show that the docs are not always up-to-date and may indicate the unimplemented intent of the author more than the actual implementation. Again, I believe a large majority of the docs are fine, but it's those few errors that made a devil's advocate of me... > Fixing the return of undef is either a job that shouldn't be done, or a > much harder job than expected. I don't think ignoring the problem is the best answer here, though I agree the problem is more complicated than it appears at first glance. Judging from code I've trawled through a bit lately, I've seen a lot of methods (mainly get/setters) that are essentially copied multiple times in the same or across similar modules to save time. You could see a scenario where, in those instances, so-called 'bad code' would spread quite quickly. I think adding a wiki page to address some of these issues would be nice, something separate from the Project Priority List. Chris _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From forward at hongyu.org Wed May 31 14:03:46 2006 From: forward at hongyu.org (Hongyu Zhang) Date: Wed, 31 May 2006 11:03:46 -0700 Subject: [Bioperl-l] New functions for SimpleAlign.pm Message-ID: <20060531110346.78xod658td8o0w0w@hongyu.org> Greetings, I am a new member in this mailing list. Nice to be here. I wrote two more functions for the alignment module SimpleAlign.pm that calculate the percentage of identity based on the shortest and longest sequence length, respectively.
I also found an error in the no_residues() function that calculates the number of residues in the alignment. I am wondering whether they can be added to the official bioperl package. I contacted the original author of this module, Heikki Lehvaslaiho, a couple of weeks ago, but haven't heard from him yet. Thanks. -- Hongyu Zhang, Ph.D. Computational biologist Ceres Inc. From cjfields at uiuc.edu Wed May 31 15:39:26 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 14:39:26 -0500 Subject: [Bioperl-l] New functions for SimpleAlign.pm In-Reply-To: <20060531110346.78xod658td8o0w0w@hongyu.org> Message-ID: <001901c684e9$ed4a1720$15327e82@pyrimidine> I added a bit to the FAQ about this: http://www.bioperl.org/wiki/FAQ#How_do_I_submit_a_patch_or_enhancement_to_BioPerl.3F and the HOWTO explains things a bit more directly: http://www.bioperl.org/wiki/HOWTO:SubmitPatch In brief, these need to be submitted to Bugzilla as either code enhancements (for your added methods) or bugs with the patch to the relevant code. Code enhancements probably should include some code and test cases to demonstrate usage. Patches to buggy code are checked to make sure they pass relevant tests by the core developers. Submitting it to the mail list is definitely the first step, though, so you're on the right path. Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hongyu Zhang > Sent: Wednesday, May 31, 2006 1:04 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] New functions for SimpleAlign.pm > > Greetings, > > I am a new member in this mailing list. Nice to be here. > > I wrote two more functions for the alignment module SimpleAlign.pm > that calculate the percentage of identity based on the shortest and > longest sequence length, respectively. I also found an error in the > no_residues() function that calculate the number of residues in the > alignment.
> > I am wondering whether they can be added to the official bioperl > package. I've contacted the original author of this module, Heikki > Lehvaslaiho, a couple of weeks ago, but haven't heard from him yet. > > Thanks. > > -- > Hongyu Zhang, Ph.D. > Computational biologist > Ceres Inc. > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Wed May 31 16:40:19 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 15:40:19 -0500 Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef" In-Reply-To: <200605311415.00414.lstein@cshl.edu> Message-ID: <002001c684f2$6fb7daf0$15327e82@pyrimidine> What about modules that have 'throw_not_implemented' statements present? Here's a list with the total for each. Some of these are interfaces (I got rid of a number that ended in 'I' or 'IO' to remove the I/IO interfaces but it misses a few). 
There are a number here that are implementations, though (Bio::AlignIO::maf, Bio::Restriction::IO::*), so they are technically incomplete:

Instances: 1 Module : Bio::AlignIO::maf
Instances: 25 Module : Bio::Assembly::Contig
Instances: 2 Module : Bio::Assembly::ContigAnalysis
Instances: 2 Module : Bio::Biblio::BiblioBase
Instances: 4 Module : Bio::DB::Expression
Instances: 2 Module : Bio::DB::Expression::geo
Instances: 5 Module : Bio::DB::Flat
Instances: 2 Module : Bio::DB::Query::WebQuery
Instances: 17 Module : Bio::DB::SeqFeature::Store
Instances: 2 Module : Bio::DB::SeqVersion
Instances: 3 Module : Bio::DB::Taxonomy
Instances: 1 Module : Bio::FeatureIO::bed
Instances: 1 Module : Bio::Map::Marker
Instances: 1 Module : Bio::MapIO::fpc
Instances: 1 Module : Bio::MapIO::mapmaker
Instances: 1 Module : Bio::Restriction::IO::bairoch
Instances: 1 Module : Bio::Restriction::IO::itype2
Instances: 1 Module : Bio::Restriction::IO::withrefm
Instances: 1 Module : Bio::Tools::Analysis::SimpleAnalysisBase
Instances: 3 Module : Bio::Tools::Run::WrapperBase

Chris
> -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Lincoln Stein > Sent: Wednesday, May 31, 2006 1:15 PM > To: Hilmar Lapp > Cc: bioperl-l at lists.open-bio.org; Heikki Lehvaslaiho > Subject: Re: [Bioperl-l] For CVS developers - potential > pitfallwith"returnundef" > > If the documentation says "returns false" then I expect to be able to do > this: > > @result = foo(); > die "foo() failed" unless @result; > > If the documentation says "returns undef" then I expect this: > > @result = foo(); > die "foo() failed" unless $result[0]; > > Lincoln > > > On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote: > > On May 31, 2006, at 12:03 PM, Lincoln Stein wrote: > > > If the subroutine is documented to return "false" on failure, then > > > one must call > > > return (or "return ()" ).
> > > > The problem seems to be that 'a value that evaluates to either true > > or false' and 'a [meaningful] value or undef' and 'a value or > > false' ('a value or no value) are not the same in perl. And what > > would/should one expect if the doc states 'true on success and false > > otherwise'? > > > > Maybe the documentation should also be fixed to avoid any ambiguity. > > I.e., avoid documenting 'a value or false' because it may be > > ambiguous (not only) to the less proficient. 'True or false' should > > imply a value being returned. > > > > Comments? > > > > -hilmar > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From lstein at cshl.edu Wed May 31 17:07:06 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Wed, 31 May 2006 17:07:06 -0400 Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef" In-Reply-To: <002001c684f2$6fb7daf0$15327e82@pyrimidine> References: <002001c684f2$6fb7daf0$15327e82@pyrimidine> Message-ID: <200605311707.08196.lstein@cshl.edu> > Instances: 17 Module : Bio::DB::SeqFeature::Store This is intentional. Bio::DB::SeqFeature::Store is intended to be a virtual base class. The throw_not_implemented() calls are there to force developers to override the needed interface methods. If this is not the right way to do it, let me know and I'll fix it. 
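A rough sketch of the virtual-base-class pattern described here, written in plain Perl so it runs without BioPerl installed. The class names (My::Store, My::Store::Memory) and the helper _not_implemented are hypothetical; the helper only stands in for throw_not_implemented() and is not its actual implementation.

```perl
use strict;
use warnings;

# Hypothetical base class; this is not Bio::DB::SeqFeature::Store.
package My::Store;

sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}

# Stand-in for throw_not_implemented(): die, naming the method that was called.
sub _not_implemented {
    my ($self) = @_;
    my $method = (caller(1))[3];    # fully qualified name of the calling method
    die "Abstract method '$method' is not implemented by " . ref($self) . "\n";
}

# Interface method that every subclass is expected to override.
sub fetch {
    my ($self, $id) = @_;
    $self->_not_implemented;
}

# Hypothetical concrete subclass that overrides fetch().
package My::Store::Memory;
our @ISA = ('My::Store');

sub fetch {
    my ($self, $id) = @_;
    return $self->{data}{$id};
}

package main;

my $mem = My::Store::Memory->new(data => { f1 => 'feature one' });
print $mem->fetch('f1'), "\n";

# Calling the un-overridden method on the base class raises the exception.
my $base = My::Store->new;
my $ok   = eval { $base->fetch('f1'); 1 };
my $err  = $@;
print "base class call failed: $err" unless $ok;
```

With this layout a subclass that forgets to override fetch() fails loudly on first use rather than silently returning nothing.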
Lincoln > Instances: 2 Module : Bio::DB::SeqVersion > Instances: 3 Module : Bio::DB::Taxonomy > Instances: 1 Module : Bio::FeatureIO::bed > Instances: 1 Module : Bio::Map::Marker > Instances: 1 Module : Bio::MapIO::fpc > Instances: 1 Module : Bio::MapIO::mapmaker > Instances: 1 Module : Bio::Restriction::IO::bairoch > Instances: 1 Module : Bio::Restriction::IO::itype2 > Instances: 1 Module : Bio::Restriction::IO::withrefm > Instances: 1 Module : Bio::Tools::Analysis::SimpleAnalysisBase > Instances: 3 Module : Bio::Tools::Run::WrapperBase > > Chris > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Lincoln Stein > > Sent: Wednesday, May 31, 2006 1:15 PM > > To: Hilmar Lapp > > Cc: bioperl-l at lists.open-bio.org; Heikki Lehvaslaiho > > Subject: Re: [Bioperl-l] For CVS developers - potential > > pitfallwith"returnundef" > > > > If the documentation says "returns false" then I expect to be able to do > > this: > > > > @result = foo(); > > die "foo() failed" unless @result; > > > > If the documentation says "returns undef" then I expect this: > > > > @result = foo(); > > die "foo() failed" unless $result[0]; > > > > Lincoln > > > > On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote: > > > On May 31, 2006, at 12:03 PM, Lincoln Stein wrote: > > > > If the subroutine is documented to return "false" on failure, then > > > > one must call > > > > return (or "return ()" ). > > > > > > The problem seems to be that 'a value that evaluates to either true > > > or false' and 'a [meaningful] value or undef' and 'a value or > > > false' ('a value or no value) are not the same in perl. And what > > > would/should one expect if the doc states 'true on success and false > > > otherwise'? > > > > > > Maybe the documentation should also be fixed to avoid any ambiguity. > > > I.e., avoid documenting 'a value or false' because it may be > > > ambiguous (not only) to the less proficient. 
'True or false' should > > > imply a value being returned. > > > > > > Comments? > > > > > > -hilmar > > > > -- > > Lincoln D. Stein > > Cold Spring Harbor Laboratory > > 1 Bungtown Road > > Cold Spring Harbor, NY 11724 > > (516) 367-8380 (voice) > > (516) 367-8389 (fax) > > FOR URGENT MESSAGES & SCHEDULING, > > PLEASE CONTACT MY ASSISTANT, > > SANDRA MICHELSEN, AT michelse at cshl.edu > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From hlapp at gmx.net Wed May 31 17:21:57 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 31 May 2006 17:21:57 -0400 Subject: [Bioperl-l] For CVS developers - throw_not_implemented In-Reply-To: <002001c684f2$6fb7daf0$15327e82@pyrimidine> References: <002001c684f2$6fb7daf0$15327e82@pyrimidine> Message-ID: On May 31, 2006, at 4:40 PM, Chris Fields wrote: > What about modules that have 'throw_not_implemented' statements > present? Those are often if not always legitimate - the problem are those that don't have them but fail to override an inherited interface or abstract method. If something is not implemented what is the better way to express this other than throwing an exception? 
(and if it's not an interface or abstract base class, saying so in the documentation) -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Wed May 31 17:25:48 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 31 May 2006 17:25:48 -0400 Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef" In-Reply-To: <001801c684e3$16e33730$15327e82@pyrimidine> References: <001801c684e3$16e33730$15327e82@pyrimidine> Message-ID: <8AA04BF0-FA79-43CF-9FBB-310314FECD91@gmx.net> On May 31, 2006, at 2:50 PM, Chris Fields wrote: > I've seen a lot of methods (mainly get/setters) > that are essentially copied multiple times in the same or across > similar > modules to save time. You could see a scenario where, in those > instances, > so-called 'bad code' would spread quite quickly. This will usually be code generated by macros, e.g. the emacs macros for getter/setter generation for properties. If the macro generates wrong code, that's indeed pretty bad. (We've had that.) OTOH it should be spotted quickly as well. And macro changes or new macros should probably be scrutinized by all eyes watching ... -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Wed May 31 17:40:22 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 16:40:22 -0500 Subject: [Bioperl-l] For CVS developers - throw_not_implemented In-Reply-To: Message-ID: <002401c684fa$d28e7640$15327e82@pyrimidine> I think, as long as it's reflected in the docs that something doesn't work (hasn't been implemented) then there's no problem. It's when the docs are misleading that we run into problems. 
The sticking point lies with some classes, such as IO classes (like SeqIO, or Bio::Restriction::IO, with read and write methods) where the IO base class specifies that it is possible to read and write a particular format but the actual implementation varies according to whether or not the derived class overrides the base or interface method (in other words, 'doesn't work as advertised' only in specific circumstances). I don't know how to solve this issue except to add in the docs that specific formats don't implement write() methods.

Personally, I haven't had an issue with it and it probably makes no difference, but I think it needs to be pointed out. The most extreme case I ran into was Bio::Restriction::IO, which had 3 out of 4 plugin modules that didn't implement the write() method but left this in the synopsis in POD:

    use Bio::Restriction::IO;

    $in = Bio::Restriction::IO->new(-file => "inputfilename" ,
                                    -format => 'withrefm');
    $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
                                     -format => 'bairoch');
    my $res = $in->read; # a Bio::Restriction::EnzymeCollection
    $out->write($res);

    # or

    # use Bio::Restriction::IO;
    #
    # #input file format can be read from the file extension (dat|xml)
    # $in = Bio::Restriction::IO->newFh(-file => "inputfilename");
    # $out = Bio::Restriction::IO->newFh('-format' => 'xml');
    #
    # # World's shortest flat<->xml format converter:
    # print $out $_ while <$in>;

None of this code works; in fact, no XML parser even exists for these IO classes! Bio::AlignIO also has a few as well (maf and Stockholm formats don't write).

Chris
> -----Original Message----- > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: Wednesday, May 31, 2006 4:22 PM > To: Chris Fields > Cc: lstein at cshl.edu; bioperl-l at lists.open-bio.org; 'Heikki Lehvaslaiho' > Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented > > > On May 31, 2006, at 4:40 PM, Chris Fields wrote: > > > What about modules that have 'throw_not_implemented' statements > > present?
> > Those are often if not always legitimate - the problem are those that > don't have them but fail to override an inherited interface or > abstract method. > > If something is not implemented what is the better way to express > this other than throwing an exception? (and if it's not an interface > or abstract base class, saying so in the documentation) > > -hilmar > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From hlapp at gmx.net Wed May 31 17:55:37 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 31 May 2006 17:55:37 -0400 Subject: [Bioperl-l] For CVS developers - throw_not_implemented In-Reply-To: <002401c684fa$d28e7640$15327e82@pyrimidine> References: <002401c684fa$d28e7640$15327e82@pyrimidine> Message-ID: This is documentation cruft resulting from copy&paste w/o later fixing it. (which isn't a justification) Note that not implementing the write is as legitimate as not implementing the read method ... It should be pointed out in the documentation though that it will depend on the actual implementation of the format whether it supports reading or writing or both. -hilmar On May 31, 2006, at 5:40 PM, Chris Fields wrote: > I think, as long as it's reflected in the docs that something > doesn't work > (hasn't been implemented) then there's no problem. It's when the > docs are > misleading that we run into problems. > > The sticking point lies with some classes, such as IO classes (like > SeqIO, > or Restrict::IO, with read and write methods) where the IO base class > specifies that it is possible to read and write a particular format > but the > actual implementation varies according to whether or not the > derived class > overrides the base or interface method (in other words, 'doesn't > work as > advertised' only in specific circumstances). 
I don't know how to > solve this > issue except to add in the docs that specific formats don't implement > write() methods. > > Personally, I haven't had an issue with it and it probably makes no > difference, but I think it needs to be pointed out. The most > extreme I ran > into was Bio::Restriction::IO, which had 3 out of 4 plugin modules > that > didn't implement the write() method but left this in the synopsis > in POD: > > use Bio::Restriction::IO; > > $in = Bio::Restriction::IO->new(-file => "inputfilename" , > -format => 'withrefm'); > $out = Bio::Restriction::IO->new(-file => ">outputfilename" , > -format => 'bairoch'); > my $res = $in->read; # a Bio::Restriction::EnzymeCollection > $out->write($res); > > # or > > # use Bio::Restriction::IO; > # > # #input file format can be read from the file extension (dat| > xml) > # $in = Bio::Restriction::IO->newFh(-file => "inputfilename"); > # $out = Bio::Restriction::IO->newFh('-format' => 'xml'); > # > # # World's shortest flat<->xml format converter: > # print $out $_ while <$in>; > > None of this code works; in fact, no XML parser even exists for > these IO > classes! Bio::AlignIO also has a few as well (maf and Stockholm > formats > don't write). > > Chris > > >> -----Original Message----- >> From: Hilmar Lapp [mailto:hlapp at gmx.net] >> Sent: Wednesday, May 31, 2006 4:22 PM >> To: Chris Fields >> Cc: lstein at cshl.edu; bioperl-l at lists.open-bio.org; 'Heikki >> Lehvaslaiho' >> Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented >> >> >> On May 31, 2006, at 4:40 PM, Chris Fields wrote: >> >>> What about modules that have 'throw_not_implemented' statements >>> present? >> >> Those are often if not always legitimate - the problem are those that >> don't have them but fail to override an inherited interface or >> abstract method. >> >> If something is not implemented what is the better way to express >> this other than throwing an exception? 
(and if it's not an interface >> or abstract base class, saying so in the documentation) >> >> -hilmar >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From slenk at emich.edu Wed May 31 17:52:13 2006 From: slenk at emich.edu (Stephen Gordon Lenk) Date: Wed, 31 May 2006 17:52:13 -0400 Subject: [Bioperl-l] For CVS developers - throw_not_implemented Message-ID: <100682f110067a83.10067a83100682f1@emich.edu> Isn't it fairly standard in OO schemes/languages to have an exception thrown if a method can't be found at the end of a search up the class hierarchy? I recall being very mad at Smalltalk because "method not found" kept biting me. C++ has pure virtual base classes that do not allow objects to be instantiated directly; they are meant to be inherited and then implemented. Perl 6 was mentioned a bit back. Is this issue addressed there? Should it be? Do the Bioperl people feed their needs into Perl 6 so that all the code effort to make Bio::Root is handled for them in the next effort by Perl 6 itself. Make the Perl 6 people solve these issues with your input, then you will not have to deal with implementing it yourselves. I'll just bet that you are not the only potential users of Perl 6 who will have to solve these issues eventually. ----- Original Message ----- From: Hilmar Lapp Date: Wednesday, May 31, 2006 5:21 pm Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented > > On May 31, 2006, at 4:40 PM, Chris Fields wrote: > > > What about modules that have 'throw_not_implemented' statements > > present? 
> > Those are often if not always legitimate - the problem are those > that > don't have them but fail to override an inherited interface or > abstract method. > > If something is not implemented what is the better way to express > this other than throwing an exception? (and if it's not an > interface > or abstract base class, saying so in the documentation) > > -hilmar > > -- > ========================================================= == > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > ========================================================= == > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From arareko at campus.iztacala.unam.mx Wed May 31 18:49:03 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Wed, 31 May 2006 17:49:03 -0500 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: <001201c684d0$263c5530$15327e82@pyrimidine> References: <001201c684d0$263c5530$15327e82@pyrimidine> Message-ID: <447E1D5F.1050807@campus.iztacala.unam.mx> Brian, Jay, Chris, I agree with what Bernd Web said in another reply. For some people will be nice to still be able to run the script from the codebase and interact with it. I don't think it should be a lot of problem to maintain both tutorials, as long as the 'main' one is the one in the CVS tree. By reading what Jay did in order to convert it into mediawiki format, I suppose this can be easily done again for each new change to the script (again, this is just my guessing). Besides, as far as I've seen, there aren't frequent commits to the script at all. I've added a link in the left menu of the wiki. If you think it should point to the Tutorials page instead of the Bptutorial.pl page please let me know. Regards, Mauricio. 
Chris Fields wrote: > Brian, Jay, > > I think it would be nice to have the tutorial prominently displayed somehow > (Jay's suggestion), with a link provided via the tutorials page. Hopefully > this will help with the bioperl newbies. > > Jay, looks like there are still some weird formatting issues with the > bptutorial wiki page, something which I ran into before when getting the > Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more > spaces preceding a line denotes code for some reason). Not much you can do > in these cases except remove the extra spaces in those spots. Looking good > though! > > Chris > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Brian Osborne >> Sent: Wednesday, May 31, 2006 8:58 AM >> To: Jay Hannah; bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl >> >> Jay, >> >> Excellent! Now we need to answer a few more questions for ourselves: >> >> - Do we remove the file bptutorial.pl from the package now? I'd say yes, >> we >> don't want to have to maintain two bptutorials. >> >> - What do we do with the script part of bptutorial.pl? It certainly could >> be >> excised and put into the examples/ directory, for example, but this would >> break a few of the paths that are being used. >> >> - A link to bptutorial? Or a link to the existing tutorials page? >> http://www.bioperl.org/wiki/Tutorials. >> >> Any thoughts on these? >> >> >> Brian O. >> >> >> On 5/31/06 9:07 AM, "Jay Hannah" wrote: >> >>> http://www.bioperl.org/wiki/Bptutorial.pl >>> >>> I think I just partially fulfilled this TODO: >>> >>> TODO: check if the POD is in the Wiki yet, and if not, put it here? >>> >>> I used Pod::Simple::Wiki (format 'mediawiki') to burn >>> bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it >> the >>> wiki page via my web browser. (Is that proper procedure? 
Is the plan to >> just >>> do that manually from time to time as the document changes?) >>> >>> Now what? >>> >>> Should there be a new link on the far left of bioperl.org called >> "Tutorial"? >>> It's an amazing document. IMHO it should be listed prominently on >> bioperl.org. >>> HTH, >>> >>> j >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From cjfields at uiuc.edu Wed May 31 20:43:48 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 19:43:48 -0500 Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef" In-Reply-To: <200605311707.08196.lstein@cshl.edu> Message-ID: <002801c68514$72f11480$15327e82@pyrimidine> > -----Original Message----- > From: Lincoln Stein [mailto:lstein at cshl.edu] > Sent: Wednesday, May 31, 2006 4:07 PM > To: Chris Fields > Cc: 'Hilmar Lapp'; bioperl-l at lists.open-bio.org; 'Heikki Lehvaslaiho' > Subject: Re: [Bioperl-l] For CVS developers - potential > pitfallwith"returnundef" > > > > Instances: 17 Module : Bio::DB::SeqFeature::Store > > This is intentional. Bio::DB::SeqFeature::Store is intended to be a > virtual > base class. The throw_not_implemented() calls are there to force > developers > to override the needed interface methods. > > If this is not the right way to do it, let me know and I'll fix it.
That's the right way, though I don't really know what the 'right way' is. Sorry Lincoln, I didn't mean to direct that at you specifically; I responded to your last post to stay in the thread, so to speak. It was meant to be a general statement that some classes haven't implemented methods specified by their abstract base or interface class. This is just output from a quickie script I wrote up to check on this and see how many of these statements are out there, and since there isn't a foolproof way to know what an abstract base class is, it pulls in a few abstract classes (such as yours) along with all the others. At least there aren't as many hits as Torsten's ~400-500 for 'return undef'! Anyway, I'm not sure what would be the best place to address code problems or issues like the unimplemented methods issue or Torsten's audits (list, wiki, etc); it's a delicate issue b/c it borders on code critiquing and what constitutes good vs. bad code. I remember some pretty heated arguments about the 'proper' way to do things a while back involving AUTOLOAD'ing methods, which I think are summarized somewhere in the wiki. Myself, I'm a microbiologist and not a programmer, so I'm prone to bouts of hackery, but I try to have the code at least do what the docs state.
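[Editorial sketch, not part of the original message.] The pattern discussed in this thread, an abstract base class whose methods throw until a subclass overrides them, can be illustrated roughly as follows. This is a minimal sketch under stated assumptions: the package names My::IO::Base and My::IO::ReadOnlyFormat and the helper _not_implemented() are hypothetical, and a plain die stands in for BioPerl's throw_not_implemented() so that the example runs without BioPerl installed.

```perl
use strict;
use warnings;

# Hypothetical abstract base class; all names here are illustrative only.
package My::IO::Base;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Stand-in for throw_not_implemented(): die with a message naming the
# concrete class and the method it failed to override.
sub _not_implemented {
    my ($self, $method) = @_;
    die sprintf("Abstract method '%s' is not implemented by %s\n",
                $method, ref($self) || $self);
}

# The interface promises both methods; neither does anything by itself.
sub read  { $_[0]->_not_implemented('read') }
sub write { $_[0]->_not_implemented('write') }

# Hypothetical concrete format that supports reading only.
package My::IO::ReadOnlyFormat;
our @ISA = ('My::IO::Base');

sub read { return 'parsed record' }

package main;

my $io = My::IO::ReadOnlyFormat->new;
print $io->read, "\n";

# write() was never overridden, so the inherited stub throws.
my $ok = eval { $io->write('data'); 1 };
print "write failed: $@" unless $ok;
```

A caller who wraps write() in eval gets a clear message about which format lacks the method, which seems preferable to a silent no-op; the documentation for each format would still need to say which of read and write it supports.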
Chris > Lincoln > > > > Instances: 2 Module : Bio::DB::SeqVersion > > Instances: 3 Module : Bio::DB::Taxonomy > > Instances: 1 Module : Bio::FeatureIO::bed > > Instances: 1 Module : Bio::Map::Marker > > Instances: 1 Module : Bio::MapIO::fpc > > Instances: 1 Module : Bio::MapIO::mapmaker > > Instances: 1 Module : Bio::Restriction::IO::bairoch > > Instances: 1 Module : Bio::Restriction::IO::itype2 > > Instances: 1 Module : Bio::Restriction::IO::withrefm > > Instances: 1 Module : Bio::Tools::Analysis::SimpleAnalysisBase > > Instances: 3 Module : Bio::Tools::Run::WrapperBase > > > > Chris > > > > > -----Original Message----- > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > bounces at lists.open-bio.org] On Behalf Of Lincoln Stein > > > Sent: Wednesday, May 31, 2006 1:15 PM > > > To: Hilmar Lapp > > > Cc: bioperl-l at lists.open-bio.org; Heikki Lehvaslaiho > > > Subject: Re: [Bioperl-l] For CVS developers - potential > > > pitfallwith"returnundef" > > > > > > If the documentation says "returns false" then I expect to be able to > do > > > this: > > > > > > @result = foo(); > > > die "foo() failed" unless @result; > > > > > > If the documentation says "returns undef" then I expect this: > > > > > > @result = foo(); > > > die "foo() failed" unless $result[0]; > > > > > > Lincoln > > > > > > On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote: > > > > On May 31, 2006, at 12:03 PM, Lincoln Stein wrote: > > > > > If the subroutine is documented to return "false" on failure, then > > > > > one must call > > > > > return (or "return ()" ). > > > > > > > > The problem seems to be that 'a value that evaluates to either true > > > > or false' and 'a [meaningful] value or undef' and 'a value or > > > > false' ('a value or no value) are not the same in perl. And what > > > > would/should one expect if the doc states 'true on success and false > > > > otherwise'? > > > > > > > > Maybe the documentation should also be fixed to avoid any ambiguity. 
> > > > I.e., avoid documenting 'a value or false' because it may be > > > > ambiguous (not only) to the less proficient. 'True or false' should > > > > imply a value being returned. > > > > > > > > Comments? > > > > > > > > -hilmar > > > > > > -- > > > Lincoln D. Stein > > > Cold Spring Harbor Laboratory > > > 1 Bungtown Road > > > Cold Spring Harbor, NY 11724 > > > (516) 367-8380 (voice) > > > (516) 367-8389 (fax) > > > FOR URGENT MESSAGES & SCHEDULING, > > > PLEASE CONTACT MY ASSISTANT, > > > SANDRA MICHELSEN, AT michelse at cshl.edu > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Wed May 31 20:56:12 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 19:56:12 -0500 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: <447E1D5F.1050807@campus.iztacala.unam.mx> Message-ID: <002901c68516$316d4fe0$15327e82@pyrimidine> Mauricio et al, Sounds good, except that there are a few issues with the formatting done by Pod::Simple::Wiki, such as changing some things to code tags when they obviously aren't code; I don't know if there is a workaround for that (Jay?). It may not be anything too serious though. There was a similar issue with the INSTALL doc conversion to wiki that I ran into, in that I don't think it will be easy converting one way or the other (POD->wiki or wiki->POD or text), so syncing updates between the wiki and CVS docs could be an issue we'll have to face in the future.
We could strip the POD out of the script and have the docs on the wiki (Brian's idea), or have minimal POD in the tutorial and keep the wiki updated, just to simplify things, but this may not appeal to those who use perldoc frequently (I personally use browsable prettified HTML). cjf > -----Original Message----- > From: Mauricio Herrera Cuadra [mailto:arareko at campus.iztacala.unam.mx] > Sent: Wednesday, May 31, 2006 5:49 PM > To: Chris Fields > Cc: 'Brian Osborne'; 'Jay Hannah'; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl > > Brian, Jay, Chris, > > I agree with what Bernd Web said in another reply. For some people will > be nice to still be able to run the script from the codebase and > interact with it. > > I don't think it should be a lot of problem to maintain both tutorials, > as long as the 'main' one is the one in the CVS tree. By reading what > Jay did in order to convert it into mediawiki format, I suppose this can > be easily done again for each new change to the script (again, this is > just my guessing). Besides, as far as I've seen, there aren't frequent > commits to the script at all. > > I've added a link in the left menu of the wiki. If you think it should > point to the Tutorials page instead of the Bptutorial.pl page please let > me know. > > Regards, > Mauricio. > > Chris Fields wrote: > > Brian, Jay, > > > > I think it would be nice to have the tutorial prominently displayed > somehow > > (Jay's suggestion), with a link provided via the tutorials page. > Hopefully > > this will help with the bioperl newbies. > > > > Jay, looks like there are still some weird formatting issues with the > > bptutorial wiki page, something which I ran into before when getting the > > Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or > more > > spaces preceding a line denotes code for some reason). Not much you can > do > > in these cases except remove the extra spaces in those spots. 
Looking > good > > though! > > > > Chris > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of Brian Osborne > >> Sent: Wednesday, May 31, 2006 8:58 AM > >> To: Jay Hannah; bioperl-l at lists.open-bio.org > >> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl > >> > >> Jay, > >> > >> Excellent! Now we need to answer a few more questions for ourselves: > >> > >> - Do we remove the file bptutorial.pl from the package now? I'd say > yes, > >> we > >> don't want to have to maintain two bptutorials. > >> > >> - What do we do with the script part of bptutorial.pl? It certainly > could > >> be > >> excised and put into the examples/ directory, for example, but this > would > >> break a few of the paths that are being used. > >> > >> - A link to bptutorial? Or a link to the existing tutorials page? > >> http://www.bioperl.org/wiki/Tutorials. > >> > >> Any thoughts on these? > >> > >> > >> Brian O. > >> > >> > >> On 5/31/06 9:07 AM, "Jay Hannah" wrote: > >> > >>> http://www.bioperl.org/wiki/Bptutorial.pl > >>> > >>> I think I just partially fulfilled this TODO: > >>> > >>> TODO: check if the POD is in the Wiki yet, and if not, put it here? > >>> > >>> I used Pod::Simple::Wiki (format 'mediawiki') to burn > >>> bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it > >> the > >>> wiki page via my web browser. (Is that proper procedure? Is the plan > to > >> just > >>> do that manually from time to time as the document changes?) > >>> > >>> Now what? > >>> > >>> Should there be a new link on the far left of bioperl.org called > >> "Tutorial"? > >>> It's an amazing document. IMHO it should be listed prominently on > >> bioperl.org. 
> >>> HTH, > >>> > >>> j > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > MAURICIO HERRERA CUADRA > arareko at campus.iztacala.unam.mx > Laboratorio de Genética > Unidad de Morfofisiología y Función > Facultad de Estudios Superiores Iztacala, UNAM From osborne1 at optonline.net Wed May 31 21:37:15 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Wed, 31 May 2006 21:37:15 -0400 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: <447E1D5F.1050807@campus.iztacala.unam.mx> Message-ID: Mauricio, Bernd didn't say he wanted the _script_ in the package, he said he wanted bptutorial.pl in the package, not indicating whether it was the documentation or the script that was important. It's my suspicion that the documentation is more important than the script, and this is what my last letter was asking, in part: is the script important? Or can we focus on the text/POD part? Brian O. On 5/31/06 6:49 PM, "Mauricio Herrera Cuadra" wrote: > I agree with what Bernd Web said in another reply. For some people will > be nice to still be able to run the script from the codebase and > interact with it.
From cjfields at uiuc.edu Wed May 31 21:42:54 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 20:42:54 -0500 Subject: [Bioperl-l] For CVS developers - throw_not_implemented In-Reply-To: <100682f110067a83.10067a83100682f1@emich.edu> Message-ID: <002a01c6851c$b3b8a980$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Stephen Gordon Lenk > Sent: Wednesday, May 31, 2006 4:52 PM > To: Hilmar Lapp > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented > > > Isn't it fairly standard in OO schemes/languages to have an exception > thrown if a method > can't be found at the > end of a search up the class hierarchy? I recall being very mad at > Smalltalk because "method > not found" kept > biting me. C++ has pure virtual base classes that do not allow objects to > be instantiated > directly; they are > meant to be inherited and then implemented. Perl will throw an error if it can't find a method in a class hierarchy. It will do a few things first before dying, like looking for AUTOLOAD, etc. AUTOLOAD has its supporters and detractors; I try to stay away from it as much as possible. Not sure about C++-like pure virtual classes in Perl5, i.e. not allowing direct object instantiation, but Perl6 is supposed to have them, at least according to Apocalypse 12. From what Mr. Wall says about OOP in Perl5, it's essentially 'bolted on' but works with caveats (is 'private' really 'private'?). Perl6 is rebuilt from scratch (internals are OO). > Perl 6 was mentioned a bit back. Is this issue addressed there? Should it > be? Do the Bioperl > people feed their > needs into Perl 6 so that all the code effort to make Bio::Root is handled > for them in the next > effort by Perl 6 > itself.
> Make the Perl 6 people solve these issues with your input, then > you will not have to > deal with > implementing it yourselves. I'll just bet that you are not the only > potential users of Perl 6 who > will have to solve > these issues eventually. I think Perl6 will solve most (if not all) of these problems since it's a complete rebuild. In fact, it's pretty much a new language altogether from what I have seen (and the little I have played around with using Pugs). Parrot is supposed to handle mixes of Perl5/Perl6, so it may not be necessary to immediately convert all of bioperl to Perl6. Though I have also heard of a Perl5->6 converter in the works as well... From an OO standpoint, I believe everything is considered an object in Perl6, though it's not supposed to force you into using objects according to the Apocalypses that I have read. I actually see a lot there that reminds me of C++ (but in a Perl-ish way, of course). Apocalypse 12 is a good primer, though you may want to go through the others first; they're heavy slogging: http://dev.perl.org/perl6/doc/design/apo/A12.html Not sure what you mean by 'feeding our needs into Perl6'. I have periodically checked on perl6 progress and they seem to have everything well under control. Chris > ----- Original Message ----- > From: Hilmar Lapp > Date: Wednesday, May 31, 2006 5:21 pm > Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented > > > > > On May 31, 2006, at 4:40 PM, Chris Fields wrote: > > > > > What about modules that have 'throw_not_implemented' statements > > > present? > > > > Those are often if not always legitimate - the problem are those > > that > > don't have them but fail to override an inherited interface or > > abstract method. > > > > If something is not implemented what is the better way to express > > this other than throwing an exception?
(and if it's not an > > interface > > or abstract base class, saying so in the documentation) > > > > -hilmar > > > > -- > > > ========================================================= > == > > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > > > ========================================================= > == > > > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jay at jays.net Wed May 31 21:54:01 2006 From: jay at jays.net (Jay Hannah) Date: Wed, 31 May 2006 20:54:01 -0500 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: References: Message-ID: <447E48B9.4080503@jays.net> Brian Osborne wrote: > - Do we remove the file bptutorial.pl from the package now? I'd say yes, we > don't want to have to maintain two bptutorials. We certainly wouldn't want to try to maintain two copies, one POD one in wiki. That would be the worst of all options. One option that hasn't been mentioned yet is to keep maintenance of that in POD in the distro (leaving the cool runability alone), and then flag that document as unchangeable in the wiki with a note on top "Maintenance of this document is done in POD in the distro. Submit POD patches to bioperl-l and we'll re-post an updated copy to this wiki." Just a thought. > - What do we do with the script part of bptutorial.pl? It certainly could be > excised and put into the examples/ directory, for example, but this would > break a few of the paths that are being used. /README says this: scripts/ - Useful production-quality scripts with POD documentation examples/ - Scripts demonstrating the many uses of Bioperl I'm personally not clear on the difference. 
Little stuff should start in examples/ and graduate to scripts/ once they've matured? Is the doc/ tree being abandoned? doc/faq (empty?) doc/howto doc/howto/examples doc/howto/figs (empty?) doc/howto/html (empty?) doc/howto/pdf (empty?) doc/howto/sgml (empty?) doc/howto/txt (empty?) doc/howto/xml (empty?) Does all that stuff officially live in and is being changed in the wiki, never to return to the distro? Any reason those empty dirs aren't nuked out of CVS? Chris Fields wrote: > Jay, looks like there are still some weird formatting issues with the > bptutorial wiki page, something which I ran into before when getting the > Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more > spaces preceding a line denotes code for some reason). Not much you can do > in these cases except remove the extra spaces in those spots. Looking good > though! Sorry, I spent zero time on the whole conversion. I'm not sure what parts didn't convert well. I've never done that conversion before, and know nothing about mediawiki. I just blindly let Pod::Simple::Wiki do its thing then ran off to work. :) Mauricio Herrera Cuadra wrote: > I've added a link in the left menu of the wiki. If you think it should > point to the Tutorials page instead of the Bptutorial.pl page please let > me know. Instead of all these competing links on the left, maybe we should have a master "documentation" page linked on the left cascading like so? Documentation (linked on the left menu) - Quick start - FAQ - HOWTOs - Tutorials (What's the conceptual difference between a HOWTO and a tutorial?) It's hard for me to dive into a wiki lifestyle for the huge documentation pillars since it can't ever get back into the distro... (can it?) Small, throw away stuff is great for the wiki, but huge, established, thoughtful, long documents should be left in the distro? Present (and searchable) on the wiki but static? Why isn't the short "Current events" just listed on the top of the "News" page? 
Sick of my endless questions yet? -grin- j From cjfields at uiuc.edu Wed May 31 23:09:38 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 22:09:38 -0500 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: <447E48B9.4080503@jays.net> Message-ID: <000001c68528$d1b6ec10$15327e82@pyrimidine> ... > We certainly wouldn't want to try to maintain two copies, one POD one in > wiki. That would be the worst of all options. One option that hasn't been > mentioned yet is to keep maintenance of that in POD in the distro (leaving > the cool runability alone), and then flag that document as unchangeable in > the wiki with a note on top "Maintenance of this document is done in POD > in the distro. Submit POD patches to bioperl-l and we'll re-post an > updated copy to this wiki." > > Just a thought. There are probably three schools of thought on docs: those that like nice docs with links within and beyond BioPerl (hence the wiki), those who like including docs with the distribution, and those that would like both. The latter would be nice but isn't realistic unless we can come up with a way to sync changes between the wiki and CVS those docs we want to include with the distribution w/o too much trouble. I'm in the first school of thought since rich text with links is better and more informative than plain text any day. It might be a very small school though... > > - What do we do with the script part of bptutorial.pl? It certainly > could be > > excised and put into the examples/ directory, for example, but this > would > > break a few of the paths that are being used. > > /README says this: > > scripts/ - Useful production-quality scripts with POD documentation > examples/ - Scripts demonstrating the many uses of Bioperl > > I'm personally not clear on the difference. Little stuff should start in > examples/ and graduate to scripts/ once they've matured? > > Is the doc/ tree being abandoned? 
Most docs have been moved over to the wiki, which generates nicely formatted docs for printing. ... > Does all that stuff officially live in and is being changed in the wiki, > never to return to the distro? It's easier to add changes in the wiki and add markup, links, etc. Much richer text, so on. > Any reason those empty dirs aren't nuked out of CVS? > > Chris Fields wrote: > > Jay, looks like there are still some weird formatting issues with the > > bptutorial wiki page, something which I ran into before when getting the > > Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or > more > > spaces preceding a line denotes code for some reason). Not much you can > do > > in these cases except remove the extra spaces in those spots. Looking > good > > though! > > Sorry, I spent zero time on the whole conversion. I'm not sure what parts > didn't convert well. I've never done that conversion before, and know > nothing about mediawiki. I just blindly let Pod::Simple::Wiki do its thing > then ran off to work. :) No big deal. > Mauricio Herrera Cuadra wrote: > > I've added a link in the left menu of the wiki. If you think it should > > point to the Tutorials page instead of the Bptutorial.pl page please let > > me know. > > Instead of all these competing links on the left, maybe we should have a > master "documentation" page linked on the left cascading like so? > > Documentation (linked on the left menu) > - Quick start > - FAQ > - HOWTOs > - Tutorials Okay, though Mauricio may know a bit more on how/if this can be done. Mauricio? > (What's the conceptual difference between a HOWTO and a tutorial?) I believe the reasoning is along these lines: HOWTO's are focused in on specific areas (graphics, trees, BLAST report parsing, etc) and thus usually has greater detail. The tutorials are more broadly based (sort of a general bioperl HOWTO). 
The only exception is the Beginner's HOWTO, but even that has additional information over the tutorial (at least it did the last time I looked at the tutorial, which has been a while).

> It's hard for me to dive into a wiki lifestyle for the huge documentation
> pillars since it can't ever get back into the distro... (can it?) Small,
> throw away stuff is great for the wiki, but huge, established, thoughtful,
> long documents should be left in the distro? Present (and searchable) on
> the wiki but static?

Hence the problem we face now. It is something we need to really look into before adding too much more to the wiki. IMHO, I think we should have very little information directly in the distribution itself since it's already quite large. It's almost as easy to have a bare-bones INSTALL file, which would point to the wiki for additional information. But I may be very much alone in that train of thought ;

>
> Why isn't the short "Current events" just listed on the top of the "News"
> page?

Don't know.

> Sick of my endless questions yet? -grin-

Not really.

cjf

> j
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From gad14 at cornell.edu Tue May 30 12:57:41 2006
From: gad14 at cornell.edu (Genevieve DeClerck)
Date: Tue, 30 May 2006 12:57:41 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447BFB20.40501@mrc-dunn.cam.ac.uk>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
Message-ID: <447C7985.9000404@cornell.edu>

Thanks for your comment Sendu, it was very helpful. I think this must be what's going on.. I am using $blast_report->next_result in both subroutines. It appears that analyzing the blast results first w/ my sort subroutine empties (?) the $blast_report object so that when I try to print, there is nothing left to print. (and vice versa when I print first then try to sort).
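
A minimal sketch of one way around the behaviour described above, offered as an assumption rather than as the BioPerl-recommended fix: read the report once, keep the results in an array, and hand that array to both subroutines. MockReport below is a hypothetical stand-in so the example runs without BioPerl or a BLAST report; with a real SearchIO object the same while loop over next_result (as in print_blast_results) would fill the array. The helper names collect_results, sort_results and print_results are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a SearchIO-style report: next_result() hands out
# each stored item once and then returns undef, like a one-way stream.
package MockReport;

sub new {
    my ($class, @items) = @_;
    return bless { items => [@items] }, $class;
}

sub next_result {
    my $self = shift;
    return shift @{ $self->{items} };
}

package main;

# Drain the one-way iterator a single time and keep the results in an array.
sub collect_results {
    my $report = shift;
    my @results;
    while (defined(my $result = $report->next_result)) {
        push @results, $result;
    }
    return \@results;
}

# Both consumers work on the cached array reference, so neither one
# uses up the data that the other needs.
sub sort_results {
    my $results = shift;
    return [ sort { $a cmp $b } @$results ];
}

sub print_results {
    my $results = shift;
    print "$_\n" for @$results;
    return scalar @$results;
}

my $report  = MockReport->new(qw(queryC queryA queryB));
my $results = collect_results($report);
my $sorted  = sort_results($results);
my $printed = print_results($results);
```

The trade-off is memory: every result stays in the array for the life of the script, which may matter for very large reports.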
So, from the looks of things, using next_result has the effect of popping the Bio::Search::Result::ResultI objects off of the SearchIO blast report object??

It seems I could get around this by making a copy of the blast report by setting it to another new variable...(not the most elegant solution) but I'm having trouble with this... If I do:

my $blast_report_copy = $blast_report;

I'm just copying the reference to the SearchIO blast result, so it doesn't help me. How can I make another physical copy of this blast result object? Seems like a simple thing but how to do it is escaping me.

But better yet, the way to go is to 'reset the counter,' or to find a way to look at/print/sort the results without removing data from the blast result object. How is this done though??

Sendu and Brian, I didn't post the sort_results subroutine because it is sprawling, as is a lot of my code. The code I provided was more like an aid for my explanation of the problem.. it doesn't actually run - sorry for the confusion, I should have been more clear on that. The important thing to know perhaps is that both sort_results and print_blast_results contain a foreach loop where I am using the 'next_result' method to view blast results.

(And to clarify for Torsten, the blastall() is working just fine - the analysis/viewing of the results object is where I am encountering the problem.)

Any other ideas would be greatly appreciated...

Thank you,
Genevieve

Sendu Bala wrote:
> Genevieve DeClerck wrote:
>
>> Hi,
>
> [snip]
>
>> If I've sorted the results the sorted-results will print to screen,
>> however when I try to print the Hit Table results nothing is returned,
>> as if the blast results have evaporated.... and vice versa, if i
>> comment out the part where i point my sorting subroutine to the blast
>> results reference, my hit table results suddenly prints to screen.
> > [snip] > >> Here's an abbreviated version of my code: > > [snip] > >> ####### >> ### the following 2 actions seem to be mutually exclusive. >> # 1) sort results into 1-hitter, 2-hitter, etc. groups of >> # SeqFeature objs stored in arrays. arrays are then printed >> # to stdout >> &sort_results($blast_report); >> >> # 2) print blast results >> &print_blast_results($blast_report); > > >> sub print_blast_results{ >> my $report = shift; >> while(my $result = $report->next_result()){ > > [snip] > > You didn't give us your sort_results subroutine, but is it as simple as > they both use $report->next_result (and/or $result->next_hit), but you > don't reset the internal counter back to the start, so the second > subroutine tries to get the next_result and finds the first subroutine > has already looked at the last result and so next_result returns false? > > From a quick look it wasn't obvious how to reset the counter. Hopefully > this can be done and someone else knows how. > From lstein at cshl.edu Wed May 31 11:17:39 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Wed, 31 May 2006 11:17:39 -0400 Subject: [Bioperl-l] SOLVED Bio::Graphics::Panel make ruler have neg values In-Reply-To: <5b6410e0605302045x5c420674x6f898a8a2973991a@mail.gmail.com> References: <5b6410e0605302045x5c420674x6f898a8a2973991a@mail.gmail.com> Message-ID: <200605311117.41479.lstein@cshl.edu> Hi Kevin, Since you are modifying the Panel.pm source code, why don't you just go ahead and use the current Bio::Graphics development tree? Since 1.5.1 it supports negative coordinates. 
Here's an illustration:

#!/usr/bin/perl

use strict;
use Bio::Graphics;
use Bio::Graphics::Feature;

my $whole   = Bio::Graphics::Feature->new(-start=>-200,-end=>+200);
my $feature = Bio::Graphics::Feature->new(-start=>-100,-end=>+100,-strand=>+1);
my $panel   = Bio::Graphics::Panel->new(-start=> -200,
                                        -end  => +200,
                                        -width=>800,
                                        -pad_left=>10,
                                        -pad_right=>10);
$panel->add_track($whole,
                  -glyph=>'arrow',
                  -double=>1,
                  -tick=>2);
$panel->add_track($feature,
                  -glyph=>'box',
                  -stranded=>1);
print $panel->png;

exit 0;

The resulting image is attached.

Lincoln

On Tuesday 30 May 2006 23:45, Kevin Lam Koiyau wrote:
> I am so sorry for the truncated email accidentally hit reply.
> if anyone is interested i have opted to change
>
> change line 161 of arrow.pm in Perl/site/lib/Bio/Graphics/Glyph/arrow.pm
> in linux its
> /usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Glyph/arrow.pm
>
>
> $gd->string($font,$middle,$center+$a2-1,$label,$font_color)
>
> to
>
> $gd->string($font,$middle,$center+$a2-1,$label-1000,$font_color)
>
> just for this one-off use.
>
>
>
> strangely I found at line 112 for ver 1.51 bioperl in arrow.pm a hidden
> option for coords offset?
> my $relative_coords_offset = $self->option('relative_coords_offset');
> $relative_coords_offset = 1 unless defined $relative_coords_offset;
> but entering the option -relative_coords_offset=>1000 in the arrow glyphs
> didn't do anything...
>
>
> > Hi!
>
> > oh it was in a slightly different header asking about the create image
> > map feature.
> > I am using the stable version 1.4 of bioperl now. In any case I have not
> > added the sequence as a feature annotated seq. as I already have the bp
> > where the TF binds (in 1-1050 numberings) so what I did was to just add
> > graded segments based on the position.
> > I saw that there is a scale function for the arrow glyp however, it is a
> > multiply function, can it be hacked to take in a offset value (ie minus
> > the
> > scale by 1000?)
> > > > cheers > > kevin > > > > > > Hi, > > > > > For some reason I didn't see the first posting on this. In current > > > > bioperl > > > > > live, the ruler can have negative numberings - I use this routinely. > > > You need > > > to create a feature that starts in negative coordinates. What is > > > > happening > > > > > to > > > you when you try this? > > > > > > Lincoln > > > > > > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote: > > > > Hi > > > > thanks for the help offered thus far! > > > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promtoer seq > > > > > > using > > > > > > > bioperl. therefore i was asked to make the numberings as such (-1000) > > > > is > > > > > > there any way at all to do this in bioperl without changing the .pm > > > > > > file? > > > > > > > thanks guys.. > > > > kevin > > > > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l at lists.open-bio.org > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > > > Lincoln D. Stein > > > Cold Spring Harbor Laboratory > > > 1 Bungtown Road > > > Cold Spring Harbor, NY 11724 > > > (516) 367-8380 (voice) > > > (516) 367-8389 (fax) > > > FOR URGENT MESSAGES & SCHEDULING, > > > PLEASE CONTACT MY ASSISTANT, > > > SANDRA MICHELSEN, AT michelse at cshl.edu > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: negatives.png
Type: image/png
Size: 1065 bytes
Desc: not available
URL:

From lstein at cshl.edu Wed May 31 12:05:47 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 12:05:47 -0400
Subject: [Bioperl-l] Fwd: Re: SOLVED Bio::Graphics::Panel make ruler have neg values
Message-ID: <200605311205.48122.lstein@cshl.edu>

Oddly, bioperl-l listserver is holding this mail because it has "a suspicious header". I took out Kevin's email address in case it is the "spammotel" header that is bothering it.

Lincoln

---------- Forwarded Message ----------

Subject: Re: [Bioperl-l] SOLVED Bio::Graphics::Panel make ruler have neg values
Date: Wednesday 31 May 2006 11:17
From: Lincoln Stein
To: bioperl-l at lists.open-bio.org
Cc: "Kevin Lam Koiyau"

From rvosa at sfu.ca Tue May 30 15:10:17 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Tue, 30 May 2006 12:10:17 -0700
Subject: [Bioperl-l] New mailing list for Bio::Phylo
Message-ID: <447C9899.5060102@sfu.ca>

Dear recipients,

the open bioinformatics foundation has been kind enough to host a mailing list for Bio::Phylo (http://search.cpan.org/~rvosa/Bio-Phylo/, the cpan distribution for phylogenetic analysis using perl). The scope of this list is at present fairly broad as it is both meant for user questions and development discussion on deeper integration with bioperl.

You are invited to sign up at: http://lists.open-bio.org/mailman/listinfo/bio-phylo-l

Best wishes,

Rutger Vos

--
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++

From bioperlanand at yahoo.com Mon May 1 18:36:20 2006
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Mon, 1 May 2006 11:36:20 -0700 (PDT)
Subject: [Bioperl-l] how to obtain GIs from clone_ids
Message-ID: <20060501183620.85791.qmail@web37901.mail.mud.yahoo.com>

Hi everybody,

I have a file containing clone_ids (from the Features annotation section of a GenBank entry)

------------------------------------------------------------
FEATURES             Location/Qualifiers
     source          1..707
                     /clone="C0005918b04"
------------------------------------------------------------

Is there a way in Bioperl to send a query over the internet (one clone_id at a time) and get out just the GI number for that clone_id?

Any suggestions.. Thanks in advance.

Anand

---------------------------------
Blab-away for as little as 1?/min.
Make PC-to-Phone Calls using Yahoo! Messenger with Voice. From cuiw at mail.nih.gov Mon May 1 19:39:01 2006 From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F]) Date: Mon, 1 May 2006 15:39:01 -0400 Subject: [Bioperl-l] how to obtain GIs from clone_ids In-Reply-To: <20060501183620.85791.qmail@web37901.mail.mud.yahoo.com> Message-ID: use strict; use Bio::DB::Query::GenBank; my $query_string = 'EST["C0005918b04"]'; my $query = Bio::DB::Query::GenBank->new(-db=>'nucleotide', -query=>$query_string, ); my $count = $query->count; my @ids = $query->ids; for (@ids) { print; } -----Original Message----- From: Anand Venkatraman [mailto:bioperlanand at yahoo.com] Sent: Monday, May 01, 2006 2:36 PM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] how to obtain GIs from clone_ids Hi everybody, I have a file containing clone_ids (from the Features annotation section of a GenBank entry) ------------------------------------------------------------ FEATURES Location/Qualifiers source 1..707 /clone="C0005918b04" ------------------------------------------------------------ Is there a way in Bioperl to send a query over the internet (one clone_id at a time) and get out just the GI number for that clone_id? Any suggestions.. Thanks in advance. Anand --------------------------------- Blab-away for as little as 1?/min. Make PC-to-Phone Calls using Yahoo! Messenger with Voice. _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From s.ryazansky at gmail.com Mon May 1 21:55:13 2006 From: s.ryazansky at gmail.com (Sergei Ryazansky) Date: Mon, 1 May 2006 21:55:13 +0000 (UTC) Subject: [Bioperl-l] blast program to run locally on windows References: <007c01c66883$61f29490$15327e82@pyrimidine> <20060425215433.35436.qmail@web36613.mail.mud.yahoo.com> Message-ID: Hi, Can you post your formatdb.log file here? 
From cjfields at uiuc.edu Tue May 2 04:15:19 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 1 May 2006 23:15:19 -0500 Subject: [Bioperl-l] blast program to run locally on windows In-Reply-To: References: <007c01c66883$61f29490$15327e82@pyrimidine> <20060425215433.35436.qmail@web36613.mail.mud.yahoo.com> Message-ID: We managed to work our way through it. He hadn't set ncbi.ini to the correct directories; the database was formatted correctly. Chris On May 1, 2006, at 4:55 PM, Sergei Ryazansky wrote: > Hi, > Can you post your formatdb.log file here? > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Tue May 2 16:19:34 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 2 May 2006 11:19:34 -0500 Subject: [Bioperl-l] Bio::DB::GenBank and complexity Message-ID: <000901c66e04$33e07370$15327e82@pyrimidine> I ran into some wonkiness with using extra parameters ('seq_start', 'seq_stop', 'strand', and 'complexity') with Bio::DB::GenBank that I have gone through, fixed, and committed. I also have added a few tests to DB.t for everything (all changes were in Bio::DB::WebDBSeqI and Bio::DB::NCBIHelper). The 'complexity' tag is the strangest, though I did manage to get it added as well (with tests). 
This is how NCBI defines complexity:

complexity regulates the display:
0 - get the whole blob
1 - get the bioseq for gi of interest (default in Entrez)
2 - get the minimal bioseq-set containing the gi of interest
3 - get the minimal nuc-prot containing the gi of interest
4 - get the minimal pub-set containing the gi of interest

Here's my quandary; when setting complexity to '0', you get a blob back (the main sequence as well as any subsequences, such as CDS); this is in essence a sequence stream with multiple alphabet types. So, I now have it set up to do this:

my $factory = Bio::DB::GenBank->new(-format     => 'fasta',
                                    -complexity => 0
                                   );

my $seqin = $factory->get_Seq_by_acc($acc);

while (my $seq = $seqin->next_seq) {
    $seqout->write_seq($seq);
}

since I thought returning an array would be horrendously expensive on memory, esp. with larger sequences. Currently this is only set up for sequences which are retrieved when complexity is set to '0' so it's a pretty unique case. Regardless, I'm worried that, since users expect a Bio::Seq object instead of a Bio::SeqIO object here, it will cause a lot of confusion with the API. Any suggestions/gripes?

Chris

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From mamillerpa at yahoo.com Tue May 2 11:41:01 2006
From: mamillerpa at yahoo.com (Mark A. Miller)
Date: Tue, 2 May 2006 04:41:01 -0700 (PDT)
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines
Message-ID: <20060502114101.29745.qmail@web50409.mail.yahoo.com>

Hello all.

I have a recently downloaded UniProt/TrEMBL flat file. I am trying to make FASTA subset files for some bacterial strains. I haven't been able to parse out the strain information from the OS or RC lines. These lines typically look like:

OS   Somegenus somespecies subsp. somesubspecies strain ABC123.
RC   STRAIN=ABC123.

I'm not especially good with Perl, and I'm definitely weak when it comes to OOP.
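
For OS and RC lines shaped like the two above, one possible workaround is to scan the flat file as plain text alongside Bio::SeqIO and pull the strain out with a regular expression. This is only a sketch: strains_from_lines is a hypothetical helper, not part of BioPerl, and it assumes the strain is written either as "strain NAME." at the end of an OS line or as "STRAIN=NAME" on an RC line; other layouts would need their own patterns.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect strain names from raw EMBL-style OS and RC lines.
sub strains_from_lines {
    my @lines = @_;
    my %found;
    for my $line (@lines) {
        if ($line =~ /^RC\s/) {
            # RC lines carry tokens such as "STRAIN=ABC123."; tokens may be
            # separated by ';', so capture up to the next semicolon.
            while ($line =~ /STRAIN=([^;]+)/gi) {
                (my $strain = $1) =~ s/[.\s]+$//;   # drop trailing period/space
                $found{$strain} = 1;
            }
        }
        elsif ($line =~ /^OS\s.*\bstrain\s+(.+?)\.?\s*$/i) {
            # Assumed OS layout: "... strain ABC123." at the end of the line.
            $found{$1} = 1;
        }
    }
    return sort keys %found;
}

my @lines = (
    "OS   Somegenus somespecies subsp. somesubspecies strain ABC123.",
    "RC   STRAIN=ABC123.",
);
my @strains = strains_from_lines(@lines);
print "strain: $_\n" for @strains;
```

A strain found this way could then be appended to the description of the matching sequence before it is written out as FASTA.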
I have included some code I pasted together from various pages on the bioperl wiki. In addition to the wiki, I have been making use of www.pasteur.fr/recherche/unites/sis/formation/bioperl/ch02s02.html

The code I have so far reports the species but not the subspecies or variant. I have also tried to walk through all of the feature, annotation and reference objects but I still can't seem to parse out the information I need. (For brevity, the example I'm including below only lists the code I used for the annotation objects.) Also, this code only prints the information... I know that I'll have to write a FASTA sequence object separately.

Any suggestions?

Thanks,
Mark

--- --- ---

#!/usr/bin/perl

use Bio::SeqIO;

my $usage  = "getaccs.pl file format\n";
my $file   = shift or die $usage;
my $format = shift or die $usage;

my $inseq = Bio::SeqIO->new(-file   => "<$file",
                            -format => $format );

while (my $seq = $inseq->next_seq) {

    my $species_object = $seq->species;
    my $species_string = $species_object->species;
    my $variant_string = $species_object->variant;
    my $common_string  = $species_object->common_name;
    my $sub_string     = $species_object->sub_species;
    my $binomial       = $species_object->binomial('FULL');

    print "display ",$seq->display_id,"\n";
    print "accession ",$seq->accession_number,"\n";
    print "desc ",$seq->desc,"\n";

    print "species ",$species_string,"\n";
    print "variant ",$variant_string,"\n";
    print "common ",$common_string,"\n";
    print "sub ",$sub_string,"\n";
    print "binomial ",$binomial,"\n";

    print $seq->seq,"\n";

    my $anno_collection = $seq->annotation;
    for my $key ( $anno_collection->get_all_annotation_keys ) {
        my @annotations = $anno_collection->get_Annotations($key);
        for my $value ( @annotations ) {
            print "tagname : ", $value->tagname, "\n";
            # $value is an Bio::Annotation, and has an "as_text" method
            print " annotation value: ", $value->as_text, "\n";

            if ($value->tagname eq "reference") {
                my $hash_ref = $value->hash_tree;
                for my $key (keys %{$hash_ref}) {
                    print $key,": ",$hash_ref->{$key},"\n";
                }
            }
        }
    }
    print "\n";
}
exit;

--- --- --- --- --- --- --- ---

Mark A. Miller

From cjfields at uiuc.edu Tue May 2 18:01:58 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 2 May 2006 13:01:58 -0500
Subject: [Bioperl-l] Bio::DB::GenBank and complexity
In-Reply-To: <000901c66e04$33e07370$15327e82@pyrimidine>
Message-ID: <000a01c66e12$8131a960$15327e82@pyrimidine>

I hate responding to my own post! Just wanted to add that I'm adding a warning for the get_Seq* methods to use the appropriate get_Stream* method when complexity == 0 before returning the Bio::SeqIO object.

CJF

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Tuesday, May 02, 2006 11:20 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::DB::GenBank and complexity
>
> I ran into some wonkiness with using extra parameters ('seq_start',
> 'seq_stop', 'strand', and 'complexity') with Bio::DB::GenBank that I have
> gone through, fixed, and committed. I also have added a few tests to DB.t
> for everything (all changes were in Bio::DB::WebDBSeqI and
> Bio::DB::NCBIHelper). The 'complexity' tag is the strangest, though I did
> manage to get it added as well (with tests).
This is how NCBI defines > complexity: > > complexity regulates the display: > 0 - get the whole blob > 1 - get the bioseq for gi of interest (default in Entrez) > 2 - get the minimal bioseq-set containing the gi of interest > 3 - get the minimal nuc-prot containing the gi of interest > 4 - get the minimal pub-set containing the gi of interest > > Here's my quandary; when setting complexity to '0', you get a glob back > (the > main sequence as well as any subsequences, such as CDS); this is in > essence > a sequence stream with multiple alphabet types. So, I now have it set up > to > do this: > > my $factory = Bio::DB::GenBank->new(-format => 'fasta', > -complexity => 0 > ); > > my $seqin = $factory->get_Seq_by_acc($acc); > > while (my $seq = $seqin->next_seq) { > $seqout->write_seq($seq); > } > > since I thought returning an array would be horrendously expensive on > memory, esp. with larger sequences. Currently this is only set up for > sequences which are retrieved when complexity is set to '0' so it's a > pretty > unique case. Regardless, I'm worried that, since users expect a Bio::Seq > object instead of a Bio::SeqIO object here, it will cause a lot of > confusion > with the API. Any suggestions/gripes? > > Chris > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. 
of Biochemistry > University of Illinois Urbana-Champaign > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Tue May 2 18:36:08 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue, 2 May 2006 14:36:08 -0400 Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines In-Reply-To: <20060502114101.29745.qmail@web50409.mail.yahoo.com> References: <20060502114101.29745.qmail@web50409.mail.yahoo.com> Message-ID: <7B49D031-9F74-43C3-AA4F-2AE115BB843D@duke.edu> This is really a limitation of the EMBL/GenBank format See this thread: http://lists.open-bio.org/pipermail/bioperl-l/2006-March/021068.html or on GMANE http://comments.gmane.org/gmane.comp.lang.perl.bio.general/10557 I don't know if any of this has been resolved really so hopefully James will speak up if he's implemented anything. -jason On May 2, 2006, at 7:41 AM, Mark A. Miller wrote: > Hello all. > > I have a recently donwloaded UniProt/TrEMBL flat file. I am trying to > make FASTA subset files for some bacterial strains. I haven't been > able to parse out the strain information from the OS or RC lines. > These lines typically look like: > > OS Somegenus somespecies subsp. somesubspecies strain ABC123. > RC STRAIN=ABC123. > > I'm not especiialy good with Perl, and I'm definitely weak when it > comes to OOP. > > I have included some code I pasted together from various pages on the > bioperl wiki. In addition to the wiki, I have been making use of > www.pasteur.fr/recherche/unites/sis/formation/bioperl/ch02s02.html > > The code I have so far reports the species but not the subspecies or > variant. I have also tried to walk through all of the feature, > annotation and reference objects but I still can't seem to parse out > the information I need. 
(For brevity, the example I'm including below > only lists the code I used for the annotation objects.) Also, this > code only prints the information... I know that I'll have to write a > FASTA sequence object seperately. > > Any suggestions? > > Thanks, > Mark > > --- --- --- > > > #!/usr/bin/perl > > > > use Bio::SeqIO; > > > > my $usage = "getaccs.pl file format\n"; > > my $file = shift or die $usage; > > my $format = shift or die $usage; > > > > my $inseq = Bio::SeqIO->new(-file => "<$file", > > -format => $format ); > > > > while (my $seq = $inseq->next_seq) { > > > > my $species_object = $seq->species; > > my $species_string = $species_object->species; > > my $variant_string = $species_object->variant; > > my $common_string = $species_object->common_name; > > my $sub_string = $species_object->sub_species; > > my $binomial = $species_object->binomial('FULL'); > > > > print "display ",$seq->display_id,"\n"; > > print "accession ",$seq->accession_number,"\n"; > > print "desc ",$seq->desc,"\n"; > > > > print "species ",$species_string,"\n"; > > print "variant ",$variant_string,"\n"; > > print "common ",$common_string,"\n"; > > print "sub ",$sub_string,"\n"; > > print "binomial ",$binomial,"\n"; > > > > print $seq->seq,"\n"; > > > > my $anno_collection = $seq->annotation; > > for my $key ( $anno_collection->get_all_annotation_keys ) { > > my @annotations = $anno_collection->get_Annotations($key); > > for my $value ( @annotations ) { > > print "tagname : ", $value->tagname, "\n"; > > # $value is an Bio::Annotation, and has an "as_text" method > > print " annotation value: ", $value->as_text, "\n"; > > > > if ($value->tagname eq "reference") { > > my $hash_ref = $value->hash_tree; > > for my $key (keys %{$hash_ref}) { > > print $key,": ",$hash_ref->{$key},"\n"; > > } > > } > > } > > } > > print "\n"; > > } > > exit; > > > > > > --- --- --- --- --- --- --- --- > > Mark A. Miller > > __________________________________________________ > Do You Yahoo!? > Tired of spam? 
Yahoo! Mail has the best spam protection around > http://mail.yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From mblanche at berkeley.edu Tue May 2 19:30:49 2006 From: mblanche at berkeley.edu (Marco Blanchette) Date: Tue, 02 May 2006 12:30:49 -0700 Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF Message-ID: Dear all-- I have been trying to use the intersection function to extract overlapping region from alternatively spliced exons as in the following script. The returned object from the 'my $overlap = $exon1->intersection($exon2);' is actually loosing the strand of $exon1 if $exon1 is from the negative strand. Is this behavior expected? Should I check the strand of $exon1 before working on the object return by any Bio::RangeI function? Many thanks #!/usr/bin/perl use strict; use warnings; use Bio::DB::GFF; MAIN:{ my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql', -dsn => 'dbi:mysql:database=dmel_43_LS;host=riolab.net', -user => 'guest'); my $test_db = $db->segment('4'); # Load up the exons into $exons_p for my $gene ($test_db->features(-types => 'gene')){ my $exons_p = extractExons($gene); cluster($exons_p) unless ($#{$exons_p} == -1); } } sub extractExons { my $gene = shift; my %ex_list; my @tcs = $gene->features( -type =>'processed_transcript', -attributes =>{Gene => $gene->group}); for my $tc (@tcs){ my @exons = $tc->features (-type => 'exon', -attributes => {Parent => $tc->group} ); for (@exons){ my $ex_id = $_->id; $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id}); } } my @values = values %ex_list; return(\@values); } sub cluster { my $exons_p = shift; for (my $s = 0; $s <= $#{$exons_p}; $s++){ for (my $t = $s+1; $t <= $#{$exons_p}; $t++){ my $exon1 = $exons_p->[$s]; my $exon2 = $exons_p->[$t]; if (!($exon1->equals($exon2)) && 
                $exon1->overlaps($exon2)){

                my $overlap = $exon1->intersection($exon2);

                print "===\n";
                print "ex1\n", $exon1->seq, "\n";
                print "ex2\n", $exon2->seq, "\n";
                print "overlap\n", $overlap->seq, "\n";
            }
        }
    }
}
______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
--

From osborne1 at optonline.net Tue May 2 20:17:29 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 02 May 2006 16:17:29 -0400
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To:
Message-ID:

Marco,

Yes, this is how intersection() is supposed to work. If both of the Range
objects have the same strand then the strand information is returned as part
of the result, but if they aren't on the same strand then no strand
information is returned.

Brian O.

On 5/2/06 3:30 PM, "Marco Blanchette" wrote:

> Dear all--
>
> I have been trying to use the intersection function to extract the
> overlapping region from alternatively spliced exons, as in the following
> script. The object returned by 'my $overlap = $exon1->intersection($exon2);'
> is actually losing the strand of $exon1 if $exon1 is from the negative
> strand. Is this behavior expected? Should I check the strand of $exon1
> before working on the object returned by any Bio::RangeI function?

From mblanche at berkeley.edu Tue May 2 20:32:58 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Tue, 02 May 2006 13:32:58 -0700
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To:
Message-ID:

Brian--

Even when both elements of intersection() are from the negative strand, the
returned object is from the positive strand and $overlap is actually the
reverse complement of the intersection between the 2 exons. Here is part
of the output from the script below:

===
ex1 Strand: -1
CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTG
ex2 Strand: -1
CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTGGTACGATGTCAAAGCTCCGAATATGTTTCAAACCCGT
CAAATCG
overlap Strand: 1
CAGTCCTTGCGAGAAAACGGGTCCACCACCTTCTTCTTACCGCCCTTCTTACCACCCTTGGAAAGACCTTTATTTT
TGCCGACTGCCATGTTCAACTAATAAACCGG
AAAAGGTCGAATCACGTTGACGACGTATGTGGAAAAAAG
...

If both are from the positive strand, the returned object is positive, as in:

===
ex1 Strand: 1
CAACGCAGACGTGGTACGGCGTTTTAAATCTGATAACATTTTGAACCGGGAATTATTTTAGAGTACCATTCTTTGT
TTTGTGCCTGTTTCAGTATAAATTAATTATG
CGCCTGATTTAAAGTACAAAATGTGTAAATATATCACCTTACCGTCGCGGGTGCACCCAATTGTGCTTTGATGAAT
AAATATACATATATGCAACATATATAACTTC
CTGTGTTAGTATAAGTGTATGTCAGCCAAAAACAAATATATATATGAGTGTTTATCGGCATTCGTGTGCTGGCAGA
GCAGCGATCAAAGCTGCGTTCGGTACTCGTT
GACTGGCCCAAGAATGAATTCTCGTGCAAGTGTGTTGATAAAAAGTATACGTATGTAT
ex2 Strand: 1
ATCGACAGTTGCCATCGTCGTTATTCCAGCACTAATTTAAAAAAAATTCGATCAACGCAGACGTG
overlap Strand: 1
CAACGCAGACGTG

Is there something I am missing? Here is the script generating the output

Many thanks all...

Marco

use strict;
use warnings;
use Bio::DB::GFF;

MAIN:{

    my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
                                -dsn     => 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
                                -user    => 'guest');
    my $test_db = $db->segment('4');

    # Load up the exons into $exons_p
    for my $gene ($test_db->features(-types => 'gene')){

        my $exons_p = extractExons($gene);

        cluster($exons_p) unless ($#{$exons_p} == -1);

    }
}

sub extractExons {
    my $gene = shift;
    my %ex_list;
    my @tcs = $gene->features( -type       => 'processed_transcript',
                               -attributes => {Gene => $gene->group});

    for my $tc (@tcs){
        my @exons = $tc->features( -type       => 'exon',
                                   -attributes => {Parent => $tc->group} );

        for (@exons){
            my $ex_id = $_->id;
            $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
        }
    }
    my @values = values %ex_list;
    return(\@values);
}

sub cluster {
    my $exons_p = shift;

    for (my $s = 0; $s <= $#{$exons_p}; $s++){
        for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
            my $exon1 = $exons_p->[$s];
            my $exon2 = $exons_p->[$t];

            if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){

                my $overlap = $exon1->intersection($exon2);

                print "===\n";
                print "ex1\tStrand: ", $exon1->strand, "\n", $exon1->seq, "\n";
                print "ex2\tStrand: ", $exon2->strand, "\n", $exon2->seq, "\n";
                print "overlap\tStrand: ", $overlap->strand, "\n", $overlap->seq, "\n";
            }
        }
    }
}

On 5/2/06 13:17, "Brian Osborne" wrote:

> Marco,
>
> Yes, this is how intersection() is supposed to work. If both of the Range
> objects have the same strand then the strand information is returned as part
> of the result, but if they aren't on the same strand then no strand
> information is returned.
>
> Brian O.

______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
--

From osborne1 at optonline.net Tue May 2 21:49:49 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 02 May 2006 17:49:49 -0400
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To:
Message-ID:

Marco,

Odd, because the intersection() code is quite simple and it's clear how it
should behave. What version of Bioperl are you using? I'm looking at the
latest, in bioperl-live...

Brian O.

On 5/2/06 4:32 PM, "Marco Blanchette" wrote:

> Brian--
>
> Even when both elements of intersection() are from the negative strand, the
> returned object is from the positive strand and $overlap is actually the
> reverse complement of the intersection between the 2 exons.

From mblanche at berkeley.edu Tue May 2 22:31:44 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Tue, 02 May 2006 15:31:44 -0700
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To:
Message-ID:

Brian--

I checked out last week's version from the CVS.

Silly question: How do I get the version of BioPerl I am using... Never had
to check a module/bundle version number before...

Marco

On 5/2/06 14:49, "Brian Osborne" wrote:

> Marco,
>
> Odd, because the intersection() code is quite simple and it's clear how it
> should behave. What version of Bioperl are you using? I'm looking at the
> latest, in bioperl-live...
>
> Brian O.

______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
--

From arareko at campus.iztacala.unam.mx Tue May 2 22:32:24 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Tue, 02 May 2006 17:32:24 -0500
Subject: [Bioperl-l] BioPerl-run in FreeBSD
Message-ID: <4457DDF8.4050005@campus.iztacala.unam.mx>

It's my great pleasure to announce the availability of the BioPerl-run
packages (stable & developer releases) for the FreeBSD operating system.

For instructions on how to install BioPerl ports in FreeBSD, please take a
look at the Getting Bioperl section of the BioPerl Wiki.

Regards,
Mauricio.

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From heikki at sanbi.ac.za Wed May 3 06:51:12 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 3 May 2006 08:51:12 +0200
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To:
References:
Message-ID: <200605030851.13007.heikki@sanbi.ac.za>

On Wednesday 03 May 2006 00:31, Marco Blanchette wrote:
> Brian--
>
> I checked out last week's version from the CVS.
>
> Silly question: How do I get the version of BioPerl I am using... Never had
> to check a module/bundle version number before...

It is not that silly. The syntax is not too easy:

  perl -MBio::Perl -le 'print Bio::Perl->VERSION;'

You can use any module in bioperl, of course.
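
For use inside a script rather than on the command line, the same check can be sketched as below. This is only an illustration: the helper name module_version is hypothetical (it is not part of BioPerl), and it relies on nothing more than the VERSION method that every Perl package inherits. Modules that are not installed are reported instead of aborting the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): load a module by name and
# return its version, or undef if the module cannot be loaded.
sub module_version {
    my ($module) = @_;

    # Turn Some::Module into Some/Module.pm so that require() accepts it
    (my $file = "$module.pm") =~ s{::}{/}g;

    # require() dies on a missing module, so trap that and return nothing
    return unless eval { require $file; 1 };

    # VERSION is inherited from UNIVERSAL; it is undef if $VERSION is unset
    return $module->VERSION;
}

# The module names below are examples; any installed module can be queried.
for my $module (qw(Bio::Perl Bio::Range Bio::DB::GFF)) {
    my $version = module_version($module);
    printf "%-14s %s\n", $module,
        defined $version ? $version : 'not installed (or no $VERSION)';
}
```

Different modules in one checkout may carry different version strings, so it can help to query the module you are actually debugging.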

	-Heikki

--
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From nuclearn at gmail.com Wed May 3 06:05:42 2006
From: nuclearn at gmail.com (Li Xiao)
Date: Wed, 3 May 2006 14:05:42 +0800
Subject: [Bioperl-l] about the frame and strand of a blastx report
Message-ID: <150864390605022305p5a04e743l24938386af12edf3@mail.gmail.com>

Hi, anybody,

I am working to parse a blastx report by using BioPerl modules
(Bio::SearchIO). The blastx result was created by NCBI-BLAST. How can I
obtain the strand (+ or -) of the query sequence against the hit protein?
I tried to use the strand function, but nothing was reported. I also used
the frame function, but the result usually displays 0, 1 or 2, so it does
not give any information about the query strand (+ or -).
How do I obtain the strand of a query sequence?
--
*********************************************************************
Li Xiao
Sichuan Key Laboratory of Molecular Biology and Biotechnology
College of Life Science, Sichuan University
Chengdu, SiChuan, P.R.China
TEL:86-28-85470083 FAX:86-28-85412738
E-MAIL: nuclearn at gmail.com
URL: http://scbi.scu.edu.cn
**********************************************************************

From cjfields at uiuc.edu Wed May 3 13:38:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 3 May 2006 08:38:17 -0500
Subject: [Bioperl-l] about the frame and strand of a blastx report
In-Reply-To: <150864390605022305p5a04e743l24938386af12edf3@mail.gmail.com>
Message-ID: <000601c66eb6$d5d5f530$15327e82@pyrimidine>

$hsp->strand():

    use Bio::SearchIO;

    my $parser = Bio::SearchIO->new (-file   => shift @ARGV,
                                     -format => 'blast');
    while (my $result = $parser->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                print $hsp->strand,"\n";
            }
        }
    }

This will give 1 or -1.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Li Xiao
> Sent: Wednesday, May 03, 2006 1:06 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] about the frame and strand of a blastx report
>
> Hi, anybody,
>
> I am working to parse a blastx report by using BioPerl modules
> (Bio::SearchIO).
> The blastx result was created by NCBI-BLAST. How i can obtain the strand (
> +
> or -)
> of query sequence against the hited protein? I tried to use the strand
> function, but
> nothing were reported. And i used the frame funtion, the result usually
> display 0,1,2,
> so, the result can not give any information about the query strand( + o r-
> ).
> How i obtain the strand of a query squence?
> --
> *********************************************************************
> Li Xiao
> Sichuan Key Laboratory of Molecular Biology and Biotechnology
> College of Life Science, Sichuan University
> Chengdu, SiChuan, P.R.China
> TEL:86-28-85470083 FAX:86-28-85412738
> E-MAIL: nuclearn at gmail.com
> URL: http://scbi.scu.edu.cn
> **********************************************************************

From osborne1 at optonline.net Wed May 3 15:22:27 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 03 May 2006 11:22:27 -0400
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines
In-Reply-To: <20060502114101.29745.qmail@web50409.mail.yahoo.com>
Message-ID:

Mark,

So you're trying to get the information in the RC line from a Swissprot format file?

Brian O.

On 5/2/06 7:41 AM, "Mark A. Miller" wrote:

> Hello all.
>
> I have a recently donwloaded UniProt/TrEMBL flat file. I am trying to
> make FASTA subset files for some bacterial strains. I haven't been
> able to parse out the strain information from the OS or RC lines.
> These lines typically look like:
>
> OS Somegenus somespecies subsp. somesubspecies strain ABC123.
> RC STRAIN=ABC123.
>
> I'm not especiialy good with Perl, and I'm definitely weak when it
> comes to OOP.
>
> I have included some code I pasted together from various pages on the
> bioperl wiki. In addition to the wiki, I have been making use of
> www.pasteur.fr/recherche/unites/sis/formation/bioperl/ch02s02.html
>
> The code I have so far reports the species but not the subspecies or
> variant. I have also tried to walk through all of the feature,
> annotation and reference objects but I still can't seem to parse out
> the information I need. (For brevity, the example I'm including below
> only lists the code I used for the annotation objects.)
Also, this
> code only prints the information... I know that I'll have to write a
> FASTA sequence object seperately.
>
> Any suggestions?
>
> Thanks,
> Mark
>
> --- --- ---
>
> > #!/usr/bin/perl
> >
> > use Bio::SeqIO;
> >
> > my $usage = "getaccs.pl file format\n";
> > my $file = shift or die $usage;
> > my $format = shift or die $usage;
> >
> > my $inseq = Bio::SeqIO->new(-file => "<$file",
> >                             -format => $format );
> >
> > while (my $seq = $inseq->next_seq) {
> >
> >     my $species_object = $seq->species;
> >     my $species_string = $species_object->species;
> >     my $variant_string = $species_object->variant;
> >     my $common_string = $species_object->common_name;
> >     my $sub_string = $species_object->sub_species;
> >     my $binomial = $species_object->binomial('FULL');
> >
> >     print "display ",$seq->display_id,"\n";
> >     print "accession ",$seq->accession_number,"\n";
> >     print "desc ",$seq->desc,"\n";
> >
> >     print "species ",$species_string,"\n";
> >     print "variant ",$variant_string,"\n";
> >     print "common ",$common_string,"\n";
> >     print "sub ",$sub_string,"\n";
> >     print "binomial ",$binomial,"\n";
> >
> >     print $seq->seq,"\n";
> >
> >     my $anno_collection = $seq->annotation;
> >     for my $key ( $anno_collection->get_all_annotation_keys ) {
> >         my @annotations = $anno_collection->get_Annotations($key);
> >         for my $value ( @annotations ) {
> >             print "tagname : ", $value->tagname, "\n";
> >             # $value is an Bio::Annotation, and has an "as_text" method
> >             print " annotation value: ", $value->as_text, "\n";
> >
> >             if ($value->tagname eq "reference") {
> >                 my $hash_ref = $value->hash_tree;
> >                 for my $key (keys %{$hash_ref}) {
> >                     print $key,": ",$hash_ref->{$key},"\n";
> >                 }
> >             }
> >         }
> >     }
> >     print "\n";
> > }
> > exit;
> >
> >
>
> --- --- --- --- --- --- --- ---
>
> Mark A. Miller
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam? Yahoo!
Mail has the best spam protection around
> http://mail.yahoo.com

From MEC at stowers-institute.org Wed May 3 15:09:04 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Wed, 3 May 2006 10:09:04 -0500
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
Message-ID:

Marco,

It appears that your code assumes that the exons returned from the call to Bio::DB::GFF::features are sorted by start; I don't think that is guaranteed (at least not in the documentation I'm reading). Also, I think your code will not report overlap between two exons that have an intervening overlapping exon. Depending on what your application is, you may care. For example, e1, e2, e3 all intersect pairwise, but your code won't report on e1's overlap with e3.

e1 ---*******-------
e2 -----******------
e3 ------***--------

Out of curiosity, what is your application? Designing primers for gene resequencing?

Cheers,

Malcolm Cook
Database Applications Manager, Bioinformatics
Stowers Institute for Medical Research

>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>Marco Blanchette
>Sent: Tuesday, May 02, 2006 2:31 PM
>To: bioperl-l at lists.open-bio.org
>Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
>
>Dear all--
>
>I have been trying to use the intersection function to extract
>overlapping
>region from alternatively spliced exons as in the following script. The
>returned object from the 'my $overlap =
>$exon1->intersection($exon2);' is
>actually loosing the strand of $exon1 if $exon1 is from the
>negative strand.
>Is this behavior expected? Should I check the strand of $exon1 before
>working on the object return by any Bio::RangeI function?
>
>Many thanks
>
>#!/usr/bin/perl
>use strict;
>use warnings;
>use Bio::DB::GFF;
>
>MAIN:{
>
>    my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>                                -dsn =>
>'dbi:mysql:database=dmel_43_LS;host=riolab.net',
>                                -user => 'guest');
>    my $test_db = $db->segment('4');
>
>    # Load up the exons into $exons_p
>    for my $gene ($test_db->features(-types => 'gene')){
>
>        my $exons_p = extractExons($gene);
>
>        cluster($exons_p) unless ($#{$exons_p} == -1);
>
>    }
>}
>
>sub extractExons {
>    my $gene = shift;
>    my %ex_list;
>    my @tcs = $gene->features( -type =>'processed_transcript',
>                               -attributes =>{Gene =>
>$gene->group});
>
>    for my $tc (@tcs){
>        my @exons = $tc->features (-type => 'exon',
>                                   -attributes => {Parent =>
>$tc->group}
>);
>
>        for (@exons){
>            my $ex_id = $_->id;
>            $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
>
>        }
>
>    }
>    my @values = values %ex_list;
>    return(\@values);
>}
>
>sub cluster {
>    my $exons_p = shift;
>
>    for (my $s = 0; $s <= $#{$exons_p}; $s++){
>        for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
>            my $exon1 = $exons_p->[$s];
>            my $exon2 = $exons_p->[$t];
>
>            if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
>
>                my $overlap = $exon1->intersection($exon2);
>
>                print "===\n";;
>                print "ex1\n", $exon1->seq, "\n";
>                print "ex2\n", $exon2->seq, "\n";
>                print "overlap\n", $overlap->seq, "\n";
>            }
>        }
>    }
>}
>______________________________
>Marco Blanchette, Ph.D.
>
>mblanche at uclink.berkeley.edu
>
>Donald C.
Rio's lab
>Department of Molecular and Cell Biology
>16 Barker Hall
>University of California
>Berkeley, CA 94720-3204
>
>Tel: (510) 642-1084
>Cell: (510) 847-0996
>Fax: (510) 642-6062
>--

From sdavis2 at mail.nih.gov Wed May 3 16:18:48 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 03 May 2006 12:18:48 -0400
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To:
Message-ID:

On 5/3/06 11:09 AM, "Cook, Malcolm" wrote:

> Marco,
>
> It appears that your code assumes that the exons as returned from call
> to BIO::DB::GFF::features are sorted by start; I don't think is
> guaranteed (at least not in the documentation I'm reading). Also I
> think your code will not report overlap between two exons that have an
> intervening overlapping exon. Depending on what you're application is,
> you may care. For example, e1, e2, e3 all intersect pairwise, but your
> code won't report on e1's overlap with e3.
>
> e1 ---*******-------
> e2 -----******------
> e3 ------***--------

I think this can be done (looking for "superexons") via the UCSC table browser or via Penn State University's Galaxy server (written in python and downloadable) in case you want a quick solution to what I think is your problem....

Sean

From osborne1 at optonline.net Wed May 3 20:22:57 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 03 May 2006 16:22:57 -0400
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines
In-Reply-To: <20060503193446.92476.qmail@web50412.mail.yahoo.com>
Message-ID:

Mark,

The RC line is part of the description of a reference, I'm guessing 'RC' stands for Reference Comment.
In order to get the attributes of a reference you'll first do something like:

    my $anno_collection = $seq->annotation;
    my @references = $anno_collection->get_Annotations('reference');

To get the comment field for a specific reference you can do:

    $references[0]->comment;

See the Feature-Annotation HOWTO for more information on Annotations; the Reference object is a kind of Annotation object.

Brian O.

On 5/3/06 3:34 PM, "Mark A. Miller" wrote:

> Yeah. Do you have any experience with that?
>
> Mark
>
> --- Brian Osborne wrote:
>
>> Mark,
>>
>> So you're trying to get the information in the RC line from a
>> Swissprot
>> format file?
>>
>> Brian O.
>
>
> --- --- --- --- --- --- --- ---
>
> Mark A. Miller

From cjfields at uiuc.edu Wed May 3 21:09:36 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 3 May 2006 16:09:36 -0500
Subject: [Bioperl-l] Batch retrieval partially implemented in Bio::DB::GenBank/GenPept
Message-ID: <000601c66ef5$e3066d90$15327e82@pyrimidine>

Just wanted to let you guys know I have added a few bits and pieces to Bio::DB::Gen* and BioLLDB::NCBIHelper for batch retrieval using epost/efetch. I didn't want to break anything too severely, so at the moment you can only use this via get_seq_stream (i.e. NOT through the get_Stream* methods yet). I also added tests to DB.t, a few each for protein and nucleotide retrieval using batch mode, and so far they all pass fine.

I haven't tested the upper sequence limit for this yet to see if it's at all comparable to just using efetch, but it seems a bit faster. The eutils coursebook states that one should only post ~500 at a time (I think you can get a bit higher though).

Also, at the moment it only works for GIs (NOT accessions, which apparently epost does not accept).
If we want to continue using this method for retrieval then we may need a workaround for accs.

CJF

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From torsten.seemann at infotech.monash.edu.au Wed May 3 21:44:48 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 04 May 2006 07:44:48 +1000
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To:
References:
Message-ID: <1146692688.12571.1.camel@chauvel.csse.monash.edu.au>

Marco,

> Silly question: How do I get the version of BioPerl I am using... Never had
> to check a module/bundle version number before...

http://bioperl.org/wiki/FAQ#How_can_I_tell_what_version_of_BioPerl_is_installed.3F

--
Torsten Seemann
Victorian Bioinformatics Consortium

From cjfields at uiuc.edu Wed May 3 22:08:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 3 May 2006 17:08:37 -0500
Subject: [Bioperl-l] Batch retrieval partially implemented inBio::DB::GenBank/GenPept
In-Reply-To: <000601c66ef5$e3066d90$15327e82@pyrimidine>
Message-ID: <000001c66efe$21dbcf80$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Wednesday, May 03, 2006 4:10 PM
> To: 'Jason Stajich'; 'Brian Osborne'; bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Batch retrieval partially implemented
> inBio::DB::GenBank/GenPept
>
> Just wanted to let you guys know I have added a few bits and pieces to
> Bio::DB::Gen* and BioLLDB::NCBIHelper for batch retrieval using
                    ^^^^^^^^^^^^^^^^^^^
                    Bio::DB::NCBIHelper
Fat fingers!

> epost/efetch. I didn't want to break anything too severely so you can
> only
> use this at the moment using get_seq_stream (i.e. NOT through get_Stream*
> methods yet). I also added tests to DB.t, a few each for protein and
> nucleotide retrieval using batch mode and so far they all pass fine.
>
> I haven't tested the upper sequence limit for this yet to see if it's at
> all
> comparable to just using efetch but it seems a bit faster. The eutils
> coursebook states that one should only post ~500 at a time (I think you
> can
> get a bit higher though).
>
> Also, at the moment it only works at the moment for GI's (NOT accessions,
> which apparently epost does not accept). If we want to continue using
> this
> method for retrieval then we may need a workaround for accs.
>
> CJF
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign

From arareko at campus.iztacala.unam.mx Wed May 3 22:24:23 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 03 May 2006 17:24:23 -0500
Subject: [Bioperl-l] Batch retrieval partially implemented inBio::DB::GenBank/GenPept
In-Reply-To: <000001c66efe$21dbcf80$15327e82@pyrimidine>
References: <000001c66efe$21dbcf80$15327e82@pyrimidine>
Message-ID: <44592D97.6090906@campus.iztacala.unam.mx>

hehehe :)

Chris Fields wrote:
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Chris Fields
>> Sent: Wednesday, May 03, 2006 4:10 PM
>> To: 'Jason Stajich'; 'Brian Osborne'; bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] Batch retrieval partially implemented
>> inBio::DB::GenBank/GenPept
>>
>> Just wanted to let you guys know I have added a few bits and pieces to
>> Bio::DB::Gen* and BioLLDB::NCBIHelper for batch retrieval using
>                    ^^^^^^^^^^^^^^^^^^^
>                    Bio::DB::NCBIHelper
> Fat fingers!
>
>> epost/efetch. I didn't want to break anything too severely so you can
>> only
>> use this at the moment using get_seq_stream (i.e. NOT through get_Stream*
>> methods yet).
I also added tests to DB.t, a few each for protein and
>> nucleotide retrieval using batch mode and so far they all pass fine.
>>
>> I haven't tested the upper sequence limit for this yet to see if it's at
>> all
>> comparable to just using efetch but it seems a bit faster. The eutils
>> coursebook states that one should only post ~500 at a time (I think you
>> can
>> get a bit higher though).
>>
>> Also, at the moment it only works at the moment for GI's (NOT accessions,
>> which apparently epost does not accept). If we want to continue using
>> this
>> method for retrieval then we may need a workaround for accs.
>>
>> CJF
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From fernan at iib.unsam.edu.ar Thu May 4 00:38:07 2006
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Wed, 3 May 2006 21:38:07 -0300
Subject: [Bioperl-l] BioPerl-run in FreeBSD
In-Reply-To: <4457DDF8.4050005@campus.iztacala.unam.mx>
References: <4457DDF8.4050005@campus.iztacala.unam.mx>
Message-ID: <20060504003807.GA86447@iib.unsam.edu.ar>

+----[ Mauricio Herrera Cuadra (02.May.2006 19:49):
|
| It's my great pleasure to announce the availability of the BioPerl-run
| packages (stable & developer releases) for the FreeBSD operating system.
|
| For instructions on how to install BioPerl ports in FreeBSD, please take
| a look into the Getting Bioperl section of the BioPerl Wiki.
|
+----]

Great job Mauricio, thanks for contributing this!

Fernan

From miker at biotiquesystems.com Wed May 3 03:31:59 2006
From: miker at biotiquesystems.com (Michael Rogoff)
Date: Tue, 2 May 2006 20:31:59 -0700
Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
Message-ID: <007b01c66e62$23161d20$c100a8c0@mike>

I've encountered a pretty serious bug in Bio::SeqIO when parsing certain genbank files that contain CONTIG entries with gaps. One such record is NW_925173.

When I try to parse this file using Bio::SeqIO::genbank, it will enter an infinite loop and spin until it runs out of memory.

I'm pretty certain it relates to this bug: http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to indicate that genbank records with CONTIG gaps are not valid and can't be parsed. But this bug actually claims to be fixed, which is strange, since looking at the code for FTLocationFactory (where the loop is) it's still right there. I assume that this may be fixed in other contexts but is still not fixed in Bio::SeqIO::genbank? Or am I doing something wrong?

I think that this should probably be filed as an open bug. I would think that even if bioperl isn't interested in parsing this type of file via SeqIO, certainly you'd want to ensure that no finite input file would send the parser into an infinite loop. Have others encountered this problem? Is there any plan to address it?

Thanks very much for any information or help!

-Mike

P.S. I've played around with my version of FTLocationFactory and it seems to actually work and parse the gaps. I'm not sure if I've created other bugs or if it works in all cases, but at least the parser doesn't die. I also don't know that my hacky code is appropriate for putting back in to BioPerl, but I'm happy to provide it if someone wants to check it out and/or consider it for checkin.
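
[Editorial sketch, not part of the original post.] The point above, that no finite input should be able to send a parser into an infinite loop, can be illustrated with a small tokenizer for the inside of a CONTIG join(...) expression. This is a minimal sketch in plain Perl and is not Mike's patch or the FTLocationFactory code, neither of which appears in this thread. The helper name split_contig_join and the accession numbers are made up for illustration, and the sketch assumes gaps are written as gap(N) or gap(unkN). Each pass through the loop either consumes at least one character or dies, so it always terminates.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl API): split the contents of a CONTIG
# join(...) expression into elements, treating gap(...) like any other element.
sub split_contig_join {
    my ($contig) = @_;
    $contig =~ s/\s+//g;    # CONTIG lines wrap in the flat file; drop whitespace
    my ($inner) = $contig =~ /^join\((.*)\)$/
        or die "not a join(...) expression: $contig\n";

    my @elements;
    my $pos = 0;
    while ($pos < length $inner) {
        pos($inner) = $pos;
        # Every branch below either advances pos() by at least one
        # character or dies, so the loop cannot spin on odd input.
        if ($inner =~ /\G(gap\((?:unk)?\d*\))(?:,|$)/gc) {
            push @elements, { type => 'gap', text => $1 };
        }
        elsif ($inner =~ /\G(complement\([^()]+\)|[^(),]+)(?:,|$)/gc) {
            push @elements, { type => 'segment', text => $1 };
        }
        else {
            die "cannot parse CONTIG element at offset $pos: "
              . substr($inner, $pos, 30) . "\n";
        }
        $pos = pos($inner);
    }
    return @elements;
}

# Usage with a made-up CONTIG expression (accessions are placeholders).
my @parts = split_contig_join(
    'join(AAAA01000001.1:1..1000,gap(100),
          complement(AAAA01000002.1:1..500),gap(unk100))'
);
print "$_->{type}\t$_->{text}\n" for @parts;
```

Dying on an unrecognized element is a deliberate choice in this sketch: a loud failure on an unexpected record is easier to report as a bug than a process that runs until memory is exhausted.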
From ULNJUJERYDIX at spammotel.com Wed May 3 08:20:38 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Wed, 3 May 2006 16:20:38 +0800
Subject: [Bioperl-l] Bio::Graphics::Panel imagemap making with Bio::Graphics::Panel
Message-ID: <5b6410e0605030120q31d1f554mbc4bf104deca48bf@mail.gmail.com>

Help! I can't figure out the instructions in the docs.

I want to create an image of short sequence matches against a longer sequence, with a clickable imagemap for the short sequences. I figure I can do this easily enough using the example script for parsing blast output, but I need an example script to understand how to produce the HTML code for the imagemap. I can find only rather cryptic references about how this can be done (see below).

$boxes = $panel->boxes
@boxes = $panel->boxes

The boxes() method returns a list of arrayrefs containing the coordinates of each glyph. The method is useful for constructing an image map. In a scalar context, boxes() returns an arrayref. In an list context, the method returns the list directly. Each member of the list is an arrayref of the following format:

    [ $feature, $x1, $y1, $x2, $y2, $track ]

The first element is the feature object; either an Ace::Sequence::Feature, a Das::Segment::Feature, or another Bioperl Bio::SeqFeatureI object. The coordinates are the topleft and bottomright corners of the glyph, including any space allocated for labels. The track is the Bio::Graphics::Glyph object corresponding to the track that the feature is rendered inside.

$position = $panel->track_position($track)

After calling gd() or boxes(), you can learn the resulting Y coordinate of a track by calling track_position() with the value returned by add_track() or unshift_track(). This will return undef if called before gd() or boxes() or with an invalid track.

@pixel_coords = $panel->location2pixel(@feature_coords)

Public routine to map feature coordinates (in base pairs) into pixel coordinates relative to the left-hand edge of the picture.
If you define a -background callback, the callback may wish to invoke this routine in order to translate base coordinates into pixel coordinates.

$left = $panel->left
$right = $panel->right
$top = $panel->top
$bottom = $panel->bottom

Return the pixel coordinates of the *drawing area* of the panel, that is, exclusive of the padding.

got it from http://docs.bioperl.org/bioperl-live/Bio/Graphics/Panel.html

From s.johri at imperial.ac.uk Thu May 4 12:50:34 2006
From: s.johri at imperial.ac.uk (Johri, Saurabh)
Date: Thu, 4 May 2006 13:50:34 +0100
Subject: [Bioperl-l] Fu and Li's D statistic - calculate
Message-ID: <4A98ACB8EC146149872BAC9A132A582C277AB3@icex5.ic.ac.uk>

Hi all,

I'm trying to calculate Fu and Li's D summary statistic for a group of sequences. The function fu_and_li_D(@ingroup,$extmutations) takes two arguments: the first is the ingroup (population) and the second is the number of external mutations, which is calculated from an outgroup sequence.

My question is: which function do I use to calculate the number of external mutations? Would this be the singleton_count() function? The singleton_count() function takes a PopGen object, which represents a clustal alignment file. Would I include the outgroup in a multiple FASTA file for alignment with clustal?

Any suggestions as to how to calculate the number of external mutations would be much appreciated.

Thanks for your help!

Saurabh Johri
Centre for Molecular Microbiology & Infection
Imperial College London
SW7 2AZ

From hlapp at gmx.net Thu May 4 16:30:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 4 May 2006 12:30:05 -0400
Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
In-Reply-To: <007b01c66e62$23161d20$c100a8c0@mike>
References: <007b01c66e62$23161d20$c100a8c0@mike>
Message-ID:

An infinite loop on a file you can download (i.e., as opposed to a file you tinkered with) is never ok. Could you file this as a bug report? And ideally attach your patch?
Thanks,

-hilmar

On May 2, 2006, at 11:31 PM, Michael Rogoff wrote:

>
> I've encountered a pretty serious bug in Bio::SeqIO when parsing
> certain genbank
> files that contain CONTIG entries with gaps. One such record is
> NW_925173.
>
> When I try to parse this file using Bio::SeqIO::genbank, it will
> enter an
> infinite loop and spin until it runs out of memory.
>
> I'm pretty certain it relates to this bug:
> http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to
> indicate that
> genbank records with CONTIG gaps are not valid and can't be
> parsed. But this
> bug actually claims to be fixed, which is strange, since looking at
> the code for
> FTLocationFactory (where the loop is) it's still right there. I
> assume that
> this may be fixed in other contexts but is still not fixed in
> Bio::SeqIO::genbank? Or am I doing something wrong?
>
> I think that this should probably be filed as an open bug. I would
> think that
> even if bioperl isn't interested in parsing this type of file via
> SeqIO,
> certainly you'd want to ensure that no finite input file would send
> the parser
> into an infinite loop. Have others encountered this problem? Is
> there any plan
> to address it?
>
> Thanks very much for any information or help!
>
> -Mike
>
> P.S. I've played around with my version of FTLocationFactory and
> it seems to
> actually work and parse the gaps. I'm not sure if I've created
> other bugs or if
> it works in all cases, but at least the parser doesn't die. I also
> don't know
> that my hacky code is appropriate for putting back in to BioPerl,
> but I'm happy
> to provide it if someone wants to check it out and/or consider it
> for checkin.
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From saldroubi at yahoo.com Thu May 4 17:03:00 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Thu, 4 May 2006 10:03:00 -0700 (PDT)
Subject: [Bioperl-l] Is webiste down?
Message-ID: <20060504170300.12178.qmail@web34301.mail.mud.yahoo.com>

All,

Is the bioperl website down? I can't get to http://www.bioperl.org

Thank you.

Sincerely,
Sam Al-Droubi, M.S.
saldroubi at yahoo.com

From arareko at campus.iztacala.unam.mx Thu May 4 18:22:52 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 04 May 2006 13:22:52 -0500
Subject: [Bioperl-l] Is webiste down?
In-Reply-To: <20060504170300.12178.qmail@web34301.mail.mud.yahoo.com>
References: <20060504170300.12178.qmail@web34301.mail.mud.yahoo.com>
Message-ID: <445A467C.4070700@campus.iztacala.unam.mx>

Website is ok, maybe your gateway can't lookup the bioperl server at the moment.

Regards,
Mauricio.

Sam Al-Droubi wrote:
> All,
>
> Is the bioperl website down? I can't get to http://www.bioperl.org
>
>
> Thank you.
>
>
>
> Sincerely,
> Sam Al-Droubi, M.S.
> saldroubi at yahoo.com

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From cjfields at uiuc.edu Thu May 4 18:40:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 4 May 2006 13:40:32 -0500
Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
In-Reply-To: <007b01c66e62$23161d20$c100a8c0@mike>
Message-ID: <000001c66faa$3a25b130$15327e82@pyrimidine>

Are you using the CONTIG record or the full GenBank file? I see problems with both (using bioperl-live) which seem unrelated to one another. The full file seems to be running a bit slow b/c the full GenBank record is huge (~55 MB) but the CONTIG file does exactly what you said (runs out of memory).

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Michael Rogoff
> Sent: Tuesday, May 02, 2006 10:32 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
>
>
> I've encountered a pretty serious bug in Bio::SeqIO when parsing certain
> genbank
> files that contain CONTIG entries with gaps. One such record is
> NW_925173.
>
> When I try to parse this file using Bio::SeqIO::genbank, it will enter an
> infinite loop and spin until it runs out of memory.
>
> I'm pretty certain it relates to this bug:
> http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to indicate
> that
> genbank records with CONTIG gaps are not valid and can't be parsed. But
> this
> bug actually claims to be fixed, which is strange, since looking at the
> code for
> FTLocationFactory (where the loop is) it's still right there.
I assume
> that
> this may be fixed in other contexts but is still not fixed in
> Bio::SeqIO::genbank? Or am I doing something wrong?
>
> I think that this should probably be filed as an open bug. I would think
> that
> even if bioperl isn't interested in parsing this type of file via SeqIO,
> certainly you'd want to ensure that no finite input file would send the
> parser
> into an infinite loop. Have others encountered this problem? Is there
> any plan
> to address it?
>
> Thanks very much for any information or help!
>
> -Mike
>
> P.S. I've played around with my version of FTLocationFactory and it seems
> to
> actually work and parse the gaps. I'm not sure if I've created other bugs
> or if
> it works in all cases, but at least the parser doesn't die. I also don't
> know
> that my hacky code is appropriate for putting back in to BioPerl, but I'm
> happy
> to provide it if someone wants to check it out and/or consider it for
> checkin.

From j.abbott at imperial.ac.uk Thu May 4 15:44:44 2006
From: j.abbott at imperial.ac.uk (James Abbott)
Date: Thu, 04 May 2006 16:44:44 +0100
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines
In-Reply-To: <7B49D031-9F74-43C3-AA4F-2AE115BB843D@duke.edu>
References: <20060502114101.29745.qmail@web50409.mail.yahoo.com> <7B49D031-9F74-43C3-AA4F-2AE115BB843D@duke.edu>
Message-ID: <445A216C.7090108@imperial.ac.uk>

Jason Stajich wrote:
> I don't know if any of this has been resolved really so hopefully
> James will speak up if he's implemented anything.

Not as yet, I'm afraid - $job is keeping me overly busy at the moment, but it's on my todo list....

Cheers,
James

--
Dr.
James Abbott
Bioinformatics Software Developer, Bioinformatics Support Service
Imperial College, London

From hubert.prielinger at gmx.at Thu May 4 19:35:42 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 04 May 2006 13:35:42 -0600
Subject: [Bioperl-l] can't parse blast file anymore
Message-ID: <445A578E.8050207@gmx.at>

Hi,
the following perl script worked fine until a few days ago....

==============================================================
#!/usr/bin/perl -w

use Bio::SearchIO;
use strict;
use DBI;
use Net::MySQL;

#use lib qw(/usr/local/lib/perl5/site_perl/5.8.6/i686-linux);

print "trying to connect to database \n";
my $database = 'antimicro_peptides';
my $host = 'ppc7.bio.ucalgary.ca';
my $user = 'Hubert';
my $password = 'Col00eng30';

my $mysql = Net::MySQL->new(
    hostname => $host,
    database => $database,
    user     => $user,
    password => $password,
);

print "Connection established \n";

my $selectID = 0;
my $count = 0;

##output database results
#while (my @row = $sth->fetchrow_array)
# { print "@row\n" }

print "start program\n";
my $directory = '/home/Hubert/test';
opendir(DIR, $directory) || die("Cannot open directory");
print "opened directory\n";

foreach my $file (readdir(DIR)) {
    if ($file =~ /txt$/) {
        $count++;
        print "read file $file \n";

        $file = $directory . '/' . $file;

        my $search = new Bio::SearchIO (-format => 'blast',
                                        -file => $file);
        print "bioperl seems to work....\n";
        my $cutoff_len = 10;

        #iterate over each query sequence
        print "try to enter while loop\n";
        while (my $result = $search->next_result) {
            print "entered 1st while loop\n";

            #iterate over each hit on the query sequence
            while (my $hit = $result->next_hit) {
                print "entered 2nd while loop\n";

                #iterate over each HSP in the hit
                while (my $hsp = $hit->next_hsp) {
                    print "entered 3rd while loop\n";

                    if ($hsp->length('sbjct') <= $cutoff_len) {
                        #print $hsp->hit_string, "\n";

                        for ($hsp->hit_string) { #$hsp->hit_string
                            print "count files....., $count ,\n";
.................
===================================================================

Output:

[Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl
trying to connect to database
Connection established
start program
opened directory
read file 40026.txt
bioperl seems to work....
try to enter while loop

but it doesn't enter the first while loop, it gets stuck there. At first I
thought it was a linux problem, because I updated from FC4 to FC5, but it
isn't, because perl is working fine, and it seems bioperl is working fine
too, but it cannot parse the file anymore.....

regards
Hubert

From barry.moore at genetics.utah.edu Thu May 4 21:22:51 2006
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 4 May 2006 15:22:51 -0600
Subject: [Bioperl-l] [BULK] can't parse blast file anymore
In-Reply-To: <445A578E.8050207@gmx.at>
References: <445A578E.8050207@gmx.at>
Message-ID:

Hubert,

My first suggestion would be to log onto your calgary server and change
your password real quick (unless you intended to post your password to
the world).

Well, this isn't an answer, but it may help you find one. Use
perl -d your_script.pl to run your script under the debugger. Type 'n' to
step forward to the line where you start the while loop. Type 'x $result'
to see that an object exists (it should or you'd have gotten an error).
Type 's' to step into the next_result call, and then continue to type 'n'
and 's' as needed to burrow down to see if you can find where you're
hanging.

Barry

On May 4, 2006, at 1:35 PM, Hubert Prielinger wrote:

> Hi,
> the following perl script worked fine until a few days ago....
From cjfields at uiuc.edu Thu May 4 22:27:57 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 4 May 2006 17:27:57 -0500
Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
In-Reply-To: <000001c66faa$3a25b130$15327e82@pyrimidine>
Message-ID: <000001c66fc9$fe7e5680$15327e82@pyrimidine>

Here's another odd bit. This is what I get for the CONTIG line when I
passed a simple contig file (NW_925062, with one join) through Bio::SeqIO:

-----------------------------------
....
FEATURES             Location/Qualifiers
     source          1..8541
                     /db_xref="taxon:9606"
                     /mol_type="genomic DNA"
                     /chromosome="11"
                     /organism="Homo sapiens"
CONTIG      AADB02014027.1:1..8541

//
-----------------------------------
Here's the original:
-----------------------------------
FEATURES             Location/Qualifiers
     source          1..8541
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /chromosome="11"
CONTIG      join(AADB02014027.1:1..8541)
//
-----------------------------------

Looks like it lopped out the 'join' here as well.
Chris

From hlapp at gmx.net Thu May 4 22:39:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 4 May 2006 18:39:05 -0400
Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
In-Reply-To: <000001c66fc9$fe7e5680$15327e82@pyrimidine>
References: <000001c66fc9$fe7e5680$15327e82@pyrimidine>
Message-ID: <2E0D7723-FA6E-4812-8DBB-30FCD11FA85C@gmx.net>

The two notations are equivalent and syntactically correct, or so I
believe ... I don't think 100% verbatim preservation should be the
goal. Or am I missing the point?

On May 4, 2006, at 6:27 PM, Chris Fields wrote:

> Here's another odd bit. This is what I get for the CONTIG line when I
> passed a simple contig file (NW_925062, with one join) through
> Bio::SeqIO:
>
> -----------------------------------
> ....
--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================

From hubert.prielinger at gmx.at Thu May 4 23:57:44 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 04 May 2006 17:57:44 -0600
Subject: [Bioperl-l] can't parse blast file anymore
In-Reply-To: <445A7449.1080607@infotech.monash.edu.au>
References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au>
Message-ID: <445A94F8.9000903@gmx.at>

Torsten Seemann wrote:
> Hubert
>
>> the following perl script worked fine until a few days ago....
>>
>>   #iterate over each query sequence
>>   print "try to enter while loop\n";
>>
>>
>   die "Bad BLAST report" if not defined $search;
>
>>   while (my $result = $search->next_result) {
>>       print "entered 1st while loop\n";
>>
>> Output:
>>
>> [Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl
>> try to enter while loop
>>
>> but it doesn't enter the first while loop, it stuck there, first I
>>
> What is the value of $search before you start the WHILE loop ?
>
>
hi,
$search is defined, like

my $search = new Bio::SearchIO (-format => 'blast', -file => $file)

if I try it with the debugger as barry has suggested, then I get the following:

  DB<1> n
main::(Blast.pl:24):    print "Connection established \n";
  DB<1> n
Connection established
main::(Blast.pl:26):    my $selectID = 0;
  DB<1> n
main::(Blast.pl:27):    my $count = 0;
  DB<1> n
main::(Blast.pl:37):    print "start program\n";
  DB<1> n
start program
main::(Blast.pl:38):    my $directory = '/home/Hubert/test';
  DB<1> n
main::(Blast.pl:39):    opendir(DIR, $directory) || die("Cannot open directory");
  DB<1> n
main::(Blast.pl:40):    print "opened directory\n";
  DB<1> n
opened directory
main::(Blast.pl:42):    foreach my $file (readdir(DIR)) {
  DB<1> n
main::(Blast.pl:43):    if ($file =~ /txt$/) {
  DB<1> n
main::(Blast.pl:44):    $count++;
  DB<1> n
main::(Blast.pl:45):    print "read file $file \n";
  DB<1> n
read file 40026.txt
main::(Blast.pl:48):    $file = $directory . '/' . $file;
  DB<1> n
main::(Blast.pl:50):    my $search = new Bio::SearchIO (-format => 'blast',
main::(Blast.pl:51):    -file => $file);
  DB<1> n
main::(Blast.pl:52):    print "bioperl seems to work....\n";
  DB<1> s $search
main::((eval 14)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3):
3:      $search;
  DB<<2>> n
  DB<2> n
bioperl seems to work....
main::(Blast.pl:53):    my $cutoff_len = 10;
  DB<2> n
main::(Blast.pl:56):    print "try to enter while loop\n";
  DB<2> n
try to enter while loop
main::(Blast.pl:57):    while (my $result = $search->next_result) {
  DB<2> s $result
main::((eval 15)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3):
3:      $result;
  DB<<3>>

From torsten.seemann at infotech.monash.edu.au Thu May 4 21:38:17 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 05 May 2006 07:38:17 +1000
Subject: [Bioperl-l] can't parse blast file anymore
In-Reply-To: <445A578E.8050207@gmx.at>
References: <445A578E.8050207@gmx.at>
Message-ID: <445A7449.1080607@infotech.monash.edu.au>

Hubert

>the following perl script worked fine until a few days ago....
>
>  #iterate over each query sequence
>  print "try to enter while loop\n";
>
>
   die "Bad BLAST report" if not defined $search;

>  while (my $result = $search->next_result) {
>      print "entered 1st while loop\n";
>
>Output:
>
>[Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl
>try to enter while loop
>
>but it doesn't enter the first while loop, it stuck there, first I
>
>
What is the value of $search before you start the WHILE loop ?

From barry.moore at genetics.utah.edu Fri May 5 00:39:57 2006
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 4 May 2006 18:39:57 -0600
Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore
In-Reply-To: <445A94F8.9000903@gmx.at>
References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au> <445A94F8.9000903@gmx.at>
Message-ID: <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>

That should be 'x $result' and you should see the object dumped to
the screen.

or just 's' by itself which will step you into the sub on the while
line will step you into the next_result sub, and you can look around
and watch what's happening.
B

> DB<2> s $result
> main::((eval 15)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3):
> 3: $result;
> DB<<3>>

From hubert.prielinger at gmx.at Fri May 5 02:04:20 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 04 May 2006 20:04:20 -0600
Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore
In-Reply-To: <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>
References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au> <445A94F8.9000903@gmx.at> <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>
Message-ID: <445AB2A4.7020405@gmx.at>

if I do so it returns:
0 undef

Barry Moore wrote:
> That should be 'x $result' and you should see the object dumped to
> the screen.
>
> or just 's' by itself which will step you into the sub on the while
> line will step you into the next_result sub, and you can look around
> and watch what's happening.
>
> B

From torsten.seemann at infotech.monash.edu.au Fri May 5 04:40:34 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 05 May 2006 14:40:34 +1000
Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore
In-Reply-To: <445AB2A4.7020405@gmx.at>
References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au> <445A94F8.9000903@gmx.at> <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu> <445AB2A4.7020405@gmx.at>
Message-ID: <445AD742.4070408@infotech.monash.edu.au>

Hubert Prielinger wrote:
> if I do so it returns:
> 0 undef

That means the value of $search was undef.
That means that it could not parse or open the BLAST report.

I repeat the line that I put in my earlier email which you ignored.

# your line
my $search = Bio::SearchIO->new( ..... );
# then check if it was successful!
die "could not open blast report" if not defined $search;

--Torsten

From jason.stajich at duke.edu Fri May 5 13:21:38 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 5 May 2006 09:21:38 -0400
Subject: [Bioperl-l] bioperl-AlignIO problems parsing fasta files
In-Reply-To: <5d35ac4d.b35ce863.8198d00@expms1.cites.uiuc.edu>
References: <5d35ac4d.b35ce863.8198d00@expms1.cites.uiuc.edu>
Message-ID:

The space after the > is causing the problem, since we infer the ID as
everything after the '>' BEFORE the first whitespace. Get rid of the
space.
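As a rough illustration of that rule, here is a minimal pure-Perl sketch. It is not BioPerl's parsing code: fasta_id and fix_header are made-up helper names, and the regex is only an assumption about what "everything after the '>' before the first whitespace" amounts to. It shows why a header that starts with "> " yields an empty ID, which matches the "No sequence with name []" message in the report.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): take the ID to be the
# run of non-whitespace characters immediately following '>'.
sub fasta_id {
    my ($header) = @_;
    my ($id) = $header =~ /^>(\S*)/;
    return $id;
}

# Hypothetical helper: drop any whitespace between '>' and the ID.
sub fix_header {
    my ($header) = @_;
    (my $fixed = $header) =~ s/^>\s+/>/;
    return $fixed;
}

my $header = '> gi|90108701|pdb|2AHZ|B Chain B, K+ Complex Of The Nak Channel';

print '[', fasta_id($header), "]\n";               # prints "[]"
print '[', fasta_id(fix_header($header)), "]\n";   # prints "[gi|90108701|pdb|2AHZ|B]"
```

The substitution in fix_header is the same one the in-place one-liner applies to every header line of the file.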
$ perl -i.backup -p -e 's/^>\s+/>/' YOURFASALNFILE

On May 4, 2006, at 7:00 PM, Gloria Rendon wrote:

> contents of the input file has a single sequence:
>
>> gi|90108701|pdb|2AHZ|B Chain B, K+ Complex Of The Nak Channel
> MLSFLLTLKRMLRACLRAWKDKEFQVLFVLTILTLISGTIFYSTVEGLRPIDALYFSVVTLTTVGDGNFS
> PQTDFGKIFTILYIFIGIGLVFGFIHKLAVNVQLPSILSN
> ------------------------------------------
> this is the script that tries to parse it:
>
> use Bio::AlignIO;
> my $inseq = Bio::AlignIO->new(-format => 'fasta',
>                               -file => 'test.fasta');
> while( my $aln = $inseq->next_aln ) {
>     print "name: ", $aln->displayname;
>     print "length: ", $aln->length;
>     print "\n";
> }
>
> ------------------------------------------
> and this is the result of running that script on winxp
>
> D:\msa\NAK MUTANTS>perl parseFasta.pl
>
>
> ------------- EXCEPTION -------------
> MSG: No sequence with name []
> STACK Bio::SimpleAlign::displayname C:/Perl/site/lib/Bio/SimpleAlign.pm:2047
> STACK toplevel parseFasta.pl:11
>
> --------------------------------------
> D:\msa\NAK MUTANTS>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/

From thoufek at pngg.org Thu May 4 16:50:44 2006
From: thoufek at pngg.org (T.D. Houfek)
Date: Thu, 04 May 2006 12:50:44 -0400
Subject: [Bioperl-l] Bio::Seq::Quality description line problem
In-Reply-To:
References:
Message-ID: <445A30E4.6070103@pngg.org>

Using Bioperl 1.5, having trouble with writing FASTA-style quality files
using Bio::Seq::Quality.

I create the Bio::Seq::Quality object, giving its constructor an ID, a
description, a nucleotide sequence, and a quality sequence. I then write
the sequence FASTA and the quality FASTA. The description string will
appear in the header line of the sequence FASTA, but not in the header
line of the quality FASTA.

Can anybody help me figure out how to fix this? I've attached a sample
script and output.

-T.D.
------------------- sample script follows ---------------------------------------

#!/usr/bin/perl
use strict;
use Bio::Seq::Quality;
use Bio::SeqIO;

my $id = "bogus_id";
my $desc = "bogus description";
my $seq = "ATTATTATTATTATT";
my $qual = "10 20 30 10 20 30 10 20 30 10 20 30 10 20 30";

my $sequal_obj = Bio::Seq::Quality->new(
    -display_id => $id,
    -desc       => $desc,
    -seq        => $seq,
    -qual       => $qual
);

my $qualout = Bio::SeqIO->new(
    -file   => ">myfile.qual",
    -format => 'qual'
);
my $seqout = Bio::SeqIO->new(
    -file   => ">myfile.seq",
    -format => 'Fasta'
);

$seqout->write_seq($sequal_obj);
$qualout->write_seq($sequal_obj);

------------------ sample output follows ---------------------------------------

tdhoufek at aether:~$ cat myfile.seq
>bogus_id bogus description
ATTATTATTATTATT
tdhoufek at aether:~$ cat myfile.qual
>bogus_id
10 20 30 10 20 30 10 20 30 10 20 30 10 20 30

--------------------------------------------------------------------------------------------------

--
T.D. Houfek
senior bioinformatics developer
plant nematode genetics group
north carolina state university
Email: thoufek at pngg.org
----------------------------------------------------------
use Bio::Seq; @a =qw/NNN CCT GAG CAT GCG TGT AAG AAC TAG/;
$u=seq;$r=Bio::Seq;sub c{$c=$r->new(-$u=>"@_[0]")->revcom;
$t=$c->$u;}map{m/\d/?$g=c($a[$_]):tr/a-i/1-9/&&($g=$a[$_])
;$x[$i++]=$g;} split //,"dgh5cb40ab120cdefb4";$z=$r->new(-
$u=>(join"", at x))->translate()->$u;$z =~s/X/ /g;print"$z\n"

From jason.stajich at duke.edu Fri May 5 13:27:51 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 5 May 2006 09:27:51 -0400
Subject: [Bioperl-l] bioperl-AlignIO problems parsing fasta files
In-Reply-To:
References: <5d35ac4d.b35ce863.8198d00@expms1.cites.uiuc.edu>
Message-ID: <0F79C9AD-DE36-4424-9E59-37ABE8B62A5E@duke.edu>

[replying to myself]
although if you are trying to just read a sequence not an alignment
then you want to use Bio::SeqIO.
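For readers who want to see what that looks like, here is a minimal sketch of reading FASTA records with Bio::SeqIO instead of Bio::AlignIO. It assumes BioPerl is installed; summarize_fasta is a made-up helper name, and the file to read is taken from the command line rather than hard-coded. Only the constructor and the next_seq, display_id, desc and length methods of the returned sequence objects are used.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: return one [id, description, length] triple
# for each sequence record found in the given FASTA file.
sub summarize_fasta {
    my ($path) = @_;
    my $in = Bio::SeqIO->new(-file => $path, -format => 'fasta');
    my @rows;
    while (my $seq = $in->next_seq) {
        push @rows, [ $seq->display_id, $seq->desc, $seq->length ];
    }
    return @rows;
}

# Usage: perl read_fasta.pl test.fasta
# (the script name is illustrative; test.fasta is the file from the report)
if (@ARGV) {
    for my $row (summarize_fasta($ARGV[0])) {
        print "name: $row->[0]\n";
        print "length: $row->[2]\n";
    }
}
```

Each record comes back as a sequence object rather than an alignment, so there is no alignment-level displayname lookup involved.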
See the copious help on the HOWTO page at bioperl website including a
sequence and feature howto and beginner's guide.
http://bioperl.org/wiki/HOWTOs

-jason

On May 5, 2006, at 9:21 AM, Jason Stajich wrote:

> Space after the > is causing the problem since we infer the ID as the
> everything after the '>' BEFORE the first whitespace. Get rid of the
> space.
> $ perl -i.backup -p -e 's/^>\s+/>/' YOURFASALNFILE

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/

From osborne1 at optonline.net Fri May 5 14:04:02 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 05 May 2006 10:04:02 -0400
Subject: [Bioperl-l] Bio::Seq::Quality description line problem
In-Reply-To: <445A30E4.6070103@pngg.org>
Message-ID:

T.D.,

According to the documentation,
http://www.bioperl.org/wiki/Qual_sequence_format, your *qual file looks
right. What are you trying to create?

Brian O.

On 5/4/06 12:50 PM, "T.D. Houfek" wrote:

> Using Bioperl 1.5, having trouble with writing FASTA-style quality files
> using Bio::Seq::Quality.
>
> I create the Bio::Seq::Quality object, giving its constructor an ID, a
> description, a nucleotide sequence, and a quality sequence. I then write
> the sequence FASTA and the quality FASTA. The description string will
> appear in the header line of the sequence FASTA, but not in the header
> line of the quality FASTA.
>
> Can anybody help me figure out how to fix this? I've attached a sample
> script and output.
>
> -T.D.

From cjfields at uiuc.edu Fri May 5 14:24:05 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 09:24:05 -0500
Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
In-Reply-To: <2E0D7723-FA6E-4812-8DBB-30FCD11FA85C@gmx.net>
Message-ID: <001701c6704f$90dbd090$15327e82@pyrimidine>

I'm not sure it's a valid CONTIG file w/o the join(...). This is a chunk
from the longer file Michael used as an example here (NW_925173). I believe
the CONTIG line is currently handled like a feature so I think it goes
through Bio::SeqIO::FTHelper, which is where Michael mentions his bugfix is;
I think it's getting beaten up in there somehow. I may see what happens if
it's treated like a WGS line (like a Bio::Annotation::SimpleValue object)
and just glob the whole mess together as is.

Chris

...
FEATURES             Location/Qualifiers
     source          1..44976370
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /chromosome="11"
CONTIG      join(AADB02014316.1:1..1482320,gap(67),AADB02014317.1:1..577321,
            gap(441),AADB02014318.1:1..173584,gap(676),
            AADB02014319.1:1..377558,gap(20),
            complement(AADB02014320.1:1..431263),gap(20),
            AADB02014321.1:1..794957,gap(1241),AADB02014322.1:1..1366198,
            gap(6446),AADB02014323.1:1..3366,gap(20),AADB02014324.1:1..4771,
            gap(4611),AADB02014325.1:1..383881,gap(20),
            complement(AADB02014326.1:1..381633),gap(1930),
            complement(AADB02014327.1:1..460053),gap(20),
            AADB02014328.1:1..4186,gap(1587),
...

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> Sent: Thursday, May 04, 2006 5:39 PM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
>
> The two notations are equivalent and syntactically correct, or so I
> believe ... I don't think 100% verbatim preservation should be the
> goal. Or am I missing the point?
>
> On May 4, 2006, at 6:27 PM, Chris Fields wrote:
>
> > Here's another odd bit.
> >>> > >>> > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Fri May 5 14:47:50 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 5 May 2006 10:47:50 -0400 Subject: [Bioperl-l] Bio::Seq::Quality description line problem In-Reply-To: References: Message-ID: <2E1683FE-57E4-4D97-A958-1B529973E89E@gmx.net> He wants the description on the description line, like for the sequence file. Thomas, my guess is the code doesn't print the description to the line although I haven't made sure. Do you want to volunteer and check, add that print statement and post the patch? -hilmar On May 5, 2006, at 10:04 AM, Brian Osborne wrote: > T.D., > > According to the documentation, > http://www.bioperl.org/wiki/Qual_sequence_format, your *qual file > looks > right. What are you trying to create? > > Brian O. > > > On 5/4/06 12:50 PM, "T.D. Houfek" wrote: > >> Using Bioperl 1.5, having trouble with writing FASTA-style quality >> files >> using Bio::Seq::Quality. >> >> I create the Bio::Seq::Quality object, giving its constructor an >> ID, a >> description, a nucleotide sequence, and a quality sequence. 
>> I then write the sequence FASTA and the quality FASTA. The description
>> string will appear in the header line of the sequence FASTA, but not in
>> the header line of the quality FASTA.
>>
>> Can anybody help me figure out how to fix this? I've attached a sample
>> script and output.
>>
>> -T.D.
>>
>> ------------------- sample script follows -------------------
>>
>> #!/usr/bin/perl
>> use strict;
>> use Bio::Seq::Quality;
>> use Bio::SeqIO;
>>
>> my $id   = "bogus_id";
>> my $desc = "bogus description";
>> my $seq  = "ATTATTATTATTATT";
>> my $qual = "10 20 30 10 20 30 10 20 30 10 20 30 10 20 30";
>>
>> my $sequal_obj = Bio::Seq::Quality->new(
>>     -display_id => $id,
>>     -desc       => $desc,
>>     -seq        => $seq,
>>     -qual       => $qual
>> );
>>
>> my $qualout = Bio::SeqIO->new(
>>     -file   => ">myfile.qual",
>>     -format => 'qual'
>> );
>> my $seqout = Bio::SeqIO->new(
>>     -file   => ">myfile.seq",
>>     -format => 'Fasta'
>> );
>>
>> $seqout->write_seq($sequal_obj);
>> $qualout->write_seq($sequal_obj);
>>
>> ------------------ sample output follows ------------------
>>
>> tdhoufek at aether:~$ cat myfile.seq
>> >bogus_id bogus description
>> ATTATTATTATTATT
>> tdhoufek at aether:~$ cat myfile.qual
>> >bogus_id
>> 10 20 30 10 20 30 10 20 30 10 20 30 10 20 30
>>
>> ----------------------------------------------------------------------
>
-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From dmessina at wustl.edu Fri May 5 15:24:47 2006
From: dmessina at wustl.edu (David Messina)
Date: Fri, 5 May 2006 10:24:47 -0500
Subject: [Bioperl-l] Bio::Seq::Quality description line problem
In-Reply-To:
    <445A30E4.6070103@pngg.org>
References: <445A30E4.6070103@pngg.org>
Message-ID: <5A549C57-A310-4623-BC44-787AC8BFD6C2@wustl.edu>

Apologies if this is a repost -- mail troubles this morning.

Hilmar is correct. From a cursory walk through the code in a debugger, it
looks like Bio::SeqIO::qual's write_seq method doesn't read the 'desc' out
of the Bio::Seq::Quality object.

I think there should be something like this:

    if ($source->can('desc') and my $desc = $source->desc()) {
        $desc =~ s/\n//g;
        $header .= " $desc";
    }

before line 218 in Bio::SeqIO::qual (where the header is printed):

    $self->_print (">$header \n");

Dave

From dmessina at wustl.edu Fri May 5 14:53:15 2006
From: dmessina at wustl.edu (David Messina)
Date: Fri, 5 May 2006 09:53:15 -0500
Subject: [Bioperl-l] Bio::Seq::Quality description line problem
In-Reply-To: <445A30E4.6070103@pngg.org>
References: <445A30E4.6070103@pngg.org>
Message-ID:

T.D.,

From a cursory walk through your code in a debugger, it looks like
Bio::SeqIO::qual's write_seq method doesn't read the 'desc' out of the
Bio::Seq::Quality object.

I think there should be something like this:

    if ($source->can('desc') and my $desc = $source->desc()) {
        $desc =~ s/\n//g;
        $header .= " $desc";
    }

before line 218 in Bio::SeqIO::qual (where the header is printed):

    $self->_print (">$header \n");

Dave

From hubert.prielinger at gmx.at Fri May 5 18:30:24 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 05 May 2006 12:30:24 -0600
Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore
In-Reply-To: <445AD742.4070408@infotech.monash.edu.au>
References: <445A578E.8050207@gmx.at>
    <445A7449.1080607@infotech.monash.edu.au> <445A94F8.9000903@gmx.at>
    <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>
    <445AB2A4.7020405@gmx.at> <445AD742.4070408@infotech.monash.edu.au>
Message-ID: <445B99C0.6050407@gmx.at>

hi,
I have done, as you suggested and I got the error message:

Can't call method "next_result" on an undefined value at....

then I looked up at the internet and found a thread which suggested to
use strict and then the problem is solved....
but I'm already using use strict..

thanks

Torsten Seemann wrote:
> Hubert Prielinger wrote:
>
>> if I do so it returns:
>> 0 undef
>>
>
> That means the value of $search was undef.
> That means that it could not parse or open the BLAST report.
> I repeat the line that I put in my earlier email which you ignored.
>
> # your line
> my $search = Bio::SearchIO->new( ..... );
>
> # then check if it was successful!
> die "could not open blast report" if not defined $search; > > --Torsten > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From cjfields at uiuc.edu Fri May 5 19:18:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 5 May 2006 14:18:16 -0500 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <445B99C0.6050407@gmx.at> Message-ID: <000001c67078$a9a7ca10$15327e82@pyrimidine> What happens if you add the verbose flag? my $search = new Bio::SearchIO (-verbose => 1, -format => 'blast', -file => $file); Added thought : you might want to look at File::Find for stepping through your files and performing a task on each one, such as parsing output. It changes into the working directory each time; you should be able to do something like this: use File::Find; use Bio::SearchIO; Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger > Sent: Friday, May 05, 2006 1:30 PM > To: Torsten Seemann; bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore > > hi, > I have done, as you suggested and I got the error message: > > Can't call method "next_result" on an undefined value at.... > > then I looked up at the internet and found a thread which suggested to > use strict and then the problem is solved.... > but I'm already using use strict.. > > thanks > > Torsten Seemann wrote: > > Hubert Prielinger wrote: > > > >> if I do so it returns: > >> 0 undef > >> > > > > That means the value of $search was undef. > > That means that it could not parse or open the BLAST report. > > I repeat the line that I put in my earlier email which you ignored. > > > > # your line > > my $search = Bio::SearchIO->new( ..... ); > > > > # then check if it was successful! 
> > die "could not open blast report" if not defined $search;
> >
> > --Torsten
> >
>

From cjfields at uiuc.edu Fri May 5 19:27:12 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 14:27:12 -0500
Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore
In-Reply-To: <445B99C0.6050407@gmx.at>
Message-ID: <000101c67079$e8c86a00$15327e82@pyrimidine>

Sorry, mail got sent before I finished it! Here I go again...

What happens if you add the verbose flag?

my $search = new Bio::SearchIO (-verbose => 1,
                                -format  => 'blast',
                                -file    => $file);

Added thought : you might want to look at File::Find for stepping through
your files and performing a task on each one, such as parsing output. It
changes into the working directory each time; you should be able to do
something like this:

use File::Find;
use Bio::SearchIO;

my @dirlist = ("/home/Hubert/test");

find (\&dir, @dirlist);

sub printdir {
    return unless /txt$/;
    return if (-d);
    my $parser = Bio::SearchIO->new(-file   => $_,
                                    -format => 'blast');
    while (my $result = $parser->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                # do stuff here
            }
        }
    }
}

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> Sent: Friday, May 05, 2006 1:30 PM
> To: Torsten Seemann; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore
>
> hi,
> I have done, as you suggested and I got the error message:
>
> Can't call method "next_result" on an undefined value at....
> > then I looked up at the internet and found a thread which suggested to > use strict and then the problem is solved.... > but I'm already using use strict.. > > thanks > > Torsten Seemann wrote: > > Hubert Prielinger wrote: > > > >> if I do so it returns: > >> 0 undef > >> > > > > That means the value of $search was undef. > > That means that it could not parse or open the BLAST report. > > I repeat the line that I put in my earlier email which you ignored. > > > > # your line > > my $search = Bio::SearchIO->new( ..... ); > > > > # then check if it was successful! > > die "could not open blast report" if not defined $search; > > > > --Torsten > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From barry.moore at genetics.utah.edu Fri May 5 19:39:37 2006 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Fri, 5 May 2006 13:39:37 -0600 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <445B99C0.6050407@gmx.at> References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au> <445A94F8.9000903@gmx.at> <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu> <445AB2A4.7020405@gmx.at> <445AD742.4070408@infotech.monash.edu.au> <445B99C0.6050407@gmx.at> Message-ID: <7F3D73A6-392E-4728-ACB9-FD3BEDFD3C18@genetics.utah.edu> Hubert- If you want to send me your script and input file I'll try to have a look at it. Barry On May 5, 2006, at 12:30 PM, Hubert Prielinger wrote: > hi, > I have done, as you suggested and I got the error message: > > Can't call method "next_result" on an undefined value at.... > > then I looked up at the internet and found a thread which suggested to > use strict and then the problem is solved.... 
> but I'm already using use strict..
>
> thanks
>
> Torsten Seemann wrote:
>> Hubert Prielinger wrote:
>>
>>> if I do so it returns:
>>> 0 undef
>>>
>>
>> That means the value of $search was undef.
>> That means that it could not parse or open the BLAST report.
>> I repeat the line that I put in my earlier email which you ignored.
>>
>> # your line
>> my $search = Bio::SearchIO->new( ..... );
>>
>> # then check if it was successful!
>> die "could not open blast report" if not defined $search;
>>
>> --Torsten
>>
>

From cjfields at uiuc.edu Fri May 5 20:07:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 15:07:53 -0500
Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore
In-Reply-To: <000101c67079$e8c86a00$15327e82@pyrimidine>
Message-ID: <000201c6707f$97aaaba0$15327e82@pyrimidine>

Oops! This is what happens when I copy and paste in a hurry.

> use File::Find;
> use Bio::SearchIO;
>
> my @dirlist = ("/home/Hubert/test");
>
> find (\&dir, @dirlist);
>
> sub printdir {
  ^^^^^^^^^^^^
Should be:

sub dir {

> return unless /txt$/;
> return if (-d);
> my $parser = Bio::SearchIO->new(-file => $_,
>                                 -format => 'blast');
> while (my $result = $parser->next_result) {
>     while (my $hit = $result->next_hit) {
>         while (my $hsp = $hit->next_hsp) {
>             # do stuff here
>         }
>     }
> }
> }

Hubert, if the file you are parsing looks fine (i.e. valid BLAST output),
post it and your script on Bugzilla and let us take a look.
Leave out your password though ; > Chris From golharam at umdnj.edu Fri May 5 19:58:03 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Fri, 05 May 2006 15:58:03 -0400 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <000001c67078$a9a7ca10$15327e82@pyrimidine> Message-ID: <02f101c6707e$39a03a30$2f01a8c0@GOLHARMOBILE1> I'm not sure how applicable this is, but I've seen a problem with Perl if the LANG environment variable contain UTF8 (ex LANG=en_US.UTF8). I've changed mine to en_US and lots of perl string parsing problems went away. Also, what about running the bioperl tests on your installation (make test). What happens? -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields Sent: Friday, May 05, 2006 3:18 PM To: 'Hubert Prielinger'; 'Torsten Seemann'; bioperl-l at bioperl.org Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore What happens if you add the verbose flag? my $search = new Bio::SearchIO (-verbose => 1, -format => 'blast', -file => $file); Added thought : you might want to look at File::Find for stepping through your files and performing a task on each one, such as parsing output. It changes into the working directory each time; you should be able to do something like this: use File::Find; use Bio::SearchIO; Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger > Sent: Friday, May 05, 2006 1:30 PM > To: Torsten Seemann; bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore > > hi, > I have done, as you suggested and I got the error message: > > Can't call method "next_result" on an undefined value at.... > > then I looked up at the internet and found a thread which suggested to > use strict and then the problem is solved.... but I'm already using > use strict.. 
> > thanks > > Torsten Seemann wrote: > > Hubert Prielinger wrote: > > > >> if I do so it returns: > >> 0 undef > >> > > > > That means the value of $search was undef. > > That means that it could not parse or open the BLAST report. I > > repeat the line that I put in my earlier email which you ignored. > > > > # your line > > my $search = Bio::SearchIO->new( ..... ); > > > > # then check if it was successful! > > die "could not open blast report" if not defined $search; > > > > --Torsten > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri May 5 21:56:29 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 5 May 2006 16:56:29 -0500 Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps In-Reply-To: <001701c6704f$90dbd090$15327e82@pyrimidine> Message-ID: <000901c6708e$c77442b0$15327e82@pyrimidine> Okay, I have changed the way the CONTIG line is handled in Bio::SeqIO::genbank. It was handling it as a feature; I just changed it over to handling it as a Bio::Annotation::SimpleValue object with the value being the entire contig section. It seems to pass tests fine but I'm operating off Windows and my wife's IBook went to the great desktop in the sky (motherboard), so I can't test it there. Pulling the file off using Bio::DB::GenBank (using the no-redirect flag) works w/o crashing out. 
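
A minimal sketch of the idea described above, i.e. keeping the whole CONTIG section as one opaque string instead of parsing it as a feature location. It is plain Perl and does not use BioPerl; the helper name extract_contig and the shortened sample record are hypothetical illustrations, not the code that was committed to Bio::SeqIO::genbank:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the CONTIG section of a
# GenBank flat-file record as one string, joining continuation lines and
# removing whitespace. The location text itself is never parsed.
# Returns undef if the record has no CONTIG line.
sub extract_contig {
    my ($record) = @_;
    my @parts;
    my $in_contig = 0;
    for my $line (split /\n/, $record) {
        if ($line =~ /^CONTIG\s+(.*)$/) {
            $in_contig = 1;
            push @parts, $1;
        }
        elsif ($in_contig && $line =~ /^\s+(\S.*)$/) {
            push @parts, $1;    # indented continuation line
        }
        elsif ($in_contig) {
            last;               # '//' or the next keyword ends the section
        }
    }
    return undef unless @parts;
    (my $value = join '', @parts) =~ s/\s+//g;
    return $value;
}

# Shortened sample modeled on the gapped CONTIG record quoted in this thread.
my $record = <<'END';
FEATURES             Location/Qualifiers
     source          1..8541
                     /organism="Homo sapiens"
CONTIG      join(AADB02014316.1:1..1482320,gap(67),
            complement(AADB02014320.1:1..431263))
//
END

print extract_contig($record), "\n";
```

Because the value is never handed to a location parser, gap(...) elements and the enclosing join(...) pass through untouched; the trade-off is that the individual CONTIG pieces are not available as location objects.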
Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Chris Fields > Sent: Friday, May 05, 2006 9:24 AM > To: 'Hilmar Lapp' > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > I'm not sure it's a valid CONTIG file w/o the join(...). This is a chunk > from the longer file Michael used as an example here (NW_925173). I > believe > the CONTIG line is currently handled like a feature so I think it goes > through Bio::SeqIO::FTHelper, which is where Michael mentions his bugfix > is; > I think it's getting beaten up in there somehow. I may see what happens if > it's treated like a WGS line (like a Bio::Annotation::SimpleValue object) > and just glob the whole mess together as is. > > > Chris > > ... > FEATURES Location/Qualifiers > source 1..44976370 > /organism="Homo sapiens" > /mol_type="genomic DNA" > /db_xref="taxon:9606" > /chromosome="11" > CONTIG > join(AADB02014316.1:1..1482320,gap(67),AADB02014317.1:1..577321, > gap(441),AADB02014318.1:1..173584,gap(676), > AADB02014319.1:1..377558,gap(20), > complement(AADB02014320.1:1..431263),gap(20), > AADB02014321.1:1..794957,gap(1241),AADB02014322.1:1..1366198, > > gap(6446),AADB02014323.1:1..3366,gap(20),AADB02014324.1:1..4771, > gap(4611),AADB02014325.1:1..383881,gap(20), > complement(AADB02014326.1:1..381633),gap(1930), > complement(AADB02014327.1:1..460053),gap(20), > AADB02014328.1:1..4186,gap(1587), > ... > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > > Sent: Thursday, May 04, 2006 5:39 PM > > To: Chris Fields > > Cc: bioperl-l at lists.open-bio.org > > Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > > > The two notations are equivalent and syntactically correct, or so I > > believe ... 
I don't think 100% verbatim preservation should be the > > goal. Or am I missing the point? > > > > On May 4, 2006, at 6:27 PM, Chris Fields wrote: > > > > > Here's another odd bit. This is what I get for the CONTIG line when I > > > passed a simple contig file (NW_925062, with one join) through > > > Bio::SeqIO: > > > > > > ----------------------------------- > > > .... > > > FEATURES Location/Qualifiers > > > source 1..8541 > > > /db_xref="taxon:9606" > > > /mol_type="genomic DNA" > > > /chromosome="11" > > > /organism="Homo sapiens" > > > CONTIG AADB02014027.1:1..8541 > > > > > > // > > > ----------------------------------- > > > Here's the original: > > > ----------------------------------- > > > FEATURES Location/Qualifiers > > > source 1..8541 > > > /organism="Homo sapiens" > > > /mol_type="genomic DNA" > > > /db_xref="taxon:9606" > > > /chromosome="11" > > > CONTIG join(AADB02014027.1:1..8541) > > > // > > > ----------------------------------- > > > > > > Looks like it lopped out the 'join' here as well. > > > > > > Chris > > > > > >> -----Original Message----- > > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >> bounces at lists.open-bio.org] On Behalf Of Chris Fields > > >> Sent: Thursday, May 04, 2006 1:41 PM > > >> To: bioperl-l at lists.open-bio.org > > >> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > >> > > >> Are you using the CONTIG record or the full GenBank file? I see > > >> problems with both (using bioperl-live) which seem unrelated to one > > >> another. > > >> The full file seems to be running a bit slow b/c the full GenBank > > >> record > > >> is > > >> huge (~55 MB) but the CONTIG file does exactly what you said (runs > > >> out of > > >> memory). 
> > >> > > >> Chris > > >> > > >>> -----Original Message----- > > >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >>> bounces at lists.open-bio.org] On Behalf Of Michael Rogoff > > >>> Sent: Tuesday, May 02, 2006 10:32 PM > > >>> To: bioperl-l at lists.open-bio.org > > >>> Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps > > >>> > > >>> > > >>> I've encountered a pretty serious bug in Bio::SeqIO when parsing > > >>> certain > > >>> genbank > > >>> files that contain CONTIG entries with gaps. One such record is > > >>> NW_925173. > > >>> > > >>> When I try to parse this file using Bio::SeqIO::genbank, it will > > >>> enter > > >> an > > >>> infinite loop and spin until it runs out of memory. > > >>> > > >>> I'm pretty certain it relates to this bug: > > >>> http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to > > >>> indicate > > >>> that > > >>> genbank records with CONTIG gaps are not valid and can't be > > >>> parsed. But > > >>> this > > >>> bug actually claims to be fixed, which is strange, since looking > > >>> at the > > >>> code for > > >>> FTLocationFactory (where the loop is) it's still right there. I > > >>> assume > > >>> that > > >>> this may be fixed in other contexts but is still not fixed in > > >>> Bio::SeqIO::genbank? Or am I doing something wrong? > > >>> > > >>> I think that this should probably be filed as an open bug. I would > > >> think > > >>> that > > >>> even if bioperl isn't interested in parsing this type of file via > > >>> SeqIO, > > >>> certainly you'd want to ensure that no finite input file would > > >>> send the > > >>> parser > > >>> into an infinite loop. Have others encountered this problem? Is > > >>> there > > >>> any plan > > >>> to address it? > > >>> > > >>> Thanks very much for any information or help! > > >>> > > >>> -Mike > > >>> > > >>> P.S. I've played around with my version of FTLocationFactory and it > > >> seems > > >>> to > > >>> actually work and parse the gaps. 
I'm not sure if I've created > > >>> other > > >> bugs > > >>> or if > > >>> it works in all cases, but at least the parser doesn't die. I also > > >> don't > > >>> know > > >>> that my hacky code is appropriate for putting back in to BioPerl, > > >>> but > > >> I'm > > >>> happy > > >>> to provide it if someone wants to check it out and/or consider it > > >>> for > > >>> checkin. > > >>> > > >>> > > >>> > > >>> _______________________________________________ > > >>> Bioperl-l mailing list > > >>> Bioperl-l at lists.open-bio.org > > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > >> > > >> _______________________________________________ > > >> Bioperl-l mailing list > > >> Bioperl-l at lists.open-bio.org > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > -- > > =========================================================== > > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > > =========================================================== > > > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hubert.prielinger at gmx.at Fri May 5 23:54:55 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Fri, 05 May 2006 17:54:55 -0600 Subject: [Bioperl-l] [BULK] Re: can't parse blast file anymore In-Reply-To: <02f101c6707e$39a03a30$2f01a8c0@GOLHARMOBILE1> References: <02f101c6707e$39a03a30$2f01a8c0@GOLHARMOBILE1> Message-ID: <445BE5CF.2000007@gmx.at> hi ryan, nothing happend if I add the verbose flag and how can I test my bioperl 
installation..... Ryan Golhar wrote: > I'm not sure how applicable this is, but I've seen a problem with Perl > if the LANG environment variable contain UTF8 (ex LANG=en_US.UTF8). > I've changed mine to en_US and lots of perl string parsing problems went > away. > > Also, what about running the bioperl tests on your installation (make > test). What happens? > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields > Sent: Friday, May 05, 2006 3:18 PM > To: 'Hubert Prielinger'; 'Torsten Seemann'; bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore > > > What happens if you add the verbose flag? > > my $search = new Bio::SearchIO (-verbose => 1, > -format => 'blast', > -file => $file); > > Added thought : you might want to look at File::Find for stepping > through your files and performing a task on each one, such as parsing > output. It changes into the working directory each time; you should be > able to do something like this: > > use File::Find; > use Bio::SearchIO; > > > > > Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger >> Sent: Friday, May 05, 2006 1:30 PM >> To: Torsten Seemann; bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore >> >> hi, >> I have done, as you suggested and I got the error message: >> >> Can't call method "next_result" on an undefined value at.... >> >> then I looked up at the internet and found a thread which suggested to >> > > >> use strict and then the problem is solved.... but I'm already using >> use strict.. >> >> thanks >> >> Torsten Seemann wrote: >> >>> Hubert Prielinger wrote: >>> >>> >>>> if I do so it returns: >>>> 0 undef >>>> >>>> >>> That means the value of $search was undef. 
>>> That means that it could not parse or open the BLAST report. I >>> repeat the line that I put in my earlier email which you ignored. >>> >>> # your line >>> my $search = Bio::SearchIO->new( ..... ); >>> >>> # then check if it was successful! >>> die "could not open blast report" if not defined $search; >>> >>> --Torsten >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From hubert.prielinger at gmx.at Sat May 6 00:01:11 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Fri, 05 May 2006 18:01:11 -0600 Subject: [Bioperl-l] [BULK] can't parse blast file anymore Message-ID: <445BE747.5020202@gmx.at> hi I have posted my script and the blast file to bugzilla...... From hubert.prielinger at gmx.at Sat May 6 01:21:33 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Fri, 05 May 2006 19:21:33 -0600 Subject: [Bioperl-l] [BULK] can't parse blast file anymore In-Reply-To: <445BE747.5020202@gmx.at> References: <445BE747.5020202@gmx.at> Message-ID: <445BFA1D.5060008@gmx.at> they bugzilla posting didn't work, what is the exact email address for bugzilla Hubert Prielinger wrote: > hi > I have posted my script and the blast file to bugzilla...... 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From cjfields at uiuc.edu Sat May 6 01:38:47 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 5 May 2006 20:38:47 -0500 Subject: [Bioperl-l] [BULK] can't parse blast file anymore In-Reply-To: <445BFA1D.5060008@gmx.at> Message-ID: <000d01c670ad$d209f980$15327e82@pyrimidine> Hubert, Calm down. Breathe in, breath out. Relax....... Okay, here is the place to start. Read the instructions there first. http://www.bioperl.org/wiki/Bugs Bugs are reported at this site: http://bugzilla.bioperl.org/ Again, follow the instructions. You will have to create a user name and password to submit. Once that is set up, click the "Submit a new bug" link on the main bugzilla page. On that page, fill out all information first and a description of the error and hit 'commit'. Add the BLAST report and some sample script by clicking on the "Create a New Attachment" link (you'll have to do this for each file). Once you go back to the bug page you should see two attachments and the bug report. Any commits get sent through the bioperl-guts-l mail list which most developers subscribe to, so they'll know there's a new bug out there. I will not be able to get to it personally; our home computer died a slow painful death today (RIP 2002-2006) but I can get to it next week. If you post the bug, somebody might be able to get to it sooner! Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger > Sent: Friday, May 05, 2006 8:22 PM > To: bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] [BULK] can't parse blast file anymore > > they bugzilla posting didn't work, what is the exact email address for > bugzilla > > Hubert Prielinger wrote: > > hi > > I have posted my script and the blast file to bugzilla...... 
From cjfields at uiuc.edu Sat May 6 02:26:35 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 5 May 2006 21:26:35 -0500 Subject: [Bioperl-l] Changes to NCBIHelper (RE: CONTIG, genome files) Message-ID: <000f01c670b4$7f22f760$15327e82@pyrimidine> I committed a change to NCBIHelper that permits the downloading of CON (contig) files and corrects an issue where no sequence features were saved when rebuilding those files. If you use Bio::DB::GenBank regularly to download genome files, this likely will NOT affect your code unless you explicitly set the format type to 'genbank', like so: $factory = Bio::DB::GenBank->new(-format => 'gb'); # or 'genbank' I believe most will not have that setting since the default was already 'gb'. Now, the default is 'gbwithparts', which returns the full sequence regardless. If it is a file with a CONTIG line, the sequence is built on NCBI's end and will include seq features if they are present. As Brian said, we'll let NCBI do the work for us! If you need the actual file w/o sequence, then you can set the format to 'genbank' (like above) and it will grab it for you. There was an unrelated problem with CONTIG line parsing that I also fixed, where I changed the format over to a Bio::Annotation::SimpleValue as a workaround for now; for some reason some CON files were misparsed and resulted in infinite loops or missing 'join' statements. Chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept.
of Biochemistry University of Illinois Urbana-Champaign From hubert.prielinger at gmx.at Sat May 6 22:22:05 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Sat, 06 May 2006 16:22:05 -0600 Subject: [Bioperl-l] [BULK] can't parse blast file anymore In-Reply-To: <000d01c670ad$d209f980$15327e82@pyrimidine> References: <000d01c670ad$d209f980$15327e82@pyrimidine> Message-ID: <445D218D.2030504@gmx.at> ok, thanks I have submitted the bug bug #1994 Chris Fields wrote: > Hubert, > > Calm down. Breathe in, breath out. Relax....... > > Okay, here is the place to start. Read the instructions there first. > > http://www.bioperl.org/wiki/Bugs > > Bugs are reported at this site: > > http://bugzilla.bioperl.org/ > > Again, follow the instructions. You will have to create a user name and > password to submit. Once that is set up, click the "Submit a new bug" link > on the main bugzilla page. On that page, fill out all information first and > a description of the error and hit 'commit'. Add the BLAST report and some > sample script by clicking on the "Create a New Attachment" link (you'll have > to do this for each file). Once you go back to the bug page you should see > two attachments and the bug report. Any commits get sent through the > bioperl-guts-l mail list which most developers subscribe to, so they'll know > there's a new bug out there. > > I will not be able to get to it personally; our home computer died a slow > painful death today (RIP 2002-2006) but I can get to it next week. If you > post the bug, somebody might be able to get to it sooner! 
> > Chris > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger >> Sent: Friday, May 05, 2006 8:22 PM >> To: bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] [BULK] can't parse blast file anymore >> >> the bugzilla posting didn't work, what is the exact email address for >> bugzilla >> >> Hubert Prielinger wrote: >> >>> hi >>> I have posted my script and the blast file to bugzilla...... From torsten.seemann at infotech.monash.edu.au Sun May 7 00:57:14 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Sun, 07 May 2006 10:57:14 +1000 Subject: [Bioperl-l] [BULK] can't parse blast file anymore In-Reply-To: <445D218D.2030504@gmx.at> References: <000d01c670ad$d209f980$15327e82@pyrimidine> <445D218D.2030504@gmx.at> Message-ID: <445D45EA.8020804@infotech.monash.edu.au> Hubert Prielinger wrote: > ok, thanks > I have submitted the bug > bug #1994 This is a line from the script you sent to Bugzilla: my $search = new Bio::SearchIO ( -verbose => 1,-format => 'blast', -file => $file) or die "could not open blast report" if not defined my $search; Although syntactically correct, I don't think it is doing what you want.
Please change it to this: my $search = new Bio::SearchIO(-format => 'blast', -file => $file) or die "could not open blast report"; or alternatively, this: my $search = new Bio::SearchIO(-format => 'blast', -file => $file); if (not defined $search) { die "could not open blast report"; } and let us know what happens. All the example output you have supplied still suggests that Bio::SearchIO cannot load or parse your blast report. -- Torsten Seemann Victorian Bioinformatics Consortium, Monash University, Australia From mamillerpa at yahoo.com Sat May 6 23:07:30 2006 From: mamillerpa at yahoo.com (Mark A. Miller) Date: Sat, 6 May 2006 16:07:30 -0700 (PDT) Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines In-Reply-To: Message-ID: <20060506230730.56480.qmail@web50410.mail.yahoo.com> Thanks for your responses, Jason and Brian. Brian, your suggestion works great. I had really hoped that by parsing the OS line as well, I could be sure I wasn't missing any sequences from my organisms. Well, I gave up on that and just obtained the NCBI taxonomy values. I find it pretty easy to work with them in bioperl. Unfortunately, walking through all of TrEMBL takes a while, and I'm getting this error: Can't call method "ncbi_taxid" on an undefined value at ./ga2.pl line 55, line 3253682. This happens when I try to extract annotations, etc., from entries like: DHE4_UNKP with: my $species_object = $seq->species; my $taxid_string = $species_object->ncbi_taxid; I guess I have to write an error handler for incomplete taxonomy values. Bye for now, Mark --- Brian Osborne wrote: > Mark, > > The RC line is part of the description of a reference, I'm guessing > 'RC' > stands for Reference Comment.
In order to get the attributes of a > reference > you'll first do something like: > > my $anno_collection = $seq->annotation; > my @references = $anno_collection->get_Annotations('reference'); > > To get the comment field for a specific reference you can do: > > $references[0]->comment; > > See the Feature-Annotation HOWTO for more information on Annotations, > the > Reference object is a kind of Annotation object. > > Brian O. > > > On 5/3/06 3:34 PM, "Mark A. Miller" wrote: > > > Yeah. Do you have any experience with that? > > > > Mark > > > > --- Brian Osborne wrote: > > > >> Mark, > >> > >> So you're trying to get the information in the RC line from a > >> Swissprot > >> format file? > >> > >> Brian O. > > > > > > --- --- --- --- --- --- --- --- > > > > Mark A. Miller > > --- --- --- --- --- --- --- --- Mark A. Miller From cjfields at uiuc.edu Sun May 7 03:33:40 2006 From: cjfields at uiuc.edu (Christopher Fields) Date: Sat, 6 May 2006 22:33:40 -0500 Subject: [Bioperl-l] [BULK] can't parse blast file anymore Message-ID: <65109dc1.b47d779e.81acb00@expms6.cites.uiuc.edu> The -verbose flag was my suggestion; it should output a ton of debugging info from SearchIO::blast; if you see anything there, then it means that it's at least attempting to parse the report. Of course I can't test this myself at the moment since my wife's computer died (along with the bioperl setup); I'm using a loaner computer at the moment.
Chris ---- Original message ---- >Date: Sun, 07 May 2006 10:57:14 +1000 >From: Torsten Seemann >Subject: Re: [Bioperl-l] [BULK] can't parse blast file anymore >To: Hubert Prielinger >Cc: bioperl-l at bioperl.org > >Hubert Prielinger wrote: >> ok, thanks >> I have submitted the bug >> bug #1994 > >This is a line from the script you sent to Bugzilla: > >my $search = new Bio::SearchIO ( >-verbose => 1,-format => 'blast', -file => $file) >or die "could not open blast report" if not defined my $search; > >Although syntactically correct, I don't think it is doing what you want. >Please change it to this: > >my $search = new Bio::SearchIO(-format => 'blast', -file => $file) or die >"could not open blast report"; > >or alternatively, this: > >my $search = new Bio::SearchIO(-format => 'blast', -file => $file); >if (not defined $search) { > die "could not open blast report"; >} > >and let us know what happens. > >all the example output you have supplied still suggests that Bio::SearchIO can >not load or parse your blast report. > >-- >Torsten Seemann >Victorian Bioinformatics Consortium, Monash University, Australia From chen_li3 at yahoo.com Sun May 7 07:34:55 2006 From: chen_li3 at yahoo.com (chen li) Date: Sun, 7 May 2006 00:34:55 -0700 (PDT) Subject: [Bioperl-l] primer parameters using primer3 Message-ID: <20060507073455.11849.qmail@web36815.mail.mud.yahoo.com> Hi all, I use Bio::Tools::Run::Primer3 to design PCR primers. I want to change some default values, for example, to increase the PCR product size to 490-510 bp instead of using the default value of 100-300 bp. What should I do? Thanks, Li __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo!
Mail has the best spam protection around http://mail.yahoo.com From jason.stajich at duke.edu Sun May 7 20:49:29 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Sun, 7 May 2006 16:49:29 -0400 Subject: [Bioperl-l] [BULK] can't parse blast file anymore In-Reply-To: <65109dc1.b47d779e.81acb00@expms6.cites.uiuc.edu> References: <65109dc1.b47d779e.81acb00@expms6.cites.uiuc.edu> Message-ID: The problem is in how SearchIO was being initialized; the code basically looked like this: my $x = new Foo() or die if not defined my $x; which is invalid for two reasons. 1) if not defined my $x; Will ALWAYS be false. 2) my $x = new Foo() or die ; Will cast the new object as a boolean. Whenever things aren't working, take a look at the code and try to walk through any shortcuts. For clarity, make it a two-step process: my $x = new Foo(); die "no valid $x" unless defined $x; Please note that currently BioPerl WILL die (via throw) if you try to ask for an invalid file when you initialize a new IO object -- this is handled by code in Bio::Root::IO (line 313 in Bio/Root/IO.pm) which all the IO objects use, so you don't really need to do a test on the object after all. --jason On May 6, 2006, at 11:33 PM, Christopher Fields wrote: > The -verbose flag was my suggestion; it should output a ton of > debugging info > from SearchIO::blast; if you see anything there, then it means that > it's at least > attempting to parse the report. > > Of course I can't test this myself at the moment since my wife's > computer died > (along with the bioperl setup); I'm using a loaner computer at the > moment.
> > Chris > > ---- Original message ---- >> Date: Sun, 07 May 2006 10:57:14 +1000 >> From: Torsten Seemann >> Subject: Re: [Bioperl-l] [BULK] can't parse blast file anymore >> To: Hubert Prielinger >> Cc: bioperl-l at bioperl.org >> >> Hubert Prielinger wrote: >>> ok, thanks >>> I have submitted the bug >>> bug #1994 >> >> This is a line from the script you sent to Bugzilla: >> >> my $search = new Bio::SearchIO ( >> -verbose => 1,-format => 'blast', -file => $file) >> or die "could not open blast report" if not defined my $search; >> >> Althoygh syntactically correct, I don't think it is doing what you >> want. >> Please change it to this: >> >> my $search = new Bio::SearchIO(-format => 'blast', -file => $file) >> or die >> "could not open blast report"; >> >> or alternatively, this: >> >> my $search = new Bio::SearchIO(-format => 'blast', -file => $file); >> if (not defined $search) { >> die "could not open blast report"; >> } >> >> and let us know what happens. >> >> all the example output you have supplied still suggests that >> Bio::SearchIO can >> not load or parse your blast report. 
>> >> -- >> Torsten Seemann >> Victorian Bioinformatics Consortium, Monash University, Australia >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From jason.stajich at duke.edu Sun May 7 21:01:29 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Sun, 7 May 2006 17:01:29 -0400 Subject: [Bioperl-l] primer parameters using primer3 In-Reply-To: <20060507073455.11849.qmail@web36815.mail.mud.yahoo.com> References: <20060507073455.11849.qmail@web36815.mail.mud.yahoo.com> Message-ID: I put up some info on the wiki (and I encourage other people to do the same!) http://bioperl.org/wiki/Module:Bio::Tools::Run::Primer3 Set the command line parameters by just calling a function of the name of the parameter. To get a list of the available options, this perl code will report it to you: # what are the arguments, and what do they mean? my $args = $primer3->arguments; print "ARGUMENT\tMEANING\n"; foreach my $key (keys %{$args}) {print "$key\t", $$args{$key}, "\n"} The info for PRODUCT_SIZE_RANGE is: (size range list, default 100-300) space separated list of product sizes eg - - I believe you can set the PCR product size with $primer3->primer_product_size_range("490-510"); -jason On May 7, 2006, at 3:34 AM, chen li wrote: > Hi all, > > I use Bio::Tools::Run::Primer3 to design PCR primers. > I want to change some default values, for example, to > increase the PCR product size to 490-510 bp instead of > using the default value of 100-300 bp. What should I > do ? > > > Thanks, > > Li > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! 
Mail has the best spam protection around > http://mail.yahoo.com -- Jason Stajich Duke University http://www.duke.edu/~jes12 From chen_li3 at yahoo.com Mon May 8 01:18:17 2006 From: chen_li3 at yahoo.com (chen li) Date: Sun, 7 May 2006 18:18:17 -0700 (PDT) Subject: [Bioperl-l] primer parameters using primer3 In-Reply-To: Message-ID: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com> Hi Jason, I added the line $primer3->primer_product_size_range("490-510"); to my script, but it doesn't work, nor does primer3 complain about it. Li --- Jason Stajich wrote: > I put up some info on the wiki (and I encourage > other people to do > the same!) > http://bioperl.org/wiki/Module:Bio::Tools::Run::Primer3 > > Set the command line parameters by just calling a > function of the > name of the parameter. To get a list of the > available options, this > perl code will report it to you: > > # what are the arguments, and what do they mean? > my $args = $primer3->arguments; > > print "ARGUMENT\tMEANING\n"; > foreach my $key (keys %{$args}) {print "$key\t", > $$args{$key}, "\n"} > > The info for PRODUCT_SIZE_RANGE is: > (size range list, default 100-300) space > separated list of product > sizes eg - - > > I believe you can set the PCR product size with > $primer3->primer_product_size_range("490-510"); > > -jason > On May 7, 2006, at 3:34 AM, chen li wrote: > > > Hi all, > > > > I use Bio::Tools::Run::Primer3 to design PCR > primers. > > I want to change some default values, for example, > to > > increase the PCR product size to 490-510 bp > instead of > > using the default value of 100-300 bp. What should > I > > do ? > > > > > > Thanks, > > > > Li > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo!
Mail has the best spam > protection around > > http://mail.yahoo.com > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 From hubert.prielinger at gmx.at Mon May 8 01:41:14 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Sun, 07 May 2006 19:41:14 -0600 Subject: [Bioperl-l] [BULK] can't parse blast file anymore In-Reply-To: <445D45EA.8020804@infotech.monash.edu.au> References: <000d01c670ad$d209f980$15327e82@pyrimidine> <445D218D.2030504@gmx.at> <445D45EA.8020804@infotech.monash.edu.au> Message-ID: <445EA1BA.9050301@gmx.at> hi, I have corrected that and now I finally got a few error messages: blast.pm: unrecognized line Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, blast.pm: unrecognized line Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman blast.pm: unrecognized line (1997), "Gapped BLAST and PSI-BLAST: a new generation of blast.pm: unrecognized line protein database search programs", Nucleic Acids Res. 25:3389-3402. blast.pm: unrecognized line RID: 1137529800-24476-151611170370.BLASTQ1 after that line it stops without terminating.... Torsten Seemann wrote: > Hubert Prielinger wrote: >> ok, thanks >> I have submitted the bug >> bug #1994 > > This is a line from the script you sent to Bugzilla: > > my $search = new Bio::SearchIO ( > -verbose => 1,-format => 'blast', -file => $file) > or die "could not open blast report" if not defined my $search; > > Although syntactically correct, I don't think it is doing what you want.
> Please change it to this: > > my $search = new Bio::SearchIO(-format => 'blast', -file => $file) or > die "could not open blast report"; > > or alternatively, this: > > my $search = new Bio::SearchIO(-format => 'blast', -file => $file); > if (not defined $search) { > die "could not open blast report"; > } > > and let us know what happens. > > all the example output you have supplied still suggests that > Bio::SearchIO can not load or parse your blast report. > From cjfields at uiuc.edu Mon May 8 02:04:13 2006 From: cjfields at uiuc.edu (Christopher Fields) Date: Sun, 7 May 2006 21:04:13 -0500 Subject: [Bioperl-l] [BULK] can't parse blast file anymore Message-ID: <42d52830.b4f91bfc.81e4600@expms6.cites.uiuc.edu> These are debugging lines (not errors); you still have the -verbose flag set. Did you follow Jason's advice? I believe he's right on the money about the issue at hand... Chris ---- Original message ---- >Date: Sun, 07 May 2006 19:41:14 -0600 >From: Hubert Prielinger >Subject: Re: [Bioperl-l] [BULK] can't parse blast file anymore >To: Torsten Seemann , bioperl-l at bioperl.org, Chris Fields , Jason Stajich > >hi, >I have corrected that and now I finally I got a few error messages: > >blast.pm: unrecognized line Reference: Altschul, Stephen F., Thomas L. >Madden, Alejandro A. Schäffer, >blast.pm: unrecognized line Jinghui Zhang, Zheng Zhang, Webb Miller, and >David J. Lipman >blast.pm: unrecognized line (1997), "Gapped BLAST and PSI-BLAST: a new >generation of >blast.pm: unrecognized line protein database search programs", Nucleic >Acids Res. 25:3389-3402. >blast.pm: unrecognized line RID: 1137529800-24476-151611170370.BLASTQ1 > >after that line it stops without terminating....
> > >Torsten Seemann wrote: >> Hubert Prielinger wrote: >>> ok, thanks >>> I have submitted the bug >>> bug #1994 >> >> This is a line from the script you sent to Bugzilla: >> >> my $search = new Bio::SearchIO ( >> -verbose => 1,-format => 'blast', -file => $file) >> or die "could not open blast report" if not defined my $search; >> >> Althoygh syntactically correct, I don't think it is doing what you want. >> Please change it to this: >> >> my $search = new Bio::SearchIO(-format => 'blast', -file => $file) or >> die "could not open blast report"; >> >> or alternatively, this: >> >> my $search = new Bio::SearchIO(-format => 'blast', -file => $file); >> if (not defined $search) { >> die "could not open blast report"; >> } >> >> and let us know what happens. >> >> all the example output you have supplied still suggests that >> Bio::SearchIO can not load or parse your blast report. >> > From jason.stajich at duke.edu Mon May 8 02:47:00 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Sun, 7 May 2006 22:47:00 -0400 Subject: [Bioperl-l] primer parameters using primer3 In-Reply-To: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com> References: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com> Message-ID: <430DE892-8EE8-4FC9-8BAC-7D344C876B72@duke.edu> I'm not really familiar with the module more than what the documentation says so did you try and use the add_targets method to add arguments instead? I had thought the AUTOLOAD method took care of access to the cmd line arguments as it does for the other Run modules but I am not really sure. Perhaps folks on the list who use this module can provide better advice. -jason On May 7, 2006, at 9:18 PM, chen li wrote: > Hi Jason, > > I add the line code > $primer3->primer_product_size_range("490-510"); > to my script. But it doesn't work nor primer3 > complains it. > > Li > > --- Jason Stajich wrote: > >> I put up some info on the wiki (and I encourage >> other people to do >> the same!) 
>> > http://bioperl.org/wiki/Module:Bio::Tools::Run::Primer3 >> >> Set the command line parameters by just calling a >> function of the >> name of the parameter. To get a list of the >> available options, this >> perl code will report it to you: >> >> # what are the arguments, and what do they mean? >> my $args = $primer3->arguments; >> >> print "ARGUMENT\tMEANING\n"; >> foreach my $key (keys %{$args}) {print "$key\t", >> $$args{$key}, "\n"} >> >> The info for PRODUCT_SIZE_RANGE is: >> (size range list, default 100-300) space >> separated list of product >> sizes eg - - >> >> I believe you can set the PCR product size with >> $primer3->primer_product_size_range("490-510"); >> >> -jason >> On May 7, 2006, at 3:34 AM, chen li wrote: >> >>> Hi all, >>> >>> I use Bio::Tools::Run::Primer3 to design PCR >> primers. >>> I want to change some default values, for example, >> to >>> increase the PCR product size to 490-510 bp >> instead of >>> using the default value of 100-300 bp. What should >> I >>> do ? >>> >>> >>> Thanks, >>> >>> Li >>> >>> __________________________________________________ >>> Do You Yahoo!? >>> Tired of spam? Yahoo! Mail has the best spam >> protection around >>> http://mail.yahoo.com >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! 
Mail has the best spam protection around > http://mail.yahoo.com -- Jason Stajich Duke University http://www.duke.edu/~jes12 From osborne1 at optonline.net Mon May 8 14:49:22 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Mon, 08 May 2006 10:49:22 -0400 Subject: [Bioperl-l] primer parameters using primer3 In-Reply-To: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com> Message-ID: Li, Read the documentation, Bio::Tools::Run::Primer3. It shows examples of the correct syntax. Also look at bioperl-run/t/Primer3.t. Brian O. On 5/7/06 9:18 PM, "chen li" wrote: > Hi Jason, > > I add the line code > $primer3->primer_product_size_range("490-510"); > to my script. But it doesn't work nor primer3 > complains it. > > Li > > --- Jason Stajich wrote: > >> I put up some info on the wiki (and I encourage >> other people to do >> the same!) >> > http://bioperl.org/wiki/Module:Bio::Tools::Run::Primer3 >> >> Set the command line parameters by just calling a >> function of the >> name of the parameter. To get a list of the >> available options, this >> perl code will report it to you: >> >> # what are the arguments, and what do they mean? >> my $args = $primer3->arguments; >> >> print "ARGUMENT\tMEANING\n"; >> foreach my $key (keys %{$args}) {print "$key\t", >> $$args{$key}, "\n"} >> >> The info for PRODUCT_SIZE_RANGE is: >> (size range list, default 100-300) space >> separated list of product >> sizes eg - - >> >> I believe you can set the PCR product size with >> $primer3->primer_product_size_range("490-510"); >> >> -jason >> On May 7, 2006, at 3:34 AM, chen li wrote: >> >>> Hi all, >>> >>> I use Bio::Tools::Run::Primer3 to design PCR >> primers. >>> I want to change some default values, for example, >> to >>> increase the PCR product size to 490-510 bp >> instead of >>> using the default value of 100-300 bp. What should >> I >>> do ? >>> >>> >>> Thanks, >>> >>> Li >>> >>> __________________________________________________ >>> Do You Yahoo!? >>> Tired of spam? 
Yahoo! Mail has the best spam >> protection around >>> http://mail.yahoo.com >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From roy at colibase.bham.ac.uk Mon May 8 11:12:49 2006 From: roy at colibase.bham.ac.uk (Roy Chaudhuri) Date: Mon, 08 May 2006 12:12:49 +0100 Subject: [Bioperl-l] primer parameters using primer3 In-Reply-To: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com> References: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com> Message-ID: <445F27B1.40501@colibase.bham.ac.uk> Hi Li, I think the syntax you need is: $primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510'); I guess you may also need to change the parameter PRIMER_PRODUCT_OPT_SIZE. Incidentally, such a restricted product size range may mean that Primer3 is unable to design any suitable primers. If I recall correctly, this doesn't cause an error, you just get a Bio::Tools::Primer3 object with no primers in it. I have had some success with testing for this, and if necessary relaxing some constraints on primer design and re-running Primer3. Hope this helps. Roy. -- Dr. Roy Chaudhuri Bioinformatics Research Fellow Division of Immunity and Infection University of Birmingham, U.K. http://xbase.bham.ac.uk > Hi Jason, > > I add the line code > $primer3->primer_product_size_range("490-510"); > to my script. But it doesn't work nor primer3 > complains it. 
> > Li > > --- Jason Stajich wrote: > >> > I put up some info on the wiki (and I encourage >> > other people to do >> > the same!) >> > > http://bioperl.org/wiki/Module:Bio::Tools::Run::Primer3 >> > >> > Set the command line parameters by just calling a >> > function of the >> > name of the parameter. To get a list of the >> > available options, this >> > perl code will report it to you: >> > >> > # what are the arguments, and what do they mean? >> > my $args = $primer3->arguments; >> > >> > print "ARGUMENT\tMEANING\n"; >> > foreach my $key (keys %{$args}) {print "$key\t", >> > $$args{$key}, "\n"} >> > >> > The info for PRODUCT_SIZE_RANGE is: >> > (size range list, default 100-300) space >> > separated list of product >> > sizes eg - - >> > >> > I believe you can set the PCR product size with >> > $primer3->primer_product_size_range("490-510"); >> > >> > -jason >> > On May 7, 2006, at 3:34 AM, chen li wrote: >> > >>> > > Hi all, >>> > > >>> > > I use Bio::Tools::Run::Primer3 to design PCR >> > primers. >>> > > I want to change some default values, for example, >> > to >>> > > increase the PCR product size to 490-510 bp >> > instead of >>> > > using the default value of 100-300 bp. What should >> > I >>> > > do ? >>> > > >>> > > >>> > > Thanks, >>> > > >>> > > Li >>> > > >>> > > __________________________________________________ >>> > > Do You Yahoo!? >>> > > Tired of spam? Yahoo! Mail has the best spam >> > protection around >>> > > http://mail.yahoo.com >>> > > _______________________________________________ >>> > > Bioperl-l mailing list >>> > > Bioperl-l at lists.open-bio.org >>> > > >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> > -- >> > Jason Stajich >> > Duke University >> > http://www.duke.edu/~jes12 >> > >> > >> > > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! 
Mail has the best spam protection around > http://mail.yahoo.com From chen_li3 at yahoo.com Mon May 8 13:21:54 2006 From: chen_li3 at yahoo.com (chen li) Date: Mon, 8 May 2006 06:21:54 -0700 (PDT) Subject: [Bioperl-l] primer parameters using primer3 In-Reply-To: <445F27B1.40501@colibase.bham.ac.uk> Message-ID: <20060508132154.71440.qmail@web36802.mail.mud.yahoo.com> I think Dr. Chaudhuri is correct. I added the following lines to my script (actually copied from the documentation) $primer3->add_targets( PRIMER_PRODUCT_SIZE_RANGE=>'490-510'); $primer3->add_targets('PRIMER_MIN_TM'=>60, 'PRIMER_MAX_TM'=>64); to design primers with a product size of 490-510 bp and a primer annealing Tm of 60 to 64C. Here is part of the output in the file called temp.out: .......... original sequence..... GTGGGCTGGTGTTGCTTGGAAAATTTCAAAATCCCAAAGTTTCAGGCTTCCCAAAGTTGGCTTGGAAAAATGTGATAGTCTCACCTGAGTCTAGACATGT ................. PRIMER_PRODUCT_SIZE_RANGE=490-510 PRIMER_MIN_TM=60 PRIMER_MAX_TM=64 PRIMER_PAIR_PENALTY=0.1544 PRIMER_LEFT_PENALTY=0.081468 PRIMER_RIGHT_PENALTY=0.072951 PRIMER_LEFT_SEQUENCE=CCAAAGTTGGCTTGGAAAAA ............................... PRIMER_PRODUCT_SIZE=501 .............. This is what I want. If you don't set special parameters such as the annealing Tm, the program will use the default ones. If you set your own parameters, they will show up after the sequence (see this output example). If you need to set more parameters and want to know which ones are available, just browse the BEGIN section of the code. Now I have another question: the program always prints out the original sequence at the beginning; is it possible not to do that?
Thanks all for joining this topic, Li --- Roy Chaudhuri wrote: > Hi Li, > > I think the syntax you need is: > > $primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510'); > > I guess you may also need to change the parameter > PRIMER_PRODUCT_OPT_SIZE. > > Incidentally, such a restricted product size range > may mean that Primer3 > is unable to design any suitable primers. If I > recall correctly, this > doesn't cause an error, you just get a > Bio::Tools::Primer3 object with > no primers in it. I have had some success with > testing for this, and if > necessary relaxing some constraints on primer design > and re-running > Primer3. > > Hope this helps. > Roy. > > -- > Dr. Roy Chaudhuri > Bioinformatics Research Fellow > Division of Immunity and Infection > University of Birmingham, U.K. > > http://xbase.bham.ac.uk > > > Hi Jason, > > > > I add the line code > > $primer3->primer_product_size_range("490-510"); > > to my script. But it doesn't work nor primer3 > > complains it. > > > > Li > > > > --- Jason Stajich wrote: > > > >> > I put up some info on the wiki (and I encourage > >> > other people to do > >> > the same!) > >> > > > > http://bioperl.org/wiki/Module:Bio::Tools::Run::Primer3 > >> > > >> > Set the command line parameters by just calling > a > >> > function of the > >> > name of the parameter. To get a list of the > >> > available options, this > >> > perl code will report it to you: > >> > > >> > # what are the arguments, and what do they > mean?
> >> > my $args = $primer3->arguments; > >> > > >> > print "ARGUMENT\tMEANING\n"; > >> > foreach my $key (keys %{$args}) {print > "$key\t", > >> > $$args{$key}, "\n"} > >> > > >> > The info for PRODUCT_SIZE_RANGE is: > >> > (size range list, default 100-300) space > >> > separated list of product > >> > sizes eg - - > >> > > >> > I believe you can set the PCR product size with > >> > > $primer3->primer_product_size_range("490-510"); > >> > > >> > -jason > >> > On May 7, 2006, at 3:34 AM, chen li wrote: > >> > > >>> > > Hi all, > >>> > > > >>> > > I use Bio::Tools::Run::Primer3 to design PCR > >> > primers. > >>> > > I want to change some default values, for > example, > >> > to > >>> > > increase the PCR product size to 490-510 bp > >> > instead of > >>> > > using the default value of 100-300 bp. What > should > >> > I > >>> > > do ? > >>> > > > >>> > > > >>> > > Thanks, > >>> > > > >>> > > Li > >>> > > > >>> > > > __________________________________________________ > >>> > > Do You Yahoo!? > >>> > > Tired of spam? Yahoo! Mail has the best > spam > >> > protection around > >>> > > http://mail.yahoo.com > >>> > > > _______________________________________________ > >>> > > Bioperl-l mailing list > >>> > > Bioperl-l at lists.open-bio.org > >>> > > > >> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > >> > -- > >> > Jason Stajich > >> > Duke University > >> > http://www.duke.edu/~jes12 > >> > > >> > > >> > > > > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! Mail has the best spam > protection around > > http://mail.yahoo.com > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! 
Mail has the best spam protection around http://mail.yahoo.com

From hubert.prielinger at gmx.at Mon May 8 19:09:29 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 08 May 2006 13:09:29 -0600
Subject: [Bioperl-l] [BULK] can't parse blast file anymore
In-Reply-To: <42d52830.b4f91bfc.81e4600@expms6.cites.uiuc.edu>
References: <42d52830.b4f91bfc.81e4600@expms6.cites.uiuc.edu>
Message-ID: <445F9769.70500@gmx.at>

Hi all,

I have solved the problem. I am parsing BLAST 2.2.13 reports with an early BioPerl 1.5.1 install in which bug 1934 was not fixed yet, so I had to replace the blast.pm file, and now it works properly.

Thank you very much,
Hubert

Christopher Fields wrote:
> These are debugging lines (not errors); you still have the -verbose flag set.
>
> Did you follow Jason's advice? I believe he's right on the money about the issue
> at hand...
>
> Chris
>
> ---- Original message ----
>
>> Date: Sun, 07 May 2006 19:41:14 -0600
>> From: Hubert Prielinger
>> Subject: Re: [Bioperl-l] [BULK] can't parse blast file anymore
>> To: Torsten Seemann, bioperl-l at bioperl.org, Chris Fields, Jason Stajich
>>
>> hi,
>> I have corrected that and now I finally got a few error messages:
>>
>> blast.pm: unrecognized line Reference: Altschul, Stephen F., Thomas L.
>> Madden, Alejandro A. Schäffer,
>> blast.pm: unrecognized line Jinghui Zhang, Zheng Zhang, Webb Miller, and
>> David J. Lipman
>> blast.pm: unrecognized line (1997), "Gapped BLAST and PSI-BLAST: a new
>> generation of
>> blast.pm: unrecognized line protein database search programs", Nucleic
>> Acids Res. 25:3389-3402.
>> blast.pm: unrecognized line RID: 1137529800-24476-151611170370.BLASTQ1
>>
>> after that line it stops without terminating....
>>
>>
>> Torsten Seemann wrote:
>>
>>> Hubert Prielinger wrote:
>>>
>>>> ok, thanks
>>>> I have submitted the bug
>>>> bug #1994
>>>>
>>> This is a line from the script you sent to Bugzilla:
>>>
>>> my $search = new Bio::SearchIO (
>>> -verbose => 1,-format => 'blast', -file => $file)
>>> or die "could not open blast report" if not defined my $search;
>>>
>>> Although syntactically correct, I don't think it is doing what you want.
>>> Please change it to this:
>>>
>>> my $search = new Bio::SearchIO(-format => 'blast', -file => $file) or
>>> die "could not open blast report";
>>>
>>> or alternatively, this:
>>>
>>> my $search = new Bio::SearchIO(-format => 'blast', -file => $file);
>>> if (not defined $search) {
>>> die "could not open blast report";
>>> }
>>>
>>> and let us know what happens.
>>>
>>> all the example output you have supplied still suggests that
>>> Bio::SearchIO can not load or parse your blast report.
>>>
>>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>

From s.johri at imperial.ac.uk Mon May 8 15:38:13 2006
From: s.johri at imperial.ac.uk (Johri, Saurabh)
Date: Mon, 8 May 2006 16:38:13 +0100
Subject: [Bioperl-l] PAML + Codeml problem..
Message-ID: <4A98ACB8EC146149872BAC9A132A582C277AC4@icex5.ic.ac.uk>

Hi all,

I'm trying to use codeml from PAML to estimate Ka and Ks values from sequences within a multi-FASTA file. I'm using the code which has been posted on the bioperl wiki...

However, when I run the code, I get the errors shown below.

I did a Google search to see if anyone had come across similar problems. In those cases the problem seems to have been that the sequences were not a multiple of 3 in length. In my code I check whether each sequence is a multiple of 3 and, if not, I alter the sequence until this is the case, but I still get the same error messages.

Any suggestions as to why this could be happening?

Thanks!!!
Saurabh Johri Tuberculosis Research Group Centre for Molecular Microbiology & Infection Imperial College London SW7 2AZ -------------------- WARNING --------------------- MSG: There was an error - see error_string for the program output --------------------------------------------------- ------------- EXCEPTION Bio::Root::NotImplemented ------------- MSG: Unknown format of PAML output STACK Bio::Tools::Phylo::PAML::_parse_summary /sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:359 STACK Bio::Tools::Phylo::PAML::next_result /sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:224 ------------------------------------ >Rv3923c caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc aaataagcccggtgttgcaatcaa >Rv3923c_mtb_cdc1551 caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcac >Rv3923c_mtb_f11 caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc aaataagcccggtgttgcaatcaa >Rv3923c_mtb_c1 caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac 
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
>Rv3923c_mtb_210
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
>Rv3923c_mbovis
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
------------------------------------

From chen_li3 at yahoo.com Tue May 9 00:21:42 2006
From: chen_li3 at yahoo.com (chen li)
Date: Mon, 8 May 2006 17:21:42 -0700 (PDT)
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
Message-ID: <20060509002142.94880.qmail@web36806.mail.mud.yahoo.com>

Dear all,

The following is the script I use to design primers for one sequence:

#!/cygdrive/c/Perl/bin/perl.exe
use warnings;
use strict;
use Bio::Tools::Run::Primer3;
use Bio::SeqIO;

my $file_in='piwil2.fa';
my $file_out='temp.out';

my $seqio=Bio::SeqIO->new(-file=>$file_in);
my $seq=$seqio->next_seq;

my $primer3=Bio::Tools::Run::Primer3->new(
    -seq=>$seq,
    -outfile=>$file_out,
    -path=>"c:/Perl/local/primer3_1.0.0/src/primer3.exe"
);

unless ($primer3->executable){
    print "primer3 can not be found. Is it installed?\n";
    exit(-1);
}

# set your own parameters for the primers or product
$primer3->add_targets(
    'PRIMER_OPT_GC_PERCENT'=>' 50 ',
    'PRIMER_OPT_SIZE'=> '24 ',
    'PRIMER_OPT_TM'=> ' 60 ');

my $result=$primer3->run;
exit;

I tried to modify it for multiple sequences by using a while loop, as follows:

while ($seq=$seqio->next_seq){
    my $primer3=Bio::Tools::Run::Primer3->new() # design the primer
    ....
}

I get primers only for the last sequence. It seems the earlier ones are overwritten. Any ideas will be highly appreciated.

Li

From jason.stajich at duke.edu Tue May 9 00:59:26 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon, 8 May 2006 20:59:26 -0400
Subject: [Bioperl-l] PAML + Codeml problem..
In-Reply-To: <4A98ACB8EC146149872BAC9A132A582C277AC4@icex5.ic.ac.uk>
References: <4A98ACB8EC146149872BAC9A132A582C277AC4@icex5.ic.ac.uk>
Message-ID: <4796FE3D-9D14-4D93-B455-69EDFE2B2B62@duke.edu>

Saurabh -

a) These sequences are identical except for differences in length, so there aren't going to be any interesting values from PAML, but maybe you are just providing an example?

b) I think you are missing the trailing gaps in the alignment of the Rv3923c_mtb_cdc1551 sequence, as it is shorter. PAML requires aligned sequences as input.

c) The sequences, in the reading frame you have provided (and using the standard translation table), have stop codons in them; this will cause failure as well.

Which code from the wiki are you running, the 'running PAML' part of the HOWTO?

Try looking at the actual output from PAML to figure out what is wrong. Add this when initializing the Run object:

-save_tempfiles => 1,
-verbose => 1,

then open up the tempdir that is reported and look at the output files (mlc file).
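[Editorial sketch, not part of the original message.] Points (b) and (c) above can be checked before codeml is ever run. The following is a plain-Perl sketch under stated assumptions, not the wiki code: check_cds_alignment is a hypothetical helper name, the sequences are assumed to be read in frame 1, and the stop codons tested (TAA, TAG, TGA) are those of the standard translation table mentioned in (c). The three short sequences are made-up test data, not the Rv3923c sequences.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given { name => sequence } for a set of aligned
# coding sequences, return a list of human-readable problems.
sub check_cds_alignment {
    my ($seqs) = @_;
    my (@problems, %lengths);
    for my $name (sort keys %$seqs) {
        my $seq = uc $seqs->{$name};
        my $len = length $seq;
        $lengths{$len}++;
        push @problems, "$name: length $len is not a multiple of 3"
            if $len % 3;
        # Walk codon by codon in frame 1 (standard genetic code assumed).
        for (my $i = 0; $i + 3 <= $len; $i += 3) {
            my $codon = substr $seq, $i, 3;
            if ($codon =~ /^(?:TAA|TAG|TGA)$/) {
                push @problems,
                    "$name: stop codon $codon at position " . ($i + 1);
                last;    # report only the first stop per sequence
            }
        }
    }
    # Aligned sequences must all have the same length (gaps included).
    push @problems, "sequences differ in length (not aligned?)"
        if scalar(keys %lengths) > 1;
    return @problems;
}

# Made-up example data.
my %aln = (
    seq_a => 'ATGGCCAAATTT',
    seq_b => 'ATGGCCTAATTT',    # in-frame TAA
    seq_c => 'ATGGCCAAA',       # shorter: trailing gaps missing
);
print "$_\n" for check_cds_alignment(\%aln);
```

A check like this only narrows things down; the mlc file in the saved tempdir remains the authoritative record of what codeml actually complained about.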
-jason On May 8, 2006, at 11:38 AM, Johri, Saurabh wrote: > Hi all, > > I'm trying to use codeml from PAML to estimate Ka, Ks values from > sequences within a multi fasta file: > i'm using the code which has been posted on the bioperl wiki... > > However, when I run the code, i get the following errors: > > I did a google search to see if anyone had come across similar > problems.... in which case the problem seems to have been due to the > sequences not being a multiple of 3, > In my code I check if the sequence is a multiple of 3 and if not, i > alter the sequences until this is the case, although I still have the > same error messages, > > Any suggestions as to why this could be happening? > > Thanks!!! > > Saurabh Johri > Tuberculosis Research Group > Centre for Molecular Microbiology & Infection > Imperial College London > SW7 2AZ > > > > > -------------------- WARNING --------------------- > MSG: There was an error - see error_string for the program output > --------------------------------------------------- > > ------------- EXCEPTION Bio::Root::NotImplemented ------------- > MSG: Unknown format of PAML output > STACK Bio::Tools::Phylo::PAML::_parse_summary > /sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:359 > STACK Bio::Tools::Phylo::PAML::next_result > /sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:224 > ------------------------------------ > >> Rv3923c > caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg > ag > gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg > ac > ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc > gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg > gt > acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc > gc > aaataagcccggtgttgcaatcaa >> Rv3923c_mtb_cdc1551 > caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg > ag > gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg > ac > 
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc > gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg > gt > acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcac >> Rv3923c_mtb_f11 > caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg > ag > gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg > ac > ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc > gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg > gt > acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc > gc > aaataagcccggtgttgcaatcaa >> Rv3923c_mtb_c1 > caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg > ag > gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg > ac > ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc > gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg > gt > acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc > gc > aaataagcccggtgttgcaatcaa >> Rv3923c_mtb_210 > caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg > ag > gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg > ac > ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc > gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg > gt > acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc > gc > aaataagcccggtgttgcaatcaa >> Rv3923c_mbovis > caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg > ag > gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg > ac > ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc > gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg > gt > acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc > gc > aaataagcccggtgttgcaatcaa > > ------------------------------------ > > 
_______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From osborne1 at optonline.net Tue May 9 01:17:22 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Mon, 08 May 2006 21:17:22 -0400 Subject: [Bioperl-l] use primer3 to design primers with multiple sequences In-Reply-To: <20060509002142.94880.qmail@web36806.mail.mud.yahoo.com> Message-ID: Li, If you're analyzing multiple input sequences you're going to have to create multiple output sequences. Brian O. On 5/8/06 8:21 PM, "chen li" wrote: > I get primers only for the last sequence. It seems the > earlier ones are overwritten. From WiersmaP at AGR.GC.CA Tue May 9 01:28:27 2006 From: WiersmaP at AGR.GC.CA (Wiersma, Paul) Date: Mon, 8 May 2006 21:28:27 -0400 Subject: [Bioperl-l] use primer3 to design primers with multiple sequences Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C41@onncrxms5.agr.gc.ca> Hi Li, When you execute $primer3->run with a Bio::Tools::Run::Primer3 object it opens -outfile=>"filename" for writing and then closes. That's why putting it in a loop will overwrite your output file each time so you only see the last one. I suppose you could read in each output file before looping to the next seq and append it to another file. If you're doing a fair bit of work with this module it would be worth looking at the Bio::Tools::Primer3 module. The statement $result = $primer3->run produces a Bio::Tools::Primer3 object which has all the methods you need for customizing your output. Paul Paul A. Wiersma Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada Summerland, BC wiersmap at agr.gc.ca From simon_sask at yahoo.com Tue May 9 08:06:04 2006 From: simon_sask at yahoo.com (Simon K. 
Chan) Date: Tue, 9 May 2006 01:06:04 -0700 (PDT) Subject: [Bioperl-l] Raw Blast Alignment Message-ID: <20060509080604.53621.qmail@web54104.mail.yahoo.com> Hi Fellow Bioperl-ers, bioperl-live/examples/searchio/rawwriter.pl is supposed to show the raw alignments using Bio::SearchIO. The script is written to parse a PSI-BLAST report. I found an old email in the archive from Jason stating that this should parse other flavors of blast reports as well. What do I need to do to make this script parse non-PSI blast reports? I tried to just specify a file and that the -format is 'blast', but I get an error stating that the object method 'raw_hit_data' is not defined in Bio::Search::Hit::BlastHit. Basically, I want to obtain the raw alignment because I'd like to get the size of the gaps, not just the number. Any help will be much appreciated. Many thanks __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From cjfields at uiuc.edu Tue May 9 12:21:02 2006 From: cjfields at uiuc.edu (Christopher Fields) Date: Tue, 9 May 2006 07:21:02 -0500 Subject: [Bioperl-l] Raw Blast Alignment Message-ID: You need to read the SearchIO HOWTO, which gives several examples: http://www.bioperl.org/wiki/HOWTO:SearchIO Chris ---- Original message ---- >Date: Tue, 9 May 2006 01:06:04 -0700 (PDT) >From: "Simon K. Chan" >Subject: [Bioperl-l] Raw Blast Alignment >To: bioperl-l at lists.open-bio.org > >Hi Fellow Bioperl-ers, > >bioperl-live/examples/searchio/rawwriter.pl is >supposed to show the raw alignments using >Bio::SearchIO. The script is written to parse a >PSI-BLAST report. I found an old email in the archive >from Jason stating that this should parse other >flavors of blast reports as well. > >What do I need to do to make this script parse non-PSI >blast reports? 
I tried to just specify a file and >that the -format is 'blast', but I get an error >stating that the object method 'raw_hit_data' is not >defined in Bio::Search::Hit::BlastHit. > >Basically, I want to obtain the raw alignment because >I'd like to get the size of the gaps, not just the >number. > >Any help will be much appreciated. >Many thanks > > >__________________________________________________ >Do You Yahoo!? >Tired of spam? Yahoo! Mail has the best spam protection around >http://mail.yahoo.com >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l From peterm at bioinf.uni-leipzig.de Tue May 9 12:44:25 2006 From: peterm at bioinf.uni-leipzig.de (Peter Menzel) Date: Tue, 09 May 2006 14:44:25 +0200 Subject: [Bioperl-l] colorize features Message-ID: <44608EA9.1030808@bioinf.uni-leipzig.de> Hi all, I am using the Bio::Graphics module to draw sequences and their features with Bio::SeqFeature::Generic. The features I want to highlight are occurrences of transcription binding factors. Therefore I want to give every factor its own color, but i didn't see how to manage it. I only can colorize complete tracks. Is there a known workaround? Thanks, Peter From Marc.Logghe at DEVGEN.com Tue May 9 14:13:24 2006 From: Marc.Logghe at DEVGEN.com (Marc Logghe) Date: Tue, 9 May 2006 16:13:24 +0200 Subject: [Bioperl-l] colorize features Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746D88@ANTARESIA.be.devgen.com> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Peter Menzel > Sent: Tuesday, May 09, 2006 2:44 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] colorize features > > Hi all, > I am using the Bio::Graphics module to draw sequences and > their features with Bio::SeqFeature::Generic. 
> The features I want to highlight are occurrences of
> transcription binding factors. Therefore I want to give every
> factor its own color, but i didn't see how to manage it. I
> only can colorize complete tracks.
> Is there a known workaround?

Yes, instead of giving a hardcoded color value you can pass a subroutine to the option.

-bgcolor => sub {
    my $feat = shift;
    # get your attribute on which you want to base your color
    my ($attr) = $feat->get_tag_values('my_attribute');

    return $attr > 10 ? 'red' : 'green'
}

Not sure about the method calls I am making here (could as well be get_attributes()) but you get the idea.

Cheers,
Marc

From Marc.Logghe at DEVGEN.com Tue May 9 14:47:06 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Tue, 9 May 2006 16:47:06 +0200
Subject: [Bioperl-l] colorize features
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746D89@ANTARESIA.be.devgen.com>

Hi Peter,

Actually it is explained much better in this howto:
http://bioperl.org/wiki/HOWTO:Graphics

The examples show the principle I mentioned in my previous post (e.g. Example 4), but then for the -label or -description options. But as said, you can apply this to (most of?) the other options as well.

Regards,
ML

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Marc Logghe
> Sent: Tuesday, May 09, 2006 4:13 PM
> To: Peter Menzel; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] colorize features
>
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Peter
> > Menzel
> > Sent: Tuesday, May 09, 2006 2:44 PM
> > To: bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] colorize features
> >
> > Hi all,
> > I am using the Bio::Graphics module to draw sequences and their
> > features with Bio::SeqFeature::Generic.
> > The features I want to highlight are occurrences of transcription
> > binding factors. Therefore I want to give every factor its own color,
> > but i didn't see how to manage it. I only can colorize complete
> > tracks.
> > Is there a known workaround?
>
> Yes, instead of giving a hardcoded color value you can pass a
> subroutine to the option.
> -bgcolor => sub {
>     my $feat = shift;
>     # get your attribute on which you want to base your color
>     my ($attr) = $feat->get_tag_values('my_attribute');
>
>     return $attr > 10 ? 'red' : 'green'
> }
>
> Not sure about the method calls I am making here (could as well be
> get_attributes()) but you get the idea.
> Cheers,
> Marc
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From WiersmaP at AGR.GC.CA Tue May 9 15:49:33 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Tue, 9 May 2006 11:49:33 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C42@onncrxms5.agr.gc.ca>

Hi Li,

The line "my $result = $primer3->run" is already in the code you submitted. In the Bio::Tools::Primer3 module the author uses "$p3" for the object. If you change your line to "my $p3 = $primer3->run" you should be able to run the examples below. Process the results for each sequence and output the results before looping to the next sequence.

From Bio::Tools::Primer3.pm:

# how many results were there?
my $num=$p3->number_of_results;
print "There were $num results\n";

# get all the results
my $all_results=$p3->all_results;
print "ALL the results\n";
foreach my $key (keys %{$all_results}) {print "$key\t${$all_results}{$key}\n"}

# get specific results
my $result1=$p3->primer_results(1);
print "The first primer is\n";
foreach my $key (keys %{$result1}) {print "$key\t${$result1}{$key}\n"}

Paul

Paul A.
Wiersma Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada Summerland, BC wiersmap at agr.gc.ca ? -----Original Message----- From: chen li [mailto:chen_li3 at yahoo.com] Sent: Monday, May 08, 2006 8:32 PM To: Wiersma, Paul Subject: Re: [Bioperl-l] use primer3 to design primers with multiple sequences Hi Paul, I read both documents. What I understand is that Bio:Tools::Run:Primer3 is for designing primers and Bio:Tools::Primer3 is for parsing the results. When I read the documents I do not see this line $result = $primer3->run in Bio:Tools::Primer3. I wonder how you get this infomration. Thanks, Li --- "Wiersma, Paul" wrote: > Hi Li, > > > > When you execute $primer3->run with a > Bio::Tools::Run::Primer3 object it > opens -outfile=>"filename" for writing and then > closes. That's why > putting it in a loop will overwrite your output file > each time so you > only see the last one. I suppose you could read in > each output file > before looping to the next seq and append it to > another file. > > > > If you're doing a fair bit of work with this module > it would be worth > looking at the Bio::Tools::Primer3 module. The > statement $result = > $primer3->run produces a Bio::Tools::Primer3 object > which has all the > methods you need for customizing your output. > > > > Paul > > > > Paul A. Wiersma > Agriculture and Agri-Food Canada/Agriculture et > Agroalimentaire Canada > Summerland, BC > > wiersmap at agr.gc.ca > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! 
Mail has the best spam protection around http://mail.yahoo.com

From chen_li3 at yahoo.com Tue May 9 17:32:32 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 9 May 2006 10:32:32 -0700 (PDT)
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C42@onncrxms5.agr.gc.ca>
Message-ID: <20060509173232.18843.qmail@web36802.mail.mud.yahoo.com>

Thanks Paul, it REALLY works.

I have other questions:

1) When I run the script I use this line at the command prompt:

perl primer.pl >test

When I check the default output file (temp.out) used by the script, I see only the information for the last sequence, which is different from what is in the test file. The test file has all the information for all the sequences.

2) Is it possible to use Bio::Tools::Primer3 directly to print out selected information, such as the primer sequence and the size of the PCR product? Or do I have to parse the file myself?

After I get all this information I would like to post the script for batch-designing PCR primers.

Thanks,

Li

--- "Wiersma, Paul" wrote:

> Hi Li,
>
> The line "my $result = $primer3->run" is already in
> the code you submitted. In the Bio::Tools::Primer3
> module the author uses "$p3" for the object. If you
> change your line to "my $p3 = $primer3->run" you
> should be able to run the examples below. Process
> the results for each sequence and output the results
> before looping to the next sequence.
>
> From Bio::Tools::Primer3.pm:
>
> # how many results were there?
> my $num=$p3->number_of_results;
> print "There were $num results\n";
>
> # get all the results
> my $all_results=$p3->all_results;
> print "ALL the results\n";
> foreach my $key (keys %{$all_results}) {print "$key\t${$all_results}{$key}\n"}
>
> # get specific results
> my $result1=$p3->primer_results(1);
> print "The first primer is\n";
> foreach my $key (keys %{$result1}) {print "$key\t${$result1}{$key}\n"}
>
> Paul
>
> Paul A.
Wiersma > Agriculture and Agri-Food Canada/Agriculture et > Agroalimentaire Canada > Summerland, BC > wiersmap at agr.gc.ca > > ? > > > > -----Original Message----- > From: chen li [mailto:chen_li3 at yahoo.com] > Sent: Monday, May 08, 2006 8:32 PM > To: Wiersma, Paul > Subject: Re: [Bioperl-l] use primer3 to design > primers with multiple sequences > > Hi Paul, > > I read both documents. What I understand is that > Bio:Tools::Run:Primer3 is for designing primers and > Bio:Tools::Primer3 is for parsing the results. When > I > read the documents I do not see this line > $result = $primer3->run in Bio:Tools::Primer3. I > wonder how you get this infomration. > > Thanks, > > Li > > --- "Wiersma, Paul" wrote: > > > Hi Li, > > > > > > > > When you execute $primer3->run with a > > Bio::Tools::Run::Primer3 object it > > opens -outfile=>"filename" for writing and then > > closes. That's why > > putting it in a loop will overwrite your output > file > > each time so you > > only see the last one. I suppose you could read > in > > each output file > > before looping to the next seq and append it to > > another file. > > > > > > > > If you're doing a fair bit of work with this > module > > it would be worth > > looking at the Bio::Tools::Primer3 module. The > > statement $result = > > $primer3->run produces a Bio::Tools::Primer3 > object > > which has all the > > methods you need for customizing your output. > > > > > > > > Paul > > > > > > > > Paul A. Wiersma > > Agriculture and Agri-Food Canada/Agriculture et > > Agroalimentaire Canada > > Summerland, BC > > > > wiersmap at agr.gc.ca > > > > > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! 
Mail has the best spam
> protection around
> http://mail.yahoo.com
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From WiersmaP at AGR.GC.CA Tue May 9 17:59:20 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Tue, 9 May 2006 13:59:20 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C43@onncrxms5.agr.gc.ca>

Hi Li,

I've attached some code I used to explore the basic functionality of the Primer3.pm modules. Hopefully you can see how I've picked out parts of the results for printing. You can modify it as needed to output only some of the results.

>>>>>>>>
# design the primers. This runs primer3 and returns a
# Bio::Tools::Primer3 object with the results
my $results=$primer3->run;

# see the Bio::Tools::Primer3 pod for
# things that you can get from this. For example:
print "There were ", $results->number_of_results+1, " primers\n";

# suffixes of the per-primer result keys to print
my @out_keys_part = qw(
    START LENGTH TM GC_PERCENT SELF_ANY SELF_END SEQUENCE );

for (my $i=0;$i <= $results->number_of_results;$i++){
    # get specific results
    my $result1=$results->primer_results($i);
    print "\n",$i+1;
    for my $key (qw(PRIMER_LEFT PRIMER_RIGHT)){
        # PRIMER_LEFT and PRIMER_RIGHT hold "start,length"
        my ($start, $length) = split /,/, ${$result1}{$key};
        ${$result1}{$key."_START"} = $start;
        ${$result1}{$key."_LENGTH"} = $length;
        foreach my $partkey (@out_keys_part) {
            print "\t", ${$result1}{$key."_".$partkey};
        }
        print "\n";
    }
    print "\tPRODUCT SIZE: ", ${$result1}{'PRIMER_PRODUCT_SIZE'}, ", PAIR ANY COMPL: ", ${$result1}{'PRIMER_PAIR_COMPL_ANY'};
    print ", PAIR 3\' COMPL: ", ${$result1}{'PRIMER_PAIR_COMPL_END'}, "\n";
}
>>>>>>>>>>>>>>>

Paul A.
Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Telephone/Téléphone: 250-494-6388
Facsimile/Télécopieur: 250-494-0755
Box 5000, 4200 Hwy 97
Summerland, BC V0H 1Z0
wiersmap at agr.gc.ca

-----Original Message-----
From: chen li [mailto:chen_li3 at yahoo.com]
Sent: Tuesday, May 09, 2006 10:33 AM
To: Wiersma, Paul
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] use primer3 to design primers with multiple sequences

Thanks Paul, it REALLY works. I have other questions:

1) When I run the script I use this line at the command prompt:

perl primer.pl >test

When I check the default output file (temp.out) used by the script I only see the information about the last sequence, which is different from what is in the test file. In the test file I get the information for all the sequences.

2) Is it possible to use Bio::Tools::Primer3 directly to print out selected information such as the primer sequence and the size of the PCR product? Or do I have to parse the file myself?

After I get all this information I would like to post the script for batch-designing PCR primers.

Thanks,

Li

--- "Wiersma, Paul" wrote:

> Hi Li,
>
> The line "my $result = $primer3->run" is already in the code you
> submitted. In the Bio::Tools::Primer3 module the author uses "$p3"
> for the object. If you change your line to "my $p3 = $primer3->run"
> you should be able to run the examples below. Process the results
> for each sequence and output them before looping to the next
> sequence.
>
> From Bio::Tools::Primer3.pm:
>
> # how many results were there?
> my $num = $p3->number_of_results;
> print "There were $num results\n";
>
> # get all the results
> my $all_results = $p3->all_results;
> print "ALL the results\n";
> foreach my $key (keys %{$all_results}) {
>     print "$key\t${$all_results}{$key}\n";
> }
>
> # get specific results
> my $result1 = $p3->primer_results(1);
> print "The first primer is\n";
> foreach my $key (keys %{$result1}) {
>     print "$key\t${$result1}{$key}\n";
> }
>
> Paul
From cjfields at uiuc.edu Tue May 9 21:13:43 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 9 May 2006 16:13:43 -0500
Subject: [Bioperl-l] Oddness in Bio::SeqIO
Message-ID: <000601c673ad$74601c30$15327e82@pyrimidine>

I noticed an odd thing with SeqIO parsing of species lines (those problematic bacterial tax names again). I have a simple script that runs output to STDOUT to generate a list of hits. Here's what I get:

Bacterium: Corynebacterium glutamicum ATCC 13032
    hits: 4
Bacterium: Corynebacterium jeikeium K411 K411 <--
    hits: 1
Bacterium: Frankia sp. CcI3 CcI3 <--
    hits: 1
Bacterium: Frankia sp. EAN1pec EAN1pec <--
    hits: 1
Bacterium: Janibacter sp. HTCC2649 HTCC2649 <--
    hits: 1
Bacterium: Kineococcus radiotolerans SRS30216 SRS30216 <--
    hits: 1
Bacterium: Leifsonia xyli subsp. xyli str. CTCB07 xyli str. CTCB07 <--
    hits: 1
Bacterium: Mycobacterium avium subsp. paratuberculosis K-10 paratuberculosis K-10 <--
...

Most (but not all) of the strain numbers get repeated (marked with arrows). This is actually in the GenBank file itself, downloaded via Bio::DB::GenBank (and thus passed through Bio::SeqIO). Anyone seen this before?
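As a stopgap for the repeated strain names listed above, the duplication can be stripped after the fact. The following is only a sketch under the assumption that the repeat is always an exact copy of the trailing words; `strip_repeated_suffix` is a hypothetical helper, not a BioPerl method, and it would also collapse names that legitimately end in a repeated word, so it does not replace a fix in the parser.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): if the last k words of a name
# exactly repeat the k words before them, drop the repeat.
sub strip_repeated_suffix {
    my ($name) = @_;
    my @words = split ' ', $name;

    # Try the longest possible repeat first.
    for (my $k = int(@words / 2); $k >= 1; $k--) {
        my @tail = @words[ -$k .. -1 ];
        my @prev = @words[ -2 * $k .. -$k - 1 ];
        if ("@tail" eq "@prev") {
            return join ' ', @words[ 0 .. $#words - $k ];
        }
    }
    return $name;    # nothing repeated, return unchanged
}

# Names taken from the listing in this thread.
for my $name (
    'Corynebacterium glutamicum ATCC 13032',
    'Frankia sp. CcI3 CcI3',
    'Leifsonia xyli subsp. xyli str. CTCB07 xyli str. CTCB07',
) {
    print strip_repeated_suffix($name), "\n";
}
```

Trying the longest repeat first matters for names such as the Leifsonia entry, where three words ("xyli str. CTCB07") are duplicated rather than one.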
Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From torsten.seemann at infotech.monash.edu.au Tue May 9 23:42:29 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 10 May 2006 09:42:29 +1000
Subject: [Bioperl-l] Oddness in Bio::SeqIO
In-Reply-To: <000601c673ad$74601c30$15327e82@pyrimidine>
References: <000601c673ad$74601c30$15327e82@pyrimidine>
Message-ID: <446128E5.1000908@infotech.monash.edu.au>

Chris,

> I noticed an odd thing with SeqIO parsing of species lines (those
> problematic bacterial tax names again). I have a simple script that runs
> output to STDOUT to generate a list of hits. Here's what I get:

> Bacterium: Mycobacterium avium subsp. paratuberculosis K-10 paratuberculosis
> K-10 <--

In this case,

Genus = Mycobacterium
Species = avium
Subspecies = paratuberculosis
Strain = K-10

which suggests that BioPerl is trying to handle something special, because the 'subsp.' is gone?

Here are the pertinent parts of the GenBank file (apologies for the wrapping):

LOCUS       NC_002944            4829781 bp    DNA     circular BCT 18-JAN-2006
DEFINITION  Mycobacterium avium subsp. paratuberculosis K-10, complete
            genome.
SOURCE      Mycobacterium avium subsp. paratuberculosis K-10
  ORGANISM  Mycobacterium avium subsp. paratuberculosis K-10
            Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
            Corynebacterineae; Mycobacteriaceae; Mycobacterium;
            Mycobacterium avium complex (MAC).

                     /organism="Mycobacterium avium subsp.
                     paratuberculosis K-10"
                     /strain="K-10"
                     /sub_species="paratuberculosis"

> Most (but not all) of the strain numbers get repeated (marked with arrows).
> This is actually in the GenBank file itself, downloaded via Bio::DB::GenBank
> (and thus passed through Bio::SeqIO). Anyone seen this before?

The problem is mentioned in the wiki so it must have come up before?
http://bioperl.org/wiki/Project_priority_list#Taxonomy_.2F_Species_data

I also deal mainly with bacteria, and should look into this as well. I haven't been using the GenBank headers directly, only the features, so I never came across this.

Another thing which may crop up is when no species has been allocated yet but the genus is known (or something like that). In that case the name is written as "Genus spp.", e.g. Gallibacterium spp.

--Torsten

From chen_li3 at yahoo.com Wed May 10 01:04:08 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 9 May 2006 18:04:08 -0700 (PDT)
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C47@onncrxms5.agr.gc.ca>
Message-ID: <20060510010408.24494.qmail@web36804.mail.mud.yahoo.com>

Hi Paul,

Thank you very much. As you point out in your latest email, I now see that the line "my $result1=$results->primer_results(1);" returns a hash reference containing all the information for the first pair of primers.

1) Since it is a hash, I should be able to get the value for a given key by telling Perl which key to look up.
2) Since it is a reference, I have to dereference it to get the actual value.

I don't know much about OO or Perl, and your code looks a little complicated to me.
But I get the job done by adding the following lines directly:

###############################################
# from the Primer3 module, to get all the information
#foreach my $key (sort keys %{$result1}) {
#    print "$key\t${$result1}{$key}\n"}
##################################################
# get the value for the key in the hash reference
my $key_PRIMER_LEFT_SEQUENCE='PRIMER_LEFT_SEQUENCE';
print "$key_PRIMER_LEFT_SEQUENCE\t${$result1}{$key_PRIMER_LEFT_SEQUENCE}\n";

There is one point I don't understand. When I add these two lines to my code (line 49 in my code):

my $key_PRIMER_SEQUENCE_ID='PRIMER_SEQUENCE_ID';
print "$key_PRIMER_SEQUENCE_ID\t${$result1}{$key_PRIMER_SEQUENCE_ID}\n";

I don't get the PRIMER_SEQUENCE_ID. Perl complains and says "Use of uninitialized value in concatenation (.) or string at primer3-3 line 49."

Li

--- "Wiersma, Paul" wrote:

> Hi Li,
>
> Just a bit of clarification of the code that I sent earlier.
> The line "my $result1=$results->primer_results($i);" gives you a
> reference to a hash that contains all of the information for a
> primer pair.
> To access the entries you dereference the hash, i.e. the hash is
> %{$result1} and ${$result1}{'PRIMER_PRODUCT_SIZE'} gives you the
> entry for product size. The following are the available entries.
> All are single values or strings except PRIMER_RIGHT and
> PRIMER_LEFT, which are start,length pairs (e.g. PRIMER_LEFT =>
> '60,20') that can be pulled out with split.
> my ($start, $length) = split /,/, ${$result1}{'PRIMER_LEFT'};
> my $right_Tm = ${$result1}{'PRIMER_RIGHT_TM'};
>
> PRIMER_PRODUCT_SIZE
> PRIMER_PAIR_COMPL_ANY
> PRIMER_PAIR_COMPL_END
> PRIMER_PAIR_PENALTY
>
> PRIMER_LEFT
> PRIMER_LEFT_END_STABILITY
> PRIMER_LEFT_PENALTY
> PRIMER_LEFT_TM
> PRIMER_LEFT_GC_PERCENT
> PRIMER_LEFT_SELF_ANY
> PRIMER_LEFT_SELF_END
> PRIMER_LEFT_SEQUENCE
>
> PRIMER_RIGHT
> PRIMER_RIGHT_END_STABILITY
> PRIMER_RIGHT_PENALTY
> PRIMER_RIGHT_TM
> PRIMER_RIGHT_GC_PERCENT
> PRIMER_RIGHT_SELF_ANY
> PRIMER_RIGHT_SELF_END
> PRIMER_RIGHT_SEQUENCE
>
> Paul A. Wiersma
> Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
> Summerland, BC
> wiersmap at agr.gc.ca

From zhouyubio at gmail.com Wed May 10 01:35:01 2006
From: zhouyubio at gmail.com (Yu ZHOU)
Date: Wed, 10 May 2006 01:35:01 +0000 (UTC)
Subject: [Bioperl-l] pubmed
References: <6.1.2.0.2.20050331171052.03830ba8@qfdong.mail.iastate.edu>
Message-ID:

Qunfeng iastate.edu> writes:

> Hi there,
>
> http://bioperl.org/HOWTOs/Feature-Annotation/anno_from_genbank.html
>
> I am not very familiar with BioPerl. I tried to follow the example shown
> on the above page to retrieve the PubMed ID under each Reference tag, i.e.,
> $value->pubmed(), but it doesn't work for me for the seq gi#56961711. The
> authors() works for me. Appreciate any suggestions.
>
> Qunfeng

Hi,

I have the same problem as you. Here is what I have done: use a regular expression to match the value of the 'location' tag, if there is one.
#------------------
my $ann = $seqobj->annotation();    # annotation object
foreach my $ref ( $ann->get_Annotations('reference') ) {
    print "Title: ", $ref->title, "\n";
    print "Location: ", $ref->location, "\n";
    if ($ref->location =~ /PUBMED\s+(\d+)/) {
        my $pmid = $1;
        print "PMID: ", $pmid, "\n";
    }
    print "Authors: ", $ref->authors, "\n";
}
#------------------

From osborne1 at optonline.net Wed May 10 03:01:49 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 09 May 2006 23:01:49 -0400
Subject: [Bioperl-l] pubmed
In-Reply-To:
Message-ID:

Qunfeng,

I'm using bioperl-live, and I'm able to retrieve the single PubMed id found in the 56961711 entry using the pubmed() method. Note that there are 4 references, only one of which has a PubMed id. Also, the authors() method prints out the authors, not the PubMed id. If you have a problem please show your code and tell us which version of Bioperl you're using.

Brian O.

use strict;
use lib "/Users/bosborne/bioperl-live";
use Bio::DB::GenBank;

my $db = Bio::DB::GenBank->new;
my $seq = $db->get_Seq_by_id(56961711);
my $ann_coll = $seq->annotation;

foreach my $ann ($ann_coll->get_Annotations('reference')) {
    print "Author: ", $ann->authors, "\nPubmed id: ", $ann->pubmed, "\n";
}
From sb at mrc-dunn.cam.ac.uk Wed May 10 09:30:59 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 10 May 2006 10:30:59 +0100
Subject: [Bioperl-l] Bio::Taxonomy confusion
Message-ID: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>

Hi,

I'm a little confused as to how names are supposed to work in Bio::Taxonomy::Node. In the bioperl versions that I've looked at, a Node doesn't seem to store the most important information about itself, its scientific name, in an obvious place. bioperl 1.5.1 puts it at the start of the classification list. I'd have thought sticking it in -name would make more sense, but this is used only for the GenBank common name.

The Bio::Taxonomy docs still suggest:

my $node_species_sapiens = Bio::Taxonomy::Node->new(
    -object_id => 9606, # or -ncbi_taxid. Required tag
    -names => {
        'scientific' => ['sapiens'],
        'common_name' => ['human']
    },
    -rank => 'species' # Required tag
);

and whilst Bio::Taxonomy::Node does not accept -names, it does have a 'name' method which claims to work like:

$obj->name('scientific', 'sapiens');

This kind of thing would be really nice, but afaics Bio::Taxonomy::Node->new takes the -name value and makes a common name out of it, whilst the name() method passes any 'scientific' name to the scientific_name() method, which is unable to set any value (and warns about this), only get.
It seems like the need to have this classification array work the same way as Bio::Species is causing some unnecessary restrictions. Can't the more sensible idea of having a dedicated storage spot for the ScientificName and other parameters be used, with the classification array either being generated just-in-time from the hash-stored data, or indeed being generated from the Lineage field?

Also, why does a node store the complete hierarchy on itself in the classification array? If we're going that far, why don't the Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a get_taxonomy() method instead of a get_Taxonomy_Node() method? get_taxonomy() could, from a single efetch.fcgi lookup, create a complete Bio::Taxonomy with all the nodes. Whilst most nodes would only have a minimum of information, if you could simply ask a node what its rank and scientific name was you could easily build a classification array, or ask what Kingdom your species was in, etc.

Are there good reasons for Taxonomy working the way it does in 1.5.1, or would I not be wasting my time re-writing things to make more sense (to me)?

Cheers,
Sendu.

From osborne1 at optonline.net Wed May 10 14:33:18 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 10 May 2006 10:33:18 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C43@onncrxms5.agr.gc.ca>
Message-ID:

Paul,

I took your code, added some "run" code and made it into a script and added this to CVS, examples/tools/run_primer3.pl. I hope this is OK with you.

Brian O.
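For readers following the primer3 thread, the per-pair result hash Paul describes (single values, plus "start,length" strings under PRIMER_LEFT and PRIMER_RIGHT) can be reshaped into something easier to print selectively. This is a minimal sketch in plain Perl: `primer_summary` is a hypothetical helper, not part of BioPerl or of the script added to CVS, and the sample values below are made up. In a real run the hash reference would come from $results->primer_results($i).

```perl
use strict;
use warnings;

# Hypothetical helper: pick a few fields out of one primer-pair hash and
# split the "start,length" strings stored under PRIMER_LEFT / PRIMER_RIGHT.
sub primer_summary {
    my ($result) = @_;
    my %summary = ( product_size => $result->{PRIMER_PRODUCT_SIZE} );
    for my $side (qw(LEFT RIGHT)) {
        my ($start, $length) = split /,/, $result->{"PRIMER_$side"};
        $summary{ lc $side } = {
            start    => $start,
            length   => $length,
            tm       => $result->{"PRIMER_${side}_TM"},
            sequence => $result->{"PRIMER_${side}_SEQUENCE"},
        };
    }
    return \%summary;
}

# Made-up example values, using key names listed in this thread.
my $result1 = {
    PRIMER_PRODUCT_SIZE   => 500,
    PRIMER_LEFT           => '60,20',
    PRIMER_LEFT_TM        => '59.9',
    PRIMER_LEFT_SEQUENCE  => 'GTGACCTTCAAGGAGCTGAA',
    PRIMER_RIGHT          => '559,20',
    PRIMER_RIGHT_TM       => '60.1',
    PRIMER_RIGHT_SEQUENCE => 'CAGTTGACCGATCCATAGGA',
};

my $summary = primer_summary($result1);
for my $side (qw(left right)) {
    printf "%s\t%s\tstart=%d\tlength=%d\n",
        $side, $summary->{$side}{sequence},
        $summary->{$side}{start}, $summary->{$side}{length};
}
print "product size: $summary->{product_size}\n";
```

Returning a new hash rather than adding _START and _LENGTH keys to the original keeps the primer3 result untouched, which may be preferable when the same result object is reused later in a loop.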
From stoltzfu at umbi.umd.edu Tue May 9 20:22:43 2006
From: stoltzfu at umbi.umd.edu (Arlin Stoltzfus)
Date: Tue, 09 May 2006 16:22:43 -0400
Subject: [Bioperl-l] proposal: CDAT (character data and trees) integrative object
Message-ID:

Dear developers--

We propose a Bio::CDAT (Character Data And Trees) module to facilitate comparative analysis using evolutionary methods by 1) managing evolutionary relationships (by linking data to trees) and 2) allowing coordinated analysis of different types of data (by implementing a generic concept of "character-state" data). Bio::CDAT would take advantage of existing BioPerl objects and would include the functionality of Rutger Vos's Bio::Phylo. It would provide the framework to develop interfaces to analysis tools (phylogeny inference, evolutionary rate models, functional shift inference, etc), as well as to file formats and visualization methods appropriate for such analyses.

A proposal is attached. We would like to hear your thoughts (e.g., see the section on "Questions to consider")!

Thanks

Arlin Stoltzfus
WeiGang Qiu
Rutger Vos
(with thanks to Justin Reese and Aaron Mackey)

------------------
Arlin Stoltzfus (stoltzfu at umbi.umd.edu)
CARB, 9600 Gudelsky Drive, Rockville, Maryland 20850
tel 240 314 6208, fax 240 314 6255, www.molevol.org/camel
---------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CDAT-proposal.pdf
Type: application/pdf
Size: 193701 bytes
Desc: not available
URL:
-------------- next part --------------

From zhouyubio at gmail.com Wed May 10 08:55:46 2006
From: zhouyubio at gmail.com (Yu Zhou)
Date: Wed, 10 May 2006 16:55:46 +0800
Subject: [Bioperl-l] pubmed
In-Reply-To:
References:
Message-ID: <613ffb490605100155w43a9ea4sca23818bc7fa4e33@mail.gmail.com>

Thanks! I am using Bioperl-1.4, not bioperl-live. That may be the reason why it does not work!
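The two approaches in this thread (the pubmed() method, and the regular expression on the 'location' value) can be combined so that a script behaves the same on either BioPerl version. The following is a sketch only: `pmid_from_reference` is a hypothetical helper, and the location string in the example is made up; the actual text in the 'location' value may be laid out differently.

```perl
use strict;
use warnings;

# Hypothetical helper: prefer the value from pubmed() when it looks like an
# id, otherwise fall back to searching the location string for "PUBMED <digits>".
sub pmid_from_reference {
    my ($pubmed, $location) = @_;
    return $pubmed if defined $pubmed && $pubmed =~ /^\d+$/;
    return $1 if defined $location && $location =~ /PUBMED\s+(\d+)/;
    return;    # no PubMed id available for this reference
}

# Made-up location text, only to show the fallback path.
my $location = 'J. Made Up 1 (1), 1-10 (2006) PUBMED 12345678';
my $pmid = pmid_from_reference(undef, $location);
print defined $pmid ? "PMID: $pmid\n" : "no PMID\n";

# With reference annotations, as in the code posted in this thread, the call
# would look like:
#   my $pmid = pmid_from_reference($ref->pubmed, $ref->location);
```

Because only one of the four references in the entry discussed here carries a PubMed id, the empty return for the others is the expected case rather than an error.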
--
Best Wishes!
Yu

From cjfields at uiuc.edu Wed May 10 15:46:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 May 2006 10:46:27 -0500
Subject: [Bioperl-l] Oddness in Bio::SeqIO
In-Reply-To: <446128E5.1000908@infotech.monash.edu.au>
Message-ID: <000f01c67448$e63973b0$15327e82@pyrimidine>

This actually pops up when using $seq->species->common_name; using $seq->species->binomial chops some of the strain designations off, so really neither one works optimally for bacterial genus-species-strain taxonomy. Hilmar made the suggestion that it's probably best to grab the NCBI TaxID and parse it out that way by looking it up in the taxonomy database (using Bio::DB::Taxonomy), but at the moment that's not what Bio::SeqIO::genbank does.

I wonder if we should be trying to shove most of this stuff into species objects directly from the beginning; in other words, maybe we should try to get the information in Bio::Annotation objects and then, after the parsing/IO is finished, have a method to get the information into Bio::Species objects when wanted/needed; a check could be added against the NCBI Taxonomy database there.

Anyway, I really haven't looked at how they are parsed out and don't have the time at the moment. I may look into this as well but not until I get back from conference (end of May).
Jason and Brian have been calling for a refactoring of Bio::SeqIO::genbank for a while; maybe it's getting time to do something about it...

Chris
From cuiw at mail.nih.gov Wed May 10 16:02:55 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Wed, 10 May 2006 12:02:55 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiplesequences
In-Reply-To: <20060510010408.24494.qmail@web36804.mail.mud.yahoo.com>
Message-ID:

'PRIMER_SEQUENCE_ID' is not a key in the Bio::Tools::Primer3 output hash. You can find all legal keys by "print keys %{$result1};"

> There is one point I don't understand: When I add these two lines
> into my code (line 49 in my code)
>
> my $key_PRIMER_SEQUENCE_ID='PRIMER_SEQUENCE_ID';
> print "$key_PRIMER_SEQUENCE_ID\t${$result1}{$key_PRIMER_SEQUENCE_ID}\n";
>
> I don't get the PRIMER_SEQUENCE_ID. Perl complains it and says
> "Use of uninitialized value in concatenation (.) or string at
> primer3-3 line 49."
>
> Li

From WiersmaP at AGR.GC.CA Wed May 10 16:08:37 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Wed, 10 May 2006 12:08:37 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C48@onncrxms5.agr.gc.ca>

Brian, no problem with the code, thanks for asking.

Li, PRIMER_SEQUENCE_ID and SEQUENCE are not part of the individual results but only end up by default with $results->primer_results(0).
If you try to access them using $results->primer_results(1) (or anything but 0) you will get an error.

Paul

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca
From cuiw at mail.nih.gov Wed May 10 18:42:36 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Wed, 10 May 2006 14:42:36 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiplesequences: bug in code!
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C48@onncrxms5.agr.gc.ca>
Message-ID:

Hope this works!
Bio::Tools::Primer3 line 264 should be:

$self->{seqobject}=Bio::Seq->new(-seq=>$value, -id=>$id);

Then you should be able to display PRIMER_SEQUENCE_ID by:

#### read primer3 output file ####
my $p3=Bio::Tools::Primer3->new(-file=>"data/primer3_output.txt");
#### print id ####
print $p3->seqobject->id;

Wenwu Cui, PhD
NIH/NCI
>     my ($start, $length) = split /,/, ${$result1}{'PRIMER_LEFT'};
>     my $right_Tm = ${$result1}{'PRIMER_RIGHT_TM'};
>
> PRIMER_PRODUCT_SIZE
> PRIMER_PAIR_COMPL_ANY
> PRIMER_PAIR_COMPL_END
> PRIMER_PAIR_PENALTY
>
> PRIMER_LEFT
> PRIMER_LEFT_END_STABILITY
> PRIMER_LEFT_PENALTY
> PRIMER_LEFT_TM
> PRIMER_LEFT_GC_PERCENT
> PRIMER_LEFT_SELF_ANY
> PRIMER_LEFT_SELF_END
> PRIMER_LEFT_SEQUENCE
>
> PRIMER_RIGHT
> PRIMER_RIGHT_END_STABILITY
> PRIMER_RIGHT_PENALTY
> PRIMER_RIGHT_TM
> PRIMER_RIGHT_GC_PERCENT
> PRIMER_RIGHT_SELF_ANY
> PRIMER_RIGHT_SELF_END
> PRIMER_RIGHT_SEQUENCE
>
> Paul A. Wiersma
> Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
> Summerland, BC
> wiersmap at agr.gc.ca

From cjfields at uiuc.edu Wed May 10 18:58:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 May 2006 13:58:19 -0500
Subject: [Bioperl-l] ListSummaries for April 26-May 9
Message-ID: <001801c67463$b3c0a910$15327e82@pyrimidine>

ListSummaries for April 26-May 9 are up at the usual place:

http://www.bioperl.org/wiki/Mailing_list_summaries

Direct link:

http://www.bioperl.org/wiki/ListSummary:April_26-May_9%2C2006

It's a bit of a hurried one so don't be surprised to find a few spelling
errors here and there. I'm getting ready for a conference in a couple
weeks so I may be off the radar a bit here and there. The next
ListSummary won't be posted until May 26.

Enjoy!

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From chen_li3 at yahoo.com Thu May 11 00:27:34 2006
From: chen_li3 at yahoo.com (chen li)
Date: Wed, 10 May 2006 17:27:34 -0700 (PDT)
Subject: [Bioperl-l] What is the relationship between primer3 module and run-primer3 module?
Message-ID: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>

First, thank you all for replying to my previous post about primer3.

But now I am a little confused even after I read the documents: What is
the relationship between these two modules? What is the correct/standard
way to use them to do batch primer design? What I do is use
Bio::Tools::Run::Primer3 to design primers. Based on Dr. Roy Chaudhuri's
information I can set the parameters using the following syntax:

    $primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');

Based on Paul A. Wiersma's explanation I can also print out part of the
primer results (because I don't need all the information). But there is
a little trouble: PRIMER_SEQUENCE_ID can't be accessed using this
method. And Paul points out that "PRIMER_SEQUENCE_ID and SEQUENCE are
not part of the individual results but only end up by default with
$results->primer_results(0)". So it seems there is no way to get around
this problem using Bio::Tools::Run::Primer3. And others suggest using
Bio::Tools::Primer3 to parse the results. So is it true that
Bio::Tools::Run::Primer3 is for primer design and Bio::Tools::Primer3 is
for parsing the results from Bio::Tools::Run::Primer3? But what I find
is that I get almost all the results (except PRIMER_SEQUENCE_ID and
SEQUENCE) without providing the line of code

    use Bio::Tools::Primer3

in the script. How to explain this? Is it because of the following line
of code?

    my $result=$primer3->run;

The last question: which line of code is used to invoke the program
primer3.exe? How does the Perl script call primer3.exe?

Once again thank you all very much,

Li

From jason.stajich at duke.edu Thu May 11 00:41:31 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 10 May 2006 20:41:31 -0400
Subject: [Bioperl-l] What is the relationship between primer3 module and run-primer3 module?
In-Reply-To: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>
References: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>
Message-ID:

Bio::Tools::Run::XXX modules are for running applications...

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From jason.stajich at duke.edu Thu May 11 00:53:43 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 10 May 2006 20:53:43 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>
References: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>
Message-ID: <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu>

I would use the implementation that talks to the flatfile db as the
standard here. Nodes are defined by the data from the taxonomy dump dbs
from NCBI.
The eutils is pretty worthless except for taxid->name or reverse, you
can't get the full taxonomy (or couldn't when that implementation was
written).

The "name" method refers to the name of the node - each level in the
taxonomy can have a "name".

The bits of hackiness relate to wrapping the node object as a
Bio::Species and/or being able to read a genbank file and the organism
taxonomy data as a list and instantiating. If we could rely on
everything being in a DB of course this would be simpler.

Another problem is the depth of the taxonomy is not constant for every
node so assuming that a fixed number of slots will be filled in to
generate the taxonomy leads to problems.
Use the flatfile implementation (Bio::DB::Taxonomy::flatfile) as the
best example of working code as this is how I really wanted it to work;
the Bio::Species hacks are only there to shoehorn data retrieved from
genbank files in. With the flatfile implementation you have to walk all
the way up the db hierarchy to get the kingdom for a node so you do have
to build up the classification hierarchy as each node only stores data
about itself.

I'm not exactly sure what you are proposing to do, but would definitely
enjoy another pair of hands, I don't really have time to mess with it
any time soon.

-jason

On May 10, 2006, at 5:30 AM, Sendu Bala wrote:

> Hi,
> I'm a little confused as to how names are supposed to work in
> Bio::Taxonomy::Node.
>
> In the bioperl versions that I've looked at a Node doesn't seem to
> store the most important information about itself - its scientific
> name - in an obvious place. bioperl 1.5.1 puts it at the start of the
> classification list. I'd have thought sticking it in -name would make
> more sense, but this is used only for the GenBank common name.
>
> The Bio::Taxonomy docs still suggest:
>
>   my $node_species_sapiens = Bio::Taxonomy::Node->new(
>       -object_id => 9606, # or -ncbi_taxid. Required tag
>       -names => {
>           'scientific' => ['sapiens'],
>           'common_name' => ['human']
>       },
>       -rank => 'species'  # Required tag
>   );
>
> and whilst Bio::Taxonomy::Node does not accept -names, it does have a
> 'name' method which claims to work like:
>
>   $obj->name('scientific', 'sapiens');
>
> This kind of thing would be really nice, but afaics
> Bio::Taxonomy::Node->new takes the -name value and makes a common name
> out of it, whilst the name() method passes any 'scientific' name to
> the scientific_name() method which is unable to set any value (and
> warns about this), only get.
>
> It seems like the need to have this classification array work the same
> way as Bio::Species is causing some unnecessary restrictions. Can't
> the more sensible idea of having a dedicated storage spot for the
> ScientificName and other parameters be used, with the classification
> array either being generated just-in-time from the hash-stored data,
> or indeed being generated from the Lineage field?
>
>
> Also, why does a node store the complete hierarchy on itself in the
> classification array? If we're going that far, why don't the
> Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a
> get_taxonomy() method instead of a get_Taxonomy_Node() method?
> get_taxonomy() could, from a single efetch.fcgi lookup, create a
> complete Bio::Taxonomy with all the nodes. Whilst most nodes would
> only have a minimum of information, if you could simply ask a node
> what its rank and scientific name was you could easily build a
> classification array, or ask what Kingdom your species was in etc.
>
> Are there good reasons for Taxonomy working the way it does in 1.5.1,
> or would I not be wasting my time re-writing things to make more sense
> (to me)?
>
>
> Cheers,
> Sendu.

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From cuiw at mail.nih.gov Thu May 11 01:46:00 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Wed, 10 May 2006 21:46:00 -0400
Subject: [Bioperl-l] What is the relationship between primer3 module andrun-primer3 module?
References: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>
Message-ID:

1. Bio::Tools::Primer3 is already included in the
   Bio::Tools::Run::Primer3 module so that you can parse the result
   file.
2. There is a bug in Bio::Tools::Primer3.pm line 264, as I mentioned.
   Once fixed, it can output PRIMER_SEQUENCE_ID.
3. primer3.exe is called in the Bio::Tools::Run::Primer3 "run"
   function; please read the function definition.
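
For readers following the thread without primer3 installed, here is a minimal sketch of the hash-reference handling discussed above. It makes several assumptions: the hash is a hand-built, hypothetical stand-in for what $results->primer_results($i) returns, its values are invented for illustration, and the helper name field() is not part of BioPerl. It only shows dereferencing, splitting a start,length pair, and guarding against a key such as PRIMER_SEQUENCE_ID that is absent from an individual result.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the hash reference returned by
# $results->primer_results($i); all values here are made up.
my $result1 = {
    PRIMER_LEFT          => '60,20',
    PRIMER_RIGHT         => '559,20',
    PRIMER_LEFT_SEQUENCE => 'ACGTACGTACGTACGTACGT',
    PRIMER_RIGHT_TM      => '59.8',
    PRIMER_PRODUCT_SIZE  => '500',
};

# Illustrative helper (not a BioPerl function): return the value for a
# key, or a placeholder string when the key is not in this result.
sub field {
    my ($result, $key) = @_;
    return exists $result->{$key} ? $result->{$key} : '(not in this result)';
}

# PRIMER_LEFT is a "start,length" pair, so split it on the comma.
my ($start, $length) = split /,/, $result1->{PRIMER_LEFT};

print "left primer starts at $start, length $length\n";
print "PRIMER_LEFT_SEQUENCE\t", field($result1, 'PRIMER_LEFT_SEQUENCE'), "\n";

# Checking with exists avoids the "Use of uninitialized value" warning
# that appears when a missing key is interpolated directly.
print "PRIMER_SEQUENCE_ID\t", field($result1, 'PRIMER_SEQUENCE_ID'), "\n";
```

With a real run, the same helper could be pointed at each hash reference in turn, and the sequence ID looked up separately as described in the messages above.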

From cjfields at uiuc.edu Thu May 11 03:36:39 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 May 2006 22:36:39 -0500
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu>
Message-ID: <000301c674ac$1d40f0f0$15327e82@pyrimidine>

I think you can get pretty much everything now, though I can definitely
see the use of a local database. I ran a few tests, really unrelated to
this, using the powerscripting test page at NCBI for eutils (for the
curious, at http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/eu.cgi) and
was able to retrieve XML-formatted taxonomic information; here's the
bacterium Frankia sp. CcI3 TaxID info, which looks like they have
everything set up by rank. It gives quite a bit of information.

    106370
    Frankia sp. CcI3
    1854
    species
    Bacteria
        11
        Bacterial and Plant Plastid
        0
        Unspecified
    cellular organisms; Bacteria; Actinobacteria; Actinobacteria (class);
    Actinobacteridae; Actinomycetales; Frankineae; Frankiaceae; Frankia
        131567   cellular organisms       no rank
        2        Bacteria                 superkingdom
        201174   Actinobacteria           phylum
        1760     Actinobacteria (class)   class
        85003    Actinobacteridae         subclass
        2037     Actinomycetales          order
        85013    Frankineae               suborder
        74712    Frankiaceae              family
        1854     Frankia                  genus
    1999/10/22
    2005/01/19
    2000/02/02

Chris

From jason.stajich at duke.edu Thu May 11 12:04:54 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 11 May 2006 08:04:54 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <000301c674ac$1d40f0f0$15327e82@pyrimidine>
References: <000301c674ac$1d40f0f0$15327e82@pyrimidine>
Message-ID:

Great - now we just need someone to volunteer to actually work on this.

The current code grabs most of this but I believe expects a different
XML

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From sb at mrc-dunn.cam.ac.uk Thu May 11 11:51:44 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 11 May 2006 12:51:44 +0100
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu>
References: <4461B2D3.7010603@mrc-dunn.cam.ac.uk> <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu>
Message-ID: <44632550.3040603@mrc-dunn.cam.ac.uk>

Jason Stajich wrote:
> I would use the implementation that talks to the flatfile db as the
> standard here. Nodes are defined by the data from the taxonomy dump
> dbs from NCBI. The eutils is pretty worthless except for taxid->name
> or reverse, you can't get the full taxonomy (or couldn't when that
> implementation was written).

I'm not sure what you mean. In 1.5.1 you have access to the full
taxonomy because you're using efetch.fcgi. Indeed, you parse the full
taxonomy already to get the classification.

> The "name" method refers to the name of the node - each level in the
> taxonomy can have a "name".

Yes, and to me the 'name of the node' is its scientific name (something
like 'sapiens'), not a 'common' name. So why is it stored as a 'common'
name in the object? Why don't the DB::Taxonomy modules store the actual
common names (something like 'human')?

> The bits of hackiness relate to wrapping the node object as a
> Bio::Species and/or being able to read a genbank file and the
> organism taxonomy data as a list and instantiating.
> If we could rely on everything being in a DB of course this would be
> simpler.

I think that Taxonomy stuff could be done in a 'pure' way, with a new
Bio::Species made as a wrapper around an appropriate Taxonomy module(s)
that cheated and made fake nodes from a genbank list and then made a
proper Bio::Taxonomy.

> With the flatfile implementation you have to walk all the way up the
> db hierarchy to get the kingdom for a node so you do have to build up
> the classification hierarchy as each node only stores data about
> itself.

I'm still actually using bioperl 1.4 but I'm looking at 1.5.1 assuming
it is the latest available and I see that the flatfile implementation
works the same way as the entrez one. The requested node is fetched, but
then internally it walks the hierarchy purely so it can build a
classification list which is then stored on the object. If you're
already retrieving every node above the requested node, why not just
return every node? Why not just return a whole Bio::Taxonomy?

> I'm not exactly sure what you are proposing to do, but would
> definitely enjoy another pair of hands, I don't really have time to
> mess with it any time soon.

I shouldn't really be spending any time on it either, but I knocked up a
quick implementation for myself yesterday/today. I'm working on a bunch
of modules that inherit from bioperl and then add/alter to suit my
needs. In this regard they're a bit limited and kind of hard-coded to my
way of thinking, but hopefully you can see my intent and perhaps use
some of my implementation.

In my implementation:
# DB::Taxonomy::* return a Bio::Taxonomy equivalent with a single
  database lookup.
# The Taxonomy is implicitly a tree.
# The Taxonomy can have branches of different length from root to the
  same rank level.
# The Taxonomy isn't told what ranks it has (isn't limited by some
  supplied rank list); it has the ranks that its Nodes have and knows
  (without being told) what order those ranks should be in.
# The Taxonomy is made of Nodes that truly only contain information
  about themselves and have no classification array or anything like
  that.
# A Node can still be classified.
# We can have Nodes of rank 'no rank' that will be correctly ordered in
  the classification.
# Nodes have a scientific name and common names.
# You get parent and all children nodes without database lookups.
# There is a Bio::Species like thing that wraps around this and gives
  easy access to what I really want to do:

    my $human = TFBS::Species->new(-common_name => 'human');
    # returns the array you'd expect from a normally created, fully
    # classified Bio::Species
    my @classification = $human->classification;
    my $kingdom = $human->kingdom; # returns 'Metazoa'

# For genbank, we can still supply TFBS::Species a classification array

http://bix.sendu.me.uk/files/taxonomy_the_tfbs_way.tar.gz
(only tested inheriting from bioperl 1.4, but ideally that shouldn't
make any difference!)

Is there any scope for bioperl Taxonomy becoming more like this? Or are
there problems with my design (quite likely!)? Or are there good reasons
for maintaining the current way of working? Please feel free to shoot me
down/discuss.

Cheers,
Sendu.

From sb at mrc-dunn.cam.ac.uk Thu May 11 12:22:53 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 11 May 2006 13:22:53 +0100
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To:
References: <000301c674ac$1d40f0f0$15327e82@pyrimidine>
Message-ID: <44632C9D.4010408@mrc-dunn.cam.ac.uk>

Jason Stajich wrote:
> Great - now we just need someone to volunteer to actually work on this.

Now I'm really confused...

> The current code grabs most of this but I believe expects a different XML

No, I think the code in bioperl 1.5.1 Bio::DB::Taxonomy::entrez expects
that XML, and parses it as fully as flatfile.pm does. Nothing more to
do. Weren't you the person that wrote that parser?

I parse the same XML in my version of entrez.pm (see my previous email);
the main difference being I make Nodes out of each Taxon instead of just
adding each Taxon's ScientificName to the classification array.

From jason.stajich at duke.edu Thu May 11 13:53:56 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 11 May 2006 09:53:56 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <44632C9D.4010408@mrc-dunn.cam.ac.uk>
References: <000301c674ac$1d40f0f0$15327e82@pyrimidine> <44632C9D.4010408@mrc-dunn.cam.ac.uk>
Message-ID:

i guess so - long since forgotten what it supports though since I don't
regularly use it. sorry.

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From cjfields at uiuc.edu Thu May 11 14:57:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 11 May 2006 09:57:20 -0500
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To:
Message-ID: <000b01c6750b$33e95ea0$15327e82@pyrimidine>

Heh...
To tell the truth, I haven't looked at Bio::DB::Taxonomy in any depth yet,
but I myself have seen issues with the way Bio::Species treats bacterial
strains (I guess this also involves Bio::Taxonomy::Node since that's what
Bio::Species delegates to). Seems it likes to repeat some strain names when
using $seq->species->common_name. Not a killer problem but annoying since
the correct name is in the source tag in the feature table! I 'could' take
a look at it but I can't guarantee quick results.

Jason, I could add Taxonomy to the EUtilities overhaul I mentioned to you
previously but it'll take a while to get going. I'm really more interested
in getting epost-esearch-efetch sequence retrieval up and running first with
the same API as Bio::DB::GenBank/Genpept and Bio::DB::Query::GenBank, then
donating the code (late summer/fall???) after working out namespace issues
so it doesn't conflict with current Bio::DB::WebDBSeqI inheritance. I
suppose I could also look at Bio::DB::Taxonomy to see what's up in the next
couple of weeks (after conference), unless someone gets to it sooner.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
> Sent: Thursday, May 11, 2006 7:05 AM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org; 'Sendu Bala'
> Subject: Re: [Bioperl-l] Bio::Taxonomy confusion
>
> Great - now we just need someone to volunteer to actually work on this.
>
> The current code grabs most of this but I believe expects a different
> XML
>
>
> On May 10, 2006, at 11:36 PM, Chris Fields wrote:
>
> > I think you can get pretty much everything now, though I can
> > definitely see the use of a local database.
> > I ran a few tests, really unrelated to this, using the powerscripting
> > test page at NCBI for eutils (for the curious, at
> > http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/eu.cgi) and was able to
> > retrieve XML-formatted taxonomic information; here's the bacterium
> > Frankia sp. CcI3 TaxID info, which looks like they have everything set
> > up by rank. It gives quite a bit of information.
> >
> > "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/taxon.dtd">
> >
> > 106370
> > Frankia sp. CcI3
> > 1854
> > species
> > Bacteria
> >
> > 11
> > Bacterial and Plant Plastid
> >
> > 0
> > Unspecified
> >
> > cellular organisms; Bacteria; Actinobacteria; Actinobacteria (class);
> > Actinobacteridae; Actinomycetales; Frankineae; Frankiaceae; Frankia
> >
> > 131567
> > cellular organisms
> > no rank
> >
> > 2
> > Bacteria
> > superkingdom
> >
> > 201174
> > Actinobacteria
> > phylum
> >
> > 1760
> > Actinobacteria (class)
> > class
> >
> > 85003
> > Actinobacteridae
> > subclass
> >
> > 2037
> > Actinomycetales
> > order
> >
> > 85013
> > Frankineae
> > suborder
> >
> > 74712
> > Frankiaceae
> > family
> >
> > 1854
> > Frankia
> > genus
> >
> > 1999/10/22
> > 2005/01/19
> > 2000/02/02
> >
> > Chris
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
> >> Sent: Wednesday, May 10, 2006 7:54 PM
> >> To: Sendu Bala
> >> Cc: bioperl-l at lists.open-bio.org
> >> Subject: Re: [Bioperl-l] Bio::Taxonomy confusion
> >>
> >> I would use the implementation that talks to the flatfile db as the
> >> standard here. nodes are defined by the data in from taxonomy dump
> >> dbs from ncbi.
> >> the eutils is pretty worthless except for taxid->name or reverse, you
> >> can't get the full taxonomy (or couldn't when that implementation was
> >> written).
> >>
> >> The "name" method refers to the name of the node - each level in the
> >> taxonomy can have a "name".
> >>
> >> The bits of hackiness relate to wrapping the node object as a
> >> Bio::Species and/or being able to read a genbank file and the
> >> organism taxonomy data as a list and instantiating. If we could rely
> >> on everything being in a DB of course this would be simpler.
> >>
> >> Another problem is the depth of the taxonomy is not constant for
> >> every node so assuming that a fixed number of slots will be filled in
> >> to generate the taxonomy leads to problems.
> >>
> >> Use the flatfile implementation (Bio::DB::Taxonomy::flatfile) as the
> >> best example of working code as this is how I really wanted it to
> >> work, the Bio::Species hacks are only there to shoehorn data
> >> retrieved from genbank files in. With the flatfile implementation
> >> you have to walk all the way up the db hierarchy to get the kingdom
> >> for a node so you do have to build up the classification hierarchy as
> >> each node only stores data about itsself.
> >>
> >> I'm not exactly sure what you are proposing to do, but would
> >> definitely enjoy another pair of hands, I don't really have time to
> >> mess with it any time soon.
> >>
> >> -jason
> >> On May 10, 2006, at 5:30 AM, Sendu Bala wrote:
> >>
> >>> Hi,
> >>> I'm a little confused as to how names are supposed to work in
> >>> Bio::Taxonomy::Node.
> >>>
> >>> In the bioperl versions that I've looked at a Node doesn't seem to
> >>> store the most important information about itself - it's scientific
> >>> name - in an obvious place. bioperl 1.5.1 puts it at the start of the
> >>> classification list. I'd have thought sticking it in -name would make
> >>> more sense, but this is used only for the GenBank common name.
> >>>
> >>> The Bio::Taxonomy docs still suggests:
> >>>
> >>> my $node_species_sapiens = Bio::Taxonomy::Node->new(
> >>>     -object_id => 9606, # or -ncbi_taxid. Requird tag
> >>>     -names => {
> >>>         'scientific' => ['sapiens'],
> >>>         'common_name' => ['human']
> >>>     },
> >>>     -rank => 'species' # Required tag
> >>> );
> >>>
> >>> and whilst Bio::Taxonomy::Node does not accept -names, it does have a
> >>> 'name' method which claims to work like:
> >>>
> >>> $obj->name('scientific', 'sapiens');
> >>>
> >>> This kind of thing would be really nice, but afaics
> >>> Bio::Taxonomy::Node->new takes the -name value and makes a common
> >>> name out of it, whilst the name() method passes any 'scientific' name
> >>> to the scientific_name() method which is unable to set any value (and
> >>> warns about this), only get.
> >>>
> >>> It seems like the need to have this classification array work the
> >>> same way as Bio::Species is causing some unnecessary restrictions.
> >>> Can't the more sensible idea of having a dedicated storage spot for
> >>> the ScientificName and other parameters be used, with the
> >>> classification array either being generated just-in-time from the
> >>> hash-stored data, or indeed being generated from the Lineage field?
> >>>
> >>>
> >>> Also, why does a node store the complete hierarchy on itself in the
> >>> classification array? If we're going that far, why don't the
> >>> Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a
> >>> get_taxonomy() method instead of a get_Taxonomy_Node() method.
> >>> get_taxonomy() could, from a single efetch.fcgi lookup, create a
> >>> complete Bio::Taxonomy with all the nodes. Whilst most nodes would
> >>> only have a minimum of information, if you could simply ask a node
> >>> what its rank and scientific name was you could easily build a
> >>> classification array, or ask what Kingdom your species was in etc.
> >>>
> >>> Are there good reasons for Taxonomy working the way it does in
> >>> 1.5.1, or would I not be wasting my time re-writing things to make
> >>> more sense (to me)?
> >>>
> >>>
> >>> Cheers,
> >>> Sendu.
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >> --
> >> Jason Stajich
> >> Duke University
> >> http://www.duke.edu/~jes12
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jason.stajich at duke.edu Thu May 11 15:42:07 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 11 May 2006 11:42:07 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <000b01c6750b$33e95ea0$15327e82@pyrimidine>
References: <000b01c6750b$33e95ea0$15327e82@pyrimidine>
Message-ID: <0C1C2DAC-F388-465E-B6C2-7217A3B4CC6C@duke.edu>

I think you'll see it is different and mostly a limitation of the
genbank format and the Bio::Species objects that you get from a
genbank parse do represent the full capabilities of a Taxonomy::Node.

I am happy for someone to overhaul things, but it all boils down to
inferring which part of a list of names is the species versus
sub-species versus strain when none of the members of the list are
labeled. This is some of the same problems we have for swissprot as
well. I just don't think we can do it right only from the genbank
file data so I don't see a lot of point of expecting Bio::Species to
provide more than a representation of what is in the file and just
return that array.

It has seemed like we need to special case things pretty heavily or
do a lookup in the taxonomydb for something.

Can you guess what value is the strain versus sub-species?
What happens when there is a two part strain name (space separated) and a
sub-species or variety designation?

SOURCE      Staphylococcus haemolyticus JCSC1435
  ORGANISM  Staphylococcus haemolyticus JCSC1435
            Bacteria; Firmicutes; Bacillales; Staphylococcus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=279808
strain is JCSC1435

versus
SOURCE      Muntiacus muntjak vaginalis
  ORGANISM  Muntiacus muntjak vaginalis
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
            Pecora; Cervidae; Muntiacinae; Muntiacus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9887
species is muntjak, sub-species vaginalis ?

versus
SOURCE      Aspergillus nidulans FGSC A4
  ORGANISM  Aspergillus nidulans FGSC A4
            Eukaryota; Fungi; Ascomycota; Pezizomycotina; Eurotiomycetes;
            Eurotiales; Trichocomaceae; Emericella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=227321

Genus should be Aspergillus or Emericella ?

Strain and subspecies/variety in the same entry
SOURCE      Cryptococcus neoformans var. grubii H99
  ORGANISM  Cryptococcus neoformans var. grubii H99
            Eukaryota; Fungi; Basidiomycota; Hymenomycetes;
            Heterobasidiomycetes; Tremellomycetidae; Tremellales; Tremellaceae;
            Filobasidiella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=235443

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From WiersmaP at AGR.GC.CA Thu May 11 17:04:01 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Thu, 11 May 2006 13:04:01 -0400
Subject: [Bioperl-l] What is the relationship between primer3 moduleandrun-primer3 module?
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C4D@onncrxms5.agr.gc.ca>

The bug that Wenwu referred should only occur when reading a Primer3
output file; the Bio::Tools::Run::Primer3->run method takes the results and
directly transfers them to a Bio::Tools::Primer3 object without an
intermediate file. A Data::Dumper look at the Bio::Tools::Primer3 object
shows the keys and results for PRIMER_SEQUENCE_ID and SEQUENCE in 'results'
and then again in the 'results_by_number' hash but only in the '0' hash.

All of this doesn't really matter for Li's original concern. If you want to
include the id of sequence along with the primer3 results just take it from
the seq object (i.e. $seq->display_id() ).
Since you are in a loop taking one sequence at a time this $seq will be the
one that was sent to primer3.

PAW

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Cui, Wenwu (NIH/NCI) [F]
Sent: Wednesday, May 10, 2006 6:46 PM
To: chen li; bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] What is the relationship between primer3 moduleandrun-primer3 module?

1. Bio::Tools::Primer3 is already included in Bio::Tools::Run::Primer3
module so that you can parse the result file.
2. There is a bug in Bio::Tools::Primer3.pm line 264 as I mentioned. Once
fixed, it can output
3. primer3.exe is called in the Bio::Tools::Run::Primer3 "run" function,
please read the function definition.

From cjfields at uiuc.edu Thu May 11 17:16:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 11 May 2006 12:16:19 -0500
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <0C1C2DAC-F388-465E-B6C2-7217A3B4CC6C@duke.edu>
Message-ID: <000f01c6751e$9e89d6a0$15327e82@pyrimidine>

> I think you'll see it is different and mostly a limitation of the
> genbank format and the Bio::Species objects that you get from a
> genbank parse do represent the full capabilities of a Taxonomy::Node.

I definitely see the rationale for using a TaxID lookup (I think Hilmar said
so as well), especially for local databases. I wonder, though, if there is a
way that RichSeqs like GenBank, when passed through SeqIO, can just be
'short-circuited' using the sequence builder to just accept what's on the
SOURCE or ORGANISM line of a file as is, without forcing it into
Bio::Species/Bio::Taxonomy::Node.
Or maybe diminish the role of the SOURCE/ORGANISM lines altogether to just
simple Annotation objects and place much greater emphasis on the TaxID
itself, in effect decoupling the TaxID (taxonomic information) from
SOURCE/ORGANISM (annotation information).

In other words, have GenBank/EMBL classification lines and organism lines
essentially stay like they are in the input file (use simple objects). Then,
if one were really intent on getting the full name, classification, etc., or
one wanted to store their sequences in bioperl-db, they would be required to
either have a local db of NCBI Taxonomy or remote access to a similar
database (NCBI or something else) so a lookup could be accomplished using
the TaxID. If they use BioSQL, then require them to preload their BioSQL
database with NCBI's taxonomy, something Hilmar already strongly suggests.

If anyone isn't interested in the taxonomic information or doesn't want to
bother grabbing the database or setting up remote access, tough luck; just
grab the Bio::Annotation/Bio::Species object and use that. As the saying
goes, "you can't be all things to all people." At some point you have to
throw your arms in the air, do the best you can, but give up trying to
please everyone.

> I am happy for someone to overhaul things, but it all boils down to
> inferring which part of a list of names is the species versus
> sub-species versus strain when none of the members of the list are
> labeled. This is some of the same problems we have for swissprot as
> well. I just don't think we can do it right only from the genbank
> file data so I don't see a lot of point of expecting Bio::Species to
> provide more than a representation of what is in the file and just
> return that array.
>
>
> It has seemed like we need to special case things pretty heavily or
> do a lookup in the taxonomydb for something.
>
> Can you guess what value is the strain versus sub-species?
> What happens when there is a two part strain name (space separated) and a
> sub-species or variety designation?
>
> SOURCE      Staphylococcus haemolyticus JCSC1435
>   ORGANISM  Staphylococcus haemolyticus JCSC1435
>             Bacteria; Firmicutes; Bacillales; Staphylococcus.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=279808
> strain is JCSC1435
>
> versus
> SOURCE      Muntiacus muntjak vaginalis
>   ORGANISM  Muntiacus muntjak vaginalis
>             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
>             Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
>             Pecora; Cervidae; Muntiacinae; Muntiacus.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9887
> species is muntjak, sub-species vaginalis ?
>
> versus
> SOURCE      Aspergillus nidulans FGSC A4
>   ORGANISM  Aspergillus nidulans FGSC A4
>             Eukaryota; Fungi; Ascomycota; Pezizomycotina; Eurotiomycetes;
>             Eurotiales; Trichocomaceae; Emericella.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=227321
>
> Genus should be Aspergillus or Emericella ?
>
> Strain and subspecies/variety in the same entry
> SOURCE      Cryptococcus neoformans var. grubii H99
>   ORGANISM  Cryptococcus neoformans var. grubii H99
>             Eukaryota; Fungi; Basidiomycota; Hymenomycetes;
>             Heterobasidiomycetes; Tremellomycetidae; Tremellales; Tremellaceae;
>             Filobasidiella.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=235443

Definitely tricky! This really points out the problem here. It used to be a
problem for only a few cases but with so many bacterial and fungal genomes
that's changed.

The Frankia XML example has the scientific name set to "Frankia sp. CcI3",
which matches the SOURCE/ORGANISM line in NCBI's GenBank files and the OS
line in EMBL files. It looks like the lines are parsed into and then built
from the ground-up in Bio::SeqIO::genbank using Bio::Species objects, which,
in my case with the strain designation, is where the problem lies.
They could be placed in annotation objects with (-tagname=> 'SOURCE',
value =>'Frankia sp. CcI3') or similar settings. Or simplify Bio::Species to
only represent the information in the GenBank SOURCE/ORGANISM/CLASSIFICATION
or EMBL OS/OC lines and nothing more complex than that (no complex taxonomy;
for that you use the TaxID and local database).

Okay, I need to lay off the coffee now...

Chris

From WiersmaP at AGR.GC.CA Fri May 12 00:13:12 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Thu, 11 May 2006 20:13:12 -0400
Subject: [Bioperl-l] What is the relationship between primer3 module andrun-primer3 module?
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C52@onncrxms5.agr.gc.ca>

Li,

If you are only "a little confused" by the OO concepts in the primer3
modules then you are doing well. To expand a little on Wenwu's explanations.

A Bio::Tools::Run::Primer3 object is a "wrapper" around the Primer3 program.
All the commands and parameters that Primer3 needs for it to run are
collected inside the object. This includes a sequence (which you must supply
as a sequence object) and parameters (most of which are already supplied by
default but can be changed using the $primer3_object->add_targets method).
Then, when everything is set the way you want it you 'run' the Primer3 program by using $primer3_object->run. The "wrapper" collects all the run parameters and sends them off to the Primer3 executable. Primer3 does the analysis and outputs the results to "stdout" in boulder-io format.

By redirecting the output (i.e. perl p3run_script.pl > out.txt) you will get the Primer3 output directly in the boulder-io format ('tag'='value') stored in out.txt. Because out.txt is not being closed between each sequence called in the script you get all of the results concatenated in out.txt. However, if you supplied an output filename (-outfile=>$file_out) in the "wrapper", each line of output from Primer3 will be written to $file_out and at the end of Primer3 output the file will be closed. Now if your script loops to another sequence it will open the same outfile again and overwrite.

One last important detail for the "wrapper" object. When Primer3 is executed the $primer3_object is designed to return a Bio::Tools::Primer3 object (the code is: my $results_object = $primer3_object->run). $results_object is a Bio::Tools::Primer3 object and contains the results of your Primer3 run as well as having methods for getting at that information. This includes finding out how many primer sets were found and the means to access the primer set results one at a time. It does work as advertised.

Because all of the primer sets are based on the same sequence, Primer3 only outputs the SEQUENCE and PRIMER_SEQUENCE_ID one time instead of for each primer set. That is why they only show up in $results_object as if they belonged with the first primer set (set '0') and they are not available for the other primer sets.

PAW

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca
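A minimal plain-Perl sketch of the last point, for anyone who wants the sequence ID next to every primer set. This is not the Bio::Tools::Primer3 parser. It assumes a boulder-io record of 'TAG=value' lines terminated by a lone '=', and it assumes that the first primer set carries no index while later sets do (PRIMER_LEFT_1_SEQUENCE and so on). The helper name parse_boulder and the sample record are made up for illustration, and only PRIMER_LEFT/PRIMER_RIGHT tags are grouped per set.

```perl
use strict;
use warnings;

# Hypothetical helper: split one boulder-io record ('TAG=value' lines,
# terminated by a lone '=') into shared tags and per-primer-set tags.
sub parse_boulder {
    my ($text) = @_;
    my (%shared, @sets);
    for my $line (split /\n/, $text) {
        last if $line eq '=';                      # end-of-record marker
        my ($tag, $value) = split /=/, $line, 2;
        next unless defined $value;
        if ($tag =~ /^PRIMER_(?:LEFT|RIGHT)(?:_(\d+))?(?:_|$)/) {
            my $n = defined $1 ? $1 : 0;           # no index means set 0
            $sets[$n]{$tag} = $value;
        }
        else {
            $shared{$tag} = $value;                # emitted once per record
        }
    }
    return (\%shared, \@sets);
}

# Assumed sample record; the ID and sequences are invented for illustration.
my $record = join "\n",
    'PRIMER_SEQUENCE_ID=clone1',
    'SEQUENCE=ACGTACGTACGTACGTACGT',
    'PRIMER_LEFT_SEQUENCE=ACGTACGT',
    'PRIMER_RIGHT_SEQUENCE=TTGCATGC',
    'PRIMER_LEFT_1_SEQUENCE=CGTACGTA',
    'PRIMER_RIGHT_1_SEQUENCE=GGCATGCA',
    '=';

my ($shared, $sets) = parse_boulder($record);

# Print the shared sequence ID alongside every primer set, not just set 0.
for my $n (0 .. $#$sets) {
    my $suffix = $n ? "_$n" : '';
    print join("\t",
        $shared->{PRIMER_SEQUENCE_ID},
        $n,
        $sets->[$n]{"PRIMER_LEFT${suffix}_SEQUENCE"},
        $sets->[$n]{"PRIMER_RIGHT${suffix}_SEQUENCE"},
    ), "\n";
}
```

Keeping the once-per-record tags in their own hash means the caller decides whether to repeat them for each set, which is the piece that looks missing when reading the results object set by set.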
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chen li
Sent: Wednesday, May 10, 2006 5:28 PM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] What is the relationship between primer3 module and run-primer3 module?

First, thank you all for replying to my previous post about primer3. But now I am a little confused, even after reading the documentation: what is the relationship between these two modules? What is the correct/standard way to use them for batch primer design?

What I do is use Bio::Tools::Run::Primer3 to design primers. Based on Dr. Roy Chaudhuri's information I can set the parameters using the following syntax:

$primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');

Based on Paul A. Wiersma's explanation I can also print out part of the primer results (because I don't need all the information). But there is a little trouble: PRIMER_SEQUENCE_ID can't be accessed using this method. And Paul points out that "PRIMER_SEQUENCE_ID and SEQUENCE are not part of the individual results but only end up by default with $results->primer_results(0)". So it seems there is no way to get around this problem using Bio::Tools::Run::Primer3. Others suggest using Bio::Tools::Primer3 to parse the results. So is it true that Bio::Tools::Run::Primer3 is for primer design and Bio::Tools::Primer3 is for parsing the results from Bio::Tools::Run::Primer3? But what I find is that I get almost all the results (except PRIMER_SEQUENCE_ID and SEQUENCE) without a "use Bio::Tools::Primer3" line in the script. How can this be explained? Is it because of the following line of code?

my $result=$primer3->run;

The last question: which line of code invokes the program primer3.exe? How does the Perl script call primer3.exe?

Once again thank you all very much,

Li

From torsten.seemann at infotech.monash.edu.au Fri May 12 04:29:37 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 12 May 2006 14:29:37 +1000
Subject: [Bioperl-l] Using bioperl to convert gene predictions to gff
In-Reply-To: <000301c6698f$b17a4d20$0202a8c0@GosinkFranklin>
References: <000301c6698f$b17a4d20$0202a8c0@GosinkFranklin>
Message-ID: <44640F31.6090702@infotech.monash.edu.au>

Mark,

> I'd like to reformat gene predictions from several different programs
> (genscan, glimmerhmm, fgenesh) to gff format. I know bioperl can parse the
> output from these and other predictors and that it can export into GFF. But
> I'm not clear on how to string the two together.
> Can anyone point me at any example code?

The parser modules for the gene predictions generally allow you to iterate through the predicted genes. Each prediction is usually returned as a Bio::SeqFeatureI-derived object. Those objects have a gff_string() method to print them as GFF.

So something as simple as this *may* work:

use Bio::Tools::Glimmer;
my $parser = Bio::Tools::Glimmer->new(-file => 'glimmer.out');
while (my $gene = $parser->next_prediction) {
    # gff_string() returns the line without a trailing newline
    print $gene->gff_string, "\n";
}

If you want separate GFF lines for each exon, you'll have to do another loop over $gene->exons() etc., each of which are luckily also Bio::SeqFeatures!

Or if you want to modify some of the GFF columns first, e.g. the source tag, just do $gene->source_tag('mynewtag') before printing it.
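To make the gene-then-exons output concrete without needing a BioPerl install, here is a small plain-Perl sketch. Everything in it is a stand-in: plain hashes play the role of the Bio::SeqFeatureI objects, gff_line is a hypothetical helper (not BioPerl's gff_string), and the coordinates and names are invented. It only shows the shape of the nine tab-separated GFF columns and the effect of changing the source column before printing.

```perl
use strict;
use warnings;

# Hypothetical stand-in for gff_string(): format one feature hash as a
# nine-column, tab-separated GFF line (no trailing newline).
sub gff_line {
    my ($f) = @_;
    my $strand = $f->{strand} > 0 ? '+' : $f->{strand} < 0 ? '-' : '.';
    return join "\t",
        $f->{seq_id}, $f->{source}, $f->{primary},
        $f->{start},  $f->{end},
        (defined $f->{score} ? $f->{score} : '.'),
        $strand,
        (defined $f->{frame} ? $f->{frame} : '.'),
        (defined $f->{attributes} ? $f->{attributes} : '');
}

# Invented gene prediction with two exons.
my %gene = (
    seq_id  => 'contig1', source => 'Glimmer', primary => 'gene',
    start   => 100,       end    => 900,       strand  => 1,
    attributes => 'Gene "g1"',
    exons   => [
        { start => 100, end => 300 },
        { start => 600, end => 900 },
    ],
);

# Same idea as $gene->source_tag('mynewtag') in the message above.
$gene{source} = 'mynewtag';

my @lines = (gff_line(\%gene));
for my $exon (@{ $gene{exons} }) {
    # Each exon inherits the gene's columns except type and coordinates.
    push @lines, gff_line({ %gene, primary => 'exon', %$exon });
}
print "$_\n" for @lines;
```

With real BioPerl objects the inner loop would be over $gene->exons() and each element would supply its own gff_string, so none of the column handling above is needed.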
Hope this helps,

--
Dr Torsten Seemann http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia

From torsten.seemann at infotech.monash.edu.au Fri May 12 04:36:46 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 12 May 2006 14:36:46 +1000
Subject: [Bioperl-l] Bio::Graphics::Panel imagemap making with Bio::Graphics::Panel
In-Reply-To: <5b6410e0605030120q31d1f554mbc4bf104deca48bf@mail.gmail.com>
References: <5b6410e0605030120q31d1f554mbc4bf104deca48bf@mail.gmail.com>
Message-ID: <446410DE.7070305@infotech.monash.edu.au>

Kevin,

> I want to create an imagemap of short sequence matches with a longer one
> with clickable imagemaps for the short sequences. I figure I can do this
> easily enough using the example script for parsing blast output but I need
> an example script to understand how to produce the html code for the
> imagemap. I can find only rather cryptic references about how this can be
> done (see below).

The "blastGraphic" project probably has Perl code that could help you.

http://www.gmod.org/blastGraphic.shtml

It is/was part of the GMOD project. It produces pretty clickable image maps from BLAST reports.

Hope it helps,

--
Dr Torsten Seemann http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia

From brianjgilmartin at hotmail.com Fri May 12 09:29:15 2006
From: brianjgilmartin at hotmail.com (brian gilmartin)
Date: Fri, 12 May 2006 10:29:15 +0100
Subject: [Bioperl-l] (no subject)
Message-ID:

please remove me from the list

From sb at mrc-dunn.cam.ac.uk Fri May 12 10:24:39 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Fri, 12 May 2006 11:24:39 +0100
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names
Message-ID: <44646267.2000802@mrc-dunn.cam.ac.uk>

In bioperl up to at least 1.5.1, when one of the database modules comes across a species rank it does:

if ($rank eq 'species') {
    # get rid of genus from species name
    (undef,$taxon_name) = split(/\s+/,$taxon_name,2);
}

However, even though the true scientific name is usually 'Genus species' in the database, note the 'usually' - sometimes the species is a multiword item that does not include the Genus, so we can't do some simple split and take the second word.

The same applies to levels below species, e.g. 'Avian erythroblastosis virus' is a variant of the species 'Avian leukosis virus' but 'Avian erythroblastosis virus (strain ES4)' is a variant of that variant...

My solution is to just remove whatever is the same between the current rank and the previous rank. Maybe even that's not so perfect, but it must be a lot better than turning the species 'Avian leukosis virus' into the species 'virus' (especially given that the genus here is 'Alpharetrovirus')!

# we need to be going root(kingdom) -> leaf (species or lower) order
#
# we need to be storing untouched versions of the scientific name of
# the previous rank ($self->{_last_raw})
#
# probably only bother start doing this when we get to genus
my $last_raw = $self->{_last_raw} || undef;
$self->{_last_raw} = $sci_name;
if ($last_raw) {
    $sci_name =~ s/$last_raw//;
    $sci_name =~ s/^\s+//;
}

Are there even more strange species (and lower) names that would still not work well with the above solution?

Cheers,
Sendu.
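A standalone sketch of the same idea, with two changes that are not in the message above: the parent name is quoted with \Q...\E so that names containing regex metacharacters (the thread already has 'Actinobacteria (class)' and '(strain ES4)') are matched literally, and the match is anchored to the start of the child name. The helper name strip_parent_name is hypothetical, and 'Homo sapiens' is just a familiar extra test case. It is still a heuristic, not a replacement for reading the rank from the taxonomy database.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the parent rank's name from the start of a
# child's scientific name; the child is returned unchanged when it does
# not begin with the parent name followed by whitespace.
sub strip_parent_name {
    my ($parent, $child) = @_;
    return $child unless defined $parent && length $parent;
    (my $short = $child) =~ s/^\Q$parent\E\s+//;   # \Q..\E: match literally
    return $short;
}

print strip_parent_name('Homo', 'Homo sapiens'), "\n";
print strip_parent_name('Alpharetrovirus', 'Avian leukosis virus'), "\n";
print strip_parent_name('Avian erythroblastosis virus',
                        'Avian erythroblastosis virus (strain ES4)'), "\n";
```

The three calls print 'sapiens', 'Avian leukosis virus' (untouched, because the genus is not a prefix) and '(strain ES4)'.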
From s_maheshwari84 at rediffmail.com Fri May 12 13:55:49 2006
From: s_maheshwari84 at rediffmail.com (saurabh maheshwari)
Date: 12 May 2006 13:55:49 -0000
Subject: [Bioperl-l] problem help me...........please
Message-ID: <20060512135549.27106.qmail@webmail9.rediffmail.com>

Hello,

I am a student at the Center for DNA Finger Printing and Diagnostics (CDFD). I am working on protein-protein interactions but I am unable to use the protein interaction module, i.e. ProteinGraph.pm. Actually I am facing lots of problems in the program I have written. Please help me; for the last four months I have not been able to solve this problem. I am pasting my program here and also attaching it.

#!usr/bin/perl
use lib "/usr/local/bioxapps/bioperl/library/";
use strict;
use Bio::Graph::SimpleGraph;
use Bio::Graph::IO;
our @ISA=qw( Bio::SeqI);
use Bio::Graph::Edge;
use Bio::Graph::IO::dip;
use Bio::Graph::IO::psi_xml;
use Clone qw(clone);
use vars qw(@ISA);
use Bio::AnnotatableI;
use Bio::IdentifiableI;
our @ISA = qw(Bio::Graph::SimpleGraph);
@ISA = qw(Bio::Graph::IO);
our @ISA=qw(Expoerter);
use Bio::Graph::ProteinGraph;
use Class::AutoClass;
use Bio::Graph::SimpleGraph::Traversal;

my $graphio = Bio::Graph::IO->new(-file => '/users/saurabh/perl_program/sample1.txt',-format => 'dip');
print "$graphio";
my $graph = $graphio->next_network();
print "$graph->nodes\t";
$graph->remove_dup_edges();
my @un=$graph->unconnected_nodes();
print "\nthe unconnected nodes are =@un";
my @n=$graph->subgraph();
print "\subgraph=@n\n";
#print "Please the protein-id whose clusering coefficient is to be detemined\n";
#my $v=<STDIN>;
my $density = $graph->density();
print "\ngraph density=$density\n";
my @graphs = $graph->components();
print "\nno of Connected components=$#graphs\n";
print "\nplease enter the protein-id whom you want to remove from the network\n";
my $no=<STDIN>;
$graph->remove_nodes($graph->nodes_by_id($no));
my $count = $graph->edge_count();
print "\nno of edges=$count\n ";
my $ncount = $graph->node_count();
print "\nno of nodes=$ncount\n ";

print"\nenter the protein whose interactions is to be find ";
my $x=<STDIN>;
my $node = $graph->nodes_by_id($x);
#print " this is $node\n";
my @neighbors = $graph->neighbors($node);
print "to check";
print join",",map{$_->object_id()} @neighbors;
my @nodes = $graph->nodes();
print "\nno of nodes = @nodes\t\n";
my @hubs;
foreach my $nodi (@nodes) {
    if ($graph->neighbor_count($node) > 10)
    {
        push @hubs, $nodi;
    }
}

foreach my $r(@hubs)
{
    my @y=@$r;
    print "the following proteins have > 10 interactors=@y\n";
}
#siblingual protein

my @edgeref = $graph->articulation_points();
print "no of articulation points=$#edgeref\n";
print "please enter the protein whom you want to check for articulation point \n ";
my $nod=<STDIN>;
# make pathgen graph
my $grap = Bio::Graph::IO->new(-file => 'org.txt',-format => 'dip');
my $gra = $grap->next_network();
$graph->remove_dup_edges();
$graph->union($gra);
my @duplicates = $graph->dup_edges();
print "these interactions exist in cere and c.elegan\n=@duplicates";
print "please enter the first protein for identifiaction of shortest path\n";
my $p1=<STDIN>;
print "please enter the second protein for identifiaction of shortest path\n";
my $p2=<STDIN>;

my @a=$graph->shortest_paths();
print "shortest path=@a\t\n";

With regards,

SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI
-------------- next part --------------
A non-text attachment was scrubbed...
Name: from.pl
Type: application/octet-stream
Size: 2723 bytes
Desc: not available
URL:

From chen_li3 at yahoo.com Thu May 11 17:47:33 2006
From: chen_li3 at yahoo.com (chen li)
Date: Thu, 11 May 2006 10:47:33 -0700 (PDT)
Subject: [Bioperl-l] script for batch-primer design using primer3 module
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C4D@onncrxms5.agr.gc.ca>
Message-ID: <20060511174733.68836.qmail@web36812.mail.mud.yahoo.com>

Hi all,

With the valuable input from many of you I have finally come up with a script for my own needs:

1) batch primer design
2) set some of the parameters instead of using all the default values
3) output only part of the information for the first pair of primers rather than all of them (but you can choose)
4) the results can be exported into Excel for my convenience.

Enclosed are the script and the tested results. I have also included some lines showing how I figured out which keys/entries are available for change. If you don't want the sequence part, just add # to comment it out. Any comments are welcome.

BTW the solution suggested by Dr. Cui and Paul doesn't work for me.

Once again thank you very much,

Li

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: primer3-5
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: result1.txt
URL:

From Marc.Logghe at DEVGEN.com Fri May 12 15:28:55 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Fri, 12 May 2006 17:28:55 +0200
Subject: [Bioperl-l] problem help me...........please
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746DAB@ANTARESIA.be.devgen.com>

Hi,

What is actually the problem? Do you have errors? Is the script not behaving as you expect? You also might attach the input file sample1.txt so that people can try it.
Regards,
Marc

From stoltzfu at umbi.umd.edu Fri May 12 15:56:06 2006
From: stoltzfu at umbi.umd.edu (Arlin Stoltzfus)
Date: Fri, 12 May 2006 11:56:06 -0400
Subject: [Bioperl-l] proposal: Bio::CDAT (character data and trees)
Message-ID:

Dear developers--

We propose a Bio::CDAT (Character Data And Trees) module to facilitate comparative analysis using evolutionary methods by 1) managing evolutionary relationships (by linking data to trees) and 2) allowing coordinated analysis of different types of data (by implementing a generic concept of "character-state" data). Bio::CDAT would leverage existing BioPerl objects and include the functionality of Rutger Vos's Bio::Phylo. It would provide the framework to develop interfaces to analysis tools (phylogeny inference, evolutionary rate models, functional shift inference, etc), as well as to file formats and visualization methods appropriate for such analyses.

A proposal is available at http://www.molevol.org/camel/projects/CDAT-proposal.pdf

We would like to hear your thoughts (e.g., see the section on "Questions to consider")!

Thanks

Arlin Stoltzfus
WeiGang Qiu
Rutger Vos
(with thanks to Justin Reese and Aaron Mackey)

------------------
Arlin Stoltzfus (stoltzfu at umbi.umd.edu)
CARB, 9600 Gudelsky Drive, Rockville, Maryland 20850
tel 240 314 6208, fax 240 314 6255, www.molevol.org/camel

From sdavis2 at mail.nih.gov Fri May 12 15:54:57 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Fri, 12 May 2006 11:54:57 -0400
Subject: [Bioperl-l] problem help me...........please
In-Reply-To: <20060512135549.27106.qmail@webmail9.rediffmail.com>
Message-ID:

On 5/12/06 9:55 AM, "saurabh maheshwari" wrote:

>
> hello
> I am a studnt at Center for DNA Finger Printing and Diagnostics(CDFD).
> I am working on protein protein interaction but I am unable to use the protein
> interaction module i.e. ProteinGraph.pm..
> Actially I am facing lots of problem in the programme I have written Please
> help me since last four months I am not able to solve the same problem..
> I am pasting my programe here also I am attaching it also. ......

You haven't really told us what you are trying to do or what problems you are having.

Sean

From cjfields at uiuc.edu Fri May 12 17:08:11 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 12 May 2006 12:08:11 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names
In-Reply-To: <44646267.2000802@mrc-dunn.cam.ac.uk>
Message-ID: <000f01c675e6$a61bde90$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Friday, May 12, 2006 5:25 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles
> species,subspecies/variant names
>
> In bioperl up to at least 1.5.1, when one of the database modules comes
> across a species rank it does:
>
> if ($rank eq 'species') {
>     # get rid of genus from species name
>     (undef,$taxon_name) = split(/\s+/,$taxon_name,2);
> }

The XML example from NCBI Taxonomy I mentioned previously seems to have everything in the classification, from superkingdom down to species (no strain unfortunately, and I'm not sure about subspecies); if it's missing the rank then the designation doesn't exist or is tagged as 'no rank'.

Like I mentioned before I'm not intimately familiar with Bio::Taxonomy, Bio::DB::Taxonomy, or Bio::Species, so I don't have a clue as to how everything is parsed and plugged in to Bio::Taxonomy objects. I do know that XML::Twig is used for parsing through the data so it shouldn't be too hard to change what you want.
I haven't tried using Bio::DB::Taxonomy directly yet, but I would have thought that the binomial is just built from the XML twig 'LineageEx' Rank=Genus + Rank=Species, that the genus comes from the tag 'Genus' and species from 'Species', and that the scientific name is from the tag 'ScientificName'. Guess not.

> However even though true scientific name is usually 'Genus species' in
> the database, note the 'usually' - sometimes the species is a multiword
> item that does not include the Genus, so we can't do some simple split
> and take the second word.
> The same applies to levels below species, eg. 'Avian erythroblastosis
> virus' is a variant of the species 'Avian leukosis virus' but 'Avian
> erythroblastosis virus (strain ES4)' is a variant of that variant...
>
> My solution is to just remove whatever is the same between the current
> rank and the previous rank. Maybe even that's not so perfect, but it
> must be a lot better than turning the species 'Avian leukosis virus'
> into the species 'virus' (especially given that the genus here is
> 'Alpharetrovirus')!
>
> # we need to be going root(kingdom) -> leaf (species or lower) order
> #
> # we need to be storing untouched versions of the scientific name of
> # the previous rank ($self->{_last_raw})
> #
> # probably only bother start doing this when we get to genus
> my $last_raw = $self->{_last_raw} || undef;
> $self->{_last_raw} = $sci_name;
> if ($last_raw) {
>     $sci_name =~ s/$last_raw//;
>     $sci_name =~ s/^\s+//;
> }
>
> Are there even more strange species (and lower) names that would still
> not work well with the above solution?

I don't think taking Genus/Species directly from the scientific name (normally what is in the SOURCE or ORGANISM annotation for GenBank or OS for EMBL) is the best way to go about it since it's really a best guess using regex; Jason pointed out several examples where this falls apart, and being a bacterial man I have found many examples myself.
I'm also not sure that forcing a lookup for every TaxID in every sequence every time it's passed through SeqIO is the best way to go either, though I think it should be required for storing sequences. It's a tricky balance. I still think that maybe we should absolve ourselves from using SOURCE/ORGANISM or OS/OC information in GenBank files as anything more than strictly annotation, or reconstruct Bio::Species to maybe a Bio::Annotation::Species object to handle that annotation and either deprecate Bio::Species or separate it completely from any Bio::Taxonomy objects. It would really simplify things. Then, if anyone is interested in taxonomy, either install a local database or use Entrez efetch, and then use Bio::DB::Taxonomy (fixed of course) to grab the TaxID info. Seems like we're running more and more into exceptions to the rule as more genomes are made available. Anyway, using Bio::Species for GenBank is really screwy for bacterial names, so currently I get around BioPerl issues with bacterial names by grabbing the 'source' seqfeature and pulling the 'organism' tag out. But it really shouldn't be that obfuscated, right? Chris > Cheers, > Sendu. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sdavis2 at mail.nih.gov Sat May 13 12:19:21 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Sat, 13 May 2006 08:19:21 -0400 Subject: [Bioperl-l] problem help me...........please In-Reply-To: <20060513041853.16091.qmail@webmail31.rediffmail.com> References: <20060513041853.16091.qmail@webmail31.rediffmail.com> Message-ID: <4465CEC9.2010909@mail.nih.gov> saurabh maheshwari wrote: > > hello > Thanks for your prompt reply. 
> Actaully I am trying to make a protein interaction graph from a dip > file.But I am not able to do so.In my last mail I have already attached > my program which is giving some error and I am not able troble shot > them.Please help > Thanks I meant that since we don't know what error(s) you are getting, it is really not possible to determine what the problem is. Also, someone else on the list offered to look at your code if you were to privide the input file. I find it helpful to look at this webpage every now and then to remind myself what constitutes a useful question to email lists: http://www.catb.org/~esr/faqs/smart-questions.html Sean > On Fri, 12 May 2006 Sean Davis wrote : > > > > > > > >On 5/12/06 9:55 AM, "saurabh maheshwari" > >wrote: > > > > > > > > hello > > > I am a studnt at Center for DNA Finger Printing and Diagnostics(CDFD). > > > I am working on protein protein interaction but I am unable to use > the protein > > > interaction module i.e. ProteinGraph.pm.. > > > Actially I am facing lots of problem in the programme I have > written Please > > > help me since last four months I am not able to solve the same > problem.. > > > I am pasting my programe here also I am attaching it also. ...... > > > >You haven't really told us what you are trying to do or what problems you > >are having. > > > >Sean > > > >_______________________________________________ > >Bioperl-l mailing list > >Bioperl-l at lists.open-bio.org > >http://lists.open-bio.org/mailman/listinfo/bioperl-l > > with Regards > SAURABH MAHESHWARI > M.Sc. (BIOINFORMATICS) > JAMIA MILLIA ISLAMIA > NEW DELHI > > > From s_maheshwari84 at rediffmail.com Sat May 13 05:17:58 2006 From: s_maheshwari84 at rediffmail.com (saurabh maheshwari) Date: 13 May 2006 05:17:58 -0000 Subject: [Bioperl-l] problem help me...........please Message-ID: <20060513051758.4610.qmail@webmail31.rediffmail.com> hello I am very happy to see the prompt reply from the group members.. 
As you all suggested to attach the required files .. So I have attached all the three file first the input file,secod I have saved the error I was getting into a error file and third the programme file.. Actully in error file I want to know some thing . I am putting here one error line, ## no of nodes = Bio::Seq::RichSeq=HASH(0x11aa700) ## what this stand for Second thing I want to get the connected graph as I have. which type of connected grph I explain you by example.. Let there are five object in such a way. A connected to B A connected to C B connected to C D connected to C E connected to A I want to create a whole link in betwwen all five. Please help me I am not getting the result with Regards SAURABH MAHESHWARI M.Sc. (BIOINFORMATICS) JAMIA MILLIA ISLAMIA NEW DELHI -------------- next part -------------- A non-text attachment was scrubbed... Name: sample.dip Type: application/octet-stream Size: 5794 bytes Desc: not available URL: -------------- next part -------------- bash-2.05b$ perl from.pl Bio::Graph::ProteinGraph=HASH(0x1182e70) Bio::Graph::ProteinGraph=HASH(0x1182e70)->nodes the unconnected nodes are =subgraph=Bio::Graph::SimpleGraph=HASH(0x11e2160) graph density=0.00826446280991736 no of Connected components=60 please enter the protein-id whom you want to remove from the network XMECF2 no of edges=61 no of nodes=122 enter the protein whose interactions is to be find XMECF2 XMECF2 interacts with map{->object_id()} no of nodes = Bio::Seq::RichSeq=HASH(0x11aa700) Bio::Seq::RichSeq=HASH(0x11d1850 ) Bio::Seq::RichSeq=HASH(0x11bd4c0) Bio::Seq::RichSeq=HASH(0x11c2fd0) Bio::Seq:: RichSeq=HASH(0x11aa7f0) Bio::Seq::RichSeq=HASH(0x1198340) Bio::Seq::RichSeq=HASH (0x11d81a0) Bio::Seq::RichSeq=HASH(0x11ca320) Bio::Seq::RichSeq=HASH(0x11b5e40) Bio::Seq::RichSeq=HASH(0x1190e00) Bio::Seq::RichSeq=HASH(0x11c1350) Bio::Seq::Ri chSeq=HASH(0x11b2e20) Bio::Seq::RichSeq=HASH(0x11cb360) Bio::Seq::RichSeq=HASH(0 x1198250) Bio::Seq::RichSeq=HASH(0x11d0240) 
Bio::Seq::RichSeq=HASH(0x11c8f20) Bi o::Seq::RichSeq=HASH(0x11b4ef0) Bio::Seq::RichSeq=HASH(0x119f7a0) Bio::Seq::Rich Seq=HASH(0x11c2ee0) Bio::Seq::RichSeq=HASH(0x11dba20) Bio::Seq::RichSeq=HASH(0x1 1e2300) Bio::Seq::RichSeq=HASH(0x11b2f10) Bio::Seq::RichSeq=HASH(0x11b4b90) Bio: :Seq::RichSeq=HASH(0x11d4df0) Bio::Seq::RichSeq=HASH(0x11d4b80) Bio::Seq::RichSe q=HASH(0x11d8e70) Bio::Seq::RichSeq=HASH(0x11a1270) Bio::Seq::RichSeq=HASH(0x11c b5d0) Bio::Seq::RichSeq=HASH(0x11d5cc0) Bio::Seq::RichSeq=HASH(0x11d32a0) Bio::S eq::RichSeq=HASH(0x11b4c80) Bio::Seq::RichSeq=HASH(0x119e0c0) Bio::Seq::RichSeq= HASH(0x11b7ed0) Bio::Seq::RichSeq=HASH(0x11ad490) Bio::Seq::RichSeq=HASH(0x1196e 60) Bio::Seq::RichSeq=HASH(0x119b7f0) Bio::Seq::RichSeq=HASH(0x11cef60) Bio::Seq ::RichSeq=HASH(0x11b7b70) Bio::Seq::RichSeq=HASH(0x11dd330) Bio::Seq::RichSeq=HA SH(0x11da8c0) Bio::Seq::RichSeq=HASH(0x11a9f70) Bio::Seq::RichSeq=HASH(0x119b700 ) Bio::Seq::RichSeq=HASH(0x119a550) Bio::Seq::RichSeq=HASH(0x11ba910) Bio::Seq:: RichSeq=HASH(0x11e0b30) Bio::Seq::RichSeq=HASH(0x11d3030) Bio::Seq::RichSeq=HASH (0x11c62d0) Bio::Seq::RichSeq=HASH(0x11abb20) Bio::Seq::RichSeq=HASH(0x11d5bd0) Bio::Seq::RichSeq=HASH(0x11b03c0) Bio::Seq::RichSeq=HASH(0x119e1b0) Bio::Seq::Ri chSeq=HASH(0x11aa060) Bio::Seq::RichSeq=HASH(0x11a5700) Bio::Seq::RichSeq=HASH(0 x11a81e0) Bio::Seq::RichSeq=HASH(0x1196b00) Bio::Seq::RichSeq=HASH(0x11c1260) Bi o::Seq::RichSeq=HASH(0x11a2800) Bio::Seq::RichSeq=HASH(0x11c63c0) Bio::Seq::Rich Seq=HASH(0x11b60b0) Bio::Seq::RichSeq=HASH(0x11b93b0) Bio::Seq::RichSeq=HASH(0x1 1a4490) Bio::Seq::RichSeq=HASH(0x11ded50) Bio::Seq::RichSeq=HASH(0x11bbcd0) Bio: :Seq::RichSeq=HASH(0x1194780) Bio::Seq::RichSeq=HASH(0x11aedd0) Bio::Seq::RichSe q=HASH(0x11cd300) Bio::Seq::RichSeq=HASH(0x11a14e0) Bio::Seq::RichSeq=HASH(0x11c 4630) Bio::Seq::RichSeq=HASH(0x11a43a0) Bio::Seq::RichSeq=HASH(0x11a80f0) Bio::S eq::RichSeq=HASH(0x11bbbe0) Bio::Seq::RichSeq=HASH(0x11d5960) Bio::Seq::RichSeq= HASH(0x11c8e30) 
Bio::Seq::RichSeq=HASH(0x11cd3f0) Bio::Seq::RichSeq=HASH(0x11dd4 20) Bio::Seq::RichSeq=HASH(0x11cee70) Bio::Seq::RichSeq=HASH(0x11dbb10) Bio::Seq ::RichSeq=HASH(0x119a460) Bio::Seq::RichSeq=HASH(0x11aaa60) Bio::Seq::RichSeq=HA SH(0x11d1760) Bio::Seq::RichSeq=HASH(0x11cb6c0) Bio::Seq::RichSeq=HASH(0x11c7530 ) Bio::Seq::RichSeq=HASH(0x11deae0) Bio::Seq::RichSeq=HASH(0x11c4720) Bio::Seq:: RichSeq=HASH(0x119f890) Bio::Seq::RichSeq=HASH(0x11a6c40) Bio::Seq::RichSeq=HASH (0x11ad130) Bio::Seq::RichSeq=HASH(0x11e23f0) Bio::Seq::RichSeq=HASH(0x11d2f40) Bio::Seq::RichSeq=HASH(0x1194640) Bio::Seq::RichSeq=HASH(0x11d8f60) Bio::Seq::Ri chSeq=HASH(0x11d0150) Bio::Seq::RichSeq=HASH(0x119d070) Bio::Seq::RichSeq=HASH(0 x11a5610) Bio::Seq::RichSeq=HASH(0x11aa2d0) Bio::Seq::RichSeq=HASH(0x11b94a0) Bi o::Seq::RichSeq=HASH(0x11bd5b0) Bio::Seq::RichSeq=HASH(0x11c0ff0) Bio::Seq::Rich Seq=HASH(0x11a6b50) Bio::Seq::RichSeq=HASH(0x119cf80) Bio::Seq::RichSeq=HASH(0x1 1baa00) Bio::Seq::RichSeq=HASH(0x11c7620) Bio::Seq::RichSeq=HASH(0x119fb00) Bio: :Seq::RichSeq=HASH(0x11a2a70) Bio::Seq::RichSeq=HASH(0x11b1960) Bio::Seq::RichSe q=HASH(0x11ab8b0) Bio::Seq::RichSeq=HASH(0x11e0c20) Bio::Seq::RichSeq=HASH(0x11a d3a0) Bio::Seq::RichSeq=HASH(0x1197fe0) Bio::Seq::RichSeq=HASH(0x11b1870) Bio::S eq::RichSeq=HASH(0x11a2b60) Bio::Seq::RichSeq=HASH(0x1192750) Bio::Seq::RichSeq= HASH(0x11c9190) Bio::Seq::RichSeq=HASH(0x11e08c0) Bio::Seq::RichSeq=HASH(0x11dd6 90) Bio::Seq::RichSeq=HASH(0x11da7d0) Bio::Seq::RichSeq=HASH(0x11aece0) Bio::Seq ::RichSeq=HASH(0x11d80b0) Bio::Seq::RichSeq=HASH(0x11ca0b0) Bio::Seq::RichSeq=HA SH(0x1196bf0) Bio::Seq::RichSeq=HASH(0x11b7de0) Bio::Seq::RichSeq=HASH(0x11b02d0 ) Can't call method "isa" on an undefined value at /usr/local/bioxapps/bioperl/lib rary//Bio/Graph/ProteinGraph.pm line 477, line 2. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: from.pl
Type: application/octet-stream
Size: 2723 bytes
Desc: not available
URL:

From cjfields at uiuc.edu Sat May 13 18:18:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 13 May 2006 13:18:53 -0500
Subject: [Bioperl-l] problem help me...........please
In-Reply-To: <20060513051758.4610.qmail@webmail31.rediffmail.com>
Message-ID: <000901c676b9$b14479c0$15327e82@pyrimidine>

I really hate to break the bad news here, but I'm going to be brutally honest. I have not looked at any of the Bio::Graph modules and have no idea how they are implemented, and I haven't looked at your input file, but I can tell right off the bat your script has major logic problems. I can also pretty much tell that you don't understand the object model we use here, at all. This is why I say that (from your last response):

> ## no of nodes = Bio::Seq::RichSeq=HASH(0x11aa700) ##
> what this stand for

Did you cut and paste from several other scripts hoping that it would work? I say that b/c you mix styles quite frequently here, using objects correctly (deref'ing with '->') and incorrectly (print "$object"). You also declare (and redeclare) @ISA four times for a script (not needed unless you're declaring a class and inheriting methods from other modules). You also use @ISA once with a misspelled module name (I don't think there is a module named 'Expoerter'). So, I'm actually stunned that the script doesn't crash at all. Yikes!

Okay, brutal honesty time over. Any time you see something like this:

Bio::Graph::ProteinGraph=HASH(0x1182e70)

it means that what you are printing out is a reference to an object (it refers to the object class and the location in memory) and is NOT what you want. You should be doing something along the lines of $object->method, not 'print $object', to get at the object data and methods. You use this several times in your script already; that should be a big hint as the areas where it doesn't work do not use this syntax.
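To make the difference concrete, here is a minimal sketch in plain Perl. It does not use Bio::Graph at all: Demo::Node and its object_id accessor are hypothetical stand-ins invented for this illustration, assuming only that the real objects are blessed hashes with accessor methods.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a BioPerl object: a blessed hash with one accessor.
package Demo::Node;
sub new       { my ($class, %args) = @_; return bless { id => $args{id} }, $class }
sub object_id { my ($self) = @_; return $self->{id} }

package main;

my $node = Demo::Node->new(id => 'XMECF2');

# Stringifying the object gives "Class=HASH(0x...)", not its data.
my $as_string = "$node";

# Calling the accessor returns the data stored in the object.
my $id = $node->object_id;

# Inside double quotes only $node is interpolated; "->object_id" stays literal text.
my $interpolated = "$node->object_id";

print "stringified:  $as_string\n";
print "method call:  $id\n";
print "interpolated: $interpolated\n";
```

Only the second print shows the stored ID; the other two show the class name and a memory address, which is the pattern seen in the posted output.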
Read the documentation for the many varied modules you use in your script. Look at script examples. Start simply, then work your way up.

Also, calling a method with '->' inside double quotes doesn't work; you have to do something like:

print $graph->nodes,"\t";

not

print "$graph->nodes\t";

That's why you get this in your output:

Bio::Graph::ProteinGraph=HASH(0x1182e70)->nodes

Which just prints the object reference with the string '->nodes'.

If any of what I just said doesn't make any sense, you really need to pick up 'Learning Perl' and 'Intermediate Perl' by Schwartz et al and 'Programming Perl' by Wall et al. I don't know if anyone can really help at this point w/o completely writing the script for you. We will fix problems to a point but we, for the most part, will not do your work for you.

Chris

From hubert.prielinger at gmx.at Sun May 14 03:45:58 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Sat, 13 May 2006 21:45:58 -0600
Subject: [Bioperl-l] parsing output files from other tools
Message-ID: <4466A7F6.30204@gmx.at>

hi,
Is it possible to parse text output files other than blast output files, like the text output files from the search tool mpSrch that is offered by EBI? I ask because the WU Blast output files can be parsed with bioperl.

thanks
Hubert

From arareko at campus.iztacala.unam.mx Sun May 14 04:09:35 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Sat, 13 May 2006 23:09:35 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
Message-ID: <4466AD7F.6050700@campus.iztacala.unam.mx>

I'm glad to announce the availability of the Deobfuscator interface at the BioPerl website. You can use it at the following URL:

http://bioperl.org/cgi-bin/deob_interface.cgi

Many thanks to Laura Kavanaugh and David Messina for this great contribution to the BioPerl project!

Mauricio.

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From cjfields at uiuc.edu Sun May 14 16:18:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 14 May 2006 11:18:10 -0500
Subject: [Bioperl-l] parsing output files from other tools
In-Reply-To: <4466A7F6.30204@gmx.at>
Message-ID: <000301c67772$00b4e4f0$15327e82@pyrimidine>

These are the current report types parsed through SearchIO:

http://www.bioperl.org/wiki/Module:Bio::SearchIO

I don't see mpsrch among them. If you want you could create a new plugin module to parse those reports; the SearchIO HOWTO gives some pointers:

http://www.bioperl.org/wiki/HOWTO:SearchIO

You can always look at some of the current modules like blast, blastxml, or fasta to get an idea of how it works.
Judging by the mpsrch output I'm pretty sure you would have to build a custom plugin for it.

A viable alternative: looking through the mail list it looks like mpsrch is a multiprocessor implementation of ssearch, itself an implementation of the Smith-Waterman algorithm for local alignments in the FASTA package of programs:

http://www.bioperl.org/wiki/SSEARCH

You might be able to use SearchIO::fasta there...

Chris

From chen_li3 at yahoo.com Sun May 14 17:14:30 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sun, 14 May 2006 10:14:30 -0700 (PDT)
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
Message-ID: <20060514171430.74846.qmail@web36802.mail.mud.yahoo.com>

Hi all,

I need to get a reverse-complementary sequence out of a fasta sequence file. And the Synopsis of Bio::Seq points out I can do it this way:

$revcom=$seqobj->revcom();

I use the following script trying to get the job done but it doesn't work. Then I read the documentation of Bio::Seq and it looks like it doesn't contain a revcom method.

Any idea will be appreciated.
Li

###############################
Here is the code:

#!c:/perl/bin/perl.exe
use strict;
use warnings;

use Bio::Seq;
use Bio::SeqIO;

my $file='c:/perl/local/primer3_1.0.0/src/est.txt';

my $seqIO=Bio::SeqIO->new(-file=>"<$file",
                          -format=>'fasta' );

my $seqobj=$seqIO->next_seq();#create object

print "what attributes/keys are available:\n";
for my $key (sort keys %$seqobj){
    my $value=$seqobj->{$key};
    print "$key\t=>\t$value\n"
}
# These are the output on the screen
#primary_id => gi|54093|emb|X61809.1|
#primary_seq => Bio::PrimarySeq=HASH(0x10492848)

#based on these results primary_id can get
#access right away
# as to primary_seq it is an object in
#Bio::Primaryseq and it provides the following
#methods after reading the documentation:
#new
#seq
#validate_seq
#subseq
#length
#display_id
#accession_number
#primary_id
#alphabet
#desc
#can_call_new
#id
#is_circular
#object_id
#version
#authority
#namespace
#display_name
#description

print "primary_id=",$seqobj->primary_id, "\n\n";
print "id=",$seqobj->id, "\n\n";
print "revcom=",$seqobj->revcom,"\n\n";

my $now_time=localtime;
print $now_time, "\n\n";
exit;

#These are the output on the screen
#primary_id=gi|54093|emb|X61809.1|
#id=gi|54093|emb|X61809.1
#revcom=Bio::Seq=HASH(0x10493304)
#Sun May 14 12:45:20 2006

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

From cjfields at uiuc.edu Sun May 14 17:39:50 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 14 May 2006 12:39:50 -0500
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
In-Reply-To: <20060514171430.74846.qmail@web36802.mail.mud.yahoo.com>
Message-ID: <000401c6777d$66ddb120$15327e82@pyrimidine>

This line should give you the hint:

#revcom=Bio::Seq=HASH(0x10493304)

You're getting an object ref here. The actual way to get the rev. comp on the wiki states '$seq->revcom->seq', not '$seq->revcom'.
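The point about revcom handing back an object rather than a string can be sketched without BioPerl installed. The Demo::Seq class below is a hypothetical stand-in written for this illustration, not Bio::Seq itself; it only assumes the behaviour described above, namely that revcom returns a new object and seq returns the string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a sequence object; NOT Bio::Seq itself.
package Demo::Seq;

sub new {
    my ($class, %args) = @_;
    return bless { seq => $args{seq} }, $class;
}

# Accessor: returns the sequence as a plain string.
sub seq { my ($self) = @_; return $self->{seq} }

# Returns a NEW object holding the reverse complement, so the caller
# gets an object back, not a string.
sub revcom {
    my ($self) = @_;
    my $rc = reverse $self->{seq};      # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;       # complement each base
    return ref($self)->new(seq => $rc);
}

package main;

my $seqobj = Demo::Seq->new(seq => 'ATGGCC');

print "revcom=", $seqobj->revcom, "\n";        # prints an object reference
print "revcom=", $seqobj->revcom->seq, "\n";   # prints the sequence string
```

The first print reproduces the 'Class=HASH(0x...)' symptom from the script; chaining ->seq onto the returned object gives the actual sequence.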
When I ran your script and changed your line to the wiki version I get (using my test seq):

what attributes/keys are available:
primary_id => test,
primary_seq => Bio::PrimarySeq=HASH(0x1d47fe0)
primary_id=test,

id=test,

revcom=GGAACGAGATCTCCATGCCGCGCACCATCGGCCCGGGATGCAGCACGATCGCGCGGTCCGGCAGCATCG
CCTGGCGCTTCTCGGACAATCCGTAGCGCACCGAGTACTCACGCGCGGA
CGGGAAGAAACTGCCGTTCATGCGTTCGGCCTGCACGCGCAGCATGAGCACCGCGTCGGCCGCGGGCAGTTCGGCG
TCCAGGTCATAGGACACGGTCACCGGCCAGTTCTCGACGCCCCTGGGGA
GCAGCGTCGGTGGGGACACCAGCACCACCTCGGCCCCGAGGGTGTGCAGCAGCGTCACGTTGGAGCGGGCCACGCG
GCTGTGCAGCACGTCGCCGACGATCACCACGCGCTTGCCCTCGACGCTG

Sun May 14 17:34:45 2006

Chris

From chen_li3 at yahoo.com Sun May 14 18:08:49 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sun, 14 May 2006 11:08:49 -0700 (PDT)
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
In-Reply-To: <000401c6777d$66ddb120$15327e82@pyrimidine>
Message-ID: <20060514180849.55423.qmail@web36808.mail.mud.yahoo.com>

Hi Chris,

Thank you very much. But could you please give me the link for this syntax: $seq->revcom->seq?

Li

From cjfields at uiuc.edu Sun May 14 18:28:14 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Sun, 14 May 2006 13:28:14 -0500
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
Message-ID:

I think the confusion lies in what revcom returns. This page

http://www.bioperl.org/wiki/Getting_Started

shows a quick way of using revcom (which I mentioned previously), while this page

http://www.bioperl.org/wiki/HOWTO:Beginners

explains what is returned when you use revcom. '$seq_obj->revcom' returns a sequence object (not a sequence string):

http://www.bioperl.org/wiki/HOWTO:Beginners#The_Sequence_Object

which is why you need to use the 'seq' method to get the string. Hence, '$seq_obj->revcom->seq'.

Chris

From Marc.Logghe at DEVGEN.com Sun May 14 20:28:34 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Sun, 14 May 2006 22:28:34 +0200
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746DAC@ANTARESIA.be.devgen.com>

Hi Li,

> doesn't work. Then I read documentation of Bio::Seq and it
> looks like it doesn't contain revcom method.

Here, the Deobfuscator interface that Mauricio announced earlier comes in handy.

http://bioperl.org/cgi-bin/deob_interface.cgi?Search=Search&module=Bio%3A%3ASeq&sort_order=by+method&search_string=

If you look in the methods table, you will find out that the revcom method is inherited from, and implemented by, Bio::PrimarySeqI.
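Why a method can be callable on an object yet absent from that class's own documentation can be shown with a small plain-Perl sketch. Both packages below are hypothetical names invented for the illustration; the only assumption is that the real classes are related through @ISA in the same way.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical interface-style base class that implements revcom.
package Demo::PrimarySeqI;
sub revcom { return 'revcom called' }

# Hypothetical concrete class: defines no revcom of its own.
package Demo::Seq;
our @ISA = ('Demo::PrimarySeqI');
sub new { my ($class) = @_; return bless {}, $class }

package main;

my $seqobj = Demo::Seq->new;

# can() follows @ISA, so the inherited method is found ...
print "can revcom: ", ($seqobj->can('revcom') ? 'yes' : 'no'), "\n";

# ... even though the Demo::Seq package itself does not define it.
print "defined in Demo::Seq: ",
      (defined(&Demo::Seq::revcom) ? 'yes' : 'no'), "\n";
print "defined in Demo::PrimarySeqI: ",
      (defined(&Demo::PrimarySeqI::revcom) ? 'yes' : 'no'), "\n";
```

Calling $obj->can('method_name') on a real object is a quick way to check whether a method is available before hunting through the documentation of every parent class.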
HTH,
Marc

From sb at mrc-dunn.cam.ac.uk Mon May 15 08:18:11 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 15 May 2006 09:18:11 +0100
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names
In-Reply-To: <000f01c675e6$a61bde90$15327e82@pyrimidine>
References: <000f01c675e6$a61bde90$15327e82@pyrimidine>
Message-ID: <44683943.5020307@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> Sendu Bala wrote:
>> In bioperl up to at least 1.5.1, when one of the database modules
>> comes across a species rank it does:
>>
>> if ($rank eq 'species') {
>>     # get rid of genus from species name
>>     (undef,$taxon_name) = split(/\s+/,$taxon_name,2);
>> }
>
> The XML example from NCBI Taxonomy I mentioned previously seems to
> have everything in the classification, from superkingdom down to
> species (no strain unfortunately, and I'm not sure about subspecies);
> if it's missing the rank then the designation doesn't exist or is
> tagged as 'no rank'. Like I mentioned before I'm not intimately
> familiar with Bio::Taxonomy, Bio::DB::Taxonomy, or Bio::Species, so I
> don't have a clue as to how everything is parsed and plugged in to
> Bio::Taxonomy objects. I do know that XML::Twig is used for parsing
> through the data so it shouldn't be too hard to change what you
> want.

Yes, that's all true, but I'm not sure what it has to do with what I was saying. FYI, you do get a 'subspecies' rank but no 'variant' rank. In my own implementation I change the rank of all 'no rank' Nodes below species to 'variant'.

> I haven't tried using Bio::DB::Taxonomy directly yet, but I would
> have thought that the binomial is just built from the XML twig
> 'LineageEx' Rank=Genus + Rank=Species, that the genus comes from the
> tag 'Genus' and species from 'Species', and that the scientific name
> is from the tag 'ScientificName'. Guess not.

No. See above for what it actually does. That is a copy/paste from the code (there, $taxon_name == ScientificName).
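For readers who want to see what the quoted split does to a name, here is a minimal sketch that reproduces only that one line in isolation. The helper name strip_first_word is an assumption made for the illustration; the surrounding module code is not shown in this thread and may process names further.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper wrapping the quoted line: keep everything after the
# first run of whitespace, i.e. drop the first word of the name.
sub strip_first_word {
    my ($taxon_name) = @_;
    (undef, $taxon_name) = split(/\s+/, $taxon_name, 2);
    return $taxon_name;
}

for my $name ('Homo sapiens', 'Avian leukosis virus') {
    printf "%-24s => '%s'\n", "'$name'", strip_first_word($name);
}
```

The first word is dropped whether or not it is the genus. That is harmless for 'Homo sapiens', but for a name whose first word is not the genus, part of the species name is discarded even though the genus never appeared in it.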
When it finds a species rank it does that split because in the ncbi taxonomy database the 'genus' rank for a human has a ScientificName of 'Homo', whilst the 'species' rank has a ScientificName of 'Homo sapiens', and the bioperl model (quite rightly, I think) wants the 'species' node to not have information of other nodes (well, except for the classification array). So it removes the 'Homo' from 'Homo sapiens' giving a species name of 'sapiens'. This then allows the binomial method to return 'Homo sapiens' instead of 'Homo Homo sapiens'. (though in a bizarre twist, and this is one of my problems with how names are currently represented in the Taxonomy modules, 'Scientific Name' and 'binomial' are synonymous)

[snip]
>> My solution is to just remove whatever is the same between the
>> current rank and the previous rank. Maybe even that's not so
>> perfect, but it must be a lot better than turning the species
>> 'Avian leukosis virus' into the species 'virus' (especially given
>> that the genus here is 'Alpharetrovirus')!
>
> I don't think taking Genus/Species directly from the scientific
> name (normally what is in the SOURCE or ORGANISM annotation for
> GenBank or OS for EMBL) is the best way to go about it
[snip]

Perhaps, but again I'm not sure what this has to do with what I was saying. If you don't want your species name to contain your genus name you have to do some kind of parsing. My post merely pointed out that the parsing currently in bioperl does not work for viruses and possibly other species. I'd like to think that someone cares about this error and would do the simple fix I offered, or that they already know about the problem and have done their own fix.

> I'm also not sure that forcing a lookup for every TaxID in every
> sequence every time it's passed through SeqIO is the best way to go
> either, though I think it should be required for storing sequences.
> It's a tricky balance.
In my own implementation any database lookups are cached, and you have the option of not doing any database lookup at all and 'faking' a taxonomy from the supplied list of names (so it works just like normal Bio::Seq).

> I still think that maybe we should absolve ourselves from using
> SOURCE/ORGANISM or OS/OC information in GenBank files as anything
> more than strictly annotation, or reconstruct Bio::Species to maybe a
> Bio::Annotation::Species object to handle that annotation and either
> deprecate Bio::Species or separate it completely from any
> Bio::Taxonomy objects. It would really simplify things. Then, if
> anyone is interested in taxonomy, either install a local database or
> use Entrez efetch, and then use Bio::DB::Taxonomy (fixed of course)
> to grab the TaxID info.

My personal view is that having it as an annotation would serve no real purpose. For me the whole point of any kind of species representation in bioperl is to allow you to compare species in a biologically meaningful way. If it's just some annotation then that means it's basically free-form text and you have no guarantee that two sequences from the same species are annotated exactly the same - no guarantee that your code would identify that those sequences are from the same species. The only other useful thing that a species object needs to do is let you know how related two different species are - you need to be able to ask what a species' class, kingdom etc. are. Again, not viable with an annotation - you need something strict like a properly constructed Taxonomy.

I guess it comes down to the philosophy of parsing a file. Do you try and reflect exactly what the file contains, letter for letter, so that your resulting object can recreate that file letter for letter, or do you parse the file and extract the correct /meaning/ in order to be more useful?
I think there can be a choice by the user, and this is best done by making Bio::Species a clever wrapper around an improved Bio::Taxonomy, as in my own implementation. From s_maheshwari84 at rediffmail.com Mon May 15 08:15:26 2006 From: s_maheshwari84 at rediffmail.com (saurabh maheshwari) Date: 15 May 2006 08:15:26 -0000 Subject: [Bioperl-l] please help Message-ID: <20060515081526.27270.qmail@webmail7.rediffmail.com> Hello All I have sent a problem to the earlier also but my problem is still unsolve so i have modified the problem in another way please can any body give me code to make a graph between some items which are in a text file in the following formate: Example item1 interacts with item2 and i want to make graph by giving any item as input and asking all interactions of that item. item 1 item 2 A B A C C B D B D E A F G A with Regards SAURABH MAHESHWARI M.Sc. (BIOINFORMATICS) JAMIA MILLIA ISLAMIA NEW DELHI From sdavis2 at mail.nih.gov Mon May 15 10:26:53 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Mon, 15 May 2006 06:26:53 -0400 Subject: [Bioperl-l] please help In-Reply-To: <20060515081526.27270.qmail@webmail7.rediffmail.com> Message-ID: On 5/15/06 4:15 AM, "saurabh maheshwari" wrote: > > Hello All > I have sent a problem to the earlier also but my problem is still unsolve so i > have modified the problem in another way please can any body give me code to > make a graph between some items which are in a text file in the following > formate: > Example > item1 interacts with item2 and i want to make graph by giving any item as > input and asking all interactions of that item. > > item 1 item 2 > A B > A C > C B > D B > D E > A F > G A Not a bioperl answer, but in your case, I would suggest looking at using cytoscape to do this. 
Look here for details: http://www.cytoscape.org/ Sean From sdavis2 at mail.nih.gov Mon May 15 11:03:28 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Mon, 15 May 2006 07:03:28 -0400 Subject: [Bioperl-l] please help In-Reply-To: Message-ID: On 5/15/06 6:26 AM, "Sean Davis" wrote: > > > > On 5/15/06 4:15 AM, "saurabh maheshwari" > wrote: > >> >> Hello All >> I have sent a problem to the earlier also but my problem is still unsolve so >> i >> have modified the problem in another way please can any body give me code to >> make a graph between some items which are in a text file in the following >> formate: >> Example >> item1 interacts with item2 and i want to make graph by giving any item as >> input and asking all interactions of that item. >> >> item 1 item 2 >> A B >> A C >> C B >> D B >> D E >> A F >> G A > > Not a bioperl answer, but in your case, I would suggest looking at using > cytoscape to do this. Look here for details: > > http://www.cytoscape.org/ I forgot to mention, if you are looking for a perl solution, I would look at the Graph module. http://search.cpan.org/~jhi/Graph-0.69/lib/Graph.pod You can create the graph according to the docs and then use the neighbors() method (if I remember correctly) to get the nodes connected to the query node. Sean From akarger at CGR.Harvard.edu Mon May 15 12:20:11 2006 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Mon, 15 May 2006 08:20:11 -0400 Subject: [Bioperl-l] Deobfuscator interface now available Message-ID: This tool is quite nice, and may save me a lot of perdoc'ing. A couple of minor interface thoughts. 1)There's quite a lot of methods for many of the classes. As such, I think I'll often want to browse through what's available in a class. But 60% or so of the screen real estate is used for "Enter a search string... OR select a class from the list". IMO, it would be better to have two pages, a search page and a result page. 
It only takes a click on Back (or a "new search" button) to get to a new search, and now you can use your whole screen for reading your results. 2) Please sort the "select a class from the list" alphabetically. I guess I can enter a search term to get the right classes, but it would be nice to be able to browse. 2a) if you want to be really fancy, make a javascript nested menu with expandable submenus. OK, maybe not. 3) Minimalist is nice, but documentation is even nicer. It wasn't clear to me that the search searches within class names rather than function names. What I really want to know sometimes is which module has, say, the revcom method in it. So, if it's not easy to include that within this search, then at least tell me what my search space is. 4) When I search for something that's not found, I get a screen that looks pretty familiar, with the extra text "No match to string found" down at the bottom. It took me a while to even notice it. (Studies show that most users don't read most of the text on a page.) Bold might be nice here. Or put the error at the top of the screen. Or both. 5) I'll save my stupidest comment for last - please make the page title "Bioperl Deobfuscator", so that when I bookmark it I'll know what the bookmark stands for. Thanks, Laura Kavanaugh and David Messina, for a neat AND useful tool. - Amir Karger Computational Biology Group Bauer Center for Genomics Research Harvard University 617-496-0626 From sb at mrc-dunn.cam.ac.uk Mon May 15 13:08:32 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Mon, 15 May 2006 14:08:32 +0100 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: References: Message-ID: <44687D50.6080306@mrc-dunn.cam.ac.uk> Amir Karger wrote: > This tool is quite nice, and may save me a lot of perdoc'ing. Yes, many thanks to everyone involved. > A couple of minor interface thoughts. > > 1)There's quite a lot of methods for many of the classes. 
As such, I > think I'll often want to browse through what's available in a class. But > 60% or so of the screen real estate is used for "Enter a search > string... OR select a class from the list". IMO, it would be better to > have two pages, a search page and a result page. It only takes a click > on Back (or a "new search" button) to get to a new search, and now you > can use your whole screen for reading your results. As the compromise it must be, I like the way it behaves. I don't like lots of windows. I especially don't like pop up windows. Right now when I'm using the bioperl docs I tend to have a whole bunch of tabs open to different class pages at once, so being able to see an overview all on one page in Deobfuscator is very nice. Further to that, I'd love it if clicking on a method name caused an in-place css(&|javascript) reveal (similar to how a well implemented drop down menu works in a website) rather than a new window opened. Alternatively, just have more columns in the results table, ie. usage, function, returns, args columns. I feel that opening a window for each method you want to understand is far too slow. I'd also really like a link to the code for the method as well. The bioperl docs are rarely complete enough that you can really understand what every method is supposed to do without looking at the code. > 3) Minimalist is nice, but documentation is even nicer. It wasn't clear > to me that the search searches within class names rather than function > names. What I really want to know sometimes is which module has, say, > the revcom method in it. This would be a great feature to add. Another minor interface thought: 6) Have a little more cell padding in all the tables. Things are just a little too cramped and things start to look messy/ run into each other. 
From cjfields at uiuc.edu Mon May 15 13:59:57 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 15 May 2006 08:59:57 -0500 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: <44687D50.6080306@mrc-dunn.cam.ac.uk> Message-ID: <000901c67827$d99eabb0$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Sendu Bala > Sent: Monday, May 15, 2006 8:09 AM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Deobfuscator interface now available > > Amir Karger wrote: > > This tool is quite nice, and may save me a lot of perdoc'ing. > > Yes, many thanks to everyone involved. The Deobfuscator currently indexes bioperl-1.4, so it's not completely up-to-date. I believe Mauricio and Dave may be working on updating to the newer versions and maybe bioperl-live, as well as getting the other bioperl packages up and running. For modules added after v1.4 I use the script in the FAQ question mentioned on the Deobfuscator wiki page to get up-to-date methods, then grab the that ActiveState HTML'd perldocs pumped out when installing using PPM (I make a custom PPM/PPD file and install myself every once in a while): #!/usr/bin/perl -w use Class::Inspector; $class = shift || die "Usage: methods perl_class_name\n"; eval "require $class"; print join ("\n", sort @{Class::Inspector- > > A couple of minor interface thoughts. > > > > 1)There's quite a lot of methods for many of the classes. As such, I > > think I'll often want to browse through what's available in a class. But > > 60% or so of the screen real estate is used for "Enter a search > > string... OR select a class from the list". IMO, it would be better to > > have two pages, a search page and a result page. It only takes a click > > on Back (or a "new search" button) to get to a new search, and now you > > can use your whole screen for reading your results. 
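The Class::Inspector script quoted earlier in this message is cut off in the archive after `Class::Inspector-`. The following is a hedged reconstruction, not the FAQ text itself: it assumes the missing tail was a call to Class::Inspector's `methods` class method, and the `'full'` and `'public'` options are an assumption about what was passed. The `public_methods` wrapper is a hypothetical name added for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Class::Inspector;   # CPAN module, not part of core Perl

# Return the sorted, fully qualified public method names of a class.
sub public_methods {
    my ($class) = @_;
    eval "require $class; 1" or die "Could not load $class: $@";
    # methods() returns an array reference, or a false value on failure.
    my $methods = Class::Inspector->methods($class, 'full', 'public')
        or return;
    return sort @$methods;
}

# Usage: methods.pl perl_class_name
if (@ARGV) {
    print "$_\n" for public_methods($ARGV[0]);
}
```

Run as, for example, `perl methods.pl Bio::PrimarySeq` on a machine with BioPerl installed.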
> > As the compromise it must be, I like the way it behaves. I don't like > lots of windows. I especially don't like pop up windows. Right now when > I'm using the bioperl docs I tend to have a whole bunch of tabs open to > different class pages at once, so being able to see an overview all on > one page in Deobfuscator is very nice. > > Further to that, I'd love it if clicking on a method name caused an > in-place css(&|javascript) reveal (similar to how a well implemented > drop down menu works in a website) rather than a new window opened. > Alternatively, just have more columns in the results table, ie. usage, > function, returns, args columns. I feel that opening a window for each > method you want to understand is far too slow. Agreed. > I'd also really like a link to the code for the method as well. The > bioperl docs are rarely complete enough that you can really understand > what every method is supposed to do without looking at the code. The methods that pop up are in columns along with the class module that implements the method. If you click on that link you get PDOC documentation for the module which includes most of the code (strangely, though Deobfuscator indexes bioperl 1.4, the PDOC corresponds to bioperl-live). Is that what you meant, or something a bit more detailed? > > 3) Minimalist is nice, but documentation is even nicer. It wasn't clear > > to me that the search searches within class names rather than function > > names. What I really want to know sometimes is which module has, say, > > the revcom method in it. That's listed in the method results table (the next column has the module with a link to the module's online docs). Chris > This would be a great feature to add. > > > Another minor interface thought: > 6) Have a little more cell padding in all the tables. Things are just a > little too cramped and things start to look messy/ run into each other. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Mon May 15 16:08:30 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 15 May 2006 11:08:30 -0500 Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names In-Reply-To: <44683943.5020307@mrc-dunn.cam.ac.uk> Message-ID: <001601c67839$cf289490$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Sendu Bala > Sent: Monday, May 15, 2006 3:18 AM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, > subspecies/variant names > > Chris Fields wrote: > > Sendu Bala wrote: > >> In bioperl up to at least 1.5.1, when one of the database modules > >> comes across a species rank it does: > >> > >> if ($rank eq 'species') { # get rid of genus from species name > >> (undef,$taxon_name) = split(/\s+/,$taxon_name,2); } > > > > The XML example from NCBI Taxonomy I mentioned previously seems to > > have everything in the classification, from superkingdom down to > > species (no strain unfortunately, and I'm nit sure about subspecies); > > if it's missing the rank then the designation doesn't exist or is > > tagged as 'no rank'. Like I mentioned before I'm not intimately > > familiar Bio::Taxonomy, Bio::DB::Taxonomy, or Bio::Species, so I > > don't have a clue as to how everything is parsed and plugged in to > > Bio::Taxonomy objects. I do know that XML::Twig is used for parsing > > through the data so it shouldn't be too hard to change what you > > want. > > Yes, that's all true, but I'm not sure what it has to do with what I was > saying. FYI, you do get a 'subspecies' rank but no 'variant' rank. In my > own implementation I change the rank of all 'no rank' Nodes below > species to 'variant'. 
Sorry; wandered a bit off topic there.

> > I haven't tried using Bio::DB::Taxonomy directly yet, but I would
> > have thought that the binomial is just built from the XML twig
> > 'LineageEx' Rank=Genus + Rank=Species, that the genus comes from the
> > tag 'Genus' and species from 'Species', and that the scientific name
> > is from the tag 'ScientificName'. Guess not.
>
> No. See above for what it actually does. That is a copy/paste from the
> code (there, $taxon_name == ScientificName). When it finds a species
> rank it does that split because in the
> ncbi taxonomy database the 'genus' rank for a human has a ScientificName
> of 'Homo', whilst the 'species' rank has a ScientificName of 'Homo
> sapiens', and the bioperl model (quite rightly, I think) wants the
> 'species' node to not have information of other nodes (well, except for
> the classification array). So it removes the 'Homo' from 'Homo sapiens'
> giving a species name of 'sapiens'. This then allows the binomial method
> to return 'Homo sapiens' instead of 'Homo Homo sapiens'.
>
> (though in a bizarre twist, and this is one of my problems with how
> names are currently represented in the Taxonomy modules, 'Scientific
> Name' and 'binomial' are synonymous)

Ah, now I see. That's a bit screwy, but it's not on our end so we have to deal with it. I also noticed that the subspecies node also contains the entire string:

<TaxId>135461</TaxId>
<ScientificName>Bacillus subtilis subsp. subtilis</ScientificName>
<Rank>subspecies</Rank>

As for the 'scientific_name' method when accessed through Bio::DB::Taxonomy, almost every time I don't get the actual scientific name for the node (from the GenBank ORGANISM line); I get the name with the strain chopped off instead, and a number of times the names get mangled. The regexes below only grab from the topmost tags:

Script:
---------------------------------
#! perl
use strict;
use warnings;
use Bio::DB::Taxonomy;

# XML file saved from an NCBI Taxonomy query
my $file = shift @ARGV;

print "\nNCBI XML output ScientificName tag for each node:\n";
my @taxid = ();
open (TAXFILE, "<$file") or die "Cannot open $file: $!";
while (<TAXFILE>) {
    # only match the top-level tags (indented by two spaces)
    if (/^\s{2}<TaxId>(\d+)<\/TaxId>/) {
        print "$1\t";
        push @taxid, $1;
    }
    print "$1\n" if /^\s{2}<ScientificName>(.*)<\/ScientificName>/;
}
close TAXFILE;

print "\nBio::DB::Taxonomy scientific_name:\n";
for my $id (@taxid) {
    my $factory = Bio::DB::Taxonomy->new(-source => 'entrez');
    my $node = $factory->get_Taxonomy_Node(-taxonid => $id);
    print $node->ncbi_taxid, "\t", $node->scientific_name, "\n";
}
---------------------------------

Output:
---------------------------------
NCBI XML output ScientificName tag for each node:
191218  Bacillus anthracis str. A2012
198094  Bacillus anthracis str. Ames
222523  Bacillus cereus ATCC 10987
224308  Bacillus subtilis subsp. subtilis str. 168
226186  Bacteroides thetaiotaomicron VPI-5482
226900  Bacillus cereus ATCC 14579
246194  Carboxydothermus hydrogenoformans Z-2901
260799  Bacillus anthracis str. Sterne
261594  Bacillus anthracis str. 'Ames Ancestor'
264462  Bdellovibrio bacteriovorus HD100
272558  Bacillus halodurans C-125
272559  Bacteroides fragilis NCTC 9343
279010  Bacillus licheniformis ATCC 14580
281309  Bacillus thuringiensis serovar konkukian str. 97-27
288681  Bacillus cereus E33L
295405  Bacteroides fragilis YCH46
66692   Bacillus clausii KSM-K16
76114   Azoarcus sp. EbN1

Bio::DB::Taxonomy scientific_name:
191218  Bacillus cereus group anthracis
198094  Bacillus cereus group anthracis
222523  Bacillus cereus group cereus
224308  subtilis Bacillus subtilis subsp. subtilis
226186  Bacteroides thetaiotaomicron
226900  Bacillus cereus group cereus
246194  Carboxydothermus hydrogenoformans
260799  Bacillus cereus group anthracis
261594  Bacillus cereus group anthracis
264462  Bdellovibrio bacteriovorus
272558  Bacillus halodurans
272559  Bacteroides fragilis
279010  Bacillus licheniformis
281309  Bacillus cereus group thuringiensis
288681  Bacillus cereus group cereus
295405  Bacteroides fragilis
66692   Bacillus clausii
76114   Azoarcus sp.
---------------------------------

Note Bacillus subtilis in the Bio::Tax output above. Not one of those is the scientific name as defined by NCBI (and most taxonomists for that matter).

So, in a nutshell, there's a problem here. I don't know if your fix works for that, but I definitely don't think the 'scientific name' should be assembled ad hoc but should be taken from the tagname for that node. I am currently reduced to grabbing the feature primary_tagged 'source' and getting the 'organism' tagname from that. I cannot stress enough that it should NOT be that way.

As for 'binomial' == 'scientific_name', I agree; I see it as well and that should be fixed.

...
> Perhaps, but again I'm not sure what this has to do with what I was
> saying. If you don't want your species name to contain your genus name
> you have to do some kind of parsing. My post merely pointed out that the
> parsing currently in bioperl does not work for viruses and possibly
> other species. I'd like to think that someone cares about this error and
> would do the simple fix I offered, or that they already know about the
> problem and have done their own fix.

Again me going off-topic, so my apologies; it's more to do with my frustrations with Bio::Species (not Bio::DB::Taxonomy). My point here was, since there is no real way to surmise from a GenBank flatfile what the taxonomic ranks are w/o guessing (which seems to break more often than not when dealing with complex names), there shouldn't be any tie to Bio::Tax objects, at least directly.
I guess methods could be incorporated into Bio::Species for those who want to give it a try, but I would like to get a GenBank file, for once, in which the scientific name/binomial name isn't mangled by Bio::Species. Back to Bio::DB::Taxonomy; I don't have a problem with implementing your methods here; on the contrary, if they fix my problem above then I'll be more than glad to. I can't get to it immediately but maybe later today/tomorrow. > > I'm also not sure that forcing a lookup for every TaxID in every > > sequence every time it's passed through SeqIO is the best way to go > > either, though I think it should be required for storing sequences. > > It's a tricky balance. > > In my own implementation any database lookups are cached, and you have > the option of not doing any database lookup at all and 'faking' a > taxonomy from the supplied list of names (so it works just like normal > Bio::Seq). > > > > I still think that maybe we should absolve ourselves from using > > SOURCE/ORGANISM or OS/OC information in GenBank files as anything > > more than strictly annotation, or reconstruct Bio::Species to maybe a > > Bio::Annotation::Species object to handle that annotation and either > > deprecate Bio::Species or separate it completely from any > > Bio::Taxonomy objects. It would really simplify things. Then, if > > anyone is interested in taxonomy, either install a local database or > > use Entrez efetch, and then use Bio::DB::Taxonomy (fixed of course) > > to grab the TaxID info. > > My personal view is that having it as an annotation would serve no real > purpose. For me the whole point of any kind of species representation in > bioperl is to allow you to compare species in a biologically meaningful > way. 
If it's just some annotation then that means it's basically > free-form text and you have no guarantee that two sequences from the > same species are annotated exactly the same - no guarantee that your > code would identify that those sequences are from the same species. > The only other useful thing that a species object needs to do it let you > know how related two different species are - you need to be able to ask > what a species' class, kingdom etc. are. Again, not viable with an > annotation - you need something strict like a properly constructed > Taxonomy. My point is, a large number of users do NOT use, nor care about, taxonomic information to the degree they need to know the entire classification of the organism; many are just as happy about getting the scientific name only, which is in the GenBank/EMBL file itself. To take one extreme, it is not productive to force every user to download the NCBI tax database and use lookups just to convert sequences from EMBL format to GenBank format. It's not productive to allow users to spam the NCBI tax database remotely either, so hardcoding lookups is, IMHO, a big mistake. > I guess it comes down to the philosophy of parsing a file. Do you try > and reflect exactly what the file contains, letter for letter, so that > your resulting object can recreate that file letter for letter, or do > you parse the file and extract the correct /meaning/ in order to be more > useful? > I think there can be a choice by the user, and this is best done by > making Bio::Species a clever wrapper around an improved Bio::Taxonomy, > as in my own implementation. I understand both philosophies, but the latter implies that you know the intention of the ones submitting the sequence. 99.9% of the time that's fine, something I can live with. 
However, when we mess up something as simple as getting the scientific name for an organism when the information is directly in the flat file (ORGANISM line) by trying to 'imply' what the classification is, yes, I get frustrated. Even more frustrating to me is that Bio::DB::Taxonomy, which should return accurate information directly from the Taxonomy database, still manages to screw up the scientific name. The NCBI definition in the sample record: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html states that the ORGANISM line contains the formal scientific name and its lineage (no ranking). If the lineage is very long it is abbreviated so you don't get the same thing as you would through using TaxID. So, in essence, I believe you are correct, that Bio::Species can be used as a 'wrapper' for Bio::Taxonomy objects, but only up to a certain degree with caveats or warnings for possible inaccuracies. I also believe that lookups should be allowed but optional, not required (i.e. left up to the user, as you state). I just feel that it's somewhat misleading to imply, by delegating to Bio::Taxonomy, that Bio::Species contains accurate taxonomic information when NCBI themselves state that the GenBank flatfile classification can be incomplete and does not supply rankings (genus, species) in the file. It's our best guess in most cases, and a best guess by definition is not very accurate. If you want taxonomic accuracy, use the TaxID and a local tax database. I feel that we shouldn't punish those who don't worry/care about taxonomy by implementing Bio::Species with methods that mangle data that's directly in the flat file they're parsing. Okay, not to cut short this discussion, but I have to get back to $job. I'll try adding your fixes in a bit later today/tomorrow; if they pass tests I'll commit them in.
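The "lookups allowed but optional, and cached when used" position can be sketched as follows. This is plain Perl under stated assumptions: `make_resolver` and the `%fake_db` stand-in are hypothetical names for illustration, not BioPerl API, and a real lookup would call into a local or remote taxonomy database.

```perl
use strict;
use warnings;

# Build a resolver. $lookup is an optional code ref (hypothetical) that maps
# a TaxID to a scientific name, e.g. by querying a taxonomy database.
sub make_resolver {
    my ($lookup) = @_;
    my %cache;
    return sub {
        my ($taxid, $flatfile_name) = @_;
        # No lookup configured: trust the name found in the flat file.
        return $flatfile_name unless $lookup;
        # Query the database at most once per TaxID.
        $cache{$taxid} = $lookup->($taxid) unless exists $cache{$taxid};
        # Fall back to the flat-file name if the database has no answer.
        return defined $cache{$taxid} ? $cache{$taxid} : $flatfile_name;
    };
}

# Stand-in for a real database; counts how often it is actually queried.
my $queries = 0;
my %fake_db = (224308 => 'Bacillus subtilis subsp. subtilis str. 168');
my $with_db = make_resolver(sub { $queries++; return $fake_db{ $_[0] } });
my $offline = make_resolver();

print $with_db->(224308, 'Bacillus subtilis'), "\n" for 1 .. 3;
print "database queried $queries time(s)\n";
print $offline->(224308, 'Bacillus subtilis'), "\n";
```

A user who only converts between flat-file formats would use the offline resolver and never touch the database; a user who needs taxonomic accuracy opts in to the lookup.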
Chris From hlapp at gmx.net Mon May 15 16:59:06 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 15 May 2006 12:59:06 -0400 Subject: [Bioperl-l] error loading uniprot release 49.6 into mysql In-Reply-To: <051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net> References: <051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net> Message-ID: You found the right instance. Unfortunately with the way the bioperl swissprot parser works the group (RG) isn't promoted to author if there is no author in addition (in fact you may debate whether that would even be the best way of doing things), so it doesn't find it on second occurrence by unique key. If you can live without this entry, or any other entry that causes a hiccup, just supply the flag --safe and it will gracefully move on to the next entry. Fixing the issue would require either to fix the bioperl swissprot parser (or Bio::Annotation::Reference) to stick the RG group into the author slot if there is no author, or to fix Bioperl Bio::Annotation::Reference to also feature a group and biosql to use it in place of a missing author. Actually there is $reference->rg. Maybe Bioperl-db (and hence Biosql) should just use that in place of a missing author? The downside is that upon round-tripping an entry, the RG annotation line will become an RA annotation line. How bad would that be? Any thoughts from anyone? -hilmar On May 15, 2006, at 8:34 AM, s.rayner at att.net wrote: > I found where the script is hiccuping.... > > The Uniprot release contains lines with identical annotation for > the RL keyword for two different sequences. > > ___________________ > > First occurence... > ___________________ > > ID 1433T_PONPY STANDARD; PRT; 245 AA. > AC Q5RFJ2; Q5RDK2; > DT 05-JUL-2005, integrated into UniProtKB/Swiss-Prot. > DT 05-JUL-2005, sequence version 2. > DT 18-APR-2006, entry version 13. > DE 14-3-3 protein theta. > GN Name=YWHAQ; > OS Pongo pygmaeus (Orangutan). 
> OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; > OC Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; > OC Catarrhini; Hominidae; Pongo. > OX NCBI_TaxID=9600; > RN [1] > RP NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA]. > RC TISSUE=Brain cortex, and Kidney; > RG The German cDNA consortium; > RL Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases. > <====== Not Unique > > > ___________________ > > Second occurence... > ___________________ > > > ID 1433G_PONPY STANDARD; PRT; 246 AA. > AC Q5RC20; > DT 05-JUL-2005, integrated into UniProtKB/Swiss-Prot. > DT 05-JUL-2005, sequence version 2. > DT 18-APR-2006, entry version 13. > DE 14-3-3 protein gamma. > GN Name=YWHAG; > OS Pongo pygmaeus (Orangutan). > OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; > OC Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; > OC Catarrhini; Hominidae; Pongo. > OX NCBI_TaxID=9600; > RN [1] > RP NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA]. > RC TISSUE=Heart; > RG The German cDNA consortium; > RL Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases. > <====== Not Unique > > > > in these two cases the generated CRC key is identical and so MySQL > throws a wobbly. > > if i look at the MySQL entry in the REFERENCE table for the first > sequence > ------+-------+---------+----------------------+ > | 139 | NULL | Submitted (NOV-2004) to the EMBL/ > GenBank/DDBJ databases. | NULL | NULL | CRC-E7973FEA4B5611DC | > +--------------+----------- > +---------------------------------------------------- > > and the error when the script choked was > > MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, > values were > ("","","Submitted (NOV-2004) to the EMBL/GenBank/DDBJ > databases.","CRC-E7973FEA4B5611DC","","","") FKs ( Duplicate entry 'CRC-E7973FEA4B5611DC' for key 3 > > hence the problem. > > I'm guessing i'm not the first person to encounter this, but dont > see any hints for an easy way around this. > > any suggestions....? 
> > ta > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Mon May 15 17:01:14 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 15 May 2006 13:01:14 -0400 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: <4466AD7F.6050700@campus.iztacala.unam.mx> References: <4466AD7F.6050700@campus.iztacala.unam.mx> Message-ID: <068E49BD-2DE4-47BA-BD7C-D6FD487DF095@gmx.net> Hey, thanks to Laura & David for this interface. Any idea why most of the Bio::Ontology::* modules show up without their leading Bio::Ontology? And clicking on those hyperlinks doesn't go anywhere either ... Anything different with those modules that I can fix? -hilmar On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote: > I'm glad to announce the availability of the Deobfuscator interface at > the BioPerl website. You can use it at the following URL: > > http://bioperl.org/cgi-bin/deob_interface.cgi > > Many thanks to Laura Kavanaugh and David Messina for this great > contribution to the BioPerl project! > > Mauricio. 
> > -- > MAURICIO HERRERA CUADRA > arareko at campus.iztacala.unam.mx > Laboratorio de Genética > Unidad de Morfofisiología y Función > Facultad de Estudios Superiores Iztacala, UNAM > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Mon May 15 17:22:13 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 15 May 2006 12:22:13 -0500 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: <068E49BD-2DE4-47BA-BD7C-D6FD487DF095@gmx.net> Message-ID: <000301c67844$1b506280$15327e82@pyrimidine> That's strange. Clicking on the list gives me the results for that module. When I click on the hyperlinks in the results section they open fine; the method column links open a new page containing usage-function-returns-args and the class column links open pdoc (same page) for bioperl-live. I'm using Firefox 1.5 on WinXP. Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > Sent: Monday, May 15, 2006 12:01 PM > To: Mauricio Herrera Cuadra > Cc: bioperl-l > Subject: Re: [Bioperl-l] Deobfuscator interface now available > > Hey, thanks to Laura & David for this interface. > > Any idea why most of the Bio::Ontology::* modules show up without > their leading Bio::Ontology? And clicking on those hyperlinks doesn't > go anywhere either ... Anything different with those modules that I > can fix? > > -hilmar > > On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote: > > > I'm glad to announce the availability of the Deobfuscator interface at > > the BioPerl website.
You can use it at the following URL: > > > > http://bioperl.org/cgi-bin/deob_interface.cgi > > > > Many thanks to Laura Kavanaugh and David Messina for this great > > contribution to the BioPerl project! > > > > Mauricio. > > > > -- > > MAURICIO HERRERA CUADRA > > arareko at campus.iztacala.unam.mx > > Laboratorio de Genética > > Unidad de Morfofisiología y Función > > Facultad de Estudios Superiores Iztacala, UNAM > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sb at mrc-dunn.cam.ac.uk Mon May 15 18:00:15 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Mon, 15 May 2006 19:00:15 +0100 Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names In-Reply-To: <001601c67839$cf289490$15327e82@pyrimidine> References: <001601c67839$cf289490$15327e82@pyrimidine> Message-ID: <4468C1AF.9080400@mrc-dunn.cam.ac.uk> Chris Fields wrote: > > Ah, now I see. That's a bit screwy, but it's not on our end so we have to > deal with it. I also noticed that subspecies also contains the entire > string: > > > <TaxId>135461</TaxId> > <ScientificName>Bacillus subtilis subsp. subtilis</ScientificName> > <Rank>subspecies</Rank> > Yes, this is one of the problems I mentioned in the first post to this thread. > As for the 'scientific_name' method when accessed through Bio::DB::Taxonomy, > I don't get the actual scientific name for the node (from the GenBank > ORGANISM line) almost every time; I get the name with the strain chopped off > instead and a number of times the names get mangled.
[snip, should be:]
> 224308 Bacillus subtilis subsp. subtilis str. 168
> 281309 Bacillus thuringiensis serovar konkukian str. 97-27
[snip, but Bio::DB::Taxonomy gives:]
> 224308 subtilis Bacillus subtilis subsp. subtilis
> 281309 Bacillus cereus group thuringiensis
[snip]
> So, in a nutshell, there's a problem here. I don't know if your fix works
> for that, but I definitely don't think the 'scientific name' should be
> assembled ad hoc but should be taken from the tagname for that node.

Yes, my implementation will get you the correct answer, but not quite as you say. My solution was to munge the actual ScientificName but 'ensure' that the binomial would give you back the actual binomial name you wanted - which is the intent of current Bio::DB::Taxonomy code.

  my $species0 = TFBS::Species->new(-ncbi_taxid => 224308);
  my $leaf_node = $species0->taxonomy->get_leaves();
  print "sci_name of Node = '", $leaf_node->scientific_name, "'\n";
  print "Species0 subspecies = '", $species0->subspecies, "'\n";
  print "Species0 variants = '", scalar($species0->variant), "'\n";
  print "Species0 binomial = '", $species0->binomial('FULL'), "'\n";

gives:

  sci_name of Node = 'str. 168'
  Species0 subspecies = 'subsp. subtilis'
  Species0 variants = 'str. 168'
  Species0 binomial = 'Bacillus subtilis subsp. subtilis str. 168'

and the same again for id 281309:

  sci_name of Node = 'str. 97-27'
  Species0 subspecies = ''
  Species0 variants = 'serovar konkukian str. 97-27'
  Species0 binomial = 'Bacillus thuringiensis serovar konkukian str. 97-27'

I've done it this way because even though strictly speaking the ScientificName for 224308 (a 'no rank') is 'Bacillus subtilis subsp. subtilis str. 168', when I ask for the variant I don't want that whole string. I just want the bit that will be different when comparing other strains of this subspecies of this species of Bacillus. I want 'str. 168'.
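
To make the idea concrete, here is a minimal sketch. It is not the actual TFBS code: it only assumes that each taxon keeps the rank-specific part of its name rather than the full ScientificName, and both the %taxa layout and the full_binomial() helper are made up purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical data layout (an assumption for this sketch): each taxon
# stores only the part of the name that is specific to each rank.
my %taxa = (
    224308 => {
        genus      => 'Bacillus',
        species    => 'subtilis',
        subspecies => 'subsp. subtilis',
        variant    => 'str. 168',
    },
    281309 => {
        genus      => 'Bacillus',
        species    => 'thuringiensis',
        subspecies => '',
        variant    => 'serovar konkukian str. 97-27',
    },
);

# full_binomial() is a made-up helper, not a BioPerl or TFBS method.
# It joins the non-empty rank-specific parts back into one string.
sub full_binomial {
    my ($taxon) = @_;
    return join ' ',
        grep { defined($_) && length($_) }
        map  { $taxon->{$_} } qw(genus species subspecies variant);
}

for my $id (sort { $a <=> $b } keys %taxa) {
    print "$id\t", full_binomial($taxa{$id}), "\n";
}
```

Joining the parts this way reproduces the two full names listed above, while a request for just the variant can still return only 'str. 168'.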

Note that my objects never store the original ScientificName; it is due to 'luck' (or as I like to think, a good implementation) that the binomial method is able to reconstruct a string that is identical to what the original ScientificName was.

If you'd like to see my code let me know. You can't just drop the code snippet I posted in this thread into existing bioperl modules; quite a bit else has to change as well. I'll have to make an updated taxonomy_the_tfbs_way.tar.gz file available if you want an example implementation; the current version of that file is now out of date - it doesn't do any of what I describe above.

From hlapp at gmx.net Mon May 15 18:08:49 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 14:08:49 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000301c67844$1b506280$15327e82@pyrimidine>
References: <000301c67844$1b506280$15327e82@pyrimidine>
Message-ID:

Safari or Firefox on MacOSX don't do this. Note that the appearance in the browsable list is already different (the prefix is missing), and the JavaScript link also lacks the prefix in the module name in contrast to others, e.g., Bio::Ontology::Ontology (which is one of the few Bio::Ontology exceptions that do work and do display correctly).

I suppose there is something peculiar about the code formatting of those modules? Some of the modules under Bio::OntologyIO are also affected BTW.

What happens is that after you click on the link the page appears to reload (i.e., gets submitted) but the second table that is supposed to open underneath the first doesn't appear. However, the sort-by drop-down selector does appear.

-hilmar

On May 15, 2006, at 1:22 PM, Chris Fields wrote:

> That's strange. Clicking on the list gives me the results for that
> module.
> When I click on the hyperlinks in the results section they open > fine; the > method column links opens a new page containing usage-function- > returns-args > and the class column links opens pdoc (same page) for bioperl- > live. I'm > using Firefox 1.5 on WinXP. > > Chris > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp >> Sent: Monday, May 15, 2006 12:01 PM >> To: Mauricio Herrera Cuadra >> Cc: bioperl-l >> Subject: Re: [Bioperl-l] Deobfuscator interface now available >> >> Hey, thanks to Laura & David for this interface. >> >> Any idea why most of the Bio::Ontology::* modules show up without >> their leading Bio::Ontology? And clicking on those hyperlinks doesn't >> go anywhere either ... Anything different with those modules that I >> can fix? >> >> -hilmar >> >> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote: >> >>> I'm glad to announce the availability of the Deobfuscator >>> interface at >>> the BioPerl website. You can use it at the following URL: >>> >>> http://bioperl.org/cgi-bin/deob_interface.cgi >>> >>> Many thanks to Laura Kavanaugh and David Messina for this great >>> contribution to the BioPerl project! >>> >>> Mauricio. 
>>> >>> -- >>> MAURICIO HERRERA CUADRA >>> arareko at campus.iztacala.unam.mx >>> Laboratorio de Gen?tica >>> Unidad de Morfofisiolog?a y Funci?n >>> Facultad de Estudios Superiores Iztacala, UNAM >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Mon May 15 19:07:59 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 15 May 2006 14:07:59 -0500 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: Message-ID: <000501c67852$e1bb55c0$15327e82@pyrimidine> I'll have to give it a try on Mac OS X (we have an ancient G4 in the lab which I can try it on). I'll let you know what I find. 

This is what I get when I do a search for 'Bio::Ont*' using Firefox on WinXP and this Deobfuscator link (http://bioperl.org/cgi-bin/deob_interface.cgi?); all the classes have links that work (I added newline and tab to make it a bit more readable):

Bio::OntologyIO
    Parser factory for Ontology formats
Bio::OntologyIO::Handlers::BaseSAXHandler
    no short description available
Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler
    no short description available
Bio::Ontology::OntologyI
    Interface for an ontology implementation
Bio::Ontology::TermFactory
    Instantiates a new Bio::Ontology::TermI (or derived class) through a factory
Bio::Ontology::OntologyStore
    A repository of ontologies
Bio::Ontology::RelationshipFactory
    Instantiates a new Bio::Ontology::RelationshipI (or derived class) through a factory
Bio::Ontology::Ontology
    standard implementation of an Ontology

So the names seem fine here.

When I click on a class (Bio::Ontology::Ontology) I get in the results section:

Method                 Class                           Returns         Usage
add_relationship       Bio::Ontology::Ontology         Its argument.   add_relationship(RelationshipI relationship): RelationshipI
add_relationship_type  Bio::Ontology::OntologyEngineI  not documented  not documented
add_term               Bio::Ontology::Ontology         its argument.   add_term(TermI term): TermI

....and so on

Where each method is clickable and opens a new page containing a table:

Bio::Ontology::Ontology::add_relationship
Usage     add_relationship(RelationshipI relationship): RelationshipI
Function  Adds a relationship object to the ontology engine.
Returns   Its argument.
Args      A RelationshipI object.

Each class is also linked to the bioperl-live PDOC.
Clicking on class Bio::Ontology::Ontology in the results table gets me this page (no new page): http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html Chris > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: Monday, May 15, 2006 1:09 PM > To: Chris Fields > Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' > Subject: Re: [Bioperl-l] Deobfuscator interface now available > > Safari or Firefox on MacOSX don't do this. Note that the appearance > in the browsable list is already different (the prefix is missing), > and the JavaScript link also lacks the prefix in the module name in > contrast to others, e.g., Bio::Ontology::Ontology (which is one of > the few Bio::Ontology exceptions that do work and do display correctly). > > I suppose there is something peculiar about the code formatting of > those modules? Some of the modules under Bio::OntologyIO are also > affected BTW. > > What happens is after you click on the link the page apppears to > reload (i.e., gets submitted) but the second table that is supposed > open underneath the first doesn't appear. However, the sort-by drop > down selector does appear. > > -hilmar > > On May 15, 2006, at 1:22 PM, Chris Fields wrote: > > > That's strange. Clicking on the list gives me the results for that > > module. > > When I click on the hyperlinks in the results section they open > > fine; the > > method column links opens a new page containing usage-function- > > returns-args > > and the class column links opens pdoc (same page) for bioperl- > > live. I'm > > using Firefox 1.5 on WinXP. > > > > Chris > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > >> Sent: Monday, May 15, 2006 12:01 PM > >> To: Mauricio Herrera Cuadra > >> Cc: bioperl-l > >> Subject: Re: [Bioperl-l] Deobfuscator interface now available > >> > >> Hey, thanks to Laura & David for this interface. 
> >> > >> Any idea why most of the Bio::Ontology::* modules show up without > >> their leading Bio::Ontology? And clicking on those hyperlinks doesn't > >> go anywhere either ... Anything different with those modules that I > >> can fix? > >> > >> -hilmar > >> > >> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote: > >> > >>> I'm glad to announce the availability of the Deobfuscator > >>> interface at > >>> the BioPerl website. You can use it at the following URL: > >>> > >>> http://bioperl.org/cgi-bin/deob_interface.cgi > >>> > >>> Many thanks to Laura Kavanaugh and David Messina for this great > >>> contribution to the BioPerl project! > >>> > >>> Mauricio. > >>> > >>> -- > >>> MAURICIO HERRERA CUADRA > >>> arareko at campus.iztacala.unam.mx > >>> Laboratorio de Gen?tica > >>> Unidad de Morfofisiolog?a y Funci?n > >>> Facultad de Estudios Superiores Iztacala, UNAM > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >> > >> -- > >> =========================================================== > >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > >> =========================================================== > >> > >> > >> > >> > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From cjfields at uiuc.edu Mon May 15 19:12:34 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 15 May 2006 14:12:34 -0500 Subject: [Bioperl-l] Deobfuscator interface now available Message-ID: <000601c67853$85d49cc0$15327e82@pyrimidine> I just tried the same thing (links, search, etc) with Mac OS X v 
10.3.9 and Safari (no Firefox sorry) and it worked fine as well (all links, no missing Bio::Ontology, etc). Not sure what it could be... Chris > -----Original Message----- > From: Chris Fields [mailto:cjfields at uiuc.edu] > Sent: Monday, May 15, 2006 2:08 PM > To: 'Hilmar Lapp' > Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' > Subject: RE: [Bioperl-l] Deobfuscator interface now available > > I'll have to give it a try on Mac OS X (we have an ancient G4 in the lab > which I can try it on). I'll let you know what I find. > > This is what I get when I do a search for 'Bio::Ont*' using Firefox on > WinXP and this Deobfuscator link (http://bioperl.org/cgi- > bin/deob_interface.cgi?); all the classes have links that work (I added > newline and tab to make it a bit more readable) : > > Bio::OntologyIO > Parser factory for Ontology formats > Bio::OntologyIO::Handlers::BaseSAXHandler > no short description available > Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler > no short description available > Bio::Ontology::OntologyI > Interface for an ontology implementation > Bio::Ontology::TermFactory > Instantiates a new Bio::Ontology::TermI (or derived class) through a > factory > Bio::Ontology::OntologyStore > A repository of ontologies > Bio::Ontology::RelationshipFactory > Instantiates a new Bio::Ontology::RelationshipI (or derived class) > through a factory > Bio::Ontology::Ontology > standard implementation of an Ontology > > So the names seem fine here. > > When I click on a class (Bio::Ontology::Ontology) I get in the results > section: > > Method Class Returns > Usage > add_relationship Bio::Ontology::Ontology Its > argument. add_relationship(RelationshipI relationship): RelationshipI > add_relationship_type Bio::Ontology::OntologyEngineI not > documented not documented > add_term Bio::Ontology::Ontology its > argument. 
add_term(TermI term): TermI > > ....and so on > > Where each method is clickable and opens a new page containing a table: > > Bio::Ontology::Ontology::add_relationship > Usage add_relationship(RelationshipI relationship): RelationshipI > Function Adds a relationship object to the ontology engine. > Returns Its argument. > Args A RelationshipI object. > > > Each class is also linked to the bioperl-live PDOC. Clicking on class > Bio::Ontology::Ontology in the results table gets me this page (no new > page): > > http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html > > > Chris > > > -----Original Message----- > > From: Hilmar Lapp [mailto:hlapp at gmx.net] > > Sent: Monday, May 15, 2006 1:09 PM > > To: Chris Fields > > Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' > > Subject: Re: [Bioperl-l] Deobfuscator interface now available > > > > Safari or Firefox on MacOSX don't do this. Note that the appearance > > in the browsable list is already different (the prefix is missing), > > and the JavaScript link also lacks the prefix in the module name in > > contrast to others, e.g., Bio::Ontology::Ontology (which is one of > > the few Bio::Ontology exceptions that do work and do display correctly). > > > > I suppose there is something peculiar about the code formatting of > > those modules? Some of the modules under Bio::OntologyIO are also > > affected BTW. > > > > What happens is after you click on the link the page apppears to > > reload (i.e., gets submitted) but the second table that is supposed > > open underneath the first doesn't appear. However, the sort-by drop > > down selector does appear. > > > > -hilmar > > > > On May 15, 2006, at 1:22 PM, Chris Fields wrote: > > > > > That's strange. Clicking on the list gives me the results for that > > > module. 
> > > When I click on the hyperlinks in the results section they open > > > fine; the > > > method column links opens a new page containing usage-function- > > > returns-args > > > and the class column links opens pdoc (same page) for bioperl- > > > live. I'm > > > using Firefox 1.5 on WinXP. > > > > > > Chris > > > > > >> -----Original Message----- > > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp > > >> Sent: Monday, May 15, 2006 12:01 PM > > >> To: Mauricio Herrera Cuadra > > >> Cc: bioperl-l > > >> Subject: Re: [Bioperl-l] Deobfuscator interface now available > > >> > > >> Hey, thanks to Laura & David for this interface. > > >> > > >> Any idea why most of the Bio::Ontology::* modules show up without > > >> their leading Bio::Ontology? And clicking on those hyperlinks doesn't > > >> go anywhere either ... Anything different with those modules that I > > >> can fix? > > >> > > >> -hilmar > > >> > > >> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote: > > >> > > >>> I'm glad to announce the availability of the Deobfuscator > > >>> interface at > > >>> the BioPerl website. You can use it at the following URL: > > >>> > > >>> http://bioperl.org/cgi-bin/deob_interface.cgi > > >>> > > >>> Many thanks to Laura Kavanaugh and David Messina for this great > > >>> contribution to the BioPerl project! > > >>> > > >>> Mauricio. 
> > >>> > > >>> -- > > >>> MAURICIO HERRERA CUADRA > > >>> arareko at campus.iztacala.unam.mx > > >>> Laboratorio de Gen?tica > > >>> Unidad de Morfofisiolog?a y Funci?n > > >>> Facultad de Estudios Superiores Iztacala, UNAM > > >>> > > >>> _______________________________________________ > > >>> Bioperl-l mailing list > > >>> Bioperl-l at lists.open-bio.org > > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > >>> > > >> > > >> -- > > >> =========================================================== > > >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > > >> =========================================================== > > >> > > >> > > >> > > >> > > >> > > >> > > >> _______________________________________________ > > >> Bioperl-l mailing list > > >> Bioperl-l at lists.open-bio.org > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > -- > > =========================================================== > > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > > =========================================================== > > > > > > From arareko at campus.iztacala.unam.mx Mon May 15 19:20:10 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Mon, 15 May 2006 14:20:10 -0500 Subject: [Bioperl-l] Deobfuscator interface now available In-Reply-To: <000901c67827$d99eabb0$15327e82@pyrimidine> References: <000901c67827$d99eabb0$15327e82@pyrimidine> Message-ID: <4468D46A.8070203@campus.iztacala.unam.mx> Laura and Dave would be very happy to see all of your comments/suggestions/enhancements/complaints summarized in the appropriate wiki page. Just be sure to sign them properly with your name and date: http://bioperl.org/wiki/Deobfuscator I think they'll have to discuss which features will be nice to implement and which don't, depending on the direction they want their project to go. But don't worry, they're extremely nice people who are open to all kind of ideas. 
Best of all: the Deobfuscator is open-source so everyone is invited to contribute to it, just ask them for the code :)

On my side, I'm working on tweaking the code so it will be able to browse different BioPerl packages (core, run, ext) and their respective releases (stable, developer, cvs).

Regards,
Mauricio.

Chris Fields wrote:
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>> Sent: Monday, May 15, 2006 8:09 AM
>> To: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Deobfuscator interface now available
>>
>> Amir Karger wrote:
>>> This tool is quite nice, and may save me a lot of perldoc'ing.
>> Yes, many thanks to everyone involved.
>
> The Deobfuscator currently indexes bioperl-1.4, so it's not completely
> up-to-date. I believe Mauricio and Dave may be working on updating to the
> newer versions and maybe bioperl-live, as well as getting the other bioperl
> packages up and running.
>
> For modules added after v1.4 I use the script in the FAQ question mentioned
> on the Deobfuscator wiki page to get up-to-date methods, then grab the
> ActiveState HTML'd perldocs pumped out when installing using PPM (I make a
> custom PPM/PPD file and install myself every once in a while):
>
> #!/usr/bin/perl -w
> use Class::Inspector;
> $class = shift || die "Usage: methods perl_class_name\n";
> eval "require $class";
> print join ("\n", sort @{Class::Inspector-
>
>>> A couple of minor interface thoughts.
>>>
>>> 1) There are quite a lot of methods for many of the classes. As such, I
>>> think I'll often want to browse through what's available in a class. But
>>> 60% or so of the screen real estate is used for "Enter a search
>>> string... OR select a class from the list". IMO, it would be better to
>>> have two pages, a search page and a result page.
It only takes a click >>> on Back (or a "new search" button) to get to a new search, and now you >>> can use your whole screen for reading your results. >> As the compromise it must be, I like the way it behaves. I don't like >> lots of windows. I especially don't like pop up windows. Right now when >> I'm using the bioperl docs I tend to have a whole bunch of tabs open to >> different class pages at once, so being able to see an overview all on >> one page in Deobfuscator is very nice. >> >> Further to that, I'd love it if clicking on a method name caused an >> in-place css(&|javascript) reveal (similar to how a well implemented >> drop down menu works in a website) rather than a new window opened. >> Alternatively, just have more columns in the results table, ie. usage, >> function, returns, args columns. I feel that opening a window for each >> method you want to understand is far too slow. > > Agreed. > >> I'd also really like a link to the code for the method as well. The >> bioperl docs are rarely complete enough that you can really understand >> what every method is supposed to do without looking at the code. > > The methods that pop up are in columns along with the class module that > implements the method. > > > If you click on that link you get PDOC documentation for the module which > includes most of the code (strangely, though Deobfuscator indexes bioperl > 1.4, the PDOC corresponds to bioperl-live). Is that what you meant, or > something a bit more detailed? > >>> 3) Minimalist is nice, but documentation is even nicer. It wasn't clear >>> to me that the search searches within class names rather than function >>> names. What I really want to know sometimes is which module has, say, >>> the revcom method in it. > > That's listed in the method results table (the next column has the module > with a link to the module's online docs). > > > Chris > > >> This would be a great feature to add. 
>>
>>
>> Another minor interface thought:
>> 6) Have a little more cell padding in all the tables. Things are just a
>> little too cramped and things start to look messy/ run into each other.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From hlapp at gmx.net Mon May 15 19:23:55 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 15:23:55 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000501c67852$e1bb55c0$15327e82@pyrimidine>
References: <000501c67852$e1bb55c0$15327e82@pyrimidine>
Message-ID: <57326DCD-D72B-4CED-801D-9E25609BF57C@gmx.net>

I wasn't using the search. It's in the scrollable table for browsing.

-hilmar

On May 15, 2006, at 3:07 PM, Chris Fields wrote:

> I'll have to give it a try on Mac OS X (we have an ancient G4 in
> the lab
> which I can try it on). I'll let you know what I find.
> > This is what I get when I do a search for 'Bio::Ont*' using Firefox > on WinXP > and this Deobfuscator link (http://bioperl.org/cgi-bin/ > deob_interface.cgi?); > all the classes have links that work (I added newline and tab to > make it a > bit more readable) : > > Bio::OntologyIO > Parser factory for Ontology formats > Bio::OntologyIO::Handlers::BaseSAXHandler > no short description available > Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler > no short description available > Bio::Ontology::OntologyI > Interface for an ontology implementation > Bio::Ontology::TermFactory > Instantiates a new Bio::Ontology::TermI (or derived class) through a > factory > Bio::Ontology::OntologyStore > A repository of ontologies > Bio::Ontology::RelationshipFactory > Instantiates a new Bio::Ontology::RelationshipI (or derived class) > through a factory > Bio::Ontology::Ontology > standard implementation of an Ontology > > So the names seem fine here. > > When I click on a class (Bio::Ontology::Ontology) I get in the results > section: > > Method Class > Returns > Usage > add_relationship Bio::Ontology::Ontology Its > argument. add_relationship(RelationshipI relationship): > RelationshipI > add_relationship_type Bio::Ontology::OntologyEngineI not > documented not documented > add_term Bio::Ontology::Ontology its > argument. add_term(TermI term): TermI > > ....and so on > > Where each method is clickable and opens a new page containing a > table: > > Bio::Ontology::Ontology::add_relationship > Usage add_relationship(RelationshipI relationship): RelationshipI > Function Adds a relationship object to the ontology engine. > Returns Its argument. > Args A RelationshipI object. > > > Each class is also linked to the bioperl-live PDOC. 
Clicking on class > Bio::Ontology::Ontology in the results table gets me this page (no new > page): > > http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html > > > Chris > >> -----Original Message----- >> From: Hilmar Lapp [mailto:hlapp at gmx.net] >> Sent: Monday, May 15, 2006 1:09 PM >> To: Chris Fields >> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l' >> Subject: Re: [Bioperl-l] Deobfuscator interface now available >> >> Safari or Firefox on MacOSX don't do this. Note that the appearance >> in the browsable list is already different (the prefix is missing), >> and the JavaScript link also lacks the prefix in the module name in >> contrast to others, e.g., Bio::Ontology::Ontology (which is one of >> the few Bio::Ontology exceptions that do work and do display >> correctly). >> >> I suppose there is something peculiar about the code formatting of >> those modules? Some of the modules under Bio::OntologyIO are also >> affected BTW. >> >> What happens is after you click on the link the page apppears to >> reload (i.e., gets submitted) but the second table that is supposed >> open underneath the first doesn't appear. However, the sort-by drop >> down selector does appear. >> >> -hilmar >> >> On May 15, 2006, at 1:22 PM, Chris Fields wrote: >> >>> That's strange. Clicking on the list gives me the results for that >>> module. >>> When I click on the hyperlinks in the results section they open >>> fine; the >>> method column links opens a new page containing usage-function- >>> returns-args >>> and the class column links opens pdoc (same page) for bioperl- >>> live. I'm >>> using Firefox 1.5 on WinXP. 
>>> >>> Chris >>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp >>>> Sent: Monday, May 15, 2006 12:01 PM >>>> To: Mauricio Herrera Cuadra >>>> Cc: bioperl-l >>>> Subject: Re: [Bioperl-l] Deobfuscator interface now available >>>> >>>> Hey, thanks to Laura & David for this interface. >>>> >>>> Any idea why most of the Bio::Ontology::* modules show up without >>>> their leading Bio::Ontology? And clicking on those hyperlinks >>>> doesn't >>>> go anywhere either ... Anything different with those modules that I >>>> can fix? >>>> >>>> -hilmar >>>> >>>> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote: >>>> >>>>> I'm glad to announce the availability of the Deobfuscator >>>>> interface at >>>>> the BioPerl website. You can use it at the following URL: >>>>> >>>>> http://bioperl.org/cgi-bin/deob_interface.cgi >>>>> >>>>> Many thanks to Laura Kavanaugh and David Messina for this great >>>>> contribution to the BioPerl project! >>>>> >>>>> Mauricio. 
>>>>>
>>>>> --
>>>>> MAURICIO HERRERA CUADRA
>>>>> arareko at campus.iztacala.unam.mx
>>>>> Laboratorio de Genética
>>>>> Unidad de Morfofisiología y Función
>>>>> Facultad de Estudios Superiores Iztacala, UNAM
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>
>>>> --
>>>> ===========================================================
>>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
>>>> ===========================================================
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> --
>> ===========================================================
>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
>> ===========================================================
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From ClarkeW at AGR.GC.CA Mon May 15 19:40:15 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Mon, 15 May 2006 15:40:15 -0400
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
Message-ID: <320530F83FA47047823E57F110DDEAADB159EC@onncrxms4.agr.gc.ca>

Hey everyone,

I have been developing some code to download and parse blast reports from a remote server using SOAP::Lite as well as insert the results into a MySQL database. The problem I am having is that my program seems to be taking up a huge amount of RAM. For a single job of 10000 queries it can consume as much as a couple hundred MB inside an hour.
I realize that a lot of work is being done but this seems like way too much. This leads me to the subject of my post. I think I may have traced the source of the memory leak to Bio::SearchIO. I have used Devel::Size to track the size of my variables and done other debugging steps and have had no luck with resolving this very frustrating problem. My code is as follows:

  # Fetch the raw BLAST report text for one query from the remote server
  my $result = $connector->getQueryResult($query_id);

  # Read the report from an in-memory filehandle
  open my $FH, "<", \$result
      or die "Cannot open in-memory filehandle: $!";
  my $searchio = Bio::SearchIO->new(-format => "blast", -fh => $FH);

  while (my $o_blast = $searchio->next_result()) {
      my $clone_id = $o_blast->query_name();
      my $statement = $bdbi->form_push_SQL($o_blast, $clone_id, 5);
      # ... remainder of the loop not included in this post
  }

This is just the leading and trailing code surrounding the use of Bio::SearchIO since there is quite a lot. I am mostly just wondering if anyone has ever had problems with SearchIO and its memory usage. I looked at the source code for it but am afraid it is out of my league. Any help/suggestions/questions would be great.

Thanks

From dmessina at wustl.edu Mon May 15 19:34:10 2006
From: dmessina at wustl.edu (David Messina)
Date: Mon, 15 May 2006 14:34:10 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000901c67827$d99eabb0$15327e82@pyrimidine>
References: <000901c67827$d99eabb0$15327e82@pyrimidine>
Message-ID:

Responding to:
>>> Amir Karger
>> Sendu Bala
> Chris Fields

> The Deobfuscator currently indexes bioperl-1.4, so it's not completely
> up-to-date. I believe Mauricio and Dave may be working on updating
> to the
> newer versions and maybe bioperl-live, as well as getting the other
> bioperl
> packages up and running.

That's correct -- Mauricio is currently working on a version that will allow you to search 1.4, 1.5.1, or bioperl-live. The Deobfuscator indexes will be updated (daily?) to keep them in sync with the CVS repository.

>>> A couple of minor interface thoughts.
>>>
>>> 1)There's quite a lot of methods for many of the classes.
>>> As such, I think I'll often want to browse through what's available in a class. But 60% or so of the screen real estate is used for "Enter a search string... OR select a class from the list". IMO, it would be better to have two pages, a search page and a result page. It only takes a click on Back (or a "new search" button) to get to a new search, and now you can use your whole screen for reading your results.
>>
>> As the compromise it must be, I like the way it behaves. I don't like lots of windows. I especially don't like pop up windows. Right now when I'm using the bioperl docs I tend to have a whole bunch of tabs open to different class pages at once, so being able to see an overview all on one page in Deobfuscator is very nice.

I think the current behavior makes sense as the default, but I like the idea of being able to view the search results in a separate window for easier browsing. Thanks for the suggestion; I'll add it to the list.

>> Further to that, I'd love it if clicking on a method name caused an in-place css(&|javascript) reveal (similar to how a well implemented drop down menu works in a website) rather than a new window opened. Alternatively, just have more columns in the results table, ie. usage, function, returns, args columns. I feel that opening a window for each method you want to understand is far too slow.
>
> Agreed.

Yeah, the way it currently works is admittedly lame, and was done as a placeholder until we figured out a better way to do it. An in-place reveal sounds like a good solution.

>>> 2) Please sort the "select a class from the list" alphabetically. I guess I can enter a search term to get the right classes, but it would be nice to be able to browse.

Agreed. I think we were doing this in an earlier test version, but I must have left it out of the release I handed off to Mauricio.

>>> 3) Minimalist is nice, but documentation is even nicer. It wasn't clear to me that the search searches within class names rather than function names. What I really want to know sometimes is which module has, say, the revcom method in it.
>>
>> This would be a great feature to add.

That's a great idea.

>>> 4) When I search for something that's not found, I get a screen that looks pretty familiar, with the extra text "No match to string found" down at the bottom. It took me a while to even notice it. (Studies show that most users don't read most of the text on a page.) Bold might be nice here. Or put the error at the top of the screen. Or both.

Added to the list.

>>> 5) I'll save my stupidest comment for last - please make the page title "Bioperl Deobfuscator", so that when I bookmark it I'll know what the bookmark stands for.

Added to the list. Not stupid, by the way -- much to my surprise, there are at least 2 or 3 other (obviously inferior :) ) deobfuscators floating around out there.

>> Another minor interface thought:
>> 6) Have a little more cell padding in all the tables. Things are just a little too cramped and things start to look messy/run into each other.

Added to the list.

Thanks to all of you for taking the time to give such detailed feedback -- it's really helpful. There is a wiki page on the BioPerl site for this project (http://www.bioperl.org/wiki/Deobfuscator), so I'll be putting your comments there for tracking and further discussion. Please feel free to add to it.

Dave

--
Dave Messina
WashU Genome Sequencing Center
dmessina at wustl.edu
314-286-1825

From faruque at ebi.ac.uk Mon May 15 19:47:27 2006
From: faruque at ebi.ac.uk (Nadeem Faruque)
Date: Mon, 15 May 2006 20:47:27 +0100
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names
Message-ID: <809AE0C7-A9B6-48A4-BF11-BA392C770CA9@ebi.ac.uk>

>> My personal view is that having it as an annotation would serve no real purpose.
>> For me the whole point of any kind of species representation in bioperl is to allow you to compare species in a biologically meaningful way. If it's just some annotation then that means it's basically

I understand the need to find the species name of entries, especially now that so many complete genomes have been given their own strain-specific tax nodes, and I also think it is a shame that the NCBI tax dump does not give a rank to entries such as these (they cannot easily be distinguished from unofficial ranks higher in the tree without ascending the tree).

Would it be useful for the species name to be included within EMBL file headers, e.g. in a line called OB (OB is a terrible suggestion based on 'Organism Binomial', since OS is already in use)?

Here are two examples of the species 'Apple stem grooving virus', where the second one would appear to be a different species without delving into the tax tree or the inclusion of an OB line:

AC   D14995; S47260;
DE   Apple stem grooving virus genome, complete sequence.
OS   Apple stem grooving virus
OB   Apple stem grooving virus
OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
OC   Capillovirus.

AC   AY646511;
DE   Citrus tatter leaf virus strain Kumquat 1, complete genome.
OS   Citrus tatter leaf virus
OB   Apple stem grooving virus
OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
OC   Capillovirus.

> My point is, a large number of users do NOT use, nor care about, taxonomic information to the degree they need to know the entire classification of the organism; many are just as happy about getting the scientific name only, which is in the GenBank/EMBL file itself. To take one extreme, it is not productive to force every user to download the NCBI tax database and use lookups just to convert sequences from EMBL format to GenBank format. It's not productive to allow users to spam the NCBI tax database remotely either, so hardcoding lookups is, IMHO, a big mistake.
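To make the OB suggestion above a little more concrete, here is a rough Perl sketch of how a consumer might group entries by species if such a line existed. It is only an illustration under stated assumptions: OB is the hypothetical line code proposed in this thread and is not part of the real EMBL format, the helper names `parse_embl_headers` and `group_by_species` are made up, and a record is assumed to start at each AC line because the examples here carry no ID or `//` lines.

```perl
#!/usr/bin/perl
# Sketch only: "OB" is the hypothetical line code proposed in this thread,
# not a real EMBL line type; the helper names are invented for illustration.
use strict;
use warnings;

# Split EMBL-style header text into records (hash of line code => value).
# Assumption: a new record starts at each AC line, as in the examples above.
sub parse_embl_headers {
    my ($text) = @_;
    my (@records, $current);
    for my $line (split /\n/, $text) {
        next unless $line =~ /^([A-Z]{2})\s+(.*\S)/;
        my ($code, $value) = ($1, $2);
        if ($code eq 'AC') {
            $current = {};
            push @records, $current;
        }
        next unless $current;
        # Repeated codes (e.g. OC continuation lines) are joined into one value.
        $current->{$code} = defined $current->{$code}
            ? "$current->{$code} $value"
            : $value;
    }
    return @records;
}

# Group primary accession numbers by species, preferring the hypothetical
# OB line and falling back to OS when OB is absent.
sub group_by_species {
    my @records = @_;
    my %accessions_for;
    for my $rec (@records) {
        my $species = $rec->{OB} // $rec->{OS} // 'unknown';
        my ($primary) = split /;/, $rec->{AC};
        push @{ $accessions_for{$species} }, $primary;
    }
    return %accessions_for;
}

my $sample = <<'EMBL';
AC   D14995; S47260;
DE   Apple stem grooving virus genome, complete sequence.
OS   Apple stem grooving virus
OB   Apple stem grooving virus
OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
OC   Capillovirus.
AC   AY646511;
DE   Citrus tatter leaf virus strain Kumquat 1, complete genome.
OS   Citrus tatter leaf virus
OB   Apple stem grooving virus
OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
OC   Capillovirus.
EMBL

my %accessions_for = group_by_species(parse_embl_headers($sample));
for my $species (sort keys %accessions_for) {
    print "$species: @{ $accessions_for{$species} }\n";
}
```

With the two sample records, both accessions end up under 'Apple stem grooving virus'; if the OB lines are removed, the fallback to OS splits them into two groups, which is the ambiguity the proposal is trying to avoid.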
I don't think you need to add any information to turn an embl-format file into a Genbank flatfile, but maybe I'm missing something obvious.

Nadeem

--
Dr S.M. Nadeem N. Faruque
9 Barley Court
Saffron Walden
Essex CB11 3HG
01799 500 120

From dmessina at wustl.edu Mon May 15 20:12:48 2006
From: dmessina at wustl.edu (David Messina)
Date: Mon, 15 May 2006 15:12:48 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
Message-ID: <5A2309FD-8C6E-4349-99CC-B3EDA8B2F499@wustl.edu>

On May 15, 2006, at 2:23 PM, Hilmar Lapp wrote:

> I wasn't using the search. It's in the scrollable table for browsing.
> -hilmar

I'm seeing this too on OS X with Safari 2.0.3.

If you type 'goflat' (without the quotes) into the search box, you'll see the behavior. Chris, can you try it again this way just to confirm it's an OS/browser-specific thing?

Not sure what's going on, Hilmar -- I'll take a look.

Dave

From cjfields at uiuc.edu Mon May 15 20:56:29 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 15:56:29 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <57326DCD-D72B-4CED-801D-9E25609BF57C@gmx.net>
Message-ID: <000a01c67862$0a00cab0$15327e82@pyrimidine>

Okay, I see what you mean. Using the search term "Bio::Ont*" also explains why I didn't see it ;P. Yeah, the bug shows up here too (WinXP and Mac OS X), and those links are broken like you said. Could be something to do with indexing.

Using the methods script in the FAQ (http://www.bioperl.org/wiki/FAQ#Why_can.27t_I_easily_get_a_list_of_all_the_methods_a_object_can_call.3F) I get this:

C:\Perl\Scripts>methods.pl Bio::OntologyIO::simplehierarchy
Bio::OntologyIO::simplehierarchy::Dumper
Bio::OntologyIO::simplehierarchy::basename
Bio::OntologyIO::simplehierarchy::dirname
Bio::OntologyIO::simplehierarchy::fileparse
Bio::OntologyIO::simplehierarchy::fileparse_set_fstype

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> Sent: Monday, May 15, 2006 2:24 PM
> To: Chris Fields
> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> Subject: Re: [Bioperl-l] Deobfuscator interface now available
>
> I wasn't using the search. It's in the scrollable table for browsing.
> -hilmar
>
> On May 15, 2006, at 3:07 PM, Chris Fields wrote:
>
> > I'll have to give it a try on Mac OS X (we have an ancient G4 in the lab which I can try it on). I'll let you know what I find.

From cjfields at uiuc.edu Mon May 15 21:29:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 16:29:14 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names
In-Reply-To: <809AE0C7-A9B6-48A4-BF11-BA392C770CA9@ebi.ac.uk>
Message-ID: <000b01c67866$9dac2620$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Nadeem Faruque
> Sent: Monday, May 15, 2006 2:47 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,subspecies/variant names
>
> >> My personal view is that having it as an annotation would serve no real purpose. For me the whole point of any kind of species representation in bioperl is to allow you to compare species in a biologically meaningful way. If it's just some annotation then that means it's basically
>
> I understand the need to find the species name of entries, especially now that so many complete genomes have been given their own strain-specific tax nodes, and I also think it is a shame that the ncbi tax dump does not give a rank to entries such as these (they cannot easily be distinguished from unofficial ranks higher in the tree without ascending the tree).
> Would it be useful for the species name to be included within EMBL file headers, eg in a line called OB (OB is a terrible suggestion based on 'Organism Binomial' since OS is already in use)?
>
> eg two examples of the species 'Apple stem grooving virus', where the second one would appear to be a different species without delving into the tax tree or the inclusion of an OB line.
>
> AC   D14995; S47260;
> DE   Apple stem grooving virus genome, complete sequence.
> OS   Apple stem grooving virus
> OB   Apple stem grooving virus
> OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
> OC   Capillovirus.
>
> AC   AY646511;
> DE   Citrus tatter leaf virus strain Kumquat 1, complete genome.
> OS   Citrus tatter leaf virus
> OB   Apple stem grooving virus
> OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
> OC   Capillovirus.

Jason also mentions a few examples (see below). The problem lies in the fact that EMBL and GenBank flatfiles do not give hierarchy ranking for taxonomy, so it's a best guess. What I'm seeing is that the guess is wrong more often than not when it comes to complex scientific names (viruses, bacteria, etc). Notice the doubling of the strain in the following GenBank files passed through SeqIO (genbank->genbank conversion, BTW; haven't tried EMBL):

SOURCE      Azoarcus sp. EbN1 EbN1
  ORGANISM  Azoarcus sp.
            Bacteria; Proteobacteria; Betaproteobacteria; Rhodocyclales; Rhodocyclaceae; Azoarcus.

SOURCE      Mycobacterium sp. KMS KMS
  ORGANISM  Mycobacterium sp.
            Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales; Corynebacterineae; Mycobacteriaceae; Mycobacterium.

SOURCE      Mycobacterium tuberculosis C C
  ORGANISM  Mycobacterium tuberculosis
            Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales; Corynebacterineae; Mycobacteriaceae; Mycobacterium; Mycobacterium; tuberculosis complex; Mycobacterium.

SOURCE      Bacillus subtilis subsp. subtilis str. 168 subtilis str. 168
  ORGANISM  Bacillus subtilis subsp.
            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus.

Here are Jason's examples, for posterity:

Can you guess what value is the strain versus sub-species? What happens when there is a two part strain name (space separated) and a sub-species or variety designation?

SOURCE      Staphylococcus haemolyticus JCSC1435
  ORGANISM  Staphylococcus haemolyticus JCSC1435
            Bacteria; Firmicutes; Bacillales; Staphylococcus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=279808
strain is JCSC1435

versus

SOURCE      Muntiacus muntjak vaginalis
  ORGANISM  Muntiacus muntjak vaginalis
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia; Pecora; Cervidae; Muntiacinae; Muntiacus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9887
species is muntjak, sub-species vaginalis ?

versus

SOURCE      Aspergillus nidulans FGSC A4
  ORGANISM  Aspergillus nidulans FGSC A4
            Eukaryota; Fungi; Ascomycota; Pezizomycotina; Eurotiomycetes; Eurotiales; Trichocomaceae; Emericella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=227321
Genus should be Aspergillus or Emericella ?

Strain and subspecies/variety in the same entry

SOURCE      Cryptococcus neoformans var. grubii H99
  ORGANISM  Cryptococcus neoformans var. grubii H99
            Eukaryota; Fungi; Basidiomycota; Hymenomycetes; Heterobasidiomycetes; Tremellomycetidae; Tremellales; Tremellaceae; Filobasidiella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=235443

> > My point is, a large number of users do NOT use, nor care about, taxonomic information to the degree they need to know the entire classification of the organism; many are just as happy about getting the scientific name only, which is in the GenBank/EMBL file itself. To take one extreme, it is not productive to force every user to download the NCBI tax database and use lookups just to convert sequences from EMBL format to GenBank format. It's not productive to allow users to spam the NCBI tax database remotely either, so hardcoding lookups is, IMHO, a big mistake.
>
> I don't think you need to add any information to turn an embl-format file into a Genbank flatfile, but maybe I'm missing something obvious.

The issue is the way the SOURCE and ORGANISM lines are handled (OS/OC lines in EMBL, I believe), which is using a Bio::Species object. The problem is, like I mentioned above, no hierarchical ranking is in the flat file, just the order of the ranking. We can try to make a best guess based on that but it's obviously very tricky, particularly when dealing with subspecies, strains, etc. NCBI also states that many times the classification can be too long for a file so may be incomplete (I think they leave out nodes which have 'no rank' tags, but I can't be completely sure), so there's another issue.

Anyway, this is where the lookup would come in, which would require a local taxonomy database (we can't spam the NCBI remote database, that would just be rude) which would give the complete taxonomic classification if it worked properly.

So now we have three possible situations:

1) One extreme: we require a lookup to get it right (which, BTW, it currently doesn't); this by default requires a local database.
2) Middle of the road: we try and guess the information as best as we can with the information given (the current situation); this is breaking more and more often now, so is becoming more unreliable.
3) Other extreme: we punt and absolve ourselves of even trying to parse the data and just have a strict tagname->value or similar simple construct to handle the data.

#3 as default with option to do #1 is probably best (least error prone with option for most information), with caching to speed up lookups as Sendu Bala does now.

Chris

> Nadeem

From hlapp at gmx.net Mon May 15 21:37:56 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 17:37:56 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000a01c67862$0a00cab0$15327e82@pyrimidine>
References: <000a01c67862$0a00cab0$15327e82@pyrimidine>
Message-ID: <6CCA8112-651D-4154-94AE-88FE8EFBCD27@gmx.net>

It does have the following line though (and a 'use' statement for OntologyIO):

    @ISA = qw( Bio::OntologyIO );

So what is it doing 'wrong' (there aren't any tests or so in which anything erroneous would show)?

-hilmar

On May 15, 2006, at 4:56 PM, Chris Fields wrote:

> Okay, I see what you mean. Using the search term "Bio::Ont*" also explains why I didn't see it ;P. Yeah, the bug shows up here too (WinXP and Mac OS X), and those links are broken like you said. Could be something to do with indexing.
>
> Using the methods script in the FAQ (http://www.bioperl.org/wiki/FAQ#Why_can.27t_I_easily_get_a_list_of_all_the_methods_a_object_can_call.3F) I get this:
>
> C:\Perl\Scripts>methods.pl Bio::OntologyIO::simplehierarchy
> Bio::OntologyIO::simplehierarchy::Dumper
> Bio::OntologyIO::simplehierarchy::basename
> Bio::OntologyIO::simplehierarchy::dirname
> Bio::OntologyIO::simplehierarchy::fileparse
> Bio::OntologyIO::simplehierarchy::fileparse_set_fstype
>
> Chris
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
>> Sent: Monday, May 15, 2006 2:24 PM
>> To: Chris Fields
>> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
>> Subject: Re: [Bioperl-l] Deobfuscator interface now available
>>
>> I wasn't using the search. It's in the scrollable table for browsing.
>> -hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at uiuc.edu Mon May 15 22:03:48 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 17:03:48 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <6CCA8112-651D-4154-94AE-88FE8EFBCD27@gmx.net>
Message-ID: <000d01c6786b$71c04e60$15327e82@pyrimidine>

And Bio::OntologyIO works on its own:

C:\Perl\Scripts>methods.pl Bio::OntologyIO
Bio::OntologyIO::DESTROY
Bio::OntologyIO::new
Bio::OntologyIO::next_ontology
Bio::OntologyIO::term_factory
Bio::OntologyIO::unescape
Bio::Root::IO::catfile
Bio::Root::IO::close
Bio::Root::IO::dup
Bio::Root::IO::exists_exe
Bio::Root::IO::file
Bio::Root::IO::flush
Bio::Root::IO::gensym
Bio::Root::IO::mode
Bio::Root::IO::noclose
Bio::Root::IO::qualify
Bio::Root::IO::qualify_to_ref
Bio::Root::IO::rmtree
Bio::Root::IO::tempdir
Bio::Root::IO::tempfile
Bio::Root::IO::ungensym
Bio::Root::Root::confess
Bio::Root::Root::debug
Bio::Root::Root::throw
Bio::Root::Root::verbose
Bio::Root::RootI::carp
Bio::Root::RootI::deprecated
Bio::Root::RootI::stack_trace
Bio::Root::RootI::stack_trace_dump
Bio::Root::RootI::throw_not_implemented
Bio::Root::RootI::warn
Bio::Root::RootI::warn_not_implemented

But when I try these:

C:\Perl\Scripts>methods.pl Bio::OntologyIO::goflat
C:\Perl\Scripts>methods.pl Bio::OntologyIO::dagflat

I get nada. It could be related to the way the methods are parsed using Class::Inspector:

    print join ("\n", sort @{Class::Inspector->methods($class,'full','public')}), "\n";

I haven't tried it on all the weird Bio::Ontology-missing modules (don't have time today).
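
For anyone who wants to poke at this without Class::Inspector, the listing above can be approximated with core Perl only. The following is a rough sketch, not the FAQ script: `public_methods` is a made-up helper name, and the approach (walking the symbol tables along the class's linearized @ISA) is an assumption about what such a lister needs to do. It dies with the load error when the class cannot be loaded, so a module that fails to compile is not mistaken for one with no methods.

```perl
#!/usr/bin/perl
# Sketch only: list the public methods a class can call, using core Perl.
# "public_methods" is a hypothetical helper, not part of BioPerl or the FAQ.
use strict;
use warnings;
use mro;    # core since Perl 5.10; provides mro::get_linear_isa

sub public_methods {
    my ($class) = @_;

    # Load the class file and fail loudly if that does not work.
    (my $file = "$class.pm") =~ s{::}{/}g;
    eval { require $file; 1 } or die "Cannot load $class: $@";

    my %found;    # method name => fully qualified name of first definition
    for my $pkg (@{ mro::get_linear_isa($class) }) {
        no strict 'refs';
        for my $name (keys %{"${pkg}::"}) {
            next if $name =~ /^_/;                      # private by convention
            next if $name =~ /::$/;                     # nested package entries
            next unless defined &{"${pkg}::${name}"};   # keep only real subs
            $found{$name} //= "${pkg}::${name}";
        }
    }
    return sort values %found;
}

# Usage: perl methods_sketch.pl Some::Class
if (@ARGV) {
    print "$_\n" for public_methods($ARGV[0]);
}
```

One thing this makes visible: any function imported into a package lives in that package's symbol table, so it is reported as if it were a method. Running the sketch on the core module File::Basename, for example, lists basename, dirname and fileparse, the same names that show up under Bio::OntologyIO::simplehierarchy in the earlier listing.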
It's not common to all of those modules though:

C:\Perl\Scripts>methods.pl Bio::OntologyIO::InterProParser
Bio::OntologyIO::DESTROY
Bio::OntologyIO::InterProParser::next_ontology
Bio::OntologyIO::InterProParser::parse
Bio::OntologyIO::InterProParser::secondary_accessions_map
Bio::OntologyIO::new
Bio::OntologyIO::term_factory
Bio::OntologyIO::unescape
Bio::Root::IO::catfile
Bio::Root::IO::close
Bio::Root::IO::dup
Bio::Root::IO::exists_exe
Bio::Root::IO::file
Bio::Root::IO::flush
Bio::Root::IO::gensym
Bio::Root::IO::mode
Bio::Root::IO::noclose
Bio::Root::IO::qualify
Bio::Root::IO::qualify_to_ref
Bio::Root::IO::rmtree
Bio::Root::IO::tempdir
Bio::Root::IO::tempfile
Bio::Root::IO::ungensym
Bio::Root::Root::confess
Bio::Root::Root::debug
Bio::Root::Root::throw
Bio::Root::Root::verbose
Bio::Root::RootI::carp
Bio::Root::RootI::deprecated
Bio::Root::RootI::stack_trace
Bio::Root::RootI::stack_trace_dump
Bio::Root::RootI::throw_not_implemented
Bio::Root::RootI::warn
Bio::Root::RootI::warn_not_implemented

Chris

From cjfields at uiuc.edu Tue May 16 00:14:28 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Mon, 15 May 2006 19:14:28 -0500
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
Message-ID:

---- Original message ----
>Date: Mon, 15 May 2006 15:40:15 -0400
>From: "Clarke, Wayne"
>Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
>To:
>
>Hey everyone,
>
>I have been developing some code to download and parse blast reports
>from a remote server using Soap::Lite as well as insert the results into
>a mysql database. The problem I am having is that my program seems to be
>taking up and huge amount of RAM. For a single job of 10000 queries it
>can consume as much as a couple hundred Mb inside an hour.

If you're parsing 10000 queries (10000 different BLAST reports, right?) then
it's not necessarily a memory leak as much as it is object creation. Each
report generates hit objects, which in turn generate hsp objects. I think
Jason recommends using the tabular output option (-m8 or -m9) for huge
reports as it cuts down considerably on this. If you are cycling through each
report it shouldn't be as much of a problem unless your BLAST reports are
really huge. Have you tried parsing a single report to see if the problem
persists?

Now, if you are using Bioperl 1.5.1 with BLAST 2.2.13 or newer, you'll likely
run into a problem with an infinite loop that occurs due to a change in
NCBI's text output. You can try updating bioperl from CVS in either case to
see if that helps any. Tabular output and XML output, AFAIK, are the same
regardless of version; this bug only affected text output of BLAST reports.

> I realize
>that a lot of work is being done but this seems like way too much. This
>leads me to the subject of my post. I think I may have traced the source
>of the memory leak to Bio::SearchIO. I have used Devel::Size to track
>the size of my variables and done other debugging steps and have had no
>luck with resolving this very frustrating problem. My code is as
>follows:
>
>    my $result = $connector->getQueryResult($query_id);
>
>    my $FH;
>    open $FH, "<", \$result;
>
>    my $searchio = new Bio::SearchIO(-format => "blast",
>                                     -fh => $FH);
>
>    while (my $o_blast = $searchio->next_result()) {
>        my $clone_id = $o_blast->query_name();
>
>        my $statement = $bdbi->form_push_SQL ($o_blast, $clone_id, 5);
>
>this is just the leading and tailing code surrounding the use of
>Bio::SearchIO since there is quite a lot. I am mostly just wondering if
>anyone has ever had problems with SearchIO and its memory usage. I
>looked at the source code for it but am afraid it is out of my league.
>Any help/suggestions/questions would be great. Thanks

From torsten.seemann at infotech.monash.edu.au Tue May 16 00:18:44 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 16 May 2006 10:18:44 +1000
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
In-Reply-To: <320530F83FA47047823E57F110DDEAADB159EC@onncrxms4.agr.gc.ca>
References: <320530F83FA47047823E57F110DDEAADB159EC@onncrxms4.agr.gc.ca>
Message-ID: <44691A64.8040607@infotech.monash.edu.au>

> taking up and huge amount of RAM. For a single job of 10000 queries it
> can consume as much as a couple hundred Mb inside an hour. I realize

> my $result = $connector->getQueryResult($query_id);
> my $searchio = new Bio::SearchIO(-format => "blast",
> while (my $o_blast = $searchio->next_result()) {
> my $clone_id = $o_blast->query_name();
> my $statement = $bdbi->form_push_SQL ($o_blast, $clone_id, 5); }

Some comments:

Have you considered that whatever class/module $bdbi belongs to is
causing the problem? i.e.
is it keeping a reference to $o_blast around?

Are you aware that Perl garbage collection does not necessarily return
freed memory back to the OS? This may affect how you were measuring
"memory usage".

--
Dr Torsten Seemann http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia

From kmdaily at indiana.edu Mon May 15 21:00:12 2006
From: kmdaily at indiana.edu (Daily, Kenneth Michael)
Date: Mon, 15 May 2006 17:00:12 -0400
Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
Message-ID: <20528E699A515C499B80C222BDBEBC34043FF8@iu-mssg-mbx108.ads.iu.edu>

I just installed Bioperl 1.4, and entrezgene.pm is not included (should be
in Bio/SeqIO). How can I get this module?

Kenny Daily
IU School of Informatics
kmdaily at indiana.edu

From letondal at pasteur.fr Tue May 16 06:06:19 2006
From: letondal at pasteur.fr (Catherine Letondal)
Date: Tue, 16 May 2006 08:06:19 +0200
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To:
References: <000901c67827$d99eabb0$15327e82@pyrimidine>
Message-ID: <9c36140009c3d80bbb0d543376afa6e0@pasteur.fr>

On May 15, 2006, at 9:34 PM, David Messina wrote:

>>>> A couple of minor interface thoughts.
>>>>
>>>> 1)There's quite a lot of methods for many of the classes. As such, I
>>>> think I'll often want to browse through what's available in a class. But
>>>> 60% or so of the screen real estate is used for "Enter a search
>>>> string... OR select a class from the list". IMO, it would be better to
>>>> have two pages, a search page and a result page. It only takes a click
>>>> on Back (or a "new search" button) to get to a new search, and now you
>>>> can use your whole screen for reading your results.
>>>
>>> As the compromise it must be, I like the way it behaves. I don't like
>>> lots of windows. I especially don't like pop up windows. Right now when
>>> I'm using the bioperl docs I tend to have a whole bunch of tabs open to
>>> different class pages at once, so being able to see an overview all on
>>> one page in Deobfuscator is very nice.
>
> I think the current behavior makes sense as the default, but I like
> the idea of being able to view the search results in a separate
> window for easier browsing. Thanks for the suggestion; I'll add it to
> the list.
>

First, thanks for this very useful Web interface!

There are examples (quite ajaxian ones) that reach a compromise between
several windows for easily browsing large results, and composing everything
in one window to get an overview. The 2 examples that come to my mind
currently are (not biology related):
- http://montreal.mspace.fm/chi/sched/
- http://www.live.com/ (see the slider on the top right, which lets you
  squeeze or enlarge the results area)

--
Catherine Letondal -- Institut Pasteur

From cjfields at uiuc.edu Tue May 16 11:38:42 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Tue, 16 May 2006 06:38:42 -0500
Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
Message-ID: <36b52ba4.b94c5b79.8198c00@expms6.cites.uiuc.edu>

You'll have to install from CVS. I believe Brian added entrezgene.pm after
the last developer release (1.5.1):

http://www.bioperl.org/wiki/Installing_BioPerl

Chris

---- Original message ----
>Date: Mon, 15 May 2006 17:00:12 -0400
>From: "Daily, Kenneth Michael"
>Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
>To:
>
>I just installed Bioperl 1.4, and entrezgene.pm is not included (should be in Bio/SeqIO). How can I get this module?
>
>Kenny Daily
>IU School of Informatics
>kmdaily at indiana.edu

From bernd.web at gmail.com Tue May 16 11:37:46 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Tue, 16 May 2006 13:37:46 +0200
Subject: [Bioperl-l] Bio::DB::Query::GenBank checks
Message-ID: <716af09c0605160437tfcf824dxa514f38f6b94d423@mail.gmail.com>

Hi all,

I was using Bio::DB::Query::GenBank to obtain only IDs from Entrez and
found some issues and differences (bugs?) in behaviour wrt the pod.
Do these look familiar?

Some example code:

  my $query = Bio::DB::Query::GenBank->new(-query   => 'Lassa Virus[ORGN]',
                                           -reldate => '30',
                                           -db      => 'protein',
                                           -ids     => [195052,2981014,11127914],
                                           -maxids  => 30);
  # named arguments need the leading dash (-format, not format)
  my $gb = Bio::DB::GenBank->new(-format => 'fasta');
  my $seqio = $gb->get_Stream_by_query($query);
  while (my $seq = $seqio->next_seq) {
      print $seq->desc, "\n";
  }

The module states that if we provide -ids that:

  If you provide an array reference of IDs in -ids, the query will be
  ignored and the list of IDs will be used when the query is passed to a
  Bio::DB::GenBank object's get_Stream_by_query() method.

In the above case actually the query is passed ('Lassa Virus[ORGN]'), not
the IDs. Also $query->query shows the original query. Am I doing something
wrong or is the pod not reflecting current behaviour of this module?

I was also surprised that if internet is down no warning is thrown for
$query->query or $query->count at all. Only the get_Stream_by_query above
will warn us if the site is unreachable (500 Internal Server Error).
$query->ids or $query->count will not throw a warning and @ids=$query->ids
will just be an empty array. (I realize $query->count is not initialized, so
I am using this now to check for success, but a warning from WebDBSeqI would
be more appropriate I think).

Last, the example from the pod is not working, but no warnings are raised:

  # initialize the list yourself
  my $query = Bio::DB::Query::GenBank->new(-ids=>[195052,2981014,11127914]);

$query->count returns zero w/o any warning. Of course this query did not
specify a DB. Only if we specify -db=>'nucleotide' $query->count is 3.
However, why not any warning if we set -db=>'protein' or if we did not set
this?

On the NCBI website searching Protein DB returns for 195052:

  See Details. No items found.
  The following term(s) refer to a different DB:195052

But this is not reflected via Bio::DB::Query::GenBank. Can I check for this
situation in the code apart from checking on $query->count == 0 ? Or would it
indeed be better to check for these situations in the module?

Regards,
Bernd

From chen_li3 at yahoo.com Tue May 16 14:55:51 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 16 May 2006 07:55:51 -0700 (PDT)
Subject: [Bioperl-l] module for 6 reading frames
Message-ID: <20060516145551.50370.qmail@web36802.mail.mud.yahoo.com>

Hi all,

I wonder which module is available for translating DNA sequence into 6
reading frames.

Thank you,

Li

From smarkel at scitegic.com Tue May 16 15:10:35 2006
From: smarkel at scitegic.com (smarkel at scitegic.com)
Date: Tue, 16 May 2006 08:10:35 -0700
Subject: [Bioperl-l] module for 6 reading frames
In-Reply-To: <20060516145551.50370.qmail@web36802.mail.mud.yahoo.com>
Message-ID:

Li,

Use the translate() function in Bio::Tools::CodonTable.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com

From golharam at umdnj.edu Tue May 16 16:18:19 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 16 May 2006 12:18:19 -0400
Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene?
Message-ID: <001f01c67904$59b08ad0$2f01a8c0@GOLHARMOBILE1>

I just updated my local copy of bioperl from cvs. When I ran the configure
script, it says I need the external module Bio::ASN1::EntrezGene. Which
package contains this module?

--
Ryan Golhar - golharam at umdnj.edu
The Informatics Institute of UMDNJ

From golharam at umdnj.edu Tue May 16 16:24:03 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 16 May 2006 12:24:03 -0400
Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene?
Message-ID: <002001c67905$2622a580$2f01a8c0@GOLHARMOBILE1>

Never mind. I see it's in CPAN.

From cjfields at uiuc.edu Tue May 16 17:27:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 May 2006 12:27:32 -0500
Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene?
In-Reply-To: <001f01c67904$59b08ad0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <002701c6790e$03d8f110$15327e82@pyrimidine>

It's actually not part of Bioperl currently; you can find it on CPAN:

http://search.cpan.org/~mingyiliu/Bio-ASN1-EntrezGene-1.091/lib/Bio/ASN1/EntrezGene.pm

Chris

From ClarkeW at AGR.GC.CA Tue May 16 20:57:13 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Tue, 16 May 2006 16:57:13 -0400
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
Message-ID: <320530F83FA47047823E57F110DDEAADB159FB@onncrxms4.agr.gc.ca>

With regard to the suggestions/comments made, thank you. However I think
I should clear a few things up. I am running bioperl v1.4, I am cycling
through the blast reports which should not be of absurd size since they
only contain the top 5 hits, and I am using top to track (although I
realize fairly inaccurately) the memory usage. I have looked through the
code for both AAFCBLAST and BEAST_UPDATE but do not believe the
leak/problem to be contained within them since they are almost
exclusively using method calls and those variables should be destroyed
upon leaving the scope of the method. I have used Devel::Size to check
the size of the variables $bdbi and $searchio and $connector and on each
iteration these variables have the same size.
Any other suggestions would be greatly appreciated as I have nearly gone
insane trying to track this problem down.

Thanks, Wayne

From smarkel at scitegic.com Tue May 16 20:52:05 2006
From: smarkel at scitegic.com (smarkel at scitegic.com)
Date: Tue, 16 May 2006 13:52:05 -0700
Subject: [Bioperl-l] module for 6 reading frames
In-Reply-To: <20060516200436.34908.qmail@web36812.mail.mud.yahoo.com>
Message-ID:

Li,

You can either do the substring, and reverse complement, yourself or you can
use the translate() function in Bio::PrimarySeq. It inherits from
Bio::PrimarySeqI, so check there for the documentation. That translate()
function takes a "-frame" argument.

Scott

PS In future, please respond to the list. That way others see the questions
and answers.

chen li wrote on 16.05.2006 13:04:36:

> Dear Dr. Markel,
>
> I browse through the document of
> Bio::Tools::CodonTable and find this line:
>
> my $translation= $CodonTable->translate($seq);
>
> I think this line is to do the translation. Here is my
> question: which line in the doc says how to translate
> the remaining frames 2,3, and -1, -2, -3?
>
> Thank you,
>
> Li
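
The first option Scott mentions (taking the substrings and the reverse
complement yourself) can be sketched in plain Perl as below. This is only an
illustration under stated assumptions, not BioPerl code: the six_frames and
translate_from helpers, the frame labels (+1..+3, -1..-3) and the example
sequence are made up here, only the standard genetic code is handled, and
incomplete trailing codons are simply dropped. With BioPerl installed, the
-frame argument to translate() described above would presumably replace the
hand-rolled table.

```perl
use strict;
use warnings;

# Standard genetic code, codons enumerated in T, C, A, G order.
my @bases = qw(T C A G);
my $amino = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
my $i = 0;
for my $x (@bases) {
    for my $y (@bases) {
        for my $z (@bases) {
            $codon_table{"$x$y$z"} = substr($amino, $i++, 1);
        }
    }
}

# Translate one strand starting at a 0-based offset. Incomplete trailing
# codons are dropped; codons not in the table (e.g. containing N) become 'X'.
sub translate_from {
    my ($dna, $offset) = @_;
    my $protein = '';
    for (my $pos = $offset; $pos + 3 <= length $dna; $pos += 3) {
        my $codon = substr($dna, $pos, 3);
        $protein .= exists $codon_table{$codon} ? $codon_table{$codon} : 'X';
    }
    return $protein;
}

# Hypothetical helper: returns a hash of frame label => translation.
sub six_frames {
    my ($dna) = @_;
    $dna = uc $dna;
    # Reverse the string, then complement it in place.
    (my $revcom = reverse $dna) =~ tr/ACGT/TGCA/;
    my %frames;
    for my $offset (0 .. 2) {
        $frames{ '+' . ($offset + 1) } = translate_from($dna,    $offset);
        $frames{ '-' . ($offset + 1) } = translate_from($revcom, $offset);
    }
    return %frames;
}

my %frames = six_frames('ATGGCCAAGTGA');
print "$_\t$frames{$_}\n" for sort keys %frames;
```

Keeping the frame offset 0-based internally and only adding 1 for the label
matches the usual convention that frame "2" starts at the second base.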
> JlBh9ARAt+OXXsyYtG6VgFNOO9GFnNxV From cjfields at uiuc.edu Tue May 16 21:15:10 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 16 May 2006 16:15:10 -0500 Subject: [Bioperl-l] Memory Leak in Bio::SearchIO In-Reply-To: <320530F83FA47047823E57F110DDEAADB159FB@onncrxms4.agr.gc.ca> Message-ID: <000601c6792d$d0ab1500$15327e82@pyrimidine> I mentioned two possibilities last time I posted: 1) that the BLAST file was too large, or 2) that you are using an old version of bioperl that SearchIO is broken. You seem to fit #2. The issue is that NCBI does not consider text BLAST output sacrosanct and routinely makes changes to it that break parsing. Due to this, SearchIO::blast needs to be constantly updated, so much so that there are normally a few updates a year to fix parsing issues in that module alone compared to BioPerl as a whole. And, BTW, although bioperl-1.4 is about 2 years old now, even bioperl-1.5.1 SearchIO is broken when it comes to the latest NCBI BLAST (2.2.14 now). I seriously suggest updating your local bioperl distribution to the latest bioperl-live (from CVS). Take one of those 10000 reports, just one, and try parsing it. If you have the same problem (a CPU spike and increasing memory usage) then it may be fixed in CVS. Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne > Sent: Tuesday, May 16, 2006 3:57 PM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO > > > With regards to the suggestions/comments made thank you. However I think > I should clear a few things up. I am running bioperl v1.4, I am cycling > through the blast reports which should not be of absurd size since they > only contain the top 5 hits, and I am using top to track(although I > realize fairly inacuately) the memory usage. 
I have looked through the > code for both AAFCBLAST and BEAST_UPDATE but do not believe the > leak/problem to be contained within them since they are almost > exclusively using method calls and those variables should be destroyed > upon leaving the scope of the method. I have used Devel::Size to check > the size of the variables $bdbi and $searchio and $connector and on each > iteration these variables have the same size. Any other suggestions > would be greatly appreciated as I have nearly gone insane trying to > track this problem down. > > Thanks, Wayne > > > -----Original Message----- > From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au] > Sent: Monday, May 15, 2006 6:19 PM > To: Clarke, Wayne > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO > > > taking up and huge amount of RAM. For a single job of 10000 queries it > > can consume as much as a couple hundred Mb inside an hour. I realize > > > my $result = $connector->getQueryResult($query_id); > > my $searchio = new Bio::SearchIO(-format => "blast", > > while (my $o_blast = $searchio->next_result()) { > > my $clone_id = $o_blast->query_name(); > > my $statement = $bdbi->form_push_SQL > ($o_blast, $clone_id, 5); } > > Some comments: > > Have you considered that whatever class/module $bdbi belongs to is > causing the problem? ie. is it keeping a reference to $o_blast around? > > Are you aware that Perl garbage collection does not necessarily return > freed memory back to the OS? This may affect how you were measuring > "memory usage". 
> > -- > Dr Torsten Seemann http://www.vicbioinformatics.com > Victorian Bioinformatics Consortium, Monash University, Australia > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From ClarkeW at AGR.GC.CA Tue May 16 21:24:51 2006 From: ClarkeW at AGR.GC.CA (Clarke, Wayne) Date: Tue, 16 May 2006 17:24:51 -0400 Subject: [Bioperl-l] Memory Leak in Bio::SearchIO Message-ID: <320530F83FA47047823E57F110DDEAADB159FC@onncrxms4.agr.gc.ca> Thanks Chris, I did forget to mention however that I did parse one single report and found no problems, it finished fast and with no noticeable memory usage. I will consider getting my SA to update bioperl from CVS as a precaution but he has already stated he prefers to wait for the release of v1.5. Even a single job of 10000 will finish but the problem is that I am trying to loop through many jobs of 10000 and it seems to be additive for reasons I can not determine. During testing I noticed that the RSS on top decreased around 80% MEM usage, but then the shared mem increased. I am wondering if this is due to the perl garbage collector freeing up memory but keeping it in its pool for use, if so that is fine as long as the it does not then want to reach into swapped mem. Thanks again, Wayne -----Original Message----- From: Chris Fields [mailto:cjfields at uiuc.edu] Sent: Tuesday, May 16, 2006 3:15 PM To: Clarke, Wayne; bioperl-l at lists.open-bio.org Subject: RE: [Bioperl-l] Memory Leak in Bio::SearchIO I mentioned two possibilities last time I posted: 1) that the BLAST file was too large, or 2) that you are using an old version of bioperl that SearchIO is broken. You seem to fit #2. The issue is that NCBI does not consider text BLAST output sacrosanct and routinely makes changes to it that break parsing. 
Due to this, SearchIO::blast needs to be constantly updated, so much so that there are normally a few updates a year to fix parsing issues in that module alone, compared to BioPerl as a whole. And, BTW, although bioperl-1.4 is about 2 years old now, even the bioperl-1.5.1 SearchIO is broken when it comes to the latest NCBI BLAST (2.2.14 now).

I seriously suggest updating your local bioperl distribution to the latest bioperl-live (from CVS). Take one of those 10000 reports, just one, and try parsing it. If you have the same problem (a CPU spike and increasing memory usage) then it may be fixed in CVS.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne
> Sent: Tuesday, May 16, 2006 3:57 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO
>
> With regards to the suggestions/comments made, thank you. However, I think
> I should clear a few things up. I am running bioperl v1.4, I am cycling
> through the blast reports, which should not be of absurd size since they
> only contain the top 5 hits, and I am using top to track (although I
> realize fairly inaccurately) the memory usage. I have looked through the
> code for both AAFCBLAST and BEAST_UPDATE but do not believe the
> leak/problem to be contained within them, since they are almost
> exclusively using method calls and those variables should be destroyed
> upon leaving the scope of the method. I have used Devel::Size to check
> the size of the variables $bdbi and $searchio and $connector, and on each
> iteration these variables have the same size. Any other suggestions
> would be greatly appreciated, as I have nearly gone insane trying to
> track this problem down.
>
> Thanks, Wayne

From cjfields at uiuc.edu Tue May 16 21:45:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 May 2006 16:45:16 -0500
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
In-Reply-To: <320530F83FA47047823E57F110DDEAADB159FC@onncrxms4.agr.gc.ca>
Message-ID: <000801c67932$050dbd30$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne
> Sent: Tuesday, May 16, 2006 4:25 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO
>
> Thanks Chris,
>
> I did forget to mention however that I did parse one single report and
> found no problems, it finished fast and with no noticeable memory usage.
> I will consider getting my SA to update bioperl from CVS as a precaution
> but he has already stated he prefers to wait for the release of v1.5.

Um, you can tell him the last release was v.1.5.1 (last October). It's considered a developer release but is pretty stable; well, except for that whole SearchIO quibble, and that's not our fault.

You could also install a local version in case he doesn't budge; see here:

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_IN_A_PERSONAL_MODULE_AREA

Chris

> Even a single job of 10000 will finish but the problem is that I am
> trying to loop through many jobs of 10000 and it seems to be additive
> for reasons I can not determine. During testing I noticed that the RSS
> on top decreased around 80% MEM usage, but then the shared mem
> increased.
> I am wondering if this is due to the perl garbage collector
> freeing up memory but keeping it in its pool for use, if so that is fine
> as long as it does not then want to reach into swapped mem.
>
> Thanks again, Wayne
> ...

From cjfields at uiuc.edu Tue May 16 22:20:29 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 May 2006 17:20:29 -0500
Subject: [Bioperl-l] Bio::DB::Query::GenBank checks
In-Reply-To: <716af09c0605160437tfcf824dxa514f38f6b94d423@mail.gmail.com>
Message-ID: <000901c67936$f0896990$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Bernd Web
> Sent: Tuesday, May 16, 2006 6:38 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::DB::Query::GenBank checks
>
> Hi all,
>
> I was using Bio::DB::Query::GenBank to obtain only IDs from Entrez and
> found some issues and differences (bugs?) in behaviour wrt the pod.
> Do these look familiar?
>
> Some example code:
>   my $query = Bio::DB::Query::GenBank->new(
>       -query   => 'Lassa Virus[ORGN]',
>       -reldate => '30',
>       -db      => 'protein',
>       -ids     => [195052,2981014,11127914],
>       -maxids  => 30 );
>
>   my $gb = new Bio::DB::GenBank(-format => 'fasta');
>   my $seqio = $gb->get_Stream_by_query($query);
>   while (my $seq = $seqio->next_seq) {
>       print $seq->desc, "\n";
>   }
>
> The module states that if we provide -ids that:
>   If you provide an array reference of IDs in -ids, the query will be
>   ignored and the list of IDs will be used when the query is passed
>   to a Bio::DB::GenBank object's get_Stream_by_query() method.
>
> In the above case actually the query is passed ('Lassa Virus[ORGN]'),
> not the IDs. Also $query->query shows the original query. Am I doing
> something wrong or is the pod not reflecting current behaviour of this
> module?
>
> I was also surprised that if internet is down no warning is thrown for
> $query->query or $query->count at all. Only the get_Stream_by_query
> above will warn us if the site is unreachable (500 Internal Server
> Error).

I believe this has to do with the difference in the objects and the way they retrieve request data. Bio::DB::GenBank and Bio::DB::Query::GenBank use different methods to retrieve IDs; Bio::DB::GenBank's get_Stream_by_query method just makes it a bit easier to retrieve a list of UIDs directly instead of saving them as an array and then reposting them using get_Stream_by_id. Not foolproof, but it works okay.

> $query->ids or $query->count will not throw a warning and
> @ids=$query->ids will just be an empty array. (I realize $query->count
> is not initialized, so I am using this now to check for success, but a
> warning from WebDBSeqI would be more appropriate I think).

WebDBSeqI would be the place to make general warnings (it's supposed to be an interface for any web seq DB), but not eutils-specific warnings.

> Last, the example from the pod is not working, but no warnings are raised:
>   # initialize the list yourself
>   my $query =
>       Bio::DB::Query::GenBank->new(-ids=>[195052,2981014,11127914]);
>
> $query->count returns zero w/o any warning. Of course this query did
> not specify a DB. Only if we specify -db=>'nucleotide' $query->count
> is 3.
> However, why not any warning if we set -db=>'protein' or if we did not set
> this?
>
> On the NCBI website searching Protein DB returns for 195052:
>   See Details. No items found.
>   The following term(s) refer to a different DB:195052
>
> But this is not reflected via Bio::DB::Query::GenBank.
>
> Can I check for this situation in the code apart from checking on
> $query->count == 0 ? Or would it indeed be better to check for these
> situations in the module?
>
> Regards,
> Bernd

I can probably play around with adding a few things in tomorrow and clean up the POD somewhat. I'm planning a rewrite for EUtilities-based searches but that's a ways off still... Can't promise much; I'm pretty busy til next week.
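
[Editorial sketch] Until the module itself warns, the check Bernd asks about can be done on the caller's side. The following is a minimal sketch under stated assumptions, not part of BioPerl: classify_query_result is a hypothetical helper, and the idea that count is left undefined when NCBI cannot be reached is taken from Bernd's report in this thread rather than verified against the module source. Only new(), count and ids are used, as they already appear in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): turn the count and the ID
# list obtained from a query object into a short status string.
sub classify_query_result {
    my ($count, $ids) = @_;
    return 'no-response' unless defined $count;       # count never initialized
    return 'no-hits'     if $count == 0 || !@$ids;    # query ran, nothing matched
    return 'ok';
}

# Contact NCBI only when a database name is given on the command line, e.g.
#   perl check_ids.pl nucleotide 195052 2981014 11127914
if (@ARGV) {
    require Bio::DB::Query::GenBank;    # loaded lazily; needs BioPerl installed
    my ($db, @wanted) = @ARGV;
    my $query = Bio::DB::Query::GenBank->new(-db => $db, -ids => \@wanted);
    my $count = eval { $query->count };               # undef if the call dies
    my @ids   = defined $count ? $query->ids : ();
    my $status = classify_query_result($count, \@ids);
    warn "Query against '$db' returned status '$status'\n" if $status ne 'ok';
    print "$_\n" for @ids;
}
```

Run without arguments the script does nothing. A 'no-hits' status for nucleotide GIs sent to the protein database would correspond to the "different DB" case described above, although this sketch cannot tell it apart from an ordinary empty result.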
Chris

From chen_li3 at yahoo.com Wed May 17 00:53:17 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 16 May 2006 17:53:17 -0700 (PDT)
Subject: [Bioperl-l] module for formating sequence output on the screen
Message-ID: <20060517005317.3976.qmail@web36815.mail.mud.yahoo.com>

Hi all,

Thank you very much for the help.

I have some DNA sequences printed on the screen, but the default output lines are longer than I expect. I need 50 nucleotides/line. I searched CPAN but could not find the right module. Which bioperl module can do this job?

Li

From kmdaily at indiana.edu Tue May 16 13:57:52 2006
From: kmdaily at indiana.edu (Daily, Kenneth Michael)
Date: Tue, 16 May 2006 09:57:52 -0400
Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
References: <36b52ba4.b94c5b79.8198c00@expms6.cites.uiuc.edu>
Message-ID: <20528E699A515C499B80C222BDBEBC34043FFB@iu-mssg-mbx108.ads.iu.edu>

OK, got that installed. But I still get an error:

Can't locate object method "url" via package "Bio::Annotation::DBLink" at /home/kmdaily/src/bioperl/core/Bio/SeqIO/entrezgene.pm line 557.

I am using this on a shared system, and an older version of Bioperl was installed by the admin. But the path to the one I downloaded via CVS is first in the list @INC, and PERL5LIB="/home/kmdaily/src/bioperl/core".

Kenny Daily
IU School of Informatics
kmdaily at indiana.edu

-----Original Message-----
From: Christopher Fields [mailto:cjfields at uiuc.edu]
Sent: Tue 5/16/2006 7:38 AM
To: Daily, Kenneth Michael; bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] entrezgene.pm not in Bio::SeqIO

You'll have to install from CVS.
I believe Brian added Entrezgene.pm after the last developer release (1.5.1):

http://www.bioperl.org/wiki/Installing_BioPerl

Chris

---- Original message ----
>Date: Mon, 15 May 2006 17:00:12 -0400
>From: "Daily, Kenneth Michael"
>Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
>To:
>
>I just installed Bioperl 1.4, and entrezgene.pm is not included (should be in Bio/SeqIO). How can I get this module?
>
>Kenny Daily
>IU School of Informatics
>kmdaily at indiana.edu

From skirov at utk.edu Wed May 17 11:48:29 2006
From: skirov at utk.edu (Stefan Kirov)
Date: Wed, 17 May 2006 07:48:29 -0400
Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
In-Reply-To: <20528E699A515C499B80C222BDBEBC34043FFB@iu-mssg-mbx108.ads.iu.edu>
References: <36b52ba4.b94c5b79.8198c00@expms6.cites.uiuc.edu> <20528E699A515C499B80C222BDBEBC34043FFB@iu-mssg-mbx108.ads.iu.edu>
Message-ID: <446B0D8D.40901@utk.edu>

You are using an old Bio::Annotation::DBLink module. Did you download only entrezgene.pm or the whole bioperl? If the latter, what do the tests tell you?

Stefan

From osborne1 at optonline.net Wed May 17 00:46:00 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 16 May 2006 20:46:00 -0400
Subject: [Bioperl-l] module for 6 reading frames
In-Reply-To:
Message-ID:

Chen Li,

There's some documentation on translate() in bptutorial:

http://bioperl.org/Core/Latest/bptutorial.html

You could also use the translate_6frames() method of Bio::SeqUtils.

Brian O.

On 5/16/06 4:52 PM, "smarkel at scitegic.com" wrote:

> Li,
>
> You can either do the substring, and reverse complement, yourself
> or you can use the translate() function in Bio::PrimarySeq. It
> inherits from Bio::PrimarySeqI, so check there for the documentation.
> That translate() function takes a "-frame" argument.
>
> Scott
>
> PS In future, please respond to the list. That way others see
> the questions and answers.
>
> chen li wrote on 16.05.2006 13:04:36:
>
>> Dear Dr. Markel,
>>
>> I browsed through the documentation of
>> Bio::Tools::CodonTable and found this line:
>>
>> my $translation= $CodonTable->translate($seq);
>>
>> I think this line is to do the translation.
>> Here is my question: which line in the doc says how to translate
>> the remaining frames 2,3, and -1, -2, -3?
>>
>> Thank you,
>>
>> Li
>>
>> --- smarkel at scitegic.com wrote:
>>
>>> Li,
>>>
>>> Use the translate() function in
>>> Bio::Tools::CodonTable.
>>>
>>> Scott
>>>
>>> Scott Markel, Ph.D.
>>> Principal Bioinformatics Architect  email: smarkel at scitegic.com
>>> SciTegic Inc.                       mobile: +1 858 205 3653
>>> 10188 Telesis Court, Suite 100      voice: +1 858 799 5603
>>> San Diego, CA 92121                 fax: +1 858 279 8804
>>> USA                                 web: http://www.scitegic.com
>>>
>>> bioperl-l-bounces at lists.open-bio.org wrote on 16.05.2006 07:55:51:
>>>
>>>> Hi all,
>>>>
>>>> I wonder which module is available for translating DNA
>>>> sequence into 6 reading frames.
>>>>
>>>> Thank you,
>>>>
>>>> Li

From e-just at northwestern.edu Wed May 17 15:03:41 2006
From: e-just at northwestern.edu (Eric Just)
Date: Wed, 17 May 2006 10:03:41 -0500
Subject: [Bioperl-l] Modware: a BioPerl based API for Chado
Message-ID: <6.1.1.1.2.20060517095821.13353920@hecky.it.northwestern.edu>

Hi Everyone,

We are announcing a new Sourceforge Project called Modware that may be of interest to you. It is an object-oriented API written in Perl that creates BioPerl object representations of biological features stored in a Chado database. It basically creates a Bio::Seq object for chromosomes in Chado and creates Bio::SeqFeature::Gene objects for protein coding transcripts stored in Chado. Things like contigs are represented as Bio::SeqFeature::Generic objects. We also provide many methods for manipulating these objects once they are in memory.

For download please visit our Sourceforge project page:

http://sourceforge.net/projects/gmod-ware

For API documentation and some short examples of selected use cases visit our project home page:

http://gmod-ware.sourceforge.net/

This software is adapted from the production middleware code that dictyBase uses. Modware 0.1 requires the latest stable GMOD release: 0.003 be installed. We are currently calling it a release candidate and if we get some feedback will call it an official release if there are no major install bugs (we've installed it only on two different machines). If you would like a version that works on the latest CVS version of GMOD, let me know and I'll expedite getting that out the door.

Lastly, please use the direct download version, we have not fully recovered from the recent Sourceforge CVS issues.

Please try the software out and let us know what you think!
Sincerely, Eric Just and Sohel Merchant e-just at northwestern.edu s-merchant at northwestern.edu ============================================ Eric Just e-just at northwestern.edu dictyBase Programmer Center for Genetic Medicine Northwestern University http://dictybase.org ============================================ From sb at mrc-dunn.cam.ac.uk Wed May 17 17:46:45 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Wed, 17 May 2006 18:46:45 +0100 Subject: [Bioperl-l] Bio::Map:: enhancements Message-ID: <446B6185.1000602@mrc-dunn.cam.ac.uk> I added bug http://bugzilla.bioperl.org/show_bug.cgi?id=1998 I'm interested in what people have to say about the secondary enhancement I talk about there. Is it a sane thing to do? What are the better ways of doing that? If it /is/ ok, I suppose I'd have to go back and alter Bio::Map::MappableI and Bio::Map::MarkerI as well, not just Marker. Oh, on a side note, you'll see I had to override RangeI's intersection method to work on multiple ranges. Why is RangeI limited to an intersection of only two ranges? Cheers, Sendu. From David_Waner/San_Diego/Accelrys at scitegic.com Thu May 18 19:30:46 2006 From: David_Waner/San_Diego/Accelrys at scitegic.com (David_Waner/San_Diego/Accelrys at scitegic.com) Date: Thu, 18 May 2006 12:30:46 -0700 Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8 on Windows Message-ID: BioPerl Users/Developers, In our testing we have found severe performance problems using BioPerl with Perl 5.8 on Windows (but not on Linux). They show up especially in SeqIO when reading or writing Fasta files containing large (~16 MB) sequences. The same files that can be read in 1 or 2 seconds with Windows Perl 5.6 or Linux Perl 5.8, take minutes in Windows Perl 5.8. Although the fault is clearly with Perl, not with BioPerl, I have identified a couple of places where BioPerl could be modified in order to save Windows Perl 5.8 users a lot of time, while not affecting other users. 
For example, in my testing the following excerpt from Bio::Root::IO::_readline() takes 50 seconds (!) to execute (when reading a 16 MB sequence):

    if( (!$param{-raw}) && (defined $line) ) {
        $line =~ s/\015?\012/\n/g;
        $line =~ s/\015/\n/g unless $ONMAC;
    }

whereas the following replacement code should be equivalent:

    if( (!$param{-raw}) && (defined $line) ) {
        $line =~ s/\015\012/\012/g;          # Change all CR/LF pairs to LF
        $line =~ tr/\015/\n/ unless $ONMAC;  # Change all single CRs to NEWLINE
    }

but executes in less than 1 second.

In addition, changing:

    defined $sequence && $sequence =~ s/\s//g;  # Remove whitespace

to:

    defined $sequence && $sequence =~ tr/ \t\n\r//d;  # Remove whitespace

in Bio::SeqIO::fasta.pm saves an additional ~20 seconds.

There are also problems in reading files with the <> operator when $/ is redefined to "\n>", where reading the first line of Fasta files containing large sequences takes ~50 seconds, but reading subsequent lines or files takes about 1 second. I don't have a work-around for this.

I would like to ask the mailing list:

1. Has anyone else run into this problem? Any fixes?
2. Do you think BioPerl should incorporate these changes?

I plan to submit a bug report to perlbug, but don't know when or if the problem will be fixed.

- David

From cjfields at uiuc.edu Thu May 18 20:07:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 18 May 2006 15:07:14 -0500
Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8 onWindows
In-Reply-To:
Message-ID: <002901c67ab6$a84c3140$15327e82@pyrimidine>

David,

I have seen some slowdowns with Bio::SeqIO associated with GenBank files, which this could be related to. I can't do anything about it (test or commit changes) until next week but someone else using Windows might (though we are few and far between, and I'm switching to Mac OS X in fall). Would be nice to try the changes and test it out on a few platforms.
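
[Editorial sketch] One way to try the proposed changes outside BioPerl is to compare the old and new forms on a small string before timing them on real data. This is a minimal sketch with hypothetical function names; the $ONMAC guard is omitted, so it assumes a platform where "\n" is LF. Note also that \s matches form feed while the tr/// list does not, so the two whitespace strippers agree only on input limited to space, tab, CR and LF.

```perl
use strict;
use warnings;

# Line-ending normalization as in the current Bio::Root::IO::_readline().
sub normalize_regex {
    my ($line) = @_;
    $line =~ s/\015?\012/\n/g;
    $line =~ s/\015/\n/g;
    return $line;
}

# Proposed form: fixed-string substitution for CR/LF, then tr/// for lone CRs.
sub normalize_tr {
    my ($line) = @_;
    $line =~ s/\015\012/\012/g;
    $line =~ tr/\015/\n/;
    return $line;
}

# Whitespace removal, current and proposed forms.
sub strip_ws_regex { my ($s) = @_; $s =~ s/\s//g;        return $s; }
sub strip_ws_tr    { my ($s) = @_; $s =~ tr/ \t\n\r//d;  return $s; }

# Mixed CR/LF, lone CR, lone LF, space and tab in one sample.
my $sample = "ACGT\015\012TTGA\015CCAT\012GG A\tC";

print "line endings: ",
    (normalize_regex($sample) eq normalize_tr($sample) ? "same" : "differ"), "\n";
print "whitespace:   ",
    (strip_ws_regex($sample) eq strip_ws_tr($sample) ? "same" : "differ"), "\n";
```

Wrapping the calls in cmpthese() from the core Benchmark module, with a multi-megabyte string as input, would be one way to see whether the timing difference reported above shows up on a given Perl build.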
Chris

From osborne1 at optonline.net Thu May 18 20:27:57 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 18 May 2006 16:27:57 -0400
Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8 on Windows
In-Reply-To:
Message-ID:

David,

What are the results from the relevant t/*t files before and after these patches?

Brian O.

From hubert.prielinger at gmx.at Thu May 18 20:41:27 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 18 May 2006 14:41:27 -0600
Subject: [Bioperl-l] parsing xml output
Message-ID: <446CDBF7.10908@gmx.at>

hi,
what is the best way to parse NCBI- and WU- Blast XML output....
and is it possible to parse both with the same parser, or differ their
XML output...
thanks From staffa at niehs.nih.gov Thu May 18 20:49:15 2006 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C]) Date: Thu, 18 May 2006 16:49:15 -0400 Subject: [Bioperl-l] Reading GenBank Genomic File Annotation Message-ID: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov> Would like a fairly simple way to extract certain information from Genbank Genomic File Annotations. Namely the six D.melanogaster sequences. Specifically to find gene entries and learn the gene name, begin and end and CDS. Please point me to appropriate modules and documentation. Nick Staffa Telephone: 919-316-4569 (NIEHS: 6-4569) Scientific Computing Support Group NIEHS Information Technology Support Services Contract (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov ) National Institute of Environmental Health Sciences National Institutes of Health Research Triangle Park, North Carolina From adamnkraut at gmail.com Thu May 18 21:07:42 2006 From: adamnkraut at gmail.com (Adam Kraut) Date: Thu, 18 May 2006 17:07:42 -0400 Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline C? Message-ID: <134ede0b0605181407l52d1c2c3x79dd7f177ae7b828@mail.gmail.com> I am currently using a pairwise alignment algorithm written in C (not by me). The program consists of a library of routines, structures, and definitions which I do not want to spend a lot of time abstracting. I already have a hack method of writing the parameters and inputs I want from perl, calling the c program with system( ), and then parsing the output in Perl. Any good programmer would probably smack me but I'm just an undergrad and I needed to show my boss that this works in order to spend more time on it. So on to my question, what is the preferred method of extending Bioperl to use this algorithm? I have just read the XS tutorial and a bit about Inline C. Can I put the main function in my script using Inline, and then just point Inline at the rest of the C library? 
The program has several C-structures that are semantically equivalent to Bioperl objects, so I just need somewhere to start. I will spend some more time so that I have a more specific question; I just wanted a little feedback, as this is my first post to the bioperl list.

Thanks,
Adam

From osborne1 at optonline.net Thu May 18 21:54:01 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 18 May 2006 17:54:01 -0400
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov>
Message-ID:

Nick,

Have you read the Feature-Annotation HOWTO? This would be a good starting point...

Brian O.

From jason.stajich at duke.edu Thu May 18 22:22:32 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 18 May 2006 18:22:32 -0400
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <446CDBF7.10908@gmx.at>
References: <446CDBF7.10908@gmx.at>
Message-ID:

we don't parse WU-BLAST XML at this time. We'd welcome someone contributing this.

ncbi XML is parsed with blastxml format.
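
[Editorial sketch] For the NCBI case, reading BLAST XML through Bio::SearchIO with the blastxml format might look like the following. It is a minimal sketch, not a definitive recipe: report_hits is a hypothetical helper, the report file name is taken from the command line, and only the first HSP of each hit is printed.

```perl
use strict;
use warnings;

# Hypothetical helper: print query name, hit name and the e-value of the
# first HSP for every hit in an NCBI BLAST XML report.
sub report_hits {
    my ($file) = @_;
    require Bio::SearchIO;    # loaded lazily; needs BioPerl and an XML parser
    my $in = Bio::SearchIO->new(-format => 'blastxml', -file => $file);
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            my $hsp = $hit->next_hsp or next;    # skip hits without HSPs
            printf "%s\t%s\t%s\n",
                $result->query_name, $hit->name, $hsp->evalue;
        }
    }
    return;
}

# Usage: perl report_hits.pl blast_report.xml
report_hits($ARGV[0]) if @ARGV;
```

The same loop works with -format => 'blast' for text reports, since Bio::SearchIO presents both through the same result, hit and HSP objects.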
-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From MEC at stowers-institute.org Thu May 18 22:39:15 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 18 May 2006 17:39:15 -0500
Subject: [Bioperl-l] module for formating sequence output on the screen
Message-ID:

Li,

Here's a one-liner that uses bioperl's Bio::SeqIO module to reformat fasta on standard input to 50 char wide fasta on standard output.

perl -MBio::SeqIO -e 'select Bio::SeqIO->newFh(-format => "fasta", -width => 50); $in = Bio::SeqIO->newFh(-format => "fasta", -fh => \*STDIN); print while <$in>'

Because it reads standard input, you can call it like this:

perl -MBio::SeqIO -e 'select Bio::SeqIO->newFh(-format => "fasta", -width => 50); $in = Bio::SeqIO->newFh(-format => "fasta", -fh => \*STDIN); print while <$in>' < inputfile.fasta > outputfile.fasta

Does this help?

--Malcolm Cook

From gish at watson.wustl.edu Thu May 18 23:57:03 2006
From: gish at watson.wustl.edu (Warren Gish)
Date: Thu, 18 May 2006 18:57:03 -0500
Subject: [Bioperl-l] parsing xml output
In-Reply-To:
Message-ID: <009f01c67ad6$c359a560$0d00a8c0@PM>

Just to clarify, the XML output from WU-BLAST conforms to the standard NCBI_BlastOutput.dtd. Technically, contents of data fields could still be incompatible, but care was taken to ensure compatibility. If someone identifies a difference that prevents parsing or proper interpretation of the WU-BLAST output, please let me know.

Regards,
--Warren

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Jason Stajich
> Sent: Thursday, May 18, 2006 5:23 PM
> To: Hubert Prielinger
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] parsing xml output
>
> we don't parse WU-BLAST XML at this time. We'd welcome someone
> contributing this.
>
> ncbi XML is parsed with blastxml format.
>
> -jason
> On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote:
>
> > hi,
> > what is the best way to parse NCBI- and WU- Blast XML output....
> > and is it possible to parse both with the same parser, or
> > differ their XML output...
> > > > thanks > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at uiuc.edu Fri May 19 01:10:50 2006 From: cjfields at uiuc.edu (Christopher Fields) Date: Thu, 18 May 2006 20:10:50 -0500 Subject: [Bioperl-l] parsing xml output Message-ID: Just to make sure everybody knows, if you use bioperl v1.5.1, SearchIO::blastxml uses XML::Parser which should come with most recent perl distributions. The bioperl-live version has switched over to XML::SAX for SAX2 parsing and it is recommended that you install XML::SAX::ExpatXS as well for faster parsing. Chris ---- Original message ---- >Date: Thu, 18 May 2006 18:57:03 -0500 >From: "Warren Gish" >Subject: Re: [Bioperl-l] parsing xml output >To: "'Hubert Prielinger'" >Cc: bioperl-l at bioperl.org > >Just to clarify, the XML output from WU-BLAST conforms to the standard >NCBI_BlastOutput.dtd. Technically, contents of data fields could still be >incompatible, but care was taken to ensure compatibility. If someone >identifies a difference that prevents parsing or proper interpretation of >the WU-BLAST output, please let me know. >Regards, >--Warren > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >> Jason Stajich >> Sent: Thursday, May 18, 2006 5:23 PM >> To: Hubert Prielinger >> Cc: bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] parsing xml output >> >> we don't parse WU-BLAST XML at this time. We'd welcome someone >> contributing this. >> >> ncbi XML is parsed with blastxml format. 
>> >> -jason >> On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote: >> >> > hi, >> > what is the best way to parse NCBI- and WU- Blast XML output.... >> > and is it possible to parse both with the same parser, or >> differ their >> > XML output... >> > >> > thanks >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Fri May 19 12:52:13 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri, 19 May 2006 08:52:13 -0400 Subject: [Bioperl-l] parsing xml output In-Reply-To: <009f01c67ad6$c359a560$0d00a8c0@PM> References: <009f01c67ad6$c359a560$0d00a8c0@PM> Message-ID: <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu> Whoops - sorry Warren - for some reason I had it in my mind that it was different. So the blastxml parser should work fine. The WUBLAST tab-delimited output is different than NCBI's -m8/9 though, right? -jason On May 18, 2006, at 7:57 PM, Warren Gish wrote: > Just to clarify, the XML output from WU-BLAST conforms to the standard > NCBI_BlastOutput.dtd. Technically, contents of data fields could > still be > incompatible, but care was taken to ensure compatibility. If someone > identifies a difference that prevents parsing or proper > interpretation of > the WU-BLAST output, please let me know. 
> Regards, > --Warren > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >> Jason Stajich >> Sent: Thursday, May 18, 2006 5:23 PM >> To: Hubert Prielinger >> Cc: bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] parsing xml output >> >> we don't parse WU-BLAST XML at this time. We'd welcome someone >> contributing this. >> >> ncbi XML is parsed with blastxml format. >> >> -jason >> On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote: >> >>> hi, >>> what is the best way to parse NCBI- and WU- Blast XML output.... >>> and is it possible to parse both with the same parser, or >> differ their >>> XML output... >>> >>> thanks >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From torsten.seemann at infotech.monash.edu.au Thu May 18 22:42:05 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 19 May 2006 08:42:05 +1000 Subject: [Bioperl-l] parsing xml output In-Reply-To: <446CDBF7.10908@gmx.at> References: <446CDBF7.10908@gmx.at> Message-ID: <446CF83D.60207@infotech.monash.edu.au> > what is the best way to parse NCBI- and WU- Blast XML output.... > and is it possible to parse both with the same parser, or differ their > XML output... For NCBI BLAST XML format, use Bio::SearchIO->new(-format=>'blastxml', ...) 
I don't know if 'blastxml' will load WU-BLAST XML format. http://www.bioperl.org/wiki/HOWTO:SearchIO does not mention it. Why not try it, and report back the results to the bioperl list? -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia -------------- next part -------------- A non-text attachment was scrubbed... Name: torsten.seemann.vcf Type: text/x-vcard Size: 348 bytes Desc: not available URL: From torsten.seemann at infotech.monash.edu.au Thu May 18 22:37:17 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 19 May 2006 08:37:17 +1000 Subject: [Bioperl-l] Reading GenBank Genomic File Annotation In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov> References: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov> Message-ID: <446CF71D.2070207@infotech.monash.edu.au> Staffa, Nick (NIH/NIEHS) [C] wrote: > Would like a fairly simple way to extract certain information from Genbank Genomic File Annotations. > Namely the six D.melanogaster sequences. > Specifically to find gene entries and learn the gene name, begin and end and CDS. > Please point me to appropriate modules and documentation. http://www.bioperl.org/ -> http://www.bioperl.org/wiki/HOWTOs -> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation http://www.bioperl.org/ -> http://www.bioperl.org/wiki/FAQ -> http://www.bioperl.org/wiki/FAQ#Annotations_and_Features -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia -------------- next part -------------- A non-text attachment was scrubbed... 
Name: torsten.seemann.vcf Type: text/x-vcard Size: 348 bytes Desc: not available URL: From gish at watson.wustl.edu Fri May 19 14:50:08 2006 From: gish at watson.wustl.edu (Warren Gish) Date: Fri, 19 May 2006 09:50:08 -0500 Subject: [Bioperl-l] parsing xml output In-Reply-To: <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu> References: <009f01c67ad6$c359a560$0d00a8c0@PM> <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu> Message-ID: Right, the WU-BLAST tabbed output contains more fields. (See http:// blast.wustl.edu/blast/tabular.html). --Warren > Whoops - sorry Warren - for some reason I had it in my mind that it > was different. So the blastxml parser should work fine. The > WUBLAST tab-delimited output is different than NCBI's -m8/9 though, > right? > > -jason From adamnkraut at gmail.com Fri May 19 15:04:01 2006 From: adamnkraut at gmail.com (Adam Kraut) Date: Fri, 19 May 2006 11:04:01 -0400 Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline C? In-Reply-To: References: <134ede0b0605181407l52d1c2c3x79dd7f177ae7b828@mail.gmail.com> Message-ID: <134ede0b0605190804i60ee5ce1v984a33e0c91adf52@mail.gmail.com> The program generates an ensemble of weighted suboptimal alignments by use of a partition function and stochastic backtracking. The algorithm is quite novel and it's really only part of a larger multi-scale comparative modeling project. There documentation is here: http://www.tbi.univie.ac.at/~ulim/probA/probA_lib.html While I think this would be useful to the bioperl community if it were fully abstracted/extended, I would at the least like to be able to pass in any two sequences and get back SimpleAlign objects for our internal uses first. I have a good idea on how to get started. I will be sure to post when I get into trouble. On 5/19/06, aaron.j.mackey at gsk.com wrote: > > bioperl-ext is the package in which alignment algorithms and/or BioPerl > "wrapped" external C libraries live. 
Subprojects in bioperl-ext use both > XS and Inline::C, that's up to you. > > You'll need to get your C code compiled to a dynamically loaded library > (.so) to use either XS or Inline::C; this precludes any reuse of the C > main() function (although your Inline::C wrapper might recapitulate/copy > the main() function code). > > Out of curiosity, what pairwise alignment algorithm are you using? This > is a heavily beaten path, you might want to dig around first to see if > someone else already has what you need. > > -Aaron > > From slenk at emich.edu Fri May 19 14:42:41 2006 From: slenk at emich.edu (Stephen Gordon Lenk) Date: Fri, 19 May 2006 10:42:41 -0400 Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline C? Message-ID: There is nothing wrong with a reasonable way that works - better not to put yourself down. Inline is good if you can get it to work for you - I have had issues with linking Inline to dynamic libraries. I believe Inline makes a file that has linkage characteristics specified. Try it and see, then tell people how you did it. My two cents. Another way to use exterior executables is open3 (from the core IPC::Open3 module), then reading and writing to the pipes. I use it (primer3 and local lab automation code) - snippet follows:

    use strict;
    use warnings;
    use IPC::Open3;    # provides open3()
    use IO::Handle;    # provides autoflush() on filehandles

    my $pid    = 0;
    my $cancmd = 'cancmd.exe';
    my $write  = 0;
    my $read   = 0;

    sub new {
        my $c = {};
        # the child's stdout and stderr both come back on RDRFH
        $pid   = open3(\*WTRFH, \*RDRFH, \*RDRFH, $cancmd);
        $write = *WTRFH;
        $read  = *RDRFH;
        $write->autoflush();
        bless $c;
        return $c;
    }

Just write your request, then read it back - I make sure that each pair is a newline terminated text line - be sure you harvest the child pid (waitpid) when you are done. ----- Original Message ----- From: Adam Kraut Date: Thursday, May 18, 2006 5:07 pm Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline C? > I am currently using a pairwise alignment algorithm written in C > (not by > me).
The program consists of a library of routines, structures, and > definitions which I do not want to spend a lot of time > abstracting. I > already have a hack method of writing the parameters and inputs I > want from > perl, calling the c program with system( ), and then parsing the > output in > Perl. Any good programmer would probably smack me but I'm just an > undergrad and I needed to show my boss that this works in order to > spend more time on > it. > > So on to my question, what is the preferred method of extending > Bioperl to > use this algorithm? I have just read the XS tutorial and a bit > about Inline > C. Can I put the main function in my script using Inline, and > then just > point Inline at the rest of the C library? The program has several > C-structures that are semantically equivalent to Bioperl objects, > so just > need somewhere to start. I will spend some more time so that I > have a more > specific question, I just wanted a little feedback, this is my > first post to > the bioperl list. > > Thanks, > Adam > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hubert.prielinger at gmx.at Fri May 19 16:52:28 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Fri, 19 May 2006 10:52:28 -0600 Subject: [Bioperl-l] parsing xml output In-Reply-To: References: <009f01c67ad6$c359a560$0d00a8c0@PM> <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu> Message-ID: <446DF7CC.5060509@gmx.at> hi, I wondered whether it is also possible in the xml output (either WU or NCBI - Blast) to get the species (taxonomy) for every hit, if I do a general search. regards Warren Gish wrote: > Right, the WU-BLAST tabbed output contains more fields. (See > http://blast.wustl.edu/blast/tabular.html). > --Warren > > >> Whoops - sorry Warren - for some reason I had it in my mind that it >> was different. So the blastxml parser should work fine.
The >> WUBLAST tab-delimited output is different than NCBI's -m8/9 though, >> right? >> >> -jason >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From staffa at niehs.nih.gov Fri May 19 18:12:47 2006 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C]) Date: Fri, 19 May 2006 14:12:47 -0400 Subject: [Bioperl-l] Reading GenBank Genomic File Annotation In-Reply-To: Message-ID: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov> Specifically: I have the document to which you refer, but have not seen this one thing I need in the printout of tags etc.: the values in this line; mRNA join(380..509,578..1913,7784..8649,9439..10200) Is that a location object? Nick Staffa Telephone: 919-316-4569 (NIEHS: 6-4569) Scientific Computing Support Group NIEHS Information Technology Support Services Contract (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov ) National Institute of Environmental Health Sciences National Institutes of Health Research Triangle Park, North Carolina > ---------- > From: Brian Osborne > Sent: Thursday, May 18, 2006 5:54 PM > To: Staffa, Nick (NIH/NIEHS) [C]; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Reading GenBank Genomic File Annotation > > Nick, > > Have you read the Feature-Annotation HOWTO? This would be a good starting > point... > > Brian O. > > > On 5/18/06 4:49 PM, "Staffa, Nick (NIH/NIEHS) [C]" > wrote: > > > Would like a fairly simple way to extract certain information from Genbank > > Genomic File Annotations. > > Namely the six D.melanogaster sequences. > > Specifically to find gene entries and learn the gene name, begin and end and > > CDS. > > Please point me to appropriate modules and documentation. 
> > > > > > Nick Staffa > > Telephone: 919-316-4569 (NIEHS: 6-4569) > > Scientific Computing Support Group > > NIEHS Information Technology Support Services Contract > > (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov ) > > National Institute of Environmental Health Sciences > > National Institutes of Health > > Research Triangle Park, North Carolina > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From chandan.kr.singh at gmail.com Fri May 19 18:37:26 2006 From: chandan.kr.singh at gmail.com (CHANDAN SINGH) Date: Sat, 20 May 2006 00:07:26 +0530 Subject: [Bioperl-l] Reading GenBank Genomic File Annotation In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov> References: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov> Message-ID: <2d4f320605191137n11017ec0xe41a632a3c7ea9a9@mail.gmail.com> On 5/19/06, Staffa, Nick (NIH/NIEHS) [C] wrote: > > Specifically: > I have the document to which you refer, > but have not seen this one thing I need in the printout of tags etc.: > the values in this line; > mRNA join(380..509,578..1913,7784..8649,9439..10200) > Is that a location object? Yes it is a location object . If you want that as a string (this is what seems from ur mail ) , u just have to do this : $loc = $fet->location(); $loc_str = $loc->to_FTstring() ; Hope it helps. Chandan Nick Staffa > Telephone: 919-316-4569 (NIEHS: 6-4569) > Scientific Computing Support Group > NIEHS Information Technology Support Services Contract > (Science Task Monitor: Jack L. 
Field( field1 at niehs.nih.gov ) > > > National Institute of Environmental Health Sciences > > > National Institutes of Health > > > Research Triangle Park, North Carolina > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From osborne1 at optonline.net Fri May 19 19:39:36 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Fri, 19 May 2006 15:39:36 -0400 Subject: [Bioperl-l] Reading GenBank Genomic File Annotation In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov> Message-ID: Nick, This is from the HOWTO: Another way of describing a feature in Genbank involves multiple start and end positions. These could be called "split" locations, and a very common example is the join statement in the CDS feature found in Genbank entries (e.g. join(45..122,233..267)). This calls for a specialized object, Bio::Location::SplitLocationI, which is a container for Location objects:

    for my $feature ($seqobj->top_SeqFeatures) {
        if ( $feature->location->isa('Bio::Location::SplitLocationI')
             && $feature->primary_tag eq 'CDS' ) {
            for my $location ( $feature->location->sub_Location ) {
                print $location->start . ".." . $location->end . "\n";
            }
        }
    }

Brian O. On 5/19/06 2:12 PM, "Staffa, Nick (NIH/NIEHS) [C]" wrote: > Specifically: > I have the document to which you refer, > but have not seen this one thing I need in the printout of tags etc.: > the values in this line; > mRNA join(380..509,578..1913,7784..8649,9439..10200) > Is that a location object? > > > > Nick Staffa > Telephone: 919-316-4569 (NIEHS: 6-4569) > Scientific Computing Support Group > NIEHS Information Technology Support Services Contract > (Science Task Monitor: Jack L.
Field( field1 at niehs.nih.gov ) >>> National Institute of Environmental Health Sciences >>> National Institutes of Health >>> Research Triangle Park, North Carolina >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hubert.prielinger at gmx.at Fri May 19 20:42:09 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Fri, 19 May 2006 14:42:09 -0600 Subject: [Bioperl-l] parsing xml output In-Reply-To: References: <009f01c67ad6$c359a560$0d00a8c0@PM> <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu> <446DF7CC.5060509@gmx.at> Message-ID: <446E2DA1.1050503@gmx.at> hi warren, that means if I alter the DTD (if that is possible) by adding the taxonomic id to the DTD..... then I should have the taxonomic id tag in the xml file (theoretically) but I guess this is only possible with a local search (blastall) but not with an online search. greetings Warren Gish wrote: > > On May 19, 2006, at 11:52 AM, Hubert Prielinger wrote: > >> hi, >> I wondered whether is it also possible in the xml output (either WU >> or NCBI - Blast) to get the species (taxononmy) for every hit, if I >> do a general search. >> regards >> > The taxonomic id is not an entity in the NCBI XML DTD. If the > information was embedded in deflines, one could conceivably parse for > it, but I believe the NCBI only distributes taxids in their ASN.1 data > and in their pre-formated BLAST databases, and NCBI BLAST only reports > taxids in its ASN.1 output format, where taxid is available as an entity. 
> > --Warren > > From cjfields at uiuc.edu Fri May 19 20:56:56 2006 From: cjfields at uiuc.edu (Christopher Fields) Date: Fri, 19 May 2006 15:56:56 -0500 Subject: [Bioperl-l] parsing xml output Message-ID: <5c1c5a79.bb0af5aa.8198d00@expms6.cites.uiuc.edu> You'll have to pull the GI or accession from each hit and do a lookup by either grabbing the sequence and using Bio::Species or use Bio::DB::Taxonomy; there isn't any tax information directly incorporated into BLAST reports AFAIK. Chris ---- Original message ---- >Date: Fri, 19 May 2006 10:52:28 -0600 >From: Hubert Prielinger >Subject: Re: [Bioperl-l] parsing xml output >To: Warren Gish , bioperl-l at bioperl.org > >hi, >I wondered whether is it also possible in the xml output (either WU or >NCBI - Blast) to get the species (taxononmy) for every hit, if I do a >general search. >regards > >Warren Gish wrote: >> Right, the WU-BLAST tabbed output contains more fields. (See http:// >> blast.wustl.edu/blast/tabular.html). >> --Warren >> >> >>> Whoops - sorry Warren - for some reason I had it in my mind that it >>> was different. So the blastxml parser should work fine. The >>> WUBLAST tab-delimited output is different than NCBI's -m8/9 though, >>> right? >>> >>> -jason >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri May 19 20:59:35 2006 From: cjfields at uiuc.edu (Christopher Fields) Date: Fri, 19 May 2006 15:59:35 -0500 Subject: [Bioperl-l] parsing xml output Message-ID: <65932c77.bb0b33b0.8253400@expms6.cites.uiuc.edu> Um, I don't think it works that way. I'm pretty sure the XML is generated from the ASN1 output. I don't think (like Warren says) that you can directly get to the tax information. 
Indirectly is another matter... Chris ---- Original message ---- >Date: Fri, 19 May 2006 14:42:09 -0600 >From: Hubert Prielinger >Subject: Re: [Bioperl-l] parsing xml output >To: Warren Gish , bioperl-l at bioperl.org > >hi warren, >that means if I alter the DTD (if that is possible) by adding the >taxonomic id to the DTD..... then I should have the taxonomic id tag in >the xml file (theoretically) >but I guess this is only possible with a local search (blastall) but not >with an online search. > >greetings > >Warren Gish wrote: >> >> On May 19, 2006, at 11:52 AM, Hubert Prielinger wrote: >> >>> hi, >>> I wondered whether is it also possible in the xml output (either WU >>> or NCBI - Blast) to get the species (taxononmy) for every hit, if I >>> do a general search. >>> regards >>> >> The taxonomic id is not an entity in the NCBI XML DTD. If the >> information was embedded in deflines, one could conceivably parse for >> it, but I believe the NCBI only distributes taxids in their ASN.1 data >> and in their pre-formated BLAST databases, and NCBI BLAST only reports >> taxids in its ASN.1 output format, where taxid is available as an entity. >> >> --Warren >> >> > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l From hubert.prielinger at gmx.at Fri May 19 21:30:20 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Fri, 19 May 2006 15:30:20 -0600 Subject: [Bioperl-l] parsing xml output In-Reply-To: <446E3854.5010708@gmx.at> References: <5c1c5a79.bb0af5aa.8198d00@expms6.cites.uiuc.edu> <446E3854.5010708@gmx.at> Message-ID: <446E38EC.9020100@gmx.at> ok, thanks, it appears that I only need the species where the Protein is derived from, so I guess Bio:Species would satisfy me, or? and it would work that I just pull off the accession from the blast output file and then assign the accession code and get as return value the species name. 
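One way such a lookup might be sketched, assuming the hits are protein accessions, that network access to NCBI is available, and that the installed BioPerl release can still talk to the current NCBI services (not guaranteed for old releases). The accession is taken from the command line; pulling it out of the BLAST report is left aside here.

```perl
use strict;
use warnings;
use Bio::DB::GenPept;

# Assumption: a single protein accession is given as the only argument.
my $acc = shift @ARGV or die "usage: $0 protein_accession\n";

my $db  = Bio::DB::GenPept->new;
my $seq = $db->get_Seq_by_acc($acc);    # fetches the full record over the network
die "no record retrieved for $acc\n" unless $seq;

# species() returns a Bio::Species object when the record carries organism data.
my $species = $seq->species;
my $name    = $species ? $species->binomial : 'unknown';
print "$acc\t$name\n";
```

This retrieves the whole record just to read the organism, so for many hits a local GI-to-taxid table is likely to be much faster.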
is it possible to just assign the accession code, because I looked up but they were always talking of the entire file. regards > > > Christopher Fields wrote: >> You'll have to pull the GI or accession from each hit and do a lookup >> by either grabbing the sequence and using Bio::Species or use >> Bio::DB::Taxonomy; there isn't any tax information directly >> incorporated into BLAST reports AFAIK. >> >> Chris >> >> ---- Original message ---- >> >>> Date: Fri, 19 May 2006 10:52:28 -0600 >>> From: Hubert Prielinger Subject: Re: >>> [Bioperl-l] parsing xml output To: Warren Gish >>> , bioperl-l at bioperl.org >>> >>> hi, >>> I wondered whether is it also possible in the xml output (either WU >>> or NCBI - Blast) to get the species (taxononmy) for every hit, if I >>> do a general search. >>> regards >>> >>> Warren Gish wrote: >>> >>>> Right, the WU-BLAST tabbed output contains more fields. (See >>>> http:// blast.wustl.edu/blast/tabular.html). >>>> --Warren >>>> >>>> >>>>> Whoops - sorry Warren - for some reason I had it in my mind that >>>>> it was different. So the blastxml parser should work fine. The >>>>> WUBLAST tab-delimited output is different than NCBI's -m8/9 >>>>> though, right? 
From jason.stajich at duke.edu Fri May 19 22:40:54 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 19 May 2006 18:40:54 -0400
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <446E38EC.9020100@gmx.at>
References: <5c1c5a79.bb0af5aa.8198d00@expms6.cites.uiuc.edu> <446E3854.5010708@gmx.at> <446E38EC.9020100@gmx.at>
Message-ID:

There is a gi2taxid table in the /pub/taxonomy part of the NCBI FTP site (ftp.ncbi.nih.gov) -- I have used this to take GI numbers from a report and get taxonomy for overall classification. I think something like this exists in the scripts or examples directory in the bioperl distro. I know I posted about it when I wrote about it a while ago.

-jason

On May 19, 2006, at 5:30 PM, Hubert Prielinger wrote:

> ok, thanks,
> it appears that I only need the species where the Protein is derived
> from, so I guess Bio::Species would satisfy me, or?
> and it would work that I just pull off the accession from the blast
> output file and then assign the accession code and get as return value
> the species name.
> is it possible to just assign the accession code, because I looked up
> but they were always talking of the entire file.
>
> regards

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/

From ewijaya at i2r.a-star.edu.sg Sat May 20 12:36:44 2006
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Sat, 20 May 2006 20:36:44 +0800
Subject: [Bioperl-l] Method for checking Sequence type of a file
Message-ID: <30362db229c.446f7ddc@i2r.a-star.edu.sg>

Dear experts,

Is there a Bioperl method for verifying the sequence format of a file?

For example, given a file, we wish to check (returning true or false) whether it is in FASTA format, GenBank format, etc.

Such a method would be useful in a web application as a taint-checking procedure.

Regards,
Edward WIJAYA
SINGAPORE
From aaron.j.mackey at gsk.com Fri May 19 13:33:01 2006
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Fri, 19 May 2006 09:33:01 -0400
Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline C?
In-Reply-To: <134ede0b0605181407l52d1c2c3x79dd7f177ae7b828@mail.gmail.com>
Message-ID:

bioperl-ext is the package in which alignment algorithms and/or BioPerl "wrapped" external C libraries live. Subprojects in bioperl-ext use both XS and Inline::C; which one you use is up to you.

You'll need to get your C code compiled to a dynamically loaded library (.so) to use either XS or Inline::C; this precludes any reuse of the C main() function (although your Inline::C wrapper might recapitulate/copy the main() function code).

Out of curiosity, what pairwise alignment algorithm are you using? This is a heavily beaten path; you might want to dig around first to see if someone else already has what you need.

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 05/18/2006 05:07:42 PM:

> I am currently using a pairwise alignment algorithm written in C (not by
> me). The program consists of a library of routines, structures, and
> definitions which I do not want to spend a lot of time abstracting. I
> already have a hack method of writing the parameters and inputs I want from
> perl, calling the c program with system( ), and then parsing the output in
> Perl. Any good programmer would probably smack me but I'm just an undergrad
> and I needed to show my boss that this works in order to spend more time on
> it.
>
> So on to my question, what is the preferred method of extending Bioperl to
> use this algorithm? I have just read the XS tutorial and a bit about Inline
> C. Can I put the main function in my script using Inline, and then just
> point Inline at the rest of the C library? The program has several
> C-structures that are semantically equivalent to Bioperl objects, so just
> need somewhere to start. I will spend some more time so that I have a more
> specific question, I just wanted a little feedback, this is my first post to
> the bioperl list.
>
> Thanks,
> Adam

From jason.stajich at duke.edu Sat May 20 14:50:17 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat, 20 May 2006 10:50:17 -0400
Subject: [Bioperl-l] Method for checking Sequence type of a file
In-Reply-To: <30362db229c.446f7ddc@i2r.a-star.edu.sg>
References: <30362db229c.446f7ddc@i2r.a-star.edu.sg>
Message-ID:

Try Bio::Tools::GuessSeqFormat

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From chen_li3 at yahoo.com Sun May 21 00:15:01 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sat, 20 May 2006 17:15:01 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
Message-ID: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>

Dear all,

I am trying a script from the Graphics HOWTO under Cygwin (GD and libpng are already installed). I typed this line in a Cygwin X window:

$ perl render_blast1.pl data1.txt | display -

And here is the result:

display: no decode delegate for this image format `/tmp/magick-qKiRPDRS'.

Any idea?

Thank you very much,

Li

From osborne1 at optonline.net Sun May 21 00:59:06 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Sat, 20 May 2006 20:59:06 -0400
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
Message-ID:

Chen,

Not sure. However, whenever I see a new or incomprehensible error message like "display: no decode delegate for this image format" I Google it.

Brian O.

From n.saunders at uq.edu.au Sun May 21 22:17:44 2006
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Mon, 22 May 2006 08:17:44 +1000
Subject: [Bioperl-l] problems with Bio::Graph
Message-ID: <4470E708.3070402@uq.edu.au>

Dear all,

I am having some problems with the Bio::Graph modules. I am running Bioperl 1.5.0 RC1 on Ubuntu 5.10 i686.

I would like to parse files in PSI MI XML 2.5 format and, for selected proteins, get the UniProt accession of interacting partners (this is outlined in the documentation for Bio::Graph::ProteinGraph). I wrote a very simple test script and ran it on a selection of XML files. The script is simply:

----------------------------------------------------------------
use strict;
use Bio::Graph::IO;

my $mifile = shift || die("Usage = biograph.pl \n");
my $graphio = Bio::Graph::IO->new('-file'   => $mifile,
                                  '-format' => 'psi_xml');
my $gr = $graphio->next_network;
----------------------------------------------------------------

Here's a summary of the error messages with some sample files (I tried PSI MI XML versions 1 and 2.5):

1. MINT database 9707552_small.xml (PSI 2.5)
   Can't call method "att" on an undefined value at /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173.

2. IntAct database yeast_small-11.xml (PSI 2.5)
   Can't call method "att" on an undefined value at /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173.

3. IntAct database yeast_small-11.xml (PSI 1)
   Use of uninitialized value in string eq at /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 126.

4. DIP files Scere20060402.mif, Ecoli20060402.mif (PSI 1)
   These give no errors

5. DIP file dip20060402.mif (PSI 1, complete dataset)
   ------------- EXCEPTION: Bio::Root::Exception -------------
   MSG: Invalid species name 'immunodeficiency virus type 1, HIV-1'
   STACK: Error::throw
   STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.7/Bio/Root/Root.pm:328
   STACK: Bio::Species::validate_species_name /usr/local/share/perl/5.8.7/Bio/Species.pm:340
   STACK: Bio::Species::classification /usr/local/share/perl/5.8.7/Bio/Species.pm:170
   STACK: Bio::Species::new /usr/local/share/perl/5.8.7/Bio/Species.pm:118
   STACK: Bio::Graph::IO::psi_xml::_proteinInteractor /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:105
   STACK: XML::Twig::_twig_end /usr/share/perl5/XML/Twig.pm:1473
   STACK: XML::Parser::Expat::parse /usr/lib/perl5/XML/Parser/Expat.pm:469
   STACK: XML::Parser::parse /usr/lib/perl5/XML/Parser.pm:187
   STACK: XML::Parser::parsefile /usr/lib/perl5/XML/Parser.pm:233
   STACK: Bio::Graph::IO::psi_xml::next_network /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:79
   STACK: ./biograph.pl:18
   -----------------------------------------------------------

Looking at the module code, it seems that the first 2 errors relate to a parameter "proteinInteractorRef", found in PSI MI version 1 but not version 2.5. Error 3 I haven't yet figured out. DIP PSI MI XML version 1 for single species seems OK, but it seems there are species names in the complete dataset that cause problems (error 5).

Is the CVS version of Bio::Graph any better at handling PSI MI XML? Are there plans to get it to work with version 2.5 files from all sources (MINT and IntAct)? Googling and checking the list archives didn't give a lot of hits, which made me think it's not a widely-used module.
thanks,
Neil

--
School of Molecular and Microbial Sciences
University of Queensland
Brisbane 4072 Australia
http://psychro.bioinformatics.unsw.edu.au/neil

From torsten.seemann at infotech.monash.edu.au Mon May 22 01:31:56 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Mon, 22 May 2006 11:31:56 +1000
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
References: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
Message-ID: <4471148C.5090404@infotech.monash.edu.au>

> I try one script from GraphicsHowTo under Cygwin
> environment(GD and libpng already installed). I type
> this line in Cygwin X window:
> $ perl render_blast1.pl data1.txt | display -
> display: no decode delegate for this image format
> `/tmp/magick-qKiRPDRS'.

You are piping the output of the Perl script (which is a GIF/PNG image) into the input of a program called "display". This program is part of the ImageMagick toolkit, standard on most Linux installations. Because you are using Windows you probably don't have it installed! Try this:

$ perl render_blast1.pl data1.txt > image.gif

Then load 'image.gif' into whatever your favourite image viewer is.

--
Dr Torsten Seemann http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia

From darin.london at duke.edu Mon May 22 15:29:45 2006
From: darin.london at duke.edu (Darin London)
Date: Mon, 22 May 2006 11:29:45 -0400
Subject: [Bioperl-l] BOSC 2006 2nd Call for Papers
In-Reply-To: <4471CE49.80109@duke.edu>
References: <44294B65.4050207@duke.edu> <4471CE49.80109@duke.edu>
Message-ID: <4471D8E9.8090109@duke.edu>

2nd CALL FOR SPEAKERS

This is the second and last official call for speakers to submit their abstracts to speak at BOSC 2006 in Fortaleza, Brasil. To be considered as a potential speaker, your abstract must be received by Monday, June 5th, 2006. We look forward to a great conference this year.

Please consult the official BOSC 2006 website at:

http://www.open-bio.org/wiki/BOSC_2006

for more details and information.

In addition, a BOSC weblog has been set up to make it easier to disseminate all BOSC-related announcements:

http://wiki.open-bio.org/boscblog/

And if you have an iCal-compatible calendar, there is an EventDB calendar set up with all BOSC-related deadlines:

http://eventful.com/groups/G0-001-000014747-0

More information about ISMB can be found at the official ISMB 2006 website:

http://ismb2006.cbi.cnptia.embrapa.br/

Thank you, and we look forward to seeing you all,

The BOSC Organizing Committee.
From osborne1 at optonline.net Mon May 22 21:37:50 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 22 May 2006 17:37:50 -0400
Subject: [Bioperl-l] problems with Bio::Graph
In-Reply-To: <4470E708.3070402@uq.edu.au>
Message-ID:

Neil,

Let me propose an alternative. For the past few months I've been working on a Bioperl package for handling protein interaction networks, called bioperl-network. It's similar to the Bio::Graph modules, except for the following:

- It does not use Nat Goodman's SimpleGraph; it uses Perl's Graph. The advantage is that we are not responsible for maintaining the algorithm code. The disadvantage is that Graph has some bugs, but Jarkko Hietaniemi has been working on these and has fixed some significant ones recently.

- It uses names and concepts from Graph. It also has separate notions of edge and interaction, where one edge can have one or more interactions.

- It uses more method names and conventions borrowed from interaction databases and PSI MI. For example, a node can be a protein complex composed of multiple Seq objects, not just a protein.

This package is a makeover of Bio::Graph, so Nat Goodman and Richard Adams are major contributors to it. It's also worth mentioning that it's not complete, meaning it won't parse all fields from PSI MI 2 or 2.5, but I think it should be able to handle the code you've shown (and if it cannot then I'll see that it's fixed). I don't know about PSI MI version 1, but if I'm not mistaken there's a version 1 -> version 2 converter.

I'm about to put this into CVS so you can take a look, should you choose to.

Brian O.

From torsten.seemann at infotech.monash.edu.au Mon May 22 21:53:02 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 23 May 2006 07:53:02 +1000
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060522132553.21896.qmail@web36804.mail.mud.yahoo.com>
References: <20060522132553.21896.qmail@web36804.mail.mud.yahoo.com>
Message-ID: <447232BE.1080001@infotech.monash.edu.au>

Chen Li

> perl render_blast1.pl data1.txt >im.png

Based on http://bioperl.org/wiki/HOWTO:Graphics I believe the example script is creating a PNG image. The last line is:

print $panel->png;

> and Perl runs without any problem. I use adobe
> photoshop to open them and Adobe can't recognize them.
> If I use ACDSee to open them I only get a black
> background. If I issue this line under Cygwin X window
> display im.png or display im.gif
> Cygwin says:
> display: Improper image header `im.png'.
> It seems Perl can't produce an image with right
> format.

Are you sure Perl is producing a PNG file at all? How many bytes does im.png use? Zero?

Did you notice this in http://bioperl.org/wiki/HOWTO:Graphics ?

It says: "If you are on a Windows platform, you need to put STDOUT into binary mode so that the PNG file does not go through Window's carriage return/linefeed transformations. Before the final print statement, put the statement binmode(STDOUT)."

i.e. your script should have

binmode(STDOUT);
print $panel->png;

as the last 2 lines.

> Do you experience the same problem before?

No.

--Torsten

From chen_li3 at yahoo.com Mon May 22 13:25:53 2006
From: chen_li3 at yahoo.com (chen li)
Date: Mon, 22 May 2006 06:25:53 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <4471148C.5090404@infotech.monash.edu.au>
Message-ID: <20060522132553.21896.qmail@web36804.mail.mud.yahoo.com>

Dear Dr. Seemann,

Thank you very much for the reply.
I ran this line:

perl render_blast1.pl data1.txt >im.gif

or

perl render_blast1.pl data1.txt >im.png

and Perl runs without any problem. I tried to open the files in Adobe Photoshop, but it can't recognize them. If I open them in ACDSee I only get a black background. If I run this line in a Cygwin X window

display im.png or display im.gif

Cygwin says:

display: Improper image header `im.png'.

or

display: Improper image header `im.gif'.

It seems Perl can't produce an image in the right format. Have you experienced the same problem before?

Li

From chen_li3 at yahoo.com Mon May 22 22:57:42 2006
From: chen_li3 at yahoo.com (chen li)
Date: Mon, 22 May 2006 15:57:42 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <447232BE.1080001@infotech.monash.edu.au>
Message-ID: <20060522225742.78245.qmail@web36804.mail.mud.yahoo.com>

Hi,

I tried both, with and without the statement binmode(STDOUT) before the last line, print $panel->png; but there is no difference. I get a file of 2432 bytes.

Li

From barry.moore at genetics.utah.edu Tue May 23 01:00:06 2006
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Mon, 22 May 2006 19:00:06 -0600
Subject: [Bioperl-l] Problems with Unflattener.pm
Message-ID: <729FFBBD-955B-4689-8A27-66733E81431C@genetics.utah.edu>

Hi All,

NT_113910 appears to throw Bio::SeqFeature::Tools::Unflattener into an infinite recursive loop. The trouble occurs in the method find_best_matches between lines 2258 and 2281, and in particular the loop is perpetuated by line 2273. NT_113910 has a fairly complex features table, but I have as yet been unable to figure out why this loop is not exiting properly. This has been submitted to bugzilla, but I'll post here so it gets documented on the list also. Any suggestions from Chris or others would be greatly appreciated.

This problem can be recreated as follows:

Grab NT_113910 from GenBank.

bp_fetch.pl -fmt genbank net::genbank:NT_113910 > NT_113910.gbk

Pass NT_113910.gbk on the command line to the attached script.
#!/usr/bin/perl
use strict;
use warnings;

use Bio::SeqIO;
use Bio::SeqFeature::Tools::Unflattener;

my $file = shift;

# generate an Unflattener object
my $unflattener = Bio::SeqFeature::Tools::Unflattener->new;
#$unflattener->verbose(1);

# first fetch a genbank SeqI object
my $seqio = Bio::SeqIO->new(-file => $file, -format => 'GenBank');
my $out = Bio::SeqIO->new(-format => 'asciitree');

while (my $seq = $seqio->next_seq()) {
    # get top level unflattened SeqFeatureI objects
    $unflattener->unflatten_seq(-seq => $seq, -use_magic => 1);
    $out->write_seq($seq);
}

From miker at biotiquesystems.com Mon May 22 23:56:52 2006
From: miker at biotiquesystems.com (Michael Rogoff)
Date: Mon, 22 May 2006 16:56:52 -0700
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
Message-ID: <002a01c67dfb$663cc600$c100a8c0@mike>

As best I can tell, using Bio::SeqIO to parse a UniProt file ignores the sequence version, and calling seq_version() on the resulting RichSeq object returns undef.

It looks like swiss.pm is trying to parse the version out of the SV line, which apparently doesn't exist any more? The sequence version(s) are now specified as part of the Date (DT) lines.

Is this not a bug? Is swiss.pm not designed to parse UniProt files?

Thanks for any help ...

From jason.stajich at duke.edu Tue May 23 01:37:13 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon, 22 May 2006 21:37:13 -0400
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To: <002a01c67dfb$663cc600$c100a8c0@mike>
References: <002a01c67dfb$663cc600$c100a8c0@mike>
Message-ID:

Sounds like a "missing feature" =)

AFAIK the module was only written for swissprot files. It is possible there have been changes in the format that have not been tracked in the current code. We'd certainly appreciate someone testing it out as versions evolve. If you submit a bug to bugzilla with your version of bioperl and example files, you can track when a fix is in. We of course appreciate anyone's efforts to provide a patch, as most bugs get fixed of late when someone gets "itchy" enough to fix them.

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From jason.stajich at duke.edu Tue May 23 02:04:17 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon, 22 May 2006 22:04:17 -0400
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To: <003301c67e0b$5dd44410$c100a8c0@mike>
References: <003301c67e0b$5dd44410$c100a8c0@mike>
Message-ID: <3607997C-DAD4-4E0E-A919-7D9212AC6D50@duke.edu>

We ask that people post patches to bugzilla as an attachment to the bug report, so we can track what the bug was and why the patch fixes it.

I am not totally sure this patch works, because it seems like we now need to strip more information out of the DT line if $date actually contains more than just the date.

If you would go ahead and create a bug in bugzilla for this (http://bugzilla.open-bio.org), this sort of conversation can be tracked to the bug. If any of this is unclear please let us know - I thought we had put some pages up about this sort of thing on the wiki, but maybe they need to be expanded.
-jason On May 22, 2006, at 9:51 PM, Michael Rogoff wrote: > I have a patch that seems to work but I'm not familiar with the > proper method to > "provide" it. How do I go about that? > > The patch is pretty simple, it just parses the sequence version out > of the date > line where it now hides: > > #date > elsif( /^DT\s+(.*)/ ) { > my $date = $1; > + > + if ($date =~ /sequence version (\d+)/i) { > + $params{'-seq_version'} ||= $1; > + } > + > $date =~ s/\;//; > $date =~ s/\s+$//; > push @{$params{'-dates'}}, $date; > } > > By the way, what is the difference between Bio::Seq::version and > Bio::Seq::RichSeq::seq_version? > > >> -----Original Message----- >> From: Jason Stajich [mailto:jason.stajich at duke.edu] >> Sent: Monday, May 22, 2006 6:37 PM >> To: Michael Rogoff >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version >> >> >> Sounds like a "missing feature" =) >> >> AFAIK the module was only written for swissprot files. It is >> possible there have been changes in the format that have not been >> tracked to the current code. We'd certainly appreciate someone >> testing it out as versions evolve. If you submit a bug to bugzilla >> with version of bioperl and example files you can track when >> a fix is >> in. We of course appreciate anyone's efforts to provide a patch as >> most bugs get fixed of late when someone gets "itchy" enough to fix >> them. >> >> -jason >> >> On May 22, 2006, at 7:56 PM, Michael Rogoff wrote: >> >>> >>> As best as I can tell, using Bio::SeqIO to parse a uniprot file >>> ignores the >>> sequence version, and calling seq_version() on the resulting >>> RichSeq object >>> returns undef. >>> >>> It looks like swiss.pm is trying to parse the version out >> of the SV >>> line, which >>> apparently doesn't exist any more? The sequence version(s) >> are now >>> specified as >>> part of the Date (DT) lines. >>> >>> Is this not a bug? Is swiss.pm not designed to parse uniprot files? 
>>> >>> Thanks for any help ... >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From Marc.Logghe at DEVGEN.com Tue May 23 07:08:37 2006 From: Marc.Logghe at DEVGEN.com (Marc Logghe) Date: Tue, 23 May 2006 09:08:37 +0200 Subject: [Bioperl-l] problems iwth Bio::graphics module Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746E4E@ANTARESIA.be.devgen.com> Hi Li, Did you check your script for any other print statements (to STDOUT, that is) that potentially could contaminate your png stream ? Marc > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chen li > Sent: Tuesday, May 23, 2006 12:58 AM > To: Torsten Seemann > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] problems iwth Bio::graphics module > > Hi, > > I try both: either with or without this statement > binmode(STDOUT) before the last line print $panel->png; But > there are no differenes. I get a file of 2432 bytes. > > Li > > > > > Chen Li > > > > > perl render_blast1.pl data1.txt >im.png > > > > Based on http://bioperl.org/wiki/HOWTO:Graphics I believe > the example > > script is creating a PNG image. The last line is: > > print $panel->png; > > > > > and Perl runs without any problem. I use adobe photoshop to open > > > them and Adobe can't recognize > > them. > > > If I use ACDSee to open them I only get a black background. If I > > > issue this line under Cygwin X > > window > > > display im.png or display im.gif > > > Cygwin says: > > > display: Improper image header `im.png'. > > > It seems Perl can't produce an image with right format. > > > > Are you sure Perl is producing a PNG file at all? > > How many bytes does im.png use? 
Zero?
> >
> > Did you notice this in
> > http://bioperl.org/wiki/HOWTO:Graphics ?
> >
> > It says: "If you are on a Windows platform, you need to put STDOUT
> > into binary mode so that the PNG file does not go through Window's
> > carriage return/linefeed transformations. Before the final print
> > statement, put the statement binmode(STDOUT)."
> >
> > ie. your script should have
> >
> > binmode(STDOUT);
> > print $panel->png;
> >
> > as the last 2 lines.
> >
> > > Do you experience the same problem before?
> >
> > No.
> >
> > --Torsten

From chen_li3 at yahoo.com Tue May 23 13:27:06 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 23 May 2006 06:27:06 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746E4E@ANTARESIA.be.devgen.com>
Message-ID: <20060523132706.57245.qmail@web36811.mail.mud.yahoo.com>

Dear Dr. Logghe,

Thank you so much. I got the script working under Cygwin after following your suggestion. Here are the last two lines, either

binmode (STDOUT);
print STDOUT $panel->png;

or only

print STDOUT $panel->png;

They both work for me. I know that Perl's default output goes to the screen. I don't know why it works when STDOUT is added after print. Could you explain it?

BTW I copied this script from the GraphicsHowTo on the Bioperl website, and only one line contains a print statement, which is 'print $panel->png'.

Once again thank you so much,

Li

--- Marc Logghe wrote:
> Hi Li,
> Did you check your script for any other print
> statements (to STDOUT,
> that is) that potentially could contaminate your png
> stream ?
> > Marc > > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org > > [mailto:bioperl-l-bounces at lists.open-bio.org] On > Behalf Of chen li > > Sent: Tuesday, May 23, 2006 12:58 AM > > To: Torsten Seemann > > Cc: bioperl-l at lists.open-bio.org > > Subject: Re: [Bioperl-l] problems iwth > Bio::graphics module > > > > Hi, > > > > I try both: either with or without this statement > > binmode(STDOUT) before the last line print > $panel->png; But > > there are no differenes. I get a file of 2432 > bytes. > > > > Li > > > > > > > > > Chen Li > > > > > > > perl render_blast1.pl data1.txt >im.png > > > > > > Based on http://bioperl.org/wiki/HOWTO:Graphics > I believe > > the example > > > script is creating a PNG image. The last line > is: > > > print $panel->png; > > > > > > > and Perl runs without any problem. I use adobe > photoshop to open > > > > them and Adobe can't recognize > > > them. > > > > If I use ACDSee to open them I only get a > black background. If I > > > > issue this line under Cygwin X > > > window > > > > display im.png or display im.gif > > > > Cygwin says: > > > > display: Improper image header `im.png'. > > > > It seems Perl can't produce an image with > right format. > > > > > > Are you sure Perl is producing a PNG file at > all? > > > How many bytes does im.png use? Zero? > > > > > > Did you notice this in > > > http://bioperl.org/wiki/HOWTO:Graphics ? > > > > > > It says: "If you are on a Windows platform, you > need to put STDOUT > > > into binary mode so that the PNG file does not > go through Window's > > > carriage return/linefeed transformations. Before > the final print > > > statement, put the statement binmode(STDOUT)." > > > > > > ie. your script should have > > > > > > binmode(STDOUT); > > > print $panel->png; > > > > > > as the last 2 lines. > > > > > > > Do you experience the same problem before? > > > > > > No. 
> > > > > > --Torsten > > > > > > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! Mail has the best spam > protection > > around http://mail.yahoo.com > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From lstein at cshl.edu Tue May 23 14:06:27 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 23 May 2006 10:06:27 -0400 Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com> References: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com> Message-ID: <200605231006.28392.lstein@cshl.edu> Hi, It is possible that your version of display can't handle PNG images. Try saving the output as a file and then opening it in another image program: perl render_blast1.pl data1.txt > data1.png Another thing to watch out for is that, depending on what version of Perl you're using, you may have to insert this statement into the render_blast1.pl script (somewhere near the top): binmode STDOUT; Lincoln On Saturday 20 May 2006 20:15, chen li wrote: > Dear all, > > > I try one script from GraphicsHowTo under Cygwin > environment(GD and libpng already installed). I type > this line in Cygwin X window: > > > $ perl render_blast1.pl data1.txt | display - > > And here is the result: > > display: no decode delegate for this image format > `/tmp/magick-qKiRPDRS'. > > Any idea? > > > Thank you very much, > > Li > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! 
Mail has the best spam protection around
> http://mail.yahoo.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From Derek.Fairley at bll.n-i.nhs.uk Tue May 23 14:39:16 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Tue, 23 May 2006 15:39:16 +0100
Subject: [Bioperl-l] Bio::Restriction::IO query
Message-ID:

Hi folks,

I'm new to BioPerl, and struggling to make the Bio::Restriction::* modules work (using BioPerl-1.4; Perl-5.8.1; Linux-2.4). Specifically, I'm having some trouble understanding the behaviour of the Bio::Restriction::IO module. I'm trying to use this to create a Bio::Restriction::EnzymeCollection object from a local REBASE file (which is in bairoch-format); this will in turn be passed to a Bio::Restriction::Analysis object.

The following test script (derived from the Bio::Restriction::IO perldoc) runs fine:

#! /usr/bin/perl -w
use strict;
use warnings;
use Bio::Restriction::IO;

my $in = Bio::Restriction::IO->new( -file   => "REBASE_file",
                                    -format => 'Bairoch');
my $collection = $in->read();
print "Number of REs in the collection: ",
      scalar $collection->each_enzyme, "\n";

#note that using -format=>'bairoch' without capitalisation (as shown in
#perldoc synopsis) throws an exception: Failed to load module
#Bio::Restriction::IO::bairoch...

However... the test script returns the number 532 - the number of enzymes in the default enzyme set - regardless of the number of enzymes in the file. A default Bio::Restriction::EnzymeCollection object has presumably been created (as the 'read()' and 'each_enzyme' methods are available) but it didn't come from the local file.
The result is the same if the Bio::Restriction::IO->new() method is called with no arguments - a default EnzymeCollection object is created. It's not clear to me where this has come from. My (mis?)understanding was that the default set of enzymes would be loaded on creation of a new Bio::Restriction::Analysis object (in the absence of a -enzymes=>... argument). Presumably this is down to my poor understanding of the BioPerl object model... ;-) So: how should I create an EnzymeCollection object from file? Any help or advice would be gratefully received. PS. Congratulations to the development team for creating a very impressive and useful open source toolkit. Derek. ----------------------------------------- Derek Fairley, Ph.D. Regional Virus Laboratory, Kelvin Building, Royal Victoria Hospital, Grosvenor Road, Belfast, N. Ireland. BT12 6BA Tel. +44 (0)2890 635303 From rowan.mitchell at bbsrc.ac.uk Tue May 23 14:53:42 2006 From: rowan.mitchell at bbsrc.ac.uk (rowan mitchell (RRes-Roth)) Date: Tue, 23 May 2006 15:53:42 +0100 Subject: [Bioperl-l] Assembly::IO ace output Message-ID: Hi I am very interested in writing ace format files and had assumed that I would be able to do this with Assembly::IO until I tried it! I see there has been some correspondence last year on this, but as far as I can see this is still not implemented in 1.5.1. Is this correct ? Is it planned to be included; are there modules under development available ? many thanks Rowan =============================================== Dr Rowan Mitchell Rothamsted Research Harpenden Herts AL5 2JQ UK Tel: +44 (0)1582 763133 x2469 Fax: +44 (0)1582 763010 E-mail: rowan.mitchell at bbsrc.ac.uk WWW: http://www.rothamsted.bbsrc.ac.uk/ =============================================== Rothamsted Research is a company limited by guarantee, registered in England under the registration number 2393175 and a not for profit charity number 802038. 
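
Regarding the Bio::Restriction::IO query above (the test script that reports 532 enzymes whatever the input), one quick sanity check is to count the records in the local file without going through BioPerl at all. What follows is only a minimal sketch: it assumes the file uses the usual Bairoch layout, in which every enzyme record begins with an "ID" line, and count_bairoch_records is a hypothetical helper name, not a BioPerl function.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): count enzyme records in a
# Bairoch-format REBASE file by counting the lines that start with "ID".
# Assumption: each enzyme record carries exactly one "ID" line.
sub count_bairoch_records {
    my ($fh) = @_;
    my $count = 0;
    while ( my $line = <$fh> ) {
        # "ID" at the start of the line, followed by whitespace and a name
        $count++ if $line =~ /^ID\s+\S/;
    }
    return $count;
}

# Usage: perl count_ids.pl REBASE_file
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print "ID records in $ARGV[0]: ", count_bairoch_records($fh), "\n";
    close $fh;
}
```

If the number printed here differs from what the BioPerl script reports, that would support the suspicion that the collection is not being built from the local file.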
From rfsouza at cecm.usp.br Tue May 23 20:17:36 2006 From: rfsouza at cecm.usp.br (Robson Francisco de Souza {S}) Date: Tue, 23 May 2006 17:17:36 -0300 Subject: [Bioperl-l] Assembly::IO ace output In-Reply-To: References: Message-ID: <20060523201736.GA28401@cecm.usp.br> Hi Rowan, On Tue, May 23, 2006 at 03:53:42PM +0100, rowan mitchell (RRes-Roth) wrote: > Hi > > I am very interested in writing ace format files and had assumed that I > would be able to do this with Assembly::IO until I tried it! I see there > has been some correspondence last year on this, but as far as I can see > this is still not implemented in 1.5.1. Is this correct ? Is it planned > to be included; are there modules under development available ? As far as I know, there are no plans to add write support to Bio::Assembly::IO. When I wrote the original modules there was no need for this so I left it aside. Best regards, Robson > many thanks > > Rowan > > =============================================== > Dr Rowan Mitchell > Rothamsted Research > Harpenden > Herts AL5 2JQ UK > > Tel: +44 (0)1582 763133 x2469 > Fax: +44 (0)1582 763010 > E-mail: rowan.mitchell at bbsrc.ac.uk > WWW: http://www.rothamsted.bbsrc.ac.uk/ > =============================================== > Rothamsted Research is a company limited by guarantee, registered in > England under the registration number 2393175 and a not for profit > charity number 802038. 
> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From lstein at cshl.edu Tue May 23 20:53:34 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 23 May 2006 16:53:34 -0400 Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <200605231006.28392.lstein@cshl.edu> References: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com> <200605231006.28392.lstein@cshl.edu> Message-ID: <200605231653.36087.lstein@cshl.edu> Hi Chen, It looks to me like you cut and paste the data1.txt file from the web site, consequently replacing the tabs with spaces. Please get table1.txt from the BioPerl distribution, as instructed in the tutorial. Best, Lincoln On Tuesday 23 May 2006 10:06, Lincoln Stein wrote: > Hi, > > It is possible that your version of display can't handle PNG images. Try > saving the output as a file and then opening it in another image program: > > perl render_blast1.pl data1.txt > data1.png > > Another thing to watch out for is that, depending on what version of Perl > you're using, you may have to insert this statement into the > render_blast1.pl script (somewhere near the top): > > binmode STDOUT; > > Lincoln > > On Saturday 20 May 2006 20:15, chen li wrote: > > Dear all, > > > > > > I try one script from GraphicsHowTo under Cygwin > > environment(GD and libpng already installed). I type > > this line in Cygwin X window: > > > > > > $ perl render_blast1.pl data1.txt | display - > > > > And here is the result: > > > > display: no decode delegate for this image format > > `/tmp/magick-qKiRPDRS'. > > > > Any idea? > > > > > > Thank you very much, > > > > Li > > > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! 
Mail has the best spam protection around > > http://mail.yahoo.com > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From chen_li3 at yahoo.com Tue May 23 21:46:16 2006 From: chen_li3 at yahoo.com (chen li) Date: Tue, 23 May 2006 14:46:16 -0700 (PDT) Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <200605231653.36087.lstein@cshl.edu> Message-ID: <20060523214616.15131.qmail@web36813.mail.mud.yahoo.com> Dear Dr. Stein, Thank you so much. I follow your suggestions and download codes from the Bioperl CVS website. Now everything is working. Li --- Lincoln Stein wrote: > Hi Chen, > > It looks to me like you cut and paste the data1.txt > file from the web site, > consequently replacing the tabs with spaces. Please > get table1.txt from the > BioPerl distribution, as instructed in the tutorial. > > Best, > > Lincoln > > On Tuesday 23 May 2006 10:06, Lincoln Stein wrote: > > Hi, > > > > It is possible that your version of display can't > handle PNG images. Try > > saving the output as a file and then opening it in > another image program: > > > > perl render_blast1.pl data1.txt > data1.png > > > > Another thing to watch out for is that, depending > on what version of Perl > > you're using, you may have to insert this > statement into the > > render_blast1.pl script (somewhere near the top): > > > > binmode STDOUT; > > > > Lincoln > > > > On Saturday 20 May 2006 20:15, chen li wrote: > > > Dear all, > > > > > > > > > I try one script from GraphicsHowTo under Cygwin > > > environment(GD and libpng already installed). 
I
> type
> > > this line in Cygwin X window:
> > >
> > >
> > > $ perl render_blast1.pl data1.txt | display -
> > >
> > > And here is the result:
> > >
> > > display: no decode delegate for this image
> format
> > > `/tmp/magick-qKiRPDRS'.
> > >
> > > Any idea?
> > >
> > >
> > > Thank you very much,
> > >
> > > Li
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu

From chen_li3 at yahoo.com Tue May 23 22:59:46 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 23 May 2006 15:59:46 -0700 (PDT)
Subject: [Bioperl-l] How to download sequence files either in EMBL format
Message-ID: <20060523225946.2118.qmail@web36805.mail.mud.yahoo.com>

Hi all,

I need to download one sequence for a gene. I went to the NCBI website, found the gene of interest, and downloaded the file in GenBank format (saved as sequence.genbank). But to my surprise this so-called GenBank-format file doesn't contain many features, such as exons, compared to the one in Ensembl.

My question: where can I download this sequence file in EMBL format? It looks like the one in EMBL might contain other information, such as exons.

Thank you very much,

Li

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo!
Mail has the best spam protection around http://mail.yahoo.com From osborne1 at optonline.net Wed May 24 14:33:16 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Wed, 24 May 2006 10:33:16 -0400 Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <20060523132706.57245.qmail@web36811.mail.mud.yahoo.com> Message-ID: Li, The Graphics HOWTO talks about this Windows workaround in _four_ different places, it's impossible to miss if you read it from start to finish. This is what one should do if one wants to use these modules and one is a novice. Example: Important! Remember that if you are on a Windows platform, you need to put STDOUT into binary mode so that the PNG file does not go through Window's carriage return/linefeed transformations. Before the final print statement, write binmode(STDOUT). Brian O. On 5/23/06 9:27 AM, "chen li" wrote: > BTW I copy this script from GraphicsHowTo on Bioperl > website and only one line contains print statement, > which is 'print $panel->png'. From chen_li3 at yahoo.com Wed May 24 16:17:15 2006 From: chen_li3 at yahoo.com (chen li) Date: Wed, 24 May 2006 09:17:15 -0700 (PDT) Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: Message-ID: <20060524161715.45141.qmail@web36807.mail.mud.yahoo.com> Thanks but Dr. Stein already helps me to figure out what is going on: I should have copied the source codes for the examples in CVS instead of "cut and paste" from the HOWTO tutorial. And sorry for any inconvience. Li --- Brian Osborne wrote: > Li, > > The Graphics HOWTO talks about this Windows > workaround in _four_ different > places, it's impossible to miss if you read it from > start to finish. This is > what one should do if one wants to use these modules > and one is a novice. > Example: > > Important! Remember that if you are on a Windows > platform, you need to put > STDOUT into binary mode so that the PNG file does > not go through Window's > carriage return/linefeed transformations. 
Before the
> final print statement,
> write binmode(STDOUT).
>
> Brian O.
>
>
> On 5/23/06 9:27 AM, "chen li"
> wrote:
>
> > BTW I copy this script from GraphicsHowTo on
> Bioperl
> > website and only one line contains print
> statement,
> > which is 'print $panel->png'.

From ULNJUJERYDIX at spammotel.com Thu May 25 01:59:36 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Thu, 25 May 2006 09:59:36 +0800
Subject: [Bioperl-l] URGENT: Bio::Graphics::Panel make the ruler have negative (-) position numbering imagemap making
Message-ID: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>

Hi, thanks for the help offered thus far! sigh

I am trying to annotate TFBS on a -1000 to +50 bp promoter seq using bioperl. Therefore I was asked to make the ruler numbering start at a negative position (-1000). Is there any way at all to do this in bioperl without changing the .pm file?

thanks guys..

kevin

From cjfields at uiuc.edu Thu May 25 16:43:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 25 May 2006 11:43:37 -0500
Subject: [Bioperl-l] Problems with Unflattener.pm
In-Reply-To: <729FFBBD-955B-4689-8A27-66733E81431C@genetics.utah.edu>
Message-ID: <009d01c6801a$5f75d2a0$15327e82@pyrimidine>

I was able to reproduce this using WinXP and bioperl-live. It seems to get caught in the loop during recursion: debugging shows it is unable to get past 'find_best_matches: (/15)'. There are lots of unmatched pairs with this sequence, so could that be the problem? I'm not terribly familiar with Unflattener...
Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Barry Moore > Sent: Monday, May 22, 2006 8:00 PM > To: bioperl-l at bioperl.org > Subject: [Bioperl-l] Problems with Unflattener.pm > > Hi All, > > NT_113910 appears to throw Bio::SeqFeatures::Tools::Unflattener into > an infinite recursive loop. The trouble occurs in the method > find_best_matches between lines 2258 and 2281, and in particular the > loop is perpetuated by line 2273. NT_113910 has a fairly complex > features table, and but I have as yet been unable to figure out why > this loop is not exiting properly. This has been submitted to > bugzilla, but I'll post here so it gets documented on the list also. > Any suggestions from Chris or others would be greatly appreciated. > > This problem can be recreated as follows: > > Grab NT_113910 from genbank. > bp_fetch.pl -fmt genbank net::genbank:NT_113910 > NT_113910.gbk > > Pass NT_113910.gbk on the command line to the attached script. 
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> use Bio::SeqIO;
> use Bio::SeqFeature::Tools::Unflattener;
>
> my $file = shift;
>
> # generate an Unflattener object
> my $unflattener = Bio::SeqFeature::Tools::Unflattener->new;
> #$unflattener->verbose(1);
>
> # first fetch a genbank SeqI object
> my $seqio =
>     Bio::SeqIO->new(-file   => $file,
>                     -format => 'GenBank');
> my $out =
>     Bio::SeqIO->new(-format => 'asciitree');
> while (my $seq = $seqio->next_seq()) {
>
>     # get top level unflattened SeqFeatureI objects
>     $unflattener->unflatten_seq(-seq       => $seq,
>                                 -use_magic => 1);
>     $out->write_seq($seq);
> }

From cjfields at uiuc.edu Thu May 25 19:44:01 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 25 May 2006 14:44:01 -0500
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To: <3607997C-DAD4-4E0E-A919-7D9212AC6D50@duke.edu>
Message-ID: <00a101c68033$95606dd0$15327e82@pyrimidine>

This is due to recent changes in the SwissProt/UniProt format (there apparently are many other changes besides this). From UniProtKB news (http://ca.expasy.org/sprot/relnotes/sp_news.html) is this tidbit:

----------------------------------------------------------
UniProtKB release 7.0 of 07-Feb-2006

Changes concerning dates and versions numbers (DT lines)

We changed from showing only the dates corresponding to full UniProtKB releases in the DT lines to displaying the date of the biweekly release at which an entry is integrated or updated. We dropped the information concerning the release number and introduced entry and sequence version numbers in the DT lines.

The new format of the three DT lines is:

DT   DD-MMM-YYYY, integrated into UniProtKB/database_name.
DT   DD-MMM-YYYY, sequence version version_number.
DT   DD-MMM-YYYY, entry version version_number.
Example for UniProtKB/Swiss-Prot: DT 01-JAN-1998, integrated into UniProtKB/Swiss-Prot. DT 15-OCT-2001, sequence version 3. DT 01-APR-2004, entry version 14. Example for UniProtKB/TrEMBL: DT 01-FEB-1999, integrated into UniProtKB/TrEMBL. DT 15-OCT-2000, sequence version 2. DT 15-DEC-2004, entry version 5. The sequence version number of an entry is incremented by one when its amino acid sequence is modified. The entry version number is incremented by one whenever any data in the flat file representation of the entry is modified. We retrofitted the entry and sequence version numbers, as well as all dates, using archived UniProtKB releases. ---------------------------------------------------------- Probably should explain on the swissprot wiki page that the format is in a state of flux at the moment. I've added this tidbit to the bug page (#2003) as well. Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Jason Stajich > Sent: Monday, May 22, 2006 9:04 PM > To: Michael Rogoff > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version > > We ask that people post patches to the bugzilla as an attachment to > the bugzilla so we can track what and why the bug was that the patch > fixes. > > I am not totally sure this patch works because it seems like we need > to strip out more information now from the DT line if the $date > actually contains more information than just the date. > > If you would go ahead and create a bug in bugzilla for this (http:// > bugzilla.open-bio.org) this sort of conversation can be tracked to > the bug. > > If any of this is unclear please let us know - I though we had put > some pages up about this sort of thing on the wiki but maybe they > need to be expanded. 
> > -jason > On May 22, 2006, at 9:51 PM, Michael Rogoff wrote: > > > I have a patch that seems to work but I'm not familiar with the > > proper method to > > "provide" it. How do I go about that? > > > > The patch is pretty simple, it just parses the sequence version out > > of the date > > line where it now hides: > > > > #date > > elsif( /^DT\s+(.*)/ ) { > > my $date = $1; > > + > > + if ($date =~ /sequence version (\d+)/i) { > > + $params{'-seq_version'} ||= $1; > > + } > > + > > $date =~ s/\;//; > > $date =~ s/\s+$//; > > push @{$params{'-dates'}}, $date; > > } > > > > By the way, what is the difference between Bio::Seq::version and > > Bio::Seq::RichSeq::seq_version? > > > > > >> -----Original Message----- > >> From: Jason Stajich [mailto:jason.stajich at duke.edu] > >> Sent: Monday, May 22, 2006 6:37 PM > >> To: Michael Rogoff > >> Cc: bioperl-l at lists.open-bio.org > >> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version > >> > >> > >> Sounds like a "missing feature" =) > >> > >> AFAIK the module was only written for swissprot files. It is > >> possible there have been changes in the format that have not been > >> tracked to the current code. We'd certainly appreciate someone > >> testing it out as versions evolve. If you submit a bug to bugzilla > >> with version of bioperl and example files you can track when > >> a fix is > >> in. We of course appreciate anyone's efforts to provide a patch as > >> most bugs get fixed of late when someone gets "itchy" enough to fix > >> them. > >> > >> -jason > >> > >> On May 22, 2006, at 7:56 PM, Michael Rogoff wrote: > >> > >>> > >>> As best as I can tell, using Bio::SeqIO to parse a uniprot file > >>> ignores the > >>> sequence version, and calling seq_version() on the resulting > >>> RichSeq object > >>> returns undef. > >>> > >>> It looks like swiss.pm is trying to parse the version out > >> of the SV > >>> line, which > >>> apparently doesn't exist any more? 
The sequence version(s)
> >> are now
> >>> specified as
> >>> part of the Date (DT) lines.
> >>>
> >>> Is this not a bug? Is swiss.pm not designed to parse uniprot files?
> >>>
> >>> Thanks for any help ...
> >>>
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >> --
> >> Jason Stajich
> >> Duke University
> >> http://www.duke.edu/~jes12
> >>
> >>
> >>
> >
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From miker at biotiquesystems.com Tue May 23 01:51:10 2006
From: miker at biotiquesystems.com (Michael Rogoff)
Date: Mon, 22 May 2006 18:51:10 -0700
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To:
Message-ID: <003301c67e0b$5dd44410$c100a8c0@mike>

I have a patch that seems to work but I'm not familiar with the proper method to "provide" it. How do I go about that?

The patch is pretty simple, it just parses the sequence version out of the date line where it now hides:

     #date
     elsif( /^DT\s+(.*)/ ) {
         my $date = $1;
+
+        if ($date =~ /sequence version (\d+)/i) {
+            $params{'-seq_version'} ||= $1;
+        }
+
         $date =~ s/\;//;
         $date =~ s/\s+$//;
         push @{$params{'-dates'}}, $date;
     }

By the way, what is the difference between Bio::Seq::version and Bio::Seq::RichSeq::seq_version?

> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at duke.edu]
> Sent: Monday, May 22, 2006 6:37 PM
> To: Michael Rogoff
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
>
>
> Sounds like a "missing feature" =)
>
> AFAIK the module was only written for swissprot files. It is
> possible there have been changes in the format that have not been
> tracked to the current code.
We'd certainly appreciate someone > testing it out as versions evolve. If you submit a bug to bugzilla > with version of bioperl and example files you can track when > a fix is > in. We of course appreciate anyone's efforts to provide a patch as > most bugs get fixed of late when someone gets "itchy" enough to fix > them. > > -jason > > On May 22, 2006, at 7:56 PM, Michael Rogoff wrote: > > > > > As best as I can tell, using Bio::SeqIO to parse a uniprot file > > ignores the > > sequence version, and calling seq_version() on the resulting > > RichSeq object > > returns undef. > > > > It looks like swiss.pm is trying to parse the version out > of the SV > > line, which > > apparently doesn't exist any more? The sequence version(s) > are now > > specified as > > part of the Date (DT) lines. > > > > Is this not a bug? Is swiss.pm not designed to parse uniprot files? > > > > Thanks for any help ... > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > From chen_li3 at yahoo.com Tue May 23 15:48:46 2006 From: chen_li3 at yahoo.com (chen li) Date: Tue, 23 May 2006 08:48:46 -0700 (PDT) Subject: [Bioperl-l] problems iwth Bio::graphics module In-Reply-To: <200605231006.28392.lstein@cshl.edu> Message-ID: <20060523154846.70831.qmail@web36815.mail.mud.yahoo.com> Dear Dr. Stein, I have the job partially done by adding this line (under Cygwin) print STDOUT $panel->png; It is done because I can produce the image to be viewed by other programs but it is only partially done because I don't get exactly the same image as that shown on the website. Enclosed is the image I get. Thank you, Li --- Lincoln Stein wrote: > Hi, > > It is possible that your version of display can't > handle PNG images. 
Try > saving the output as a file and then opening it in > another image program: > > perl render_blast1.pl data1.txt > data1.png > > Another thing to watch out for is that, depending on > what version of Perl > you're using, you may have to insert this statement > into the render_blast1.pl > script (somewhere near the top): > > binmode STDOUT; > > Lincoln > > > On Saturday 20 May 2006 20:15, chen li wrote: > > Dear all, > > > > > > I try one script from GraphicsHowTo under Cygwin > > environment(GD and libpng already installed). I > type > > this line in Cygwin X window: > > > > > > $ perl render_blast1.pl data1.txt | display - > > > > And here is the result: > > > > display: no decode delegate for this image format > > `/tmp/magick-qKiRPDRS'. > > > > Any idea? > > > > > > Thank you very much, > > > > Li > > > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! Mail has the best spam > protection around > > http://mail.yahoo.com > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com -------------- next part -------------- A non-text attachment was scrubbed... 
Name: im1 Type: image/x-png Size: 2423 bytes Desc: 2615755531-im1 URL: From cjfields at uiuc.edu Fri May 26 01:28:14 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 25 May 2006 20:28:14 -0500 Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version In-Reply-To: <003301c67e0b$5dd44410$c100a8c0@mike> References: <003301c67e0b$5dd44410$c100a8c0@mike> Message-ID: This patch works only for the recent change in swissprot seq format for sequence versions on the DT line. I checked it out vs the test data provided with bioperl (t\data\swiss.dat). I did manage to get it working for both old and new using a modification to your patch but there's another issue; using $seq->get_dates, which should only show dates, shows the entire line (date and version info). Jason mentioned that there needs to be a better way to address this which I'm looking into. Chris On May 22, 2006, at 8:51 PM, Michael Rogoff wrote: > I have a patch that seems to work but I'm not familiar with the > proper method to > "provide" it. How do I go about that? > > The patch is pretty simple, it just parses the sequence version out > of the date > line where it now hides: > > #date > elsif( /^DT\s+(.*)/ ) { > my $date = $1; > + > + if ($date =~ /sequence version (\d+)/i) { > + $params{'-seq_version'} ||= $1; > + } > + > $date =~ s/\;//; > $date =~ s/\s+$//; > push @{$params{'-dates'}}, $date; > } > > By the way, what is the difference between Bio::Seq::version and > Bio::Seq::RichSeq::seq_version? > > >> -----Original Message----- >> From: Jason Stajich [mailto:jason.stajich at duke.edu] >> Sent: Monday, May 22, 2006 6:37 PM >> To: Michael Rogoff >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version >> >> >> Sounds like a "missing feature" =) >> >> AFAIK the module was only written for swissprot files. It is >> possible there have been changes in the format that have not been >> tracked to the current code. 
We'd certainly appreciate someone >> testing it out as versions evolve. If you submit a bug to bugzilla >> with version of bioperl and example files you can track when >> a fix is >> in. We of course appreciate anyone's efforts to provide a patch as >> most bugs get fixed of late when someone gets "itchy" enough to fix >> them. >> >> -jason >> >> On May 22, 2006, at 7:56 PM, Michael Rogoff wrote: >> >>> >>> As best as I can tell, using Bio::SeqIO to parse a uniprot file >>> ignores the >>> sequence version, and calling seq_version() on the resulting >>> RichSeq object >>> returns undef. >>> >>> It looks like swiss.pm is trying to parse the version out >> of the SV >>> line, which >>> apparently doesn't exist any more? The sequence version(s) >> are now >>> specified as >>> part of the Date (DT) lines. >>> >>> Is this not a bug? Is swiss.pm not designed to parse uniprot files? >>> >>> Thanks for any help ... >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From lstein at cshl.edu Fri May 26 14:38:29 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 26 May 2006 10:38:29 -0400 Subject: [Bioperl-l] URGENT: Bio::Graphics::Panel make the ruler have negative (-) position numbering imagemap making In-Reply-To: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com> References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com> Message-ID: <200605261038.30380.lstein@cshl.edu> Hi, For some reason I didn't see the first posting on this. 
In current bioperl-live, the ruler can have negative numbering; I use this
routinely. You need to create a feature that starts in negative coordinates.
What happens when you try this?

Lincoln

On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> Hi
> thanks for the help offered thus far!
> sigh I am trying to annotate TFBS on a -1000 to +50 bp promoter seq using
> bioperl. therefore i was asked to make the numberings as such (-1000) is
> there any way at all to do this in bioperl without changing the .pm file?
>
> thanks guys..
> kevin

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From jelenaob at gmail.com Fri May 26 16:47:05 2006
From: jelenaob at gmail.com (Jelena Obradovic)
Date: Fri, 26 May 2006 09:47:05 -0700
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
Message-ID: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com>

Hi there,

I have tried loading an enzyme list from the REBASE file bairoch.605 using
Bio::Restriction::IO.

1. But for some reason the number of enzymes in the list is always 532,
which is the default set of enzymes in the enzyme collection.

Is there any known issue with this module or a workaround?

And here is the code I have been using:

my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",-format=>"Bairoch")
  || die "can't load the file bairoch.605: $!";
my $enzymes = $re_in->read;
print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n";

2. The other problem is that when I use the lower-case format name it
throws an exception, but when the "B" is capitalized it is ok.
I assume it cannot load the file and does not initialize the enzyme
collection properly.
Can't call method "each_enzyme" on an undefined value at .../cgi-bin/seq-load.pl line 51. Any thoughts? Thanks in advance, Jelena Obradovic jelenaob at gmail.com From cjfields at uiuc.edu Fri May 26 19:27:13 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 26 May 2006 14:27:13 -0500 Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file In-Reply-To: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com> Message-ID: <002601c680fa$644635a0$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Jelena Obradovic > Sent: Friday, May 26, 2006 11:47 AM > To: Bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file > > Hi there, > > I have tried loading enzyme list from a file REBASE bairoch.605 using > Bio::Restriction::IO; > > 1. But for some reason the number of enzymes in the list is always 532 > which is a default set of enzymes in enzyme collection. > > Is there any known issue with this module or a workaround? > > And here is the code I have been using: > > my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- > format=>"Bairoch") > || die "can't load the file bairoch.605: $!"; > my $enzymes = $re_in->read; > print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n"; my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- format=>"Bairoch"); should be my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- format=>"bairoch"); Note the case change for the format; this is noted in the bug report you submitted earlier. Bio::Restriction::IO works similarly to Bio::SeqIO (i.e. requires a specific format, which I believe is case-sensitive). 
Judging by the modules in Bio/Restriction/IO directory, looks like the Bio::Restriction::IO format should match one of the following formats: bairoch, itype2, withrefm, and you can also build your own if needed using the previous as examples and implementing Bio::Restriction::IO::base. > 2. The other problem is when trying to use format that is lower-case > it throws an exception, but when "B" is capitalized it is ok. > I assume it cannot load a file and does not initilize enzyme > collection properly. > > Can't call method "each_enzyme" on an undefined value at > .../cgi-bin/seq-load.pl line 51. My guess? The reason it works with an uppercase ('Bairoch') is that it can't find the module and uses the default set of enzymes as a fallback. The exception that you reported when you use lowercase ('bairoch') is real and I reported it as a bug (there are a few I found in that module). You might want to try using one of the other formats if you can get the files in the right format from REBASE. I'm looking into the bugs specifically associated with Bio::Restriction::IO::bairoch. > Any thoughts? > > > Thanks in advance, > > > Jelena Obradovic > jelenaob at gmail.com > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From osborne1 at optonline.net Fri May 26 19:43:18 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Fri, 26 May 2006 15:43:18 -0400 Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file In-Reply-To: <002601c680fa$644635a0$15327e82@pyrimidine> Message-ID: Chris, SeqIO's arguments are case-insensitive (e.g. 'fasta', 'Fasta', 'FASTA' should work). This is what the documentation says and what the code seems to suggest. This is probably what the Restriction modules should do as well. Brian O. 
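
A note for readers following this thread: the case handling Brian describes
can be sketched in plain Perl. This is only an illustration under stated
assumptions, not the code in Bio::SeqIO or Bio::Restriction::IO. The helper
name normalize_format is hypothetical, and the hard-coded list simply repeats
the three format names Chris lists above (bairoch, itype2, withrefm). The idea
is to lowercase whatever the caller passes and to fail early on an unknown
name rather than quietly continuing with a default.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of BioPerl: lowercase a user-supplied
# format name and check it against the formats named in this thread.
my %KNOWN_FORMAT = map { $_ => 1 } qw(bairoch itype2 withrefm);

sub normalize_format {
    my ($format) = @_;
    die "no format given\n" unless defined $format && length $format;

    my $normalized = lc $format;
    die "unknown format '$format' (known: "
        . join(', ', sort keys %KNOWN_FORMAT) . ")\n"
        unless $KNOWN_FORMAT{$normalized};

    return $normalized;
}

# 'Bairoch', 'BAIROCH' and 'bairoch' all map to the same name.
for my $given (qw(Bairoch BAIROCH bairoch)) {
    my $name = normalize_format($given);
    print "$given -> $name\n";
}

# A name that is not in the list fails loudly instead of being ignored.
my $ok = eval { normalize_format('no_such_format'); 1 };
print "rejected: $@" unless $ok;
```

With a check like this in front of the module lookup, a capitalized
"Bairoch" would reach the same parser as "bairoch", and a misspelled name
would stop with a clear message. Whether the Restriction modules should work
this way is for the maintainers to decide.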
From cjfields at uiuc.edu Fri May 26 20:21:03 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 26 May 2006 15:21:03 -0500
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To:
Message-ID: <002701c68101$e9432540$15327e82@pyrimidine>

Okay, my bad. Having the format be case-insensitive makes sense and is
probably an easy fix, but there seem to be more serious issues with the
Bio::Restriction::IO modules at the moment. None have implemented write
methods though POD implies they work:

SYNOPSIS

use Bio::Restriction::IO;

$in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
                                 -format => 'withrefm');
$out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
                                 -format => 'bairoch');
my $res = $in->read; # a Bio::Restriction::EnzymeCollection
$out->write($res);

and no tests exist for Bio::Restriction::IO::bairoch yet. In fact, the
tests are pretty confusing; when did we allow this syntax: '-format => 8'?

Anyway, I'm muddling my way through this and will probably write something
up for the project priority list if I can't work this bug out.

Chris

> -----Original Message-----
> From: Brian Osborne [mailto:osborne1 at optonline.net]
> Sent: Friday, May 26, 2006 2:43 PM
> To: Chris Fields; 'Jelena Obradovic'; Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::Restriction::IO and REBASE file
>
> Chris,
>
> SeqIO's arguments are case-insensitive (e.g. 'fasta', 'Fasta', 'FASTA'
> should work). This is what the documentation says and what the code seems
> to suggest. This is probably what the Restriction modules should do as well.
>
> Brian O.
>
>

From andreas.bender at complife.org Fri May 26 14:50:03 2006
From: andreas.bender at complife.org (Andreas Bender (CompLife'06))
Date: Fri, 26 May 2006 10:50:03 -0400
Subject: [Bioperl-l] Bioperl-based Applications for "Free Software" Session?
Message-ID:

Dear All,

Did anyone of you implement some cool programs/tools using Bioperl?
Or is there someone from the Bioperl core team who wants to present Bioperl
itself at our conference?

We are holding a "free software" session (free at least as in free beer,
ideally also open source under some GNU-type license) at our "Computational
Life Sciences" Conference in Cambridge, UK, later this year, and you are
warmly welcome to present your software there.

Please contact me directly or visit the website if you have any questions.

Enjoy the weekend,
Andreas


Call for Contributions

==================================================
LIFE SCIENCE FREE SOFTWARE SESSION
held at CompLife 2006 (http://www.complife.org)
in Cambridge, United Kingdom, on September 27 - 29, 2006
==================================================

In recent years more and more free and open source software has been
developed for chemo- and bioinformatics, molecular modelling or other Life
Science applications, but many of the programs are not well known. During
the CompLife 2006 conference we will organize a special session dedicated
to this type of free software. The demo session will be preceded by a short
session with room for brief introductory presentations, whereas the demo
session itself will allow attendees to see the tools in action. Authors of
free software will have the opportunity to present their program to the
CompLife audience, which will consist of researchers and users from
computer science, biology, chemistry and everything in between.

In case you are interested in the free software session, send us an email
at fss at complife.org and briefly describe your program and how you intend
to present it at the conference (1-2 pages max - please include URL to
downloadable version where available). The only restrictions are that the
program must be freely available for everyone or even open source and that
it must be related to Life Science applications.

The deadline for these proposals is June 16th, 2006. In mid-July we will
notify you if your software demo was accepted.
************************ -- Computational Life Sciences '06 Cambridge/UK, 27-29 September 2006: Visit http://www.complife.org for more information! Andreas Kieron Patrick Bender - http://www.andreasbender.de Novartis Institutes for BioMedical Research, Cambridge/MA From cjfields at uiuc.edu Fri May 26 21:19:08 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 26 May 2006 16:19:08 -0500 Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file In-Reply-To: <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com> Message-ID: <002b01c6810a$06642400$15327e82@pyrimidine> The POD documentation is a bit misleading for Bio::Restriction::IO. Brian's right, there needs to be more flexibility with the case for the formats used. I found a few other odd things as well which I may file bug reports for. Looks like another post for the project priority list. Chris _____ From: Jelena Obradovic [mailto:jobradovic at gmail.com] Sent: Friday, May 26, 2006 3:56 PM To: Chris Fields Cc: Jelena Obradovic; Bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Bio::Restriction::IO and REBASE file Hi guys, I tried with the other formats, and it works fine with "withrefm" format but not with "withref". Thanks a lot for your reponse. Cheers, Jelena On 5/26/06, Chris Fields wrote: > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Jelena Obradovic > Sent: Friday, May 26, 2006 11:47 AM > To: Bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file > > Hi there, > > I have tried loading enzyme list from a file REBASE bairoch.605 using > Bio::Restriction::IO; > > 1. But for some reason the number of enzymes in the list is always 532 > which is a default set of enzymes in enzyme collection. > > Is there any known issue with this module or a workaround? 
> > And here is the code I have been using: > > my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- > format=>"Bairoch") > || die "can't load the file bairoch.605: $!"; > my $enzymes = $re_in->read; > print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n"; my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- format=>"Bairoch"); should be my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- format=>"bairoch"); Note the case change for the format; this is noted in the bug report you submitted earlier. Bio::Restriction::IO works similarly to Bio::SeqIO ( i.e. requires a specific format, which I believe is case-sensitive). Judging by the modules in Bio/Restriction/IO directory, looks like the Bio::Restriction::IO format should match one of the following formats: bairoch, itype2, withrefm, and you can also build your own if needed using the previous as examples and implementing Bio::Restriction::IO::base. > 2. The other problem is when trying to use format that is lower-case > it throws an exception, but when "B" is capitalized it is ok. > I assume it cannot load a file and does not initilize enzyme > collection properly. > > Can't call method "each_enzyme" on an undefined value at > .../cgi-bin/seq-load.pl line 51. My guess? The reason it works with an uppercase ('Bairoch') is that it can't find the module and uses the default set of enzymes as a fallback. The exception that you reported when you use lowercase ('bairoch') is real and I reported it as a bug (there are a few I found in that module). You might want to try using one of the other formats if you can get the files in the right format from REBASE. I'm looking into the bugs specifically associated with Bio::Restriction::IO::bairoch. > Any thoughts? 
> > > Thanks in advance, > > > Jelena Obradovic > jelenaob at gmail.com > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jelena Obradovic Email: jobradovic at gmail.com From jay at jays.net Sat May 27 16:47:27 2006 From: jay at jays.net (Jay Hannah) Date: Sat, 27 May 2006 11:47:27 -0500 Subject: [Bioperl-l] "Project OpenLab" (working title) Message-ID: <4478829F.5030508@jays.net> Hola -- We've been kicking around this idea for a few months now. I'm threatening to start coding. Once I do I might not sleep for a few weeks so I thought I'd solicit feedback now. :) "Project OpenLab": http://omaha.pm.org/kwiki/?BioPerl - Does any such project already exist? - If there's no other obvious choice already bent to BioPerl / BioPerl DB / BioSQL, I'll probably be writing the web framework in Perl's Template Toolkit. The server is Linux, Apache, mySQL (BioPerl DBs). - I'll be using BioPerl objects for the persistence layer as much as possible. Where not possible I'll ask this list about my patches/additions/ugly hackery. - I'll be discussing my back office tables like "users" that don't belong in bioperl-db; and my questions about new tables that might belong there on the BioSQL-l mailing list. - I'm not a computer language zealot (usually), so I'm open to out-of-the-box ideas from anyone. - I'm a biology newb with a long Perl/database/web/e-commerce background, so please feel free to point out any bio idiocy I engage in. Thanks for your time, j From fernan at iib.unsam.edu.ar Sat May 27 22:30:44 2006 From: fernan at iib.unsam.edu.ar (Fernan Aguero) Date: Sat, 27 May 2006 19:30:44 -0300 Subject: [Bioperl-l] "Project OpenLab" (working title) In-Reply-To: <4478829F.5030508@jays.net> References: <4478829F.5030508@jays.net> Message-ID: <20060527223044.GA40583@iib.unsam.edu.ar> +----[ Jay Hannah (27.May.2006 15:15): | | Hola -- Hola! 
| We've been kicking around this idea for a few months now. I'm threatening to start coding. Once I do I might not sleep for a few weeks so I thought I'd solicit feedback now. :) | | "Project OpenLab": | http://omaha.pm.org/kwiki/?BioPerl | | - Does any such project already exist? mmm ... maybe ... both GUS (Genomics Unified Schema: gusdb.org, though not developed around bioperl) and GMOD (Generic Model Organism Database: gmod.org) provide you with i) RDBMS storage ii) a Perl object layer iii) a web app framework Though certainly overkill for the needs you describe in the wiki, they can be customized to work in the way you describe or at least serve as a guide. | - If there's no other obvious choice already bent to BioPerl / BioPerl DB / BioSQL, I'll probably be writing the web framework in Perl's Template Toolkit. The server is Linux, Apache, mySQL (BioPerl DBs). Have you considered Perl Catalyst? It has the benefits of allowing you to work with bioperl modules naturally (it's Perl!) a choice of templating toolkits (Template Toolkit, Mason, among others) and will provide you with an almost ready to go controller/url dispatcher. | - I'll be using BioPerl objects for the persistence layer as much as possible. Where not possible I'll ask this list about my patches/additions/ugly hackery. | - I'll be discussing my back office tables like "users" that don't belong in bioperl-db; and my questions about new tables that might belong there on the BioSQL-l mailing list. | - I'm not a computer language zealot (usually), so I'm open to out-of-the-box ideas from anyone. | - I'm a biology newb with a long Perl/database/web/e-commerce background, so please feel free to point out any bio idiocy I engage in. 
|
| Thanks for your time,
|
| j
|
+----]

Good luck,

Fernan

From epsteinj at mail.nih.gov Fri May 26 18:46:32 2006
From: epsteinj at mail.nih.gov (Epstein, Jonathan A (NIH/NICHD) [E])
Date: Fri, 26 May 2006 14:46:32 -0400
Subject: [Bioperl-l] URGENT: Bio::Graphics::Panel make the ruler have negative (-) position numbering imagemap making
In-Reply-To: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
Message-ID: <42504F69898FE546B3F0238C9BD032750915F8@NIHCESMLBX7.nih.gov>

While this is being discussed and we have Lincoln's attention: in example 4
on the Biographics Howto:

http://stein.cshl.org/genome_informatics/BioGraphics/Graphics-HOWTO.html

how can one assign directional arrows to the graded segments which
represent the BLAST hits? I.e., is there a glyph type which is both an
'arrow' and a 'graded_segment'? What other techniques do you recommend for
associating directionality with these hits?

Thanks & regards,

Jonathan

From jobradovic at gmail.com Fri May 26 20:55:35 2006
From: jobradovic at gmail.com (Jelena Obradovic)
Date: Fri, 26 May 2006 13:55:35 -0700
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <002601c680fa$644635a0$15327e82@pyrimidine>
References: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com> <002601c680fa$644635a0$15327e82@pyrimidine>
Message-ID: <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com>

Hi guys,

I tried with the other formats, and it works fine with the "withrefm"
format but not with "withref".

Thanks a lot for your response.
Cheers, Jelena On 5/26/06, Chris Fields wrote: > > > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Jelena Obradovic > > Sent: Friday, May 26, 2006 11:47 AM > > To: Bioperl-l at lists.open-bio.org > > Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file > > > > Hi there, > > > > I have tried loading enzyme list from a file REBASE bairoch.605 using > > Bio::Restriction::IO; > > > > 1. But for some reason the number of enzymes in the list is always 532 > > which is a default set of enzymes in enzyme collection. > > > > Is there any known issue with this module or a workaround? > > > > And here is the code I have been using: > > > > my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- > > format=>"Bairoch") > > || die "can't load the file bairoch.605: $!"; > > my $enzymes = $re_in->read; > > print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n"; > > my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- > format=>"Bairoch"); > > should be > > my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- > format=>"bairoch"); > > Note the case change for the format; this is noted in the bug report you > submitted earlier. Bio::Restriction::IO works similarly to Bio::SeqIO ( > i.e. > requires a specific format, which I believe is case-sensitive). Judging > by > the modules in Bio/Restriction/IO directory, looks like the > Bio::Restriction::IO format should match one of the following formats: > bairoch, itype2, withrefm, and you can also build your own if needed using > the previous as examples and implementing Bio::Restriction::IO::base. > > > 2. The other problem is when trying to use format that is lower-case > > it throws an exception, but when "B" is capitalized it is ok. > > I assume it cannot load a file and does not initilize enzyme > > collection properly. 
> > > > Can't call method "each_enzyme" on an undefined value at > > .../cgi-bin/seq-load.pl line 51. > > My guess? The reason it works with an uppercase ('Bairoch') is that it > can't find the module and uses the default set of enzymes as a fallback. > The exception that you reported when you use lowercase ('bairoch') is real > and I reported it as a bug (there are a few I found in that module). > > You might want to try using one of the other formats if you can get the > files in the right format from REBASE. I'm looking into the bugs > specifically associated with Bio::Restriction::IO::bairoch. > > > Any thoughts? > > > > > > Thanks in advance, > > > > > > Jelena Obradovic > > jelenaob at gmail.com > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- Jelena Obradovic Email: jobradovic at gmail.com From gad14 at cornell.edu Fri May 26 20:02:33 2006 From: gad14 at cornell.edu (Genevieve DeClerck) Date: Fri, 26 May 2006 16:02:33 -0400 Subject: [Bioperl-l] results problem with StandAloneBlast Message-ID: <44775ED9.4020208@cornell.edu> Hi, I'm running local blast with Bio::Tools::Run::StandAloneBlast. Everything seems to work ok up to the point of accessing the results. I am able to print the results but when I try to do more than one thing with the result, nothing is returned for the second activity.. I'd like to first sort the results into groups of results that hit the db seq once, twice, three times, etc - where the results are stored as SeqFeature objects in temporary arrays whose contents are printed sequentially to stdout when the whole sort is complete. Secondly, I need to print the results in Hit Table (i.e. -m 8) format to stdout. If I've sorted the results the sorted-results will print to screen, however when I try to print the Hit Table results nothing is returned, as if the blast results have evaporated.... 
and vice versa, if i comment out the part where i point my sorting
subroutine to the blast results reference, my hit table results suddenly
print to screen.

It's almost like the reference to the SearchIO obj that holds the
StandAloneBlast results is lost after one use?? (I'm beginning to think
there is something naive about the way I'm using references?..)

Here's an abbreviated version of my code:

my $ref_seq_objs; # ref to array of Sequence obj's
my $genome_seq;   # fasta containing 1 genomic sequence

my @params = ('program'  => 'blastn',
              'database' => $genome_seq,
             );
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
my $blast_report = $factory->blastall($ref_seq_objs); #OK

#######
### the following 2 actions seem to be mutually exclusive.

# 1) sort results into 1-hitter, 2-hitter, etc. groups of
#    SeqFeature objs stored in arrays. arrays are then printed
#    to stdout
&sort_results($blast_report);

# 2) print blast results
&print_blast_results($blast_report);
#######

sub print_blast_results{
  my $report = shift;
  while(my $result = $report->next_result()){
    while(my $hit = $result->next_hit()){
      while(my $hsp = $hit->next_hsp()){
        my $q_name = $hsp_q_seq_obj->display_id;
        print join(", ",$q_name,$hit->name,$hsp->bits)."\n";
      }
    }
  }
}

I'm about to lose my mind on this... any assistance appreciated!

Thanks,
Genevieve

From rvosa at sfu.ca Sun May 28 07:43:23 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Sun, 28 May 2006 00:43:23 -0700
Subject: [Bioperl-l] "Project OpenLab" (working title)
In-Reply-To: <4478829F.5030508@jays.net>
References: <4478829F.5030508@jays.net>
Message-ID: <4479549B.5030202@sfu.ca>

The TreeBaseII team (part of the cipres project: http://www.phylo.org) are
working on a lab database system for storage of intermediate calculation
results and data (sequence alignments, trees, taxon sets). I think what
you're discussing is a bit more molecular and less phylogenetic, but it
does sound similar in spirit.
Rutger Jay Hannah wrote: > Hola -- > > We've been kicking around this idea for a few months now. I'm threatening to start coding. Once I do I might not sleep for a few weeks so I thought I'd solicit feedback now. :) > > "Project OpenLab": > http://omaha.pm.org/kwiki/?BioPerl > > - Does any such project already exist? > - If there's no other obvious choice already bent to BioPerl / BioPerl DB / BioSQL, I'll probably be writing the web framework in Perl's Template Toolkit. The server is Linux, Apache, mySQL (BioPerl DBs). > - I'll be using BioPerl objects for the persistence layer as much as possible. Where not possible I'll ask this list about my patches/additions/ugly hackery. > - I'll be discussing my back office tables like "users" that don't belong in bioperl-db; and my questions about new tables that might belong there on the BioSQL-l mailing list. > - I'm not a computer language zealot (usually), so I'm open to out-of-the-box ideas from anyone. > - I'm a biology newb with a long Perl/database/web/e-commerce background, so please feel free to point out any bio idiocy I engage in. > > Thanks for your time, > > j > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- ++++++++++++++++++++++++++++++++++++++++++++++++++++ Rutger Vos, PhD. 
candidate Department of Biological Sciences Simon Fraser University 8888 University Drive Burnaby, BC, V5A1S6 Phone: 604-291-5625 Fax: 604-291-3496 Personal site: http://www.sfu.ca/~rvosa FAB* lab: http://www.sfu.ca/~fabstar Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ ++++++++++++++++++++++++++++++++++++++++++++++++++++ From cjfields at uiuc.edu Sun May 28 13:55:47 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 28 May 2006 08:55:47 -0500 Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file In-Reply-To: <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com> References: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com> <002601c680fa$644635a0$15327e82@pyrimidine> <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com> Message-ID: Again, it's b/c 'withrefm' is a valid Restriction::IO module and 'withref' is not. Similar to the case issue you saw before with 'bairoch.' Making this more lenient would help but there are more serious issues with these modules that need to be addressed... http://www.bioperl.org/wiki/Project_priority_list#Restriction_Enzymes Chris On May 26, 2006, at 3:55 PM, Jelena Obradovic wrote: > Hi guys, I tried with the other formats, and it works fine with > "withrefm" > format but not with "withref". > > Thanks a lot for your reponse. > > Cheers, > > Jelena > > On 5/26/06, Chris Fields wrote: >> >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Jelena Obradovic >>> Sent: Friday, May 26, 2006 11:47 AM >>> To: Bioperl-l at lists.open-bio.org >>> Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file >>> >>> Hi there, >>> >>> I have tried loading enzyme list from a file REBASE bairoch.605 >>> using >>> Bio::Restriction::IO; >>> >>> 1. But for some reason the number of enzymes in the list is >>> always 532 >>> which is a default set of enzymes in enzyme collection. 
>>> >>> Is there any known issue with this module or a workaround? >>> >>> And here is the code I have been using: >>> >>> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- >>> format=>"Bairoch") >>> || die "can't load the file bairoch.605: $!"; >>> my $enzymes = $re_in->read; >>> print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n"; >> >> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- >> format=>"Bairoch"); >> >> should be >> >> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- >> format=>"bairoch"); >> >> Note the case change for the format; this is noted in the bug >> report you >> submitted earlier. Bio::Restriction::IO works similarly to >> Bio::SeqIO ( >> i.e. >> requires a specific format, which I believe is case-sensitive). >> Judging >> by >> the modules in Bio/Restriction/IO directory, looks like the >> Bio::Restriction::IO format should match one of the following >> formats: >> bairoch, itype2, withrefm, and you can also build your own if >> needed using >> the previous as examples and implementing Bio::Restriction::IO::base. >> >>> 2. The other problem is when trying to use format that is lower-case >>> it throws an exception, but when "B" is capitalized it is ok. >>> I assume it cannot load a file and does not initilize enzyme >>> collection properly. >>> >>> Can't call method "each_enzyme" on an undefined value at >>> .../cgi-bin/seq-load.pl line 51. >> >> My guess? The reason it works with an uppercase ('Bairoch') is >> that it >> can't find the module and uses the default set of enzymes as a >> fallback. >> The exception that you reported when you use lowercase ('bairoch') >> is real >> and I reported it as a bug (there are a few I found in that module). >> >> You might want to try using one of the other formats if you can >> get the >> files in the right format from REBASE. I'm looking into the bugs >> specifically associated with Bio::Restriction::IO::bairoch. >> >>> Any thoughts? 
>>> >>> >>> Thanks in advance, >>> >>> >>> Jelena Obradovic >>> jelenaob at gmail.com >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > -- > Jelena Obradovic > Email: jobradovic at gmail.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From osborne1 at optonline.net Sun May 28 15:03:37 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Sun, 28 May 2006 11:03:37 -0400 Subject: [Bioperl-l] results problem with StandAloneBlast In-Reply-To: <44775ED9.4020208@cornell.edu> Message-ID: Genevieve, Does this simplified code, without the &sort_results($blast_report) line, work? By the way, no one can really help you here because you haven't shown us all of the code. The code you are showing certainly looks OK. Brian O. On 5/26/06 4:02 PM, "Genevieve DeClerck" wrote: > &sort_results($blast_report); From simon.rayner.mlist at gmail.com Mon May 29 07:37:24 2006 From: simon.rayner.mlist at gmail.com (mailing lists) Date: Mon, 29 May 2006 15:37:24 +0800 Subject: [Bioperl-l] installation problems with bioperl-ext on x86_64 running SuSE linux Message-ID: Hello, i'm having a problem trying to install the bioperl-ext package on my system. 

biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align # perl Makefile.PL
Writing Makefile for Bio::Ext::Align
biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align # make
cc -c -I./libs -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -fPIC -O2 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -g -Wall -pipe -DVERSION=\"0.1\" -DXS_VERSION=\"0.1\" -fPIC "-I/usr/lib/perl5/5.8.7/x86_64-linux-thread-multi/CORE" -DPOSIX -DNOERROR Align.c
In file included from Align.xs:12:
./libs/sw.h:1360:1: warning: "/*" within comment
.
.
.
Running Mkbootstrap for Bio::Ext::Align ()
chmod 644 Align.bs
rm -f blib/arch/auto/Bio/Ext/Align/Align.so
LD_RUN_PATH="" cc -shared -L/usr/local/lib64 Align.o -o blib/arch/auto/Bio/Ext/Align/Align.so libs/libsw.a -lm
/usr/lib64/gcc/x86_64-suse-linux/4.0.2/../../../../x86_64-suse-linux/bin/ld: libs/libsw.a(aln.o): relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
libs/libsw.a: could not read symbols: Bad value
collect2: ld returned 1 exit status
make: *** [blib/arch/auto/Bio/Ext/Align/Align.so] Error 1
biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align #
biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align #
biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align #

the -fPIC flag is already set in the makefile. I found a similar problem in an earlier posting with the following suggestions....

  From: Aaron J. Mackey pcbi.upenn.edu>
  Subject: Re: compiling bioperl-ext
  Newsgroups: gmane.comp.lang.perl.bio.general
  Date: 2004-06-09 20:46:05 GMT (1 year, 50 weeks, 3 days, 3 hours and 50 minutes ago)

  1) Are you starting with a clean build directory?

  2) Does installing other compiled Perl modules work for you (e.g. Data::Dumper or Storable)?

  That's a pretty arcane error, and if the answer to #2 is "no", then I don't think we can help you.

  -Aaron

....In my case, both 1) and 2) are true. I installed Data::Dumper without any problems.
I've found plenty of similar incidences for other sofware and it seems to relate to 32/64bit issues. Does anyone have any suggestions about how to get around this? thanks Simon Rayner From ULNJUJERYDIX at spammotel.com Mon May 29 09:46:21 2006 From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau) Date: Mon, 29 May 2006 17:46:21 +0800 Subject: [Bioperl-l] **Fwd: Re: URGENT: Bio::Graphics::Panel make the ruler have In-Reply-To: <200605261038.30380.lstein@cshl.edu> References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com> <200605261038.30380.lstein@cshl.edu> Message-ID: <5b6410e0605290246p8875c78n286caa672a55b4de@mail.gmail.com> Hi! oh it was in a slightly different header asking about the create image map feature. I am using the stable version 1.4 of bioperl now. In any case I have not added the sequence as a feature annotated seq. as I already have the bp where the TF binds (in 1-1050 numberings) so what I did was to just add graded segments based on the position. I saw that there is a scale function for the arrow glyp however, it is a multiply function, can it be hacked to take in a offset value (ie minus the scale by 1000?) cheers kevin Hi, > > For some reason I didn't see the first posting on this. In current bioperl > live, the ruler can have negative numberings - I use this routinely. You > need > to create a feature that starts in negative coordinates. What is happening > to > you when you try this? > > Lincoln > > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote: > > Hi > > thanks for the help offered thus far! > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promtoer seq > using > > bioperl. therefore i was asked to make the numberings as such (-1000) is > > there any way at all to do this in bioperl without changing the .pm > file? > > > > thanks guys.. 
> > kevin > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From shameer at ncbs.res.in Mon May 29 10:07:17 2006 From: shameer at ncbs.res.in (Shameer Khadar) Date: Mon, 29 May 2006 15:37:17 +0530 (IST) Subject: [Bioperl-l] Reg. Integrated Server / CGI to pass PDB to multiple Servers Message-ID: <49187.192.168.1.1.1148897237.squirrel@192.168.1.1> Dear All, My query may not be directly related to BioPERL, But am sure I will get some idea to move on. Some possibilities wil be available from Pise or related modules Query : --------- We have several public servers(say a,b,c). All of them will take a pdb-file as an input and process it and displays it. Now, I need to create a web page(a meta-server/integrated web-server) with three radio buttons(a,b,c) and a single input form(to accept pdb file from the users ...:( - File passing as an argument seems to be some what impossible to me). I need output as 3 links in next page. Is there any Bio-PERL module / CGI / Perl tricks to do it ? Thanks in advance, -- Shameer Khadar Prof. R. 
Sowdhamini's Lab (# 25) The Computational Biology Group National Centre for Biological Sciences (TIFR) UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India T - 91-080-23636420-32 EXT 4241 F - 91-080-23636662/23636675 W - http://caps.ncbs.res.in -------------------------------------------------- "Refrain from illusions, insist on work and not words, patiently seek divine and scientific truth." From torsten.seemann at infotech.monash.edu.au Tue May 30 06:41:31 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 30 May 2006 16:41:31 +1000 Subject: [Bioperl-l] results problem with StandAloneBlast In-Reply-To: <44775ED9.4020208@cornell.edu> References: <44775ED9.4020208@cornell.edu> Message-ID: <447BE91B.30001@infotech.monash.edu.au> > my $ref_seq_objs; # ref to array of Sequence obj's > my $genome_seq; # fasta containing 1 genomic sequence > my @params = ('program' => 'blastn', > 'database' => $genome_seq, > ); The database parameter needs to be the same thing you would pass to the "-d" option in "blastall". I don't think you can pass a perl string here. ie. there needs to be a properly formatted set of blast indices for your genome sequence on the disk in the appropriate place. See ftp://ftp.ncbi.nlm.nih.gov/blast/documents/blast.html > my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); > my $blast_report = $factory->blastall($ref_seq_objs); #OK But I could be wrong, and $blast_report here contains a valid BLAST report. 
-- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From sb at mrc-dunn.cam.ac.uk Tue May 30 07:59:28 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Tue, 30 May 2006 08:59:28 +0100 Subject: [Bioperl-l] results problem with StandAloneBlast In-Reply-To: <44775ED9.4020208@cornell.edu> References: <44775ED9.4020208@cornell.edu> Message-ID: <447BFB60.4000006@mrc-dunn.cam.ac.uk> Genevieve DeClerck wrote: > Hi, [snip] > If I've sorted the results the sorted-results will print to screen, > however when I try to print the Hit Table results nothing is returned, > as if the blast results have evaporated.... and visa versa, if i comment > out the part where i point my sorting subroutine to the blast results > reference, my hit table results suddenly prints to screen. [snip] > Here's an abbreviated version of my code: [snip] > ####### > ### the following 2 actions seem to be mutually exclusive. > # 1) sort results into 1-hitter, 2-hitter, etc. groups of > # SeqFeature objs stored in arrays. arrays are then printed > # to stdout > &sort_results($blast_report); > > # 2) print blast results > &print_blast_results($blast_report); > sub print_blast_results{ > my $report = shift; > while(my $result = $report->next_result()){ [snip] You didn't give us your sort_results subroutine, but is it as simple as they both use $report->next_result (and/or $result->next_hit), but you don't reset the internal counter back to the start, so the second subroutine tries to get the next_result and finds the first subroutine has already looked at the last result and so next_result returns false? From a quick look it wasn't obvious how to reset the counter. Hopefully this can be done and someone else knows how. 
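
The behaviour Sendu describes, where whichever subroutine runs first uses up the results, can be reproduced without BLAST at all. The following is only a sketch under stated assumptions: ToyReport is a made-up stand-in for a forward-only report object, and sort_results / print_blast_results are simplified placeholders rather than Genevieve's real subroutines. It does not show how to reset a BioPerl report; it sidesteps the question by draining the iterator once into an array and handing that array to every consumer.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a report object: it hands out its results
# one at a time and has no way to start over (a forward-only iterator).
package ToyReport;

sub new {
    my ($class, @results) = @_;
    return bless { queue => [@results] }, $class;
}

sub next_result {
    my ($self) = @_;
    return shift @{ $self->{queue} };    # undef once the queue is empty
}

package main;

my $report = ToyReport->new('result_A', 'result_B');

# Drain the iterator exactly once into an ordinary array ...
my @results;
while ( defined( my $r = $report->next_result ) ) {
    push @results, $r;
}

# ... then let every consumer walk the array instead of the iterator.
# Both subs are simplified placeholders for the ones in the thread.
sub sort_results        { my @r = @_; return sort { $b cmp $a } @r }
sub print_blast_results { my @r = @_; return join ',', @r }

my @sorted  = sort_results(@results);
my $printed = print_blast_results(@results);

# The iterator itself is now exhausted; this is what a second
# while-loop over next_result would have run into.
my $leftover = $report->next_result;

print "sorted:   @sorted\n";
print "printed:  $printed\n";
print "leftover: ", ( defined $leftover ? $leftover : 'undef' ), "\n";
```

With real BLAST output the array would hold result objects rather than strings, and large reports may make holding everything in memory a poor trade; in that case doing the sorting and the printing inside a single pass is the other obvious option.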
From torsten.seemann at infotech.monash.edu.au Tue May 30 08:18:45 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 30 May 2006 18:18:45 +1000 Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef" Message-ID: <447BFFE5.8010508@infotech.monash.edu.au> FYI Bioperl developers: I just audited the bioperl-live CVS and found about 450 occurrences of "return undef". Page 199 of "Perl Best Practices" by Damian Conway, and this URL http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html suggest: "Use return; instead of return undef; if you want to return nothing. If someone assigns the return value to an array, the latter creates an array of one value (undef), which evaluates to true. The former will correctly handle all contexts." So I'm guessing at least some of these 450 occurrences *could* result in bugs and should probably be changed. Your opinion may differ :-) -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From cjfields at uiuc.edu Tue May 30 14:07:45 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 30 May 2006 09:07:45 -0500 Subject: [Bioperl-l] For CVS developers - potential pitfall with "returnundef" In-Reply-To: <447BFFE5.8010508@infotech.monash.edu.au> Message-ID: <000c01c683f2$6ca62570$15327e82@pyrimidine> Torsten, Any way you can post a list of some/all of the offending lines or modules? Sounds like something to consider, but if the list is as large as you say we made need something (bugzilla? wiki?) to track the changes and make sure they pass tests; I'm sure a large majority will. I'm guessing Jason would want this somewhere on the project priority list or bugzilla, with a link to the actual list, but I'm not sure. Maybe start a page on the wiki for proposed code changes? 
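
The list-context pitfall in the quoted advice is easy to see in a few lines of plain Perl. A minimal sketch; the two subroutine names are made up for illustration and do not come from bioperl-live:

```perl
use strict;
use warnings;

# Two hypothetical lookups that both mean "nothing found".
sub lookup_with_undef { return undef; }   # always returns one value: undef
sub lookup_bare       { return; }         # empty list in list context, undef in scalar context

# List context: the caller assigns the result to an array.
my @from_undef = lookup_with_undef();
my @from_bare  = lookup_bare();

my $count_undef = scalar @from_undef;     # one element, and that element is undef
my $count_bare  = scalar @from_bare;      # no elements at all

# An array holding a single undef still tests as true.
my $undef_is_true = @from_undef ? 1 : 0;
my $bare_is_true  = @from_bare  ? 1 : 0;

# Scalar context: both forms yield undef, so scalar callers see no difference.
my $scalar_undef = lookup_with_undef();
my $scalar_bare  = lookup_bare();

print "return undef: $count_undef element(s), true=$undef_is_true\n";
print "return:       $count_bare element(s), true=$bare_is_true\n";
```

Changing an existing sub from one form to the other is not always neutral, though: a call placed inside a list, for example as a value in a hash constructor, contributes one element with "return undef" and none with a bare "return", so the surrounding list changes shape. That is a reason to review call sites rather than do a blind search and replace.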
Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Torsten Seemann > Sent: Tuesday, May 30, 2006 3:19 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] For CVS developers - potential pitfall with > "returnundef" > > FYI Bioperl developers: > > I just audited the bioperl-live CVS and found about 450 occurrences of > "return undef". > > Page 199 of "Perl Best Practices" by Damian Conway, and this URL > http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html suggest: > > "Use return; instead of return undef; if you want to return nothing. If > someone assigns the return value to an array, the latter creates an > array of one value (undef), which evaluates to true. The former will > correctly handle all contexts." > > So I'm guessing at least some of these 450 occurrences *could* result in > bugs and should probably be changed. > > Your opinion may differ :-) > > -- > Dr Torsten Seemann http://www.vicbioinformatics.com > Victorian Bioinformatics Consortium, Monash University, Australia > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From lstein at cshl.edu Tue May 30 14:47:48 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 30 May 2006 10:47:48 -0400 Subject: [Bioperl-l] **Fwd: Re: URGENT: Bio::Graphics::Panel make the ruler have In-Reply-To: <5b6410e0605290246p8875c78n286caa672a55b4de@mail.gmail.com> References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com> <200605261038.30380.lstein@cshl.edu> <5b6410e0605290246p8875c78n286caa672a55b4de@mail.gmail.com> Message-ID: <200605301047.49127.lstein@cshl.edu> Hi Kevin, I'm afraid that there is no offset value. You'll need the 1.51 version of bioperl to handle negative numbers properly. I understand your reluctance to upgrade just to get the Bio::Graphics functionality. 
You might consider checking out just the Bio/Graphics subtree and installing that. It should work on top of 1.4 Lincoln On Monday 29 May 2006 05:46, Kevin Lam Koiyau wrote: > Hi! > oh it was in a slightly different header asking about the create image map > feature. > I am using the stable version 1.4 of bioperl now. In any case I have not > added the sequence as a feature annotated seq. as I already have the bp > where the TF binds (in 1-1050 numberings) so what I did was to just add > graded segments based on the position. > I saw that there is a scale function for the arrow glyp however, it is a > multiply function, can it be hacked to take in a offset value (ie minus the > scale by 1000?) > > cheers > kevin > > > Hi, > > > For some reason I didn't see the first posting on this. In current > > bioperl live, the ruler can have negative numberings - I use this > > routinely. You need > > to create a feature that starts in negative coordinates. What is > > happening to > > you when you try this? > > > > Lincoln > > > > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote: > > > Hi > > > thanks for the help offered thus far! > > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promtoer seq > > > > using > > > > > bioperl. therefore i was asked to make the numberings as such (-1000) > > > is there any way at all to do this in bioperl without changing the .pm > > > > file? > > > > > thanks guys.. > > > kevin > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > Lincoln D. 
Stein
> > Cold Spring Harbor Laboratory
> > 1 Bungtown Road
> > Cold Spring Harbor, NY 11724
> > (516) 367-8380 (voice)
> > (516) 367-8389 (fax)
> > FOR URGENT MESSAGES & SCHEDULING,
> > PLEASE CONTACT MY ASSISTANT,
> > SANDRA MICHELSEN, AT michelse at cshl.edu
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From cjfields at uiuc.edu Tue May 30 14:50:06 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 30 May 2006 09:50:06 -0500
Subject: [Bioperl-l] Bio::Restriction::IO issues
Message-ID: <000f01c683f8$5771ed50$15327e82@pyrimidine>

Jason, Brian, et al,

I found several major issues with Bio::Restriction::IO (this popped up while bug squashing). In particular, the POD is pretty misleading. It states (directly from perldoc):

SYNOPSIS
    use Bio::Restriction::IO;

    $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
                                     -format => 'withrefm');
    $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
                                     -format => 'bairoch');
    my $res = $in->read; # a Bio::Restriction::EnzymeCollection
    $out->write($res);

    # or

    # use Bio::Restriction::IO;
    #
    # #input file format can be read from the file extension (dat|xml)
    # $in  = Bio::Restriction::IO->newFh(-file => "inputfilename");
    # $out = Bio::Restriction::IO->newFh('-format' => 'xml');
    #
    # # World's shortest flat<->xml format converter:
    # print $out $_ while <$in>;

So, I have found several problems with these modules.
I really hate to criticize code here, as my own is pretty hacky, but I think these are things to seriously mull over:

1) Note that, though some of the lines above are commented out, they are still there in the POD and thus present in perldoc/pod2html etc. So, judging from the above, the synopsis suggests that the script should read in from one format and write out to another (like SeqIO). However, NONE of the current write() methods are implemented for any of the IO modules (withrefm, base, itype2, bairoch), so this does not happen as expected. You get the nasty thrown 'method not implemented' error instead when writing.

2) The commented statements in the POD above also suggest that REBASE XML format is supported, when there is no XML module.

3) The Bio::Restriction::IO::bairoch module had multiple bugs which made it unusable until I added a few small changes; it still can't handle multisite/multicut enzymes properly, so in essence it is useless until that is addressed.

4) Bio::Restriction::IO inherits from Bio::SeqIO, though I'm not sure why. Shouldn't it just inherit from Bio::Root::Root/Bio::Root::IO and make up its own methods?

I'm working on at least getting the 'bairoch' input format up and running (so at least it gets the enzymes into a Bio::Restriction::EnzymeCollection). Beyond that point I'm not sure where to proceed. The POD obviously needs to be corrected to reflect that writing formats is not implemented (and the bit about XML should be taken out completely); that's the easy part, which I am working on and plan on committing today. However, these modules don't seem to be used too frequently, so I'm not sure whether it's worth spending too much time getting these up to speed at the moment (adding write methods, switching to Bio::Root::Root, etc); I have other priorities at the moment (including a way overdue ListSummary).
I'm also not sure who else is (using|working) on these so I don't want to (make too many changes|step on someone else's toes), but these are, IMHO, pretty serious problems. Any thoughts? Chris Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Tue May 30 16:34:18 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 30 May 2006 11:34:18 -0500 Subject: [Bioperl-l] Bio::Restriction::IO changes Message-ID: <001401c68406$e71e9850$15327e82@pyrimidine> Jason, Brian, et al: I have made changes to the Bio::Restriction::IO POD to remove any reference to write functions since almost none have been implemented yet, so including this into POD is a bit misleading. At the moment, you can't write to any REBASE format except for 'base', which I found is the only one that works. And, upon further checking, even that one has issues: it looks like there are problems with multicut/multisite enzymes when writing in 'base' format which I'm not delving into ('TaqII' only displays one site when writing when it has two cut sites). I'll add this to the wiki and a bug report (enhancement) for this module. I am also removing mention of XML and 'bairoch' formats (the former isn't present and the latter is broken at the moment) and added a few things to the POD TO DO section. Rob (if you're out there somewhere in the ether), have you made any more changes to these modules that need to be committed? Didn't know if any of these issues have already been addressed/changed etc. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry University of Illinois Urbana-Champaign

From jelenaob at gmail.com Tue May 30 04:58:35 2006
From: jelenaob at gmail.com (Jelena Obradovic)
Date: Mon, 29 May 2006 21:58:35 -0700
Subject: [Bioperl-l] Bio::Graphic::Panel backgroud color
Message-ID: <5042a62b0605292158g187f4855hd93f76e0086ac27d@mail.gmail.com>

Hello everybody,

does anybody know how to remove the background color of the Panel? Currently, I am not adding anything to it, so that I can troubleshoot the problem, and I have tried setting all the color attributes I could find on the panel, but no luck. Whatever I do, I get the BLUE border of the panel.

Has anybody faced the same problem?

Thanks in advance,

Jelena

And here is the code I am currently using:
-----------------------------------------------------------------------------------------------------------
my $panel = Bio::Graphics::Panel->new(-length    => $prim_seq->length() + 200,
                                      -width     => 800,
                                      -pad_left  => 10,
                                      -pad_right => 10,
                                      -key_color => 'white',
                                      -bgcolor   => 'white',
                                      -gridcolor => 'black',
                                      -fgcolor   => 'black',
                                      -grid      => 0,
                                      );

my ($url,$map,$mapname) = $panel->image_and_map(
                                      #-root=>'$root_url' ,
                                      -url => '/tmpimages');
#make clickable image
print $cgi->img({-src=>$url,-usemap=>"#$mapname"});
print $map;
-----------------------------------------------------------------------------------------------------------

From luciap at sas.upenn.edu Tue May 30 18:49:48 2006
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Tue, 30 May 2006 14:49:48 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
Message-ID: <1149014988.447c93cc01761@128.91.55.38>

Hi

I am here again. I finally got to write the "collapse nodes" function and have a couple of questions.

In order to collapse any node $node, I first have to get the parent, which I can do as

$parent=$node->ancestor

and then the children as:

@children=$node->get_all_Descendents (or should I use each descendent?)

Then before deleting $node I have to assign all its children to $parent, and here is where I am kind of confused. Can I use the add_Descendent function for this? I've been trying to write something like this:

foreach $child (@children){
    $parent=add_Descendent->$child;
}

but this doesn't work and I think it is because I don't have any idea of what I am doing

any suggestions?
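
The re-parenting step described above can be sketched on a toy tree in plain Perl. This is not the Bio::Tree::Node API: new_node, add_child and collapse are hypothetical helpers on hash-based nodes, written only to show the order of operations. Two points carry over to the question as asked. First, only the immediate children need to move, since deeper descendants travel with them (as far as I know, each_Descendent returns the immediate children in BioPerl, while get_all_Descendents walks the whole subtree). Second, the root has no parent and has to be skipped.

```perl
use strict;
use warnings;

# Hypothetical toy tree: each node is a hash with a name, a parent link
# and a list of immediate children. This is NOT the Bio::Tree::Node API.
sub new_node {
    my ($name) = @_;
    return { name => $name, parent => undef, children => [] };
}

sub add_child {
    my ($parent, $child) = @_;
    $child->{parent} = $parent;
    push @{ $parent->{children} }, $child;
    return $child;
}

# Collapse $node: hand its immediate children to its parent, then unlink it.
# Returns 0 and does nothing for the root, which has no parent.
sub collapse {
    my ($node) = @_;
    my $parent = $node->{parent};
    return 0 unless defined $parent;

    # Detach $node from its parent first.
    $parent->{children} = [ grep { $_ != $node } @{ $parent->{children} } ];

    # Re-attach only the immediate children; deeper descendants travel with them.
    add_child( $parent, $_ ) for @{ $node->{children} };
    $node->{children} = [];
    $node->{parent}   = undef;
    return 1;
}

# root -> (inner -> (A, B), C)
my $root  = new_node('root');
my $inner = add_child( $root,  new_node('inner') );
my $leafA = add_child( $inner, new_node('A') );
my $leafB = add_child( $inner, new_node('B') );
my $leafC = add_child( $root,  new_node('C') );

collapse($inner);

my @names = map { $_->{name} } @{ $root->{children} };
print join( ',', @names ), "\n";
```

After the call, A and B hang directly off the root next to C, which is the polytomy a collapsed low-support node is meant to produce.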
thanks Lucia Peixoto Department of Biology,SAS University of Pennsylvania From rvosa at sfu.ca Tue May 30 18:52:52 2006 From: rvosa at sfu.ca (Rutger Vos) Date: Tue, 30 May 2006 11:52:52 -0700 Subject: [Bioperl-l] For CVS developers - potential pitfall with "returnundef" In-Reply-To: <000c01c683f2$6ca62570$15327e82@pyrimidine> References: <000c01c683f2$6ca62570$15327e82@pyrimidine> Message-ID: <447C9484.9030102@sfu.ca> Although I agree with the sentiment of following PBP, I'm not so sure changing 'return undef' to 'return' *now* will fix any bugs without introducing new, subtle ones. Chris Fields wrote: > Torsten, > > Any way you can post a list of some/all of the offending lines or modules? > Sounds like something to consider, but if the list is as large as you say we > made need something (bugzilla? wiki?) to track the changes and make sure > they pass tests; I'm sure a large majority will. > > I'm guessing Jason would want this somewhere on the project priority list or > bugzilla, with a link to the actual list, but I'm not sure. Maybe start a > page on the wiki for proposed code changes? > > Chris > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann >> Sent: Tuesday, May 30, 2006 3:19 AM >> To: bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] For CVS developers - potential pitfall with >> "returnundef" >> >> FYI Bioperl developers: >> >> I just audited the bioperl-live CVS and found about 450 occurrences of >> "return undef". >> >> Page 199 of "Perl Best Practices" by Damian Conway, and this URL >> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html suggest: >> >> "Use return; instead of return undef; if you want to return nothing. If >> someone assigns the return value to an array, the latter creates an >> array of one value (undef), which evaluates to true. The former will >> correctly handle all contexts." 
>>
>> So I'm guessing at least some of these 450 occurrences *could* result in
>> bugs and should probably be changed.
>>
>> Your opinion may differ :-)
>>
>> --
>> Dr Torsten Seemann http://www.vicbioinformatics.com
>> Victorian Bioinformatics Consortium, Monash University, Australia
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>
>

--
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++

From luciap at sas.upenn.edu Tue May 30 20:11:52 2006
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Tue, 30 May 2006 16:11:52 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
In-Reply-To:
References:
Message-ID: <1149019912.447ca7085124e@128.91.55.38>

Hi

OK, that was silly, but what I have in my code is what you just wrote. The problem is that if I write

$parent->add_Descendent($child)

it tells me that I am calling the method "add_Descendent" on an undefined value (but I did define $parent before??)

So here goes the code so far:

use Bio::TreeIO;
my $in = new Bio::TreeIO(-file => 'Test2.tre',
                         -format => 'newick');
my $out = new Bio::TreeIO(-file => '>mytree.out',
                          -format => 'newick');
while( my $tree = $in->next_tree ) {
    foreach my $node ( grep { ! $_->is_Leaf() } $tree->get_nodes() ) {
        my $bootstrap=$node->_creation_id;

        if ($bootstrap < 70 ){
            my $parent = $node->ancestor;
            my @children=$node->get_all_Descendents;
            foreach my $child (@children){
                $parent->add_Descendent($child);
            }

            ........

            eventually I'll add (once I assigned the children to the parent successfully):
            $tree->remove_Node($node);

        }
    }
    $out->write_tree($tree);
}

Quoting aaron.j.mackey at gsk.com:

> > foreach $child (@children){
> > $parent=add_Descendent->$child;
> > }
>
> I think what you want is $parent->add_Descendent($child)
>
> -Aaron
>

Lucia Peixoto
Department of Biology,SAS
University of Pennsylvania

From jason.stajich at duke.edu Tue May 30 20:30:56 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 30 May 2006 16:30:56 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
In-Reply-To: <1149019912.447ca7085124e@128.91.55.38>
References: <1149019912.447ca7085124e@128.91.55.38>
Message-ID: <6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu>

You need to special-case the root: it won't have an ancestor. Just protect the my $parent = $node->ancestor with an if statement, as I did below.

On May 30, 2006, at 4:11 PM, Lucia Peixoto wrote:

> Hi
> OK that was silly, but what I have in my code is what you just wrote
> But the problem is that if I write
>
> $parent->add_Descendent($child)
>
> it tells me that I am calling the method "add_Descendent" on an
> undefined value
> (but I did define $parent before??)
>
> So here it goes the code so far:
>
> use Bio::TreeIO;
> my $in = new Bio::TreeIO(-file => 'Test2.tre',
> -format => 'newick');
> my $out = new Bio::TreeIO(-file => '>mytree.out',
> -format => 'newick');
> while( my $tree = $in->next_tree ) {
> foreach my $node ( grep { !
$_->is_Leaf() } $tree->get_nodes() ) { > my $bootstrap=$node->_creation_id; > > if ($bootstrap < 70 ){ > >>> if( my $parent = $node->ancestor ) { > my @children=$node->get_all_Descendents; > foreach my $child (@children){ > $parent->add_Descendent($child); > } } > > ........ > > eventually I'll add (once I assigned the children to the parent > succesfully): > $tree->remove_Node($node); > > } > } > $out->write_tree($tree); > } > > Quoting aaron.j.mackey at gsk.com: > >>> foreach $child (@children){ >>> $parent=add_Descendent->$child; >>> } >> >> I think what you want is $parent->add_Descendent($child) >> >> -Aaron >> > > > Lucia Peixoto > Department of Biology,SAS > University of Pennsylvania > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From cjfields at uiuc.edu Tue May 30 21:40:18 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 30 May 2006 16:40:18 -0500 Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef" In-Reply-To: <447C9484.9030102@sfu.ca> Message-ID: <001801c68431$a586b2d0$15327e82@pyrimidine> Agreed, though I think these changes should be implemented at some point (Conway's argument here makes sense and it is nice for Torsten to check this out). If proper tests are written then any changes resulting in errors should be picked up by checking the appropriate test suite, though I know it doesn't absolutely guarantee it. 
; P Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Rutger Vos > Sent: Tuesday, May 30, 2006 1:53 PM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith > "returnundef" > > Although I agree with the sentiment of following PBP, I'm not so sure > changing 'return undef' to 'return' *now* will fix any bugs without > introducing new, subtle ones. > > Chris Fields wrote: > > Torsten, > > > > Any way you can post a list of some/all of the offending lines or > modules? > > Sounds like something to consider, but if the list is as large as you > say we > > made need something (bugzilla? wiki?) to track the changes and make sure > > they pass tests; I'm sure a large majority will. > > > > I'm guessing Jason would want this somewhere on the project priority > list or > > bugzilla, with a link to the actual list, but I'm not sure. Maybe start > a > > page on the wiki for proposed code changes? > > > > Chris > > > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann > >> Sent: Tuesday, May 30, 2006 3:19 AM > >> To: bioperl-l at lists.open-bio.org > >> Subject: [Bioperl-l] For CVS developers - potential pitfall with > >> "returnundef" > >> > >> FYI Bioperl developers: > >> > >> I just audited the bioperl-live CVS and found about 450 occurrences of > >> "return undef". > >> > >> Page 199 of "Perl Best Practices" by Damian Conway, and this URL > >> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html suggest: > >> > >> "Use return; instead of return undef; if you want to return nothing. If > >> someone assigns the return value to an array, the latter creates an > >> array of one value (undef), which evaluates to true. The former will > >> correctly handle all contexts." 
> >> > >> So I'm guessing at least some of these 450 occurrences *could* result > in > >> bugs and should probably be changed. > >> > >> Your opinion may differ :-) > >> > >> -- > >> Dr Torsten Seemann http://www.vicbioinformatics.com > >> Victorian Bioinformatics Consortium, Monash University, Australia > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > -- > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > Rutger Vos, PhD. candidate > Department of Biological Sciences > Simon Fraser University > 8888 University Drive > Burnaby, BC, V5A1S6 > Phone: 604-291-5625 > Fax: 604-291-3496 > Personal site: http://www.sfu.ca/~rvosa > FAB* lab: http://www.sfu.ca/~fabstar > Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rvosa at sfu.ca Tue May 30 21:58:25 2006 From: rvosa at sfu.ca (Rutger Vos) Date: Tue, 30 May 2006 14:58:25 -0700 Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef" In-Reply-To: <001901c68433$026b1ad0$15327e82@pyrimidine> References: <001901c68433$026b1ad0$15327e82@pyrimidine> Message-ID: <447CC001.4050000@sfu.ca> I've been following the perl6 mailing lists for a while now. I think this time around it won't really take that long (one year?) for pugs/perl6 stacks to become more than just toys. I think especially large projects, like bioperl, will really benefit from the improved OO implementation in perl6, so it might be of interest to at least fantasize about it. 
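To make the pitfall discussed in this thread concrete, here is a minimal sketch in plain Perl (no BioPerl needed). The sub names find_undef and find_bare are made up for illustration and do not come from bioperl-live. It tries to show both sides of the argument: "return undef" gives a one-element list that tests true, while a bare "return" contributes nothing inside a larger list, which is the sort of subtle change a blanket replacement could introduce.

```perl
use strict;
use warnings;

# Hypothetical subs, for illustration only.
sub find_undef { return undef }   # explicit undef
sub find_bare  { return }         # bare return

# List context: one element (undef) versus an empty list.
my @from_undef = find_undef();
my @from_bare  = find_bare();

# Scalar context: both give undef.
my $scalar_undef = find_undef();
my $scalar_bare  = find_bare();

# Inside a larger list a bare return contributes nothing, so
# positions (or hash key/value pairing) can shift.
my @args_undef = ( 'id', find_undef(), 'name' );   # 3 elements
my @args_bare  = ( 'id', find_bare(),  'name' );   # 2 elements

printf "return undef: %d element(s), array tests %s\n",
    scalar @from_undef, ( @from_undef ? 'true' : 'false' );
printf "return:       %d element(s), array tests %s\n",
    scalar @from_bare, ( @from_bare ? 'true' : 'false' );
printf "argument list sizes: %d vs %d\n",
    scalar @args_undef, scalar @args_bare;
```

Callers that only ever use the result in scalar context see no difference; callers that build argument lists or hashes from the return value are the ones worth checking in the test suite.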
Chris Fields wrote:
> Ha! Or may be the 'nonexistent' bioperl-experimental. Wonder what'll
> happen once Perl6 comes to term?
>
> -CJF
>
>> -----Original Message-----
>> From: Rutger Vos [mailto:rvosa at sfu.ca]
>> Sent: Tuesday, May 30, 2006 4:48 PM
>> To: Chris Fields
>> Subject: Re: [Bioperl-l] For CVS developers - potential
>> pitfallwith"returnundef"
>>
>> Surely this will all sort itself out in bioperl6 ;-)

--
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++

From cjfields at uiuc.edu Tue May 30 22:08:26 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 30 May 2006 17:08:26 -0500
Subject: [Bioperl-l] For CVS developers - potentialpitfallwith"returnundef"
In-Reply-To: <447CC001.4050000@sfu.ca>
Message-ID: <001a01c68435$93135a50$15327e82@pyrimidine>

Agreed. I would say, probably 6-12 months time, might be a good idea to try getting something actually started, maybe under the 'bioperl-experimental' title Jason has mentioned.
One could always try getting a Bio::Root-like object going in Pugs/Perl6 as a starter and work up from there, with emphasis on key areas (seq. parsing, so on).

CJF

From ULNJUJERYDIX at spammotel.com Wed May 31 03:45:12 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Wed, 31 May 2006 11:45:12 +0800
Subject: [Bioperl-l] SOLVED Bio::Graphics::Panel make ruler have neg values
Message-ID: <5b6410e0605302045x5c420674x6f898a8a2973991a@mail.gmail.com>

I am so sorry for the truncated email; I accidentally hit reply.

If anyone is interested, I have opted to change line 161 of arrow.pm (Perl/site/lib/Bio/Graphics/Glyph/arrow.pm; on Linux it is /usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Glyph/arrow.pm) from

    $gd->string($font,$middle,$center+$a2-1,$label,$font_color)

to

    $gd->string($font,$middle,$center+$a2-1,$label-1000,$font_color)

just for this one-off use.

Strangely, at line 112 of arrow.pm in bioperl ver 1.51 I found what looks like a hidden option for a coords offset:

    my $relative_coords_offset = $self->option('relative_coords_offset');
    $relative_coords_offset = 1 unless defined $relative_coords_offset;

but entering the option -relative_coords_offset=>1000 in the arrow glyphs didn't do anything...

Hi!
> oh it was in a slightly different header asking about the create image map
> feature.
> I am using the stable version 1.4 of bioperl now. In any case I have not
> added the sequence as a feature annotated seq. as I already have the bp
> where the TF binds (in 1-1050 numberings) so what I did was to just add
> graded segments based on the position.
> I saw that there is a scale function for the arrow glyp however, it is a > multiply function, can it be hacked to take in a offset value (ie minus > the > scale by 1000?) > > cheers > kevin > > > Hi, > > > > For some reason I didn't see the first posting on this. In current > bioperl > > live, the ruler can have negative numberings - I use this routinely. You > > need > > to create a feature that starts in negative coordinates. What is > happening > > to > > you when you try this? > > > > Lincoln > > > > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote: > > > Hi > > > thanks for the help offered thus far! > > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promtoer seq > > using > > > bioperl. therefore i was asked to make the numberings as such (-1000) > is > > > there any way at all to do this in bioperl without changing the .pm > > file? > > > > > > thanks guys.. > > > kevin > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > Lincoln D. 
Stein
> > Cold Spring Harbor Laboratory
> > 1 Bungtown Road
> > Cold Spring Harbor, NY 11724
> > (516) 367-8380 (voice)
> > (516) 367-8389 (fax)
> > FOR URGENT MESSAGES & SCHEDULING,
> > PLEASE CONTACT MY ASSISTANT,
> > SANDRA MICHELSEN, AT michelse at cshl.edu

From sb at mrc-dunn.cam.ac.uk Wed May 31 08:40:08 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 31 May 2006 09:40:08 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447C7985.9000404@cornell.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk> <447C7985.9000404@cornell.edu>
Message-ID: <447D5668.7070500@mrc-dunn.cam.ac.uk>

Genevieve DeClerck wrote:
> Thanks for your comment Sendu, it was very helpful. I think this must be
> what's going on.. I am using $blast_report->next_result in both
> subroutines. It appears that analyzing the blast results first w/ my
> sort subroutine empties (?) the $blast_result object so that when I try
> to print, there is nothing left to print. (and vice versa when I print
> first then try to sort).
> So, from the looks of things, using next_result has the effect of
> popping the Bio::Search::Result::ResultI objects off of the SearchIO
> blast report object??

Not quite. It's more or less exactly like opening a file and then trying to read it all twice like this:

    open(FILE, "file");
    while (<FILE>) {
        print; # prints each line in the file
    }
    while (<FILE>) {
        print; # never happens, we never enter this while loop
    }

To get the second while loop to print anything we need to say seek(FILE, 0, 0) before it. Or in the first while loop store each line in an array, and then make the second loop a foreach through that array.

> It seems I could get around this by making a copy of the blast report by
> setting it to another new variable...(not the most elegant solution) but
> I'm having trouble with this...
>
> If I do:
>
> my $blast_report_copy = $blast_report;
>
> I'm just copying the reference to the SearchIO blast result, so it
> doesn't help me. How can I make another physical copy of this blast
> result object? Seems like a simple thing but how to do it is escaping me.

Not really a good idea, and it may not work anyway if the object contains a filehandle. But for a simple object you might recursively loop through the data structure and copy each element out into a similar data structure.

> But better yet, the way to go is to 'reset the counter,' or to find a
> way to look at/print/sort the results without removing data from the
> blast result object. How is this done though??

It would be rather nice if this worked:

    my $blast_report = $factory->blastall($ref_seq_objs);
    my $blast_fh = $blast_report->fh();
    while (<$blast_fh>) {
        # $_ is a ResultI object, use as normal
    }
    seek($blast_fh, 0, 0); # this would be great, but does it work?
    while (<$blast_fh>) {
        # go through the results again in your second subroutine
    }

An alternative hacky way of doing it, which may also not work, would be to go through your $blast_report as normal, but then before going through it a second time, say

    my $fh = $blast_report->_fh;
    seek($fh, 0, 0);

Finally, the most sensible way (assuming bioperl provides no methods of its own for this) of solving the problem is, the first time you go through each next_result, next_hit and next_hsp, just store the returned objects in an array of arrays of arrays. Then the second time get the objects from your array structure instead of with the method calls.
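The last suggestion above (walk the report once and keep the returned objects) could be sketched roughly as follows. This is only an illustration, not a BioPerl method: cache_report is a made-up helper name, and it assumes nothing beyond the next_result / next_hit / next_hsp iterator methods already mentioned in this thread. It uses small hashes rather than bare nested arrays so that each result and hit object stays next to its children.

```perl
use strict;
use warnings;

# Hypothetical helper: drain a SearchIO-style report once and keep
# every returned object in memory so it can be traversed repeatedly.
# Assumes $report offers next_result(), each result next_hit(),
# and each hit next_hsp().
sub cache_report {
    my ($report) = @_;
    my @results;
    while ( my $result = $report->next_result ) {
        my @hits;
        while ( my $hit = $result->next_hit ) {
            my @hsps;
            while ( my $hsp = $hit->next_hsp ) {
                push @hsps, $hsp;
            }
            push @hits, { hit => $hit, hsps => \@hsps };
        }
        push @results, { result => $result, hits => \@hits };
    }
    return \@results;
}

# Intended use (commented out because it needs a real blast run;
# sort_hits and print_hits stand for your own two subroutines):
#   my $blast_report = $factory->blastall($ref_seq_objs);
#   my $cached       = cache_report($blast_report);
#   sort_hits($cached);
#   print_hits($cached);
```

Both subroutines then loop over @$cached with ordinary foreach loops, so neither one exhausts anything for the other. The cost is that the whole report is held in memory at once.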
From heikki at sanbi.ac.za Wed May 31 10:55:18 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 31 May 2006 12:55:18 +0200
Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef"
In-Reply-To: <001801c68431$a586b2d0$15327e82@pyrimidine>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
Message-ID: <200605311255.19166.heikki@sanbi.ac.za>

In my opinion the sooner the bugs get exposed the better. It is much more likely that there is a well-hidden bug caused by accidentally assigning undef into a one-element array than someone intentionally writing code that expects that behaviour!

I removed (but did not commit yet) all undefs from my old Bio::Variation code and could not see any differences in the test output.

Let's remove them!

-Heikki

--
Heikki Lehvaslaiho    heikki at_sanbi _ac _za
Associate Professor   skype: heikki_lehvaslaiho
SANBI, South African National Bioinformatics Institute
University of the Western Cape, South Africa
Phone: +27 21 959 2096   FAX: +27 21 959 2512

From heikki at sanbi.ac.za Wed May 31 10:44:28 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 31 May 2006 12:44:28 +0200
Subject: [Bioperl-l] Bio::Restriction::IO issues
In-Reply-To:
<000f01c683f8$5771ed50$15327e82@pyrimidine>
References: <000f01c683f8$5771ed50$15327e82@pyrimidine>
Message-ID: <200605311244.29187.heikki@sanbi.ac.za>

Chris,

Thanks for stepping in. I feel partly responsible here because I originally changed some of Rob's code but have not followed up since.

There has not been active development on these modules, so do not worry about stepping on anyone's toes.

-Heikki

On Tuesday 30 May 2006 16:50, Chris Fields wrote:
> Jason, Brian, et al,
>
> I found several major issues with Bio::Restriction::IO (this popped up
> while bug squashing). In particular, the POD is pretty misleading. It
> states (directly from perldoc):
>
> SYNOPSIS
>     use Bio::Restriction::IO;
>
>     $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
>                                      -format => 'withrefm');
>     $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
>                                      -format => 'bairoch');
>     my $res = $in->read; # a Bio::Restriction::EnzymeCollection
>     $out->write($res);
>
>     # or
>
>     # use Bio::Restriction::IO;
>     #
>     # #input file format can be read from the file extension (dat|xml)
>     # $in  = Bio::Restriction::IO->newFh(-file => "inputfilename");
>     # $out = Bio::Restriction::IO->newFh('-format' => 'xml');
>     #
>     # # World's shortest flat<->xml format converter:
>     # print $out $_ while <$in>;
>
> So, I have found several problems with these modules. I really hate to
> criticize code here, as my own is pretty hacky, but I think these are
> things to seriously mull over:
>
> 1) Note that, though some of the lines above are commented they are
> still there in POD and thus present in perldoc/pod2html etc. So, judging
> from the above, it suggests using the script above should read in from one
> format and write out to another (like SeqIO). However, NONE of the current
> write() methods are implemented for any of the IO modules (withref, base,
> itype2, bairoch), so this does not happen as expected. You get the nasty
> thrown 'method not implemented error' instead when writing.
> 2) The commented statements in POD above also suggest that REBASE XML
> format is supported when there is no XML module.
> 3) The Bio::Restriction::IO::bairoch module had multiple bugs which
> made it unusable until I added a few small changes; it still can't handle
> multisite/multicut enzymes properly, so in essence it is useless until that
> is addressed.
> 4) Bio::Restriction::IO inherits from Bio::SeqIO, though I'm not sure
> why. Shouldn't it just inherit from Bio::Root::Root/Bio::Root::IO and make
> up its own methods?
>
> I'm working on at least getting the 'bairoch' input format up and running
> (so at least it gets the enzymes into a
> Bio::Restriction::Enzyme::Collection). From this point I'm not sure where
> to proceed. The POD obviously needs to be corrected to reflect that
> writing formats is not implemented (and the bit about XML should be taken
> out completely); that's the easy part which I am working on and plan on
> committing today. However, these modules don't seem to be used too
> frequently so I'm not sure whether it's worth spending too much time
> getting these up to speed at the moment (adding write methods, switching to
> Bio::Root::Root, etc); I have other priorities at the moment (including a
> way overdue ListSummary). I'm also not sure who else is (using|working) on
> these so I don't want to (make too many changes|step on someone else's
> toes), but these are, IMHO, pretty serious problems.
>
> Any thoughts?
>
> Chris
>
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept.
of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of the Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From cjfields at uiuc.edu Wed May 31 13:10:00 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 08:10:00 -0500 Subject: [Bioperl-l] Bio::Restriction::IO issues In-Reply-To: <200605311244.29187.heikki@sanbi.ac.za> References: <000f01c683f8$5771ed50$15327e82@pyrimidine> <200605311244.29187.heikki@sanbi.ac.za> Message-ID: Heikki, I mainly just changed a few things so no one would get the wrong ideas from POD (that they write format as well) and added a few things to the TO DO. I also added a warning to Bio::Restriction::IO::bairoch for the multisite/multicut issue. Besides that I haven't done much to them. I also added a bit to the Project Priority List in case someone wants to take it up. I may tinker with it but it's not really high on my priority list. I've been pretty busy getting the ListSummaries back up to speed (very busy mail lists since the last one) and am writing/testing a new interface to NCBI EUtilities which I may donate at some in the next few months or so. Chris On May 31, 2006, at 5:44 AM, Heikki Lehvaslaiho wrote: > > Chris, > > Thanks for stepping in. I feel partly responsible here because I > originally > changed some of Rob's code but have not followed up since. > > There have not been active development on these modules so do not > worry about > stepping on anyone's toes. 
> > -Heikki > > On Tuesday 30 May 2006 16:50, Chris Fields wrote: >> Jason, Brian, et al, >> >> I found several major issues with Bio::Restriction::IO (this >> popped up >> while bug squashing). In particular, the POD is pretty >> misleading. It >> states (directly from perldoc): >> >> SYNOPSIS >> use Bio::Restriction::IO; >> >> $in = Bio::Restriction::IO->new(-file => "inputfilename" , >> -format => 'withrefm'); >> $out = Bio::Restriction::IO->new(-file => ">outputfilename" , >> -format => 'bairoch'); >> my $res = $in->read; # a Bio::Restriction::EnzymeCollection >> $out->write($res); >> >> # or >> >> # use Bio::Restriction::IO; >> # >> # #input file format can be read from the file extension >> (dat|xml) >> # $in = Bio::Restriction::IO->newFh(-file => >> "inputfilename"); >> # $out = Bio::Restriction::IO->newFh('-format' => 'xml'); >> # >> # # World's shortest flat<->xml format converter: >> # print $out $_ while <$in>; >> >> So, I have found several problems with these modules. I really >> hate to >> criticize code here, as my own is pretty hacky, but I think these are >> things to seriously mull over: >> >> 1) Note that, though some of the lines above are commented they are >> still there in POD and thus present in perldoc/pod2html etc. So, >> judging >> from the above, it suggests using the script above should read in >> from one >> format and write out to another (like SeqIO). However, NONE of >> the current >> write() methods are implemented for any of the IO modules >> (withref, base, >> itype2, bairoch), so this does not happen as expected. You get >> the nasty >> thrown 'method not implemented error' instead when writing. >> 2) The commented statements in POD above also suggest that REBASE XML >> format is supported when there is no XML module. 
>> 3) The Bio::Restriction::IO::bairoch module had multiple bugs which >> made it unusable until I added a few small changes; it still can't >> handle >> multisite/multicut enzymes properly, so in essence it is useless >> until that >> is addressed. >> 4) Bio::Restriction::IO inherits from Bio::SeqIO, though I'm not sure >> why. Shouldn't it just inherit from Bio::Root::Root/Bio::Root::IO >> and make >> up it's own methods? >> >> I'm working on at least getting the 'bairoch' input format up and >> running >> (so at least it gets the enzymes into a >> Bio::Restriction::Enzyme::Collection). From this point I'm not >> sure where >> to proceed. The POD obviously needs to be corrected to reflect that >> writing formats is not implemented (and the bit about XML should >> be taken >> out completely); that's the easy part which I am working on and plan >> committing today. However, these modules don't seem to be used too >> frequently so I'm not sure whether it's worth spending too much time >> getting these up to speed at the moment (adding write methods, >> switching to >> Bio::Root::Root, etc); I have other priorities at the moment >> (including a >> way overdue ListSummary). I'm also not sure who else is (using| >> working) on >> these so I don't want to (make too many changes|step on someone >> else's >> toes), but these are, IMHO, pretty serious problems. >> >> Any thoughts? >> >> Chris >> >> >> Christopher Fields >> Postdoctoral Researcher - Switzer Lab >> Dept. 
of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ______ _/ _/_____________________________________________________
> _/ _/
> _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za
> _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho
> _/ _/ _/ SANBI, South African National Bioinformatics Institute
> _/ _/ _/ University of the Western Cape, South Africa
> _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From jay at jays.net Wed May 31 13:07:10 2006
From: jay at jays.net (Jay Hannah)
Date: Wed, 31 May 2006 08:07:10 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
Message-ID: <447D94FE.8090305@jays.net>

http://www.bioperl.org/wiki/Bptutorial.pl

I think I just partially fulfilled this TODO:

TODO: check if the POD is in the Wiki yet, and if not, put it here?

I used Pod::Simple::Wiki (format 'mediawiki') to burn bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it into the wiki page via my web browser. (Is that proper procedure? Is the plan to just do that manually from time to time as the document changes?)

Now what?

Should there be a new link on the far left of bioperl.org called "Tutorial"?

It's an amazing document. IMHO it should be listed prominently on bioperl.org.
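
For anyone wanting to repeat the POD-to-wiki conversion described above, here is a minimal sketch. It assumes Pod::Simple::Wiki is installed from CPAN; the helper name pod_to_mediawiki and the sample POD string are made up for illustration and are not part of BioPerl or of bptutorial.pl.

```perl
use strict;
use warnings;
use Pod::Simple::Wiki;

# Hypothetical helper (illustration only): convert a string of POD
# into MediaWiki markup and return the result as a string.
sub pod_to_mediawiki {
    my ($pod) = @_;

    # The constructor argument selects the wiki dialect.
    my $parser = Pod::Simple::Wiki->new('mediawiki');

    my $wiki = '';
    $parser->output_string(\$wiki);         # collect output in $wiki
    $parser->parse_string_document($pod);   # parse POD held in a string

    return $wiki;
}

# A tiny stand-in for the POD embedded in bptutorial.pl.
my $pod = "=head1 NAME\n\nbptutorial - a Bioperl tutorial\n\n=cut\n";
print pod_to_mediawiki($pod);
```

To convert a file on disk instead, the same parser object should accept $parser->output_fh(*STDOUT) followed by $parser->parse_file('bptutorial.pl'); the result still has to be pasted into the wiki by hand, as described above.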
HTH, j From osborne1 at optonline.net Wed May 31 13:58:01 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Wed, 31 May 2006 09:58:01 -0400 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: <447D94FE.8090305@jays.net> Message-ID: Jay, Excellent! Now we need to answer a few more questions for ourselves: - Do we remove the file bptutorial.pl from the package now? I'd say yes, we don't want to have to maintain two bptutorials. - What do we do with the script part of bptutorial.pl? It certainly could be excised and put into the examples/ directory, for example, but this would break a few of the paths that are being used. - A link to bptutorial? Or a link to the existing tutorials page? http://www.bioperl.org/wiki/Tutorials. Any thoughts on these? Brian O. On 5/31/06 9:07 AM, "Jay Hannah" wrote: > http://www.bioperl.org/wiki/Bptutorial.pl > > I think I just partially fulfilled this TODO: > > TODO: check if the POD is in the Wiki yet, and if not, put it here? > > I used Pod::Simple::Wiki (format 'mediawiki') to burn > bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it the > wiki page via my web browser. (Is that proper procedure? Is the plan to just > do that manually from time to time as the document changes?) > > Now what? > > Should there be a new link on the far left of bioperl.org called "Tutorial"? > > It's an amazing document. IMHO it should be listed prominently on bioperl.org. 
> > HTH, > > j > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From luciap at sas.upenn.edu Wed May 31 14:06:13 2006 From: luciap at sas.upenn.edu (Lucia Peixoto) Date: Wed, 31 May 2006 10:06:13 -0400 Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function In-Reply-To: <6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu> References: <1149019912.447ca7085124e@128.91.55.38> <6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu> Message-ID: <1149084373.447da2d5c5339@128.91.55.38> Hi Thanks a couple more questions why is the bootstrap value stored as the node id? Is that right? also, in the add_descendant method, how do you set the $ignoreoverwrite parameter to true? Lucia Quoting Jason Stajich : > you need to special case the root - it won't have an ancestor. just > protect the my $parent = $node->ancestor with an if statement as I > did below > > On May 30, 2006, at 4:11 PM, Lucia Peixoto wrote: > > > Hi > > OK that was silly, but what I have in my code is what you just wrote > > But the problem is that if I write > > > > $parent->add_Descendent($child) > > > > it tells me that I am calling the method "ass_Descendent" on an > > undefined value > > (but I did define $parent before??) > > > > So here it goes the code so far: > > > > use Bio::TreeIO; > > my $in = new Bio::TreeIO(-file => 'Test2.tre', > > -format => 'newick'); > > my $out = new Bio::TreeIO(-file => '>mytree.out', > > -format => 'newick'); > > while( my $tree = $in->next_tree ) { > > foreach my $node ( grep { ! $_->is_Leaf() } $tree->get_nodes() ) { > > my $bootstrap=$node->_creation_id; > > > > if ($bootstrap < 70 ){ > > >>> if( my $parent = $node->ancestor ) { > > my @children=$node->get_all_Descendents; > > foreach my $child (@children){ > > $parent->add_Descendent($child); > > } > } > > > > ........ 
> > > > eventually I'll add (once I assigned the children to the parent > > succesfully): > > $tree->remove_Node($node); > > > > } > > } > > $out->write_tree($tree); > > } > > > > Quoting aaron.j.mackey at gsk.com: > > > >>> foreach $child (@children){ > >>> $parent=add_Descendent->$child; > >>> } > >> > >> I think what you want is $parent->add_Descendent($child) > >> > >> -Aaron > >> > > > > > > Lucia Peixoto > > Department of Biology,SAS > > University of Pennsylvania > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > Lucia Peixoto Department of Biology,SAS University of Pennsylvania From sb at mrc-dunn.cam.ac.uk Wed May 31 14:56:49 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Wed, 31 May 2006 15:56:49 +0100 Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef" In-Reply-To: <200605311255.19166.heikki@sanbi.ac.za> References: <001801c68431$a586b2d0$15327e82@pyrimidine> <200605311255.19166.heikki@sanbi.ac.za> Message-ID: <447DAEB1.4040509@mrc-dunn.cam.ac.uk> Heikki Lehvaslaiho wrote: > In my opinion the sooner the bugs get exposed the better. It is much more > likely that there is a well hidden bug caused by assigning accidentally undef > into an one element array that someone intentionally writing code that > expects that behaviour! > > I removed (but did not commit yet) all undefs from my old Bio::Variation code > and could not see any differences in the test output. > > Let's remove them! Just looking for all return undef;s isn't enough. It's entirely possible to do something like: my $return_value; { # do something that assigns to return_value on success # on failure, just do nothing } return $return_value; The bioperl docs will typically explicitly state that undef is returned, and under what circumstance. 
If a user suffers from the undef-into-array-problem, yes it can be slightly unexpected, but lots of unexpected things will happen when you don't use a method correctly, as per the docs! Fixing the return of undef is either a job that shouldn't be done, or a much harder job than expected. From bernd.web at gmail.com Wed May 31 14:30:30 2006 From: bernd.web at gmail.com (Bernd Web) Date: Wed, 31 May 2006 16:30:30 +0200 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: References: <447D94FE.8090305@jays.net> Message-ID: <716af09c0605310730o7de20489m674a07b5a928039d@mail.gmail.com> Hi, I am not sure to what extent bptutorial will be removed, but I actually like having bptutorial.pl in my BioPerl base for reference. regards, Bernd On 5/31/06, Brian Osborne wrote: > Jay, > > Excellent! Now we need to answer a few more questions for ourselves: > > - Do we remove the file bptutorial.pl from the package now? I'd say yes, we > don't want to have to maintain two bptutorials. > > - What do we do with the script part of bptutorial.pl? It certainly could be > excised and put into the examples/ directory, for example, but this would > break a few of the paths that are being used. > > - A link to bptutorial? Or a link to the existing tutorials page? > http://www.bioperl.org/wiki/Tutorials. > > Any thoughts on these? > > > Brian O. > > > On 5/31/06 9:07 AM, "Jay Hannah" wrote: > > > http://www.bioperl.org/wiki/Bptutorial.pl > > > > I think I just partially fulfilled this TODO: > > > > TODO: check if the POD is in the Wiki yet, and if not, put it here? > > > > I used Pod::Simple::Wiki (format 'mediawiki') to burn > > bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it the > > wiki page via my web browser. (Is that proper procedure? Is the plan to just > > do that manually from time to time as the document changes?) > > > > Now what? > > > > Should there be a new link on the far left of bioperl.org called "Tutorial"? 
> > > > It's an amazing document. IMHO it should be listed prominently on bioperl.org. > > > > HTH, > > > > j > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From lstein at cshl.edu Wed May 31 16:03:13 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Wed, 31 May 2006 12:03:13 -0400 Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef" In-Reply-To: <200605311255.19166.heikki@sanbi.ac.za> References: <001801c68431$a586b2d0$15327e82@pyrimidine> <200605311255.19166.heikki@sanbi.ac.za> Message-ID: <200605311203.13922.lstein@cshl.edu> I'm afraid that everything depends on the context. If the subroutine is documented to return a single scalar, then returning undef is appropriate. If the subroutine is documented to return "false" on failure, then one must call return (or "return ()" ). Changing all the return undefs to return is going to expose hidden bugs in the code written by people who are using BioPerl. While I agree wholeheartedly with the proposed audit, I think we need to expect that people are going to complain. Lincoln On Wednesday 31 May 2006 06:55, Heikki Lehvaslaiho wrote: > In my opinion the sooner the bugs get exposed the better. It is much more > likely that there is a well hidden bug caused by assigning accidentally > undef into an one element array that someone intentionally writing code > that expects that behaviour! > > I removed (but did not commit yet) all undefs from my old Bio::Variation > code and could not see any differences in the test output. > > Let's remove them! 
> > -Heikki > > On Tuesday 30 May 2006 23:40, Chris Fields wrote: > > Agreed, though I think these changes should be implemented at some point > > (Conway's argument here makes sense and it is nice for Torsten to check > > this out). If proper tests are written then any changes resulting in > > errors should be picked up by checking the appropriate test suite, though > > I know it doesn't absolutely guarantee it. ; P > > > > Chris > > > > > -----Original Message----- > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > bounces at lists.open-bio.org] On Behalf Of Rutger Vos > > > Sent: Tuesday, May 30, 2006 1:53 PM > > > To: bioperl-l at lists.open-bio.org > > > Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith > > > "returnundef" > > > > > > Although I agree with the sentiment of following PBP, I'm not so sure > > > changing 'return undef' to 'return' *now* will fix any bugs without > > > introducing new, subtle ones. > > > > > > Chris Fields wrote: > > > > Torsten, > > > > > > > > Any way you can post a list of some/all of the offending lines or > > > > > > modules? > > > > > > > Sounds like something to consider, but if the list is as large as you > > > > > > say we > > > > > > > made need something (bugzilla? wiki?) to track the changes and make > > > > sure they pass tests; I'm sure a large majority will. > > > > > > > > I'm guessing Jason would want this somewhere on the project priority > > > > > > list or > > > > > > > bugzilla, with a link to the actual list, but I'm not sure. Maybe > > > > start > > > > > > a > > > > > > > page on the wiki for proposed code changes? 
> > > > > > > > Chris > > > > > > > >> -----Original Message----- > > > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > >> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann > > > >> Sent: Tuesday, May 30, 2006 3:19 AM > > > >> To: bioperl-l at lists.open-bio.org > > > >> Subject: [Bioperl-l] For CVS developers - potential pitfall with > > > >> "returnundef" > > > >> > > > >> FYI Bioperl developers: > > > >> > > > >> I just audited the bioperl-live CVS and found about 450 occurrences > > > >> of "return undef". > > > >> > > > >> Page 199 of "Perl Best Practices" by Damian Conway, and this URL > > > >> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html > > > >> suggest: > > > >> > > > >> "Use return; instead of return undef; if you want to return nothing. > > > >> If someone assigns the return value to an array, the latter creates > > > >> an array of one value (undef), which evaluates to true. The former > > > >> will correctly handle all contexts." > > > >> > > > >> So I'm guessing at least some of these 450 occurrences *could* > > > >> result > > > > > > in > > > > > > >> bugs and should probably be changed. > > > >> > > > >> Your opinion may differ :-) > > > >> > > > >> -- > > > >> Dr Torsten Seemann http://www.vicbioinformatics.com > > > >> Victorian Bioinformatics Consortium, Monash University, Australia > > > >> > > > >> _______________________________________________ > > > >> Bioperl-l mailing list > > > >> Bioperl-l at lists.open-bio.org > > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l at lists.open-bio.org > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > Rutger Vos, PhD. 
candidate > > > Department of Biological Sciences > > > Simon Fraser University > > > 8888 University Drive > > > Burnaby, BC, V5A1S6 > > > Phone: 604-291-5625 > > > Fax: 604-291-3496 > > > Personal site: http://www.sfu.ca/~rvosa > > > FAB* lab: http://www.sfu.ca/~fabstar > > > Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Wed May 31 16:34:54 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 11:34:54 -0500 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: Message-ID: <001201c684d0$263c5530$15327e82@pyrimidine> Brian, Jay, I think it would be nice to have the tutorial prominently displayed somehow (Jay's suggestion), with a link provided via the tutorials page. Hopefully this will help with the bioperl newbies. Jay, looks like there are still some weird formatting issues with the bptutorial wiki page, something which I ran into before when getting the Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more spaces preceding a line denotes code for some reason). Not much you can do in these cases except remove the extra spaces in those spots. Looking good though! 
Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Brian Osborne > Sent: Wednesday, May 31, 2006 8:58 AM > To: Jay Hannah; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl > > Jay, > > Excellent! Now we need to answer a few more questions for ourselves: > > - Do we remove the file bptutorial.pl from the package now? I'd say yes, > we > don't want to have to maintain two bptutorials. > > - What do we do with the script part of bptutorial.pl? It certainly could > be > excised and put into the examples/ directory, for example, but this would > break a few of the paths that are being used. > > - A link to bptutorial? Or a link to the existing tutorials page? > http://www.bioperl.org/wiki/Tutorials. > > Any thoughts on these? > > > Brian O. > > > On 5/31/06 9:07 AM, "Jay Hannah" wrote: > > > http://www.bioperl.org/wiki/Bptutorial.pl > > > > I think I just partially fulfilled this TODO: > > > > TODO: check if the POD is in the Wiki yet, and if not, put it here? > > > > I used Pod::Simple::Wiki (format 'mediawiki') to burn > > bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it > the > > wiki page via my web browser. (Is that proper procedure? Is the plan to > just > > do that manually from time to time as the document changes?) > > > > Now what? > > > > Should there be a new link on the far left of bioperl.org called > "Tutorial"? > > > > It's an amazing document. IMHO it should be listed prominently on > bioperl.org. 
> >
> > HTH,
> >
> > j
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at uiuc.edu Wed May 31 16:44:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 11:44:31 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef"
In-Reply-To: <200605311203.13922.lstein@cshl.edu>
Message-ID: <001301c684d1$7e849fd0$15327e82@pyrimidine>

My feeling is the test suite 'should' pick up a large majority of problems if changes are made to these lines, the quotes there indicating the utopian idea that the tests are all written well (I believe 99% of the tests are, BTW). You can always try the changes (wholesale or on smaller chunks of code), see if they pass tests on different OS's using 'make/nmake test', revert the ones that didn't pass, etc. It's a matter of someone being willing to try it out.

I think the original argument proposed here (originating from Damian Conway and 'Perl Best Practices') is that using 'return undef' is maybe something we shouldn't be doing, since this can lead to subtle errors itself. Not that everything we do is considered 'a good practice' by any means. If I remember correctly from 'OOPerl', Conway doesn't like combined get/setters either (he prefers separate getters and setters); we use the 'bad' combined version predominately in Bioperl.
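
The difference being argued about can be seen in a few lines of plain Perl. This is a minimal sketch, not BioPerl code: fail_with_undef and fail_with_empty are hypothetical names chosen only to contrast 'return undef' with a bare 'return'.

```perl
use strict;
use warnings;

# Hypothetical subs for illustration; neither is a BioPerl method.
sub fail_with_undef { return undef; }   # always returns one value: undef
sub fail_with_empty { return; }         # empty list in list context,
                                        # undef in scalar context

my @a = fail_with_undef();   # (undef): a one-element array
my @b = fail_with_empty();   # ():      an empty array

print "return undef: ", scalar(@a), " element(s), array is ",
      (@a ? "true" : "false"), "\n";
print "return:       ", scalar(@b), " element(s), array is ",
      (@b ? "true" : "false"), "\n";

# In scalar context the bare return still yields undef.
my $s = fail_with_empty();
print "scalar context: ", (defined $s ? "defined" : "undef"), "\n";
```

The first array holds a single undef and so tests true, which is the hidden bug Conway warns about; the second is empty and tests false. Callers that only ever assign the result to a scalar see no difference between the two forms.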
Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Lincoln Stein > Sent: Wednesday, May 31, 2006 11:03 AM > To: bioperl-l at lists.open-bio.org > Cc: Heikki Lehvaslaiho > Subject: Re: [Bioperl-l] For CVS developers - potential > pitfallwith"returnundef" > > I'm afraid that everything depends on the context. If the subroutine is > documented to return a single scalar, then returning undef is appropriate. > If > the subroutine is documented to return "false" on failure, then one must > call > return (or "return ()" ). > > Changing all the return undefs to return is going to expose hidden bugs in > the > code written by people who are using BioPerl. While I agree wholeheartedly > with the proposed audit, I think we need to expect that people are going > to > complain. > > Lincoln > > > On Wednesday 31 May 2006 06:55, Heikki Lehvaslaiho wrote: > > In my opinion the sooner the bugs get exposed the better. It is much > more > > likely that there is a well hidden bug caused by assigning accidentally > > undef into an one element array that someone intentionally writing code > > that expects that behaviour! > > > > I removed (but did not commit yet) all undefs from my old Bio::Variation > > code and could not see any differences in the test output. > > > > Let's remove them! > > > > -Heikki > > > > On Tuesday 30 May 2006 23:40, Chris Fields wrote: > > > Agreed, though I think these changes should be implemented at some > point > > > (Conway's argument here makes sense and it is nice for Torsten to > check > > > this out). If proper tests are written then any changes resulting in > > > errors should be picked up by checking the appropriate test suite, > though > > > I know it doesn't absolutely guarantee it. 
; P > > > > > > Chris > > > > > > > -----Original Message----- > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > > bounces at lists.open-bio.org] On Behalf Of Rutger Vos > > > > Sent: Tuesday, May 30, 2006 1:53 PM > > > > To: bioperl-l at lists.open-bio.org > > > > Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith > > > > "returnundef" > > > > > > > > Although I agree with the sentiment of following PBP, I'm not so > sure > > > > changing 'return undef' to 'return' *now* will fix any bugs without > > > > introducing new, subtle ones. > > > > > > > > Chris Fields wrote: > > > > > Torsten, > > > > > > > > > > Any way you can post a list of some/all of the offending lines or > > > > > > > > modules? > > > > > > > > > Sounds like something to consider, but if the list is as large as > you > > > > > > > > say we > > > > > > > > > made need something (bugzilla? wiki?) to track the changes and > make > > > > > sure they pass tests; I'm sure a large majority will. > > > > > > > > > > I'm guessing Jason would want this somewhere on the project > priority > > > > > > > > list or > > > > > > > > > bugzilla, with a link to the actual list, but I'm not sure. Maybe > > > > > start > > > > > > > > a > > > > > > > > > page on the wiki for proposed code changes? > > > > > > > > > > Chris > > > > > > > > > >> -----Original Message----- > > > > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > > >> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann > > > > >> Sent: Tuesday, May 30, 2006 3:19 AM > > > > >> To: bioperl-l at lists.open-bio.org > > > > >> Subject: [Bioperl-l] For CVS developers - potential pitfall with > > > > >> "returnundef" > > > > >> > > > > >> FYI Bioperl developers: > > > > >> > > > > >> I just audited the bioperl-live CVS and found about 450 > occurrences > > > > >> of "return undef". 
> > > > >> > > > > >> Page 199 of "Perl Best Practices" by Damian Conway, and this URL > > > > >> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html > > > > >> suggest: > > > > >> > > > > >> "Use return; instead of return undef; if you want to return > nothing. > > > > >> If someone assigns the return value to an array, the latter > creates > > > > >> an array of one value (undef), which evaluates to true. The > former > > > > >> will correctly handle all contexts." > > > > >> > > > > >> So I'm guessing at least some of these 450 occurrences *could* > > > > >> result > > > > > > > > in > > > > > > > > >> bugs and should probably be changed. > > > > >> > > > > >> Your opinion may differ :-) > > > > >> > > > > >> -- > > > > >> Dr Torsten Seemann http://www.vicbioinformatics.com > > > > >> Victorian Bioinformatics Consortium, Monash University, Australia > > > > >> > > > > >> _______________________________________________ > > > > >> Bioperl-l mailing list > > > > >> Bioperl-l at lists.open-bio.org > > > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > _______________________________________________ > > > > > Bioperl-l mailing list > > > > > Bioperl-l at lists.open-bio.org > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > -- > > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > Rutger Vos, PhD. 
candidate > > > > Department of Biological Sciences > > > > Simon Fraser University > > > > 8888 University Drive > > > > Burnaby, BC, V5A1S6 > > > > Phone: 604-291-5625 > > > > Fax: 604-291-3496 > > > > Personal site: http://www.sfu.ca/~rvosa > > > > FAB* lab: http://www.sfu.ca/~fabstar > > > > Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ > > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > > > > > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l at lists.open-bio.org > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Wed May 31 14:59:53 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 31 May 2006 10:59:53 -0400 Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef" In-Reply-To: <200605311255.19166.heikki@sanbi.ac.za> References: <001801c68431$a586b2d0$15327e82@pyrimidine> <200605311255.19166.heikki@sanbi.ac.za> Message-ID: <949F348A-391B-495D-ABCE-30BABC37FF05@gmx.net> I agree. Thanks to Torsten for the audit and Chris for stepping up. -hilmar On May 31, 2006, at 6:55 AM, Heikki Lehvaslaiho wrote: > In my opinion the sooner the bugs get exposed the better. 
It is > much more > likely that there is a well hidden bug caused by assigning > accidentally undef > into an one element array that someone intentionally writing code that > expects that behaviour! > > I removed (but did not commit yet) all undefs from my old > Bio::Variation code > and could not see any differences in the test output. > > Let's remove them! > > -Heikki > > On Tuesday 30 May 2006 23:40, Chris Fields wrote: >> Agreed, though I think these changes should be implemented at some >> point >> (Conway's argument here makes sense and it is nice for Torsten to >> check >> this out). If proper tests are written then any changes resulting in >> errors should be picked up by checking the appropriate test suite, >> though I >> know it doesn't absolutely guarantee it. ; P >> >> Chris >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Rutger Vos >>> Sent: Tuesday, May 30, 2006 1:53 PM >>> To: bioperl-l at lists.open-bio.org >>> Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith >>> "returnundef" >>> >>> Although I agree with the sentiment of following PBP, I'm not so >>> sure >>> changing 'return undef' to 'return' *now* will fix any bugs without >>> introducing new, subtle ones. >>> >>> Chris Fields wrote: >>>> Torsten, >>>> >>>> Any way you can post a list of some/all of the offending lines or >>> >>> modules? >>> >>>> Sounds like something to consider, but if the list is as large >>>> as you >>> >>> say we >>> >>>> made need something (bugzilla? wiki?) to track the changes and make >>>> sure they pass tests; I'm sure a large majority will. >>>> >>>> I'm guessing Jason would want this somewhere on the project >>>> priority >>> >>> list or >>> >>>> bugzilla, with a link to the actual list, but I'm not sure. Maybe >>>> start >>> >>> a >>> >>>> page on the wiki for proposed code changes? 
>>>> >>>> Chris >>>> >>>>> -----Original Message----- >>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann >>>>> Sent: Tuesday, May 30, 2006 3:19 AM >>>>> To: bioperl-l at lists.open-bio.org >>>>> Subject: [Bioperl-l] For CVS developers - potential pitfall with >>>>> "returnundef" >>>>> >>>>> FYI Bioperl developers: >>>>> >>>>> I just audited the bioperl-live CVS and found about 450 >>>>> occurrences of >>>>> "return undef". >>>>> >>>>> Page 199 of "Perl Best Practices" by Damian Conway, and this URL >>>>> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html >>>>> suggest: >>>>> >>>>> "Use return; instead of return undef; if you want to return >>>>> nothing. >>>>> If someone assigns the return value to an array, the latter >>>>> creates an >>>>> array of one value (undef), which evaluates to true. The former >>>>> will >>>>> correctly handle all contexts." >>>>> >>>>> So I'm guessing at least some of these 450 occurrences *could* >>>>> result >>> >>> in >>> >>>>> bugs and should probably be changed. >>>>> >>>>> Your opinion may differ :-) >>>>> >>>>> -- >>>>> Dr Torsten Seemann http://www.vicbioinformatics.com >>>>> Victorian Bioinformatics Consortium, Monash University, Australia >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++ >>> Rutger Vos, PhD. 
candidate >>> Department of Biological Sciences >>> Simon Fraser University >>> 8888 University Drive >>> Burnaby, BC, V5A1S6 >>> Phone: 604-291-5625 >>> Fax: 604-291-3496 >>> Personal site: http://www.sfu.ca/~rvosa >>> FAB* lab: http://www.sfu.ca/~fabstar >>> Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++ >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of the Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Wed May 31 18:08:43 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 31 May 2006 14:08:43 -0400 Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef" In-Reply-To: <200605311203.13922.lstein@cshl.edu> References: <001801c68431$a586b2d0$15327e82@pyrimidine> <200605311255.19166.heikki@sanbi.ac.za> <200605311203.13922.lstein@cshl.edu> Message-ID: On May 31, 2006, at 12:03 PM, Lincoln Stein wrote: > If the subroutine is documented to return 
"false" on failure, then > one must call > return (or "return ()" ). The problem seems to be that 'a value that evaluates to either true or false' and 'a [meaningful] value or undef' and 'a value or false' ('a value or no value) are not the same in perl. And what would/should one expect if the doc states 'true on success and false otherwise'? Maybe the documentation should also be fixed to avoid any ambiguity. I.e., avoid documenting 'a value or false' because it may be ambiguous (not only) to the less proficient. 'True or false' should imply a value being returned. Comments? -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From lstein at cshl.edu Wed May 31 18:14:59 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Wed, 31 May 2006 14:14:59 -0400 Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef" In-Reply-To: References: <001801c68431$a586b2d0$15327e82@pyrimidine> <200605311203.13922.lstein@cshl.edu> Message-ID: <200605311415.00414.lstein@cshl.edu> If the documentation says "returns false" then I expect to be able to do this: @result = foo(); die "foo() failed" unless @result; If the documentation says "returns undef" then I expect this: @result = foo(); die "foo() failed" unless $result[0]; Lincoln On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote: > On May 31, 2006, at 12:03 PM, Lincoln Stein wrote: > > If the subroutine is documented to return "false" on failure, then > > one must call > > return (or "return ()" ). > > The problem seems to be that 'a value that evaluates to either true > or false' and 'a [meaningful] value or undef' and 'a value or > false' ('a value or no value) are not the same in perl. And what > would/should one expect if the doc states 'true on success and false > otherwise'? > > Maybe the documentation should also be fixed to avoid any ambiguity. 
> I.e., avoid documenting 'a value or false' because it may be
> ambiguous (not only) to the less proficient. 'True or false' should
> imply a value being returned.
>
> Comments?
>
> -hilmar

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From hlapp at gmx.net Wed May 31 18:31:21 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 14:31:21 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef"
In-Reply-To: <200605311415.00414.lstein@cshl.edu>
References: <001801c68431$a586b2d0$15327e82@pyrimidine> <200605311203.13922.lstein@cshl.edu> <200605311415.00414.lstein@cshl.edu>
Message-ID: <241E77AE-8D1E-4708-9C4C-8A9619822DB4@gmx.net>

On May 31, 2006, at 2:14 PM, Lincoln Stein wrote:

> If the documentation says "returns false" then I expect to be able
> to do this:
>
>   @result = foo();
>   die "foo() failed" unless @result;

Except if the alternative to 'false' would be a scalar, you normally
wouldn't assign it to an array, would you?

I.e., I wouldn't expect behavior this strict from an open-source
package written largely by people whose job is biological science,
not programming Perl while knowing and following DC to the letter ...
I'd rather be on the safe side and assign to a scalar.

Just my $0.02 ...
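
For anyone following the thread, here is a minimal sketch of the context difference being discussed. The sub names are made up for illustration and are not BioPerl code; the sketch only assumes standard Perl semantics for "return" in list and scalar context.

```perl
use strict;
use warnings;

# Hypothetical subs, for illustration only (not from BioPerl).
sub fail_undef { return undef; }   # one-element list (undef) in list context
sub fail_bare  { return; }         # empty list in list context, undef in scalar context

my @a = fail_undef();
my @b = fail_bare();
my $s = fail_bare();

print scalar(@a), "\n";                          # prints 1, so "unless @a" does not fire
print scalar(@b), "\n";                          # prints 0, so "unless @b" fires
print defined($s) ? "defined" : "undef", "\n";   # prints undef
```

So a caller that assigns to a scalar sees the same thing either way; only callers that assign to an array and test the array itself are affected by the choice.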
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Wed May 31 18:50:30 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 13:50:30 -0500 Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef" In-Reply-To: <447DAEB1.4040509@mrc-dunn.cam.ac.uk> Message-ID: <001801c684e3$16e33730$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Sendu Bala > Sent: Wednesday, May 31, 2006 9:57 AM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] For CVS developers - potential > pitfallwith"returnundef" > > Heikki Lehvaslaiho wrote: > > In my opinion the sooner the bugs get exposed the better. It is much > more > > likely that there is a well hidden bug caused by assigning accidentally > undef > > into an one element array that someone intentionally writing code that > > expects that behaviour! > > > > I removed (but did not commit yet) all undefs from my old Bio::Variation > code > > and could not see any differences in the test output. > > > > Let's remove them! > > Just looking for all return undef;s isn't enough. It's entirely possible > to do something like: > > my $return_value; > { > # do something that assigns to return_value on success > # on failure, just do nothing > } > return $return_value; Agreed, though looking for these is obviously much harder. The way to get around those is: return $return_value if $return_value; return; which I've seen used in a number of get/set methods. > The bioperl docs will typically explicitly state that undef is returned, > and under what circumstance. 
If a user suffers from the
> undef-into-array-problem, yes it can be slightly unexpected, but lots of
> unexpected things will happen when you don't use a method correctly, as
> per the docs!

Right, but the argument you make is that code will always work as expected
from the perldoc examples. My recent experiences with the
Bio::Restriction::IO and Bio::Species classes show that the docs are not
always up-to-date and may indicate the unimplemented intent of the author
more than the actual implementation. Again, I believe a large majority of
the docs are fine, but it's those few errors that made a devil's advocate
of me...

> Fixing the return of undef is either a job that shouldn't be done, or a
> much harder job than expected.

I don't think ignoring the problem is the best answer here, though I agree
the problem is more complicated than it looks at first glance. In the code
I've trawled through lately I've seen a lot of methods (mainly get/setters)
that are essentially copied multiple times in the same or across similar
modules to save time. You could see a scenario where, in those instances,
so-called 'bad code' would spread quite quickly.

I think adding a wiki page to address some of these issues would be nice,
something separate from the Project Priority List.

Chris

From forward at hongyu.org Wed May 31 18:03:46 2006
From: forward at hongyu.org (Hongyu Zhang)
Date: Wed, 31 May 2006 11:03:46 -0700
Subject: [Bioperl-l] New functions for SimpleAlign.pm
Message-ID: <20060531110346.78xod658td8o0w0w@hongyu.org>

Greetings,

I am a new member of this mailing list. Nice to be here.

I wrote two more functions for the alignment module SimpleAlign.pm
that calculate the percentage of identity based on the shortest and
longest sequence length, respectively.
I also found an error in the
no_residues() function that calculates the number of residues in the
alignment.

I am wondering whether they can be added to the official bioperl
package. I contacted the original author of this module, Heikki
Lehvaslaiho, a couple of weeks ago, but haven't heard from him yet.

Thanks.

--
Hongyu Zhang, Ph.D.
Computational biologist
Ceres Inc.

From cjfields at uiuc.edu Wed May 31 19:39:26 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 14:39:26 -0500
Subject: [Bioperl-l] New functions for SimpleAlign.pm
In-Reply-To: <20060531110346.78xod658td8o0w0w@hongyu.org>
Message-ID: <001901c684e9$ed4a1720$15327e82@pyrimidine>

I added a bit to the FAQ about this:

http://www.bioperl.org/wiki/FAQ#How_do_I_submit_a_patch_or_enhancement_to_BioPerl.3F

and the HOWTO explains things a bit more directly:

http://www.bioperl.org/wiki/HOWTO:SubmitPatch

In brief, these need to be submitted to Bugzilla as either code enhancements
(for your added methods) or bugs with the patch to the relevant code. Code
enhancements probably should include some code and test cases to demonstrate
usage. Patches to buggy code are checked by the core developers to make sure
they pass the relevant tests.

Submitting it to the mail list is definitely the first step, though, so
you're on the right path.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hongyu Zhang
> Sent: Wednesday, May 31, 2006 1:04 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] New functions for SimpleAlign.pm
>
> Greetings,
>
> I am a new member in this mailing list. Nice to be here.
>
> I wrote two more functions for the alignment module SimpleAlign.pm
> that calculate the percentage of identity based on the shortest and
> longest sequence length, respectively. I also found an error in the
> no_residues() function that calculate the number of residues in the
> alignment.
> > I am wondering whether they can be added to the official bioperl > package. I've contacted the original author of this module, Heikki > Lehvaslaiho, a couple of weeks ago, but haven't heard from him yet. > > Thanks. > > -- > Hongyu Zhang, Ph.D. > Computational biologist > Ceres Inc. > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Wed May 31 20:40:19 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 15:40:19 -0500 Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef" In-Reply-To: <200605311415.00414.lstein@cshl.edu> Message-ID: <002001c684f2$6fb7daf0$15327e82@pyrimidine> What about modules that have 'throw_not_implemented' statements present? Here's a list with the total for each. Some of these are interfaces (I got rid of a number that ended in 'I' or 'IO' to remove the I/IO interfaces but it misses a few). 
There are a number here that are implementations, though
(Bio::AlignIO::maf, Bio::Restriction::IO::*), so they are technically
incomplete:

Instances:  1  Module : Bio::AlignIO::maf
Instances: 25  Module : Bio::Assembly::Contig
Instances:  2  Module : Bio::Assembly::ContigAnalysis
Instances:  2  Module : Bio::Biblio::BiblioBase
Instances:  4  Module : Bio::DB::Expression
Instances:  2  Module : Bio::DB::Expression::geo
Instances:  5  Module : Bio::DB::Flat
Instances:  2  Module : Bio::DB::Query::WebQuery
Instances: 17  Module : Bio::DB::SeqFeature::Store
Instances:  2  Module : Bio::DB::SeqVersion
Instances:  3  Module : Bio::DB::Taxonomy
Instances:  1  Module : Bio::FeatureIO::bed
Instances:  1  Module : Bio::Map::Marker
Instances:  1  Module : Bio::MapIO::fpc
Instances:  1  Module : Bio::MapIO::mapmaker
Instances:  1  Module : Bio::Restriction::IO::bairoch
Instances:  1  Module : Bio::Restriction::IO::itype2
Instances:  1  Module : Bio::Restriction::IO::withrefm
Instances:  1  Module : Bio::Tools::Analysis::SimpleAnalysisBase
Instances:  3  Module : Bio::Tools::Run::WrapperBase

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Lincoln Stein
> Sent: Wednesday, May 31, 2006 1:15 PM
> To: Hilmar Lapp
> Cc: bioperl-l at lists.open-bio.org; Heikki Lehvaslaiho
> Subject: Re: [Bioperl-l] For CVS developers - potential
> pitfallwith"returnundef"
>
> If the documentation says "returns false" then I expect to be able to do
> this:
>
>   @result = foo();
>   die "foo() failed" unless @result;
>
> If the documentation says "returns undef" then I expect this:
>
>   @result = foo();
>   die "foo() failed" unless $result[0];
>
> Lincoln
>
>
> On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote:
> > On May 31, 2006, at 12:03 PM, Lincoln Stein wrote:
> > > If the subroutine is documented to return "false" on failure, then
> > > one must call
> > > return (or "return ()" ).
> > > > The problem seems to be that 'a value that evaluates to either true > > or false' and 'a [meaningful] value or undef' and 'a value or > > false' ('a value or no value) are not the same in perl. And what > > would/should one expect if the doc states 'true on success and false > > otherwise'? > > > > Maybe the documentation should also be fixed to avoid any ambiguity. > > I.e., avoid documenting 'a value or false' because it may be > > ambiguous (not only) to the less proficient. 'True or false' should > > imply a value being returned. > > > > Comments? > > > > -hilmar > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From lstein at cshl.edu Wed May 31 21:07:06 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Wed, 31 May 2006 17:07:06 -0400 Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef" In-Reply-To: <002001c684f2$6fb7daf0$15327e82@pyrimidine> References: <002001c684f2$6fb7daf0$15327e82@pyrimidine> Message-ID: <200605311707.08196.lstein@cshl.edu> > Instances: 17 Module : Bio::DB::SeqFeature::Store This is intentional. Bio::DB::SeqFeature::Store is intended to be a virtual base class. The throw_not_implemented() calls are there to force developers to override the needed interface methods. If this is not the right way to do it, let me know and I'll fix it. 
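
As a rough illustration of the virtual-base-class pattern described above, here is a minimal plain-Perl sketch. The package names My::Store and My::Store::Memory are hypothetical, and the die() call is only a stand-in for what throw_not_implemented() does in BioPerl; this is not the actual Bio::DB::SeqFeature::Store code.

```perl
use strict;
use warnings;

package My::Store;                      # hypothetical abstract base class

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

sub fetch {
    my ($self) = @_;
    # Stand-in for throw_not_implemented(): subclasses must override this.
    die ref($self) . " must override fetch()\n";
}

package My::Store::Memory;              # hypothetical concrete subclass
our @ISA = ('My::Store');

sub fetch { return 'feature-1'; }       # overrides the abstract method

package main;

my $ok  = My::Store::Memory->new->fetch;             # uses the override
my $err = eval { My::Store->new->fetch; 1 } ? '' : $@;  # base class throws

print "$ok\n";
print $err;
```

The point of throwing in the base class is that a subclass which forgets to override the method fails loudly at the first call instead of silently returning nothing.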
Lincoln > Instances: 2 Module : Bio::DB::SeqVersion > Instances: 3 Module : Bio::DB::Taxonomy > Instances: 1 Module : Bio::FeatureIO::bed > Instances: 1 Module : Bio::Map::Marker > Instances: 1 Module : Bio::MapIO::fpc > Instances: 1 Module : Bio::MapIO::mapmaker > Instances: 1 Module : Bio::Restriction::IO::bairoch > Instances: 1 Module : Bio::Restriction::IO::itype2 > Instances: 1 Module : Bio::Restriction::IO::withrefm > Instances: 1 Module : Bio::Tools::Analysis::SimpleAnalysisBase > Instances: 3 Module : Bio::Tools::Run::WrapperBase > > Chris > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Lincoln Stein > > Sent: Wednesday, May 31, 2006 1:15 PM > > To: Hilmar Lapp > > Cc: bioperl-l at lists.open-bio.org; Heikki Lehvaslaiho > > Subject: Re: [Bioperl-l] For CVS developers - potential > > pitfallwith"returnundef" > > > > If the documentation says "returns false" then I expect to be able to do > > this: > > > > @result = foo(); > > die "foo() failed" unless @result; > > > > If the documentation says "returns undef" then I expect this: > > > > @result = foo(); > > die "foo() failed" unless $result[0]; > > > > Lincoln > > > > On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote: > > > On May 31, 2006, at 12:03 PM, Lincoln Stein wrote: > > > > If the subroutine is documented to return "false" on failure, then > > > > one must call > > > > return (or "return ()" ). > > > > > > The problem seems to be that 'a value that evaluates to either true > > > or false' and 'a [meaningful] value or undef' and 'a value or > > > false' ('a value or no value) are not the same in perl. And what > > > would/should one expect if the doc states 'true on success and false > > > otherwise'? > > > > > > Maybe the documentation should also be fixed to avoid any ambiguity. > > > I.e., avoid documenting 'a value or false' because it may be > > > ambiguous (not only) to the less proficient. 
'True or false' should > > > imply a value being returned. > > > > > > Comments? > > > > > > -hilmar > > > > -- > > Lincoln D. Stein > > Cold Spring Harbor Laboratory > > 1 Bungtown Road > > Cold Spring Harbor, NY 11724 > > (516) 367-8380 (voice) > > (516) 367-8389 (fax) > > FOR URGENT MESSAGES & SCHEDULING, > > PLEASE CONTACT MY ASSISTANT, > > SANDRA MICHELSEN, AT michelse at cshl.edu > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From hlapp at gmx.net Wed May 31 21:21:57 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 31 May 2006 17:21:57 -0400 Subject: [Bioperl-l] For CVS developers - throw_not_implemented In-Reply-To: <002001c684f2$6fb7daf0$15327e82@pyrimidine> References: <002001c684f2$6fb7daf0$15327e82@pyrimidine> Message-ID: On May 31, 2006, at 4:40 PM, Chris Fields wrote: > What about modules that have 'throw_not_implemented' statements > present? Those are often if not always legitimate - the problem are those that don't have them but fail to override an inherited interface or abstract method. If something is not implemented what is the better way to express this other than throwing an exception? 
(and if it's not an interface or abstract base class, saying so in the documentation) -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Wed May 31 21:25:48 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 31 May 2006 17:25:48 -0400 Subject: [Bioperl-l] For CVS developers - potential pitfallwith"returnundef" In-Reply-To: <001801c684e3$16e33730$15327e82@pyrimidine> References: <001801c684e3$16e33730$15327e82@pyrimidine> Message-ID: <8AA04BF0-FA79-43CF-9FBB-310314FECD91@gmx.net> On May 31, 2006, at 2:50 PM, Chris Fields wrote: > I've seen a lot of methods (mainly get/setters) > that are essentially copied multiple times in the same or across > similar > modules to save time. You could see a scenario where, in those > instances, > so-called 'bad code' would spread quite quickly. This will usually be code generated by macros, e.g. the emacs macros for getter/setter generation for properties. If the macro generates wrong code, that's indeed pretty bad. (We've had that.) OTOH it should be spotted quickly as well. And macro changes or new macros should probably be scrutinized by all eyes watching ... -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Wed May 31 21:40:22 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 31 May 2006 16:40:22 -0500 Subject: [Bioperl-l] For CVS developers - throw_not_implemented In-Reply-To: Message-ID: <002401c684fa$d28e7640$15327e82@pyrimidine> I think, as long as it's reflected in the docs that something doesn't work (hasn't been implemented) then there's no problem. It's when the docs are misleading that we run into problems. 

The sticking point lies with some classes, such as IO classes (like SeqIO,
or Restriction::IO, with read and write methods) where the IO base class
specifies that it is possible to read and write a particular format but the
actual implementation varies according to whether or not the derived class
overrides the base or interface method (in other words, 'doesn't work as
advertised' only in specific circumstances). I don't know how to solve this
issue except to add in the docs that specific formats don't implement
write() methods.

Personally, I haven't had an issue with it and it probably makes no
difference, but I think it needs to be pointed out. The most extreme case
I ran into was Bio::Restriction::IO, which had 3 out of 4 plugin modules
that didn't implement the write() method but left this in the synopsis in
POD:

    use Bio::Restriction::IO;

    $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
                                     -format => 'withrefm');
    $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
                                     -format => 'bairoch');
    my $res = $in->read; # a Bio::Restriction::EnzymeCollection
    $out->write($res);

    # or

    # use Bio::Restriction::IO;
    #
    # #input file format can be read from the file extension (dat|xml)
    # $in  = Bio::Restriction::IO->newFh(-file => "inputfilename");
    # $out = Bio::Restriction::IO->newFh('-format' => 'xml');
    #
    # # World's shortest flat<->xml format converter:
    # print $out $_ while <$in>;

None of this code works; in fact, no XML parser even exists for these IO
classes! Bio::AlignIO has a few of these as well (the maf and Stockholm
formats don't write).

Chris

> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> Sent: Wednesday, May 31, 2006 4:22 PM
> To: Chris Fields
> Cc: lstein at cshl.edu; bioperl-l at lists.open-bio.org; 'Heikki Lehvaslaiho'
> Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented
>
>
> On May 31, 2006, at 4:40 PM, Chris Fields wrote:
>
> > What about modules that have 'throw_not_implemented' statements
> > present?
> > Those are often if not always legitimate - the problem are those that > don't have them but fail to override an inherited interface or > abstract method. > > If something is not implemented what is the better way to express > this other than throwing an exception? (and if it's not an interface > or abstract base class, saying so in the documentation) > > -hilmar > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From hlapp at gmx.net Wed May 31 21:55:37 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 31 May 2006 17:55:37 -0400 Subject: [Bioperl-l] For CVS developers - throw_not_implemented In-Reply-To: <002401c684fa$d28e7640$15327e82@pyrimidine> References: <002401c684fa$d28e7640$15327e82@pyrimidine> Message-ID: This is documentation cruft resulting from copy&paste w/o later fixing it. (which isn't a justification) Note that not implementing the write is as legitimate as not implementing the read method ... It should be pointed out in the documentation though that it will depend on the actual implementation of the format whether it supports reading or writing or both. -hilmar On May 31, 2006, at 5:40 PM, Chris Fields wrote: > I think, as long as it's reflected in the docs that something > doesn't work > (hasn't been implemented) then there's no problem. It's when the > docs are > misleading that we run into problems. > > The sticking point lies with some classes, such as IO classes (like > SeqIO, > or Restrict::IO, with read and write methods) where the IO base class > specifies that it is possible to read and write a particular format > but the > actual implementation varies according to whether or not the > derived class > overrides the base or interface method (in other words, 'doesn't > work as > advertised' only in specific circumstances). 
I don't know how to > solve this > issue except to add in the docs that specific formats don't implement > write() methods. > > Personally, I haven't had an issue with it and it probably makes no > difference, but I think it needs to be pointed out. The most > extreme I ran > into was Bio::Restriction::IO, which had 3 out of 4 plugin modules > that > didn't implement the write() method but left this in the synopsis > in POD: > > use Bio::Restriction::IO; > > $in = Bio::Restriction::IO->new(-file => "inputfilename" , > -format => 'withrefm'); > $out = Bio::Restriction::IO->new(-file => ">outputfilename" , > -format => 'bairoch'); > my $res = $in->read; # a Bio::Restriction::EnzymeCollection > $out->write($res); > > # or > > # use Bio::Restriction::IO; > # > # #input file format can be read from the file extension (dat| > xml) > # $in = Bio::Restriction::IO->newFh(-file => "inputfilename"); > # $out = Bio::Restriction::IO->newFh('-format' => 'xml'); > # > # # World's shortest flat<->xml format converter: > # print $out $_ while <$in>; > > None of this code works; in fact, no XML parser even exists for > these IO > classes! Bio::AlignIO also has a few as well (maf and Stockholm > formats > don't write). > > Chris > > >> -----Original Message----- >> From: Hilmar Lapp [mailto:hlapp at gmx.net] >> Sent: Wednesday, May 31, 2006 4:22 PM >> To: Chris Fields >> Cc: lstein at cshl.edu; bioperl-l at lists.open-bio.org; 'Heikki >> Lehvaslaiho' >> Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented >> >> >> On May 31, 2006, at 4:40 PM, Chris Fields wrote: >> >>> What about modules that have 'throw_not_implemented' statements >>> present? >> >> Those are often if not always legitimate - the problem are those that >> don't have them but fail to override an inherited interface or >> abstract method. >> >> If something is not implemented what is the better way to express >> this other than throwing an exception? 
(and if it's not an interface >> or abstract base class, saying so in the documentation) >> >> -hilmar >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From slenk at emich.edu Wed May 31 21:52:13 2006 From: slenk at emich.edu (Stephen Gordon Lenk) Date: Wed, 31 May 2006 17:52:13 -0400 Subject: [Bioperl-l] For CVS developers - throw_not_implemented Message-ID: <100682f110067a83.10067a83100682f1@emich.edu> Isn't it fairly standard in OO schemes/languages to have an exception thrown if a method can't be found at the end of a search up the class hierarchy? I recall being very mad at Smalltalk because "method not found" kept biting me. C++ has pure virtual base classes that do not allow objects to be instantiated directly; they are meant to be inherited and then implemented. Perl 6 was mentioned a bit back. Is this issue addressed there? Should it be? Do the Bioperl people feed their needs into Perl 6 so that all the code effort to make Bio::Root is handled for them in the next effort by Perl 6 itself. Make the Perl 6 people solve these issues with your input, then you will not have to deal with implementing it yourselves. I'll just bet that you are not the only potential users of Perl 6 who will have to solve these issues eventually. ----- Original Message ----- From: Hilmar Lapp Date: Wednesday, May 31, 2006 5:21 pm Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented > > On May 31, 2006, at 4:40 PM, Chris Fields wrote: > > > What about modules that have 'throw_not_implemented' statements > > present? 
>
> Those are often if not always legitimate - the problem are those that
> don't have them but fail to override an inherited interface or
> abstract method.
>
> If something is not implemented what is the better way to express
> this other than throwing an exception? (and if it's not an interface
> or abstract base class, saying so in the documentation)
>
> -hilmar
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>

From arareko at campus.iztacala.unam.mx Wed May 31 22:49:03 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 31 May 2006 17:49:03 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <001201c684d0$263c5530$15327e82@pyrimidine>
References: <001201c684d0$263c5530$15327e82@pyrimidine>
Message-ID: <447E1D5F.1050807@campus.iztacala.unam.mx>

Brian, Jay, Chris,

I agree with what Bernd Web said in another reply. For some people it will
be nice to still be able to run the script from the codebase and interact
with it. I don't think it should be much of a problem to maintain both
tutorials, as long as the 'main' one is the one in the CVS tree. From
reading what Jay did to convert it into mediawiki format, I suppose this
can easily be done again for each new change to the script (again, this is
just my guess). Besides, as far as I've seen, commits to the script are
not frequent at all.

I've added a link in the left menu of the wiki. If you think it should
point to the Tutorials page instead of the Bptutorial.pl page please let me
know.

Regards,
Mauricio.
Chris Fields wrote: > Brian, Jay, > > I think it would be nice to have the tutorial prominently displayed somehow > (Jay's suggestion), with a link provided via the tutorials page. Hopefully > this will help with the bioperl newbies. > > Jay, looks like there are still some weird formatting issues with the > bptutorial wiki page, something which I ran into before when getting the > Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more > spaces preceding a line denotes code for some reason). Not much you can do > in these cases except remove the extra spaces in those spots. Looking good > though! > > Chris > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Brian Osborne >> Sent: Wednesday, May 31, 2006 8:58 AM >> To: Jay Hannah; bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl >> >> Jay, >> >> Excellent! Now we need to answer a few more questions for ourselves: >> >> - Do we remove the file bptutorial.pl from the package now? I'd say yes, >> we >> don't want to have to maintain two bptutorials. >> >> - What do we do with the script part of bptutorial.pl? It certainly could >> be >> excised and put into the examples/ directory, for example, but this would >> break a few of the paths that are being used. >> >> - A link to bptutorial? Or a link to the existing tutorials page? >> http://www.bioperl.org/wiki/Tutorials. >> >> Any thoughts on these? >> >> >> Brian O. >> >> >> On 5/31/06 9:07 AM, "Jay Hannah" wrote: >> >>> http://www.bioperl.org/wiki/Bptutorial.pl >>> >>> I think I just partially fulfilled this TODO: >>> >>> TODO: check if the POD is in the Wiki yet, and if not, put it here? >>> >>> I used Pod::Simple::Wiki (format 'mediawiki') to burn >>> bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it >> the >>> wiki page via my web browser. (Is that proper procedure? 
>>> Is the plan to just do that manually from time to time as the
>>> document changes?)
>>>
>>> Now what?
>>>
>>> Should there be a new link on the far left of bioperl.org called
>>> "Tutorial"? It's an amazing document. IMHO it should be listed
>>> prominently on bioperl.org.
>>>
>>> HTH,
>>>
>>> j

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From gad14 at cornell.edu Tue May 30 16:57:41 2006
From: gad14 at cornell.edu (Genevieve DeClerck)
Date: Tue, 30 May 2006 12:57:41 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447BFB20.40501@mrc-dunn.cam.ac.uk>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
Message-ID: <447C7985.9000404@cornell.edu>

Thanks for your comment Sendu, it was very helpful.

I think this must be what's going on. I am using $blast_report->next_result in both subroutines. It appears that analyzing the blast results first with my sort subroutine empties (?) the $blast_report object, so that when I try to print, there is nothing left to print (and vice versa when I print first then try to sort).

So, from the looks of things, using next_result has the effect of popping the Bio::Search::Result::ResultI objects off of the SearchIO blast report object??
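The symptom described here is what one would expect from any forward-only iterator: the first loop consumes the stream and the second finds it empty. A minimal sketch of the usual workaround, reading everything once into an array and handing that array to both passes, is below. ForwardOnlyReport is a made-up stand-in that only mimics the next_result() calling convention; it is not a BioPerl class, and collect_results is likewise a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a forward-only report object. It only mimics
# the next_result() calling convention and is NOT part of BioPerl.
package ForwardOnlyReport;

sub new {
    my ($class, @results) = @_;
    return bless { queue => [@results] }, $class;
}

sub next_result {
    my ($self) = @_;
    return shift @{ $self->{queue} };    # undef once the stream is used up
}

package main;

# Drain the stream once and return everything it yielded.
sub collect_results {
    my ($report) = @_;
    my @results;
    while (defined(my $result = $report->next_result)) {
        push @results, $result;
    }
    return @results;
}

my $report  = ForwardOnlyReport->new('query_B', 'query_A');
my @results = collect_results($report);

# Both passes walk the stored list, so neither starves the other.
my @sorted = sort @results;
print "sorted: @sorted\n";
print "report: $_\n" for @results;

# Draining the same stream a second time yields nothing, which is the
# behaviour reported in the message.
my @again = collect_results($report);
print "left in stream: ", scalar(@again), "\n";
```

With a real Bio::SearchIO report the same collect-once loop should carry over, since next_result is the only call it relies on; sort_results and print_blast_results would then take the list of result objects rather than the report itself.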
It seems I could get around this by making a copy of the blast report by setting it to another new variable (not the most elegant solution), but I'm having trouble with this. If I do:

my $blast_report_copy = $blast_report;

I'm just copying the reference to the SearchIO blast result, so it doesn't help me. How can I make another physical copy of this blast result object? Seems like a simple thing but how to do it is escaping me.

But better yet, the way to go is to 'reset the counter,' or to find a way to look at/print/sort the results without removing data from the blast result object. How is this done though??

Sendu and Brian, I didn't post the sort_results subroutine because it is sprawling, as is a lot of my code. The code I provided was more like an aid for my explanation of the problem. It doesn't actually run - sorry for the confusion, I should have been more clear on that. The important thing to know perhaps is that both sort_results and print_blast_results contain a foreach loop where I am using the 'next_result' method to view blast results.

(And to clarify for Torsten, the blastall() is working just fine - the analysis/viewing of the results object is where I am encountering the problem.)

Any other ideas would be greatly appreciated...

Thank you,
Genevieve

Sendu Bala wrote:
> Genevieve DeClerck wrote:
>
>> Hi,
>
> [snip]
>
>> If I've sorted the results the sorted-results will print to screen,
>> however when I try to print the Hit Table results nothing is returned,
>> as if the blast results have evaporated.... and visa versa, if i
>> comment out the part where i point my sorting subroutine to the blast
>> results reference, my hit table results suddenly prints to screen.
>
> [snip]
>
>> Here's an abbreviated version of my code:
>
> [snip]
>
>> #######
>> ### the following 2 actions seem to be mutually exclusive.
>> # 1) sort results into 1-hitter, 2-hitter, etc. groups of
>> # SeqFeature objs stored in arrays.
>> # arrays are then printed
>> # to stdout
>> &sort_results($blast_report);
>>
>> # 2) print blast results
>> &print_blast_results($blast_report);
>
>
>> sub print_blast_results{
>> my $report = shift;
>> while(my $result = $report->next_result()){
>
> [snip]
>
> You didn't give us your sort_results subroutine, but is it as simple as
> they both use $report->next_result (and/or $result->next_hit), but you
> don't reset the internal counter back to the start, so the second
> subroutine tries to get the next_result and finds the first subroutine
> has already looked at the last result and so next_result returns false?
>
> From a quick look it wasn't obvious how to reset the counter. Hopefully
> this can be done and someone else knows how.
>

From lstein at cshl.edu Wed May 31 15:17:39 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 11:17:39 -0400
Subject: [Bioperl-l] SOLVED Bio::Graphics::Panel make ruler have neg values
In-Reply-To: <5b6410e0605302045x5c420674x6f898a8a2973991a@mail.gmail.com>
References: <5b6410e0605302045x5c420674x6f898a8a2973991a@mail.gmail.com>
Message-ID: <200605311117.41479.lstein@cshl.edu>

Hi Kevin,

Since you are modifying the Panel.pm source code, why don't you just go ahead and use the current Bio::Graphics development tree? Since 1.5.1 it supports negative coordinates. Here's an illustration:

#!/usr/bin/perl
use strict;
use Bio::Graphics;
use Bio::Graphics::Feature;

my $whole   = Bio::Graphics::Feature->new(-start=>-200,-end=>+200);
my $feature = Bio::Graphics::Feature->new(-start=>-100,-end=>+100,-strand=>+1);
my $panel   = Bio::Graphics::Panel->new(-start     => -200,
                                        -end       => +200,
                                        -width     => 800,
                                        -pad_left  => 10,
                                        -pad_right => 10);
$panel->add_track($whole,   -glyph=>'arrow', -double=>1, -tick=>2);
$panel->add_track($feature, -glyph=>'box',   -stranded=>1);
print $panel->png;
exit 0;

The resulting image is attached.

Lincoln

On Tuesday 30 May 2006 23:45, Kevin Lam Koiyau wrote:
> I am so sorry for the truncated email accidentally hit reply.
> if anyone is interested i have opted to change > > change line 161 of arrow.pm in Perl/site/lib/Bio/Graphics/Glyph/arrow.pm > in linux its > /usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Glyph/arrow.pm > > > $gd->string($font,$middle,$center+$a2-1,$label,$font_color) > > to > > $gd->string($font,$middle,$center+$a2-1,$label-1000,$font_color) > > just for this one-off use. > > > > strangely I found at line 112 for ver 1.51 bioperl in arrow.pm a hidden > option for coords offset? > my $relative_coords_offset = $self->option('relative_coords_offset'); > $relative_coords_offset = 1 unless defined $relative_coords_offset; > but entering the option -relative_coords_offset=>1000 in the arrow glyphs > didn't do anything... > > > > Hi! > > > oh it was in a slightly different header asking about the create image > > map feature. > > I am using the stable version 1.4 of bioperl now. In any case I have not > > added the sequence as a feature annotated seq. as I already have the bp > > where the TF binds (in 1-1050 numberings) so what I did was to just add > > graded segments based on the position. > > I saw that there is a scale function for the arrow glyp however, it is a > > multiply function, can it be hacked to take in a offset value (ie minus > > the > > scale by 1000?) > > > > cheers > > kevin > > > > > > Hi, > > > > > For some reason I didn't see the first posting on this. In current > > > > bioperl > > > > > live, the ruler can have negative numberings - I use this routinely. > > > You need > > > to create a feature that starts in negative coordinates. What is > > > > happening > > > > > to > > > you when you try this? > > > > > > Lincoln > > > > > > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote: > > > > Hi > > > > thanks for the help offered thus far! > > > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promtoer seq > > > > > > using > > > > > > > bioperl. 
therefore i was asked to make the numberings as such (-1000) > > > > is > > > > > > there any way at all to do this in bioperl without changing the .pm > > > > > > file? > > > > > > > thanks guys.. > > > > kevin > > > > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l at lists.open-bio.org > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > > > Lincoln D. Stein > > > Cold Spring Harbor Laboratory > > > 1 Bungtown Road > > > Cold Spring Harbor, NY 11724 > > > (516) 367-8380 (voice) > > > (516) 367-8389 (fax) > > > FOR URGENT MESSAGES & SCHEDULING, > > > PLEASE CONTACT MY ASSISTANT, > > > SANDRA MICHELSEN, AT michelse at cshl.edu > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu -------------- next part -------------- A non-text attachment was scrubbed... Name: negatives.png Type: image/png Size: 1065 bytes Desc: not available URL: From lstein at cshl.edu Wed May 31 16:05:47 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Wed, 31 May 2006 12:05:47 -0400 Subject: [Bioperl-l] Fwd: Re: SOLVED Bio::Graphics::Panel make ruler have neg values Message-ID: <200605311205.48122.lstein@cshl.edu> Oddly, bioperl-l listserver is holding this mail because it has "a suspicious header". 
I took out Kevin's email address in case it is the "spammotel" header that is bothering it.

Lincoln

---------- Forwarded Message ----------

Subject: Re: [Bioperl-l] SOLVED Bio::Graphics::Panel make ruler have neg values
Date: Wednesday 31 May 2006 11:17
From: Lincoln Stein
To: bioperl-l at lists.open-bio.org
Cc: "Kevin Lam Koiyau"

Hi Kevin,

Since you are modifying the Panel.pm source code, why don't you just go ahead and use the current Bio::Graphics development tree? Since 1.5.1 it supports negative coordinates. Here's an illustration:

#!/usr/bin/perl
use strict;
use Bio::Graphics;
use Bio::Graphics::Feature;

my $whole   = Bio::Graphics::Feature->new(-start=>-200,-end=>+200);
my $feature = Bio::Graphics::Feature->new(-start=>-100,-end=>+100,-strand=>+1);
my $panel   = Bio::Graphics::Panel->new(-start     => -200,
                                        -end       => +200,
                                        -width     => 800,
                                        -pad_left  => 10,
                                        -pad_right => 10);
$panel->add_track($whole,   -glyph=>'arrow', -double=>1, -tick=>2);
$panel->add_track($feature, -glyph=>'box',   -stranded=>1);
print $panel->png;
exit 0;

The resulting image is attached.

Lincoln

-------------------------------------------------------

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From rvosa at sfu.ca Tue May 30 19:10:17 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Tue, 30 May 2006 12:10:17 -0700
Subject: [Bioperl-l] New mailing list for Bio::Phylo
Message-ID: <447C9899.5060102@sfu.ca>

Dear recipients,

the open bioinformatics foundation has been kind enough to host a mailing list for Bio::Phylo (http://search.cpan.org/~rvosa/Bio-Phylo/, the cpan distribution for phylogenetic analysis using perl).
The scope of this list is at present fairly broad as it is both meant for user questions and development discussion on deeper integration with bioperl. You are invited to sign up at: http://lists.open-bio.org/mailman/listinfo/bio-phylo-l Best wishes, Rutger Vos -- ++++++++++++++++++++++++++++++++++++++++++++++++++++ Rutger Vos, PhD. candidate Department of Biological Sciences Simon Fraser University 8888 University Drive Burnaby, BC, V5A1S6 Phone: 604-291-5625 Fax: 604-291-3496 Personal site: http://www.sfu.ca/~rvosa FAB* lab: http://www.sfu.ca/~fabstar Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/ ++++++++++++++++++++++++++++++++++++++++++++++++++++