From shameer at ncbs.res.in Wed Aug 1 01:45:45 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Wed, 1 Aug 2007 11:15:45 +0530 (IST)
Subject: [Bioperl-l] Perl 3D OpenGL
In-Reply-To: <04BCAD9E-CC25-4F0A-85B1-FBA91C64CE7D@uiuc.edu>
References: <152401c7d224$8e2455b0$6e4e7c0a@HPONE> <25A5F0A3-1CC3-46B5-8976-A24C451204E7@jays.net> <04BCAD9E-CC25-4F0A-85B1-FBA91C64CE7D@uiuc.edu>
Message-ID: <49637.192.168.1.1.1185947145.squirrel@mail.ncbs.res.in>

Hi,

OpenGL/3D contributions are always welcome!

What about a Perl-OpenGL/3D implementation of a web-based 3D viewer like Jmol?
http://jmol.sourceforge.net/
(Then we don't need to worry about Java installation and the like :) Develop it and deploy it in Perl - eternal happiness!)

--
SK

>
> On Jul 31, 2007, at 7:00 AM, Jay Hannah wrote:
>
>> On Jul 29, 2007, at 4:08 PM, Grafman Productions wrote:
>>> If this posting is inappropriate, please let me know - my apologies.
>>
>> Not at all. AFAIK this is the perfect place to discuss any
>> contributions you're motivated to make to the BioPerl project.
>>
>>> I recently came across an article on BioPerl, and it occurred to me
>>> that there might be some need for 3D rendering within your BioPerl
>>> project.
>>>
>>> I released a number of new/updated Perl OpenGL (POGL) modules this
>>> year, along with benchmarks that demonstrate that it performs
>>> comparably to C.
>>>
>>> If there's a need for 3D features within BioPerl, and if I can be
>>> of any assistance in helping to add such features, I would enjoy the
>>> opportunity.
>>
>> I know nothing about 3D modeling in biology, nor do I hang out with
>> any protein structure folks, but 3D always sounds sexy. -grin-
>>
>> If you're new to bioinformatics (I certainly am) you might want to
>> read this:
>>
>> http://en.wikipedia.org/wiki/Protein_structure
>>
>> Because that's probably where your 3D work would be used. Especially
>> note the "Software" section, where you'll find some of the
>> "competition".
:) >> >> There's some cool stuff out there. I don't know what all would or >> wouldn't be time well spent in Perl / BioPerl. >> >> HTH, >> >> Jay Hannah >> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah > > I agree that protein structure is the best place for something like > this. > > It's a wide open area as far as I'm concerned; in fact I would say > that Bio::Structure is getting pretty dated, so if anyone wants to > take it over, refactor the code, and so on I don't have a problem. > > chris > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Shameer Khadar Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group National Centre for Biological Sciences (TIFR) GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India T - 91-080-23666001 EXT - 6251 W - http://www.ncbs.res.in From Alicia.Amadoz at uv.es Wed Aug 1 03:13:11 2007 From: Alicia.Amadoz at uv.es (Alicia Amadoz) Date: Wed, 1 Aug 2007 09:13:11 +0200 (CEST) Subject: [Bioperl-l] trying to save blast hit sequences to fasta file Message-ID: <1664224328amadoz@uv.es> Hi, I would like to save my hit sequences from a blast result in a fasta file. I am trying some things but I have problems using Bio::SearchIO and Bio::SeqIO. Hope anyone could help me with this. Here is my current code: # my $seq_out = Bio::SeqIO->new("-file" => ">$fasfilename", "-format" => "fasta"); my $seq_out = Bio::SearchIO->new("-file" => ">$fasfilename", "-format" => "fasta"); while(my $result = $blast_report->next_result()) { while(my $hit = $result->next_hit()) { while(my $hsp = $hit->next_hsp()) { my $hseq = $hsp->hit_string(); # $seq_out->write_seq($hseq); $seq_out->write_result($hseq); } } } Here the error is, ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: ResultWriter not defined. I couldn't find any kind of documentation about ResultWriter. 
Thanks in advance,
Alicia

From xianranli78 at yahoo.com.cn Wed Aug 1 04:11:53 2007
From: xianranli78 at yahoo.com.cn (Xianran Li)
Date: Wed, 1 Aug 2007 16:11:53 +0800
Subject: [Bioperl-l] trying to save blast hit sequences to fasta file
References: <1664224328amadoz@uv.es>
Message-ID: <001101c7d413$a0d79aa0$ed07a8c0@BGI.LOCAL>

$hsp->hit_string() returns the hit sequence as a plain string, rather than a Bio::Seq object. So maybe you should construct an object first; then you can use $seq_out->write_seq($hseq_obj) to write the sequence into a fasta file.

# my $seq_out = Bio::SeqIO->new("-file" => ">$fasfilename", "-format" => "fasta");
my $seq_out = Bio::SearchIO->new("-file" => ">$fasfilename", "-format" => "fasta");
while(my $result = $blast_report->next_result()) {
    while(my $hit = $result->next_hit()) {
        while(my $hsp = $hit->next_hsp()) {
            my $hseq = $hsp->hit_string();
            $hseq =~ s/-//g; #### remove the gaps within the alignment
            my $hseq_obj = Bio::Seq->new(-display_id => $id, -seq => $hseq);
            # $seq_out->write_seq($hseq);
            $seq_out->write_result($hseq_obj);
        }
    }
}

Xianran

----- Original Message -----
From: "Alicia Amadoz"
To:
Sent: Wednesday, August 01, 2007 3:13 PM
Subject: [Bioperl-l] trying to save blast hit sequences to fasta file

> Hi, I would like to save my hit sequences from a blast result in a fasta
> file. I am trying some things but I have problems using Bio::SearchIO
> and Bio::SeqIO. Hope anyone could help me with this.
> Here is my current code:
>
> # my $seq_out = Bio::SeqIO->new("-file" => ">$fasfilename", "-format" => "fasta");
> my $seq_out = Bio::SearchIO->new("-file" => ">$fasfilename", "-format" => "fasta");
> while(my $result = $blast_report->next_result()) {
>     while(my $hit = $result->next_hit()) {
>         while(my $hsp = $hit->next_hsp()) {
>             my $hseq = $hsp->hit_string();
>             # $seq_out->write_seq($hseq);
>             $seq_out->write_result($hseq);
>         }
>     }
> }
>
> Here the error is,
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: ResultWriter not defined.
>
> I couldn't find any kind of documentation about ResultWriter.
> Thanks in advance,
> Alicia
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From Alicia.Amadoz at uv.es Wed Aug 1 06:25:29 2007
From: Alicia.Amadoz at uv.es (Alicia Amadoz)
Date: Wed, 1 Aug 2007 12:25:29 +0200 (CEST)
Subject: [Bioperl-l] trying to save blast hit sequences to fasta file
Message-ID: <5927683277amadoz@uv.es>

Hi, I have tried what you suggested and I get also some errors.
With this code,

my $seq_out = Bio::SearchIO->new("-file" => ">$fasfilename", "-format" => "fasta");
while(my $result = $blast_report->next_result()) {
    while(my $hit = $result->next_hit()) {
        while(my $hsp = $hit->next_hsp()) {
            my $hseq = $hsp->hit_string();
            $hseq =~ s/-//g; #### remove the gaps within the alignment
            my $hseq_obj = Bio::Seq->new(-display_id => $id, -seq => $hseq);
            $seq_out->write_seq($hseq_obj);
        }
    }
}

I have the following error:

Can't locate object method "write_seq" via package "Bio::SearchIO::fasta"

And using the write_result method with this code,

my $seq_out = Bio::SearchIO->new("-file" => ">$fasfilename", "-format" => "fasta");
while(my $result = $blast_report->next_result()) {
    while(my $hit = $result->next_hit()) {
        while(my $hsp = $hit->next_hsp()) {
            my $hseq = $hsp->hit_string();
            $hseq =~ s/-//g; #### remove the gaps within the alignment
            my $hseq_obj = Bio::Seq->new(-display_id => $id, -seq => $hseq);
            $seq_out->write_result($hseq_obj);
        }
    }
}

I have again this kind of error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: ResultWriter not defined.
STACK: Error::throw

So, what else can I try?
Thanks in advance, Alicia From neetisomaiya at gmail.com Wed Aug 1 07:28:40 2007 From: neetisomaiya at gmail.com (neeti somaiya) Date: Wed, 1 Aug 2007 16:58:40 +0530 Subject: [Bioperl-l] URGENT : Problem in OMIM parser Message-ID: <764978cf0708010428q5b51d27dw13f73dc5ca8d6dc9@mail.gmail.com> I have downloaded the omim.txt file from NCBI ftp site and I am running my attached parser on this file, the parser run stops in between with this :- ------------- EXCEPTION ------------- MSG: a part/organism must be assigned STACK Bio::Phenotype::OMIM::OMIMentry::add_clinical_symptoms /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMentry.pm:566 STACK Bio::Phenotype::OMIM::OMIMparser::_finer_parse_symptoms /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMparser.pm:555 STACK Bio::Phenotype::OMIM::OMIMparser::_createOMIMentry /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMparser.pm:536 STACK Bio::Phenotype::OMIM::OMIMparser::next_phenotype /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMparser.pm:272 STACK toplevel parse_omim_original.pl:47 -------------------------------------- What is the reason for this? Can anyone guide me please. 
--
-Neeti
Even my blood says, B positive

From jay at jays.net Wed Aug 1 09:30:50 2007
From: jay at jays.net (Jay Hannah)
Date: Wed, 1 Aug 2007 09:30:50 -0400 (EDT)
Subject: [Bioperl-l] trying to save blast hit sequences to fasta file
In-Reply-To: <5927683277amadoz@uv.es>
References: <5927683277amadoz@uv.es>
Message-ID:

On Wed, 1 Aug 2007, Alicia Amadoz wrote:
> Hi, I have tried what you suggested and I get also some errors.
> With this code,
>
> my $seq_out = Bio::SearchIO->new("-file" => ">$fasfilename", "-format" => "fasta");
> while(my $result = $blast_report->next_result()) {
>     while(my $hit = $result->next_hit()) {
>         while(my $hsp = $hit->next_hsp()) {
>             my $hseq = $hsp->hit_string();
>             $hseq =~ s/-//g; #### remove the gaps within the alignment
>             my $hseq_obj = Bio::Seq->new(-display_id => $id, -seq => $hseq);
>             $seq_out->write_seq($hseq_obj);
>         }
>     }
> }
>
> I have the following error:
>
> Can't locate object method "write_seq" via package "Bio::SearchIO::fasta"

You don't want to write_seq() to a SearchIO, you want to write_seq() to a SeqIO. Try this:

my $seq_out = Bio::SeqIO->new(-file => ">$fasfilename", -format => "fasta");
while(my $result = $blast_report->next_result()) {
    while(my $hit = $result->next_hit()) {
        while(my $hsp = $hit->next_hsp()) {
            my $hseq = $hsp->hit_string();
            $hseq =~ s/-//g; #### remove the gaps within the alignment
            my $hseq_obj = Bio::Seq->new(-display_id => $id, -seq => $hseq);
            $seq_out->write_seq($hseq_obj);
        }
    }
}

(Untested.)

HTH,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

From cjfields at uiuc.edu Wed Aug 1 11:02:07 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 1 Aug 2007 10:02:07 -0500
Subject: [Bioperl-l] URGENT : Problem in OMIM parser
In-Reply-To: <764978cf0708010428q5b51d27dw13f73dc5ca8d6dc9@mail.gmail.com>
References: <764978cf0708010428q5b51d27dw13f73dc5ca8d6dc9@mail.gmail.com>
Message-ID: <0458A649-AD66-4495-8E25-78D556D51AD9@uiuc.edu>

Neeti,

Only post to one list email address, namely the one I'm responding to and the one shown here:

http://bioperl.org/mailman/listinfo/bioperl-l

The others are aliases so you essentially posted three times. As for your question: there was no attached script or any additional information (bioperl version would have also been nice), so we can't help you until we have something more to work with.
chris On Aug 1, 2007, at 6:28 AM, neeti somaiya wrote: > I have downloaded the omim.txt file from NCBI ftp site and I am > running my > attached parser on this file, the parser run stops in between with > this :- > > ------------- EXCEPTION ------------- > MSG: a part/organism must be assigned > STACK Bio::Phenotype::OMIM::OMIMentry::add_clinical_symptoms > /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMentry.pm:566 > STACK Bio::Phenotype::OMIM::OMIMparser::_finer_parse_symptoms > /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMparser.pm:555 > STACK Bio::Phenotype::OMIM::OMIMparser::_createOMIMentry > /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMparser.pm:536 > STACK Bio::Phenotype::OMIM::OMIMparser::next_phenotype > /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMparser.pm:272 > STACK toplevel parse_omim_original.pl:47 > > -------------------------------------- > > What is the reason for this? > Can anyone guide me please. > > -- > -Neeti > Even my blood says, B positive > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From torsten.seemann at infotech.monash.edu.au Wed Aug 1 20:50:06 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Thu, 2 Aug 2007 10:50:06 +1000 Subject: [Bioperl-l] trying to save blast hit sequences to fasta file In-Reply-To: <1664224328amadoz@uv.es> References: <1664224328amadoz@uv.es> Message-ID: Alicia, > Hi, I would like to save my hit sequences from a blast result in a fasta > file. I am trying some things but I have problems using Bio::SearchIO > and Bio::SeqIO. Hope anyone could help me with this. 
> Here is my current code:
> # my $seq_out = Bio::SeqIO->new("-file" => ">$fasfilename", "-format" => "fasta");
> my $seq_out = Bio::SearchIO->new("-file" => ">$fasfilename", "-format" => "fasta");
> ...
> my $hseq = $hsp->hit_string();
> # $seq_out->write_seq($hseq);
> $seq_out->write_result($hseq);

You have encountered two common problems for BioPerl beginners:

1. "fasta" means two different things! In SearchIO it refers to the output format of the "fasta" sequence alignment software. In SeqIO it refers to a file format that stores just sequences. Confusing, I know. You need SeqIO and write_seq, not SearchIO and write_result.

2. $hseq is a STRING which has the raw sequence letters in it. However, the write_seq() method needs a Bio::Seq object (which has extra details like the name and ID), not a raw string.

The example code Jay Hannah supplied in his reply looks pretty good, you should try it.

--
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University

From Alicia.Amadoz at uv.es Thu Aug 2 03:06:54 2007
From: Alicia.Amadoz at uv.es (Alicia Amadoz)
Date: Thu, 2 Aug 2007 09:06:54 +0200 (CEST)
Subject: [Bioperl-l] trying to save blast hit sequences to fasta file
In-Reply-To:
References:
Message-ID: <3579584634amadoz@uv.es>

Hi, thanks for your help and suggestions. I have tried the example code of Jay Hannah and it works perfectly. But what I need to save in fasta format is the whole sequence in the database that is similar to my query sequence. I don't understand the difference between hit_string() and query_string() very well: does hit_string() give the whole sequence that is similar, a part of the whole sequence, or just the part that is aligned to my query string?

With the previous code what I get are sequences of different lengths, all with the same id as my query string, so I am not sure that I am doing what I need to do. Any light on this point?

Thank you very much for your help.
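For readers following the thread, here is a minimal sketch that pulls together the two points made above: use SeqIO (not SearchIO) for the FASTA output, and hand write_seq() a Bio::Seq object rather than a raw string. The helper name hsp_record, the ID scheme (hit name plus hit coordinates), the script name and the 'blast' report format are assumptions made for illustration, not something taken from the posts. It is untested against a real report, and it still writes only the aligned part of each hit, not the full database sequence.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pure-Perl helper (hypothetical name): turn one HSP's aligned hit string
# into an (id, sequence) pair. Gap characters ('-') are alignment padding,
# not residues, so they are removed. The hit coordinates are appended to the
# hit name so that each HSP gets its own ID.
sub hsp_record {
    my ($hit_name, $start, $end, $aligned) = @_;
    (my $seq = $aligned) =~ s/-//g;
    return ("${hit_name}_${start}-${end}", $seq);
}

# Read a BLAST report and write the aligned part of every HSP to a FASTA file.
# BioPerl is loaded at call time so hsp_record() can be tried without it.
sub write_hits_as_fasta {
    my ($blast_file, $fasta_file) = @_;
    require Bio::SearchIO;
    require Bio::SeqIO;
    require Bio::Seq;

    # SearchIO parses the search report (assumed here to be BLAST text output).
    my $report = Bio::SearchIO->new(-file => $blast_file, -format => 'blast');
    # SeqIO writes sequence files; 'fasta' here is the sequence file format.
    my $seq_out = Bio::SeqIO->new(-file => ">$fasta_file", -format => 'fasta');

    while (my $result = $report->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                my ($id, $seq) = hsp_record(
                    $hit->name, $hsp->start('hit'), $hsp->end('hit'),
                    $hsp->hit_string,
                );
                # write_seq() expects a sequence object, not a raw string.
                $seq_out->write_seq(
                    Bio::Seq->new(-display_id => $id, -seq => $seq));
            }
        }
    }
}

# Usage (file names are placeholders): perl hits2fasta.pl report.blast hits.fasta
write_hits_as_fasta(@ARGV) if @ARGV == 2;
```

Building the ID from the hit name and coordinates is only one option; a running counter would also keep the IDs distinct, at the cost of losing the link back to the database entry.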
Alicia > Alicia, > > > Hi, I would like to save my hit sequences from a blast result in a fasta > > file. I am trying some things but I have problems using Bio::SearchIO > > and Bio::SeqIO. Hope anyone could help me with this. Here is my current > > code: > > # my $seq_out = Bio::SeqIO->new("-file" => ">$fasfilename", "-format" => > > "fasta"); > > my $seq_out = Bio::SearchIO->new("-file" => ">$fasfilename", "-format" > > => "fasta"); > > ... > > my $hseq = $hsp->hit_string(); > > # $seq_out->write_seq($hseq); > > $seq_out->write_result($hseq); > > You have encountered two common problems for BioPerl beginners: > > 1. "fasta" means two different things! In SearchIO it refers to the > output format of the "fasta" sequence alignment software. In SeqIO it > refers to a file format that stores just sequences. Confusing, I know. > You need SeqIO and write_seq, not SearchIO and write_result. > > 2. $hseq is a STRING which has the raw sequence letters in it. > However, the write_seq() method needs a Bio::Seq object (which has > extra details like the name and ID) not a raw string. > > The example code Jay Hannah supplied in his reply looks pretty good, > you should try it. > > -- > --Torsten Seemann > --Victorian Bioinformatics Consortium, Monash University > > From xianranli78 at yahoo.com.cn Thu Aug 2 04:56:04 2007 From: xianranli78 at yahoo.com.cn (Xianran Li) Date: Thu, 2 Aug 2007 16:56:04 +0800 Subject: [Bioperl-l] trying to save blast hit sequences to fasta file References: <3579584634amadoz@uv.es> Message-ID: <003701c7d4e2$f7a34bc0$ed07a8c0@BGI.LOCAL> ----- Original Message ----- From: "Alicia Amadoz" To: "Torsten Seemann" ; Cc: Sent: Thursday, August 02, 2007 3:06 PM Subject: Re: [Bioperl-l] trying to save blast hit sequences to fasta file > Hi, thanks for your help and suggestions. I have tried the example code > of Jay Hannah and it works perfectly. 
> But what I need to save in fasta
> format is the whole sequence in the database that is similar to my query
> sequence. I don't understand very well the difference between
> hit_string() and query_string(), are they the whole sequence that is
> similar (about hit_string), a part of the whole sequence or just the
> part that is aligned to my query string?

hit_string() returns the aligned part of the subject sequence from your database, and query_string() returns the aligned part of the query. The two will be the same unless there are mutations and/or gaps within the alignment.

>
> With the previous code what I have are different sequences in length
> with the same id as my query string, so I am not sure that I am doing
> what I need to do. Any light on this point?

Did you set $id before this line?

my $hseq_obj = Bio::Seq->new(-display_id => $id, -seq => $hseq);

If you didn't, then all the sequences retrieved will get the same id. The following is a simple way to avoid this problem.

my $seq_out = Bio::SeqIO->new("-file" => ">$fasfilename", "-format" => "fasta");
my $i = 0;
while(my $result = $blast_report->next_result()) {
    while(my $hit = $result->next_hit()) {
        while(my $hsp = $hit->next_hsp()) {
            $i++;
            my $hseq = $hsp->hit_string();
            $hseq =~ s/-//g; #### remove the gaps within the alignment
            my $id = $i; ###### specify the id
            my $hseq_obj = Bio::Seq->new(-display_id => $id, -seq => $hseq);
            $seq_out->write_seq($hseq_obj); # SeqIO writes with write_seq(), not write_result()
        }
    }
}

Xianran

>
> Thank you very much for your help.
> Alicia
>
> > Alicia,
> >
> > > Hi, I would like to save my hit sequences from a blast result in a fasta
> > > file. I am trying some things but I have problems using Bio::SearchIO
> > > and Bio::SeqIO. Hope anyone could help me with this.
> > > Here is my current code:
> > > # my $seq_out = Bio::SeqIO->new("-file" => ">$fasfilename", "-format" => "fasta");
> > > my $seq_out = Bio::SearchIO->new("-file" => ">$fasfilename", "-format" => "fasta");
> > > ...
> > > my $hseq = $hsp->hit_string();
> > > # $seq_out->write_seq($hseq);
> > > $seq_out->write_result($hseq);
> >
> > You have encountered two common problems for BioPerl beginners:
> >
> > 1. "fasta" means two different things! In SearchIO it refers to the
> > output format of the "fasta" sequence alignment software. In SeqIO it
> > refers to a file format that stores just sequences. Confusing, I know.
> > You need SeqIO and write_seq, not SearchIO and write_result.
> >
> > 2. $hseq is a STRING which has the raw sequence letters in it.
> > However, the write_seq() method needs a Bio::Seq object (which has
> > extra details like the name and ID) not a raw string.
> >
> > The example code Jay Hannah supplied in his reply looks pretty good,
> > you should try it.
> >
> > --
> > --Torsten Seemann
> > --Victorian Bioinformatics Consortium, Monash University
> >
> >
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From neetisomaiya at gmail.com Thu Aug 2 02:20:33 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Thu, 2 Aug 2007 11:50:33 +0530
Subject: [Bioperl-l] URGENT : Problem in OMIM parser
In-Reply-To: <0458A649-AD66-4495-8E25-78D556D51AD9@uiuc.edu>
References: <764978cf0708010428q5b51d27dw13f73dc5ca8d6dc9@mail.gmail.com> <0458A649-AD66-4495-8E25-78D556D51AD9@uiuc.edu>
Message-ID: <764978cf0708012320v1f30c7a7tfc3a2e524b72093@mail.gmail.com>

Hi,

The script is attached with this mail.
I am using bioperl-1.4.

Regards,
Neeti.
On 8/1/07, Chris Fields wrote: > > Neeti, > > Only post to one list email address, namely the one I'm responding to > and the one shown here: > > http://bioperl.org/mailman/listinfo/bioperl-l > > The others are aliases so you essentially posted three times. As for > your question: there was no attached script or any additional > information (bioperl version would have also been nice), so we can't > help you until we have something more to work with. > > chris > > On Aug 1, 2007, at 6:28 AM, neeti somaiya wrote: > > > I have downloaded the omim.txt file from NCBI ftp site and I am > > running my > > attached parser on this file, the parser run stops in between with > > this :- > > > > ------------- EXCEPTION ------------- > > MSG: a part/organism must be assigned > > STACK Bio::Phenotype::OMIM::OMIMentry::add_clinical_symptoms > > /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMentry.pm:566 > > STACK Bio::Phenotype::OMIM::OMIMparser::_finer_parse_symptoms > > /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMparser.pm:555 > > STACK Bio::Phenotype::OMIM::OMIMparser::_createOMIMentry > > /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMparser.pm:536 > > STACK Bio::Phenotype::OMIM::OMIMparser::next_phenotype > > /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMparser.pm:272 > > STACK toplevel parse_omim_original.pl:47 > > > > -------------------------------------- > > > > What is the reason for this? > > Can anyone guide me please. > > > > -- > > -Neeti > > Even my blood says, B positive > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > -- -Neeti Even my blood says, B positive -------------- next part -------------- A non-text attachment was scrubbed... 
Name: parse_omim_original.pl Type: application/x-perl Size: 5998 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070802/fbbee8db/attachment.bin From neetisomaiya at gmail.com Thu Aug 2 09:00:33 2007 From: neetisomaiya at gmail.com (neeti somaiya) Date: Thu, 2 Aug 2007 18:30:33 +0530 Subject: [Bioperl-l] URGENT : Problem in OMIM parser In-Reply-To: <764978cf0708012320v1f30c7a7tfc3a2e524b72093@mail.gmail.com> References: <764978cf0708010428q5b51d27dw13f73dc5ca8d6dc9@mail.gmail.com> <0458A649-AD66-4495-8E25-78D556D51AD9@uiuc.edu> <764978cf0708012320v1f30c7a7tfc3a2e524b72093@mail.gmail.com> Message-ID: <764978cf0708020600v551b917ck9acdd443268b85fa@mail.gmail.com> Also, As per the following links we can fetch data from the genemap file as well :- http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Phenotype/OMIM/OMIMparser.pm But when I am trying to do so in the exact manner as given in the above link, I get no data. As in there are OMIM ids which are present in both the omim.txt and genemap files, and for such cases when I parse and fetch data, data from both files should be obtained, but I aint getting it. For eg. while running the attached script, for OMIM id 100790, I get all data from omim.txt but the cytoposition, gene symbol etc from genemap is not coming, though it is present in the genemap file. Please help me find what could be going wrong. On 8/2/07, neeti somaiya wrote: > > Hi, > > The script is attached with this mail. > I am using bioperl-1.4. > > Regards, > Neeti. > > On 8/1/07, Chris Fields < cjfields at uiuc.edu> wrote: > > > > Neeti, > > > > Only post to one list email address, namely the one I'm responding to > > and the one shown here: > > > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > The others are aliases so you essentially posted three times. 
As for > > your question: there was no attached script or any additional > > information (bioperl version would have also been nice), so we can't > > help you until we have something more to work with. > > > > chris > > > > On Aug 1, 2007, at 6:28 AM, neeti somaiya wrote: > > > > > I have downloaded the omim.txt file from NCBI ftp site and I am > > > running my > > > attached parser on this file, the parser run stops in between with > > > this :- > > > > > > ------------- EXCEPTION ------------- > > > MSG: a part/organism must be assigned > > > STACK Bio::Phenotype::OMIM::OMIMentry::add_clinical_symptoms > > > /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMentry.pm:566 > > > STACK Bio::Phenotype::OMIM::OMIMparser::_finer_parse_symptoms > > > /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMparser.pm:555 > > > STACK Bio::Phenotype::OMIM::OMIMparser::_createOMIMentry > > > /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMparser.pm:536 > > > STACK Bio::Phenotype::OMIM::OMIMparser::next_phenotype > > > /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMparser.pm:272 > > > STACK toplevel parse_omim_original.pl:47 > > > > > > -------------------------------------- > > > > > > What is the reason for this? > > > Can anyone guide me please. > > > > > > -- > > > -Neeti > > > Even my blood says, B positive > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > Christopher Fields > > Postdoctoral Researcher > > Lab of Dr. Robert Switzer > > Dept of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > > > > > > -- > -Neeti > Even my blood says, B positive > > -- -Neeti Even my blood says, B positive -------------- next part -------------- A non-text attachment was scrubbed... 
Name: parse_omim_original.pl Type: application/x-perl Size: 8750 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070802/6bdb009c/attachment.bin From cjfields at uiuc.edu Thu Aug 2 13:05:55 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 2 Aug 2007 12:05:55 -0500 Subject: [Bioperl-l] Fwd: nonstop repeated output from Remote_blast with xml References: <38B65B2C-A36D-41FB-83C9-7D7B55156CCD@uiuc.edu> Message-ID: For archiving purposes; of course I forgot to cc the list! -c Begin forwarded message: > From: Chris Fields > Date: August 2, 2007 12:04:59 PM CDT > To: gyang at plantbio.uga.edu > Subject: Re: [Bioperl-l] nonstop repeated output from Remote_blast > with xml > > Guojun, > > Make sure to keep this on the mail list for archiving purposes. > > It could be that the RID is not being removed properly (if it isn't > removed then you will repeatedly retrieve your BLAST report). The > new error you are seeing may be coming from whatever XML::SAX > backend parser is being used (XML::SAX::ExpatXS, XML::SAX::Expat, > etc); it doesn't look bioperl-related and there is an eval which > catches this stuff in SearchIO::blastxml. Does text parsing work? > > Could you directly send me your script or add it to a new bug > report as an attachment? 
> > http://www.bioperl.org/wiki/Bugs > > chris > > On Aug 2, 2007, at 11:07 AM, Guojun Yang wrote: > >> Hi,Chris, >> I installed the latest version of bioperl, in addition to the >> repeated output problem, there are new problems with parsing: >> >> >> -------------------- WARNING --------------------- >> MSG: error in parsing a report: >> No close tag marker [Ln: 4126, Col: 0] >> >> --------------------------------------------------- >> >> Would you please kindly give me a hint on this, >> Thanks a lot, >> Guojun >> >> >> ----- Original Message ----- >> From: Chris Fields [mailto:cjfields at uiuc.edu] >> To: gyang at plantbio.uga.edu >> Cc: bioperl-l List [mailto:bioperl-l at lists.open-bio.org] >> Subject: Re: [Bioperl-l] nonstop repeated output from Remote_blast >> with xml >> >> >>> Make sure to keep responses on the ail list. >>>> You might want to run a full install, just in case. If I remember >>> correctly Sendu made some changes a while back in the BLAST-related >>> modules which may be related to this. At the very least install/ >>> upgrade all modules in Bio::Tools::Run. >>>> chris >>>> On Jul 31, 2007, at 9:40 AM, Guojun Yang wrote: >>>>> Thanks, Chris, >>>> But when I replaced the old RemoteBlast.pm with the new one, I got >>>> "can't locate the object method "retrieve_parameter"". Does this >>>> mean I need to install something else? >>>> Guojun >>>> >>>> ----- Original Message ----- >>>> From: Chris Fields [mailto:cjfields at uiuc.edu] >>>> To: gyang at plantbio.uga.edu >>>> Cc: bioperl-l at bioperl.org >>>> Subject: Re: [Bioperl-l] nonstop repeated output from Remote_blast >>>> with xml >>>> >>>> >>>>>> On Jul 30, 2007, at 3:58 PM, Guojun Yang wrote: >>>>>>> I am running remoteblast and using readmethod "xml", I >>>>>>> noticed that >>>>>> it is printing the output repeatedly nonstop. It's like in a >>>>>> loop. >>>>>> Did anybody notice this before? Can anybody help me getting >>>>>> out of >>>>>> this? 
>>>>>> Thanks a lot, >>>>>> >>>>>> >>>>>> Guojun Yang >>>>>> University of Georgia >>>>>> Not seeing that using bioperl-live; you may need to update >>>>> RemoteBlast.pm as this sounds similar to an issue that popped up >>>>> earlier in the spring. >>>>>> chris >>>>> >>>> Christopher Fields >>> Postdoctoral Researcher >>> Lab of Dr. Robert Switzer >>> Dept of Biochemistry >>> University of Illinois Urbana-Champaign >>>>>> > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Thu Aug 2 13:51:27 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 2 Aug 2007 12:51:27 -0500 Subject: [Bioperl-l] URGENT : Problem in OMIM parser In-Reply-To: <764978cf0708020600v551b917ck9acdd443268b85fa@mail.gmail.com> References: <764978cf0708010428q5b51d27dw13f73dc5ca8d6dc9@mail.gmail.com> <0458A649-AD66-4495-8E25-78D556D51AD9@uiuc.edu> <764978cf0708012320v1f30c7a7tfc3a2e524b72093@mail.gmail.com> <764978cf0708020600v551b917ck9acdd443268b85fa@mail.gmail.com> Message-ID: <921F31D6-3CA9-483A-8AFF-B3555E9768C4@uiuc.edu> Neeti, The genemap wasn't loaded in all cases; don't know what the reasoning for it was, but it is fixed in CVS now (Bio::Phenotype::OMIM::OMIMparser, specifically). I would recommend that you install a full upgrade to at least bioperl 1.5.2 before using this; I can't guarantee it will work with bioperl 1.4. chris On Aug 2, 2007, at 8:00 AM, neeti somaiya wrote: > Also, > As per the following links we can fetch data from the genemap file > as well > :- > http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Phenotype/OMIM/ > OMIMparser.pm > > But when I am trying to do so in the exact manner as given in the > above > link, I get no data. 
As in there are OMIM ids which are present in > both the > omim.txt and genemap files, and for such cases when I parse and > fetch data, > data from both files should be obtained, but I aint getting it. > > For eg. while running the attached script, for OMIM id 100790, I > get all > data from omim.txt but the cytoposition, gene symbol etc from > genemap is not > coming, though it is present in the genemap file. > > Please help me find what could be going wrong. > > On 8/2/07, neeti somaiya wrote: >> >> Hi, >> >> The script is attached with this mail. >> I am using bioperl-1.4. >> >> Regards, >> Neeti. >> >> On 8/1/07, Chris Fields < cjfields at uiuc.edu> wrote: >>> >>> Neeti, >>> >>> Only post to one list email address, namely the one I'm >>> responding to >>> and the one shown here: >>> >>> http://bioperl.org/mailman/listinfo/bioperl-l >>> >>> The others are aliases so you essentially posted three times. As >>> for >>> your question: there was no attached script or any additional >>> information (bioperl version would have also been nice), so we can't >>> help you until we have something more to work with. 
>>> >>> chris >>> >>> On Aug 1, 2007, at 6:28 AM, neeti somaiya wrote: >>> >>>> I have downloaded the omim.txt file from NCBI ftp site and I am >>>> running my >>>> attached parser on this file, the parser run stops in between with >>>> this :- >>>> >>>> ------------- EXCEPTION ------------- >>>> MSG: a part/organism must be assigned >>>> STACK Bio::Phenotype::OMIM::OMIMentry::add_clinical_symptoms >>>> /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMentry.pm:566 >>>> STACK Bio::Phenotype::OMIM::OMIMparser::_finer_parse_symptoms >>>> /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMparser.pm:555 >>>> STACK Bio::Phenotype::OMIM::OMIMparser::_createOMIMentry >>>> /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMparser.pm:536 >>>> STACK Bio::Phenotype::OMIM::OMIMparser::next_phenotype >>>> /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMparser.pm:272 >>>> STACK toplevel parse_omim_original.pl:47 >>>> >>>> -------------------------------------- >>>> >>>> What is the reason for this? >>>> Can anyone guide me please. >>>> >>>> -- >>>> -Neeti >>>> Even my blood says, B positive >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> Christopher Fields >>> Postdoctoral Researcher >>> Lab of Dr. Robert Switzer >>> Dept of Biochemistry >>> University of Illinois Urbana-Champaign >>> >>> >>> >>> >> >> >> -- >> -Neeti >> Even my blood says, B positive >> >> > > > -- > -Neeti > Even my blood says, B positive > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Thu Aug 2 14:16:56 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 2 Aug 2007 13:16:56 -0500 Subject: [Bioperl-l] URGENT : Problem in OMIM parser In-Reply-To: <764978cf0708021057g435539d2yd7168274589ec55f@mail.gmail.com> References: <764978cf0708010428q5b51d27dw13f73dc5ca8d6dc9@mail.gmail.com> <0458A649-AD66-4495-8E25-78D556D51AD9@uiuc.edu> <764978cf0708012320v1f30c7a7tfc3a2e524b72093@mail.gmail.com> <764978cf0708021057g435539d2yd7168274589ec55f@mail.gmail.com> Message-ID: <9D5F428F-D091-4815-A438-B3357D88212C@uiuc.edu> Neeti, Keep this on the list please. I am unable to reproduce this using your script with or without using the optional genemap file. You really should upgrade bioperl to 1.5.2 and try the fix first; this is something that may have been fixed post-bioperl 1.4. chris On Aug 2, 2007, at 12:57 PM, neeti somaiya wrote: > Waiting for your reply on the exception I had mentioned in my first > mail. > > Thanks. > > ---------- Forwarded message ---------- > From: neeti somaiya < neetisomaiya at gmail.com> > Date: Aug 2, 2007 11:50 AM > Subject: Re: [Bioperl-l] URGENT : Problem in OMIM parser > To: bioperl-l at lists.open-bio.org > > Hi, > > The script is attached with this mail. > I am using bioperl-1.4. > > Regards, > Neeti. > > > On 8/1/07, Chris Fields < cjfields at uiuc.edu> wrote:Neeti, > > Only post to one list email address, namely the one I'm responding to > and the one shown here: > > http://bioperl.org/mailman/listinfo/bioperl-l > > The others are aliases so you essentially posted three times. As for > your question: there was no attached script or any additional > information (bioperl version would have also been nice), so we can't > help you until we have something more to work with. 
> > chris > > On Aug 1, 2007, at 6:28 AM, neeti somaiya wrote: > > > I have downloaded the omim.txt file from NCBI ftp site and I am > > running my > > attached parser on this file, the parser run stops in between with > > this :- > > > > ------------- EXCEPTION ------------- > > MSG: a part/organism must be assigned > > STACK Bio::Phenotype::OMIM::OMIMentry::add_clinical_symptoms > > /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMentry.pm:566 > > STACK Bio::Phenotype::OMIM::OMIMparser::_finer_parse_symptoms > > /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMparser.pm:555 > > STACK Bio::Phenotype::OMIM::OMIMparser::_createOMIMentry > > /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMparser.pm:536 > > STACK Bio::Phenotype::OMIM::OMIMparser::next_phenotype > > /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMparser.pm:272 > > STACK toplevel parse_omim_original.pl:47 > > > > -------------------------------------- > > > > What is the reason for this? > > Can anyone guide me please. > > > > -- > > -Neeti > > Even my blood says, B positive > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > > > > -- > -Neeti > Even my blood says, B positive > > > > -- > -Neeti > Even my blood says, B positive > Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From torsten.seemann at infotech.monash.edu.au Thu Aug 2 21:03:36 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 3 Aug 2007 11:03:36 +1000 Subject: [Bioperl-l] trying to save blast hit sequences to fasta file In-Reply-To: <3579584634amadoz@uv.es> References: <3579584634amadoz@uv.es> Message-ID: Alicia, > Hi, thanks for your help and suggestions. I have tried the example code > of Jay Hannah and it works perfectly. But what I need to save in fasta > format is the whole sequence in the database that is similar to my query > sequence. Unfortunately the hit_string is only that part of the sequence in the database that was similar enough to your query sequence. The BLAST report does not have the whole hit sequence in it, only the locally aligned part. SearchIO can only give you what it can get from the BLAST report. You will need to record the IDs of the database sequences you are interested in, and write extra code to retrieve the WHOLE hit sequence from your database. --Torsten Seemann --Victorian Bioinformatics Consortium, Monash University From neetisomaiya at gmail.com Fri Aug 3 01:46:32 2007 From: neetisomaiya at gmail.com (neeti somaiya) Date: Fri, 3 Aug 2007 11:16:32 +0530 Subject: [Bioperl-l] URGENT : Problem in OMIM parser In-Reply-To: <9D5F428F-D091-4815-A438-B3357D88212C@uiuc.edu> References: <764978cf0708010428q5b51d27dw13f73dc5ca8d6dc9@mail.gmail.com> <0458A649-AD66-4495-8E25-78D556D51AD9@uiuc.edu> <764978cf0708012320v1f30c7a7tfc3a2e524b72093@mail.gmail.com> <764978cf0708021057g435539d2yd7168274589ec55f@mail.gmail.com> <9D5F428F-D091-4815-A438-B3357D88212C@uiuc.edu> Message-ID: <764978cf0708022246v98abed6ue41233f6b27c5674@mail.gmail.com> Hi, Thanks a lot. The exception is not coming after upgrade to bioperl-1.5.2 But the genemap data is still a problem. You had mentioned that I should take Bio::Phenotype::OMIM::OMIMparser, specifically from cvs. 
Where exactly can I get it? Thanks, Neeti. On 8/2/07, Chris Fields wrote: > > Neeti, > > Keep this on the list please. I am unable to reproduce this using > your script with or without using the optional genemap file. You > really should upgrade bioperl to 1.5.2 and try the fix first; this is > something that may have been fixed post-bioperl 1.4. > > chris > > On Aug 2, 2007, at 12:57 PM, neeti somaiya wrote: > > > Waiting for your reply on the exception I had mentioned in my first > > mail. > > > > Thanks. > > > > ---------- Forwarded message ---------- > > From: neeti somaiya < neetisomaiya at gmail.com> > > Date: Aug 2, 2007 11:50 AM > > Subject: Re: [Bioperl-l] URGENT : Problem in OMIM parser > > To: bioperl-l at lists.open-bio.org > > > > Hi, > > > > The script is attached with this mail. > > I am using bioperl-1.4. > > > > Regards, > > Neeti. > > > > > > On 8/1/07, Chris Fields < cjfields at uiuc.edu> wrote:Neeti, > > > > Only post to one list email address, namely the one I'm responding to > > and the one shown here: > > > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > The others are aliases so you essentially posted three times. As for > > your question: there was no attached script or any additional > > information (bioperl version would have also been nice), so we can't > > help you until we have something more to work with. 
> > > > chris > > > > On Aug 1, 2007, at 6:28 AM, neeti somaiya wrote: > > > > > I have downloaded the omim.txt file from NCBI ftp site and I am > > > running my > > > attached parser on this file, the parser run stops in between with > > > this :- > > > > > > ------------- EXCEPTION ------------- > > > MSG: a part/organism must be assigned > > > STACK Bio::Phenotype::OMIM::OMIMentry::add_clinical_symptoms > > > /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMentry.pm:566 > > > STACK Bio::Phenotype::OMIM::OMIMparser::_finer_parse_symptoms > > > /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMparser.pm:555 > > > STACK Bio::Phenotype::OMIM::OMIMparser::_createOMIMentry > > > /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMparser.pm:536 > > > STACK Bio::Phenotype::OMIM::OMIMparser::next_phenotype > > > /usr/lib/perl5/site_perl/5.8.8/Bio/Phenotype/OMIM/OMIMparser.pm:272 > > > STACK toplevel parse_omim_original.pl:47 > > > > > > -------------------------------------- > > > > > > What is the reason for this? > > > Can anyone guide me please. > > > > > > -- > > > -Neeti > > > Even my blood says, B positive > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > Christopher Fields > > Postdoctoral Researcher > > Lab of Dr. Robert Switzer > > Dept of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > > > > > > > > > -- > > -Neeti > > Even my blood says, B positive > > > > > > > > -- > > -Neeti > > Even my blood says, B positive > > > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > -- -Neeti Even my blood says, B positive From jay at jays.net Fri Aug 3 10:23:11 2007 From: jay at jays.net (Jay Hannah) Date: Fri, 03 Aug 2007 09:23:11 -0500 Subject: [Bioperl-l] trying to save blast hit sequences to fasta file In-Reply-To: References: <3579584634amadoz@uv.es> Message-ID: <46B33A4F.2010403@jays.net> Torsten Seemann wrote: >> Hi, thanks for your help and suggestions. I have tried the example code >> of Jay Hannah and it works perfectly. But what I need to save in fasta >> format is the whole sequence in the database that is similar to my query >> sequence. >> > > Unfortunately the hit_string is only that part of the sequence in the > database that was similar enough to your query sequence. The BLAST > report does not have the whole hit sequence in it, only the locally > aligned part. SearchIO can only give you what it can get from the > BLAST report. > > You will need to record the IDs of the database sequences you are > interested in, and write extra code to retrieve the WHOLE hit sequence > from your database. > This probably won't help, but my (extremely poorly documented) "SeqLab.net" project http://seqlab.net is a framework that sits on top of BioPerl. The current cross_blast() stuff (http://seqlab.net/pods2html/tutorial.html) does this: GenBank -> FASTA -> formatdb -> "stand alone" NCBI BLAST -> reports When the reports run they have simultaneous access to both the original Bio::Seq objects from the GenBank file and the Bio::SearchIO objects from the BLAST results, so it can kick out reports that understand the relationships between (and details of) the original sequences and HSPs simultaneously... If you get stuck trying to do what Torsten suggests and have questions about SeqLab.net you could open a ticket with my group http://clab.ist.unomaha.edu/CLAB/index.php/RT and I'll try to help. 
Cheers,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

From mbasu at mail.nih.gov Fri Aug 3 14:55:57 2007
From: mbasu at mail.nih.gov (Malay)
Date: Fri, 03 Aug 2007 14:55:57 -0400
Subject: [Bioperl-l] trying to save blast hit sequences to fasta file
In-Reply-To: <46B33A4F.2010403@jays.net>
References: <3579584634amadoz@uv.es> <46B33A4F.2010403@jays.net>
Message-ID: <46B37A3D.4070606@mail.nih.gov>

Jay Hannah wrote:
> Torsten Seemann wrote:
>>> Hi, thanks for your help and suggestions. I have tried the example code
>>> of Jay Hannah and it works perfectly. But what I need to save in fasta
>>> format is the whole sequence in the database that is similar to my query
>>> sequence.
>>>
>> Unfortunately the hit_string is only that part of the sequence in the
>> database that was similar enough to your query sequence. The BLAST
>> report does not have the whole hit sequence in it, only the locally
>> aligned part. SearchIO can only give you what it can get from the
>> BLAST report.
>>
>> You will need to record the IDs of the database sequences you are
>> interested in, and write extra code to retrieve the WHOLE hit sequence
>> from your database.

I am not sure whether it has already been suggested, but you can
retrieve the full sequence from any BLAST database using "fastacmd",
which is part of the NCBI toolbox. Parse the "description" string from
the BLAST report and run:

fastacmd -d -s

where the argument of -s can be any unique string for the database.

-Malay

From cjfields at uiuc.edu Mon Aug 6 13:49:08 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 6 Aug 2007 12:49:08 -0500
Subject: [Bioperl-l] Fwd: nonstop repeated output from Remote_blast with xml
References: <1FE846F1-CB20-41FD-929D-8D14E5695B59@uiuc.edu>
Message-ID:

Wasn't paying attention! Forwarding this to the mail list in case anyone wanted the answer...
chris Begin forwarded message: > From: Chris Fields > Date: August 6, 2007 12:10:37 PM CDT > To: gyang at plantbio.uga.edu > Subject: Re: [Bioperl-l] nonstop repeated output from Remote_blast > with xml > > Guojun, > > Sorry about the long wait on this. At this time RemoteBlast > doesn't automatically set the retrieval header to return XML when > setting the -reporttype parameter to 'xml' or 'blastxml'. The > default is text output, so you are retrieving regular text BLAST > reports instead of XML, hence the reported XML parser failure (BTW, > you can see the plain text being returned in the debugging > output). I'll look into a fix for that. > > In the meantime, you can do this manually by setting the following > key prior to submitting the BLAST run: > > $Bio::Tools::Run::RemoteBlast::RETRIEVALHEADER{'FORMAT_TYPE'} = 'XML'; > > When I run your example with the above line added it works fine. > As an additional note, the CVS version of Bio::SearchIO::blastxml > now supports newer versions of XML::SAX::Expat; the problem there > was a bug in XML::SAX::Expat that killed parsing. > > Additional rant before I go back to work (you can skip this if > needed): RemoteBlast is one of the most used modules in BioPerl, > but it is also the most problematic as NCBI keeps changing things > on their end (BLAST text output, prompts when returning RIDs, > etc). It frankly isn't as well-maintained as we would like; this > is partly due to plans we have (but unfortunately haven't acted > upon) to merge RemoteBlast/StandAloneBlast so they have a similar > API and can be used for any BLAST program, including netblast. If > someone wants to take this on at some point then they are more than > welcome! > > chris > > On Aug 3, 2007, at 10:08 AM, Guojun Yang wrote: > >> Thanks, Chris, >> Attached are my script and the query file. I suspected that we >> need to add "remove RID... 
in the code", I tried putting romoving >> RID at the end of the parsing coding, but it seemed it removed it >> even before the output was processed. I installed >> XML::SAX::Expat, the error became "XML::SAX::Expat is no longer >> supported...", so I installed ExpatXS, the error message becomes: >> >> -------------------- WARNING --------------------- >> MSG: error in parsing a report: >> no element found at line 4126, column 1, byte 186628 at /usr/lib/ >> perl5/site_perl/5.8.3/Bio/SearchIO/blastxml.pm line 304 >> >> >> Would you please try the script with the query file with the >> following input parameters, to see what happens on your machine (I >> want to make sure there is no installation problem on my machine). >> The search subroutine is where blast is performed, I did not >> include a romove RID there. Thanks again! >> >> master:/home/guojun # perl makcgi07.txt >> Query file name: >> kiddo.txt >> Select a function: 1.member;2.RES; 3, long; 4.Anchor; 5.Associator. >> 1 >> Type in the name of an organism, e.g. Oryza sativa. >> Oryza sativa >> Type in the organism to search for RES: >> Your E_value: >> 0.001 >> Size limit for ancestor element: >> 4000 >> Flanking size for retrieved members: >> 50 >> Tolerance for end mismatch: >> 0 >> >> >> >> Guojun From: Chris Fields [mailto:cjfields at uiuc.edu] >> To: gyang at plantbio.uga.edu >> Sent: Thu, 02 Aug 2007 13:04:59 -0400 >> Subject: Re: [Bioperl-l] nonstop repeated output from Remote_blast >> with xml >> >> Guojun, >> >> Make sure to keep this on the mail list for archiving purposes. >> >> It could be that the RID is not being removed properly (if it isn't >> removed then you will repeatedly retrieve your BLAST report). The >> new error you are seeing may be coming from whatever XML::SAX backend >> parser is being used (XML::SAX::ExpatXS, XML::SAX::Expat, etc); it >> doesn't look bioperl-related and there is an eval which catches this >> stuff in SearchIO::blastxml. Does text parsing work? 
>> >> Could you directly send me your script or add it to a new bug report >> as an attachment? >> >> http://www.bioperl.org/wiki/Bugs >> >> chris >> >> On Aug 2, 2007, at 11:07 AM, Guojun Yang wrote: >> >> > Hi,Chris, >> > I installed the latest version of bioperl, in addition to the >> > repeated output problem, there are new problems with parsing: >> > >> > >> > -------------------- WARNING --------------------- >> > MSG: error in parsing a report: >> > No close tag marker [Ln: 4126, Col: 0] >> > >> > --------------------------------------------------- >> > >> > Would you please kindly give me a hint on this, >> > Thanks a lot, >> > Guojun >> > >> > >> > ----- Original Message ----- >> > From: Chris Fields [mailto:cjfields at uiuc.edu] >> > To: gyang at plantbio.uga.edu >> > Cc: bioperl-l List [mailto:bioperl-l at lists.open-bio.org] >> > Subject: Re: [Bioperl-l] nonstop repeated output from Remote_blast >> > with xml >> > >> > >> >> Make sure to keep responses on the ail list. >> >>> You might want to run a full install, just in case. If I remember >> >> correctly Sendu made some changes a while back in the BLAST- >> related >> >> modules which may be related to this. At the very least install/ >> >> upgrade all modules in Bio::Tools::Run. >> >>> chris >> >>> On Jul 31, 2007, at 9:40 AM, Guojun Yang wrote: >> >>>> Thanks, Chris, >> >>> But when I replaced the old RemoteBlast.pm with the new one, I >> got >> >>> "can't locate the object method "retrieve_parameter"". Does this >> >>> mean I need to install something else? 
>> >>> Guojun >> >>> >> >>> ----- Original Message ----- >> >>> From: Chris Fields [mailto:cjfields at uiuc.edu] >> >>> To: gyang at plantbio.uga.edu >> >>> Cc: bioperl-l at bioperl.org >> >>> Subject: Re: [Bioperl-l] nonstop repeated output from >> Remote_blast >> >>> with xml >> >>> >> >>> >> >>>>> On Jul 30, 2007, at 3:58 PM, Guojun Yang wrote: >> >>>>>> I am running remoteblast and using readmethod "xml", I noticed >> >>>>>> that >> >>>>> it is printing the output repeatedly nonstop. It's like in a >> loop. >> >>>>> Did anybody notice this before? Can anybody help me getting >> out of >> >>>>> this? >> >>>>> Thanks a lot, >> >>>>> >> >>>>> >> >>>>> Guojun Yang >> >>>>> University of Georgia >> >>>>> Not seeing that using bioperl-live; you may need to update >> >>>> RemoteBlast.pm as this sounds similar to an issue that popped up >> >>>> earlier in the spring. >> >>>>> chris >> >>>> >> >>> Christopher Fields >> >> Postdoctoral Researcher >> >> Lab of Dr. Robert Switzer >> >> Dept of Biochemistry >> >> University of Illinois Urbana-Champaign >> >>>>> >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> >> >> >> > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From Alicia.Amadoz at uv.es Tue Aug 7 04:20:12 2007 From: Alicia.Amadoz at uv.es (Alicia Amadoz) Date: Tue, 7 Aug 2007 10:20:12 +0200 (CEST) Subject: [Bioperl-l] error using standaloneblast through webserver, part II Message-ID: <1387114447amadoz@uv.es> Hi again, i'm trying to run a bioperl script in linux with standaloneblast from a webserver but i now have another error. 
It is the following:

[blastall] WARNING: Unable to open outfile_allseq.nin
[blastall] WARNING: 101: Unable to open outfile_allseq.nin

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: blastall call crashed: 256 /usr/local/blast-2.2.16/bin/blastall -d "/outfile_allseq" -e 10 -i /tmp//alicia_2007_07_20/result_search_alicia_12_03_40.fasta -o /tmp//alicia_2007_08_07/101_result_Local_Blast_alicia_09_56_47.out -p blastn

My Perl code is the following:

my $blastdatadir = $ARGV[9]; # -> Here the value of the variable is ok

BEGIN {
    $ENV{PATH} .= ':/usr/local/blast-2.2.16/bin'; # path where blastall bin is located
    $ENV{BLASTDIR} = '/usr/local/blast-2.2.16/bin'; # path where blastall bin is located
    $ENV{BLASTDATADIR} = $blastdatadir; # path where formatted local databases are located -> Here the value is empty
}

I have tried without BEGIN { } so that the $ENV var has a correct value for
$blastdatadir, but I get the same error. I have checked that formatdb
was done and all the files are correct. Any idea or help to solve this
problem?

Thanks in advance. Regards,

Alicia

From mheusel at gmail.com Tue Aug 7 04:45:33 2007
From: mheusel at gmail.com (Martin Heusel)
Date: Tue, 7 Aug 2007 10:45:33 +0200
Subject: [Bioperl-l] error using standaloneblast through webserver, part II
In-Reply-To: <1387114447amadoz@uv.es>
References: <1387114447amadoz@uv.es>
Message-ID: <6127fc200708070145keb750acycce8a43edd0f724d@mail.gmail.com>

> MSG: blastall call crashed: 256 /usr/local/blast-2.2.16/bin/blastall -d
> "/outfile_allseq" -e 10 -i

I'm not familiar with all this, but it seems your script tries to write
in the system's root directory /

-d "/outfile_allseq"

which is normally not writable for normal users. Is this the problem?
cu Martin -- + openid: http://mhe.myopenid.com/ + gpg : http://user.cs.tu-berlin.de/~mhe/pub/martin.gpg + gpg fp: 4844 71B5 B4E4 3892 69CA 6EA5 6598 61BE 0021 94A2 From Alicia.Amadoz at uv.es Tue Aug 7 07:08:12 2007 From: Alicia.Amadoz at uv.es (Alicia Amadoz) Date: Tue, 7 Aug 2007 13:08:12 +0200 (CEST) Subject: [Bioperl-l] error using standaloneblast through webserver, part II In-Reply-To: <1387114447amadoz@uv.es> References: <1387114447amadoz@uv.es> Message-ID: <5825345446amadoz@uv.es> Hi, i thought that it was enough with setting $ENV{BLASTDATADIR} and standaloneblast would find the database. I have change it, setting -database option of params with path_to_database+name_of_database and it works ok. Thanks for your help. Regards, Alicia From jason at bioperl.org Wed Aug 8 15:16:07 2007 From: jason at bioperl.org (Jason Stajich) Date: Wed, 8 Aug 2007 14:16:07 -0500 Subject: [Bioperl-l] Fwd: Question regarding Bio::GenBank module References: <7a93dad10708081148w74dfede3sd05799a651ebcb80@mail.gmail.com> Message-ID: <24F7DCFE-7047-43BA-BD92-E2238C05DAE1@bioperl.org> Young - I'm forwarding to the list for more help. Begin forwarded message: > From: "Young Song" > Date: August 8, 2007 1:48:29 PM CDT > To: jason at bioperl.org > Subject: Question regarding Bio::GenBank module > > Hello, > > I am currently located in Vancouver, Canada, and I actually have > some > question based on the Bio::GenBank module for bioperl. I read in the > online document for the module ( > http://search.cpan.org/dist/bioperl/Bio/DB/GenBank.pm), that we are > not > supposed to spam the NCBI with multiple requests, which lead me to > think > about the script that I wrote. I am trying to extract some > information > based on the fasta protein files located in the NCBI's database. 
> The > script reads each '.faa' (Fasta Protein) file and takes in the > 'gi' ID > for each sequence, and extracts several information, which looks like > following output (please note that there are lot more gi's then I > am showing > you right now): > > 10954456 > accesstion number: NP_047185.1 > dbsource: GenBank: NC_001911.1 > NP_047185.1 > starting pos. at genomic seq: 1488 > ending pos. at genomic seq: 1991 > strand: + > description: putative membrane-associated protein > organism: Buchnera aphidicola > MERIIEKAIYASRWLMFPVYVGLSFGFILLTLKFFQQIVFIIPDILAMSESGLVLVVLSLIDIALVGGLL > VMVMFLGYENFISKMDIQDNEKRLGWMGTMDVNSIKNKVASSIVAISSVHLLRLFMEAEKILDDKIMLCV > IIHLTFVLSAFGMAYIDKMSKKKHVLH > ************************************************ > 10954457 > accesstion number: NP_047186.1 > dbsource: GenBank: NC_001911.1 > NP_047186.1 > starting pos. at genomic seq: 2158 > ending pos. at genomic seq: 2913 > strand: + > description: putative replication-associated protein > organism: Buchnera aphidicola > MPRKNYIYNPKPVFNPPKNKRKISTFICYAMKKASEIDVARSNLNYTLLLIDPKTGNILPRFRRLNEHRA > CAMRAIVLAMLYYFDIHSNLVEASIEKLADECGLSTFSDSGNKSITRVSRLINDFLEPMGFVRCKKIKRK > FVSNYIPKKIFLTPMFFMLFNISQSKINRYLFKSKKMSQNLKITEKKIFISFSDIKVMSRLDEKSIRKKI > LNALINYYTASELTKIGPKGLKKRIDIEYNNLCKLFKKIKK > > > > Because there are lot of sequences I am dealing with here, I am > little bit > worried that I may be causing harm to the NCBI server. I just need > to know > if this is the right approach to take, or if there is another > solution (I am > little bit confused what you mean by "multiple requests" in the > document). > Your reply would be very much appreciated. Thank you in advance. > > Sincerely, > > Young C. 
Song -- Jason Stajich jason at bioperl.org From cjfields at uiuc.edu Wed Aug 8 15:41:34 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 8 Aug 2007 14:41:34 -0500 Subject: [Bioperl-l] Fwd: Question regarding Bio::GenBank module In-Reply-To: <24F7DCFE-7047-43BA-BD92-E2238C05DAE1@bioperl.org> References: <7a93dad10708081148w74dfede3sd05799a651ebcb80@mail.gmail.com> <24F7DCFE-7047-43BA-BD92-E2238C05DAE1@bioperl.org> Message-ID: NCBI eUtils (which Bio::DB::GenBank uses to get sequence data) has a list of user requirements: http://www.ncbi.nlm.nih.gov/entrez/query/static/ eutils_help.html#UserSystemRequirements The most important one is the 3 second timeout between requests, but the module already implements that policy so there isn't a real issue unless you deliberately mess with that setting. NCBI has been known to block IPs which don't follow that particular rule. Also, if you are planning making hundreds of requests you should consider running the script during low traffic times as indicated in the above link. chris On Aug 8, 2007, at 2:16 PM, Jason Stajich wrote: > Young - > I'm forwarding to the list for more help. > > Begin forwarded message: > >> From: "Young Song" >> Date: August 8, 2007 1:48:29 PM CDT >> To: jason at bioperl.org >> Subject: Question regarding Bio::GenBank module >> >> Hello, >> >> I am currently located in Vancouver, Canada, and I actually have >> some >> question based on the Bio::GenBank module for bioperl. I read in the >> online document for the module ( >> http://search.cpan.org/dist/bioperl/Bio/DB/GenBank.pm), that we are >> not >> supposed to spam the NCBI with multiple requests, which lead me to >> think >> about the script that I wrote. I am trying to extract some >> information >> based on the fasta protein files located in the NCBI's database. 
>> The >> script reads each '.faa' (Fasta Protein) file and takes in the >> 'gi' ID >> for each sequence, and extracts several information, which looks >> like >> following output (please note that there are lot more gi's then I >> am showing >> you right now): >> >> 10954456 >> accesstion number: NP_047185.1 >> dbsource: GenBank: NC_001911.1 >> NP_047185.1 >> starting pos. at genomic seq: 1488 >> ending pos. at genomic seq: 1991 >> strand: + >> description: putative membrane-associated protein >> organism: Buchnera aphidicola >> MERIIEKAIYASRWLMFPVYVGLSFGFILLTLKFFQQIVFIIPDILAMSESGLVLVVLSLIDIALVGGL >> L >> VMVMFLGYENFISKMDIQDNEKRLGWMGTMDVNSIKNKVASSIVAISSVHLLRLFMEAEKILDDKIMLC >> V >> IIHLTFVLSAFGMAYIDKMSKKKHVLH >> ************************************************ >> 10954457 >> accesstion number: NP_047186.1 >> dbsource: GenBank: NC_001911.1 >> NP_047186.1 >> starting pos. at genomic seq: 2158 >> ending pos. at genomic seq: 2913 >> strand: + >> description: putative replication-associated protein >> organism: Buchnera aphidicola >> MPRKNYIYNPKPVFNPPKNKRKISTFICYAMKKASEIDVARSNLNYTLLLIDPKTGNILPRFRRLNEHR >> A >> CAMRAIVLAMLYYFDIHSNLVEASIEKLADECGLSTFSDSGNKSITRVSRLINDFLEPMGFVRCKKIKR >> K >> FVSNYIPKKIFLTPMFFMLFNISQSKINRYLFKSKKMSQNLKITEKKIFISFSDIKVMSRLDEKSIRKK >> I >> LNALINYYTASELTKIGPKGLKKRIDIEYNNLCKLFKKIKK >> >> >> >> Because there are lot of sequences I am dealing with here, I am >> little bit >> worried that I may be causing harm to the NCBI server. I just need >> to know >> if this is the right approach to take, or if there is another >> solution (I am >> little bit confused what you mean by "multiple requests" in the >> document). >> Your reply would be very much appreciated. Thank you in advance. >> >> Sincerely, >> >> Young C. 
Song > > -- > Jason Stajich > jason at bioperl.org > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From gyang at plantbio.uga.edu Thu Aug 9 15:03:21 2007 From: gyang at plantbio.uga.edu (Guojun Yang) Date: Thu, 09 Aug 2007 15:03:21 -0400 Subject: [Bioperl-l] standalone blastall call crashed, please help In-Reply-To: 1FE846F1-CB20-41FD-929D-8D14E5695B59@uiuc.edu Message-ID: <20070809190321.191d0d4a@dogwood.plantbio.uga.edu> Hi, Chris, Thanks a lot for your efforts. With your help, I am gaining more confidence to fix the cgi code. While the remoteblast problem is fixed now, I am caught in a local blast problem (see the error message and subroutine). The line starting with * is line 593 in the error message. I tried command line blastall, it works fine. I set the permission to all the blast folders and files, it did not help much. The same sequence and database works OK if I use command line blastall. I used the seq object ref $query as query, the error message gives "-i /tmp/...", does this look like an input problem? The subroutine was working before early 2006 (on a different machine), I am wondering whether this is due to changes in the StandAloneBlast.pm? Best, Guojun I set the blast env variables: BEGIN {$ENV{BLASTDIR} = '/usr/blast-2.2.10/bin'; } BEGIN {$ENV{BLASTDB}='/usr/blast-2.2.10/data';} BEGIN {$ENV{BLASTMAT}='/usr/blast-2.2.10/data';} $PROGRAMDIR = $ENV{'BLASTDIR'} || ''; ...... 
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: blastall call crashed: -1 /usr/blast-2.2.10/bin/blastall -d "/usr/blast-2.2.10/data/swissprot" -e 0.001 -i /tmp/3cjvQyodxg -o /tmp/4qSSO16EZP -p blastx
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.3/Bio/Root/Root.pm:359
STACK: Bio::Tools::Run::StandAloneBlast::_runblast /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:813
STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:760
STACK: Bio::Tools::Run::StandAloneBlast::blastall /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:570
STACK: main::ancestor makcgi07.txt:593
STACK: makcgi07.txt:208

sub ancestor {
    use Bio::Tools::Run::StandAloneBlast;
    use Bio::SearchIO::blast;

    my $query = Bio::Seq -> new ( -seq=>"$_[0]",
                                  -id=>"test");
    print $query->seq();
    my $len=$query->length();
    my $long_name=$_[1];
    my $long_start=$_[2];
    my $long_end=$_[3];
    @db=('swissprot');
    foreach my $db (@db) {
        my $factory = Bio::Tools::Run::StandAloneBlast->new(-program => "blastx",
                                                            -database => "$db",
                                                            -e => 1e-3,
                                                            );
*       my $blast_report = $factory->blastall($query);
        while (my $result = $blast_report->next_result) {
            while( my $hit = $result->next_hit()) {
                $hit_name=$hit->name;
                $hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
                $name=$1;
                $desc = $hit->description();
                if ($desc =~ /.*{|\btransposon\b|\btransposase\b|}.*/i){
                    $AN=0;
                    $replica=0;
                    while ($ancestor_name[$AN]) {
                        $replica=1 if (($ancestor_name[$AN] eq $long_name) && ($hitname[$AN] eq $name));
                        $AN+=1;
                    }
                    if ($replica==0) {
                        push @ancestor_name, $long_name;
                        push @ancestor_start, $long_start;
                        push @ancestor_end, $long_end;
                        push @desc, $desc;
                        push @hitname,$name;
                    }
                }
            }
        }}
    return @ancestor_name,@ancestor_start,@ancestor_end,@desc;
}

From harijay at gmail.com Thu Aug 9 17:47:50 2007
From: harijay at gmail.com (hari jayaram)
Date: Thu, 9 Aug 2007 17:47:50 -0400
Subject: [Bioperl-l] newbie wants install help
Message-ID: Hi I am trying to install bioperl as a non root user since I dont have root access on the machine. I was following the instructions as given on the wiki at http://bioperl.open-bio.org/wiki/Installing_Bioperl_for_Unix I started from scratch using perl version v5.8.5 and used cpan to install the bioperl module prerequisites bundle Bundle::BioPerl since I thought it was needed. Everything worked just fine I could use cpan as a non root user following instructions given at http://www.dcc.fc.up.pt/~pbrandao/aulas/0203/AR/modules_inst_cpan.html But when I try to install bioperl using the instructions for non-root I get an error when I build Module::Build because I am not root. Iget the same Module::Build error when I try to install without CPAN using command line script perl Build.PL --install_base option as given on the wiki. Is there a way out Thanks for your help in advance harijay Brandeis University Installing /usr/share/man/man3/Module::Build::Platform::VMS.3pm Installing /usr/share/man/man3/Module::Build::Base.3pm Installing /usr/share/man/man3/Module::Build::Authoring.3pm Installing /usr/share/man/man3/Module::Build::Compat.3pm mkdir /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi/auto/Module: Permission denied at /usr/lib/perl5/5.8.5/ExtUtils/Install.pm line 207 Installing /usr/bin/config_data make: *** [install] Error 255 /usr/bin/make install -- NOT OK You may have to su to root to install the package Couldn't install Module::Build, giving up. make: *** No targets specified and no makefile found. Stop. 
/usr/bin/make -- NOT OK Running make test Can't test without successful make Running make install make had returned bad status, install seems impossible From bix at sendu.me.uk Thu Aug 9 18:23:24 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 09 Aug 2007 23:23:24 +0100 Subject: [Bioperl-l] newbie wants install help In-Reply-To: References: Message-ID: <46BB93DC.9010608@sendu.me.uk> hari jayaram wrote: > Hi I am trying to install bioperl as a non root user since I dont have root > access on the machine. > > I was following the instructions as given on the wiki at > http://bioperl.open-bio.org/wiki/Installing_Bioperl_for_Unix > I started from scratch using perl version v5.8.5 and used cpan to install > the bioperl module prerequisites bundle Bundle::BioPerl since I thought it > was needed. Everything worked just fine > I could use cpan as a non root user following instructions given at > http://www.dcc.fc.up.pt/~pbrandao/aulas/0203/AR/modules_inst_cpan.html > > But when I try to install bioperl using the instructions for non-root I get > an error when I build Module::Build because I am not root. > Iget the same Module::Build error when I try to install without CPAN using > command line script perl Build.PL --install_base option as given on the > wiki. Follow the cpan instructions you found to install as non-root: Bundle::CPAN Failing that, you require at least: Module::Build Failing that: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_MODULES_THE_HARD_WAY (it's actually the easiest way, go figure) From bix at sendu.me.uk Fri Aug 10 03:41:29 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 10 Aug 2007 08:41:29 +0100 Subject: [Bioperl-l] newbie wants install help In-Reply-To: References: <46BB93DC.9010608@sendu.me.uk> Message-ID: <46BC16A9.7090709@sendu.me.uk> hari jayaram wrote: > Hi Sendu , Hi, please post back to the list as well, so others can benefit. 
> Well after going through a few attempts at installing Bundle::CPAN I > gave up. > It always had weird timeout issues . ANd kept re-installing everything > on restarting the CPAN shell > After a while I thought it did complete - since it retunred me to the shell > > I tried the CPAN install of bioperl at that point > > ANd bingo I got booted out at the exact same point when the Bioperl > install tried to re-install(?) Module:Build which failed as non root Did you follow steps 7 and 8 of http://www.dcc.fc.up.pt/~pbrandao/aulas/0203/AR/modules_inst_cpan.html ? If you managed to install Bundle::CPAN, when you now run 'cpan' it should start up and tell you its version number, which should be v1.9102 or higher. If its lower, you didn't manage to install the latest CPAN, or you haven't managed to tell Perl where your newly installed modules are. > I guess for all future modules I will adopt the option 3 you detailed , > i.e just have the modules sitting somewhere and use them from there > > But I am still interested in getting it done right via CPAN. From n.haigh at sheffield.ac.uk Fri Aug 10 06:14:06 2007 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Fri, 10 Aug 2007 11:14:06 +0100 Subject: [Bioperl-l] newbie wants install help In-Reply-To: <46BC16A9.7090709@sendu.me.uk> References: <46BB93DC.9010608@sendu.me.uk> <46BC16A9.7090709@sendu.me.uk> Message-ID: <46BC3A6E.80302@sheffield.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Sendu Bala wrote: > hari jayaram wrote: >> Hi Sendu , > > Hi, please post back to the list as well, so others can benefit. > > >> Well after going through a few attempts at installing Bundle::CPAN I >> gave up. >> It always had weird timeout issues . 
ANd kept re-installing everything >> on restarting the CPAN shell >> After a while I thought it did complete - since it retunred me to the shell >> >> I tried the CPAN install of bioperl at that point >> >> ANd bingo I got booted out at the exact same point when the Bioperl >> install tried to re-install(?) Module:Build which failed as non root > > Did you follow steps 7 and 8 of > http://www.dcc.fc.up.pt/~pbrandao/aulas/0203/AR/modules_inst_cpan.html ? > > If you managed to install Bundle::CPAN, when you now run 'cpan' it > should start up and tell you its version number, which should be v1.9102 > or higher. If its lower, you didn't manage to install the latest CPAN, > or you haven't managed to tell Perl where your newly installed modules are. > > >> I guess for all future modules I will adopt the option 3 you detailed , >> i.e just have the modules sitting somewhere and use them from there >> >> But I am still interested in getting it done right via CPAN. It will probably also help, if you post the commands you have run and any output (truncated if it's really long), then we can follow what you have tried and make some better suggestions. Cheers Nath From mbasu at mail.nih.gov Fri Aug 10 11:25:35 2007 From: mbasu at mail.nih.gov (Malay) Date: Fri, 10 Aug 2007 11:25:35 -0400 Subject: [Bioperl-l] newbie wants install help In-Reply-To: References: Message-ID: <46BC836F.7010906@mail.nih.gov> hari jayaram wrote: > Hi I am trying to install bioperl as a non root user since I dont have root > access on the machine.
> > I was following the instructions as given on the wiki at > http://bioperl.open-bio.org/wiki/Installing_Bioperl_for_Unix > I started from scratch using perl version v5.8.5 and used cpan to install > the bioperl module prerequisites bundle Bundle::BioPerl since I thought it > was needed. Everything worked just fine > I could use cpan as a non root user following instructions given at > http://www.dcc.fc.up.pt/~pbrandao/aulas/0203/AR/modules_inst_cpan.html > > But when I try to install bioperl using the instructions for non-root I get > an error when I build Module::Build because I am not root. > Iget the same Module::Build error when I try to install without CPAN using > command line script perl Build.PL --install_base option as given on the > wiki. > > Is there a way out > > Thanks for your help in advance > harijay > Brandeis University > This is related your situation and broadly applicable to all perl users in a non root situation. I can tell from my own experience the best way to handle your situation is to use your own Perl, if you are a dedicated perl developer. Just compile and install your own perl installation in any directory of you choice and put the "bin" directory in front of you path and off you go. The advantages are several fold. First, you get a very optimized, fast perl. The sysadmin might have just installed a binary run-of-the-mill perl version. Second, you get all the freedom of installing the very latest updates of all the modules. The sysadmins may be too busy man to update perl frequently. Third, a very common problem with production machine is that they follow strictly the perl installation instruction and avoid threaded perl, which clips your wings particularly, when almost all machines contain multiple processors. The drawbacks are related to finding "/usr/bin/perl" in the shebang line. If you follow the perl way of installing any script, it will take care of it. 
When you develop, use the more portable way of #!/usr/bin/env perl BEGIN {$^W =1 } # Use it switch on compile time warnings (-w) All the best, Malay -- Malay K Basu www.malaybasu.net From gyang at plantbio.uga.edu Fri Aug 10 11:23:36 2007 From: gyang at plantbio.uga.edu (Guojun Yang) Date: Fri, 10 Aug 2007 11:23:36 -0400 Subject: [Bioperl-l] ATTN: Matthew Laird & Elia----blastall call crashed from StandAloneBlast In-Reply-To: 20070809190321.191d0d4a@dogwood.plantbio.uga.edu Message-ID: <20070810152336.898c3979@dogwood.plantbio.uga.edu> Hi, Chris, Interestingly, I found the message in bioperl-l from Matthew Laird 2005 "Blastall & StandAloneBlast". "...the Odd thing is, Blast DOES run. If one comments out this line in StandAloneBlast.pm, the execution succeeds perfectly fine". It seemed to be mysterious when I uncommented the " $self->throw("$executable call crashed: $? $! $commandstring\n") unless ($status==0) ;" line, the blastall runs. The only difference from what Matthew saw is that, when I did not uncomment the line, blastall DID NOT run. Thanks, Guojun
From cjfields at uiuc.edu Fri Aug 10 12:17:38 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 10 Aug 2007 11:17:38 -0500 Subject: [Bioperl-l] ATTN: Matthew Laird & Elia----blastall call crashed from StandAloneBlast In-Reply-To: <20070810152336.898c3979@dogwood.plantbio.uga.edu> References: <20070810152336.898c3979@dogwood.plantbio.uga.edu> Message-ID: <56186844-3CB9-4968-B16F-FD5EE72865A2@uiuc.edu> This should be filed as a bug if possible; could you do that? http://www.bioperl.org/wiki/Bugs Suggestions have been made many times previously that StandAloneBlast, RemoteBlast, etc be combined to use a common API, incorporate other BLAST implementations (i.e. WU-BLAST, NCBI's netblast, etc), and maybe utilize other cross-platform compatible means of running programs and passing off reports to parsers. In fact, Jason, Roger Hall, Torsten, and I discussed tentative plans for plugin-able BLAST wrappers: http://www.bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast Though they have never been acted upon. If I get time towards the end of fall and manage to finish up some other projects I may try taking this on, maybe using the wiki to track progress. chris On Aug 10, 2007, at 10:23 AM, Guojun Yang wrote: > Hi, Chris, > Interestingly, I found the message in bioperl-l from Matthew Laird > 2005 "Blastall & StandAloneBlast". "...the Odd thing is, Blast DOES > run. If one comments out this line in StandAloneBlast.pm, the > execution succeeds perfectly fine". It seemed to be mysterious when > I uncommented the " $self->throw("$executable call crashed: $? $! > $commandstring\n") unless ($status==0) ;" line, the blastall runs. > The only difference from what Matthew saw is that, when I did not > uncomment the line, blastall DID NOT run.
> Thanks, > Guojun > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu] > To: Chris Fields [mailto:cjfields at uiuc.edu] > Cc: bioperl-l at lists.open-bio.org > Sent: Thu, 09 Aug 2007 15:03:21 -0400 > Subject: standalone blastall call crashed, please help > > Hi, Chris, > Thanks a lot for your efforts. With your help, I am gaining more > confidence to fix the cgi code. While the remoteblast problem is > fixed now, I am caught in a local blast problem (see the error > message and subroutine). The line starting with * is line 593 in > the error message. I tried command line blastall, it works fine. I > set the permission to all the blast folders and files, it did not > help much. The same sequence and database works OK if I use command > line blastall. I used the seq object ref $query as query, the error > message gives "-i /tmp/...", does this look like an input problem? > The subroutine was working before early 2006 (on a different > machine), I am wondering whether this is due to changes in the > StandAloneBlast.pm? Best, Guojun > > I set the blast env variables: > > BEGIN {$ENV{BLASTDIR} = '/usr/blast-2.2.10/bin'; } > BEGIN {$ENV{BLASTDB}='/usr/blast-2.2.10/data';} > BEGIN {$ENV{BLASTMAT}='/usr/blast-2.2.10/data';} > $PROGRAMDIR = $ENV{'BLASTDIR'} || ''; > ...... 
> > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: blastall call crashed: -1 /usr/blast-2.2.10/bin/blastall -d "/ > usr/blast-2.2.10/data/swissprot" -e 0.001 -i /tmp/3cjvQyodxg - > o /tmp/4qSSO16EZP -p blastx > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.3/Bio/ > Root/Root.pm:359 > STACK: Bio::Tools::Run::StandAloneBlast::_runblast /usr/lib/perl5/ > site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:813 > STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /usr/ > lib/perl5/site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:760 > STACK: Bio::Tools::Run::StandAloneBlast::blastall /usr/lib/perl5/ > site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:570 > STACK: main::ancestor makcgi07.txt:593 > STACK: makcgi07.txt:208 > sub ancestor { > use Bio::Tools::Run::StandAloneBlast; > use Bio::SearchIO::blast; > > my $query = Bio::Seq -> new ( -seq=>"$_[0]", > -id=>"test"); > print $query->seq(); > my $len=$query->length(); > my $long_name=$_[1]; > my $long_start=$_[2]; > my $long_end=$_[3]; > @db=('swissprot'); > foreach my $db (@db) { > my $factory = Bio::Tools::Run::StandAloneBlast->new(-program => > "blastx", > -database > => "$db", > -e => 1e-3, > ); > * my $blast_report = $factory->blastall($query); > while (my $result = $blast_report->next_result) { > while( my $hit = $result->next_hit()) { > $hit_name=$hit->name; > $hit_name =~ /\S+[|](\S+)[.]\d+[|].*/; > $name=$1; > $desc = $hit->description(); > if ($desc =~ /.*{|\btransposon\b|\btransposase > \b|}.*/i){ > $AN=0; > $replica=0; > while ($ancestor_name[$AN]) { > $replica=1 if (($ancestor_name[$AN] eq > $long_name) && ($hitname[$AN] eq $name)); > $AN+=1; > } > if ($replica==0) { > push @ancestor_name, $long_name; > push @ancestor_start, $long_start; > push @ancestor_end, $long_end; > push @desc, $desc; > push @hitname,$name; > } > } > } > }} > return @ancestor_name, at ancestor_start, at ancestor_end, at desc; > } > > > > > > Christopher Fields Postdoctoral 
Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign

From harijay at gmail.com Fri Aug 10 13:09:32 2007
From: harijay at gmail.com (hari jayaram)
Date: Fri, 10 Aug 2007 13:09:32 -0400
Subject: [Bioperl-l] newbie wants install help
In-Reply-To: <46BC16A9.7090709@sendu.me.uk>
References: <46BB93DC.9010608@sendu.me.uk> <46BC16A9.7090709@sendu.me.uk>
Message-ID:

Hey all,

Thanks for your help. It's working really well now. It turns out I had not set my PERL5LIB environment variable correctly, so Perl was not finding the installed modules (thanks Sendu).

So the steps I followed were:

1) Installed CPAN as myself, as detailed at
   http://www.dcc.fc.up.pt/~pbrandao/aulas/0203/AR/modules_inst_cpan.html
   Importantly, the line which tells CPAN what prefix to use for all module installs:
   PREFIX=~/perl5lib/ LIB=~/perl5lib/lib INSTALLMAN1DIR=~/perl5lib/man1 INSTALLMAN3DIR=~/perl5lib/man3

2) Set PERL5LIB to /home/perl5lib/lib (and not just /home/perl5lib) in the shell. I use cshell, so I edited .cshrc:
   setenv PERL5LIB /home/hari/perl5lib/lib
   setenv MANPATH ${MANPATH}:/home/hari/perl5lib

3) Updated the system CPAN to the latest version. This worked very well once the perl5lib was installed, only it took a while and sometimes stalled with messages like
   done 31/34
   But a CTRL-C got it going again.

4) Made sure I was using the new CPAN v1.9102.

5) Installed Bioperl with the command
   install S/SE/SENDU/bioperl-1.5.2_102.tar.gz

And I was good to go.

I am thinking I will screencast this process for everyone's benefit and put it up on bioscreencast.com, if that will be useful for others.

Thanks to everyone on the group. Now the journey begins.

Hari Jayaram

On 8/10/07, Sendu Bala wrote: > hari jayaram wrote: > > Hi Sendu , > > Hi, please post back to the list as well, so others can benefit. > > > > Well after going through a few attempts at installing Bundle::CPAN I > > gave up. > > It always had weird timeout issues .
ANd kept re-installing everything > > on restarting the CPAN shell > > After a while I thought it did complete - since it retunred me to the shell > > > > I tried the CPAN install of bioperl at that point > > > > ANd bingo I got booted out at the exact same point when the Bioperl > > install tried to re-install(?) Module:Build which failed as non root > > Did you follow steps 7 and 8 of > http://www.dcc.fc.up.pt/~pbrandao/aulas/0203/AR/modules_inst_cpan.html ? > > If you managed to install Bundle::CPAN, when you now run 'cpan' it > should start up and tell you its version number, which should be v1.9102 > or higher. If its lower, you didn't manage to install the latest CPAN, > or you haven't managed to tell Perl where your newly installed modules are. > > > > I guess for all future modules I will adopt the option 3 you detailed , > > i.e just have the modules sitting somewhere and use them from there > > > > But I am still interested in getting it done right via CPAN. > From torsten.seemann at infotech.monash.edu.au Fri Aug 10 17:48:56 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Sat, 11 Aug 2007 07:48:56 +1000 Subject: [Bioperl-l] ATTN: Matthew Laird & Elia----blastall call crashed from StandAloneBlast In-Reply-To: <20070810152336.898c3979@dogwood.plantbio.uga.edu> References: <20070809190321.191d0d4a@dogwood.plantbio.uga.edu> <20070810152336.898c3979@dogwood.plantbio.uga.edu> Message-ID: > Interestingly, I found the message in bioperl-l from Matthew Laird 2005 "Blastall & StandAloneBlast". "...the Odd thing is, Blast DOES run. If one comments out this line in StandAloneBlast.pm, the execution succeeds perfectly fine". It seemed to be mysterious when I uncommented the " $self->throw("$executable call crashed: $? $! $commandstring\n") unless ($status==0) ;" line, the blastall runs. The only difference from what Matthew saw is that, when I did not uncomment the line, blastall DID NOT run. 
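
For reference, the "-1" in "blastall call crashed: -1" is the value of $? that the quoted throw line prints. Below is a minimal sketch, in plain Perl and independent of BioPerl, of how such a status can be decoded following 'perldoc -f system'. The helper name describe_status is hypothetical and is not part of StandAloneBlast.pm. A value of -1 means Perl could not start the program or could not wait() for it, and $! then holds the reason. One possible cause when blastall itself appears to run, which is an assumption and not confirmed anywhere in this thread, is a CGI environment that sets $SIG{CHLD} to 'IGNORE'.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): turn the value returned by
# system(), i.e. $?, into a readable message, following perldoc -f system.
sub describe_status {
    my ($status, $errno) = @_;
    $errno = '' unless defined $errno;
    return 'ok' if $status == 0;
    # -1: the program could not be started, or wait() failed; $! has the reason
    return "failed to execute or wait: $errno" if $status == -1;
    # low 7 bits: number of the signal that killed the child
    return sprintf('died with signal %d', $status & 127) if $status & 127;
    # high byte: the child's own exit code
    return sprintf('exited with code %d', $status >> 8);
}

# Usage sketch with a harmless command standing in for blastall:
my $status = system($^X, '-e', 'exit 3');
print describe_status($status, "$!"), "\n";
```

Printing a message like this next to the command string would at least show whether the failure is in launching blastall or in blastall's own exit code.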
Yes, Matthew is one of the authors of PSORTB and I spent a bit of time last year trying to fix this problem (unsuccessfully). The PSORTB docs http://www.psort.org/downloads/index.html explain how to get around this problem just as Guojun describes. I use a custom BioPerl installation just for PSORTB! I was under the impression it was already filed as a bug, but my searching indicates this is not so. -- --Torsten Seemann --Victorian Bioinformatics Consortium, Monash University From cjfields at uiuc.edu Fri Aug 10 18:04:20 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 10 Aug 2007 17:04:20 -0500 Subject: [Bioperl-l] ATTN: Matthew Laird & Elia----blastall call crashed from StandAloneBlast In-Reply-To: References: <20070809190321.191d0d4a@dogwood.plantbio.uga.edu> <20070810152336.898c3979@dogwood.plantbio.uga.edu> Message-ID: <41A08079-6EEC-4B62-8104-C41E70C03083@uiuc.edu> On Aug 10, 2007, at 4:48 PM, Torsten Seemann wrote: >> Interestingly, I found the message in bioperl-l from Matthew Laird >> 2005 "Blastall & StandAloneBlast". "...the Odd thing is, Blast >> DOES run. If one comments out this line in StandAloneBlast.pm, >> the execution succeeds perfectly fine". It seemed to be mysterious >> when I uncommented the " $self->throw("$executable call crashed: >> $? $! $commandstring\n") unless ($status==0) ;" line, the blastall >> runs. The only difference from what Matthew saw is that, when I >> did not uncomment the line, blastall DID NOT run. > > Yes, Matthew is one of the authors of PSORTB and I spent a bit of time > last year trying to fix this problem (unsuccessfully). The PSORTB docs > http://www.psort.org/downloads/index.html > explain how to get around this problem just as Guojun describes. I use > a custom BioPerl installation just for PSORTB! > > I was under the impression it was already filed as a bug, but my > searching indicates this is not so. 
> > -- > --Torsten Seemann > --Victorian Bioinformatics Consortium, Monash University > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Might be wise to go ahead and add it to bugzilla so we can track it, along with the workaround. chris From neetisomaiya at gmail.com Mon Aug 13 06:29:39 2007 From: neetisomaiya at gmail.com (neeti somaiya) Date: Mon, 13 Aug 2007 15:59:39 +0530 Subject: [Bioperl-l] Homologene parser? Message-ID: <764978cf0708130329r484bf210qd0e6e9ad39274bbc@mail.gmail.com> Hi, Does anyone know of any Homologene parser, if available? Please let me know. Thanks and Regards, Neeti. -- -Neeti Even my blood says, B positive From shameer at ncbs.res.in Mon Aug 13 07:07:45 2007 From: shameer at ncbs.res.in (Shameer Khadar) Date: Mon, 13 Aug 2007 16:37:45 +0530 (IST) Subject: [Bioperl-l] Bio::Graphics : To change scale color & sort and add direction to SeqFeature In-Reply-To: <6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com> References: <10259461.post@talk.nabble.com> <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in> <1178028249.2644.13.camel@localhost.localdomain> <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in> <6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com> Message-ID: <51133.192.168.1.1.1187003265.squirrel@mail.ncbs.res.in> Dear All, I am generating images based on Transcription Factor binding site data using bio::graphics module. I created my images using program : version-2 [http://stein.cshl.org/genome_informatics/BioGraphics/] (Courtsey : L. Stein ). I attaching one of the image with this mail. I need to make 3 changes to this image 1. to color the 'scale' Color the scale in two different colors ie, from start 1.0k - color blue from 101 - till end of the scale green (I thoroghly checked the Bio::Graphics document, I couldnt find an option to do this ) 2. 
to sort the Transcription factors based on the z_score 3. to give forward/reverse [> or < ]direction for the black boxes I would appreaciate if any one can give me some clues/link to accomplish this :). thanks in advance , Shameer -- Shameer Khadar Lab (# 25) The Computational Biology Group National Centre for Biological Sciences (TIFR) GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India T - 91-080-23666001 EXT - 6251 W - http://www.ncbs.res.in -------------- next part -------------- A non-text attachment was scrubbed... Name: TF_top3.png Type: image/png Size: 2188 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070813/6a4423bd/attachment.png From bix at sendu.me.uk Mon Aug 13 09:11:50 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 13 Aug 2007 14:11:50 +0100 Subject: [Bioperl-l] Bio::Graphics : To change scale color & sort and add direction to SeqFeature In-Reply-To: <51133.192.168.1.1.1187003265.squirrel@mail.ncbs.res.in> References: <10259461.post@talk.nabble.com> <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in> <1178028249.2644.13.camel@localhost.localdomain> <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in> <6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com> <51133.192.168.1.1.1187003265.squirrel@mail.ncbs.res.in> Message-ID: <46C05896.1010002@sendu.me.uk> Shameer Khadar wrote: > Dear All, > > I am generating images based on Transcription Factor binding site data > using bio::graphics module. > I created my images using program : version-2 > [http://stein.cshl.org/genome_informatics/BioGraphics/] (Courtsey : L. > Stein ). I attaching one of the image with this mail. > > I need to make 3 changes to this image > > 1. 
to color the 'scale' > Color the scale in two different colors ie, from start 1.0k - color blue > from 101 - till end of the scale green (I thoroghly checked the > Bio::Graphics document, I couldnt find an option to do this ) The scale is just a scale and shouldn't need colouring. You can do what you want by having a blue 'upstream' feature and a green 'gene' feature in the first row. > 2. to sort the Transcription factors based on the z_score I don't know Bio::Graphics well enough, but am interested in the answer... > 3. to give forward/reverse [> or < ]direction for the black boxes Presumably you just change the glyph type of your binding sites to something that shows direction, like 'processed_transcript'. Someone else may have a more appropriate suggestion. However, do your binding sites really have a direction? That is, do you really know which strand your transcription factor bound to? From cjfields at uiuc.edu Mon Aug 13 10:39:11 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 13 Aug 2007 09:39:11 -0500 Subject: [Bioperl-l] Bio::Graphics : To change scale color & sort and add direction to SeqFeature In-Reply-To: <51133.192.168.1.1.1187003265.squirrel@mail.ncbs.res.in> References: <10259461.post@talk.nabble.com> <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in> <1178028249.2644.13.camel@localhost.localdomain> <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in> <6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com> <51133.192.168.1.1.1187003265.squirrel@mail.ncbs.res.in> Message-ID: <871544DF-19F0-4C6A-849E-514D8B7BAA12@uiuc.edu> On Aug 13, 2007, at 6:07 AM, Shameer Khadar wrote: > Dear All, > > I am generating images based on Transcription Factor binding site data > using bio::graphics module. > I created my images using program : version-2 > [http://stein.cshl.org/genome_informatics/BioGraphics/] (Courtsey : L. > Stein ). I attaching one of the image with this mail. > > I need to make 3 changes to this image > > 1. 
to color the 'scale' > Color the scale in two different colors ie, from start 1.0k - color > blue > from 101 - till end of the scale green (I thoroghly checked the > Bio::Graphics document, I couldnt find an option to do this ) Much of the documentation you need is available via 'perldoc Bio::Graphics::Panel' and the various Bio::Graphics::Glyph classes. The above may be possible using two seqfeatures instead of one or maybe a split location with a callback (not sure, haven't tried either, mileage may vary, batteries not included, warranty void if packaging is opened, etc). Might be worth checking out the POD for the arrow glyph to see what's possible. > 2. to sort the Transcription factors based on the z_score In Bio::Graphics::Panel POD under 'Glyph Options', there is documentation for 'sort_order' which accepts callbacks. According to the docs you would basically do something like the following (the prototype is required; note the score): -sort_order => sub ($$) { my ($glyph1,$glyph2) = @_; my $a = $glyph1->feature; my $b = $glyph2->feature; ( $b->score/log($b->length) <=> $a->score/log($a->length) ) || ( $a->start <=> $b->start ) } Again, haven't tried. > 3. to give forward/reverse [> or < ]direction for the black boxes I think you first need to ensure the glyph will accept strandedness, though I think most do. Then you would set either the 'strand_arrow' or 'stranded' option to 1 (they are synonyms). Again, see Bio::Graphics::Panel POD under Glyph Options, specifically the parameter 'stranded' or 'strand_arrow'. > I would appreaciate if any one can give me some clues/link to > accomplish > this :). > thanks in advance , > Shameer No problem! 
chris > -- > Shameer Khadar > Lab (# 25) The Computational Biology Group > National Centre for Biological Sciences (TIFR) > GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India > T - 91-080-23666001 EXT - 6251 > W - http://www.ncbs.res.in > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From shameer at ncbs.res.in Mon Aug 13 10:47:35 2007 From: shameer at ncbs.res.in (Shameer Khadar) Date: Mon, 13 Aug 2007 20:17:35 +0530 (IST) Subject: [Bioperl-l] Bio::Graphics : To change scale color & sort and add direction to SeqFeature In-Reply-To: <46C05896.1010002@sendu.me.uk> References: <10259461.post@talk.nabble.com> <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in> <1178028249.2644.13.camel@localhost.localdomain> <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in> <6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com> <51133.192.168.1.1.1187003265.squirrel@mail.ncbs.res.in> <46C05896.1010002@sendu.me.uk> Message-ID: <59564.192.168.1.1.1187016455.squirrel@mail.ncbs.res.in> Dear Sendu, Thanks for your reply. >> I need to make 3 changes to this image >> >> 1. to color the 'scale' >> Color the scale in two different colors ie, from start 1.0k - color blue >> from 101 - till end of the scale green (I thoroghly checked the >> Bio::Graphics document, I couldnt find an option to do this ) > > The scale is just a scale and shouldn't need colouring. You can do what > you want by having a blue 'upstream' feature and a green 'gene' feature > in the first row. Thanks for the point : 'The scale is just a scale...'. But my idea is to differentiate the scale in to three to diffentiate between 100bp upstream region, UTR and gene start site. starting point of scale till 0k is the 100bp upstream. 
From 0k till the end of the current_scale is the UTR, and the gene starts
from the end of the scale. Since this is a bit tough to distinguish, we
thought of this coloring option. Adding an extra track is an alternative
option (I tried to convince our experimental team by adding an extra
track, but they want it this way :(..)

>
>> 2. to sort the Transcription factors based on the z_score
> I don't know Bio::Graphics well enough, but am interested in the answer...
>

It is possible; the sort_order option is available. I tried it a couple
of times but it is not working.

>
>> 3. to give forward/reverse [> or < ]direction for the black boxes
>
> Presumably you just change the glyph type of your binding sites to
> something that shows direction, like 'processed_transcript'. Someone
> else may have a more appropriate suggestion.

Thanks, I will look into it.

>
> However, do your binding sites really have a direction? That is, do you
> really know which strand your transcription factor bound to?

Yes, we collated this information from various experimental datasets.
--
Shameer Khadar
Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in

From bix at sendu.me.uk Mon Aug 13 11:01:43 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 13 Aug 2007 16:01:43 +0100
Subject: [Bioperl-l] Bio::Graphics : To change scale color & sort and add direction to SeqFeature
In-Reply-To: <59564.192.168.1.1.1187016455.squirrel@mail.ncbs.res.in>
References: <10259461.post@talk.nabble.com> <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in> <1178028249.2644.13.camel@localhost.localdomain> <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in> <6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com> <51133.192.168.1.1.1187003265.squirrel@mail.ncbs.res.in> <46C05896.1010002@sendu.me.uk> <59564.192.168.1.1.1187016455.squirrel@mail.ncbs.res.in>
Message-ID: <46C07257.1000308@sendu.me.uk>

Shameer Khadar wrote:
>> However, do your binding sites really have a direction? That is, do you
>> really know which strand your transcription factor bound to?
>
> Yes, these info we collated from various experimental datasets.

Well, those datasets I'd like to see... What I was getting at is that the
strand probably isn't known at the experimental level, but to describe
the site a strand has to be arbitrarily picked so you can write the
sequence of the site down as a single string. It's probably the case that
the strand information you have is just the way it happened to be
reported in the literature and has no biological meaning.
From shameer at ncbs.res.in Mon Aug 13 11:16:33 2007 From: shameer at ncbs.res.in (Shameer Khadar) Date: Mon, 13 Aug 2007 20:46:33 +0530 (IST) Subject: [Bioperl-l] Bio::Graphics : To change scale color & sort and add direction to SeqFeature In-Reply-To: <871544DF-19F0-4C6A-849E-514D8B7BAA12@uiuc.edu> References: <10259461.post@talk.nabble.com> <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in> <1178028249.2644.13.camel@localhost.localdomain> <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in> <6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com> <51133.192.168.1.1.1187003265.squirrel@mail.ncbs.res.in> <871544DF-19F0-4C6A-849E-514D8B7BAA12@uiuc.edu> Message-ID: <42833.192.168.1.1.1187018193.squirrel@mail.ncbs.res.in> Chris, Thanks for your detailed reply. I will read up the docs and try different options using ur code snippets as starting point. I will get back to the list with my results. Thanks -- Shameer > > On Aug 13, 2007, at 6:07 AM, Shameer Khadar wrote: > >> Dear All, >> >> I am generating images based on Transcription Factor binding site data >> using bio::graphics module. >> I created my images using program : version-2 >> [http://stein.cshl.org/genome_informatics/BioGraphics/] (Courtsey : L. >> Stein ). I attaching one of the image with this mail. >> >> I need to make 3 changes to this image >> >> 1. to color the 'scale' >> Color the scale in two different colors ie, from start 1.0k - color >> blue >> from 101 - till end of the scale green (I thoroghly checked the >> Bio::Graphics document, I couldnt find an option to do this ) > > Much of the documentation you need is available via 'perldoc > Bio::Graphics::Panel' and the various Bio::Graphics::Glyph classes. > The above may be possible using two seqfeatures instead of one or > maybe a split location with a callback (not sure, haven't tried > either, mileage may vary, batteries not included, warranty void if > packaging is opened, etc). 
Might be worth checking out the POD for > the arrow glyph to see what's possible. > >> 2. to sort the Transcription factors based on the z_score > > In Bio::Graphics::Panel POD under 'Glyph Options', there is > documentation for 'sort_order' which accepts callbacks. According to > the docs you would basically do something like the following (the > prototype is required; note the score): > > -sort_order => sub ($$) { > my ($glyph1,$glyph2) = @_; > my $a = $glyph1->feature; > my $b = $glyph2->feature; > ( $b->score/log($b->length) > <=> > $a->score/log($a->length) ) > || > ( $a->start <=> $b->start ) > } > > Again, haven't tried. > >> 3. to give forward/reverse [> or < ]direction for the black boxes > > I think you first need to ensure the glyph will accept strandedness, > though I think most do. Then you would set either the 'strand_arrow' > or 'stranded' option to 1 (they are synonyms). Again, see > Bio::Graphics::Panel POD under Glyph Options, specifically the > parameter 'stranded' or 'strand_arrow'. > -- Shameer Khadar Lab (# 25) The Computational Biology Group National Centre for Biological Sciences (TIFR) GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India T - 91-080-23666001 EXT - 6251 W - http://www.ncbs.res.in From bix at sendu.me.uk Mon Aug 13 11:47:10 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 13 Aug 2007 16:47:10 +0100 Subject: [Bioperl-l] newbie wants install help In-Reply-To: References: <46BB93DC.9010608@sendu.me.uk> <46BC16A9.7090709@sendu.me.uk> Message-ID: <46C07CFE.7020105@sendu.me.uk> hari jayaram wrote: > Hey all , > Thanks for your help. Its working real well now. [snip] > I am thinking I will screencast this process for everyones benefit and > put it up on bioscreencast.com . If that will > be useful for others. I'm certain it will. That's a very interesting website. Thanks for taking the time, and I hope you find Bioperl useful. 
From cjfields at uiuc.edu Mon Aug 13 12:24:15 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 13 Aug 2007 11:24:15 -0500 Subject: [Bioperl-l] Bio::Graphics : To change scale color & sort and add direction to SeqFeature In-Reply-To: <46C07257.1000308@sendu.me.uk> References: <10259461.post@talk.nabble.com> <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in> <1178028249.2644.13.camel@localhost.localdomain> <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in> <6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com> <51133.192.168.1.1.1187003265.squirrel@mail.ncbs.res.in> <46C05896.1010002@sendu.me.uk> <59564.192.168.1.1.1187016455.squirrel@mail.ncbs.res.in> <46C07257.1000308@sendu.me.uk> Message-ID: On Aug 13, 2007, at 10:01 AM, Sendu Bala wrote: > Shameer Khadar wrote: >>> However, do your binding sites really have a direction? That is, >>> do you >>> really know which strand your transcription factor bound to? >> >> Yes, these info we collated from various experimental datasets. > > Well, those datasets I'd like to see... What I was getting at is the > strand probably isn't known at the experimental level, but to describe > the site a strand has to be arbitrarily picked so you can write the > sequence of the site down as a single string. Its probably the case > that > the strand information you have is just the way it happened to be > reported in the literature and has no biological meaning. It's subjective. I can think of several cases where strandedness does matter and has meaning. If the motif is related to how the gene is transcribed or post-transcriptionally regulated, for instance; elements which indicate start of transcription (-10/-35 or any sigma- factor-related promoter element in prokaryotes), end of transcription (poly-A signal, transcription terminators), modulation of translation (SECIS, IRES), or conserved DNA motifs which are transcribed prior to regulation (RNA-binding proteins like IRE). 
chris From amacgregor at ccg.murdoch.edu.au Mon Aug 13 20:52:10 2007 From: amacgregor at ccg.murdoch.edu.au (Andrew Macgregor) Date: Tue, 14 Aug 2007 08:52:10 +0800 Subject: [Bioperl-l] Homologene parser? In-Reply-To: <764978cf0708130329r484bf210qd0e6e9ad39274bbc@mail.gmail.com> References: <764978cf0708130329r484bf210qd0e6e9ad39274bbc@mail.gmail.com> Message-ID: <22FEA28C-1720-4519-B129-B7A5ADA452D4@ccg.murdoch.edu.au> On 13/08/2007, at 6:29 PM, neeti somaiya wrote: > Hi, > > Does anyone know of any Homologene parser, if available? > Please let me know. > > Thanks and Regards, > Neeti. Hi Neeti, Quite a long time ago now I wrote an Homologene parser and posted it to the mailing list: I don't know if this still works but you could use it as a starting point. There may also be something newer out there too, I don't know. If you search the mailing list archives you'll get a few messages around the topic. Cheers, Andrew. Andrew Macgregor Centre for Comparative Genomics, Murdoch University Email: amacgregor at ccg.murdoch.edu.au Tel: (08) 9360 2961 From cjfields at uiuc.edu Mon Aug 13 23:21:54 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 13 Aug 2007 22:21:54 -0500 Subject: [Bioperl-l] Homologene parser? In-Reply-To: <22FEA28C-1720-4519-B129-B7A5ADA452D4@ccg.murdoch.edu.au> References: <764978cf0708130329r484bf210qd0e6e9ad39274bbc@mail.gmail.com> <22FEA28C-1720-4519-B129-B7A5ADA452D4@ccg.murdoch.edu.au> Message-ID: <4E7F8A99-68A7-49C2-9919-E2FC5652C8D7@uiuc.edu> It looks like Heikki responded and thought a good place for it would be Bio::SeqIO, but it didn't go anywhere I suppose. I see that a few other posts suggest it could be placed in Bio::Cluster as well which I'm not familiar with. We could add it in if you were still interested, just need to find a good place for it; might be nice to have a Parse::RecDescent-based parser. 
chris On Aug 13, 2007, at 7:52 PM, Andrew Macgregor wrote: > On 13/08/2007, at 6:29 PM, neeti somaiya wrote: > >> Hi, >> >> Does anyone know of any Homologene parser, if available? >> Please let me know. >> >> Thanks and Regards, >> Neeti. > > Hi Neeti, > > Quite a long time ago now I wrote an Homologene parser and posted it > to the mailing list: > > > > I don't know if this still works but you could use it as a starting > point. There may also be something newer out there too, I don't know. > If you search the mailing list archives you'll get a few messages > around the topic. > > Cheers, Andrew. > > > Andrew Macgregor > Centre for Comparative Genomics, Murdoch University > Email: amacgregor at ccg.murdoch.edu.au > Tel: (08) 9360 2961 > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From n.haigh at sheffield.ac.uk Tue Aug 14 03:46:19 2007 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Tue, 14 Aug 2007 08:46:19 +0100 Subject: [Bioperl-l] Warnings/errors generated by Eclipse Message-ID: <46C15DCB.80603@sheffield.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I've just been setting up Eclipse with the EPIC plugin, and it's generating some errors and warnings about bioperl-live that I'd like to pass by you. I think most of the errors are along the lines of: "Can't find 'build_params' in _build in /usr/local/share/perl/5.8.8/Module/Build/Base.pm line 1011" This occurs with files like: t/Biblio_biofetch.t t/seqread_fail.t I think it's to do with the parameters passed to test_begin() or it could be my setup of Eclipse? Other highlighted problems are some of the scripts in the examples dir. Some require modules that reside in the bioperl-run package. 
Would it be wise to move these to the bioperl-run examples dir? There may also be some problems with XML files in t/data e.g. t/data/interpro_ebi.xml There appears to be a typo on line 2. However, I'm not sure this is up-to-date? I can comment on the others later if required. Cheers Nath -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGwV3KczuW2jkwy2gRApM/AJ9abWl02CAJqDK2sEXEUEg8nGRC4ACdHcAb nZmh+1dmtc1W9mThkUVKitw= =5eXZ -----END PGP SIGNATURE----- From amacgregor at ccg.murdoch.edu.au Tue Aug 14 01:14:58 2007 From: amacgregor at ccg.murdoch.edu.au (Andrew Macgregor) Date: Tue, 14 Aug 2007 13:14:58 +0800 Subject: [Bioperl-l] Homologene parser? In-Reply-To: <4E7F8A99-68A7-49C2-9919-E2FC5652C8D7@uiuc.edu> References: <764978cf0708130329r484bf210qd0e6e9ad39274bbc@mail.gmail.com> <22FEA28C-1720-4519-B129-B7A5ADA452D4@ccg.murdoch.edu.au> <4E7F8A99-68A7-49C2-9919-E2FC5652C8D7@uiuc.edu> Message-ID: On 14/08/2007, at 11:21 AM, Chris Fields wrote: > It looks like Heikki responded and thought a good place for it > would be Bio::SeqIO, but it didn't go anywhere I suppose. I see > that a few other posts suggest it could be placed in Bio::Cluster > as well which I'm not familiar with. We could add it in if you > were still interested, just need to find a good place for it; might > be nice to have a Parse::RecDescent-based parser. > > chris > Hi Chris, I was also doing some parsing of UniGene at the time but found RecDescent was too slow and went back to regexes. That code found it's way into Bio::Cluster. Occasionally I see a message with someone looking for a Homologene parser but not very often, so I'm not sure it is worth the effort of moving the code into bioperl. Cheers, Andrew. From neetisomaiya at gmail.com Tue Aug 14 09:24:07 2007 From: neetisomaiya at gmail.com (neeti somaiya) Date: Tue, 14 Aug 2007 18:54:07 +0530 Subject: [Bioperl-l] Homologene parser? 
In-Reply-To: <22FEA28C-1720-4519-B129-B7A5ADA452D4@ccg.murdoch.edu.au> References: <764978cf0708130329r484bf210qd0e6e9ad39274bbc@mail.gmail.com> <22FEA28C-1720-4519-B129-B7A5ADA452D4@ccg.murdoch.edu.au> Message-ID: <764978cf0708140624s5c198b5akee38bf98866fd7f2@mail.gmail.com> Hi Andrew, I think the homologene data files have changed now on the ftp, from what you had used. It is now homologene.data and homologene.xml. I tried using your parser, but because it was written on the file hmlg.trip.ftp, it doesnt work anymore. I came across a parser http://bioinformatics.tgen.org/brunit/software/bioparser/docs/pod_bio_parser_homologene_fileparser_pm.shtml . I am looking at it to see if it works for me. NOt sure if it will. ~Neeti. On 8/14/07, Andrew Macgregor wrote: > > On 13/08/2007, at 6:29 PM, neeti somaiya wrote: > > > Hi, > > > > Does anyone know of any Homologene parser, if available? > > Please let me know. > > > > Thanks and Regards, > > Neeti. > > Hi Neeti, > > Quite a long time ago now I wrote an Homologene parser and posted it > to the mailing list: > > > > I don't know if this still works but you could use it as a starting > point. There may also be something newer out there too, I don't know. > If you search the mailing list archives you'll get a few messages > around the topic. > > Cheers, Andrew. > > > Andrew Macgregor > Centre for Comparative Genomics, Murdoch University > Email: amacgregor at ccg.murdoch.edu.au > Tel: (08) 9360 2961 > > > > -- -Neeti Even my blood says, B positive From bix at sendu.me.uk Tue Aug 14 10:57:29 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 14 Aug 2007 15:57:29 +0100 Subject: [Bioperl-l] Should coords be adjusted after removing alignment columns? Message-ID: <46C1C2D9.6050409@sendu.me.uk> I'm looking at what looks like a pretty major bug in Bio::SimpleAlign, but before I commit the fix I wanted to check my sanity/understanding. 
My understanding is that an alignment may be built from just sub-parts of a number of sequences. So you give each sequence in the alignment a start and stop so you can later map back the aligned region to the original sequence. So, for example, the following should all pass: diff -r1.56 SimpleAlign.t 459a460,540 > > > # is _remove_col really working correctly? > my $a = Bio::LocatableSeq->new(-id => 'a', -seq => 'atcgatcgatcgatcg', -start => 5, -end => 20); > my $b = Bio::LocatableSeq->new(-id => 'b', -seq => '-tcgatc-atcgatcg', -start => 30, -end => 43); > my $c = Bio::LocatableSeq->new(-id => 'c', -seq => 'atcgatcgatc-atc-', -start => 50, -end => 63); > my $d = Bio::LocatableSeq->new(-id => 'd', -seq => '--cgatcgatcgat--', -start => 80, -end => 91); > my $e = Bio::LocatableSeq->new(-id => 'e', -seq => '-t-gatcgatcga-c-', -start => 100, -end => 111); > $aln = Bio::SimpleAlign->new(); > $aln->add_seq($a); > $aln->add_seq($b); > $aln->add_seq($c); > > my $gapless = $aln->remove_gaps(); > foreach my $seq ($gapless->each_seq) { > if ($seq->id eq 'a') { > is $seq->start, 6; > is $seq->end, 19; > is $seq->seq, 'tcgatcatcatc'; > } > elsif ($seq->id eq 'b') { > is $seq->start, 30; > is $seq->end, 42; > is $seq->seq, 'tcgatcatcatc'; > } > elsif ($seq->id eq 'c') { > is $seq->start, 51; > is $seq->end, 63; > is $seq->seq, 'tcgatcatcatc'; > } > } > > $aln->add_seq($d); > $aln->add_seq($e); > $gapless = $aln->remove_gaps(); > foreach my $seq ($gapless->each_seq) { > if ($seq->id eq 'a') { > is $seq->start, 8; > is $seq->end, 17; > is $seq->seq, 'gatcatca'; > } > elsif ($seq->id eq 'b') { > is $seq->start, 32; > is $seq->end, 40; > is $seq->seq, 'gatcatca'; > } > elsif ($seq->id eq 'c') { > is $seq->start, 53; > is $seq->end, 61; > is $seq->seq, 'gatcatca'; > } > elsif ($seq->id eq 'd') { > is $seq->start, 81; > is $seq->end, 90; > is $seq->seq, 'gatcatca'; > } > elsif ($seq->id eq 'e') { > is $seq->start, 101; > is $seq->end, 110; > is $seq->seq, 'gatcatca'; > } > } > > my $f 
= Bio::LocatableSeq->new(-id => 'f', -seq => 'a-cgatcgatcgat-g', -start => 30, -end => 43); > $aln = Bio::SimpleAlign->new(); > $aln->add_seq($a); > $aln->add_seq($f); > > $gapless = $aln->remove_gaps(); > foreach my $seq ($gapless->each_seq) { > if ($seq->id eq 'a') { > is $seq->start, 5; > is $seq->end, 20; > is $seq->seq, 'acgatcgatcgatg'; > } > elsif ($seq->id eq 'f') { > is $seq->start, 30; > is $seq->end, 43; > is $seq->seq, 'acgatcgatcgatg'; > } > } But they don't. Once you remove certain columns the start and stop of the sequences in the alignment are no longer correct coordinates for the sub-sequence in the original sequence. I propose the following patch to resolve this issue: diff -r1.136 SimpleAlign.pm 1116c1116,1118 < --- > > my $gap = $self->gap_char; > 1129,1137c1131,1147 < my $spliced; < $spliced .= $start > 0 ? substr($sequence,0,$start) : ''; < $spliced .= substr($sequence,$end+1,$seq->length-$end+1); < $sequence = $spliced; < if ($start == 1) { < $new_seq->start($end); < } < else { < $new_seq->start( $seq->start); --- > my $orig = $sequence; > my $head = $start > 0 ? substr($sequence, 0, $start) : ''; > my $tail = ($end + 1) >= length($sequence) ? 
'' : substr($sequence, $end + 1); > $sequence = $head.$tail; > # start > unless (defined $new_seq->start) { > if ($start == 0) { > my $start_adjust = () = substr($orig, 0, $end + 1) =~ /$gap/g; > $new_seq->start($seq->start + $end + 1 - $start_adjust); > } > else { > my $start_adjust = $orig =~ /$gap+/; > if ($start_adjust) { > $start_adjust = $+[0] - 1 < $start; > } > $new_seq->start($seq->start + $start_adjust); > } 1140,1141c1150,1152 < if($end >= $seq->end){ < $new_seq->end( $start); --- > if (($end + 1) >= length($orig)) { > my $end_adjust = () = substr($orig, $start) =~ /$gap/g; > $new_seq->end($seq->end - (length($orig) - $start) + $end_adjust); 1144c1155 < $new_seq->end($seq->end); --- > $new_seq->end($seq->end); 1148c1159 < push @new, $new_seq; --- > push @new, $new_seq; 1207,1209c1218,1234 < # sort the positions to remove columns at the end 1st < @$positions = sort { $b->[0] <=> $a->[0] } @$positions; < $aln = $self->_remove_col($aln,$positions); --- > # sort the positions > @$positions = sort { $a->[0] <=> $b->[0] } @$positions; > > my @remove; > my $length = 0; > foreach my $pos (@{$positions}) { > my ($start, $end) = @{$pos}; > > #have to offset the start and end for subsequent removes > $start-=$length; > $end -=$length; > $length += ($end-$start+1); > push @remove, [$start,$end]; > } > > #remove the segments > $aln = $#remove >= 0 ? $self->_remove_col($aln,\@remove) : $self; This breaks 2 tests in SimpleAlign.t, but as far as I can tell, those tests expect the wrong answer. Changed to expect the correct answer, SimpleAlign.t and all other tests in the test suite pass. diff -r1.56 SimpleAlign.t 214,215c214,215 < "P84139/1-33 NEGEHQIKLDELFEKLLRARLIFKNKDVLRRC\n". < "P814153/1-33 NEGMHQIKLDVLFEKLLRARLIFKNKDVLRRC\n". --- > "P84139/2-33 NEGEHQIKLDELFEKLLRARLIFKNKDVLRRC\n". > "P814153/2-33 NEGMHQIKLDVLFEKLLRARLIFKNKDVLRRC\n". 
229c229 < "gb|443893|124775/1-32 -RFRIKVPPAVEGARPALLIFKSRPELGC\n", --- > "gb|443893|124775/2-32 -RFRIKVPPAVEGARPALLIFKSRPELGC\n", Can someone triple-check my thinking and report back please? Cheers, Sendu. From basu at pharm.sunysb.edu Tue Aug 14 11:02:06 2007 From: basu at pharm.sunysb.edu (Siddhartha Basu) Date: Tue, 14 Aug 2007 11:02:06 -0400 Subject: [Bioperl-l] Homologene parser? In-Reply-To: <764978cf0708140624s5c198b5akee38bf98866fd7f2@mail.gmail.com> References: <764978cf0708130329r484bf210qd0e6e9ad39274bbc@mail.gmail.com> <22FEA28C-1720-4519-B129-B7A5ADA452D4@ccg.murdoch.edu.au> <764978cf0708140624s5c198b5akee38bf98866fd7f2@mail.gmail.com> Message-ID: <46C1C3EE.4030006@pharm.sunysb.edu> neeti somaiya wrote: > Hi Andrew, > > I think the homologene data files have changed now on the ftp, from what you > had used. > It is now homologene.data and homologene.xml. > I tried using your parser, but because it was written on the file > hmlg.trip.ftp, it doesnt work anymore. > > I came across a parser > http://bioinformatics.tgen.org/brunit/software/bioparser/docs/pod_bio_parser_homologene_fileparser_pm.shtml > . > I am looking at it to see if it works for me. NOt sure if it will. > > ~Neeti. Hi Neeti, I have recently written a parser for 'homologene' xml data specific for my purpose. I am not sure whether it will suit your purpose but it could be extended for general purpose parsing, so i am putting it forward. Here is how it works ....... * It only parses a single homologene entry ...... * It does SAX based parsing (currently uses XML::SAX::ExpatXS) * Returns a graph(uses Graph module of perl) object where each node is a homologue entry with its corresponding entrez gene id. Each node also contain the following attributes ... * Refseq protein id. * Protein id (pid) * ncbi taxon id. * The edge attribute contain information about the ortholog(true/false) relationship between two nodes. * The rest of tags currently are not being extracted. 
However, parsing the rest of the tags should not be very difficult. Generally i get homologene xml stream from an 'efetch' through Bio::DB::EUtilities, feed it to the parser, gets back 'Graph' object and then works on it. So, to make it more generic and work on local file * We need another class that reads the chunk between ..... and sends it to the parser. * Add supports for most of the tags. * Massage the data to a bioperl compatible object. The first two i could work it out and for the last one i have to figure out the bioperl object that could be suitable (like Bio::Cluster or Bio::NetWork::Node/Edge). Let me know if it sounds interesting and i will send you the code. -siddhartha > > On 8/14/07, Andrew Macgregor wrote: >> On 13/08/2007, at 6:29 PM, neeti somaiya wrote: >> >>> Hi, >>> >>> Does anyone know of any Homologene parser, if available? >>> Please let me know. >>> >>> Thanks and Regards, >>> Neeti. >> Hi Neeti, >> >> Quite a long time ago now I wrote an Homologene parser and posted it >> to the mailing list: >> >> >> >> I don't know if this still works but you could use it as a starting >> point. There may also be something newer out there too, I don't know. >> If you search the mailing list archives you'll get a few messages >> around the topic. >> >> Cheers, Andrew. >> >> >> Andrew Macgregor >> Centre for Comparative Genomics, Murdoch University >> Email: amacgregor at ccg.murdoch.edu.au >> Tel: (08) 9360 2961 >> >> >> >> > > From cjfields at uiuc.edu Tue Aug 14 12:33:31 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 14 Aug 2007 11:33:31 -0500 Subject: [Bioperl-l] Should coords be adjusted after removing alignment columns? In-Reply-To: <46C1C2D9.6050409@sendu.me.uk> References: <46C1C2D9.6050409@sendu.me.uk> Message-ID: Could you attach the scripts and patches to a bug report for tracking so anyone interested can double-check? Having them in an email is problematic as the text in some clients wraps. 
From what I'm seeing I think we're in general agreement, though I'll reason through it to see if I'm following correctly. The data in the SimpleAlign example you give is this: a/5-20 atcgatcgatcgatcg b/30-43 -tcgatc-atcgatcg c/50-63 atcgatcgatc-atc- ****** *** *** Removing the gaps gives: a/5-20 tcgatcatcatc b/30-43 tcgatcatcatc c/50-63 tcgatcatcatc ************ The start/end is wrong, as you state. Adjusting to map simple start/ ends to the original sequence won't work as we're removing gaps and residues in the LocatableSeqs along with it (ends and internal residues). I guess if we want to map back to the original sequence accurately we would have to use split locations (not currently implemented with LocatableSeq) or maybe a cigar-like syntax against consensus (ugh), otherwise we wouldn't know where to map the relevant internal gaps (now missing from the alignment) w/o running a local alignment against the original sequence: a/6-11;12-19 tcgatcatcatc b/30-38;40-42 tcgatcatcatc c/51-56;58-63 tcgatcatcatc ************ That could get really hairy for long alignments. We could also return multiple SimpleAligns which map correctly (ugh), but what we really want (and the API specifies) is a new single SimpleAlign. It may come down to simply stating it 'voids the warranty' (so-to- speak) when modifications are made to alignments which remove/insert residues from LocatableSeqs via remove_gaps/remove_columns or similar, and either leave as is with relevant warnings or readjust start/end appropriately when LocatableSeq residues change. gapless_a/1-12 tcgatcatcatc gapless_b/1-12 tcgatcatcatc gapless_c/1-12 tcgatcatcatc ************ Not sure which is the best approach but anything would be better than giving an unexpectedly incorrect answer. chris On Aug 14, 2007, at 9:57 AM, Sendu Bala wrote: > I'm looking at what looks like a pretty major bug in Bio::SimpleAlign, > but before I commit the fix I wanted to check my sanity/understanding. 
> > My understanding is that an alignment may be built from just sub-parts > of a number of sequences. So you give each sequence in the alignment a > start and stop so you can later map back the aligned region to the > original sequence. So, for example, the following should all pass: > > diff -r1.56 SimpleAlign.t > 459a460,540 >> >> >> # is _remove_col really working correctly? >> my $a = Bio::LocatableSeq->new(-id => 'a', -seq => > 'atcgatcgatcgatcg', -start => 5, -end => 20); >> my $b = Bio::LocatableSeq->new(-id => 'b', -seq => > '-tcgatc-atcgatcg', -start => 30, -end => 43); >> my $c = Bio::LocatableSeq->new(-id => 'c', -seq => > 'atcgatcgatc-atc-', -start => 50, -end => 63); >> my $d = Bio::LocatableSeq->new(-id => 'd', -seq => > '--cgatcgatcgat--', -start => 80, -end => 91); >> my $e = Bio::LocatableSeq->new(-id => 'e', -seq => > '-t-gatcgatcga-c-', -start => 100, -end => 111); >> $aln = Bio::SimpleAlign->new(); >> $aln->add_seq($a); >> $aln->add_seq($b); >> $aln->add_seq($c); >> >> my $gapless = $aln->remove_gaps(); >> foreach my $seq ($gapless->each_seq) { >> if ($seq->id eq 'a') { >> is $seq->start, 6; >> is $seq->end, 19; >> is $seq->seq, 'tcgatcatcatc'; >> } >> elsif ($seq->id eq 'b') { >> is $seq->start, 30; >> is $seq->end, 42; >> is $seq->seq, 'tcgatcatcatc'; >> } >> elsif ($seq->id eq 'c') { >> is $seq->start, 51; >> is $seq->end, 63; >> is $seq->seq, 'tcgatcatcatc'; >> } >> } >> >> $aln->add_seq($d); >> $aln->add_seq($e); >> $gapless = $aln->remove_gaps(); >> foreach my $seq ($gapless->each_seq) { >> if ($seq->id eq 'a') { >> is $seq->start, 8; >> is $seq->end, 17; >> is $seq->seq, 'gatcatca'; >> } >> elsif ($seq->id eq 'b') { >> is $seq->start, 32; >> is $seq->end, 40; >> is $seq->seq, 'gatcatca'; >> } >> elsif ($seq->id eq 'c') { >> is $seq->start, 53; >> is $seq->end, 61; >> is $seq->seq, 'gatcatca'; >> } >> elsif ($seq->id eq 'd') { >> is $seq->start, 81; >> is $seq->end, 90; >> is $seq->seq, 'gatcatca'; >> } >> elsif ($seq->id eq 'e') { >> is 
$seq->start, 101; >> is $seq->end, 110; >> is $seq->seq, 'gatcatca'; >> } >> } >> >> my $f = Bio::LocatableSeq->new(-id => 'f', -seq => > 'a-cgatcgatcgat-g', -start => 30, -end => 43); >> $aln = Bio::SimpleAlign->new(); >> $aln->add_seq($a); >> $aln->add_seq($f); >> >> $gapless = $aln->remove_gaps(); >> foreach my $seq ($gapless->each_seq) { >> if ($seq->id eq 'a') { >> is $seq->start, 5; >> is $seq->end, 20; >> is $seq->seq, 'acgatcgatcgatg'; >> } >> elsif ($seq->id eq 'f') { >> is $seq->start, 30; >> is $seq->end, 43; >> is $seq->seq, 'acgatcgatcgatg'; >> } >> } > > > But they don't. Once you remove certain columns the start and stop of > the sequences in the alignment are no longer correct coordinates > for the > sub-sequence in the original sequence. > > I propose the following patch to resolve this issue: > > diff -r1.136 SimpleAlign.pm > 1116c1116,1118 > < > --- >> >> my $gap = $self->gap_char; >> > 1129,1137c1131,1147 > < my $spliced; > < $spliced .= $start > 0 ? substr($sequence,0,$start) : > ''; > < $spliced .= substr($sequence,$end+1,$seq->length-$end > +1); > < $sequence = $spliced; > < if ($start == 1) { > < $new_seq->start($end); > < } > < else { > < $new_seq->start( $seq->start); > --- >> my $orig = $sequence; >> my $head = $start > 0 ? substr($sequence, 0, >> $start) : ''; >> my $tail = ($end + 1) >= length($sequence) ? 
'' : > substr($sequence, $end + 1); >> $sequence = $head.$tail; >> # start >> unless (defined $new_seq->start) { >> if ($start == 0) { >> my $start_adjust = () = substr($orig, 0, $end + > 1) =~ /$gap/g; >> $new_seq->start($seq->start + $end + 1 - > $start_adjust); >> } >> else { >> my $start_adjust = $orig =~ /$gap+/; >> if ($start_adjust) { >> $start_adjust = $+[0] - 1 < $start; >> } >> $new_seq->start($seq->start + $start_adjust); >> } > 1140,1141c1150,1152 > < if($end >= $seq->end){ > < $new_seq->end( $start); > --- >> if (($end + 1) >= length($orig)) { >> my $end_adjust = () = substr($orig, $start) =~ / >> $gap/g; >> $new_seq->end($seq->end - (length($orig) - $start) + > $end_adjust); > 1144c1155 > < $new_seq->end($seq->end); > --- >> $new_seq->end($seq->end); > 1148c1159 > < push @new, $new_seq; > --- >> push @new, $new_seq; > 1207,1209c1218,1234 > < # sort the positions to remove columns at the end 1st > < @$positions = sort { $b->[0] <=> $a->[0] } @$positions; > < $aln = $self->_remove_col($aln,$positions); > --- >> # sort the positions >> @$positions = sort { $a->[0] <=> $b->[0] } @$positions; >> >> my @remove; >> my $length = 0; >> foreach my $pos (@{$positions}) { >> my ($start, $end) = @{$pos}; >> >> #have to offset the start and end for subsequent removes >> $start-=$length; >> $end -=$length; >> $length += ($end-$start+1); >> push @remove, [$start,$end]; >> } >> >> #remove the segments >> $aln = $#remove >= 0 ? $self->_remove_col($aln,\@remove) : $self; > > > This breaks 2 tests in SimpleAlign.t, but as far as I can tell, those > tests expect the wrong answer. Changed to expect the correct answer, > SimpleAlign.t and all other tests in the test suite pass. > > diff -r1.56 SimpleAlign.t > 214,215c214,215 > < "P84139/1-33 NEGEHQIKLDELFEKLLRARLIFKNKDVLRRC\n". > < "P814153/1-33 NEGMHQIKLDVLFEKLLRARLIFKNKDVLRRC\n". > --- >> "P84139/2-33 NEGEHQIKLDELFEKLLRARLIFKNKDVLRRC\n". >> "P814153/2-33 NEGMHQIKLDVLFEKLLRARLIFKNKDVLRRC\n". 
> 229c229
> < "gb|443893|124775/1-32 -RFRIKVPPAVEGARPALLIFKSRPELGC\n",
> ---
>> "gb|443893|124775/2-32 -RFRIKVPPAVEGARPALLIFKSRPELGC\n",
>
>
> Can someone triple-check my thinking and report back please?
>
> Cheers,
> Sendu.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From bix at sendu.me.uk Tue Aug 14 13:13:30 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 14 Aug 2007 18:13:30 +0100
Subject: [Bioperl-l] Should coords be adjusted after removing alignment columns?
In-Reply-To:
References: <46C1C2D9.6050409@sendu.me.uk>
Message-ID: <46C1E2BA.8060606@sendu.me.uk>

Chris Fields wrote:
> Could you attach the scripts and patches to a bug report for tracking
> so anyone interested can double-check? Having them in an email is
> problematic as the text in some clients wraps.

http://bugzilla.open-bio.org/show_bug.cgi?id=2344

> From what I'm seeing I think we're in general agreement, though I'll
> reason through it to see if I'm following correctly. The data in
> the SimpleAlign example you give is this:
>
> a/5-20    atcgatcgatcgatcg
> b/30-43   -tcgatc-atcgatcg
> c/50-63   atcgatcgatc-atc-
>            ****** *** ***
>
> Removing the gaps gives:
>
> a/5-20    tcgatcatcatc
> b/30-43   tcgatcatcatc
> c/50-63   tcgatcatcatc
>           ************
>
> The start/end is wrong, as you state.

Yes. For extra clarity, my thinking is that the correct answer is:

a/6-19    tcgatcatcatc
b/30-42   tcgatcatcatc
c/51-63   tcgatcatcatc
          ************

> Adjusting to map simple start/ends to the original sequence won't
> work as we're removing gaps and residues in the LocatableSeqs along
> with it (ends and internal residues).
> I guess if we want to map back
> to the original sequence accurately [snip]

What you say in the rest of your discussion is valid and deserves some
thought/discussion, but for now just getting the start and end correct,
ignoring any issues with internal residues, seems like a no-brainer.

For my own purposes that is all I need; having removed gaps I only need
the start and end so I can take that region from each sequence and do a
new alignment (for example).

BTW, either my patch isn't quite perfect or there's another related bug
I'm still tracking down. I'll commit when I've solved that, unless
someone points out any mistakes in my thinking.

From basu at pharm.stonybrook.edu Tue Aug 14 12:16:23 2007
From: basu at pharm.stonybrook.edu (Siddhartha Basu)
Date: Tue, 14 Aug 2007 12:16:23 -0400
Subject: [Bioperl-l] Homologene parser?
In-Reply-To: <764978cf0708140624s5c198b5akee38bf98866fd7f2@mail.gmail.com>
References: <764978cf0708130329r484bf210qd0e6e9ad39274bbc@mail.gmail.com>
 <22FEA28C-1720-4519-B129-B7A5ADA452D4@ccg.murdoch.edu.au>
 <764978cf0708140624s5c198b5akee38bf98866fd7f2@mail.gmail.com>
Message-ID: <46C1D557.7090101@pharm.stonybrook.edu>

neeti somaiya wrote:
> Hi Andrew,
>
> I think the homologene data files have changed now on the ftp, from what you
> had used.
> It is now homologene.data and homologene.xml.
> I tried using your parser, but because it was written on the file
> hmlg.trip.ftp, it doesn't work anymore.
>
> I came across a parser
> http://bioinformatics.tgen.org/brunit/software/bioparser/docs/pod_bio_parser_homologene_fileparser_pm.shtml
> .
> I am looking at it to see if it works for me. Not sure if it will.
>
> ~Neeti.

Hi Neeti,

I have recently written a parser for 'homologene' XML data, specific to
my purpose. I am not sure whether it will suit your purpose, but it could
be extended for general-purpose parsing, so I am putting it forward. Here
is how it works:

* It only parses a single homologene entry.
* It does SAX-based parsing (currently uses XML::SAX::ExpatXS).
* It returns a graph object (using Perl's Graph module) where each node is
  a homologue entry with its corresponding Entrez Gene ID. Each node also
  contains the following attributes:
    * RefSeq protein ID
    * Protein ID (pid)
    * NCBI taxon ID
* The edge attribute contains information about the ortholog (true/false)
  relationship between two nodes.
* The rest of the tags are currently not extracted. However, parsing them
  should not be very difficult.

Generally I get the homologene XML stream from an 'efetch' through
Bio::DB::EUtilities, feed it to the parser, get back a 'Graph' object and
then work on it. So, to make it more generic and work on a local file:

* We need another class that reads the chunk between ..... and sends it to
  the parser.
* Add support for most of the tags.
* Massage the data into a BioPerl-compatible object.

The first two I could work out, and for the last one I have to figure out
which BioPerl object would be suitable (like Bio::Cluster or
Bio::Network::Node/Edge). Let me know if it sounds interesting and I will
send you the code.

-siddhartha

>
> On 8/14/07, Andrew Macgregor wrote:
>> On 13/08/2007, at 6:29 PM, neeti somaiya wrote:
>>
>>> Hi,
>>>
>>> Does anyone know of any Homologene parser, if available?
>>> Please let me know.
>>>
>>> Thanks and Regards,
>>> Neeti.
>> Hi Neeti,
>>
>> Quite a long time ago now I wrote a Homologene parser and posted it
>> to the mailing list:
>>
>>
>>
>> I don't know if this still works but you could use it as a starting
>> point. There may also be something newer out there too, I don't know.
>> If you search the mailing list archives you'll get a few messages
>> around the topic.
>>
>> Cheers, Andrew.
>>
>>
>> Andrew Macgregor
>> Centre for Comparative Genomics, Murdoch University
>> Email: amacgregor at ccg.murdoch.edu.au
>> Tel: (08) 9360 2961
>>
>>
>>
>>
>
>

From cjfields at uiuc.edu Tue Aug 14 13:19:59 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 14 Aug 2007 12:19:59 -0500
Subject: [Bioperl-l] Should coords be adjusted after removing alignment columns?
In-Reply-To: <46C1E2BA.8060606@sendu.me.uk>
References: <46C1C2D9.6050409@sendu.me.uk> <46C1E2BA.8060606@sendu.me.uk>
Message-ID:

On Aug 14, 2007, at 12:13 PM, Sendu Bala wrote:
...
>
> Yes. For extra clarity, my thinking is that the correct answer is:
>
> a/6-19    tcgatcatcatc
> b/30-42   tcgatcatcatc
> c/51-63   tcgatcatcatc
> ...
> What you say in the rest of your discussion is valid and deserves
> some thought/discussion, but for now just getting the start and end
> correct, ignoring any issues with internal residues, seems like a
> no-brainer.
>
> For my own purposes that is all I need; having removed gaps I only
> need the start and end so I can take that region from each sequence
> and do a new alignment (for example).

It might be worth addressing the split location issue in the bug
report before it gets lost in the ether. Or maybe start a new one as
an enhancement request.

> BTW. Either my patch isn't quite perfect or there's another related
> bug I'm still tracking down. I'll commit when I've solved that,
> unless someone points out any mistakes in my thinking.

Sounds fine by me.

chris

From gyang at plantbio.uga.edu Tue Aug 14 15:01:07 2007
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Tue, 14 Aug 2007 15:01:07 -0400
Subject: [Bioperl-l] the most weird thing I've seen, help please
In-Reply-To: 41A08079-6EEC-4B62-8104-C41E70C03083@uiuc.edu
Message-ID: <20070814190107.4834b14b@dogwood.plantbio.uga.edu>

Hi, all,
I have two subroutines in my code. One is remoteblast and the other
local blast. It works well.
When I decided to change the remoteblast to local blast, I always get the
following error. I downloaded the nt database from NCBI as preformatted, but
it works OK for both subroutines when I use command line blastall -p
blastn.... I changed the db name to 'nt', 'nt.00', the same error message
was returned. The error says: "program name was not given an argument", but
I apparently gave it there. Can anybody help me? The code for the two
subroutines is very similar:

sub search {
    use Bio::Tools::Run::StandAloneBlast;
    use Bio::SearchIO::blast;
    my $query = Bio::Seq -> new ( -seq=>"$_[0]",
                                  -id=>"query");
    my $len=$query->length();
    @db=('nt.nal');
    foreach my $db (@db) {
        my $factory = Bio::Tools::Run::StandAloneBlast->new( -program  =>"blastn",
                                                             -database =>"$db",
                                                             -e        =>"$_[1]");
        my $rc = $factory->blastall($query);
        ......

sub ancestor {
    use Bio::Tools::Run::StandAloneBlast;
    use Bio::SearchIO::blast;
    my $query = Bio::Seq -> new ( -seq=>"$_[0]",
                                  -id=>"test");
    my $len=$query->length();
    my $long_name=$_[1];
    my $long_start=$_[2];
    my $long_end=$_[3];
    @db=('TNDB');
    foreach my $db (@db) {
        my $factory = Bio::Tools::Run::StandAloneBlast->new(-program  => "blastx",
                                                            -database => "$db",
                                                            -e        => 1e-3,
                                                            );
        my $blast_report = $factory->blastall($query);

Thanks a lot!
Guojun Yang
Department of Plant Biology
University of Georgia

From zhaodj at ioz.ac.cn Wed Aug 15 04:05:36 2007
From: zhaodj at ioz.ac.cn (De-Jian,ZHAO)
Date: Wed, 15 Aug 2007 16:05:36 +0800 (CST)
Subject: [Bioperl-l] the most weird thing I've seen, help please
In-Reply-To: <20070814190107.4834b14b@dogwood.plantbio.uga.edu>
References: <20070814190107.4834b14b@dogwood.plantbio.uga.edu>
Message-ID: <52820.159.226.67.49.1187165136.squirrel@mail.ioz.ac.cn>

Hi Guojun Yang,

I tested your code, modifying part of it. However, I did not encounter the
error. The modified code follows (see below and the attachment).
The code runs without any error on my Windows XP machine and generates a
file named lclblastResult.txt. In the code I use the NCBI ecoli.nt database
instead. Some parameters were changed without affecting its function. I
think the error may arise in another part of your code, and more details
are needed.

-------code starts-------
#sub search {
use Bio::Tools::Run::StandAloneBlast;
use Bio::SearchIO::blast;
#my $query = Bio::Seq -> new ( -seq=>"$_[0]",
#                              -id=>"query");
my $query=Bio::Seq->new(-seq=>"ctgtattctgggatgca");
my $len=$query->length();
#@db=('nt.nal');
#foreach my $db (@db) {
my $factory = Bio::Tools::Run::StandAloneBlast->new( -program  =>"blastn",
                                                     -database =>'D:/blast/bin/ecoli.nt',
                                                     -e        =>1,
                                                     -o        =>'lclblastResult.txt');
my $rc = $factory->blastall($query);
-----code ends--------

On Wed, Aug 15, 2007 03:01, Guojun Yang wrote:
> Hi, all,
> I have two subroutines in my code. One is remoteblast and the other
> local blast. It works well.
> When I decided to change the remoteblast to local blast, I always get
> the following error. I downloaded nt database from NCBI as
> preformatted, but it works ok for both subroutines when I use
> command line blastall -p blastn.... I changed the db name to 'nt',
> 'nt.00', the same error message was returned. The error says:
> "program name was not given an argument", but I apparently gave it
> there. Can anybody help me? The code for the two subroutines is
> very similar:
>
> sub search {
>     use Bio::Tools::Run::StandAloneBlast;
>     use Bio::SearchIO::blast;
>     my $query = Bio::Seq -> new ( -seq=>"$_[0]",
>                                   -id=>"query");
>     my $len=$query->length();
>     @db=('nt.nal');
>     foreach my $db (@db) {
>         my $factory = Bio::Tools::Run::StandAloneBlast->new( -program  =>"blastn",
>                                                              -database =>"$db",
>                                                              -e        =>"$_[1]");
>         my $rc = $factory->blastall($query);
>         ......
>
>
> sub ancestor {
>     use Bio::Tools::Run::StandAloneBlast;
>     use Bio::SearchIO::blast;
>     my $query = Bio::Seq -> new ( -seq=>"$_[0]",
>                                   -id=>"test");
>     my $len=$query->length();
>     my $long_name=$_[1];
>     my $long_start=$_[2];
>     my $long_end=$_[3];
>     @db=('TNDB');
>     foreach my $db (@db) {
>         my $factory = Bio::Tools::Run::StandAloneBlast->new(-program  => "blastx",
>                                                             -database => "$db",
>                                                             -e        => 1e-3,
>                                                             );
>         my $blast_report = $factory->blastall($query);
>
>
> Thanks a lot!
> Guojun Yang
> Department of Plant Biology
> University of Georgia
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
De-Jian Zhao
Institute of Zoology,Chinese Academy of Sciences
+86-10-64807217
zhaodj at ioz.ac.cn
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lclblast.pl
Type: application/octet-stream
Size: 644 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070815/f40b2950/attachment.obj

From tania.oh at brasenose.oxford.ac.uk Wed Aug 15 12:05:15 2007
From: tania.oh at brasenose.oxford.ac.uk (Tania Oh)
Date: Wed, 15 Aug 2007 17:05:15 +0100
Subject: [Bioperl-l] exonerate parser in bioperl-live fails when protein2dna comparison is performed
Message-ID:

Dear All,

I was trying to use the Bio::SearchIO::Alignment::Exonerate module to run
and parse my exonerate output. But I've noticed that the parser, which is
actually Bio::SearchIO::exonerate, works if the model used in Exonerate is
--model est2genome. I used exonerate with --model protein2dna and the
parser was unable to parse the HSPs.

Below is a sample of the code I used for testing the output from exonerate:

use Bio::SearchIO;
use strict;

my $searchio = Bio::SearchIO->new(-file   => 'test_data/exonerate.output.dontwork',
                                  -format => 'exonerate');
while( my $r = $searchio->next_result ) {
    while( my $hit = $r->next_hit ) {
        while( my $hsp = $hit->next_hsp ) {
            print $hsp->start. "\t". $hsp->end. "\n";
        }
    }
    print $r->query_name, "\n";
}

There are 2 files attached to show examples of using either the
est2genome or the protein2dna model:

1. exonerate.output.works - produced from the command line:
   exonerate -q exonerate_cdna.fa -t exonerate_genomic.fa --model est2genome --bestn 1 > exonerate.output.works
2. exonerate.output.dontwork - produced from the command line:
   exonerate -q test_aa.fa -t test_cds.fa --model protein2dna > exonerate.output.dontwork

Line 239 in Bio::SearchIO::exonerate (cut and pasted below)

elsif( s/^vulgar:\s+(\S+)\s+             # query sequence id
       (\d+)\s+(\d+)\s+([\-\+])\s+       # query start-end-strand
       (\S+)\s+                          # target sequence id
       (\d+)\s+(\d+)\s+([\-\+])\s+       # target start-end-strand
       (\d+)\s+                          # score
       //ox ) {

parses the vulgar line of an --model est2genome exonerate output well. An
example of the (complex) vulgar line, which I've truncated for readability,
is:

vulgar: MUSSPSYN 3 1279 + 4.143962167-143965267 28 3074 + 6137 M 8 8 G 0 1 M 231 231 5 0 2 I 0 253 3 0

whereas the vulgar line I've obtained from a --model protein2dna exonerate
output is much simpler and the parser fails to pick it up:

vulgar: SJCHGC00851 0 204 . SJCHGC00851 2 614 + 1059 M 204 612

(Note that the query strand field here is '.', which the [\-\+] character
class in the regex above does not match.)

Has anyone encountered this situation before? I've not changed the parser,
as exonerate is widely used for its est2genome model, and thought I'd run
it past the list to see if there is a workaround.

many thanks in advance,
tania

-------------- next part --------------
A non-text attachment was scrubbed...
Name: exonerate.output.works
Type: application/octet-stream
Size: 6056 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070815/e4e43d75/attachment-0002.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: exonerate.output.dontwork
Type: application/octet-stream
Size: 3283 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070815/e4e43d75/attachment-0003.obj

From johnsonmar at mail.nih.gov Wed Aug 15 12:47:10 2007
From: johnsonmar at mail.nih.gov (Johnson, Mary (NIH/NCI) [C])
Date: Wed, 15 Aug 2007 12:47:10 -0400
Subject: [Bioperl-l] Need assistance with make error
Message-ID:

I'm trying to install bioperl on 2 Linux servers - 1 running Redhat
Enterprise Linux 4, and the other running RHEL3. I'm getting the
following 'make Error 255' when running make test. I'm not sure what
this error indicates, and whether I should continue with a force
install? Could you please advise.

Failed Test        Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/BioFetch_DB.t                  27    1   3.70%  8
t/EMBL_DB.t                      15    3  20.00%  6 13-14
t/Ontology.t          9  2304    50  100 200.00%  1-50
t/TreeIO.t                       41    1   2.44%  42
t/Variation_IO.t                 25    3  12.00%  15 20 25
t/simpleGOparser.t    9  2304    98  196 200.00%  1-98
120 subtests skipped.
Failed 6/179 test scripts, 96.65% okay. 154/8268 subtests failed, 98.14% okay.
make: *** [test_dynamic] Error 255

Thanks,

Mary Johnson
Sr. Network Engineer
National Cancer Institute Center for Bioinformatics
Contractor, TerpSys
http://www.terpsys.com/

From arareko at campus.iztacala.unam.mx Wed Aug 15 13:45:39 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 15 Aug 2007 12:45:39 -0500
Subject: [Bioperl-l] Need assistance with make error
In-Reply-To:
References:
Message-ID: <46C33BC3.9000409@campus.iztacala.unam.mx>

Which version of bioperl are you trying to install?

Johnson, Mary (NIH/NCI) [C] wrote:
> I'm trying to install bioperl on 2 Linux servers - 1 running Redhat
> Enterprise Linux 4, and the other running RHEL3. I'm getting the
> following 'make Error 255' when running make test.
> I'm not sure what this error indicates, and whether I should continue
> with a force install? Could you please advise.
>
> Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> -------------------------------------------------------------------------------
> t/BioFetch_DB.t                  27    1   3.70%  8
> t/EMBL_DB.t                      15    3  20.00%  6 13-14
> t/Ontology.t          9  2304    50  100 200.00%  1-50
> t/TreeIO.t                       41    1   2.44%  42
> t/Variation_IO.t                 25    3  12.00%  15 20 25
> t/simpleGOparser.t    9  2304    98  196 200.00%  1-98
> 120 subtests skipped.
> Failed 6/179 test scripts, 96.65% okay. 154/8268 subtests failed, 98.14% okay.
> make: *** [test_dynamic] Error 255
>
> Thanks,
>
> Mary Johnson
> Sr. Network Engineer
> National Cancer Institute Center for Bioinformatics
> Contractor, TerpSys
> http://www.terpsys.com/
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From mbasu at mail.nih.gov Wed Aug 15 13:55:50 2007
From: mbasu at mail.nih.gov (Malay)
Date: Wed, 15 Aug 2007 13:55:50 -0400
Subject: [Bioperl-l] Developer docs
Message-ID: <46C33E26.2050004@mail.nih.gov>

Hello All:

I apologize for not searching thoroughly. But I'd appreciate it if someone
could point me to a location where I can find the BioPerl coding
conventions that I need to follow for any code contribution to BioPerl.

-Malay
--
Malay K Basu
www.malaybasu.net

From arareko at campus.iztacala.unam.mx Wed Aug 15 14:39:29 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 15 Aug 2007 13:39:29 -0500
Subject: [Bioperl-l] Developer docs
In-Reply-To: <46C33E26.2050004@mail.nih.gov>
References: <46C33E26.2050004@mail.nih.gov>
Message-ID: <46C34861.8090400@campus.iztacala.unam.mx>

You may want to bookmark this one:

http://bioperl.org/wiki/Developer_Information#BioPerl_Code

Mauricio.

Malay wrote:
> Hello All:
>
> I apologize for not searching thoroughly. But I'd appreciate it if someone
> could point me to a location where I can find the BioPerl coding
> conventions that I need to follow for any code contribution to BioPerl.
>
> -Malay
>

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From johnsonmar at mail.nih.gov Wed Aug 15 15:01:23 2007
From: johnsonmar at mail.nih.gov (Johnson, Mary (NIH/NCI) [C])
Date: Wed, 15 Aug 2007 15:01:23 -0400
Subject: [Bioperl-l] Need assistance with make error
In-Reply-To: <46C33BC3.9000409@campus.iztacala.unam.mx>
Message-ID:

This is version 1.4.

Mary Johnson
Sr. Network Engineer
National Cancer Institute Center for Bioinformatics
Contractor, TerpSys
http://www.terpsys.com/

-----Original Message-----
From: Mauricio Herrera Cuadra [mailto:arareko at campus.iztacala.unam.mx]
Sent: Wednesday, August 15, 2007 1:46 PM
To: Johnson, Mary (NIH/NCI) [C]
Cc: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] Need assistance with make error

Which version of bioperl are you trying to install?

Johnson, Mary (NIH/NCI) [C] wrote:
> I'm trying to install bioperl on 2 Linux servers - 1 running Redhat
> Enterprise Linux 4, and the other running RHEL3. I'm getting the
> following 'make Error 255' when running make test. I'm not sure what
> this error indicates, and whether I should continue with a force
> install? Could you please advise.
>
> Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> -------------------------------------------------------------------------------
> t/BioFetch_DB.t                  27    1   3.70%  8
> t/EMBL_DB.t                      15    3  20.00%  6 13-14
> t/Ontology.t          9  2304    50  100 200.00%  1-50
> t/TreeIO.t                       41    1   2.44%  42
> t/Variation_IO.t                 25    3  12.00%  15 20 25
> t/simpleGOparser.t    9  2304    98  196 200.00%  1-98
> 120 subtests skipped.
> Failed 6/179 test scripts, 96.65% okay. 154/8268 subtests failed, 98.14% okay.
> make: *** [test_dynamic] Error 255
>
> Thanks,
>
> Mary Johnson
> Sr. Network Engineer
> National Cancer Institute Center for Bioinformatics
> Contractor, TerpSys
> http://www.terpsys.com/
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From cjfields at uiuc.edu Wed Aug 15 16:25:30 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 15 Aug 2007 15:25:30 -0500
Subject: [Bioperl-l] Need assistance with make error
In-Reply-To:
References:
Message-ID:

You'll definitely want to update to the latest (v 1.5.2). We hope to
get a new stable release out sometime soon and possibly move to a
more regular release cycle.

chris

On Aug 15, 2007, at 2:01 PM, Johnson, Mary (NIH/NCI) [C] wrote:

> This is version 1.4.
>
> Mary Johnson
> Sr. Network Engineer
> National Cancer Institute Center for Bioinformatics
> Contractor, TerpSys
> http://www.terpsys.com/

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From johnsonmar at mail.nih.gov Wed Aug 15 16:32:43 2007
From: johnsonmar at mail.nih.gov (Johnson, Mary (NIH/NCI) [C])
Date: Wed, 15 Aug 2007 16:32:43 -0400
Subject: [Bioperl-l] Need assistance with make error
In-Reply-To:
Message-ID:

I saw the 1.5.2 version, but it stated that this was a developer
release and that 1.4 was the latest stable version, so I went with
1.4. I'll give 1.5.2 a try.

Thanks,

Mary Johnson
Sr. Network Engineer
National Cancer Institute Center for Bioinformatics
Contractor, TerpSys
http://www.terpsys.com/

-----Original Message-----
From: Chris Fields [mailto:cjfields at uiuc.edu]
Sent: Wednesday, August 15, 2007 4:26 PM
To: Johnson, Mary (NIH/NCI) [C]
Cc: Mauricio Herrera Cuadra; bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] Need assistance with make error

You'll definitely want to update to the latest (v 1.5.2). We hope to
get a new stable release out sometime soon and possibly move to a
more regular release cycle.

chris

From cjfields at uiuc.edu Wed Aug 15 16:40:32 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 15 Aug 2007 15:40:32 -0500
Subject: [Bioperl-l] Need assistance with make error
In-Reply-To:
References:
Message-ID:

The term 'stable' is relative in this case; tons of bug fixes were
incorporated in the 1.5.2 release. There are a few dev-specific
issues we'll need to resolve prior to a new release; once those are
out of the way we'll try to get a new 'stable' out.

chris

On Aug 15, 2007, at 3:32 PM, Johnson, Mary (NIH/NCI) [C] wrote:

> I saw the 1.5.2 version, but it stated that this was a developer
> release and that 1.4 was the latest stable version, so I went with
> 1.4. I'll give 1.5.2 a try.
>
> Thanks,
>
> Mary Johnson
> Sr. Network Engineer
> National Cancer Institute Center for Bioinformatics
> Contractor, TerpSys
> http://www.terpsys.com/

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From Kevin.M.Brown at asu.edu Wed Aug 15 16:54:04 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 15 Aug 2007 13:54:04 -0700
Subject: [Bioperl-l] Need assistance with make error
In-Reply-To:
References:
Message-ID: <1A4207F8295607498283FE9E93B775B40386D612@EX02.asurite.ad.asu.edu>

It technically is a developer release, but given the age of the 1.4
release, 1.5.2 is the better choice because of fixes for things like
web BLASTs and other improvements. I've found that it gives reliable
results from the various objects that I've had to use in my current
projects.

> I saw the 1.5.2 version, but it stated that this was a
> developer release and that 1.4 was the latest stable version,
> so I went with 1.4. I'll give 1.5.2 a try.
>
> Thanks,
>
> Mary Johnson
> Sr. Network Engineer
> National Cancer Institute Center for Bioinformatics
> Contractor, TerpSys http://www.terpsys.com/
>
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> Sent: Wednesday, August 15, 2007 4:26 PM
> To: Johnson, Mary (NIH/NCI) [C]
> Cc: Mauricio Herrera Cuadra; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] Need assistance with make error
>
> You'll definitely want to update to the latest (v 1.5.2). We
> hope to get a new stable release out sometime soon and
> possibly move to a more regular release cycle.
>
> chris
>
> On Aug 15, 2007, at 2:01 PM, Johnson, Mary (NIH/NCI) [C] wrote:
>
> > This is version 1.4.
> >
> > Mary Johnson
> >
> > Sr.
Network Engineer > > > > National Cancer Institute Center for Bioinformatics Contractor, > > TerpSys http://www.terpsys.com/ > > > > > > > > -----Original Message----- > > From: Mauricio Herrera Cuadra > [mailto:arareko at campus.iztacala.unam.mx] > > Sent: Wednesday, August 15, 2007 1:46 PM > > To: Johnson, Mary (NIH/NCI) [C] > > Cc: bioperl-l at bioperl.org > > Subject: Re: [Bioperl-l] Need assistance with make error > > > > Which version of bioperl you're trying to install? > > > > Johnson, Mary (NIH/NCI) [C] wrote: > >> I'm trying to install bioperl on 2 Linux servers - 1 > running Redhat > >> Enterprise Linux 4, and the other running RHEL3. I'm getting the > >> following 'make Error 255' when running make test. I'm > not sure what > >> this error indicates, and whether I should continue with a force > >> install? Could you please advise. > >> > >> > >> > >> > >> > >> Failed Test Stat Wstat Total Fail Failed List of Failed > >> > >> > --------------------------------------------------------------------- > >> --- > >> ------- > >> > >> t/BioFetch_DB.t 27 1 3.70% 8 > >> > >> t/EMBL_DB.t 15 3 20.00% 6 13-14 > >> > >> t/Ontology.t 9 2304 50 100 200.00% 1-50 > >> > >> t/TreeIO.t 41 1 2.44% 42 > >> > >> t/Variation_IO.t 25 3 12.00% 15 20 25 > >> > >> t/simpleGOparser.t 9 2304 98 196 200.00% 1-98 > >> > >> 120 subtests skipped. > >> > >> Failed 6/179 test scripts, 96.65% okay. 154/8268 subtests failed, > >> 98.14% okay. > >> > >> make: *** [test_dynamic] Error 255 > >> > >> > >> > >> > >> > >> > >> > >> Thanks, > >> > >> > >> > >> Mary Johnson > >> > >> Sr. 
Network Engineer > >> > >> National Cancer Institute Center for Bioinformatics Contractor, > >> TerpSys http://www.terpsys.com/ > >> > >> > >> > >> > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > -- > > MAURICIO HERRERA CUADRA > > arareko at campus.iztacala.unam.mx > > Laboratorio de Gen?tica > > Unidad de Morfofisiolog?a y Funci?n > > Facultad de Estudios Superiores Iztacala, UNAM > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bix at sendu.me.uk Wed Aug 15 16:50:02 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 15 Aug 2007 21:50:02 +0100 Subject: [Bioperl-l] Developer docs In-Reply-To: <46C34861.8090400@campus.iztacala.unam.mx> References: <46C33E26.2050004@mail.nih.gov> <46C34861.8090400@campus.iztacala.unam.mx> Message-ID: <46C366FA.40609@sendu.me.uk> Mauricio Herrera Cuadra wrote: > You may want to bookmark this one: > > http://bioperl.org/wiki/Developer_Information#BioPerl_Code Yup. The important one is http://bioperl.org/wiki/Bioperl_Best_Practices , which I've just updated with the latest info on writing test scripts. 
From bix at sendu.me.uk Wed Aug 15 16:54:45 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 15 Aug 2007 21:54:45 +0100
Subject: [Bioperl-l] Need assistance with make error
In-Reply-To:
References:
Message-ID: <46C36815.5010908@sendu.me.uk>

Johnson, Mary (NIH/NCI) [C] wrote:
> I'm trying to install bioperl on 2 Linux servers - 1 running Redhat
> Enterprise Linux 4, and the other running RHEL3. I'm getting the
> following 'make Error 255' when running make test. I'm not sure what
> this error indicates, and whether I should continue with a force
> install? Could you please advise.

Unless you know you really must install Bioperl 1.4, install 1.5.2 instead.
http://www.bioperl.org/wiki/Release_1.5.2

If you use the Build.PL installation, at the very least you certainly won't
get a make error.
http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#PRELIMINARY_PREPARATION

From cjfields at uiuc.edu Wed Aug 15 17:16:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 15 Aug 2007 16:16:27 -0500
Subject: [Bioperl-l] exonerate parser in bioperl-live fails when protein2dna comparison is performed
In-Reply-To:
References:
Message-ID:

I can confirm this with bioperl-live. The Bio::SearchIO::exonerate docs
indicate that protein2genome and est2genome model output is supported, but
they don't specifically indicate that it can parse any other output.

You can add an enhancement request to bugzilla indicating this deficiency
or, if you are inclined, add the functionality yourself and donate the
code.

chris

On Aug 15, 2007, at 11:05 AM, Tania Oh wrote:

> Dear All,
>
> I was trying to use the Bio::SearchIO::Alignment::Exonerate module
> to run and parse my exonerate output. But I've noticed that the
> parser which is actually Bio::SearchIO::Exonerate works if the
> model used in Exonerate is --model est2genome. I used exonerate
> with the model --model protein2dna and the parser was unable to
> parse the hsps.
>
>
> Below is a sample of the code I used for testing the output from
> exonerate:
>
> use Bio::SearchIO;
> use strict;
>
> my $searchio = Bio::SearchIO->new(-file   => 'test_data/exonerate.output.dontwork',
>                                   -format => 'exonerate');
>
> while( my $r = $searchio->next_result ) {
>     while(my $hit = $r->next_hit){
>         while(my $hsp = $hit->next_hsp){
>             print $hsp->start. "\t". $hsp->end. "\n";
>         }
>     }
>
>     print $r->query_name, "\n";
> }
>
>
> There are 2 files attached to show the examples of using either the
> est2genome or protein2dna model:
> 1. exonerate.output.works - produced from the command line:
>    exonerate -q exonerate_cdna.fa -t exonerate_genomic.fa --model est2genome --bestn 1 > exonerate.output.works
>
> 2. exonerate.output.dontwork - produced from the command line:
>    exonerate -q test_aa.fa -t test_cds.fa --model protein2dna > exonerate.output.dontwork
>
>
> Line 239 in Bio::SearchIO::exonerate (cut and pasted below)
>
>     elsif( s/^vulgar:\s+(\S+)\s+           # query sequence id
>            (\d+)\s+(\d+)\s+([\-\+])\s+     # query start-end-strand
>            (\S+)\s+                        # target sequence id
>            (\d+)\s+(\d+)\s+([\-\+])\s+     # target start-end-strand
>            (\d+)\s+                        # score
>            //ox ) {
>
> parses the vulgar line of an --model est2genome exonerate output
> well. An example of the (complex) vulgar line, which I've truncated
> for readability, is:
> vulgar: MUSSPSYN 3 1279 + 4.143962167-143965267 28 3074 + 6137 M 8 8 G 0 1 M 231 231 5 0 2 I 0 253 3 0
>
> whereas the vulgar line I've obtained from a --model protein2dna
> exonerate output is much simpler and the parser fails to pick it up:
> vulgar: SJCHGC00851 0 204 . SJCHGC00851 2 614 + 1059 M 204 612
>
> Has anyone encountered this situation before? I've not changed the
> parser as exonerate is widely used for its est2genome model, and
> thought I'd run it past the list to see if there is a workaround
> solution.
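
The difference between the two vulgar lines quoted above can be checked
with plain Perl, without BioPerl. What follows is a minimal sketch that
assumes only the pattern and the two lines as quoted. One visible
difference is that the protein2dna line reports the query strand as ".",
which the strand class [\-\+] cannot match. The relaxed class [-+.] and
the sub name vulgar_header are assumptions made for illustration; this is
not a tested fix for Bio::SearchIO::exonerate, which may need further
changes to handle protein2dna output.

```perl
use strict;
use warnings;

# Match the header part of a vulgar line. The layout follows the quoted
# pattern, but the strand character class is passed in so that the two
# variants can be compared side by side.
sub vulgar_header {
    my ( $line, $strand ) = @_;
    my @f = $line =~ /^vulgar:\s+(\S+)\s+          # query sequence id
                       (\d+)\s+(\d+)\s+($strand)\s+ # query start, end, strand
                       (\S+)\s+                     # target sequence id
                       (\d+)\s+(\d+)\s+($strand)\s+ # target start, end, strand
                       (\d+)\s+                     # score
                     /x;
    return @f;    # empty list when the line does not match
}

# The two example lines from the message (the first one was truncated there).
my $est2genome  = 'vulgar: MUSSPSYN 3 1279 + 4.143962167-143965267 28 3074 + 6137 M 8 8 G 0 1 M 231 231 5 0 2 I 0 253 3 0';
my $protein2dna = 'vulgar: SJCHGC00851 0 204 . SJCHGC00851 2 614 + 1059 M 204 612';

my $quoted  = qr/[-+]/;     # strand class used in the quoted pattern
my $relaxed = qr/[-+.]/;    # assumption: also accept "." as a strand

for my $case ( [ est2genome => $est2genome ], [ protein2dna => $protein2dna ] ) {
    my ( $name, $line ) = @$case;
    for my $try ( [ quoted => $quoted ], [ relaxed => $relaxed ] ) {
        my ( $label, $class ) = @$try;
        my @f = vulgar_header( $line, $class );
        printf "%-11s %-7s %s\n", $name, $label,
            @f ? "matched: query strand '$f[3]', score $f[8]" : "no match";
    }
}
```

With the quoted strand class only the est2genome line matches; with the
relaxed class both lines match, and the protein2dna line yields query
strand "." and score 1059.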
> > many thanks in advance, > tania > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From johnsonmar at mail.nih.gov Wed Aug 15 17:45:36 2007 From: johnsonmar at mail.nih.gov (Johnson, Mary (NIH/NCI) [C]) Date: Wed, 15 Aug 2007 17:45:36 -0400 Subject: [Bioperl-l] Need assistance with make error In-Reply-To: Message-ID: Version 1.5.2 worked fine! Thanks to all of you for your quick response. I wish all of our vendors were that quick in getting back to me:) Mary Johnson Sr. Network Engineer National Cancer Institute Center for Bioinformatics Contractor, TerpSys http://www.terpsys.com/ -----Original Message----- From: Chris Fields [mailto:cjfields at uiuc.edu] Sent: Wednesday, August 15, 2007 4:41 PM To: Johnson, Mary (NIH/NCI) [C] Cc: Mauricio Herrera Cuadra; bioperl-l at bioperl.org Subject: Re: [Bioperl-l] Need assistance with make error The term 'stable' is relative in this case; tons of bugs fixes were incorporated in the 1.5.2 release. There are a few dev-specific issues we'll need to resolve prior to a new release; once those are out of the way we'll try to get a new 'stable' out. chris On Aug 15, 2007, at 3:32 PM, Johnson, Mary (NIH/NCI) [C] wrote: > I saw the 1.5.2 version, but it stated that this was a developer > release and that 1.4 was the latest stable version, so I went with > 1.4. I'll give 1.5.2 a try. > > Thanks, > > > Mary Johnson > > Sr. 
Network Engineer > > National Cancer Institute Center for Bioinformatics > Contractor, TerpSys > http://www.terpsys.com/ > > > > -----Original Message----- > From: Chris Fields [mailto:cjfields at uiuc.edu] > Sent: Wednesday, August 15, 2007 4:26 PM > To: Johnson, Mary (NIH/NCI) [C] > Cc: Mauricio Herrera Cuadra; bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] Need assistance with make error > > You'll definitely want to update to the latest (v 1.5.2). We hope to > get a new stable release out sometime soon and possibly move to a > more regular release cycle. > > chris > > On Aug 15, 2007, at 2:01 PM, Johnson, Mary (NIH/NCI) [C] wrote: > >> This is version 1.4. >> >> Mary Johnson >> >> Sr. Network Engineer >> >> National Cancer Institute Center for Bioinformatics >> Contractor, TerpSys >> http://www.terpsys.com/ >> >> >> >> -----Original Message----- >> From: Mauricio Herrera Cuadra >> [mailto:arareko at campus.iztacala.unam.mx] >> Sent: Wednesday, August 15, 2007 1:46 PM >> To: Johnson, Mary (NIH/NCI) [C] >> Cc: bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] Need assistance with make error >> >> Which version of bioperl you're trying to install? >> >> Johnson, Mary (NIH/NCI) [C] wrote: >>> I'm trying to install bioperl on 2 Linux servers - 1 running Redhat >>> Enterprise Linux 4, and the other running RHEL3. I'm getting the >>> following 'make Error 255' when running make test. I'm not sure >>> what >>> this error indicates, and whether I should continue with a force >>> install? Could you please advise. 
>>> >>> >>> >>> >>> >>> Failed Test Stat Wstat Total Fail Failed List of Failed >>> >>> -------------------------------------------------------------------- >>> - >>> --- >>> ------- >>> >>> t/BioFetch_DB.t 27 1 3.70% 8 >>> >>> t/EMBL_DB.t 15 3 20.00% 6 13-14 >>> >>> t/Ontology.t 9 2304 50 100 200.00% 1-50 >>> >>> t/TreeIO.t 41 1 2.44% 42 >>> >>> t/Variation_IO.t 25 3 12.00% 15 20 25 >>> >>> t/simpleGOparser.t 9 2304 98 196 200.00% 1-98 >>> >>> 120 subtests skipped. >>> >>> Failed 6/179 test scripts, 96.65% okay. 154/8268 subtests failed, >>> 98.14% >>> okay. >>> >>> make: *** [test_dynamic] Error 255 >>> >>> >>> >>> >>> >>> >>> >>> Thanks, >>> >>> >>> >>> Mary Johnson >>> >>> Sr. Network Engineer >>> >>> National Cancer Institute Center for Bioinformatics >>> Contractor, TerpSys >>> http://www.terpsys.com/ >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> -- >> MAURICIO HERRERA CUADRA >> arareko at campus.iztacala.unam.mx >> Laboratorio de Gen?tica >> Unidad de Morfofisiolog?a y Funci?n >> Facultad de Estudios Superiores Iztacala, UNAM >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From neetisomaiya at gmail.com Thu Aug 16 00:22:18 2007 From: neetisomaiya at gmail.com (neeti somaiya) Date: Thu, 16 Aug 2007 09:52:18 +0530 Subject: [Bioperl-l] Homologene parser? 
In-Reply-To: <46C1D557.7090101@pharm.stonybrook.edu>
References: <764978cf0708130329r484bf210qd0e6e9ad39274bbc@mail.gmail.com>
 <22FEA28C-1720-4519-B129-B7A5ADA452D4@ccg.murdoch.edu.au>
 <764978cf0708140624s5c198b5akee38bf98866fd7f2@mail.gmail.com>
 <46C1D557.7090101@pharm.stonybrook.edu>
Message-ID: <764978cf0708152122oba56e13qef83544cdde7e795@mail.gmail.com>

Hi Siddhartha,

Thanks a lot for your mail.
It would be great if you could send me your parser; I will see how I can
modify it for my purpose.

Thanks and Regards,
Neeti.

On 8/14/07, Siddhartha Basu wrote:
>
> neeti somaiya wrote:
> > Hi Andrew,
> >
> > I think the homologene data files have changed now on the ftp, from
> > what you had used.
> > It is now homologene.data and homologene.xml.
> > I tried using your parser, but because it was written on the file
> > hmlg.trip.ftp, it doesn't work anymore.
> >
> > I came across a parser
> > http://bioinformatics.tgen.org/brunit/software/bioparser/docs/pod_bio_parser_homologene_fileparser_pm.shtml
> > .
> > I am looking at it to see if it works for me. Not sure if it will.
> >
> > ~Neeti.
>
> Hi Neeti,
> I have recently written a parser for 'homologene' xml data specific for
> my purpose. I am not sure whether it will suit your purpose but it could
> be extended for general purpose parsing, so I am putting it forward.
> Here is how it works .......
>
> * It only parses a single homologene entry ......
> * It does SAX based parsing (currently uses XML::SAX::ExpatXS)
> * Returns a graph (uses the Graph module of perl) object where each node
>   is a homologue entry with its corresponding entrez gene id. Each node
>   also contains the following attributes ...
>   * Refseq protein id.
>   * Protein id (pid)
>   * ncbi taxon id.
> * The edge attribute contains information about the ortholog (true/false)
>   relationship between two nodes.
> * The rest of the tags currently are not being extracted. However, parsing
>   the rest of the tags should not be very difficult.
>
> Generally I get the homologene xml stream from an 'efetch' through
> Bio::DB::EUtilities, feed it to the parser, get back a 'Graph' object and
> then work on it.
>
> So, to make it more generic and work on a local file
>
> * We need another class that reads the chunk between
>   ..... and sends it to the parser.
> * Add support for most of the tags.
> * Massage the data to a bioperl compatible object.
>
> The first two I could work out, and for the last one I have to figure
> out the bioperl object that could be suitable (like Bio::Cluster or
> Bio::Network::Node/Edge).
>
> Let me know if it sounds interesting and I will send you the code.
>
> -siddhartha
>
> >
> >
> > On 8/14/07, Andrew Macgregor wrote:
> >> On 13/08/2007, at 6:29 PM, neeti somaiya wrote:
> >>
> >>> Hi,
> >>>
> >>> Does anyone know of any Homologene parser, if available?
> >>> Please let me know.
> >>>
> >>> Thanks and Regards,
> >>> Neeti.
> >> Hi Neeti,
> >>
> >> Quite a long time ago now I wrote a Homologene parser and posted it
> >> to the mailing list:
> >>
> >>
> >>
> >> I don't know if this still works but you could use it as a starting
> >> point. There may also be something newer out there too, I don't know.
> >> If you search the mailing list archives you'll get a few messages
> >> around the topic.
> >>
> >> Cheers, Andrew.
> >>
> >>
> >> Andrew Macgregor
> >> Centre for Comparative Genomics, Murdoch University
> >> Email: amacgregor at ccg.murdoch.edu.au
> >> Tel: (08) 9360 2961
> >>
> >>
> >
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
-Neeti
Even my blood says, B positive

From neetisomaiya at gmail.com Thu Aug 16 01:56:21 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Thu, 16 Aug 2007 11:26:21 +0530
Subject: [Bioperl-l] PDB Parser
Message-ID: <764978cf0708152256w38b2bfa7ic4d196ae08a46bd0@mail.gmail.com>

Hi,

After a lot of searching I could find this link from where PDB files can be
downloaded:
ftp://ftp.wwpdb.org/pub/pdb/data/structures/all/pdb/
Is there any other link where one can download all pdb data from?

I tried using Bio::Structure::IO::pdb with some code like :-

use Bio::Structure::IO;

$in = Bio::Structure::IO->new(-file => "pdb100d.ent",
                              -format => 'pdb');

while ( my $struc = $in->next_structure() ) {
    print "Structure ", $struc->id,"\n";
}

It works well. But I am not able to find documentation of other methods
which will give me various specific details available in a pdb file, right
from title, keywords, references to structure details, atoms, coordinates
etc. There must be different methods to fetch and parse each of this data
from a pdb file, right? Where can I find the details? Any example code of
the same would also be of great use.

Thanks and Regards,
Neeti Somaiya.
--
-Neeti
Even my blood says, B positive

From hrh at sanger.ac.uk Thu Aug 16 04:48:16 2007
From: hrh at sanger.ac.uk (Hans Rudolf Hotz)
Date: Thu, 16 Aug 2007 09:48:16 +0100 (BST)
Subject: [Bioperl-l] PDB Parser
In-Reply-To: <764978cf0708152256w38b2bfa7ic4d196ae08a46bd0@mail.gmail.com>
References: <764978cf0708152256w38b2bfa7ic4d196ae08a46bd0@mail.gmail.com>
Message-ID:

On Thu, 16 Aug 2007, neeti somaiya wrote:

> Hi,
>
> After a lot of search I could find this link from where PDB files can be
> downloaded :
> ftp://ftp.wwpdb.org/pub/pdb/data/structures/all/pdb/
> Is there any other link where one can download all pdb data from?

try:

ftp://pdb.protein.osaka-u.ac.jp/v3/pub/pdb/
or
ftp://ftp.ebi.ac.uk/pub/databases/rcsb/pdb-remediated/

It is not BioPerl, but James Tisdall's book (O'Reilly), "Beginning Perl for
Bioinformatics", has a nice introduction to parsing PDB files.

Regards, Hans

>
> I tried using Bio::Structure::IO::pdb with some code like :-
> use Bio::Structure::IO;
>
> $in = Bio::Structure::IO->new(-file => "pdb100d.ent",
>                               -format => 'pdb');
>
> while ( my $struc = $in->next_structure() ) {
>     print "Structure ", $struc->id,"\n";
> }
>
> It works well. But I am not able to find documentation of other methods
> which will give me various specific details available in a pdb file, right
> from title, keywords, references to structure details, atoms, coordinates
> etc. There must be different methods to fetch and parse each of this data
> from a pdb file, right? Where can I find the details? Any example code of
> the same would also be of great use.
>
> Thanks and Regards,
> Neeti Somaiya.
>
> --
> -Neeti
> Even my blood says, B positive
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

From neetisomaiya at gmail.com Thu Aug 16 05:30:42 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Thu, 16 Aug 2007 15:00:42 +0530
Subject: [Bioperl-l] Homologene parser?
In-Reply-To:
References: <764978cf0708130329r484bf210qd0e6e9ad39274bbc@mail.gmail.com>
 <22FEA28C-1720-4519-B129-B7A5ADA452D4@ccg.murdoch.edu.au>
 <4E7F8A99-68A7-49C2-9919-E2FC5652C8D7@uiuc.edu>
Message-ID: <764978cf0708160230o4ade944er8c8529199f3a0262@mail.gmail.com>

Hi,

For now I am using the homologene parser available here :-
http://bioinformatics.tgen.org/brunit/software/bioparser/docs/pod_bio_parser_homologene_fileparser_pm.shtml
, for parsing the homologene.data file.
But the README at the ftp site says HOMOLOGENE.XML has much more data, I am
still to see how to parse this one.

~Neeti.

On 8/14/07, Andrew Macgregor wrote:
>
> On 14/08/2007, at 11:21 AM, Chris Fields wrote:
>
> > It looks like Heikki responded and thought a good place for it
> > would be Bio::SeqIO, but it didn't go anywhere I suppose. I see
> > that a few other posts suggest it could be placed in Bio::Cluster
> > as well which I'm not familiar with. We could add it in if you
> > were still interested, just need to find a good place for it; might
> > be nice to have a Parse::RecDescent-based parser.
> >
> > chris
> >
>
> Hi Chris,
>
> I was also doing some parsing of UniGene at the time but found
> RecDescent was too slow and went back to regexes. That code found
> its way into Bio::Cluster.
> Occasionally I see a message with someone looking for a Homologene
> parser but not very often, so I'm not sure it is worth the effort of
> moving the code into bioperl.
>
> Cheers, Andrew.
>

--
-Neeti
Even my blood says, B positive

From bix at sendu.me.uk Thu Aug 16 05:59:08 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 16 Aug 2007 10:59:08 +0100
Subject: [Bioperl-l] PDB Parser
In-Reply-To: <764978cf0708152256w38b2bfa7ic4d196ae08a46bd0@mail.gmail.com>
References: <764978cf0708152256w38b2bfa7ic4d196ae08a46bd0@mail.gmail.com>
Message-ID: <46C41FEC.2000206@sendu.me.uk>

neeti somaiya wrote:
> I tried using Bio::Structure::IO::pdb with some code like :-
> use Bio::Structure::IO;
>
> $in = Bio::Structure::IO->new(-file => "pdb100d.ent",
>                               -format => 'pdb');
>
> while ( my $struc = $in->next_structure() ) {
>     print "Structure ", $struc->id,"\n";
> }
>
> It works well. But I am not able to find documentation of other methods
> which will give me various specific details available in a pdb file, right
> from title, keywords, references to structure details, atoms, coordinates
> etc. There must be different methods to fetch and parse each of this data
> from a pdb file, right? Where can I find the details?

$struc is a Bio::Structure::Entry, so look at the docs for that:
http://doc.bioperl.org/bioperl-live/Bio/Structure/Entry.html

You'll probably want to look at the docs for the other Structure modules
as well:
http://doc.bioperl.org/bioperl-live/Bio/Structure/modules.html

I agree, the documentation in this area could be improved.
Bio::Structure::StructureI could actually contain something, and
Bio::Structure should actually exist or not be referenced in the docs.
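
As a starting point for the kind of example code asked about in this
thread, here is a minimal sketch of walking a Bio::Structure::Entry. It
assumes BioPerl is installed and that the method names are as described in
the Entry and Atom docs linked above (annotation, get_chains, get_residues,
get_atoms, xyz); check them against the installed version. The sub name
summarize_structure is made up for this example, and the annotation keys
are listed rather than assumed, since they depend on the parser.

```perl
#!/usr/bin/perl
# Sketch only: summarize one PDB file with Bio::Structure.
# Usage (hypothetical script name): perl pdb_summary.pl pdb100d.ent
use strict;
use warnings;

sub summarize_structure {
    my ($file) = @_;

    # Load BioPerl at run time so a missing installation gives a
    # readable warning instead of a compile error.
    unless ( eval { require Bio::Structure::IO; 1 } ) {
        warn "Bio::Structure::IO is not available: $@";
        return 0;
    }

    my $in = Bio::Structure::IO->new( -file => $file, -format => 'pdb' );
    my $count = 0;
    while ( my $struc = $in->next_structure ) {
        $count++;
        print "Structure ", $struc->id, "\n";

        # Header-type records (title, keywords, references, ...) are held
        # in an annotation collection; print whatever keys are present.
        if ( my $ac = $struc->annotation ) {
            for my $key ( $ac->get_all_annotation_keys ) {
                for my $ann ( $ac->get_Annotations($key) ) {
                    print "  $key: ", $ann->as_text, "\n";
                }
            }
        }

        # Coordinates: Entry -> Chain -> Residue -> Atom.
        for my $chain ( $struc->get_chains ) {
            for my $res ( $struc->get_residues($chain) ) {
                for my $atom ( $struc->get_atoms($res) ) {
                    printf "  chain %s %s %s %8.3f %8.3f %8.3f\n",
                        $chain->id, $res->id, $atom->id, $atom->xyz;
                }
            }
        }
    }
    return $count;    # number of structures read
}

# Run only when a file name is given on the command line.
summarize_structure( $ARGV[0] ) if @ARGV;
```

Printing every atom is verbose for a real entry; in practice one would
filter by chain or residue before the innermost loop.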
From ewijaya at gmail.com Thu Aug 16 00:18:57 2007
From: ewijaya at gmail.com (Edward Wijaya)
Date: Thu, 16 Aug 2007 12:18:57 +0800
Subject: [Bioperl-l] How to create contrasting colors in every singe track - Bio::Graphics
Message-ID: <3521d3670708152118y415f512clc51046cd7ae8c11a@mail.gmail.com>

Dear experts,

I am trying to draw a figure that shows binding site hits for various
programs (see the attached image for an example).

Now, I have a problem in creating a contrasting colour for each of
the programs (MEME, AlignACE, etc). I want to avoid "graded segments",
so that I can have more contrasting colours, e.g. red, blue, yellow, etc.

Can anybody suggest how we can achieve that?

My full source code can be found here: http://dpaste.com/16985/
The relevant portion of the script is this:

__BEGIN__
my %prog_color = (
    "Actual"   => 800000,
    "ALIGNACE" => 230000,
    "BP"       => 80000,
    "MDSCAN"   => 5000,
    "MITRA"    => 10000,
    "MTSAMP"   => 200000,
    "SPACE"    => 40000,
    "NONE"     => 0,
);

foreach my $seqid ( sort {$a <=> $b }keys %nlist ) {
    my $track = $panel->add_track(
        -glyph     => 'graded_segments',
        -key       => "SEQ " . $seqid,
        -connector => "dashed",
        -label     => 1,
        -fontcolor => 'red',
        -bgcolor   => 'blue',
        -bump      => +1,
        -height    => 8,
        -min_score => 0,
        -max_score => 500000
    );
# rest of the script
__END__

Regards,
Edward
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hits.png
Type: image/png
Size: 2509 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070816/31057225/attachment.png

From pratchusha.kamireddy at aamu.edu Wed Aug 15 23:45:22 2007
From: pratchusha.kamireddy at aamu.edu (pratchusha kamireddy)
Date: Wed, 15 Aug 2007 22:45:22 -0500 (CDT)
Subject: [Bioperl-l] Request for Activeperl software
Message-ID: <32393254.1187235922749.JavaMail.oracle@my.aamu.edu>

Hello
I am Pratchusha Kamireddy, doing a masters at Alabama A&M University. I am
working under Dr. Kantety in the Plant and Soil Science Department. I am a
beginner learning Perl programming.
I need the ActivePerl software to run Perl programs. Can you help me in this
regard: where can I download this software, how can I install it and how
can I use it? I am eagerly waiting for your reply. Please help me in this
regard.
Thanking you
Pratchusha Kamireddy

From spiros at lokku.com Thu Aug 16 09:32:05 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Thu, 16 Aug 2007 14:32:05 +0100
Subject: [Bioperl-l] Request for Activeperl software
In-Reply-To: <32393254.1187235922749.JavaMail.oracle@my.aamu.edu>
References: <32393254.1187235922749.JavaMail.oracle@my.aamu.edu>
Message-ID:

Hi,

You can download ActivePerl from ActiveState's website at
http://www.activestate.com/Products/ActivePerl/

Get a book:
http://www.oreilly.com/catalog/lperl3/

Visit:
http://perl-begin.org/
http://learn.perl.org/

Usenet:
http://www.nntp.perl.org/group/perl.beginners/

Spiros

On 8/16/07, pratchusha kamireddy wrote:
> Hello
> I am Pratchusha Kamireddy doing masters in Alabama A&M University. I am
> working under Dr.Kantety in Plant and Soil Science Department.I am the
> beginner to learn perl programming. I need Activeperl software to run the
> perl programs. Can you help me in this regard like: where can I download
> this software, how can i Install this and how can i use this. I am eagerly
> waiting for your reply.Please help me in this regard.
> Thanking you
> Pratchusha Kamireddy
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From razi.khaja at gmail.com Thu Aug 16 09:37:09 2007
From: razi.khaja at gmail.com (Razi Khaja)
Date: Thu, 16 Aug 2007 09:37:09 -0400
Subject: [Bioperl-l] How to create contrasting colors in every singe track - Bio::Graphics
In-Reply-To: <3521d3670708152118y415f512clc51046cd7ae8c11a@mail.gmail.com>
References: <3521d3670708152118y415f512clc51046cd7ae8c11a@mail.gmail.com>
Message-ID: <62e9dabc0708160637o36380ecbv69fe479d0a26989d@mail.gmail.com>

You would probably want to consider a "Graph-Coloring" algorithm in order
to optimally pick contrasting colors for the features being displayed.
This might be overkill for what you're trying to accomplish and may not be
possible (depending on how many features you have in your dataset, i.e.
how big your graph is). In any case, some resources are:

http://en.wikipedia.org/wiki/Graph_coloring
http://web.cs.ualberta.ca/~joe/Coloring/

If your problem is simpler, see the modifications to your program I've made
below:

Razi Khaja

On 8/16/07, Edward Wijaya wrote:
> Dear experts,
>
> I am trying to draw a figures that shows binding sites hits for various
> program (see attached) for example.
>
> Now, I have a problem in creating contrasting colour for each of
> the Programs (MEME, AlignACE, etc). I want to avoid "graded segments",
> so that I can have more contrasting color, e.g: red, blue, yellow, etc.
>
> Can anybody suggest how can we achieve that?
>
> My full source code can be found here: http://dpaste.com/16985/
> The portion of the script is this:
>
> __BEGIN__
> my %prog_color = (
>     "Actual"   => 800000,
>     "ALIGNACE" => 230000,
>     "BP"       => 80000,
>     "MDSCAN"   => 5000,
>     "MITRA"    => 10000,
>     "MTSAMP"   => 200000,
>     "SPACE"    => 40000,
>     "NONE"     => 0,
> );
>
my %color = ( 'MEME' => 'red', 'ALIGNACE' => 'blue');
> foreach my $seqid ( sort {$a <=> $b }keys %nlist ) {
    my( @field ) = split( /\s+/, $nlist{$seqid} );
    my $prog_name = $field[3];
>     my $track = $panel->add_track(
>         -glyph     => 'graded_segments',
>         -key       => "SEQ " . $seqid,
>         -connector => "dashed",
>         -label     => 1,
>         -fontcolor => 'red',
        -bgcolor   => $color{ $prog_name },
>         -bump      => +1,
>         -height    => 8,
>         -min_score => 0,
>         -max_score => 500000
>     );
> # rest of the script
> __END__
>
> Regards,
> Edward
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From bix at sendu.me.uk Thu Aug 16 09:49:52 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 16 Aug 2007 14:49:52 +0100
Subject: [Bioperl-l] Request for Activeperl software
In-Reply-To: <32393254.1187235922749.JavaMail.oracle@my.aamu.edu>
References: <32393254.1187235922749.JavaMail.oracle@my.aamu.edu>
Message-ID: <46C45600.4040906@sendu.me.uk>

pratchusha kamireddy wrote:
> I am Pratchusha Kamireddy doing masters in Alabama A&M University. I
> am working under Dr.Kantety in Plant and Soil Science Department.I am
> the beginner to learn perl programming. I need Activeperl software to
> run the perl programs. Can you help me in this regard like: where can
> I download this software, how can i Install this and how can i use
> this. I am eagerly waiting for your reply.Please help me in this
> regard.

Firstly, Google is your friend:
http://www.google.co.uk/search?q=activeperl

The first hit is the correct one:
http://www.activestate.com/Products/activeperl/

I suppose your next question will be how to install Bioperl (if not,
you're in the wrong place):
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
(which also tells you where to get ActivePerl from)

From cjfields at uiuc.edu Thu Aug 16 10:11:22 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 16 Aug 2007 09:11:22 -0500
Subject: [Bioperl-l] How to create contrasting colors in every singe track - Bio::Graphics
In-Reply-To: <3521d3670708152118y415f512clc51046cd7ae8c11a@mail.gmail.com>
References: <3521d3670708152118y415f512clc51046cd7ae8c11a@mail.gmail.com>
Message-ID:

On Aug 15, 2007, at 11:18 PM, Edward Wijaya wrote:

> Dear experts,
>
> I am trying to draw a figures that shows binding sites hits for
> various program (see attached) for example.
>
> Now, I have a problem in creating contrasting colour for each of
> the Programs (MEME, AlignACE, etc). I want to avoid "graded
> segments", so that I can have more contrasting color, e.g: red,
> blue, yellow, etc.
>
> Can anybody suggest how can we achieve that?
>
> My full source code can be found here: http://dpaste.com/16985/
> The portion of the script is this:
>
> __BEGIN__
> my %prog_color = (
>     "Actual"   => 800000,
>     "ALIGNACE" => 230000,
>     "BP"       => 80000,
>     "MDSCAN"   => 5000,
>     "MITRA"    => 10000,
>     "MTSAMP"   => 200000,
>     "SPACE"    => 40000,
>     "NONE"     => 0,
> );
>
> foreach my $seqid ( sort {$a <=> $b }keys %nlist ) {
>     my $track = $panel->add_track(
>         -glyph     => 'graded_segments',
>         -key       => "SEQ " . $seqid,
>         -connector => "dashed",
>         -label     => 1,
>         -fontcolor => 'red',
>         -bgcolor   => 'blue',
>         -bump      => +1,
>         -height    => 8,
>         -min_score => 0,
>         -max_score => 500000
>     );
> # rest of the script
> __END__
>
> Regards,
> Edward

I think you have two options:

1) Split the seqfeatures into different tracks based on the source
(AlignACE, MP, etc), then give each its own graded segment color. I
like this personally as it doesn't glob various results together onto
one track and (at least to me) is easier to maintain. It also allows
one more flexibility in using varying scoring schemes.

2) Use a callback for bgcolor which changes the color explicitly
based on the source/score.

The GenBank/EMBL section of the Bio::Graphics HOWTO shows how to add
different tracks, and there are several scattered examples of how to
use callbacks.

http://www.bioperl.org/wiki/HOWTO:Graphics

chris

From cjfields at uiuc.edu Thu Aug 16 10:12:30 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 16 Aug 2007 09:12:30 -0500
Subject: [Bioperl-l] PDB Parser
In-Reply-To: <46C41FEC.2000206@sendu.me.uk>
References: <764978cf0708152256w38b2bfa7ic4d196ae08a46bd0@mail.gmail.com>
 <46C41FEC.2000206@sendu.me.uk>
Message-ID: <5D32F747-60FC-4EEE-BD38-3A522A67EA27@uiuc.edu>

On Aug 16, 2007, at 4:59 AM, Sendu Bala wrote:

> neeti somaiya wrote:
>> I tried using Bio::Structure::IO::pdb with some code like :-
>> use Bio::Structure::IO;
>>
>> $in = Bio::Structure::IO->new(-file => "pdb100d.ent",
>>                               -format => 'pdb');
>>
>> while ( my $struc = $in->next_structure() ) {
>>     print "Structure ", $struc->id,"\n";
>> }
>>
>> It works well. But I am not able to find documentation of other
>> methods which will give me various specific details available in a
>> pdb file, right from title, keywords, references to structure
>> details, atoms, coordinates etc. There must be different methods to
>> fetch and parse each of this data from a pdb file, right? Where can
>> I find the details?
>
> $struct is a Bio::Structure::Entry, so look at the docs for that:
> http://doc.bioperl.org/bioperl-live/Bio/Structure/Entry.html
>
> You'll probably want to look at the docs for the other Structure
> modules as well:
> http://doc.bioperl.org/bioperl-live/Bio/Structure/modules.html
>
>
> I agree, the documentation in this area could be improved.
> Bio::Structure::StructureI could actually contain something, and
> Bio::Structure should actually exist or not be referenced in the docs.

There was a discussion a while back on refactoring the code within
Bio::Structure to better deal with HETATM and other stuff. As far as
I'm concerned it's open for anyone wanting to tinker with it.

chris

From cjfields at uiuc.edu Thu Aug 16 10:37:31 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 16 Aug 2007 09:37:31 -0500
Subject: [Bioperl-l] Announcement: infernal/erpin/rnamotif parsers
Message-ID: <7CE60504-FA1A-4AFF-A02E-036B8E37C3F9@uiuc.edu>

To anyone using the aforementioned parsers:

I don't plan on continuing development of the Bio::Tools-related
Infernal, RNAMotif, and ERPIN parsers at this time unless there is
substantial interest in doing so. Instead, I plan on focusing my
efforts on the Bio::SearchIO-based parsers as I feel they are much
better at representing the data present in the output. In my opinion
having two sets of parsers that accomplish essentially the same task
is redundant and non-productive. Again, if there is considerable
interest in keeping them I suggest responding to this message,
otherwise I would consider them deprecated and removed completely by
rel 1.7 (maybe sooner).

Infernal: It's very likely that a new stable version (v. 1.0) of
Infernal will be released in the near future. I may upgrade the
Bio::SearchIO-based parser in the meantime to parse the latest
Infernal output (v 0.81), but I don't plan on supporting pre-1.0
releases once the final version is out. Infernal has been in developer
release for some time now and the program output has changed
dramatically over time; however, the format is expected to solidify
once a stable release is made, which makes supporting the parser much
easier over time.

Questions? Gripes?

chris

From awitney at sgul.ac.uk Thu Aug 16 10:07:02 2007
From: awitney at sgul.ac.uk (Adam Witney)
Date: Thu, 16 Aug 2007 15:07:02 +0100
Subject: [Bioperl-l] Request for Activeperl software
In-Reply-To: <32393254.1187235922749.JavaMail.oracle@my.aamu.edu>
Message-ID:

This would be the best place to start

http://www.activestate.com/

Or more specifically for the language:

http://www.activestate.com/store/activeperl/download/

(Which will require you to register with them)

adam

On 16/8/07 04:45, "pratchusha kamireddy" wrote:

> Hello
> I am Pratchusha Kamireddy doing masters in Alabama A&M University. I am
> working under Dr. Kantety in Plant and Soil Science Department. I am the
> beginner to learn perl programming. I need Activeperl software to run the perl
> programs. Can you help me in this regard like: where can I download this
> software, how can I install this and how can I use this. I am eagerly waiting
> for your reply. Please help me in this regard.
> Thanking you
> Pratchusha Kamireddy
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From muratem at eng.uah.edu Thu Aug 16 15:10:34 2007
From: muratem at eng.uah.edu (muratem at eng.uah.edu)
Date: Thu, 16 Aug 2007 14:10:34 -0500 (CDT)
Subject: [Bioperl-l] Problem with Bio::SeqIO::staden::read on Mac OS X
Message-ID: <27981.69.147.139.126.1187291434.squirrel@webmail.eng.uah.edu>

Hello

This might not be the correct list for this particular problem, but
hopefully someone can help. I am trying to install ...staden::read on a
Mac OS X 10.4. I tried installing cpan but it wouldn't work so I went to
the manual methods.
Perl is on the system and appears to be installed correctly for a Mac. Bioperl 1.5.2 was installed via fink and appears to be OK also. I'm trying to install the Bio::SeqIO::staden::read module. I downloaded the bioperl-ext-1.5.1 tarball from bioperl.org, did the usual perl Makefile.PL and make and get: newyork:/usr/local/bioperl-ext-1.5.1 root# make Makefile:1148: *** multiple target patterns. Stop. A snippet from the Makefile... 1148 pm_to_blib: $(TO_INST_PM) 1149 $(NOECHO) $(PERLRUN) -MExtUtils::Install -e 'pm_to_blib({@ARGV}, '\''$(INST_LIB)/auto'\'', '\''$(PM_FILTER)'\'')'\ 1150 Bio/Ext/Align/libs/hscore.h $(INST_LIB)/Bio/Ext/Align/libs/hscore.h \ 1151 Bio/Ext/Align/libs/probability.c $(INST_LIB)/Bio/Ext/Align/libs/probability.c \ 1152 Bio/Ext/Align/libs/linesubs.h $(INST_LIB)/Bio/Ext/Align/libs/linesubs.h \ 1153 Bio/Ext/Align/test.pl $(INST_LIB)/Bio/Ext/Align/test.pl \ 1154 Bio/Ext/Align/libs/wiseoverlay.h $(INST_LIB)/Bio/Ext/Align/libs/wiseoverlay.h \ 1155 Bio/Ext/Align/libs/proteinsw.h $(INST_LIB)/Bio/Ext/Align/libs/proteinsw.h \ 1156 Bio/Ext/Align/libs/wisebase.h $(INST_LIB)/Bio/Ext/Align/libs/wisebase.h \ 1157 Bio/Ext/Align/libs/seqaligndisplay.h $(INST_LIB)/Bio/Ext/Align/libs/seqaligndisplay.h \ 1158 Bio/Ext/Align/libs/dyna.h $(INST_LIB)/Bio/Ext/Align/libs/dyna.h \ The README says you don't have to build the whole package, so I descended to the staden directory and did a Make and didn't get any problems reported. 
But when I did a make test I get: newyork:/usr/local/bioperl-ext-1.5.1/Bio/SeqIO/staden root# make test PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, '../blib/lib', '../blib/arch')" test.pl test....Had problems bootstrapping Inline module 'Bio::SeqIO::staden::read' Can't load '/usr/local/bioperl-ext-1.5.1/Bio/SeqIO/staden/../blib/arch/auto/Bio/SeqIO/staden/read/read.bundle' for module Bio::SeqIO::staden::read: dlopen(/usr/local/bioperl-ext-1.5.1/Bio/SeqIO/staden/../blib/arch/auto/Bio/SeqIO/staden/read/read.bundle, 2): Symbol not found: _curl_easy_init Referenced from: /usr/local/bioperl-ext-1.5.1/Bio/SeqIO/staden/../blib/arch/auto/Bio/SeqIO/staden/read/read.bundle Expected in: dynamic lookup at /Library/Perl/5.8.6/Inline.pm line 500 at test.pl line 0 INIT failed--call queue aborted, line 1. test....dubious Test returned status 255 (wstat 65280, 0xff00) DIED. FAILED tests 1-94 Failed 94/94 tests, 0.00% okay Failed Test Stat Wstat Total Fail Failed List of Failed ------------------------------------------------------------------------------- test.pl 255 65280 94 188 200.00% 1-94 Failed 1/1 test scripts, 0.00% okay. 94/94 subtests failed, 0.00% okay. make: *** [test_dynamic] Error 2 The missing symbol is apparently from libcurl. I have both libcurl.2.dylib and libcurl.3.dylib with copies in multiple locations including /usr/lib, /usr/local/lib and the usual Mac directories. I used the Mac otool to look at the externals in read.bundle and it references libz.1.dylib and libSystem.B.dylib. Could this be a case where there should have been a link to libcurl and wasn't? I've searched the list and see only the Inline versioning problem (which I had and fixed). Has anybody seen this problem before or built the module on a Mac? How did you do it? Is this a question for the Staden list on sourceforge? 
Thanks Mike From cjfields at uiuc.edu Thu Aug 16 15:55:05 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 16 Aug 2007 14:55:05 -0500 Subject: [Bioperl-l] Problem with Bio::SeqIO::staden::read on Mac OS X In-Reply-To: <27981.69.147.139.126.1187291434.squirrel@webmail.eng.uah.edu> References: <27981.69.147.139.126.1187291434.squirrel@webmail.eng.uah.edu> Message-ID: <9BBC30AD-9AFE-4D52-88E4-656D9EB8924E@uiuc.edu> On Aug 16, 2007, at 2:10 PM, muratem at eng.uah.edu wrote: > Hello > > This might not be the correct list for this particular problem, but > hopefully someone can help. I am trying to install ...staden::read > on a > Mac OS X 10.4. I tried installing cpan but it wouldn't work so I > went to > the manual methods. Perl is on the system and appears to be installed > correctly for a Mac. Bioperl 1.5.2 was installed via fink and > appears to > be OK also. I'm trying to install the Bio::SeqIO::staden::read > module. I > downloaded the bioperl-ext-1.5.1 tarball from bioperl.org, did the > usual > perl Makefile.PL and make and get: > > newyork:/usr/local/bioperl-ext-1.5.1 root# make > Makefile:1148: *** multiple target patterns. Stop. > > A snippet from the Makefile... 
> > 1148 pm_to_blib: $(TO_INST_PM) > 1149 $(NOECHO) $(PERLRUN) -MExtUtils::Install -e > 'pm_to_blib({@ARGV}, '\''$(INST_LIB)/auto'\'', '\''$(PM_FILTER)'\'')'\ > 1150 Bio/Ext/Align/libs/hscore.h > $(INST_LIB)/Bio/Ext/Align/libs/hscore.h \ > 1151 Bio/Ext/Align/libs/probability.c > $(INST_LIB)/Bio/Ext/Align/libs/probability.c \ > 1152 Bio/Ext/Align/libs/linesubs.h > $(INST_LIB)/Bio/Ext/Align/libs/linesubs.h \ > 1153 Bio/Ext/Align/test.pl $(INST_LIB)/Bio/Ext/Align/ > test.pl \ > 1154 Bio/Ext/Align/libs/wiseoverlay.h > $(INST_LIB)/Bio/Ext/Align/libs/wiseoverlay.h \ > 1155 Bio/Ext/Align/libs/proteinsw.h > $(INST_LIB)/Bio/Ext/Align/libs/proteinsw.h \ > 1156 Bio/Ext/Align/libs/wisebase.h > $(INST_LIB)/Bio/Ext/Align/libs/wisebase.h \ > 1157 Bio/Ext/Align/libs/seqaligndisplay.h > $(INST_LIB)/Bio/Ext/Align/libs/seqaligndisplay.h \ > 1158 Bio/Ext/Align/libs/dyna.h > $(INST_LIB)/Bio/Ext/Align/libs/dyna.h \ > > The README says you don't have to build the whole package, so I > descended > to the staden directory and did a Make and didn't get any problems > reported. But when I did a make test I get: > > newyork:/usr/local/bioperl-ext-1.5.1/Bio/SeqIO/staden root# make test > PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" > "test_harness(0, '../blib/lib', '../blib/arch')" test.pl > test....Had problems bootstrapping Inline module > 'Bio::SeqIO::staden::read' > > Can't load > '/usr/local/bioperl-ext-1.5.1/Bio/SeqIO/staden/../blib/arch/auto/ > Bio/SeqIO/staden/read/read.bundle' > for module Bio::SeqIO::staden::read: > dlopen(/usr/local/bioperl-ext-1.5.1/Bio/SeqIO/staden/../blib/arch/ > auto/Bio/SeqIO/staden/read/read.bundle, > 2): Symbol not found: _curl_easy_init > Referenced from: > /usr/local/bioperl-ext-1.5.1/Bio/SeqIO/staden/../blib/arch/auto/Bio/ > SeqIO/staden/read/read.bundle > Expected in: dynamic lookup > at /Library/Perl/5.8.6/Inline.pm line 500 > > > at test.pl line 0 > INIT failed--call queue aborted, line 1. 
> test....dubious > Test returned status 255 (wstat 65280, 0xff00) > DIED. FAILED tests 1-94 > Failed 94/94 tests, 0.00% okay > Failed Test Stat Wstat Total Fail Failed List of Failed > ---------------------------------------------------------------------- > --------- > test.pl 255 65280 94 188 200.00% 1-94 > Failed 1/1 test scripts, 0.00% okay. 94/94 subtests failed, 0.00% > okay. > make: *** [test_dynamic] Error 2 > > The missing symbol is apparently from libcurl. I have both libcurl. > 2.dylib > and libcurl.3.dylib with copies in multiple locations including / > usr/lib, > /usr/local/lib and the usual Mac directories. I used the Mac otool > to look > at the externals in read.bundle and it references libz.1.dylib and > libSystem.B.dylib. Could this be a case where there should have been a > link to libcurl and wasn't? > > I've searched the list and see only the Inline versioning problem > (which I > had and fixed). Has anybody seen this problem before or built the > module > on a Mac? How did you do it? Is this a question for the Staden list on > sourceforge? > > Thanks > > Mike Haven't seen the problem you list. I have installed it on Mac OS X (intel) w/o problems so I know it works; at least all tests passed though I remember Inline complaining for some reason. You should try using bioperl-ext from CVS (it is really 1.5.1 but with updated docs and maybe a change or two). The process is a little tricky but is documented in the README in the package. You'll need the old io_lib (1.8.12 or earlier) from Staden if memory serves. chris From zhaodj at ioz.ac.cn Thu Aug 16 22:13:16 2007 From: zhaodj at ioz.ac.cn (De-Jian,ZHAO) Date: Fri, 17 Aug 2007 10:13:16 +0800 (CST) Subject: [Bioperl-l] How to get the full methods of a bioperl object? Message-ID: <55928.159.226.67.49.1187316796.squirrel@mail.ioz.ac.cn> Dear list members, I have a question about the methods of bioperl objects.It is how and where we can get the whole methods of a bioperl object. 

Take Bio::Tools::Run::RemoteBlast for example. In the synopsis of
this object, some sample code is given. The following five clauses
are excerpted from the synopsis.
(1) my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
(2) while ( my @rids = $factory->each_rid ) {
(3) $factory->remove_rid($rid);
(4) my $rc = $factory->retrieve_blast($rid);
(5) my $r = $factory->submit_blast($input);

The five clauses use five methods of the RemoteBlast object, i.e.
(1) new, (2) each_rid, (3) remove_rid, (4) retrieve_blast, and
(5) submit_blast. However, I find that only some of them (4 and 5) are
listed in the appendix, while the others (1, 2 and 3) are absent. Are
there some more methods not explicitly declared? I don't know. This
will lead to only a partial understanding and utilization of the
module. Therefore I come here for the way to get the full methods of a
bioperl object.

Thanks!

--
De-Jian Zhao
Institute of Zoology, Chinese Academy of Sciences
+86-10-64807217
zhaodj at ioz.ac.cn

From neetisomaiya at gmail.com Fri Aug 17 02:23:08 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 17 Aug 2007 11:53:08 +0530
Subject: [Bioperl-l] PDB Parser
In-Reply-To: <5D32F747-60FC-4EEE-BD38-3A522A67EA27@uiuc.edu>
References: <764978cf0708152256w38b2bfa7ic4d196ae08a46bd0@mail.gmail.com> <46C41FEC.2000206@sendu.me.uk> <5D32F747-60FC-4EEE-BD38-3A522A67EA27@uiuc.edu>
Message-ID: <764978cf0708162323r17c4fc59w5adfb61ccfc5ac6@mail.gmail.com>

Hi,

My main concern is just the pdb id and title. PDB id I am able to fetch
easily, but is there a method which can give me the title of the PDB
structure?

Like for example from the following :-

HEADER    DNA/RNA    05-DEC-94    100D
TITLE     CRYSTAL STRUCTURE OF THE HIGHLY DISTORTED CHIMERIC DECAMER
TITLE   2 R(C)D(CGGCGCCG)R(G)-SPERMINE COMPLEX-SPERMINE BINDING TO
TITLE   3 PHOSPHATE ONLY AND MINOR GROOVE TERTIARY BASE-PAIRING
COMPND    MOL_ID: 1;
COMPND  2 MOLECULE: DNA/RNA (5'-R(*CP*)-D(*CP*GP*GP*CP*GP*CP*CP*GP*)-
COMPND  3 R(*G)-3');
COMPND  4 CHAIN: A, B;
.
.
.
.

I just want "CRYSTAL STRUCTURE OF THE HIGHLY DISTORTED CHIMERIC DECAMER
R(C)D(CGGCGCCG)R(G)-SPERMINE COMPLEX-SPERMINE BINDING TO PHOSPHATE ONLY AND
MINOR GROOVE TERTIARY BASE-PAIRING".

Thanks,
Neeti.

On 8/16/07, Chris Fields wrote:
>
>
> On Aug 16, 2007, at 4:59 AM, Sendu Bala wrote:
>
> > neeti somaiya wrote:
> >> I tried using Bio::Structure::IO::pdb with some code like :-
> >> use Bio::Structure::IO;
> >>
> >> $in = Bio::Structure::IO->new(-file => "pdb100d.ent",
> >>                               -format => 'pdb');
> >>
> >> while ( my $struc = $in->next_structure() ) {
> >>     print "Structure ", $struc->id,"\n";
> >> }
> >>
> >> It works well.
But I am not able to find documentation of other
> >> methods which will give me various specific details available in a pdb
> >> file, right from title, keywords, references to structure details,
> >> atoms, coordinates etc. There must be different methods to fetch and
> >> parse each of this data from a pdb file, right? Where can I find the
> >> details?
> >
> > $struct is a Bio::Structure::Entry, so look at the docs for that:
> > http://doc.bioperl.org/bioperl-live/Bio/Structure/Entry.html
> >
> > You'll probably want to look at the docs for the other Structure
> > modules as well:
> > http://doc.bioperl.org/bioperl-live/Bio/Structure/modules.html
> >
> >
> > I agree, the documentation in this area could be improved.
> > Bio::Structure::StructureI could actually contain something, and
> > Bio::Structure should actually exist or not be referenced in the docs.
>
> There was a discussion a while back on refactoring the code within
> Bio::Structure to better deal with HETATM and other stuff. As far as
> I'm concerned it's open for anyone wanting to tinker with it.
>
> chris
>

--
-Neeti
Even my blood says, B positive

From alexl at users.sourceforge.net Fri Aug 17 03:22:16 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Fri, 17 Aug 2007 00:22:16 -0700
Subject: [Bioperl-l] Clarifying license of bioperl
Message-ID:

Hi all,

I'd like to clarify the license of bioperl. Currently the LICENSE
only includes the text of the Artistic license.
But the wiki http://www.bioperl.org/wiki/FAQ#What_are_the_license_terms_for_BioPerl.3F says: BioPerl is licensed under the same terms as Perl itself which is the Perl Artistic License (see http://www.perl.com/pub/a/language/misc/Artistic.html or http://www.opensource.org/licenses/artistic-license.html and most of the modules in the source say: "You may distribute this module under the same terms as perl itself" But the current distribution of Perl is actually dually-licensed under the GPL or Artistic licenses (so the wiki is technically out of sync with the "same terms as Perl itself"), see: http://dev.perl.org/licenses/ I assume that the intent of the bioperl authors is to license with the same terms as Perl's *current* license (which would mean bioperl is really effectively dually-licensed under the GPL or Artistic license). If so, it would be good if the LICENSE text and the wiki were updated to reflect this. Also some of the source modules say "under the same terms as perl itself", but then only mention the Artistic license. This has important ramifications for distribution: I maintain the Fedora package for bioperl and I have currently listed the license of bioperl as "GPL or Artistic". But if bioperl were distributed under the Artistic license only then I would have to pull the package from the distribution, because the Artistic 1.0 (original)-only license is deprecated (but "GPL or Artistic" is OK): http://fedoraproject.org/wiki/Licensing#head-d8cc605dd386091c8b6be97b8a43fb6a5d624ae1 Thanks! Alex From alexl at users.sourceforge.net Fri Aug 17 03:42:07 2007 From: alexl at users.sourceforge.net (Alex Lancaster) Date: Fri, 17 Aug 2007 00:42:07 -0700 Subject: [Bioperl-l] Clarifying license of bioperl In-Reply-To: (Alex Lancaster's message of "Fri\, 17 Aug 2007 00\:22\:16 -0700") References: Message-ID: >>>>> "AL" == Alex Lancaster writes: [...] 
AL> I assume that the intent of the bioperl authors is to license with AL> the same terms as Perl's *current* license (which would mean AL> bioperl is really effectively dually-licensed under the GPL or AL> Artistic license). If so, it would be good if the LICENSE text AL> and the wiki were updated to reflect this. Also note that since Perl's license is a dual-license "GPL or Artistic" then people aren't required to submit their modifications back to the bioperl distribution because they can choose to follow the Artistic (rather than the GPL) license which doesn't require modifications to be submitted back. This means the point: "If you fix bugs, please let us know about them. This is not the GPL license so you are not required to submit the code fixes, but in the spirit of making a better product we hope you'll contribute back to the community any insight or code improvements." listed here: http://www.bioperl.org/wiki/Licensing_BioPerl would still stand, because you can choose the Artistic license, but you could modify the clause to say: "If you fix bugs, please let us know about them. Because Bioperl is dual-licensed under the GPL or Artistic licenses, you can choose the Artistic license, which means that you are not required to submit the code fixes, but in the spirit of making a better product we hope you'll contribute back to the community any insight or code improvements." From n.haigh at sheffield.ac.uk Fri Aug 17 06:27:43 2007 From: n.haigh at sheffield.ac.uk (Nathan Haigh) Date: Fri, 17 Aug 2007 11:27:43 +0100 Subject: [Bioperl-l] How to get the full methods of a bioperl object? In-Reply-To: <55928.159.226.67.49.1187316796.squirrel@mail.ioz.ac.cn> References: <55928.159.226.67.49.1187316796.squirrel@mail.ioz.ac.cn> Message-ID: <46C5781F.60301@sheffield.ac.uk> De-Jian,ZHAO wrote: > Dear list members, > > I have a question about the methods of bioperl objects.It is how and > where we can get the whole methods of a bioperl object. 
> > Take Bio::Tools::Run::RemoteBlast for example. In the synopsis of > this object, some sample codes are given.The following five clauses > are excerpted from the synopsis. > (1)my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > (2)while ( my @rids = $factory->each_rid ) { > (3)$factory->remove_rid($rid); > (4)my $rc = $factory->retrieve_blast($rid); > (5)my $r = $factory->submit_blast($input); > > The five clauses use five methods of the RemoteBlast object,i.e. > (1)new, (2)each_rid, (3)remove_rid,(4)retrieve_blast,and > (5)submit_blast. However,I only find part of them(45) are listed in > the appendix while others(123) are absent. Are there some more > methods not explictly declared? I don't know.This will lead to the > partial understanding and utilization of the module.Therefore I come > here for the way to get the full methods of a bioperl object. > > Thanks! > You should check out the Deobfuscator at: http://bioperl.org/cgi-bin/deob_interface.cgi Search and choose the object of choice. e.g. Bio::Tools::Run::RemoteBlast You will be provided a list of methods available to that object, including all the methods up the inheritance hierarchy. Unfortunately, some bioperl modules are documented more thoroughly than others. 
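
If the Deobfuscator is not reachable, plain Perl introspection can give a similar list. The following is only a minimal sketch, assuming Perl 5.10 or later: list_methods and the Demo::* classes are made-up names for illustration and are not part of BioPerl. It walks the inheritance chain with mro::get_linear_isa and records which package defines each sub.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use mro;    # core since Perl 5.10; provides mro::get_linear_isa

# Two toy classes (hypothetical) standing in for a BioPerl class and its parent.
{ package Demo::Base;  sub new { return bless {}, shift } sub debug { return 1 } }
{ package Demo::Child; our @ISA = ('Demo::Base'); sub submit_job { return 1 } }

# Map every method name visible from $class to the package that defines it.
sub list_methods {
    my ($class) = @_;
    my %defined_in;
    # get_linear_isa lists the class itself first, then its ancestors in
    # method-resolution order, so the first definition seen is the one used.
    for my $pkg ( @{ mro::get_linear_isa($class) } ) {
        no strict 'refs';
        for my $name ( keys %{"${pkg}::"} ) {
            next if $name =~ /::$/;                      # nested package, not a sub
            next unless defined &{"${pkg}::${name}"};    # keep only defined subs
            $defined_in{$name} //= $pkg;
        }
    }
    return \%defined_in;
}

my $methods = list_methods('Demo::Child');
printf "%-12s %s\n", $_, $methods->{$_} for sort keys %$methods;
```

To try it on a real module, load it first (for example with `require Bio::Tools::Run::RemoteBlast;`) and pass that class name instead of Demo::Child. Methods created on the fly through AUTOLOAD will not show up, while functions that a package merely imports will.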

Nath

From neetisomaiya at gmail.com Fri Aug 17 06:42:09 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 17 Aug 2007 16:12:09 +0530
Subject: [Bioperl-l] PDB Parser
In-Reply-To: <764978cf0708162323r17c4fc59w5adfb61ccfc5ac6@mail.gmail.com>
References: <764978cf0708152256w38b2bfa7ic4d196ae08a46bd0@mail.gmail.com> <46C41FEC.2000206@sendu.me.uk> <5D32F747-60FC-4EEE-BD38-3A522A67EA27@uiuc.edu> <764978cf0708162323r17c4fc59w5adfb61ccfc5ac6@mail.gmail.com>
Message-ID: <764978cf0708170342q45acbea1vebaf1a8defb93896@mail.gmail.com>

Hi,

I have done it currently as follows :

while ( my $struc = $in->next_structure() ) {
    my $title;
    my $pdb_id = $struc->id;
    print "Structure ", $pdb_id,"\n";
    my $ac = $struc->annotation();
    foreach my $key ( $ac->get_all_annotation_keys() ) {
        if($key eq "title") {
            my @values = $ac->get_Annotations($key);
            foreach my $value (@values) {
                $title = $value->as_text;
                chomp($title);
                if($title =~ /Value\: (.*)/) {
                    $title = $1;
                }
                $title =~ s/\s+/ /g;
                print "Title ",$title,"\n";
                last;
            }
            last;
        }
    }
}

Is this ok?

On 8/17/07, neeti somaiya wrote:
>
> Hi,
>
> My main concern is just the pdb id and title. PDB id I am able to fetch
> easily, but is there a method which can give me the title of the PDB
> structure?
>
> Like for example from the following :-
>
> HEADER    DNA/RNA    05-DEC-94    100D
> TITLE     CRYSTAL STRUCTURE OF THE HIGHLY DISTORTED CHIMERIC DECAMER
> TITLE   2 R(C)D(CGGCGCCG)R(G)-SPERMINE COMPLEX-SPERMINE BINDING TO
> TITLE   3 PHOSPHATE ONLY AND MINOR GROOVE TERTIARY BASE-PAIRING
> COMPND    MOL_ID: 1;
> COMPND  2 MOLECULE: DNA/RNA (5'-R(*CP*)-D(*CP*GP*GP*CP*GP*CP*CP*GP*)-
> COMPND  3 R(*G)-3');
> COMPND  4 CHAIN: A, B;
> .
> .
> .
> .
>
> I just want "CRYSTAL STRUCTURE OF THE HIGHLY DISTORTED CHIMERIC DECAMER
> R(C)D(CGGCGCCG)R(G)-SPERMINE COMPLEX-SPERMINE BINDING TO PHOSPHATE ONLY AND
> MINOR GROOVE TERTIARY BASE-PAIRING".
>
> Thanks,
> Neeti.
>
> On 8/16/07, Chris Fields wrote:
> >
> >
> > On Aug 16, 2007, at 4:59 AM, Sendu Bala wrote:
> >
> > > neeti somaiya wrote:
> > >> I tried using Bio::Structure::IO::pdb with some code like :-
> > >> use Bio::Structure::IO;
> > >>
> > >> $in = Bio::Structure::IO->new(-file => "pdb100d.ent",
> > >>                               -format => 'pdb');
> > >>
> > >> while ( my $struc = $in->next_structure() ) {
> > >>     print "Structure ", $struc->id,"\n";
> > >> }
> > >>
> > >> It works well. But I am not able to find documentation of other
> > >> methods which will give me various specific details available in a pdb
> > >> file, right from title, keywords, references to structure details,
> > >> atoms, coordinates etc. There must be different methods to fetch and
> > >> parse each of this data from a pdb file, right? Where can I find the
> > >> details?
> > >
> > > $struct is a Bio::Structure::Entry, so look at the docs for that:
> > > http://doc.bioperl.org/bioperl-live/Bio/Structure/Entry.html
> > >
> > > You'll probably want to look at the docs for the other Structure
> > > modules as well:
> > > http://doc.bioperl.org/bioperl-live/Bio/Structure/modules.html
> > >
> > >
> > > I agree, the documentation in this area could be improved.
> > > Bio::Structure::StructureI could actually contain something, and
> > > Bio::Structure should actually exist or not be referenced in the docs.
> >
> >
> > There was a discussion a while back on refactoring the code within
> > Bio::Structure to better deal with HETATM and other stuff. As far as
> > I'm concerned it's open for anyone wanted to tinker with it.
> >
> > chris
> >
>
>
>
> --
> -Neeti
> Even my blood says, B positive
>

--
-Neeti
Even my blood says, B positive

From bix at sendu.me.uk Fri Aug 17 09:35:01 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 17 Aug 2007 14:35:01 +0100
Subject: [Bioperl-l] PDB Parser
In-Reply-To: <764978cf0708170342q45acbea1vebaf1a8defb93896@mail.gmail.com>
References: <764978cf0708152256w38b2bfa7ic4d196ae08a46bd0@mail.gmail.com> <46C41FEC.2000206@sendu.me.uk> <5D32F747-60FC-4EEE-BD38-3A522A67EA27@uiuc.edu> <764978cf0708162323r17c4fc59w5adfb61ccfc5ac6@mail.gmail.com> <764978cf0708170342q45acbea1vebaf1a8defb93896@mail.gmail.com>
Message-ID: <46C5A405.2070005@sendu.me.uk>

neeti somaiya wrote:
> Hi,
>
> I have done it currently as follows :
[snip]
> Is this ok?

If it works, of course. There seems to be some redundant code there,
however. I'm guessing this would be better (assuming your code worked in
the first place):

while (my $struc = $in->next_structure()) {
    my $pdb_id = $struc->id;
    print "Structure ", $pdb_id,"\n";

    my $ac = $struc->annotation();
    my ($title) = $ac->get_Annotations('title');
    $title = $title->as_text;
    chomp($title);
    if ($title =~ /Value\: (.*)/) {
        $title = $1;
    }
    $title =~ s/\s+/ /g;
    print "Title ",$title,"\n";
}

From muratem at eng.uah.edu Fri Aug 17 10:03:22 2007
From: muratem at eng.uah.edu (Mike Muratet)
Date: Fri, 17 Aug 2007 09:03:22 -0500 (CDT)
Subject: [Bioperl-l] Problem with Bio::SeqIO::staden::read on Mac OS X
In-Reply-To: <9BBC30AD-9AFE-4D52-88E4-656D9EB8924E@uiuc.edu>
References: <27981.69.147.139.126.1187291434.squirrel@webmail.eng.uah.edu> <9BBC30AD-9AFE-4D52-88E4-656D9EB8924E@uiuc.edu>
Message-ID:

On Thu, 16 Aug 2007, Chris Fields wrote:

> Date: Thu, 16 Aug 2007 14:55:05 -0500
> From: Chris Fields
> To: muratem at eng.uah.edu
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Problem with Bio::SeqIO::staden::read on Mac OS X
>
>
> On Aug 16, 2007, at 2:10 PM, muratem at eng.uah.edu wrote:
>
>> Hello
>>
>> This might not be the correct list for this particular problem, but
>> hopefully someone can help.
I am trying to install ...staden::read on a >> Mac OS X 10.4. I tried installing cpan but it wouldn't work so I went to >> the manual methods. Perl is on the system and appears to be installed >> correctly for a Mac. Bioperl 1.5.2 was installed via fink and appears to >> be OK also. I'm trying to install the Bio::SeqIO::staden::read module. I >> downloaded the bioperl-ext-1.5.1 tarball from bioperl.org, did the usual >> perl Makefile.PL and make and get: >> >> newyork:/usr/local/bioperl-ext-1.5.1 root# make >> Makefile:1148: *** multiple target patterns. Stop. >> >> A snippet from the Makefile... >> >> 1148 pm_to_blib: $(TO_INST_PM) >> 1149 $(NOECHO) $(PERLRUN) -MExtUtils::Install -e >> 'pm_to_blib({@ARGV}, '\''$(INST_LIB)/auto'\'', '\''$(PM_FILTER)'\'')'\ >> 1150 Bio/Ext/Align/libs/hscore.h >> $(INST_LIB)/Bio/Ext/Align/libs/hscore.h \ >> 1151 Bio/Ext/Align/libs/probability.c >> $(INST_LIB)/Bio/Ext/Align/libs/probability.c \ >> 1152 Bio/Ext/Align/libs/linesubs.h >> $(INST_LIB)/Bio/Ext/Align/libs/linesubs.h \ >> 1153 Bio/Ext/Align/test.pl $(INST_LIB)/Bio/Ext/Align/test.pl >> \ >> 1154 Bio/Ext/Align/libs/wiseoverlay.h >> $(INST_LIB)/Bio/Ext/Align/libs/wiseoverlay.h \ >> 1155 Bio/Ext/Align/libs/proteinsw.h >> $(INST_LIB)/Bio/Ext/Align/libs/proteinsw.h \ >> 1156 Bio/Ext/Align/libs/wisebase.h >> $(INST_LIB)/Bio/Ext/Align/libs/wisebase.h \ >> 1157 Bio/Ext/Align/libs/seqaligndisplay.h >> $(INST_LIB)/Bio/Ext/Align/libs/seqaligndisplay.h \ >> 1158 Bio/Ext/Align/libs/dyna.h >> $(INST_LIB)/Bio/Ext/Align/libs/dyna.h \ >> >> The README says you don't have to build the whole package, so I descended >> to the staden directory and did a Make and didn't get any problems >> reported. 
But when I did a make test I get: >> >> newyork:/usr/local/bioperl-ext-1.5.1/Bio/SeqIO/staden root# make test >> PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" >> "test_harness(0, '../blib/lib', '../blib/arch')" test.pl >> test....Had problems bootstrapping Inline module >> 'Bio::SeqIO::staden::read' >> >> Can't load >> '/usr/local/bioperl-ext-1.5.1/Bio/SeqIO/staden/../blib/arch/auto/ >> Bio/SeqIO/staden/read/read.bundle' >> for module Bio::SeqIO::staden::read: >> dlopen(/usr/local/bioperl-ext-1.5.1/Bio/SeqIO/staden/../blib/arch/ >> auto/Bio/SeqIO/staden/read/read.bundle, >> 2): Symbol not found: _curl_easy_init >> Referenced from: >> /usr/local/bioperl-ext-1.5.1/Bio/SeqIO/staden/../blib/arch/auto/Bio/ >> SeqIO/staden/read/read.bundle >> Expected in: dynamic lookup >> at /Library/Perl/5.8.6/Inline.pm line 500 >> >> >> at test.pl line 0 >> INIT failed--call queue aborted, line 1. >> test....dubious >> Test returned status 255 (wstat 65280, 0xff00) >> DIED. FAILED tests 1-94 >> Failed 94/94 tests, 0.00% okay >> Failed Test Stat Wstat Total Fail Failed List of Failed >> ---------------------------------------------------------------------- >> --------- >> test.pl 255 65280 94 188 200.00% 1-94 >> Failed 1/1 test scripts, 0.00% okay. 94/94 subtests failed, 0.00% okay. >> make: *** [test_dynamic] Error 2 >> >> The missing symbol is apparently from libcurl. I have both libcurl.2.dylib >> and libcurl.3.dylib with copies in multiple locations including /usr/lib, >> /usr/local/lib and the usual Mac directories. I used the Mac otool to look >> at the externals in read.bundle and it references libz.1.dylib and >> libSystem.B.dylib. Could this be a case where there should have been a >> link to libcurl and wasn't? >> >> I've searched the list and see only the Inline versioning problem (which I >> had and fixed). Has anybody seen this problem before or built the module >> on a Mac? How did you do it? Is this a question for the Staden list on >> sourceforge? 
>> >> Thanks >> >> Mike > > Haven't seen the problem you list. I have installed it on Mac OS X (intel) > w/o problems so I know it works; at least all tests passed though I remember > Inline complaining for some reason. > > You should try using bioperl-ext from CVS (it is really 1.5.1 but with > updated docs and maybe a change or two). The process is a little tricky but > is documented in the README in the package. You'll need the old io_lib > (1.8.12 or earlier) from Staden if memory serves. > > chris > Thanks, I'll give that a try. Mike From alexl at users.sourceforge.net Fri Aug 17 11:23:33 2007 From: alexl at users.sourceforge.net (Alex Lancaster) Date: Fri, 17 Aug 2007 08:23:33 -0700 Subject: [Bioperl-l] Clarifying license of bioperl In-Reply-To: <1A4207F8295607498283FE9E93B775B40390172E@EX02.asurite.ad.asu.edu> (Kevin Brown's message of "Fri\, 17 Aug 2007 08\:11\:40 -0700") References: <1A4207F8295607498283FE9E93B775B40390172E@EX02.asurite.ad.asu.edu> Message-ID: >>>>> "KB" == Kevin Brown writes: [...] >> Also note that since Perl's license is a dual-license "GPL or >> Artistic" then people aren't required to submit their modifications >> back to the bioperl distribution because they can choose to follow >> the Artistic (rather than the GPL) license which doesn't require >> modifications to be submitted back. This means the point: KB> You aren't required to submit patches even under the GPL. If I KB> make changes and don't distribute them then I have no requirement KB> to reveal my changes to the bioperl source code. Also the GPL KB> does not require that the code be made freely available to all, KB> just that users of GPL'd software can request the source from the KB> vendor/distributor and should not find lots of little hoops to KB> jump through to get it. You can even charge to get access if that KB> charge is to cover the cost of the expense to get it (such as the KB> cost of a cd + mail delivery charge). 
Sure, I was just pointing out that you can avoid even these things if you choose the Artistic license. I have no problem with the GPL, but some people do. The other possibility (if the current Perl "GPL or Artistic" is not a possibility) is simply upgrading to the "Artistic 2.0" license adopted by the Perl Foundation for Perl 6 and later (I think?): http://www.perlfoundation.org/artistic_license_2_0 it's a GPL-compatible free software license. Alex From Kevin.M.Brown at asu.edu Fri Aug 17 11:11:40 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Fri, 17 Aug 2007 08:11:40 -0700 Subject: [Bioperl-l] Clarifying license of bioperl In-Reply-To: References: Message-ID: <1A4207F8295607498283FE9E93B775B40390172E@EX02.asurite.ad.asu.edu> > AL> I assume that the intent of the bioperl authors is to > license with > AL> the same terms as Perl's *current* license (which would > mean bioperl > AL> is really effectively dually-licensed under the GPL or Artistic > AL> license). If so, it would be good if the LICENSE text > and the wiki > AL> were updated to reflect this. > > Also note that since Perl's license is a dual-license "GPL or > Artistic" then people aren't required to submit their > modifications back to the bioperl distribution because they > can choose to follow the Artistic (rather than the GPL) > license which doesn't require modifications to be submitted > back. This means the point: You aren't required to submit patches even under the GPL. If I make changes and don't distribute them then I have no requirement to reveal my changes to the bioperl source code. Also the GPL does not require that the code be made freely available to all, just that users of GPL'd software can request the source from the vendor/distributor and should not find lots of little hoops to jump through to get it. You can even charge to get access if that charge is to cover the cost of the expense to get it (such as the cost of a cd + mail delivery charge). 
From cjfields at uiuc.edu Fri Aug 17 12:07:47 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 17 Aug 2007 11:07:47 -0500 Subject: [Bioperl-l] Clarifying license of bioperl In-Reply-To: References: <1A4207F8295607498283FE9E93B775B40390172E@EX02.asurite.ad.asu.edu> Message-ID: <3515AB25-9919-407B-93E9-352BC426AFA1@uiuc.edu> On Aug 17, 2007, at 10:23 AM, Alex Lancaster wrote: >>>>>> "KB" == Kevin Brown writes: > > [...] > >>> Also note that since Perl's license is a dual-license "GPL or >>> Artistic" then people aren't required to submit their modifications >>> back to the bioperl distribution because they can choose to follow >>> the Artistic (rather than the GPL) license which doesn't require >>> modifications to be submitted back. This means the point: > > KB> You aren't required to submit patches even under the GPL. If I > KB> make changes and don't distribute them then I have no requirement > KB> to reveal my changes to the bioperl source code. Also the GPL > KB> does not require that the code be made freely available to all, > KB> just that users of GPL'd software can request the source from the > KB> vendor/distributor and should not find lots of little hoops to > KB> jump through to get it. You can even charge to get access if that > KB> charge is to cover the cost of the expense to get it (such as the > KB> cost of a cd + mail delivery charge). > > Sure, I was just pointing out that you can avoid even these things if > you choose the Artistic license. I have no problem with the GPL, but > some people do. The other possibility (if the current Perl "GPL or > Artistic" is not a possibility) is simply upgrading to the "Artistic > 2.0" license adopted by the Perl Foundation for Perl 6 and later (I > think?): > > http://www.perlfoundation.org/artistic_license_2_0 > > it's a GPL-compatible free software license. > > Alex Switching to Artistic 2.0 is probably the best way to go. 
We'll need a more involved discussion but I don't think there'll be too many objections. You mention GPL-compatibility; is that for v2 and v3? chris From gonzaled at tcd.ie Fri Aug 17 13:03:35 2007 From: gonzaled at tcd.ie (David Gonzalez) Date: Fri, 17 Aug 2007 18:03:35 +0100 Subject: [Bioperl-l] Bio::SeqIO::swiss species parsing bug? Message-ID: <46C5D4E7.6000605@tcd.ie> Hi, I had a problem with a swissprot file in which the genus and species were being left undefined, and I believe it could be a bug in the swiss.pm module. When I tried to parse the file with Bio::SeqIO, I got the following error messages: Use of uninitialized value in pattern match (m//) at /sw/lib/perl5/5.8.6/Bio/SeqIO/swiss.pm line 965, line 12. Use of uninitialized value in string eq at /sw/lib/perl5/5.8.6/Bio/SeqIO/swiss.pm line 967, line 12. The fields I wanted from the file (gene_id , etc.. ) were fine however, so it was being parsed. I checked the output with Data::Dumper and I found the following in the species entry; the species is left undefined, and the common name is absent. 'species' => bless( { '_ncbi_taxid' => 'Not', '_classification' => [ undef, undef, 'Aedes', 'Culicini', 'Culicinae', 'Culicidae', 'Culicoidea', 'Nematocera', 'Diptera', 'Endopterygota', 'Neoptera', 'Pterygota', 'Insecta', 'Hexapoda', 'Arthropoda', 'Metazoa', 'Eukaryota' ] }, 'Bio::Species' ), The species line in the file is formatted according to the swissprot specifications and includes a common name OS Aedes aegypti (yellow fever mosquito) OC Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; OC Endopterygota; Diptera; Nematocera; Culicoidea; Culicidae; Culicinae; OC Culicini; Aedes. OX NCBI_TaxID=Not defined; I think the problem is in the line 905 of the swiss.pm file: 902 if(/^OS\s+(\S.+)/ && (! 
defined($binomial))) { 903 $osline .= " " if $osline; 904 $osline .= $1; 905 if($osline =~ s/(,|, and|\.)$//) { 906 ($binomial, $descr) = $osline =~ /(\S[^\(]+)(.*)/; 907 ($ns_name) = $binomial; 908 $ns_name =~ s/\s+$//; ##### The problem seems to be that there are no punctuation signs, so 905 returns false. The swissprot format does not require the line to end in '.' I think although it normally does. By just removing the requirement for the substitution the output of Data::Dumper seemed normal .... '_common_name' => 'yellow fever mosquito', '_ncbi_taxid' => 'Not', '_classification' => [ 'aegypti', 'Aedes', 'Culicini', .... I am using the fink installed bioperl: bioperl-pm586 1.4-5 Perl module for biology I don't know if this has been reported/solved in the newer versions of bioperl. David -- David Gonzalez Knowles Smurfit Institute of Genetics Trinity College Dublin From cjfields at uiuc.edu Fri Aug 17 13:20:21 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 17 Aug 2007 12:20:21 -0500 Subject: [Bioperl-l] Bio::SeqIO::swiss species parsing bug? In-Reply-To: <46C5D4E7.6000605@tcd.ie> References: <46C5D4E7.6000605@tcd.ie> Message-ID: <04912FDE-2AA4-414C-9CE4-A0BA5E9C89C9@uiuc.edu> On Aug 17, 2007, at 12:03 PM, David Gonzalez wrote: > Hi, > > I had a problem with a swissprot file in which the genus and species > were being left undefined, and I believe it could be a bug in the > swiss.pm module. > > > When I tried to parse the file with Bio::SeqIO, I got the following > error messages: > > Use of uninitialized value in pattern match (m//) at > /sw/lib/perl5/5.8.6/Bio/SeqIO/swiss.pm line 965, line 12. > Use of uninitialized value in string eq at > /sw/lib/perl5/5.8.6/Bio/SeqIO/swiss.pm line 967, line 12. > ... > I am using the fink installed bioperl: > bioperl-pm586 1.4-5 Perl module for biology > > I don't know if this has been reported/solved in the newer > versions of > bioperl. 
> > David > > -- > David Gonzalez Knowles > Smurfit Institute of Genetics > Trinity College > Dublin That looks like bioperl 1.4, which is several years old. You should update to the latest official release (1.5.2), then see if the problem persists. chris From alexl at users.sourceforge.net Sat Aug 18 07:33:34 2007 From: alexl at users.sourceforge.net (Alex Lancaster) Date: Sat, 18 Aug 2007 04:33:34 -0700 Subject: [Bioperl-l] Clarifying license of bioperl In-Reply-To: <3515AB25-9919-407B-93E9-352BC426AFA1@uiuc.edu> (Chris Fields's message of "Fri\, 17 Aug 2007 11\:07\:47 -0500") References: <1A4207F8295607498283FE9E93B775B40390172E@EX02.asurite.ad.asu.edu> <3515AB25-9919-407B-93E9-352BC426AFA1@uiuc.edu> Message-ID: <8td4xlyt4h.fsf@allele2.localdomain> >>>>> "CF" == Chris Fields writes: [...] >> Sure, I was just pointing out that you can avoid even these things >> if you choose the Artistic license. I have no problem with the >> GPL, but some people do. The other possibility (if the current >> Perl "GPL or Artistic" is not a possibility) is simply upgrading to >> the "Artistic 2.0" license adopted by the Perl Foundation for Perl >> 6 and later (I think?): >> http://www.perlfoundation.org/artistic_license_2_0 >> it's a GPL-compatible free software license. CF> Switching to Artistic 2.0 is probably the best way to go. We'll CF> need a more involved discussion but I don't think there'll be too CF> many objections. You mention GPL-compatibility; is that for v2 CF> and v3? IANAL, but looking at: http://www.perlfoundation.org/artistic_2_0_notes http://www.gnu.org/licenses/license-list.html (scroll down to "Artistic 2.0") it looks like you can choose any GPL license (i.e. v1 to v3). 
I was really more concerned with clarifying what the bioperl license was *right now*, because "the same license as Perl" implies the so-called "disjunctive" "GPL or Artistic license": http://www.gnu.org/licenses/license-list.html#PerlLicense which is what I've marked the Fedora package as (since it listed "the same license as Perl" in most of the source files), which is fine for Fedora. Fedora may possibly (still under discussion I believe) require removal of any package that is licensed under the original (1.0) Artistic alone and it would be a real shame if that required bioperl being pulled from the repo. I imagine the intent of the bioperl contributors is that it should be under the same terms as Perl, whatever that happens to be (which just happens to be GPL or Artistic, which is fine). A clarification to that effect would be useful. Cheers, Alex From zhaodj at ioz.ac.cn Sat Aug 18 11:06:41 2007 From: zhaodj at ioz.ac.cn (De-Jian,ZHAO) Date: Sat, 18 Aug 2007 23:06:41 +0800 (CST) Subject: [Bioperl-l] How to get the full methods of a bioperl object? In-Reply-To: <46C5781F.60301@sheffield.ac.uk> References: <55928.159.226.67.49.1187316796.squirrel@mail.ioz.ac.cn> <46C5781F.60301@sheffield.ac.uk> Message-ID: <52869.159.226.67.49.1187449601.squirrel@mail.ioz.ac.cn> Thank you,Nathan. The Deobfuscator is very helpful. On Fri, Aug 17, 2007 18:27, Nathan Haigh wrote: > De-Jian,ZHAO wrote: >> Dear list members, >> >> I have a question about the methods of bioperl objects.It is how >> and >> where we can get the whole methods of a bioperl object. >> >> Take Bio::Tools::Run::RemoteBlast for example. In the synopsis of >> this object, some sample codes are given.The following five >> clauses >> are excerpted from the synopsis. 
>> (1)my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >> (2)while ( my @rids = $factory->each_rid ) { >> (3)$factory->remove_rid($rid); >> (4)my $rc = $factory->retrieve_blast($rid); >> (5)my $r = $factory->submit_blast($input); >> >> The five clauses use five methods of the RemoteBlast object,i.e. >> (1)new, (2)each_rid, (3)remove_rid,(4)retrieve_blast,and >> (5)submit_blast. However,I only find part of them(45) are listed >> in >> the appendix while others(123) are absent. Are there some more >> methods not explictly declared? I don't know.This will lead to the >> partial understanding and utilization of the module.Therefore I >> come >> here for the way to get the full methods of a bioperl object. >> >> Thanks! >> > > > You should check out the Deobfuscator at: > http://bioperl.org/cgi-bin/deob_interface.cgi > > Search and choose the object of choice. e.g. > Bio::Tools::Run::RemoteBlast > > You will be provided a list of methods available to that object, > including all the methods up the inheritance hierarchy. > Unfortunately, > some bioperl modules are documented more thoroughly than others. > > Nath > From hlapp at gmx.net Sat Aug 18 12:13:28 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 18 Aug 2007 12:13:28 -0400 Subject: [Bioperl-l] Clarifying license of bioperl In-Reply-To: <8td4xlyt4h.fsf@allele2.localdomain> References: <1A4207F8295607498283FE9E93B775B40390172E@EX02.asurite.ad.asu.edu> <3515AB25-9919-407B-93E9-352BC426AFA1@uiuc.edu> <8td4xlyt4h.fsf@allele2.localdomain> Message-ID: <8D3FBCDF-47E7-4A6E-8001-C034CA27BF3F@gmx.net> On Aug 18, 2007, at 7:33 AM, Alex Lancaster wrote: > I imagine the intent of the bioperl > contributors is that it should be under the same terms as Perl, > whatever that happens to be (which just happens to be GPL or Artistic, > which is fine). I fully agree. > A clarification to that effect would be useful. Agreed, too. 
Would you mind changing that language on the wiki, since you seem to have a fairly good grasp on the issue? -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Sat Aug 18 12:42:04 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 18 Aug 2007 11:42:04 -0500 Subject: [Bioperl-l] Clarifying license of bioperl In-Reply-To: <8D3FBCDF-47E7-4A6E-8001-C034CA27BF3F@gmx.net> References: <1A4207F8295607498283FE9E93B775B40390172E@EX02.asurite.ad.asu.edu> <3515AB25-9919-407B-93E9-352BC426AFA1@uiuc.edu> <8td4xlyt4h.fsf@allele2.localdomain> <8D3FBCDF-47E7-4A6E-8001-C034CA27BF3F@gmx.net> Message-ID: On Aug 18, 2007, at 11:13 AM, Hilmar Lapp wrote: > > On Aug 18, 2007, at 7:33 AM, Alex Lancaster wrote: > >> I imagine the intent of the bioperl >> contributors is that it should be under the same terms as Perl, >> whatever that happens to be (which just happens to be GPL or >> Artistic, >> which is fine). > > I fully agree. > >> A clarification to that effect would be useful. > > Agreed, too. Would you mind changing that language on the wiki, since > you seem to have a fairly good grasp on the issue? > > -hilmar Looks like the modules mostly state 'You may distribute this module under the same terms as perl itself', but there are likely a few which need to be changed. Might be worth running a quick code audit to see what's present. chris From avilella at gmail.com Sat Aug 18 16:38:10 2007 From: avilella at gmail.com (Albert Vilella) Date: Sat, 18 Aug 2007 21:38:10 +0100 Subject: [Bioperl-l] How to get the full methods of a bioperl object? In-Reply-To: <55928.159.226.67.49.1187316796.squirrel@mail.ioz.ac.cn> References: <55928.159.226.67.49.1187316796.squirrel@mail.ioz.ac.cn> Message-ID: <358f4d650708181338s5a5caadbscfa85786327f4304@mail.gmail.com> I particularly like to code and debug at the same time. 
When you are using the perl debugger, you can do an: m $object and it will show up all the information and methods for that object. Cheers, Albert. On 8/17/07, De-Jian,ZHAO wrote: > > Dear list members, > > I have a question about the methods of bioperl objects.It is how and > where we can get the whole methods of a bioperl object. > > Take Bio::Tools::Run::RemoteBlast for example. In the synopsis of > this object, some sample codes are given.The following five clauses > are excerpted from the synopsis. > (1)my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > (2)while ( my @rids = $factory->each_rid ) { > (3)$factory->remove_rid($rid); > (4)my $rc = $factory->retrieve_blast($rid); > (5)my $r = $factory->submit_blast($input); > > The five clauses use five methods of the RemoteBlast object,i.e. > (1)new, (2)each_rid, (3)remove_rid,(4)retrieve_blast,and > (5)submit_blast. However,I only find part of them(45) are listed in > the appendix while others(123) are absent. Are there some more > methods not explictly declared? I don't know.This will lead to the > partial understanding and utilization of the module.Therefore I come > here for the way to get the full methods of a bioperl object. > > Thanks!
> -- > De-Jian Zhao > Institute of Zoology,Chinese Academy of Sciences > +86-10-64807217 > zhaodj at ioz.ac.cn > > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From neetisomaiya at gmail.com Mon Aug 20 00:33:17 2007 From: neetisomaiya at gmail.com (neeti somaiya) Date: Mon, 20 Aug 2007 10:03:17 +0530 Subject: [Bioperl-l] PDB Parser In-Reply-To: <46C5A405.2070005@sendu.me.uk> References: <764978cf0708152256w38b2bfa7ic4d196ae08a46bd0@mail.gmail.com> <46C41FEC.2000206@sendu.me.uk> <5D32F747-60FC-4EEE-BD38-3A522A67EA27@uiuc.edu> <764978cf0708162323r17c4fc59w5adfb61ccfc5ac6@mail.gmail.com> <764978cf0708170342q45acbea1vebaf1a8defb93896@mail.gmail.com> <46C5A405.2070005@sendu.me.uk> Message-ID: <764978cf0708192133i3d4ef031ic34d11f1f91e59b7@mail.gmail.com> Hi, Thanks for the responses. Another question I had was, I am interested only in pdb id and title, and for this I am downloading and unzipping each of the full pdb structure files, parsing to get just id and title. Is there any other data source which can give me just id and title of pdb structures, without me having to download the full file of each structure? Thanks, Neeti. On 8/17/07, Sendu Bala wrote: > > neeti somaiya wrote: > > Hi, > > > > I have done it currently as follows : > [snip] > > Is this ok? > > If it works, of course. There seems to be some redundant code there, > however.
I'm guessing this would be better (assuming your code worked in > the first place): > > while (my $struc = $in->next_structure()) { > my $pdb_id = $struc->id; > print "Structure ", $pdb_id,"\n"; > > my $ac = $struc->annotation(); > my ($title) = $ac->get_Annotations('title'); > $title = $title->as_text; > chomp($title); > if ($title =~ /Value\: (.*)/) { > $title = $1; > } > $title =~ s/\s+/ /g; > > print "Title ",$title,"\n"; > } > -- -Neeti Even my blood says, B positive From jaudall at gmail.com Mon Aug 20 00:39:18 2007 From: jaudall at gmail.com (Joshua Udall) Date: Sun, 19 Aug 2007 21:39:18 -0700 Subject: [Bioperl-l] concatenating aln splices Message-ID: <52cea20c0708192139r3886fe71j58f69a0aaa8c8a4f@mail.gmail.com> Based on several criteria, I've extracted several splices from a single alignment and I'm trying to concatenate my selected sequences together. Unfortunately, one of the sequences in the original alignment only has gap characters for one or more of the splices. I'd like to keep the gap splices because other downstream aligned bases depend on them. I get these two warning messages splicing my alignments together: -------------------- WARNING --------------------- MSG: Got a sequence with no letters in it cannot guess alphabet [] --------------------------------------------------- -------------------- WARNING --------------------- MSG: Slice [232-233] of sequence [X2A/1-202] contains no residues. Sequence excluded from the new alignment. 
--------------------------------------------------- and now because of missing gaps, I get this error when trying to concatenate them: -------------------- WARNING --------------------- MSG: expecting 236 not 203 from X2A --------------------------------------------------- ------------- EXCEPTION ------------- MSG: All sequences in the alignment must be the same length STACK Bio::AlignIO::phylip::write_aln /sw/lib/perl5/5.8.6/Bio/AlignIO/phylip.pm:292 I don't mind the warnings, in fact I like them, but is there a way to stop the splice function from removing the 'gap' sequence from the alignment? Perhaps catching the warning and inserting the gaps afterwards might work, but I'm wondering if there is a simpler modification of SimpleAlign.pm that might work. Any thoughts? Josh From bix at sendu.me.uk Mon Aug 20 03:43:45 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 20 Aug 2007 08:43:45 +0100 Subject: [Bioperl-l] concatenating aln splices In-Reply-To: <52cea20c0708192139r3886fe71j58f69a0aaa8c8a4f@mail.gmail.com> References: <52cea20c0708192139r3886fe71j58f69a0aaa8c8a4f@mail.gmail.com> Message-ID: <46C94631.2060704@sendu.me.uk> Joshua Udall wrote: > Based on several criteria, I've extracted several splices from a > single alignment and I'm trying to concatenate my selected sequences > together. Unfortunately, one of the sequences in the original > alignment only has gap characters for one or more of the splices. I'd > like to keep the gap splices because other downstream aligned bases > depend on them. [snip] > I don't mind the warnings, in fact I like them, but is there a way to > stop the splice function from removing the 'gap' sequence from the > alignment? Perhaps catching the warning and inserting the gaps > afterwards might work, but I'm wondering if there is a simpler > modification of SimpleAlign.pm that might work. Any thoughts? Let us see some code, so we can get a better idea of what you're doing and what you've tried.
You can avoid losing sequences during a slice by not doing a slice. Instead, remove_columns(). This way you don't have to splice alignments together; you go from original alignment to 'spliced' version in one step. From Oliver.Wafzig at sygnis.de Mon Aug 20 04:42:55 2007 From: Oliver.Wafzig at sygnis.de (Oliver Wafzig) Date: Mon, 20 Aug 2007 10:42:55 +0200 Subject: [Bioperl-l] PDB Parser In-Reply-To: <764978cf0708192133i3d4ef031ic34d11f1f91e59b7@mail.gmail.com> References: <764978cf0708152256w38b2bfa7ic4d196ae08a46bd0@mail.gmail.com> <46C5A405.2070005@sendu.me.uk> <764978cf0708192133i3d4ef031ic34d11f1f91e59b7@mail.gmail.com> Message-ID: <200708201042.55292.Oliver.Wafzig@sygnis.de> On Monday 20 August 2007 06:33, neeti somaiya wrote: > Another question I had was, I am interested only in pdb id and title, and > for this I am downloading and unzipping each of the full pdb structure > files, parsing to get just id and title. Is there any other data source Hi Neeti, this is a non bioperl way to download the data. Use the SRS server on the EBI page to download only id and title lines from pdb. 1) Point your browser to the SRS page (http://srs.ebi.ac.uk). 2) Search for 'PDB' on the 'library page' and select it. 3) Use the standard query form. Select 'id' in the dropdown list and insert '*' (wildcard). 4) Create a view by selecting 'ID' and 'Title', then click the search button. 5) Click the save results button. 6) Select 'file' in the 'output to' area and 'ALL' in the 'Number of entries to download' field. Press 'save'. If the download is slow, read the 'download tips' on the download page and split the results in chunks. 
-- Oliver From neetisomaiya at gmail.com Mon Aug 20 09:05:01 2007 From: neetisomaiya at gmail.com (neeti somaiya) Date: Mon, 20 Aug 2007 18:35:01 +0530 Subject: [Bioperl-l] PDB Parser In-Reply-To: <200708201042.55292.Oliver.Wafzig@sygnis.de> References: <764978cf0708152256w38b2bfa7ic4d196ae08a46bd0@mail.gmail.com> <46C5A405.2070005@sendu.me.uk> <764978cf0708192133i3d4ef031ic34d11f1f91e59b7@mail.gmail.com> <200708201042.55292.Oliver.Wafzig@sygnis.de> Message-ID: <764978cf0708200605j2dd63973p9cd88787b3acdbc8@mail.gmail.com> Thanks for your response. Actually I am looking for something standalone and not on the web, as in something which I can download onto my machine and parse later to get id and title. On 8/20/07, Oliver Wafzig wrote: > > On Monday 20 August 2007 06:33, neeti somaiya wrote: > > Another question I had was, I am interested only in pdb id and title, > and > > for this I am downloading and unzipping each of the full pdb structure > > files, parsing to get just id and title. Is there any other data source > > Hi Neeti, > this is a non bioperl way to download the data. > Use the SRS server on the EBI page to download only id and title lines > from > pdb. > > 1) Point your browser to the SRS page (http://srs.ebi.ac.uk). > 2) Search for 'PDB' on the 'library page' and select it. > 3) Use the standard query form. Select 'id' in the dropdown list and > insert '*' (wildcard). > 4) Create a view by selecting 'ID' and 'Title', then click the search > button. > 5) Click the save results button. > 6) Select 'file' in the 'output to' area and 'ALL' in the 'Number of > entries > to download' field. Press 'save'. > > If the download is slow, read the 'download tips' on the download page and > split the results in chunks. 
> > -- > Oliver > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- -Neeti Even my blood says, B positive From bernd at kirx.de Mon Aug 20 12:57:28 2007 From: bernd at kirx.de (Bernd Mueller) Date: Mon, 20 Aug 2007 18:57:28 +0200 Subject: [Bioperl-l] PDB Parser In-Reply-To: <764978cf0708200605j2dd63973p9cd88787b3acdbc8@mail.gmail.com> References: <764978cf0708152256w38b2bfa7ic4d196ae08a46bd0@mail.gmail.com> <46C5A405.2070005@sendu.me.uk> <764978cf0708192133i3d4ef031ic34d11f1f91e59b7@mail.gmail.com> <200708201042.55292.Oliver.Wafzig@sygnis.de> <764978cf0708200605j2dd63973p9cd88787b3acdbc8@mail.gmail.com> Message-ID: <46C9C7F8.3020608@kirx.de> Hello, Maybe you wanna try the Database-EUtilities module from bioperl. They are described on http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook I tried them for a similar search on pubmed but without any reasonable results because my target was too focused. From EUtilities documentation on http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helpentrez.section.EntrezHelp.The_Databases "Protein Database The Protein database contains sequence data from the translated coding regions from DNA sequences in GenBank, EMBL, and DDBJ as well as protein sequences submitted to Protein Information Resource (PIR), SWISS-PROT, Protein Research Foundation (PRF), and Protein Data Bank (PDB) (sequences from solved structures). " So PDB is included in eutilities from NCBI. Regards, Bernd neeti somaiya wrote: > Thanks for your response. > Actually I am looking for something standalone and not on the web, as in > something which I can download onto my machine and parse later to get id and > title. 
> > On 8/20/07, Oliver Wafzig wrote: >> On Monday 20 August 2007 06:33, neeti somaiya wrote: >>> Another question I had was, I am interested only in pdb id and title, >> and >>> for this I am downloading and unzipping each of the full pdb structure >>> files, parsing to get just id and title. Is there any other data source >> Hi Neeti, >> this is a non bioperl way to download the data. >> Use the SRS server on the EBI page to download only id and title lines >> from >> pdb. >> >> 1) Point your browser to the SRS page (http://srs.ebi.ac.uk). >> 2) Search for 'PDB' on the 'library page' and select it. >> 3) Use the standard query form. Select 'id' in the dropdown list and >> insert '*' (wildcard). >> 4) Create a view by selecting 'ID' and 'Title', then click the search >> button. >> 5) Click the save results button. >> 6) Select 'file' in the 'output to' area and 'ALL' in the 'Number of >> entries >> to download' field. Press 'save'. >> >> If the download is slow, read the 'download tips' on the download page and >> split the results in chunks. >> >> -- >> Oliver >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > -- Dipl.-Inform.(FH) Bernd Mueller phone: +49 179 2336692 email: bernd at kirx.de From neetisomaiya at gmail.com Mon Aug 20 13:39:01 2007 From: neetisomaiya at gmail.com (neeti somaiya) Date: Mon, 20 Aug 2007 23:09:01 +0530 Subject: [Bioperl-l] PDB Parser In-Reply-To: <46C9C7F8.3020608@kirx.de> References: <764978cf0708152256w38b2bfa7ic4d196ae08a46bd0@mail.gmail.com> <46C5A405.2070005@sendu.me.uk> <764978cf0708192133i3d4ef031ic34d11f1f91e59b7@mail.gmail.com> <200708201042.55292.Oliver.Wafzig@sygnis.de> <764978cf0708200605j2dd63973p9cd88787b3acdbc8@mail.gmail.com> <46C9C7F8.3020608@kirx.de> Message-ID: <764978cf0708201039g53b29f29i36eed1a7acd5a892@mail.gmail.com> Hi, Thanks for all the responses. 
I got the solution from RCBS people :- Dear Dr. Somaiya, Thank you for your email message. Please try the following: 1) Go to http://www.pdb.org/pdb/statistics/holdings.do and select the number in the bottom right corner of the table (currently 45213). 2) From the menu on the left select 'Tabulate'>>'Custom Report' and under 'Primary Citation' select 'Title' 3) At the bottom, select 'Create Report' and then one of the 'Download' options. Please let us know if we can be of additional assistance. Sincerely, Rachel Green On 8/20/07, Bernd Mueller wrote: > > Hello, > > Maybe you wanna try the Database-EUtilities module from bioperl. They > are described on http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook > > I tried them for a similar search on pubmed but without any reasonable > results because my target was too focused. > > From EUtilities documentation on > > http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helpentrez.section.EntrezHelp.The_Databases > > "Protein Database > > The Protein database contains sequence data from the translated coding > regions from DNA sequences in GenBank, EMBL, and DDBJ as well as protein > sequences submitted to Protein Information Resource (PIR), SWISS-PROT, > Protein Research Foundation (PRF), and Protein Data Bank (PDB) > (sequences from solved structures). " > > So PDB is included in eutilities from NCBI. > > Regards, > Bernd > > neeti somaiya wrote: > > Thanks for your response. > > Actually I am looking for something standalone and not on the web, as in > > something which I can download onto my machine and parse later to get id > and > > title. > > > > On 8/20/07, Oliver Wafzig wrote: > >> On Monday 20 August 2007 06:33, neeti somaiya wrote: > >>> Another question I had was, I am interested only in pdb id and title, > >> and > >>> for this I am downloading and unzipping each of the full pdb structure > >>> files, parsing to get just id and title. 
Is there any other data > source > >> Hi Neeti, > >> this is a non bioperl way to download the data. > >> Use the SRS server on the EBI page to download only id and title lines > >> from > >> pdb. > >> > >> 1) Point your browser to the SRS page (http://srs.ebi.ac.uk). > >> 2) Search for 'PDB' on the 'library page' and select it. > >> 3) Use the standard query form. Select 'id' in the dropdown list and > >> insert '*' (wildcard). > >> 4) Create a view by selecting 'ID' and 'Title', then click the search > >> button. > >> 5) Click the save results button. > >> 6) Select 'file' in the 'output to' area and 'ALL' in the 'Number of > >> entries > >> to download' field. Press 'save'. > >> > >> If the download is slow, read the 'download tips' on the download page > and > >> split the results in chunks. > >> > >> -- > >> Oliver > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > > > > > -- > Dipl.-Inform.(FH) > Bernd Mueller > phone: +49 179 2336692 > email: bernd at kirx.de > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- -Neeti Even my blood says, B positive From jaudall at gmail.com Mon Aug 20 14:30:26 2007 From: jaudall at gmail.com (Joshua Udall) Date: Mon, 20 Aug 2007 12:30:26 -0600 Subject: [Bioperl-l] concatenating aln splices In-Reply-To: <46C94631.2060704@sendu.me.uk> References: <52cea20c0708192139r3886fe71j58f69a0aaa8c8a4f@mail.gmail.com> <46C94631.2060704@sendu.me.uk> Message-ID: <52cea20c0708201130u29af2e10w78a852d7f88c23d1@mail.gmail.com> Thanks, Sendu! That suggestion was exactly what I needed. I have it worked out now with the remove_columns function. 
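
For readers following the thread, here is a minimal plain-Perl sketch of why removing columns in a single step keeps a gap-only row, whereas slicing and re-joining can lose it. This is not Bio::SimpleAlign's implementation; the helper name drop_columns and the sample rows are made up for illustration. As far as the module documentation goes, the corresponding BioPerl call takes zero-based [start, end] pairs, e.g. $aln->remove_columns([0,3],[8,9]), but please check that against your installed version.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): remove zero-based, inclusive
# column ranges from every row of an alignment held as name => string.
# Ranges are assumed not to overlap.
sub drop_columns {
    my ($rows, @ranges) = @_;
    my %out;
    for my $name (keys %$rows) {
        my @chars = split //, $rows->{$name};
        # Work from the right-most range so earlier indices stay valid.
        for my $range (sort { $b->[0] <=> $a->[0] } @ranges) {
            my ($start, $end) = @$range;
            splice @chars, $start, $end - $start + 1;
        }
        $out{$name} = join '', @chars;
    }
    return \%out;
}

# Made-up alignment; seqC is all gaps in the region we want to keep.
my %aln = (
    seqA => 'ATGC--GTAC',
    seqB => 'ATGCAAGTAC',
    seqC => 'ATGC----AC',
);

# Keep columns 4..7 by removing 0..3 and 8..9 in one pass.
my $kept = drop_columns(\%aln, [0, 3], [8, 9]);
print "$_\t$kept->{$_}\n" for sort keys %$kept;
```

Every row is still present afterwards, including the one that consists only of gaps, because no row is ever rebuilt from an empty slice.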
Much easier that way :) Josh On 8/20/07, Sendu Bala wrote: > > Joshua Udall wrote: > > Based on several criteria, I've extracted several splices from a > > single alignment and I'm trying to concatenate my selected sequences > > together. Unfortunately, one of the sequences in the original > > alignment only has gap characters for one or more of the splices. I'd > > like to keep the gap splices because other downstream aligned bases > > depend on them. > [snip] > > I don't mind the warnings, in fact I like them, but is there a way to > > stop the splice function from removing the 'gap' sequence from the > > alignment? Perhaps catching the warning and inserting the gaps > > afterwards might work, but I'm wondering if there's is a simpler > > modification of SimpleAlign.pm that might work. Any thoughts? > > Let us see some code, so we can get a better idea of what you're doing > and what you've tried. > > You can avoid losing sequences during a slice by not doing a slice. > Instead, remove_columns(). This way you don't have to splice alignments > together; you go from original alignment to 'spliced' version in one step. > From cjfields at uiuc.edu Mon Aug 20 14:51:14 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 20 Aug 2007 13:51:14 -0500 Subject: [Bioperl-l] PDB Parser In-Reply-To: <46C9C7F8.3020608@kirx.de> References: <764978cf0708152256w38b2bfa7ic4d196ae08a46bd0@mail.gmail.com> <46C5A405.2070005@sendu.me.uk> <764978cf0708192133i3d4ef031ic34d11f1f91e59b7@mail.gmail.com> <200708201042.55292.Oliver.Wafzig@sygnis.de> <764978cf0708200605j2dd63973p9cd88787b3acdbc8@mail.gmail.com> <46C9C7F8.3020608@kirx.de> Message-ID: <4EAE752E-CACB-41AF-BF55-7A83071CE590@uiuc.edu> Just curious, but what kind of query were you trying? It might be worth trying to work through it to add as an example to the cookbook page. chris On Aug 20, 2007, at 11:57 AM, Bernd Mueller wrote: > Hello, > > Maybe you wanna try the Database-EUtilities module from bioperl. 
They > are described on http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook > > I tried them for a similar search on pubmed but without any reasonable > results because my target was too focused. > > From EUtilities documentation on > http://www.ncbi.nlm.nih.gov/books/bv.fcgi? > rid=helpentrez.section.EntrezHelp.The_Databases > > "Protein Database > > The Protein database contains sequence data from the translated coding > regions from DNA sequences in GenBank, EMBL, and DDBJ as well as > protein > sequences submitted to Protein Information Resource (PIR), SWISS-PROT, > Protein Research Foundation (PRF), and Protein Data Bank (PDB) > (sequences from solved structures). " > > So PDB is included in eutilities from NCBI. > > Regards, > Bernd > > neeti somaiya wrote: >> Thanks for your response. >> Actually I am looking for something standalone and not on the web, >> as in >> something which I can download onto my machine and parse later to >> get id and >> title. >> >> On 8/20/07, Oliver Wafzig wrote: >>> On Monday 20 August 2007 06:33, neeti somaiya wrote: >>>> Another question I had was, I am interested only in pdb id and >>>> title, >>> and >>>> for this I am downloading and unzipping each of the full pdb >>>> structure >>>> files, parsing to get just id and title. Is there any other data >>>> source >>> Hi Neeti, >>> this is a non bioperl way to download the data. >>> Use the SRS server on the EBI page to download only id and title >>> lines >>> from >>> pdb. >>> >>> 1) Point your browser to the SRS page (http://srs.ebi.ac.uk). >>> 2) Search for 'PDB' on the 'library page' and select it. >>> 3) Use the standard query form. Select 'id' in the dropdown list and >>> insert '*' (wildcard). >>> 4) Create a view by selecting 'ID' and 'Title', then click the >>> search >>> button. >>> 5) Click the save results button. >>> 6) Select 'file' in the 'output to' area and 'ALL' in the 'Number of >>> entries >>> to download' field. Press 'save'. 
>>> >>> If the download is slow, read the 'download tips' on the download >>> page and >>> split the results in chunks. >>> >>> -- >>> Oliver >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> >> > > -- > Dipl.-Inform.(FH) > Bernd Mueller > phone: +49 179 2336692 > email: bernd at kirx.de > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bernd at kirx.de Mon Aug 20 15:03:29 2007 From: bernd at kirx.de (Bernd Mueller) Date: Mon, 20 Aug 2007 21:03:29 +0200 Subject: [Bioperl-l] PDB Parser In-Reply-To: <4EAE752E-CACB-41AF-BF55-7A83071CE590@uiuc.edu> References: <764978cf0708152256w38b2bfa7ic4d196ae08a46bd0@mail.gmail.com> <46C5A405.2070005@sendu.me.uk> <764978cf0708192133i3d4ef031ic34d11f1f91e59b7@mail.gmail.com> <200708201042.55292.Oliver.Wafzig@sygnis.de> <764978cf0708200605j2dd63973p9cd88787b3acdbc8@mail.gmail.com> <46C9C7F8.3020608@kirx.de> <4EAE752E-CACB-41AF-BF55-7A83071CE590@uiuc.edu> Message-ID: <46C9E581.1010907@kirx.de> I attached my script. Actually I tried to download all articles to a certain search term with that script. The problem was that the retrieved documents were not free as mentioned in the documentation of EUtilities on the NCBI page. So many of the downloaded documents in xml-format were just dummies containing only the abstract but not the fulltext article. Bernd Chris Fields wrote: > Just curious, but what kind of query were you trying? It might be worth > trying to work through it to add as an example to the cookbook page. > > chris > > On Aug 20, 2007, at 11:57 AM, Bernd Mueller wrote: > >> Hello, >> >> Maybe you wanna try the Database-EUtilities module from bioperl. 
They >> are described on http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook >> >> I tried them for a similar search on pubmed but without any reasonable >> results because my target was too focused. >> >> From EUtilities documentation on >> http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helpentrez.section.EntrezHelp.The_Databases >> >> >> "Protein Database >> >> The Protein database contains sequence data from the translated coding >> regions from DNA sequences in GenBank, EMBL, and DDBJ as well as protein >> sequences submitted to Protein Information Resource (PIR), SWISS-PROT, >> Protein Research Foundation (PRF), and Protein Data Bank (PDB) >> (sequences from solved structures). " >> >> So PDB is included in eutilities from NCBI. >> >> Regards, >> Bernd >> >> neeti somaiya wrote: >>> Thanks for your response. >>> Actually I am looking for something standalone and not on the web, as in >>> something which I can download onto my machine and parse later to get >>> id and >>> title. >>> >>> On 8/20/07, Oliver Wafzig wrote: >>>> On Monday 20 August 2007 06:33, neeti somaiya wrote: >>>>> Another question I had was, I am interested only in pdb id and title, >>>> and >>>>> for this I am downloading and unzipping each of the full pdb structure >>>>> files, parsing to get just id and title. Is there any other data >>>>> source >>>> Hi Neeti, >>>> this is a non bioperl way to download the data. >>>> Use the SRS server on the EBI page to download only id and title lines >>>> from >>>> pdb. >>>> >>>> 1) Point your browser to the SRS page (http://srs.ebi.ac.uk). >>>> 2) Search for 'PDB' on the 'library page' and select it. >>>> 3) Use the standard query form. Select 'id' in the dropdown list and >>>> insert '*' (wildcard). >>>> 4) Create a view by selecting 'ID' and 'Title', then click the search >>>> button. >>>> 5) Click the save results button. >>>> 6) Select 'file' in the 'output to' area and 'ALL' in the 'Number of >>>> entries >>>> to download' field. Press 'save'. 
>>>> >>>> If the download is slow, read the 'download tips' on the download >>>> page and >>>> split the results in chunks. >>>> >>>> -- >>>> Oliver >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> >>> >> >> --Dipl.-Inform.(FH) >> Bernd Mueller >> phone: +49 179 2336692 >> email: bernd at kirx.de >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > > > -- Dipl.-Inform.(FH) Bernd Mueller phone: +49 179 2336692 email: bernd at kirx.de -------------- next part -------------- A non-text attachment was scrubbed... Name: myBioPerl.pl Type: application/x-perl Size: 1983 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070820/af579f0a/attachment.bin From jayoung at fhcrc.org Mon Aug 20 18:09:04 2007 From: jayoung at fhcrc.org (Janet Young) Date: Mon, 20 Aug 2007 15:09:04 -0700 Subject: [Bioperl-l] Assembly::IO write_assembly and remove_seq Message-ID: Hi all, I realized last week that write_assembly isn't implemented in Assemble::IO (see http://bioperl.org/pipermail/bioperl-l/2006-May/021619.html ) I know this has been asked before, but I wondered if anything has changed - does anyone have any plans to write a write_assembly method? Alternatively, any suggestions for an alternative solution to what I'm trying to do? I'm trying to write a script to make improvements to the assembly that phredPhrap comes out with - it seems to quite frequently throw an unrelated sequence into a contig with either no matching sequence at all, or very little matching sequence. Mysterious. 
Anyway, my script can recognize the bad sequences easily enough, and thought I'd be able to remove them and then write the modified assembly. No joy. One very inelegant solution I've played with is that I can add some "markedHighQuality" tags to the discrepant sequences in the ace file, meaning that next time phredPhrap is run, it sometimes manages not to assemble the sequences that shouldn't be there. I'm not sure this will work in all cases, and it seems like quite an unsatisfactory way to do it. For the same reason, I'm hoping someone can tell me what remove_seq does to a contig object? I'm using it and I don't get any error messages (returns 1), but when I check the contig object afterwards with get_seq_ids, the sequence I wanted to remove didn't seem to go away. Also, when I check out the primary_tags for that contig in the objects returned by get_features_collection, nothing seems to have changed. So I'm not sure whether the sequence really was removed from anything at all, and if it was, which object did it get removed from? (a snippet of my code is below) my @seqids = $contig->get_seq_ids(); print OUT "seqids @seqids\n"; my $seqobj = $contig->get_seq_by_name($seq); $contig->remove_seq($seqobj) || die "failed to remove seq\n"; @seqids = $contig->get_seq_ids(); print OUT "seqids @seqids\n"; thanks for any advice, Janet Young ------------------------------------------------------------------- Dr. Janet Young (Trask lab) Fred Hutchinson Cancer Research Center 1100 Fairview Avenue N., C3-168, P.O. Box 19024, Seattle, WA 98109-1024, USA. 
tel: (206) 667 1471 fax: (206) 667 6524 email: jayoung at fhcrc.org http://www.fhcrc.org/labs/trask/ ------------------------------------------------------------------- From cjfields at uiuc.edu Tue Aug 21 00:06:26 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 20 Aug 2007 23:06:26 -0500 Subject: [Bioperl-l] EUtilities, was Re: PDB Parser In-Reply-To: <46C9E581.1010907@kirx.de> References: <764978cf0708152256w38b2bfa7ic4d196ae08a46bd0@mail.gmail.com> <46C5A405.2070005@sendu.me.uk> <764978cf0708192133i3d4ef031ic34d11f1f91e59b7@mail.gmail.com> <200708201042.55292.Oliver.Wafzig@sygnis.de> <764978cf0708200605j2dd63973p9cd88787b3acdbc8@mail.gmail.com> <46C9C7F8.3020608@kirx.de> <4EAE752E-CACB-41AF-BF55-7A83071CE590@uiuc.edu> <46C9E581.1010907@kirx.de> Message-ID: <7BE17595-9BC0-498B-AFA9-03ED0C853BFC@uiuc.edu> Bernd, Just in case you weren't aware, I have changed several aspects of EUtilities since the 1.5.2 release, so any code in the HOWTO cookbook applies ONLY to the version found in CVS (there is a big note at the top stating such). This should be the finalized API which I intend on supporting from this point on. The reason I indicate that is there are several giveaways which indicate you are using the older API from 1.5.2 (using next_cookie, for instance). The following modification of your script (using the API in bioperl- live) works for me. You should be able to do something similar with the older API as well but I haven't tried. Note that PMC full-text retrieval only works if the article is declared 'open-access'; not all journals allow that. Also, any full-text is only available as XML which (I'm guessing here) is transformed to HTML for PMC. .... 
# $db, $query and $fetch come from the part of the script elided above
my $agent = Bio::DB::EUtilities->new(-eutil      => 'esearch',
                                     -db         => $db,
                                     -term       => $query,
                                     -usehistory => 'y');

my $ct = $agent->get_count;
print "Count = $ct\n";

my $history = $agent->next_History;

if ($fetch eq 'yes') {
    # fetch one record per request, stepping through the stored history
    my ($retmax, $retstart) = (1,0);
    while ($retstart < $ct) {
        $agent->set_parameters(
            -eutil    => 'efetch',
            -history  => $history,
            -rettype  => 'xml',
            -retmax   => $retmax,
            -retstart => $retstart,
        );
        $agent->get_Response(-file => ">./papers/paper_$retstart.xml");
        $retstart += $retmax;
    }
}

------------------------------

It may also be possible to grab the LinkOut for these and try to nab the PDF or use the DOI, but I haven't tried anything like that.

chris

On Aug 20, 2007, at 2:03 PM, Bernd Mueller wrote:

> I attached my script.
>
> Actually I tried to download all articles to a certain search term with
> that script. The problem was that the retrieved documents were not free
> as mentioned in the documentation of EUtilities on the NCBI page. So
> many of the downloaded documents in xml-format were just dummies
> containing only the abstract but not the fulltext article.
>
> Bernd
>
> Chris Fields wrote:
>> Just curious, but what kind of query were you trying? It might be
>> worth trying to work through it to add as an example to the
>> cookbook page.
>> chris

From n.haigh at sheffield.ac.uk Tue Aug 21 04:19:59 2007
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Tue, 21 Aug 2007 09:19:59 +0100
Subject: [Bioperl-l] subversion progress
Message-ID: <46CAA02F.60803@sheffield.ac.uk>

Hi,

I was just wondering if there was any further progress towards the svn migration recently? What still needs to be done?
Cheers
Nath

From neetisomaiya at gmail.com Tue Aug 21 05:41:22 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Tue, 21 Aug 2007 15:11:22 +0530
Subject: [Bioperl-l] PDB Parser
In-Reply-To: <764978cf0708201039g53b29f29i36eed1a7acd5a892@mail.gmail.com>
References: <764978cf0708152256w38b2bfa7ic4d196ae08a46bd0@mail.gmail.com> <46C5A405.2070005@sendu.me.uk> <764978cf0708192133i3d4ef031ic34d11f1f91e59b7@mail.gmail.com> <200708201042.55292.Oliver.Wafzig@sygnis.de> <764978cf0708200605j2dd63973p9cd88787b3acdbc8@mail.gmail.com> <46C9C7F8.3020608@kirx.de> <764978cf0708201039g53b29f29i36eed1a7acd5a892@mail.gmail.com>
Message-ID: <764978cf0708210241h4c4b802en8ec2f6e9b0c01a74@mail.gmail.com>

Hi,

I wanted to automate my pdb script, right from the downloading of the data. Following the solution given by RCSB (a custom report with pdb ids and titles only), I was trying something like the code below, but it doesn't seem to work :-

use LWP::Simple;

# the URL must be one unbroken string, without embedded spaces or line breaks
my $url = 'http://www.pdb.org/pdb/results/tabularReport.do?reportTitle=CustomReport&customReportColumns=VStructureSummary.structureId~VCitation.title&format=csv';

my $content = get($url);
die "Couldn't get $url" unless defined $content;

Can anyone tell me how I can do it, whether there is any other way to do it, whether I am going wrong somewhere, or whether it isn't possible for this case at all? Please help.

On 8/20/07, neeti somaiya wrote:
>
> Hi,
>
> Thanks for all the responses.
> I got the solution from RCSB people :-
>
> Dear Dr. Somaiya,
>
> Thank you for your email message.
>
> Please try the following:
> 1) Go to http://www.pdb.org/pdb/statistics/holdings.do and select the
> number in the bottom right corner of the table (currently 45213).
> 2) From the menu on the left select 'Tabulate'>>'Custom Report' and
> under 'Primary Citation' select 'Title'
> 3) At the bottom, select 'Create Report' and then one of the 'Download'
> options.
>
> Please let us know if we can be of additional assistance.
> > Sincerely, > Rachel Green > > On 8/20/07, Bernd Mueller wrote: > > > > Hello, > > > > Maybe you wanna try the Database-EUtilities module from bioperl. They > > are described on http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook > > > > I tried them for a similar search on pubmed but without any reasonable > > results because my target was too focused. > > > > From EUtilities documentation on > > > > http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helpentrez.section.EntrezHelp.The_Databases > > > > "Protein Database > > > > The Protein database contains sequence data from the translated coding > > regions from DNA sequences in GenBank, EMBL, and DDBJ as well as protein > > > > sequences submitted to Protein Information Resource (PIR), SWISS-PROT, > > Protein Research Foundation (PRF), and Protein Data Bank (PDB) > > (sequences from solved structures). " > > > > So PDB is included in eutilities from NCBI. > > > > Regards, > > Bernd > > > > neeti somaiya wrote: > > > Thanks for your response. > > > Actually I am looking for something standalone and not on the web, as > > in > > > something which I can download onto my machine and parse later to get > > id and > > > title. > > > > > > On 8/20/07, Oliver Wafzig wrote: > > >> On Monday 20 August 2007 06:33, neeti somaiya wrote: > > >>> Another question I had was, I am interested only in pdb id and > > title, > > >> and > > >>> for this I am downloading and unzipping each of the full pdb > > structure > > >>> files, parsing to get just id and title. Is there any other data > > source > > >> Hi Neeti, > > >> this is a non bioperl way to download the data. > > >> Use the SRS server on the EBI page to download only id and title > > lines > > >> from > > >> pdb. > > >> > > >> 1) Point your browser to the SRS page (http://srs.ebi.ac.uk ). > > >> 2) Search for 'PDB' on the 'library page' and select it. > > >> 3) Use the standard query form. Select 'id' in the dropdown list and > > >> insert '*' (wildcard). 
> > >> 4) Create a view by selecting 'ID' and 'Title', then click the search > > >> button. > > >> 5) Click the save results button. > > >> 6) Select 'file' in the 'output to' area and 'ALL' in the 'Number of > > >> entries > > >> to download' field. Press 'save'. > > >> > > >> If the download is slow, read the 'download tips' on the download > > page and > > >> split the results in chunks. > > >> > > >> -- > > >> Oliver > > >> _______________________________________________ > > >> Bioperl-l mailing list > > >> Bioperl-l at lists.open-bio.org > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > >> > > > > > > > > > > > > > -- > > Dipl.-Inform.(FH) > > Bernd Mueller > > phone: +49 179 2336692 > > email: bernd at kirx.de > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > -Neeti > Even my blood says, B positive > -- -Neeti Even my blood says, B positive From cjfields at uiuc.edu Tue Aug 21 10:40:03 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 21 Aug 2007 09:40:03 -0500 Subject: [Bioperl-l] subversion progress In-Reply-To: <46CAA02F.60803@sheffield.ac.uk> References: <46CAA02F.60803@sheffield.ac.uk> Message-ID: <5C65BAED-61CF-4028-977E-0CD451FA2EC3@uiuc.edu> Not sure myself, to tell the truth. Pretty much everything was ready to go (i.e. svn commits work, commits post to bioperl-guts, etc.); the only possible exception was svn->cvs syncing. I believe the decision for svn access is to stick with ssh only for now for simplicity's sake. I may have to go back into the archives to refresh my memory on all the details... I think a time for the switchover just has to be set so that everybody is adequately forewarned, and the docs for getting started on SVN need to be updated accordingly. 
chris On Aug 21, 2007, at 3:19 AM, Nathan Haigh wrote: > Hi, > > I was just wondering if there was any further progress towards the svn > migration recently? What is still needing to be done? > > Cheers > Nath > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From jwalker at watson.wustl.edu Tue Aug 21 11:20:46 2007 From: jwalker at watson.wustl.edu (Jason Walker) Date: Tue, 21 Aug 2007 10:20:46 -0500 Subject: [Bioperl-l] RemoteBlast not handling NCBI Error message Message-ID: <46CB02CE.1080803@watson.wustl.edu> I've noticed RemoteBlast does not handle a specific error message from NCBI correctly. retrieve_blast() should return 0 if waiting, -1 on error, or the results when completed. It looks like the method relies on a specific tag in the NCBI return, 'QBlastInfoBegin'. The error message I'm getting does not have this tag or a value of 'Status=ERROR'. After contacting NCBI 'Blast-help', they stated that QBlastInfoBegin should not be expected from all GET requests. The error can be reproduced by using RID CM2YJJW501R, until it expires tomorrow. my $rid = 'CM2YJJW501R'; my $factory = Bio::Tools::Run::RemoteBlast->new( -verbose => 1,); my $rc = $factory->retrieve_blast($rid); print $rc ."\n"; The content returned from NCBI looks like:
ERROR: An error has occurred on the server, Too many HSPs to save all Contact Blast-help at ncbi.nlm.nih.gov and include your RID: CM2YJJW501R
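
To make the failure mode concrete, here is a small standalone sketch of the return convention described above (0 while waiting, -1 on error, a true value once the report is there), applied to the text of a reply. It is not RemoteBlast's actual code: the helper name classify_reply and the sample replies are invented for illustration, and the patterns only mirror the ones discussed in this thread. Note that the plain /ERROR/i test is deliberately broad and may need tightening for real replies.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Tools::Run::RemoteBlast.
# Returns 0 (waiting), -1 (error), 1 (report text present),
# or undef when the reply is not recognised at all.
sub classify_reply {
    my ($text) = @_;
    my $in_info = 0;
    for my $line (split /\n/, $text) {
        if ($line =~ /QBlastInfoBegin/i) {
            $in_info = 1;                      # status block has started
        }
        elsif ($in_info && $line =~ /Status=(WAITING|ERROR|READY)/i) {
            my $status = uc $1;
            return  0 if $status eq 'WAITING';
            return -1 if $status eq 'ERROR';
            return  1;                         # READY
        }
        elsif ($line =~ /^(?:#\s)?[\w-]*?BLAST\w+/) {
            return 1;                          # looks like a report header
        }
        elsif ($line =~ /ERROR/i) {
            return -1;                         # error page without a status block
        }
    }
    return undef;
}

# Made-up replies covering the cases discussed in this thread.
my %sample = (
    waiting => "QBlastInfoBegin\n\tStatus=WAITING\nQBlastInfoEnd\n",
    error   => "ERROR: An error has occurred on the server, Too many HSPs to save all\n",
    report  => "BLASTN report text\n",
);

for my $name (sort keys %sample) {
    my $rc = classify_reply($sample{$name});
    print "$name: ", (defined $rc ? $rc : 'undef'), "\n";
}
```

The second sample is the case reported here: there is no QBlastInfoBegin block at all, so a parser that only looks inside that block never sees the error.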
I added a conditional statement as seen below to correct my local copy. I'm not sure this is the best fix, but it works.

sub retrieve_blast {
    ...
    if( /QBlastInfoBegin/i ) {
        $s = 1;
    } elsif( $s ) {
        if( /Status=(WAITING|ERROR|READY)/i ) {
            ...
        }
    } elsif( /^(?:#\s)?[\w-]*?BLAST\w+/ ) {
        $waiting = 0;
        last;
    } elsif ( /ERROR/i ) {
        close($TMP);
        open(my $ERR, "<$tempfile") or $self->throw("cannot open file $tempfile");
        $self->warn(join("", <$ERR>));
        close $ERR;
        return -1;
    }
    ...
}

Thanks,
Jason Walker

From cjfields at uiuc.edu Tue Aug 21 12:15:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 21 Aug 2007 11:15:36 -0500
Subject: [Bioperl-l] RemoteBlast not handling NCBI Error message
In-Reply-To: <46CB02CE.1080803@watson.wustl.edu>
References: <46CB02CE.1080803@watson.wustl.edu>
Message-ID: <348D8645-5DC2-4606-9650-EB08D8053F3D@uiuc.edu>

On Aug 21, 2007, at 10:20 AM, Jason Walker wrote:

> I've noticed RemoteBlast does not handle a specific error message from
> NCBI correctly. retrieve_blast() should return 0 if waiting, -1 on
> error, or the results when completed. It looks like the method relies
> on a specific tag in the NCBI return, 'QBlastInfoBegin'. The error
> message I'm getting does not have this tag or a value of
> 'Status=ERROR'. After contacting NCBI 'Blast-help', they stated that
> QBlastInfoBegin should not be expected from all GET requests. The error
> can be reproduced by using RID CM2YJJW501R, until it expires tomorrow.
> ...
> I added a conditional statement as seen below to correct my local copy.
> I'm not sure this is the best fix, but it works.
> sub retrieve_blast {
>     ...
>     if( /QBlastInfoBegin/i ) {
>         $s = 1;
>     } elsif( $s ) {
>         if( /Status=(WAITING|ERROR|READY)/i ) {
>             ...
>         }
>     } elsif( /^(?:#\s)?[\w-]*?BLAST\w+/ ) {
>         $waiting = 0;
>         last;
>     } elsif ( /ERROR/i ) {
>         close($TMP);
>         open(my $ERR, "<$tempfile") or $self->throw("cannot open file $tempfile");
>         $self->warn(join("", <$ERR>));
>         close $ERR;
>         return -1;
>     }
>     ...
> }
>
> Thanks,
> Jason Walker

I have added this to RemoteBlast in bioperl cvs. Thanks for the notice!

chris

From bernd.web at gmail.com Tue Aug 21 12:32:09 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Tue, 21 Aug 2007 18:32:09 +0200
Subject: [Bioperl-l] SearchIO-BLAST
Message-ID: <716af09c0708210932m34bfb2a7o2094124a8832d705@mail.gmail.com>

Dear all,

Recently, I stumbled on something when parsing BLAST reports. A ">aaa" line got prepended to a plain text BLAST report from NCBI. This fasta-like header changes the $result->hits array: the number of hits is now 2*num_hits + 1. Clearly, this is caused by faulty input, but the effect of this single line is still large. Does someone see what is causing this, and should the BLAST parser maybe be slightly more relaxed with respect to prepended or appended text? I have not yet seen why this extra fasta header line has such a "large" effect.

A short example BLASTN output is attached.
Example code is:

use strict;
use warnings;
use Bio::SearchIO;

my $in = Bio::SearchIO->new(-format => 'blast',
                            -file   => 'apoe_plain.bls');

while( my $result = $in->next_result ) {
    print "Num of hits: ", $result->num_hits, "\n";
    # list the name of every hit object the parser produced
    my @hits = $result->hits;
    foreach my $el (@hits) {
        print $el->name, "\n";
    }
}

Kind regards,
Bernd

-------------- next part --------------
A non-text attachment was scrubbed...
Name: apoe_plain.bls
Type: application/octet-stream
Size: 7890 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070821/a367809e/attachment.obj

From cjfields at uiuc.edu Tue Aug 21 17:53:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 21 Aug 2007 16:53:44 -0500
Subject: [Bioperl-l] SearchIO-BLAST
In-Reply-To: <716af09c0708210932m34bfb2a7o2094124a8832d705@mail.gmail.com>
References: <716af09c0708210932m34bfb2a7o2094124a8832d705@mail.gmail.com>
Message-ID: <59FF775C-8CAC-4947-A5BA-835ADD45CD32@uiuc.edu>

I can confirm this (I'm using bioperl-live).
The output I get is:

Num of hits: 9
ref|NM_000039.1|
ref|NT_113960.1|Hs22_111679
ref|NT_033899.7|Hs11_34054
ref|NW_925173.1|HsCraAADB02_444
ref|NM_000039.1|
ref|NT_113960.1|Hs22_111679
ref|NT_033899.7|Hs11_34054
ref|NW_925173.1|HsCraAADB02_444
ref|NW_925173.1|HsCraAADB02_444

The extra '>' is definitely throwing the event calls for a loop; the
2x increase is b/c an extra iteration is started when '>' is
encountered (changing the event handler reduces the number to 5). The
extra hit is from the '>' at the beginning.

I hate to say it, but this is an instance where we can't be more
flexible, primarily b/c '>' is a legit token the parser looks for (it
is the beginning of the hit block in reports). Finding it as the
initial token in the report is also legitimate for some older BLAST
output, so we also can't simply bypass it.

You'll unfortunately have to preparse the reports to get rid of those
lines prior to feeding them to the BLAST text report parser.

chris

On Aug 21, 2007, at 11:32 AM, Bernd Web wrote:

> Dear all,
>
> Recently, I stumbled on something with parsing BLAST reports. To a
> plain text blast report from NCBI a ">aaa" got prepended. This
> (fasta-like header) changes the $result->hits array.
> The amount of hits is now 2*num_hits + 1. Clearly, this is related to
> faulty input, but still the effect of this line is great. Does someone
> see what is causing this, and should the BLAST parser maybe be
> slightly more relaxed wrt pre/appended text? I have not seen yet why
> this extra fastaheader line has such a "large" effect.
>
> A short example BLASTN output is attached.
> Example code is: > > use Bio::SearchIO; > my $in = new Bio::SearchIO(-format => 'blast', > -file => 'apoe_plain.bls'); > while( my $result = $in->next_result ) { > print "Num of hits: ", $result->num_hits, "\n"; > my @hits = $result->hits; > foreach my $el (@hits) { > print $el->name, "\n"; > } > > > Kind regards, > Bernd > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From hlapp at gmx.net Tue Aug 21 23:03:55 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 21 Aug 2007 23:03:55 -0400 Subject: [Bioperl-l] subversion progress In-Reply-To: <5C65BAED-61CF-4028-977E-0CD451FA2EC3@uiuc.edu> References: <46CAA02F.60803@sheffield.ac.uk> <5C65BAED-61CF-4028-977E-0CD451FA2EC3@uiuc.edu> Message-ID: <51A5996D-A976-47FD-8807-20F6EBAF9E42@gmx.net> On Aug 21, 2007, at 10:40 AM, Chris Fields wrote: > I think a time for the switchover just has to be set so that > everybody is adequately forewarned, and the docs for getting started > on SVN need to be updated accordingly. That was my recollection too. 
	-hilmar
-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From bix at sendu.me.uk Wed Aug 22 03:51:42 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 22 Aug 2007 08:51:42 +0100
Subject: [Bioperl-l] PDB Parser
In-Reply-To: <764978cf0708210241h4c4b802en8ec2f6e9b0c01a74@mail.gmail.com>
References: <764978cf0708152256w38b2bfa7ic4d196ae08a46bd0@mail.gmail.com> <46C5A405.2070005@sendu.me.uk> <764978cf0708192133i3d4ef031ic34d11f1f91e59b7@mail.gmail.com> <200708201042.55292.Oliver.Wafzig@sygnis.de> <764978cf0708200605j2dd63973p9cd88787b3acdbc8@mail.gmail.com> <46C9C7F8.3020608@kirx.de> <764978cf0708201039g53b29f29i36eed1a7acd5a892@mail.gmail.com> <764978cf0708210241h4c4b802en8ec2f6e9b0c01a74@mail.gmail.com>
Message-ID: <46CBEB0E.8030200@sendu.me.uk>

neeti somaiya wrote:
> Hi,
>
> I wanted to automate my pdb script, right from downloading of data. As per
> the solution given by RCSB about custom report for pdb ids and titles only,
> I was trying something like the code below, but it doesn't seem to work :-
>
> my $url = 'http://www.pdb.org/pdb/results/tabularReport.do?reportTitle=CustomReport&customReportColumns=VStructureSummary.structureId~VCitation.title&format=csv';
> use LWP::Simple;
> my $content = get $url;
> die "Couldn't get $url" unless defined $content;
>
> Can anyone tell how I can do it, if there is any other way to do it, or if I
> am going wrong somewhere, or if it isn't possible for this case at all.

Use LWP::UserAgent so you can see what's going on.

use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
$ua->timeout(10);
my $response = $ua->get($url);
if ($response->is_success) {
    print $response->content;
}
else {
    die $response->status_line;
}

Gives:
500 Internal Server Error

Most likely the server is expecting some kind of cookie and falls over
when you try to visit that url without it.
So start where they told you to and grab pages successively, keeping
any cookies.

From neetisomaiya at gmail.com Wed Aug 22 06:06:38 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Wed, 22 Aug 2007 15:36:38 +0530
Subject: [Bioperl-l] PDB Parser
In-Reply-To: <46CBEB0E.8030200@sendu.me.uk>
References: <764978cf0708152256w38b2bfa7ic4d196ae08a46bd0@mail.gmail.com> <46C5A405.2070005@sendu.me.uk> <764978cf0708192133i3d4ef031ic34d11f1f91e59b7@mail.gmail.com> <200708201042.55292.Oliver.Wafzig@sygnis.de> <764978cf0708200605j2dd63973p9cd88787b3acdbc8@mail.gmail.com> <46C9C7F8.3020608@kirx.de> <764978cf0708201039g53b29f29i36eed1a7acd5a892@mail.gmail.com> <764978cf0708210241h4c4b802en8ec2f6e9b0c01a74@mail.gmail.com> <46CBEB0E.8030200@sendu.me.uk>
Message-ID: <764978cf0708220306u77cedf22xdd132b324e306f33@mail.gmail.com>

Thanks a lot. It worked for me.

use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Keep cookies between requests; the report URL fails without them.
my $ua = LWP::UserAgent->new;
$ua->cookie_jar(HTTP::Cookies->new(file => "lwpcookies.txt", autosave => 1));

# Visit the pages in the order the site expects.
my @urls = (
    'http://www.pdb.org/pdb/search/smartSubquery.do?smartSearchSubtype=HoldingsQuery&moleculeType=ignore&experimentalMethod=ignore',
    'http://www.pdb.org/pdb/results/tabularForm.do',
    'http://www.pdb.org/pdb/results/tabularReport.do?reportTitle=CustomReport&customReportColumns=VStructureSummary.structureId~VCitation.title&format=csv',
);

my $response;
foreach my $url (@urls) {
    $response = $ua->request(HTTP::Request->new('GET', $url));
    die $response->status_line unless $response->is_success;
    print "\nSuccessfully connected to url $url\n";
}

# Save the CSV returned by the last request.
open(my $fh, '>', 'tabularResults.csv')
    or die "Cannot write tabularResults.csv: $!";
print $fh $response->content;
close($fh);

On 8/22/07, Sendu Bala wrote:
>
> neeti somaiya wrote:
> > Hi,
> >
> > I wanted to automate my pdb script, right from downloading of data. As per
> > the solution given by RCSB about custom report for pdb ids and titles only,
> > I was trying something like the code below, but it doesn't seem to work :-
> >
> > my $url = 'http://www.pdb.org/pdb/results/tabularReport.do?reportTitle=CustomReport&customReportColumns=VStructureSummary.structureId~VCitation.title&format=csv';
> > use LWP::Simple;
> > my $content = get $url;
> > die "Couldn't get $url" unless defined $content;
> >
> > Can anyone tell how I can do it, if there is any other way to do it, or if I
> > am going wrong somewhere, or if it isn't possible for this case at all.
>
> Use LWP::UserAgent so you can see what's going on.
>
> my $ua = LWP::UserAgent->new;
> $ua->timeout(10);
> my $response = $ua->get($url);
> if ($response->is_success) {
>     print $response->content;
> }
> else {
>     die $response->status_line;
> }
>
>
> Gives:
> 500 Internal Server Error
>
> Most likely the server is expecting some kind of cookie and falls over
> when you try to visit that url without it. So start where they told you
> to and grab pages successively, keeping any cookies.
> -- -Neeti Even my blood says, B positive From jay at jays.net Wed Aug 22 08:54:29 2007 From: jay at jays.net (Jay Hannah) Date: Wed, 22 Aug 2007 07:54:29 -0500 Subject: [Bioperl-l] wiki: Current Events Message-ID: <24715480-EC15-493F-85C9-C367348E28F1@jays.net> http://www.bioperl.org/wiki/Main_Page Please change: < BOSC 2007 will be held July 19-20, 2007 > BOSC 2007 was held July 19-20, 2007 I'd change it but the page is locked. Even when I'm logged in. :) Thanks, Jay Hannah http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah From cjfields at uiuc.edu Wed Aug 22 09:58:32 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 22 Aug 2007 08:58:32 -0500 Subject: [Bioperl-l] wiki: Current Events In-Reply-To: <24715480-EC15-493F-85C9-C367348E28F1@jays.net> References: <24715480-EC15-493F-85C9-C367348E28F1@jays.net> Message-ID: Done. chris On Aug 22, 2007, at 7:54 AM, Jay Hannah wrote: > http://www.bioperl.org/wiki/Main_Page > > Please change: > > < BOSC 2007 will be held July 19-20, 2007 >> BOSC 2007 was held July 19-20, 2007 > > I'd change it but the page is locked. Even when I'm logged in. :) > > Thanks, > > Jay Hannah > http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From shameer at ncbs.res.in Wed Aug 22 15:45:42 2007 From: shameer at ncbs.res.in (Shameer Khadar) Date: Thu, 23 Aug 2007 01:15:42 +0530 (IST) Subject: [Bioperl-l] How to 'force' Bio::Graphics to draw image according to input file ? 
In-Reply-To:
References: <10259461.post@talk.nabble.com> <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in> <1178028249.2644.13.camel@localhost.localdomain> <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in> <6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com> <51133.192.168.1.1.1187003265.squirrel@mail.ncbs.res.in> <46C05896.1010002@sendu.me.uk> <59564.192.168.1.1.1187016455.squirrel@mail.ncbs.res.in> <46C07257.1000308@sendu.me.uk>
Message-ID: <44632.192.168.1.1.1187811942.squirrel@mail.ncbs.res.in>

Dear All,

Is there any option in Bio::Graphics to draw the image based on the
hits as listed in the hits file? For example, I am using this input
file:

# hit                  score  start  end
Query                  0      1      101
Sequence_Segment_1     0      1      101
PD:LRR_1|CS:AAC34139   0.16   1      23
PD:LRR_1|CS:AAC34139   3.6    1      22
PD:LRR_1|CS:AAC34139   1.8    1      22
PD:LRR_1|CS:AAC34139   1.3    1      22
PD:LRR_1|CS:XP_640228  2.5    2      23
..... Cropped
PD:LRR_1|CS:NP_611007  55     3      23
PD:LRR_1|CS:NP_611007  3.7    3      24
PD:LRR_1|CS:NP_611007  4.5    3      24
PD:LRR_1|CS:NP_611007  0.71   3      24

If you look at the image, you can see that it's all jumbled up and
doesn't make any sense at first glance. I am looking for an option to
draw each glyph on its own line (say, separated by \n), rather than
having Bio::Graphics accommodate them internally.

PS. The image is attached to this mail.

I am using Dr. L. Stein's example:

use strict;
use Bio::Graphics;
use Bio::SeqFeature::Generic;

my $panel = Bio::Graphics::Panel->new(-length    => 700,
                                      -width     => 800,
                                      -pad_left  => 10,
                                      -pad_right => 10,
                                     );
my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>700);
$panel->add_track($full_length,
                  -glyph   => 'arrow',
                  -tick    => 2,
                  -fgcolor => 'black',
                  -double  => 1,
                 );

my $track = $panel->add_track(
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.png
Type: image/png
Size: 27974 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070823/be285f43/attachment-0001.png

From cjfields at uiuc.edu Thu Aug 23 00:53:55 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 22 Aug 2007 23:53:55 -0500
Subject: [Bioperl-l] SeqFeature/AnnotatableI and rel. 1.6
Message-ID:

As many of the devs know, there are a number of Feature/Annotation
issues that need to be resolved prior to a 1.6 release:

http://www.bioperl.org/wiki/Release_Schedule#SeqFeature.2FAnnotation_changes:_Keep_or_roll_back.3F

There has been little work done over the last 2 1/2 years to undo or
rectify problems associated with those additions; I feel like those
of us still routinely contributing have been left holding the bag.
There has also been very little attempt to document any of this
adequately enough; as an example see POD for
Bio::SeqFeature::Annotated (what little there is).

I would like to suggest the radical idea of rolling back
AnnotatableI/SeqFeatureI changes to a much simpler rel. 1.4-like
behavior (tags are simple scalars) and possibly work in implementing
Ewan's SeqFeature::TypedSeqFeatureI for those who want strong data
types (i.e. Bio::FeatureIO/Bio::SeqFeature::Annotated). The various
AnnotatableI changes, odd inheritance, and operator overloading have
really obfuscated the code to the point where no one wants to touch
it in case it breaks something important. However, I believe it is
the one serious impediment to a new stable release.

My thought is we simplify all the relevant interfaces, essentially
reverting back to rel 1.4. For instance, we move the various
Bio::AnnotatableI tag methods back into Bio::SeqFeatureI.
Bio::SeqFeature::Annotated would implement Bio::AnnotatableI
directly, and (if needed) also implement
Bio::SeqFeature::TypedSeqFeatureI, so the onus is on
Bio::SeqFeature::Annotated to override the relevant SeqFeatureI
methods correctly, just as any other class would when implementing an
abstract interface. I have played around with this a bit and managed
to get most tests working again for Bio::SeqFeature::Generic and
FeatureIO but a number of others break.

If needed I can try this out on a branch (a bit ironic, since the
changes instigating this mess should have been tested on a branch!).
Maybe this will get the ball rolling towards a 1.6 release. Any
thoughts?

chris

From shameer at ncbs.res.in Thu Aug 23 03:06:34 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Thu, 23 Aug 2007 12:36:34 +0530 (IST)
Subject: [Bioperl-l] How to 'force' Bio::Graphics to draw image according to input file ?
In-Reply-To: <44632.192.168.1.1.1187811942.squirrel@mail.ncbs.res.in>
References: <10259461.post@talk.nabble.com> <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in> <1178028249.2644.13.camel@localhost.localdomain> <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in> <6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com> <51133.192.168.1.1.1187003265.squirrel@mail.ncbs.res.in> <46C05896.1010002@sendu.me.uk> <59564.192.168.1.1.1187016455.squirrel@mail.ncbs.res.in> <46C07257.1000308@sendu.me.uk> <44632.192.168.1.1.1187811942.squirrel@mail.ncbs.res.in>
Message-ID: <34980.192.168.1.1.1187852794.squirrel@mail.ncbs.res.in>

Dear All,

I will make my question simple:
Is there any way to force the 'Bio::Graphics' module to print only one
glyph in a track?

PS. A more detailed explanation is in my earlier mail (I don't want to
spam the community with the same mail again).

Eagerly waiting for a reply.
Thanks,
-- 
Shameer Khadar
Prof. R.
Sowdhamini's Lab (# 25) The Computational Biology Group National Centre for Biological Sciences (TIFR) GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India T - 91-080-23666001 EXT - 6251 W - http://www.ncbs.res.in From cain.cshl at gmail.com Thu Aug 23 04:54:40 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Thu, 23 Aug 2007 04:54:40 -0400 Subject: [Bioperl-l] How to 'force' Bio::Graphics to draw image according to input file ? In-Reply-To: <34980.192.168.1.1.1187852794.squirrel@mail.ncbs.res.in> References: <10259461.post@talk.nabble.com> <41667.192.168.1.1.1178019391.squirrel@mail.ncbs.res.in> <1178028249.2644.13.camel@localhost.localdomain> <42391.192.168.1.1.1178035451.squirrel@mail.ncbs.res.in> <6dce9a0b0705030901w203344b4te03ad271a5482faf@mail.gmail.com> <51133.192.168.1.1.1187003265.squirrel@mail.ncbs.res.in> <46C05896.1010002@sendu.me.uk> <59564.192.168.1.1.1187016455.squirrel@mail.ncbs.res.in> <46C07257.1000308@sendu.me.uk> <44632.192.168.1.1.1187811942.squirrel@mail.ncbs.res.in> <34980.192.168.1.1.1187852794.squirrel@mail.ncbs.res.in> Message-ID: <1187859296.2546.6.camel@103.48.216.10.in-addr.arpa> Shameer, I don't think that's really what you want. It seems to me that sorting them in some useful way (say, by score) would make more sense. There is an example using the -sort_order option in Lincoln's howto. Scott On Thu, 2007-08-23 at 12:36 +0530, Shameer Khadar wrote: > Dear All, > > I will make my question simple : > Is there any way to force the 'Bio::graphics' module to print only one > glyph in a track ? > > PS. More Detailed explanation is in my earlier mail (Dont want to spam the > community with my same mail) > > Eagerly waiting for a reply. > Thanks, -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070823/6066f0ec/attachment.bin

From cjfields at uiuc.edu Thu Aug 23 10:14:51 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 23 Aug 2007 09:14:51 -0500
Subject: [Bioperl-l] extra rel. 1.6 suggestion
Message-ID: <3A2C3BFD-2FA1-402B-9597-6E51A72E7096@uiuc.edu>

Some interesting points by Sendu:

http://www.bioperl.org/wiki/Release_Schedule#Need_tests

which I agree with completely.

Maybe the best way out of this is a variation on something that was
suggested before, which was 'splitting' the code into groups. What
if we set up a way to automatically gauge test coverage,
documentation, etc.? If I remember correctly Nathan had something
running at one point which did this.

If so, we could determine which code is potentially 'non-compliant'
and needs to be fixed (tests added, docs brought up to spec, and so
on), and thus prioritize at the minimum what needs to be done for a
1.6 release. If it's deemed not worth worrying about (no active
development, author is out of contact, we have more important
priorities) we split that code off into a separate 'dev' package.
That would save some of the headache of trying to split maintenance
of ~1000 modules among only a few devs.

Thoughts?

chris

From bix at sendu.me.uk Thu Aug 23 10:57:21 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 23 Aug 2007 15:57:21 +0100
Subject: [Bioperl-l] extra rel. 1.6 suggestion
In-Reply-To: <3A2C3BFD-2FA1-402B-9597-6E51A72E7096@uiuc.edu>
References: <3A2C3BFD-2FA1-402B-9597-6E51A72E7096@uiuc.edu>
Message-ID: <46CDA051.40408@sendu.me.uk>

Chris Fields wrote:
> Maybe the best way out of this is a variation on something that was
> suggested before, which was 'splitting' the code into groups. What
> if we set up a way to automatically gauge test coverage,
> documentation, etc.?
If I remember correctly Nathan had something > running at one point which did this. You can generate this yourself by doing ./Build testcover Mauricio was going to sort out having this run daily with the results displayed on the website... Mauricio? The major 'annoyance' is that the coverage results don't get generated if any test fails. But they shouldn't be failing anyway ;) From cain.cshl at gmail.com Thu Aug 23 15:53:37 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Thu, 23 Aug 2007 15:53:37 -0400 Subject: [Bioperl-l] SeqFeature/AnnotatableI and rel. 1.6 In-Reply-To: References: Message-ID: <1187898817.2562.19.camel@localhost.localdomain> Hi Chris, GBrowse would be unaffected by this as it doesn't use Bio::SeqFeature::Annotated. The GMOD GFF3 Chado loader on the other hand will almost certainly break horribly, as it depends on the strong typing of Bio::FeatureIO/Bio::SeqFeature::Annotated. If you could try your ideas out in a branch that I could checkout and test on, that would be good. Thanks, Scott On Wed, 2007-08-22 at 23:53 -0500, Chris Fields wrote: > As many of the devs know, there are a number of Feature/Annotation > issues that need to be resolved prior to a 1.6 release: > > http://www.bioperl.org/wiki/Release_Schedule#SeqFeature. > 2FAnnotation_changes:_Keep_or_roll_back.3F > > There has been little work done over the last 2 1/2 years to undo or > rectify problems associated with those additions; I feel like those > of us still routinely contributing have been left holding the bag. > There has also been very little attempt to document any of this > adequately enough; as an example see POD for > Bio::SeqFeature::Annotated (what little there is). > > I would like to suggest the radical idea of rolling back AnnotatableI/ > SeqFeatureI changes to a much simpler rel. 1.4-like behavior (tags > are simple scalars) and possibly work in implementing Ewan's > SeqFeature::TypedSeqFeatureI for those who want strong data types > (i.e. 
Bio::FeatureIO/Bio::SeqFeature::Annotated). The various > AnnotatableI changes, odd inheritance, and operator overloading have > really obfuscated the code to the point where no one wants to touch > it in case it breaks something important. However, I believe it is > the one serious impediment to a new stable release. > > My thought is we simplify all the relevant interfaces, essentially > reverting back to rel 1.4. For instance, we move the various > Bio::AnnotatableI tag methods back into Bio::SeqFeatureI. > Bio::SeqFeature::Annotated would implement Bio::AnnotatableI > directly, and (if needed) also implement > Bio::SeqFeature::TypedSeqFeatureI, so the impetus is on > Bio::SeqFeature::Annotated to overload the relevant SeqFeatureI > methods correctly, just as any other class would when implementing an > abstract interface. I have played around with this a bit and managed > to get most tests working again for Bio::SeqFeature::Generic and > FeatureIO but a number of others break. > > If needed I can try this out on a branch (a bit ironic, since the > changes instigating this mess should have been tested on a branch!). > Maybe this will get the ball rolling towards a 1.6 release. Any > thoughts? > > chris > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070823/11ce47d3/attachment.bin From N.Haigh at sheffield.ac.uk Thu Aug 23 16:32:12 2007 From: N.Haigh at sheffield.ac.uk (Nathan S. 
Haigh)
Date: Thu, 23 Aug 2007 21:32:12 +0100
Subject: [Bioperl-l] extra rel. 1.6 suggestion
In-Reply-To: <46CDA051.40408@sendu.me.uk>
References: <3A2C3BFD-2FA1-402B-9597-6E51A72E7096@uiuc.edu> <46CDA051.40408@sendu.me.uk>
Message-ID: <1187901132.46cdeeccce68d@webmail.shef.ac.uk>

Quoting Sendu Bala :

> Chris Fields wrote:
> > Maybe the best way out of this is a variation on something that was
> > suggested before, which was 'splitting' the code into groups. What
> > if we set up a way to automatically gauge test coverage,
> > documentation, etc.? If I remember correctly Nathan had something
> > running at one point which did this.
>
> You can generate this yourself by doing
> ./Build testcover

What I did was to patch Devel::Cover to include JavaScript to allow
sorting of the results by clicking a header in the table. This way, it
was easier to find those modules with poor POD coverage, and any other
coverage metric. The developer(s) of Devel::Cover are introducing this
into their next release, but who knows when that release will be. I
could provide a diff, but we may be able to check out Devel::Cover from
cvs/svn until the 0.62 release is made.

>
> Mauricio was going to sort out having this run daily with the results
> displayed on the website... Mauricio?
>
> The major 'annoyance' is that the coverage results don't get generated
> if any test fails. But they shouldn't be failing anyway ;)
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From cjfields at uiuc.edu Thu Aug 23 17:33:25 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 23 Aug 2007 16:33:25 -0500
Subject: [Bioperl-l] SeqFeature/AnnotatableI and rel.
1.6
In-Reply-To: <1187898817.2562.19.camel@localhost.localdomain>
References: <1187898817.2562.19.camel@localhost.localdomain>
Message-ID: <38B989E4-34CA-42CD-A608-9D2A095E7ADF@uiuc.edu>

Scott,

So far most of FeatureIO.t passes, with only a few exceptions dealing
with the from_feature method (I know what the problem is there). A
large number of other tests crash horribly (not so surprising), so
I'll have to go through those. Ergo any changes and testing will
definitely be conducted on a branch then merged back to main trunk
once everything is okay. I'll probably start a branch in the next
few days or so.

Here's what I have been working on so far, which I think is reasonable:

1) Move all *_tag_* related methods out of Bio::AnnotatableI and into
Bio::SeqFeature::Annotated.

2) Reinstate the same tag methods in Bio::SeqFeatureI and remove
Bio::AnnotatableI from the inheritance tree.

3) Make Bio::SeqFeature::Annotated implement Bio::AnnotatableI
directly (which it already did, strangely enough). Now it simply
implements the proper methods from the interface classes SeqFeatureI
and AnnotatableI.

4) Revert Bio::SeqFeature::Generic tags back to simple untyped
strings (reimplement the 1.4 rel methods).

I'm interested in seeing whether this results in a significant
performance increase in SeqIO since the Annotation instantiation is
removed.

ToDo: I plan on removing the operator overloading in Bio::Annotation,
which was a serious sticking point with most of the devs. This won't
be done until after tests pass for everything else.

What we will need at some point which I can't provide:
Bio::SeqFeature::Annotated has no docs (no synopsis, no
description). The reason I bring this up is Sendu and I are
seriously considering running automated code audits in order to
gauge which modules lack docs, test coverage, etc. We're likely
splitting those without adequate test/doc coverage off into a
separate 'dev' release.
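
To make point 4 (tags as simple untyped strings) a bit more concrete,
here is a rough sketch of the idea. The SimpleTagFeature package is a
made-up stand-in for illustration only, not the actual
Bio::SeqFeature::Generic code; only the method names are borrowed from
the SeqFeatureI-style tag methods, and the hash-of-arrays storage is an
assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in class, NOT BioPerl code: tags are stored as a
# plain hash of array refs holding ordinary strings, with no
# Annotation objects involved.
package SimpleTagFeature;

sub new {
    my ($class) = @_;
    return bless { _tags => {} }, $class;
}

# Append one or more plain string values to a tag.
sub add_tag_value {
    my ($self, $tag, @values) = @_;
    push @{ $self->{_tags}{$tag} }, @values;
    return;
}

# True if the tag has been set on this feature.
sub has_tag {
    my ($self, $tag) = @_;
    return exists $self->{_tags}{$tag} ? 1 : 0;
}

# Return the values as a list of plain scalars; die on an unknown tag.
sub get_tag_values {
    my ($self, $tag) = @_;
    die "no such tag: $tag\n" unless $self->has_tag($tag);
    return @{ $self->{_tags}{$tag} };
}

# Return the tag names in sorted order.
sub get_all_tags {
    my ($self) = @_;
    return sort keys %{ $self->{_tags} };
}

# Delete a tag and return the values it held.
sub remove_tag {
    my ($self, $tag) = @_;
    die "no such tag: $tag\n" unless $self->has_tag($tag);
    return @{ delete $self->{_tags}{$tag} };
}

package main;

my $feat = SimpleTagFeature->new;
$feat->add_tag_value('note', 'putative kinase');
$feat->add_tag_value('note', 'curated');
$feat->add_tag_value('gene', 'abc1');

print join(', ', $feat->get_all_tags), "\n";
print join('; ', $feat->get_tag_values('note')), "\n";
```

The point of the sketch is only that values come back as ordinary
strings, so callers need no overloading or typed wrappers to compare or
print them; anything wanting strong typing would layer that on top.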
chris On Aug 23, 2007, at 2:53 PM, Scott Cain wrote: > Hi Chris, > > GBrowse would be unaffected by this as it doesn't use > Bio::SeqFeature::Annotated. The GMOD GFF3 Chado loader on the other > hand will almost certainly break horribly, as it depends on the strong > typing of Bio::FeatureIO/Bio::SeqFeature::Annotated. If you could try > your ideas out in a branch that I could checkout and test on, that > would > be good. > > Thanks, > Scott > > > On Wed, 2007-08-22 at 23:53 -0500, Chris Fields wrote: >> As many of the devs know, there are a number of Feature/Annotation >> issues that need to be resolved prior to a 1.6 release: >> >> http://www.bioperl.org/wiki/Release_Schedule#SeqFeature. >> 2FAnnotation_changes:_Keep_or_roll_back.3F >> >> There has been little work done over the last 2 1/2 years to undo or >> rectify problems associated with those additions; I feel like those >> of us still routinely contributing have been left holding the bag. >> There has also been very little attempt to document any of this >> adequately enough; as an example see POD for >> Bio::SeqFeature::Annotated (what little there is). >> >> I would like to suggest the radical idea of rolling back >> AnnotatableI/ >> SeqFeatureI changes to a much simpler rel. 1.4-like behavior (tags >> are simple scalars) and possibly work in implementing Ewan's >> SeqFeature::TypedSeqFeatureI for those who want strong data types >> (i.e. Bio::FeatureIO/Bio::SeqFeature::Annotated). The various >> AnnotatableI changes, odd inheritance, and operator overloading have >> really obfuscated the code to the point where no one wants to touch >> it in case it breaks something important. However, I believe it is >> the one serious impediment to a new stable release. >> >> My thought is we simplify all the relevant interfaces, essentially >> reverting back to rel 1.4. For instance, we move the various >> Bio::AnnotatableI tag methods back into Bio::SeqFeatureI. 
>> Bio::SeqFeature::Annotated would implement Bio::AnnotatableI >> directly, and (if needed) also implement >> Bio::SeqFeature::TypedSeqFeatureI, so the impetus is on >> Bio::SeqFeature::Annotated to overload the relevant SeqFeatureI >> methods correctly, just as any other class would when implementing an >> abstract interface. I have played around with this a bit and managed >> to get most tests working again for Bio::SeqFeature::Generic and >> FeatureIO but a number of others break. >> >> If needed I can try this out on a branch (a bit ironic, since the >> changes instigating this mess should have been tested on a branch!). >> Maybe this will get the ball rolling towards a 1.6 release. Any >> thoughts? >> >> chris >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- > ---------------------------------------------------------------------- > -- > Scott Cain, Ph. D. > cain at cshl.edu > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From smarkel at accelrys.com Thu Aug 23 17:59:37 2007 From: smarkel at accelrys.com (Scott Markel) Date: Thu, 23 Aug 2007 14:59:37 -0700 Subject: [Bioperl-l] SeqFeature/AnnotatableI and rel. 1.6 In-Reply-To: <38B989E4-34CA-42CD-A608-9D2A095E7ADF@uiuc.edu> Message-ID: Chris, Pipeline Pilot's Sequence Analysis Collection wraps BioPerl. Once you think the branch changes have converged a bit we'd be happy to try running our regression suite and report what we find. Scott Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys, Inc. 
mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com bioperl-l-bounces at lists.open-bio.org wrote on 23.08.2007 14:33:25: > Scott, > > So far most of FeatureIO.t passes, with only a few exceptions dealing > with the from_feature method (I know what the problem is there). A > large number of other tests crash horribly (not so surprising), so > I'll have to go through those. Ergo any changes and testing will > definitely be conducted on a branch then merged back to main trunk > once everything is okay. I'll probably start a branch in the next > few days or so. > > Here's what I have been working on so far, which I think is reasonable: > > 1) Move all *_tag_* related methods out of Bio::AnnotatableI and into > Bio::SeqFeature::Annotatable. > > 2) Reinstate the same tag methods in Bio::SeqFeatureI and remove > Bio::AnnotatableI from the inheritance tree. > > 3) Make Bio::SeqFeature::Annotatable Bio::AnnotatableI (which it > already was, strangely enough). Now it simple implements the proper > methods from the interface classes SeqFeatureI and AnnotatableI. > > 4) Revert Bio::SeqFeature::Generic tags back to simple untyped > strings (reimplement the 1.4 rel methods). > > I'm interested in seeing whether this results in a significant > performance increase in SeqIO since the Annotation instantiation is > removed. > > ToDo: I plan on removing the operator overloading in Bio::Annotation, > which was a serious sticking point with most of the devs. This won't > be done until after tests pass for everything else. > > What we will need at some point which I can't provide: > Bio::SeqFeature::Annotated has no docs (no synopsis, no > description). The reason I bring this up is Sendu and I are > seriously considering running an automated code audits in order to > gauge which modules lack docs, test coverage, etc.. 
We're likely > splitting those without adequate test/doc coverage off into a > separate 'dev' release. > > chris > > On Aug 23, 2007, at 2:53 PM, Scott Cain wrote: > > > Hi Chris, > > > > GBrowse would be unaffected by this as it doesn't use > > Bio::SeqFeature::Annotated. The GMOD GFF3 Chado loader on the other > > hand will almost certainly break horribly, as it depends on the strong > > typing of Bio::FeatureIO/Bio::SeqFeature::Annotated. If you could try > > your ideas out in a branch that I could checkout and test on, that > > would > > be good. > > > > Thanks, > > Scott > > > > > > On Wed, 2007-08-22 at 23:53 -0500, Chris Fields wrote: > >> As many of the devs know, there are a number of Feature/Annotation > >> issues that need to be resolved prior to a 1.6 release: > >> > >> http://www.bioperl.org/wiki/Release_Schedule#SeqFeature. > >> 2FAnnotation_changes:_Keep_or_roll_back.3F > >> > >> There has been little work done over the last 2 1/2 years to undo or > >> rectify problems associated with those additions; I feel like those > >> of us still routinely contributing have been left holding the bag. > >> There has also been very little attempt to document any of this > >> adequately enough; as an example see POD for > >> Bio::SeqFeature::Annotated (what little there is). > >> > >> I would like to suggest the radical idea of rolling back > >> AnnotatableI/ > >> SeqFeatureI changes to a much simpler rel. 1.4-like behavior (tags > >> are simple scalars) and possibly work in implementing Ewan's > >> SeqFeature::TypedSeqFeatureI for those who want strong data types > >> (i.e. Bio::FeatureIO/Bio::SeqFeature::Annotated). The various > >> AnnotatableI changes, odd inheritance, and operator overloading have > >> really obfuscated the code to the point where no one wants to touch > >> it in case it breaks something important. However, I believe it is > >> the one serious impediment to a new stable release. 
> >> > >> My thought is we simplify all the relevant interfaces, essentially > >> reverting back to rel 1.4. For instance, we move the various > >> Bio::AnnotatableI tag methods back into Bio::SeqFeatureI. > >> Bio::SeqFeature::Annotated would implement Bio::AnnotatableI > >> directly, and (if needed) also implement > >> Bio::SeqFeature::TypedSeqFeatureI, so the impetus is on > >> Bio::SeqFeature::Annotated to overload the relevant SeqFeatureI > >> methods correctly, just as any other class would when implementing an > >> abstract interface. I have played around with this a bit and managed > >> to get most tests working again for Bio::SeqFeature::Generic and > >> FeatureIO but a number of others break. > >> > >> If needed I can try this out on a branch (a bit ironic, since the > >> changes instigating this mess should have been tested on a branch!). > >> Maybe this will get the ball rolling towards a 1.6 release. Any > >> thoughts? > >> > >> chris > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > > ---------------------------------------------------------------------- > > -- > > Scott Cain, Ph. D. > > cain at cshl.edu > > GMOD Coordinator (http://www.gmod.org/) > > 216-392-3087 > > Cold Spring Harbor Laboratory > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l
From cjfields at uiuc.edu Thu Aug 23 20:39:30 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 23 Aug 2007 19:39:30 -0500 Subject: [Bioperl-l] SeqFeature/AnnotatableI and rel. 1.6 In-Reply-To: References: Message-ID: <241563BB-F96A-4631-B504-F73699FDE84B@uiuc.edu> Having an independent test would be great! The reason I suggest there may be a speedup: one complaint popping up after 1.5 was the slowdown in sequence parsing, which could be related to the 'heavier' objectified tags. chris On Aug 23, 2007, at 4:59 PM, Scott Markel wrote: > Chris, > > Pipeline Pilot's Sequence Analysis Collection wraps BioPerl. > Once you think the branch changes have converged a bit we'd > be happy to try running our regression suite and report what > we find. > > Scott > > Scott Markel, Ph.D. > Principal Bioinformatics Architect email: smarkel at accelrys.com > Accelrys, Inc. mobile: +1 858 205 3653 > 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > San Diego, CA 92121 fax: +1 858 799 5222 > USA web: http://www.accelrys.com > > > bioperl-l-bounces at lists.open-bio.org wrote on 23.08.2007 14:33:25: > >> Scott, >> >> So far most of FeatureIO.t passes, with only a few exceptions dealing >> with the from_feature method (I know what the problem is there). A >> large number of other tests crash horribly (not so surprising), so >> I'll have to go through those. Ergo any changes and testing will >> definitely be conducted on a branch then merged back to main trunk >> once everything is okay. I'll probably start a branch in the next >> few days or so. >> >> Here's what I have been working on so far, which I think is >> reasonable: >> >> 1) Move all *_tag_* related methods out of Bio::AnnotatableI and into >> Bio::SeqFeature::Annotatable.
>> >> 2) Reinstate the same tag methods in Bio::SeqFeatureI and remove >> Bio::AnnotatableI from the inheritance tree. >> >> 3) Make Bio::SeqFeature::Annotatable Bio::AnnotatableI (which it >> already was, strangely enough). Now it simple implements the proper >> methods from the interface classes SeqFeatureI and AnnotatableI. >> >> 4) Revert Bio::SeqFeature::Generic tags back to simple untyped >> strings (reimplement the 1.4 rel methods). >> >> I'm interested in seeing whether this results in a significant >> performance increase in SeqIO since the Annotation instantiation is >> removed. >> >> ToDo: I plan on removing the operator overloading in Bio::Annotation, >> which was a serious sticking point with most of the devs. This won't >> be done until after tests pass for everything else. >> >> What we will need at some point which I can't provide: >> Bio::SeqFeature::Annotated has no docs (no synopsis, no >> description). The reason I bring this up is Sendu and I are >> seriously considering running an automated code audits in order to >> gauge which modules lack docs, test coverage, etc.. We're likely >> splitting those without adequate test/doc coverage off into a >> separate 'dev' release. >> >> chris >> >> On Aug 23, 2007, at 2:53 PM, Scott Cain wrote: >> >>> Hi Chris, >>> >>> GBrowse would be unaffected by this as it doesn't use >>> Bio::SeqFeature::Annotated. The GMOD GFF3 Chado loader on the other >>> hand will almost certainly break horribly, as it depends on the >>> strong >>> typing of Bio::FeatureIO/Bio::SeqFeature::Annotated. If you >>> could try >>> your ideas out in a branch that I could checkout and test on, that >>> would >>> be good. >>> >>> Thanks, >>> Scott >>> >>> >>> On Wed, 2007-08-22 at 23:53 -0500, Chris Fields wrote: >>>> As many of the devs know, there are a number of Feature/Annotation >>>> issues that need to be resolved prior to a 1.6 release: >>>> >>>> http://www.bioperl.org/wiki/Release_Schedule#SeqFeature. 
>>>> 2FAnnotation_changes:_Keep_or_roll_back.3F >>>> >>>> There has been little work done over the last 2 1/2 years to >>>> undo or >>>> rectify problems associated with those additions; I feel like those >>>> of us still routinely contributing have been left holding the bag. >>>> There has also been very little attempt to document any of this >>>> adequately enough; as an example see POD for >>>> Bio::SeqFeature::Annotated (what little there is). >>>> >>>> I would like to suggest the radical idea of rolling back >>>> AnnotatableI/ >>>> SeqFeatureI changes to a much simpler rel. 1.4-like behavior (tags >>>> are simple scalars) and possibly work in implementing Ewan's >>>> SeqFeature::TypedSeqFeatureI for those who want strong data types >>>> (i.e. Bio::FeatureIO/Bio::SeqFeature::Annotated). The various >>>> AnnotatableI changes, odd inheritance, and operator overloading >>>> have >>>> really obfuscated the code to the point where no one wants to touch >>>> it in case it breaks something important. However, I believe it is >>>> the one serious impediment to a new stable release. >>>> >>>> My thought is we simplify all the relevant interfaces, essentially >>>> reverting back to rel 1.4. For instance, we move the various >>>> Bio::AnnotatableI tag methods back into Bio::SeqFeatureI. >>>> Bio::SeqFeature::Annotated would implement Bio::AnnotatableI >>>> directly, and (if needed) also implement >>>> Bio::SeqFeature::TypedSeqFeatureI, so the impetus is on >>>> Bio::SeqFeature::Annotated to overload the relevant SeqFeatureI >>>> methods correctly, just as any other class would when >>>> implementing an >>>> abstract interface. I have played around with this a bit and >>>> managed >>>> to get most tests working again for Bio::SeqFeature::Generic and >>>> FeatureIO but a number of others break. >>>> >>>> If needed I can try this out on a branch (a bit ironic, since the >>>> changes instigating this mess should have been tested on a >>>> branch!). 
>>>> Maybe this will get the ball rolling towards a 1.6 release. Any >>>> thoughts? >>>> >>>> chris >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> -- >>> -------------------------------------------------------------------- >>> -- > >>> -- >>> Scott Cain, Ph. D. >>> cain at cshl.edu >>> GMOD Coordinator (http://www.gmod.org/) >>> 216-392-3087 >>> Cold Spring Harbor Laboratory >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From hlapp at gmx.net Thu Aug 23 23:34:12 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 23 Aug 2007 23:34:12 -0400 Subject: [Bioperl-l] SeqFeature/AnnotatableI and rel. 1.6 In-Reply-To: References: Message-ID: On Aug 23, 2007, at 12:53 AM, Chris Fields wrote: > There has been little work done over the last 2 1/2 years to undo or > rectify problems associated with those additions; I feel like those > of us still routinely contributing have been left holding the bag.
Not by intention, but unfortunately that's probably a fair assessment. (And I'm one of those guilty of inaction.) > [...] > I would like to suggest the radical idea of rolling back AnnotatableI/ > SeqFeatureI changes to a much simpler rel. 1.4-like behavior (tags > are simple scalars) and possibly work in implementing Ewan's > SeqFeature::TypedSeqFeatureI for those who want strong data types > (i.e. Bio::FeatureIO/Bio::SeqFeature::Annotated). I fully support this; to me that sounds exactly like the way to go. > The various AnnotatableI changes, odd inheritance, and operator > overloading have > really obfuscated the code to the point where no one wants to touch > it in case it breaks something important. However, I believe it is > the one serious impediment to a new stable release. Yes, I think you're hitting the nail on the head. Chris, if you take the lead on this and carry it through we will all owe you hugely. I'm not sure how many beers that would compare to, but I'll throw in some. (Who else do I owe beer? I'm losing track. Strangely nobody tried to redeem beer from me in Vienna. Maybe in Toronto?) Seriously, rectifying this problem would lift a huge weight. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From florent.angly at gmail.com Fri Aug 24 00:43:23 2007 From: florent.angly at gmail.com (Florent Angly) Date: Thu, 23 Aug 2007 21:43:23 -0700 Subject: [Bioperl-l] Is it possible to do contig alignments? 
Message-ID: <46CE61EB.5000300@gmail.com> Dear list members, I would like to "produce" an alignment of a contig, or more exactly visualize it in such a fashion, based on the aligned sequences provided to me by a sequence assembler:

Consensus: ACGTACGTTG
Sequence1: ACG-AC
Sequence2:  CGTACGT
Sequence3:     AC-TTG

It sounds like a very trivial task but after searching for a long time, it seems impossible using the methods BioPerl provides. Using the Bio::Align classes, it seems like the only way is if the sequences have the same aligned length, i.e. like this:

Consensus: ACGTACGTTG
Sequence1: ACG-AC----
Sequence2: -CGTACGT--
Sequence3: ----AC-TTG

It's not very satisfactory if I have to pad the sequences with gaps manually. In the context of a phylogenetic alignment, it might make sense, but not for contigs. For assemblies whole sequences are mapped on contigs. Bio::LocatableSeq does not help here because it defines locations _within_ the sequence (the name LocatableSeq was pretty misleading to me). I think it's all very strange that contigs have the coordinates of the aligned sequences composing them but there is no straightforward way to exploit this information. So what's the bottom line? Am I missing something obvious, an out-of-the-box solution? Is it a "missing feature" of BioPerl that is planned to be implemented in the future or that should be requested? Should I pad my sequences with dashes or spaces after assembly? Or is it expected that my aligned reads coming from my assembly be padded with lots of gaps at their beginning and end? What's the BioPerl philosophy here? Thanks for giving me pointers, Florent From bix at sendu.me.uk Fri Aug 24 04:35:23 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 24 Aug 2007 09:35:23 +0100 Subject: [Bioperl-l] Is it possible to do contig alignments?
In-Reply-To: <46CE61EB.5000300@gmail.com> References: <46CE61EB.5000300@gmail.com> Message-ID: <46CE984B.3060701@sendu.me.uk> Florent Angly wrote: > Dear list members, > > I would like to "produce" an alignment of a contig, or more exactly > visualize it in a such a fashion based on the aligned sequences provided > to be by a sequence assembler: > > Consensus: ACGTACGTTG > Sequence1: ACG-AC > Sequence2: CGTACGT > Sequence3: AC-TTG > > It sounds like a very trivial task but after searching for a long time, > it seems impossible using the methods BioPerl provides. Isn't Bio::Assembly::Contig what you need? http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Assembly/Contig.html From zhaodj at ioz.ac.cn Fri Aug 24 05:34:07 2007 From: zhaodj at ioz.ac.cn (De-Jian,ZHAO) Date: Fri, 24 Aug 2007 17:34:07 +0800 (CST) Subject: [Bioperl-l] Is it possible to do contig alignments? In-Reply-To: <46CE61EB.5000300@gmail.com> References: <46CE61EB.5000300@gmail.com> Message-ID: <51693.159.226.67.49.1187948047.squirrel@mail.ioz.ac.cn> On Fri, Aug 24, 2007 12:43, Florent Angly wrote: > Dear list members, > > I would like to "produce" an alignment of a contig, or more exactly > visualize it in a such a fashion based on the aligned sequences > provided > to be by a sequence assembler: > > Consensus: ACGTACGTTG > Sequence1: ACG-AC > Sequence2: CGTACGT > Sequence3: AC-TTG > > It sounds like a very trivial task but after searching for a long time, > it seems impossible using the methods BioPerl provides. > > Using the Bio::Align classes, it seems like the only way is if the sequences have the same aligned length, i.e. like this: > > Consensus: ACGTACGTTG > Sequence1: ACG-AC---- > Sequence2: -CGTACGT-- > Sequence3: ----AC-TTG > > It's not very satisfactory if I have to pad the sequences with gaps > manually. In the context of a phylogenetic alignment, it might make > sense, but not for contigs. How do you pad the sequences with gaps manually? 
Just replace the hyphens with blanks? If yes, you can program in perl to automate this process. > For assemblies whole sequences are mapped on contigs. > Bio::LocatableSeq > does not help here because it defines locations _within_ the > sequence > (the name LocatableSeq was pretty misleading to me). > > I think it's all very strange that contigs have the coordinates of the > aligned sequences composing them but there is no straightforward way > to > exploit this information. > > So what's the bottom line? Am I missing something obvious, an > out-of-the-box solution? Is it a "missing feature" of BioPerl that is > planned to be implemented in the future or that should be requested? > Should I pad my sequences with dashes or spaces after assembly? Or is it > expected that my aligned reads coming from my assembly be padded with > lots of gaps at their beginning and end? What's the BioPerl > philosophy here? > > Thanks for giving me pointers, > > Florent > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- De-Jian Zhao Institute of Zoology,Chinese Academy of Sciences +86-10-64807217 zhaodj at ioz.ac.cn From marian.thieme at arcor.de Fri Aug 24 06:05:55 2007 From: marian.thieme at arcor.de (Marian Thieme) Date: Fri, 24 Aug 2007 12:05:55 +0200 Subject: [Bioperl-l] ReseqChip, module/package name Message-ID: <46CEAD83.2050904@arcor.de> Hi, 2 questions about the naming of the module I did submit (see http://bugzilla.open-bio.org/show_bug.cgi?id=2332) 1.) The package: because there exists already an expression package I suggest to create a new package called resequencing 2.) I would suggest that the module is called RedundantFragments or AdditionalFragments so we would have something like: Bio::Resequencing::AdditionalFragments Any other ideas ? 
Marian By the way, can anybody change my email address to marian.thieme at arcor.de in bugzilla as well as in the bioperl list, please? I didn't manage to do that on my own... From mcons004 at fiu.edu Thu Aug 23 23:30:44 2007 From: mcons004 at fiu.edu (mcons004 at fiu.edu) Date: Thu, 23 Aug 2007 23:30:44 -0400 (EDT) Subject: [Bioperl-l] please some help Message-ID: <20070823233044.BJQ45014@mailstore2.fiu.edu> Hello, I am new to this software and I am having some trouble starting. The version of Bioperl I am working on is v5.8.6. My OS is Unix (Mac OS X). I am trying to use Bioperl with a file called blastParser to process a file which is the output of a "blastall" operation. The command that gives me the error is: > perl blastParser.pl junk.out 1 1 1.0 and the error message says: Can't locate Bio/SearchIO.pm in @INC (@INC contains: /System/Library/Perl/5.8.6/darwin-thread-multi-2level Your online info says this probably means that the module Bio::SearchIO.pm is not installed and I can either install Bundle::Bioperl or install that specific module by hand. Could you give me some tips on this? I am new to working with Unix and Bioperl, so I am a little confused. Any information will be helpful for me. Thanks From bix at sendu.me.uk Fri Aug 24 10:38:39 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 24 Aug 2007 15:38:39 +0100 Subject: [Bioperl-l] please some help In-Reply-To: <20070823233044.BJQ45014@mailstore2.fiu.edu> References: <20070823233044.BJQ45014@mailstore2.fiu.edu> Message-ID: <46CEED6F.1080101@sendu.me.uk> mcons004 at fiu.edu wrote: > Hello, I am new to this software and I am having some trouble > starting. The version of Bioperl I am working on is v5.8.6. My OS is > Unix (Mac OS X). I am trying to use Bioperl with a file called > blastParser to process a file which is the output of a "blastall" > operation.
> > The code that gives me error is: >> perl blastParser.pl junk.out 1 1 1.0 > and the error message says: Can't locate Bio/SearchIO.pm in @INC > (@INC contains: /System/Library/Perl/5.8.6/darwin-thread-multi-2level > > > You online info says I probably means that the module > Bio::SearchIO.pm is not instaled and I can either install > Bundle::Bioperl or install that specific module by hand. Could you > give me some tips in this? I am new working with Unix, and Bioperl so > I am a little confused. You need to install Bioperl first. You can find instructions here: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix If this is your own Mac (you have the root/admin password), when it tells you to run cpan (">perl -MCPAN -e shell" or ">cpan"), start the command with 'sudo'. So: >sudo cpan From florent.angly at gmail.com Fri Aug 24 12:07:04 2007 From: florent.angly at gmail.com (Florent Angly) Date: Fri, 24 Aug 2007 09:07:04 -0700 Subject: [Bioperl-l] Is it possible to do contig alignments? In-Reply-To: <51693.159.226.67.49.1187948047.squirrel@mail.ioz.ac.cn> References: <46CE61EB.5000300@gmail.com> <51693.159.226.67.49.1187948047.squirrel@mail.ioz.ac.cn> Message-ID: <46CF0228.2000404@gmail.com> Thanks for all the replies. Sendu Bala wrote: > Isn't Bio::Assembly::Contig what you need? > > http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Assembly/Contig.html > I'm using this module already to manipulate the contigs, but there's no option that I know of to _display_ the contigs in the way I described. (Sorry, the title of my email was misleading.) De-Jian,ZHAO wrote: > How do you pad the sequences with gaps manually? Just replace the > hyphens with blanks? If yes, you can program in perl to automate > this process. > How do I pad the sequences manually?? I calculate how many gaps have to go left and right of the aligned sequence based on its length, its position in the aligned consensus and the consensus length. my $newseq = '-' x $leftnum . $seq . 
'-' x $rightnum; By the way, the sequences cannot be stored with blanks in them... I think the best way to provide an out-of-the-box solution for displaying contigs the described way would be to _not_ use Bio::Align at all, but rather to create a new assembly IO module like Bio::Assembly::IO::simpleout for example. That would be useful. The reason I wanted to visualize these contigs is because I made a Bio::Assembly::IO module for TIGR Assembler files that I intend to submit to BioPerl. I wanted to make sure first that I did not have any obvious bug in my contig coordinates. I've read the documentation on the Wiki so if a BioPerl developer would please like to step up and contact me directly for checking my code, that would be nice =) Florent From cjfields at uiuc.edu Fri Aug 24 12:07:36 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 24 Aug 2007 11:07:36 -0500 Subject: [Bioperl-l] Bio::Expression & Re: ReseqChip, module/package name In-Reply-To: <46CEAD83.2050904@arcor.de> References: <46CEAD83.2050904@arcor.de> Message-ID: <03D7F0EB-3BC2-4988-B67F-09C4225EAE13@uiuc.edu> Marian, First, apologies about not getting on this sooner. It's shaping up to be a busy year! The new package: How about Bio::Expression::Tools::MitoChip? My reasoning: I don't think it's necessary to define a new Bio::Resequencing namespace for just one module; my inclination is towards using Bio::Expression namespace as Bio::Tools have been traditionally reserved for output parsers. I am unsure what the Bio::Expression status is (very little is documented, no tests are written, nothing on the mail list archives); maybe Allen can answer that? I don't see anything that precludes you from using that namespace as long as your tools are fairly well-defined (they are) and have tests (they do). Also, your module deals with doing one specific thing (extraction and incorporation of information about redundant fragments) for the Affy MitoChip.
It might be worth genericizing the class a bit so that you can add new parser or analysis methods w/o having to define new classes to deal with the same Mitochip data. Mail list: The mail list subscription page (http://bioperl.org/mailman/listinfo/bioperl-l) allows you to subscribe or change subscription options (at the bottom of the page). Bugzilla: if you are logged into Bugzilla under your old email, there is an option at the bottom of the page (Edit : Prefs) where you can change your email address and other preferences. chris On Aug 24, 2007, at 5:05 AM, Marian Thieme wrote: > Hi, > > 2 questions about the naming of the module I did submit > (see http://bugzilla.open-bio.org/show_bug.cgi?id=2332) > > 1.) The package: > because there exists already an expression package I suggest to > create a > new package called resequencing > > 2.) I would suggest that the module is called RedundantFragments or > AdditionalFragments > > so we would have something like: > > Bio::Resequencing::AdditionalFragments > > Any other ideas ? > > Marian > > By the way can anybody change my email adress to > marian.thieme at arcor.de > in bugzilla as well as in the bioperl list, please ?!! didnt achieve > that by my own... > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Fri Aug 24 12:23:12 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 24 Aug 2007 11:23:12 -0500 Subject: [Bioperl-l] SeqFeature/AnnotatableI and rel.
1.6 In-Reply-To: References: Message-ID: <4F5FD173-FC80-4F70-B294-83DA58FDCE64@uiuc.edu> On Aug 23, 2007, at 10:34 PM, Hilmar Lapp wrote: > On Aug 23, 2007, at 12:53 AM, Chris Fields wrote: > >> There has been little work done over the last 2 1/2 years to undo or >> rectify problems associated with those additions; I feel like those >> of us still routinely contributing have been left holding the bag. > > Not by intention, but unfortunately that's probably a fair > assessment. (And I'm one of those guilty of inaction.) Not completely. You, Jason, Chris M., and several others expressed yourselves quite clearly (move the code to a branch and test). I think that everyone was trying to be diplomatic about it and so never attempted to do anything except get it working correctly. >> [...] >> I would like to suggest the radical idea of rolling back >> AnnotatableI/ >> SeqFeatureI changes to a much simpler rel. 1.4-like behavior (tags >> are simple scalars) and possibly work in implementing Ewan's >> SeqFeature::TypedSeqFeatureI for those who want strong data types >> (i.e. Bio::FeatureIO/Bio::SeqFeature::Annotated). > > I fully support this; to me that sounds exactly like the way to go. Okay, I'll probably go ahead and get a branch started today. I'll have to look at Ewan's interface in more detail; it's possible a new SeqFeature implementation will need to be written up to incorporate it. >> The various AnnotatableI changes, odd inheritance, and operator >> overloading have >> really obfuscated the code to the point where no one wants to touch >> it in case it breaks something important. However, I believe it is >> the one serious impediment to a new stable release. > > Yes, I think you're hitting the nail on the head. > > Chris, if you take the lead on this and carry it through we will > all owe you hugely. I'm not sure how many beers that would compare > to, but I'll throw in some. (Who else do I owe beer? I'm losing > track. 
Strangely nobody tried to redeem beer from me in Vienna. > Maybe in Toronto?) > > Seriously, rectifying this problem would lift a huge weight. > > -hilmar It would be nice to get regular releases started again. I think this'll help. chris From marian.thieme at arcor.de Fri Aug 24 13:01:07 2007 From: marian.thieme at arcor.de (Marian Thieme) Date: Fri, 24 Aug 2007 19:01:07 +0200 Subject: [Bioperl-l] Bio::Expression & Re: ReseqChip, module/package name Message-ID: <46CF0ED3.8000708@arcor.de> > The new package: How about Bio::Expression::Tools::MitoChip? My > reasoning: I don't think it's necessary to define a new > Bio::Resequencing namespace for just one module; my inclination is > towards using Bio::Expression namespace as Bio::Tools have been > traditionally reserved for output parsers. I am unsure what the > Bio::Expression status is (very little is documented, no tests are > written, nothing on the mail list archives); maybe Allen can answer > that? I don't see anything that precludes you from using that > namespace as long as your tools are fairly well-defined (they are) > and have tests (they do). The problem I see with Bio::Expression is that resequencing chips are not expression chips. (Expression chips are designed to hybridize RNA strands and hence measure RNA expression levels; a resequencing chip, on the other hand, is based on DNA, and the design and the probe length are quite different.) So, from my point of view, it makes sense to distinguish between DNA and RNA chips, at least. > > Also, your module deals with doing one specific thing (extraction and > incorporation of information about redundant fragments) for the Affy > MitoChip. It might be worth genericizing the class a bit so that you > can add new parser or analysis methods w/o having to define new > classes to deal with the same Mitochip data. OK, I need to think about that.
> > Mail list: The mail list subscription page (http://bioperl.org/ > mailman/listinfo/bioperl-l) allows you to subscribe or change > subscription options (at the bottom of the page). > cleared > Bugzilla: if you are logged into Bugzilla under your old email, there > is an option at the bottom of the page (Edit : Prefs) where you can > change your email address and other preferences. > Unfortunately I don't receive a mail to confirm the change. I tried that twice. Marian From bix at sendu.me.uk Fri Aug 24 12:43:22 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 24 Aug 2007 17:43:22 +0100 Subject: [Bioperl-l] Is it possible to do contig alignments? In-Reply-To: <46CF0228.2000404@gmail.com> References: <46CE61EB.5000300@gmail.com> <51693.159.226.67.49.1187948047.squirrel@mail.ioz.ac.cn> <46CF0228.2000404@gmail.com> Message-ID: <46CF0AAA.4090301@sendu.me.uk> Florent Angly wrote: > Thanks for all the replies. > > Sendu Bala wrote: > >> Isn't Bio::Assembly::Contig what you need? > > I'm using this module already to manipulate the contigs, but there's > no option that I know of to _display_ the contigs in the way I > described. [snip] > I think the best way to provide an out-of-the-box solution for > displaying contigs the described way would be to _not_ use Bio::Align > at all, but rather to create a new assembly IO module like > Bio::Assembly::IO::simpleout for example. That would be useful. Yes... > The reason I wanted to visualize these contigs is because I made a > Bio::Assembly::IO module for TIGR Assembler files that I intend on > submitting to BioPerl. That's wonderful... might I cheekily suggest that the solution to your problem is to extend your IO module so that it does the 'O' as well? I.e., unlike the other IO modules, write_assembly() is actually implemented. Then you can round-trip to ensure your next_assembly() method has no bugs.
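Until such a writer exists, one quick way to eyeball contig coordinates is the padding arithmetic described earlier in this thread: work out how many fill characters go to the left and right of each read from its start column, its length and the consensus length. The following is only a minimal sketch in plain Perl. The pad_read() helper and the 1-based start column are illustrative assumptions, not part of Bio::Assembly or any other BioPerl API, and the start columns are read off the padded example posted at the start of the thread.

```perl
use strict;
use warnings;

# Pad an already-gapped read so that it spans the full consensus width.
# $start is the 1-based column of the read's first character in the
# gapped consensus (an assumed convention for this sketch only).
sub pad_read {
    my ($seq, $start, $cons_len, $fill) = @_;
    $fill = '-' unless defined $fill;
    my $leftnum  = $start - 1;
    my $rightnum = $cons_len - $leftnum - length($seq);
    die "read does not fit inside the consensus\n"
        if $leftnum < 0 || $rightnum < 0;
    return ($fill x $leftnum) . $seq . ($fill x $rightnum);
}

my $consensus = 'ACGTACGTTG';
my @reads = (
    [ 'Sequence1', 'ACG-AC',  1 ],
    [ 'Sequence2', 'CGTACGT', 2 ],
    [ 'Sequence3', 'AC-TTG',  5 ],
);

printf "%-10s %s\n", 'Consensus:', $consensus;
for my $read (@reads) {
    my ($name, $seq, $start) = @$read;
    # ' ' as fill gives the display form; '-' gives equal-length
    # strings suitable for storing in an alignment.
    printf "%-10s %s\n", "$name:",
        pad_read($seq, $start, length($consensus), ' ');
}
```

Calling pad_read() with the default '-' fill yields the equal-length strings needed for storage, while a ' ' fill gives the staggered display asked for in the original question.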
> I've read the documentation on the Wiki so if a BioPerl developer > would please like lo step up and contact me directly for checking my > code, that would be nice =) If no one does, post it as an enhancement request to bugzilla. A test script is a must. http://www.bioperl.org/wiki/HOWTO:Writing_BioPerl_Tests From cjfields at uiuc.edu Fri Aug 24 13:16:10 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 24 Aug 2007 12:16:10 -0500 Subject: [Bioperl-l] Is it possible to do contig alignments? In-Reply-To: <46CF0228.2000404@gmail.com> References: <46CE61EB.5000300@gmail.com> <51693.159.226.67.49.1187948047.squirrel@mail.ioz.ac.cn> <46CF0228.2000404@gmail.com> Message-ID: <32D5D3FF-D0A5-4EEB-BA5E-B0087CC64B19@uiuc.edu> On Aug 24, 2007, at 11:07 AM, Florent Angly wrote: ... > De-Jian,ZHAO wrote: >> How do you pad the sequences with gaps manually? Just replace the >> hyphens with blanks? If yes, you can program in perl to automate >> this process. >> > How do I pad the sequences manually?? I calculate how many gaps > have to > go left and right of the aligned sequence based on its length, its > position in the aligned consensus and the consensus length. > my $newseq = '-' x $leftnum . $seq . '-'x$rightnum > By the way, the sequences cannot be stored with blanks in them... > > I think the best way to provide an out-of-the-box solution for > displaying contigs the described way would be to _not_ use > Bio::Align at > all, but rather to create a new assembly IO module like > Bio::Assembly::IO::simpleout for example. That would be useful. > > The reason I wanted to visualize these contigs is because I made a > Bio::Assembly::IO module for TIGR Assembler files that I intend on > submitting to BioPerl. I wanted to make sure first that I did not have > any obvious bug in my contig coordinates. 
I've read the
> documentation on
> the Wiki so if a BioPerl developer would please like lo step up and
> contact me directly for checking my code, that would be nice =)
>
> Florent

A similar question has been previously asked on the same subject:

http://thread.gmane.org/gmane.comp.lang.perl.bio.general/2827/focus=2869

Jason's suggestion was to have a Bio::Assembly::Contig method get_aln() which produces a Bio::SimpleAlign object containing appropriately padded seqs compatible for AlignIO output. However, the method was never implemented.

Personally, the way I would try going about this would be to implement the Contig::get_aln() method, padding with bioperl-compliant alignment gap symbols (currently -.*?=~), so if anyone wanted they could write to any AlignIO-implemented format (MSF, Clustal, etc). In your Bio::Assembly::IO::simpleout module implement write_assembly() and use the Contig::get_aln() method where needed to grab the SimpleAlign, then simply substitute gap symbols with spaces when writing contig output.

In general, any new code is attached to a bugzilla report as an enhancement request:

http://bugzilla.open-bio.org/

One of the devs will work on getting the code incorporated into bioperl. Make sure the code is documented (http://www.bioperl.org/wiki/Advanced_BioPerl), and attach appropriate tests (http://www.bioperl.org/wiki/HOWTO:Writing_BioPerl_Tests) and test data.

chris

From cjfields at uiuc.edu Fri Aug 24 13:20:16 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 24 Aug 2007 12:20:16 -0500
Subject: [Bioperl-l] Bio::Expression & Re: ReseqChip, module/package name
In-Reply-To: <9824900.1187973171940.JavaMail.ngmail@webmail17>
References: <03D7F0EB-3BC2-4988-B67F-09C4225EAE13@uiuc.edu> <46CEAD83.2050904@arcor.de> <9824900.1187973171940.JavaMail.ngmail@webmail17>
Message-ID:

On Aug 24, 2007, at 11:32 AM, marian.thieme at arcor.de wrote:

>> ...
> The problem I see, with Bio::Expression, is that Resequencing chips
> are not belongs to Expression chips.
> (Expression chips are designed to hybridisize RNA strands and hence
> measure RNA expression levels, on the other hand a resequencing
> chip is based on DNA, also the design and the probe length is quite
> different). So, from my point of view it make sence to differ
> between dna and rna chips, at least.

Then maybe the more generic Bio::Microarray namespace is the way to go, with the module name Bio::Microarray::Tools::MitoChip. Other tools can be added as needed.

>> Also, your module deals with doing one specific thing (extraction and
>> incorporation of information about redundant fragments) for the Affy
>> MitoChip. It might be worth genericizing the class a bit so that you
>> can add new parser or analysis methods w/o having to define new
>> classes to deal with the same Mitochip data.
>
> OK, need to think about that.

It all depends on how much you intend to contribute; if you plan on adding to it over time we can talk about starting up a developer account.

>> Mail list: The mail list subscription page (http://bioperl.org/
>> mailman/listinfo/bioperl-l) allows you to subscribe or change
>> subscription options (at the bottom of the page).
>>
> cleared
>
>> Bugzilla: if you are logged into Bugzilla under your old email, there
>> is an option at the bottom of the page (Edit : Prefs) where you can
>> change your email address and other preferences.
>>
> unfortunatly I dont recieve a mail to confirm the change. did try
> that twice..
>
>
> Marian

I tested it out and received the email at both addresses (as it states). If you respond to either email it should implement the change in three days' time. If it doesn't, you can email support at open.bio.org to see if there is a problem.
chris From florent.angly at gmail.com Fri Aug 24 13:58:13 2007 From: florent.angly at gmail.com (Florent Angly) Date: Fri, 24 Aug 2007 10:58:13 -0700 Subject: [Bioperl-l] Is it possible to do contig alignments? In-Reply-To: <32D5D3FF-D0A5-4EEB-BA5E-B0087CC64B19@uiuc.edu> References: <46CE61EB.5000300@gmail.com> <51693.159.226.67.49.1187948047.squirrel@mail.ioz.ac.cn> <46CF0228.2000404@gmail.com> <32D5D3FF-D0A5-4EEB-BA5E-B0087CC64B19@uiuc.edu> Message-ID: <46CF1C35.3050100@gmail.com> Chris Fields wrote: > > A similar question has been previously asked on the same subject: > > http://thread.gmane.org/gmane.comp.lang.perl.bio.general/2827/focus=2869 > > Jason's suggestion was to have a Bio::Assembly::Contig method > get_aln() which produces a Bio::SimpleAlign object containing > appropriately padded seqs compatible for AlignIO output. However, the > method was never implemented. > > Personally, the way I would try going about this would be to implement > the Contig::get_aln() method, padding with bioperl-compliant alignment > gap symbols (currently -.*?=~), so if anyone wanted they could write > to any AlignIO-implemented format (MSF, Clustal, etc). In your > Bio::Assembly::IO::simpleout module implement write_assembly() and use > the Contig::get_aln() method where needed to grab the SimpleAlign, > then simply substitute gap symbols with spaces when writing contig > output. > > In general, any new code is attached to a bugzilla report as an > enhancement request: > > http://bugzilla.open-bio.org/ > > One of the devs will work on getting the code incorporated into > bioperl. Make sure the code is documented > (http://www.bioperl.org/wiki/Advanced_BioPerl), and attach appropriate > tests (http://www.bioperl.org/wiki/HOWTO:Writing_BioPerl_Tests) and > test data. > > chris > > Thanks Chris for the pointers, I will be looking into these things. 
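The padding approach discussed in this thread (gap characters to the left and right of each read, then gaps swapped for spaces at display time) can be sketched in plain Perl as below. This is only an illustration under stated assumptions: pad_read() and the toy reads are hypothetical names, the start coordinates are taken to be 1-based consensus columns, and nothing here is the actual Bio::Assembly::Contig or get_aln() code, which was never implemented.

```perl
use strict;
use warnings;

# Pad one read with gap symbols so it spans the full consensus length.
# $start is assumed to be the 1-based consensus column where the read begins.
sub pad_read {
    my ($seq, $start, $cons_len, $gap) = @_;
    $gap = '-' unless defined $gap;
    my $left  = $start - 1;
    my $right = $cons_len - $left - length $seq;
    die "read does not fit inside the consensus\n" if $left < 0 || $right < 0;
    return ($gap x $left) . $seq . ($gap x $right);
}

# Toy data (hypothetical): a consensus and three reads with start columns.
my $consensus = 'ACGTACGTACGT';
my @reads = (
    [ read1 => 'ACGTAC', 1  ],
    [ read2 => 'TACGTA', 4  ],
    [ read3 => 'CGT',    10 ],
);

# The padded strings are what an alignment object could store.
my @padded = map { pad_read($_->[1], $_->[2], length $consensus) } @reads;

# For display only, swap gap symbols for spaces.
printf "%-10s %s\n", 'consensus', $consensus;
for my $i (0 .. $#reads) {
    (my $display = $padded[$i]) =~ tr/-/ /;
    printf "%-10s %s\n", $reads[$i][0], $display;
}
```

Keeping the gapped strings for storage and converting to spaces only when printing matches the point made earlier in the thread that sequences cannot be stored with blanks in them. Note that the tr/-/ / step would also blank out gaps inside a read, so a real writer might restrict the substitution to the leading and trailing padding.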
Florent From hlapp at gmx.net Fri Aug 24 14:25:57 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 24 Aug 2007 14:25:57 -0400 Subject: [Bioperl-l] Bio::Expression & Re: ReseqChip, module/package name In-Reply-To: References: <03D7F0EB-3BC2-4988-B67F-09C4225EAE13@uiuc.edu> <46CEAD83.2050904@arcor.de> <9824900.1187973171940.JavaMail.ngmail@webmail17> Message-ID: On Aug 24, 2007, at 1:20 PM, Chris Fields wrote: >>> ... >> The problem I see, with Bio::Expression, is that Resequencing chips >> are not belongs to Expression chips. >> (Expression chips are designed to hybridisize RNA strands and hence >> measure RNA expression levels, on the other hand a resequencing >> chip is based on DNA, also the design and the probe length is quite >> different). So, from my point of view it make sence to differ >> between dna and rna chips, at least. > > Then maybe the more generic Bio::Microarray namespace is the way to > go, with the module name Bio::Microarray::Tools:: MitoChip. If > needed other tools can be added as needed. > Makes sense to me too. Presumably, regardless of DNA or RNA being hybridized or length of probes, the data that comes out of them is quite similar in a general nature (namely hybridization signals)? -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From marian.thieme at arcor.de Fri Aug 24 12:32:51 2007 From: marian.thieme at arcor.de (marian.thieme at arcor.de) Date: Fri, 24 Aug 2007 18:32:51 +0200 (CEST) Subject: [Bioperl-l] Bio::Expression & Re: ReseqChip, module/package name In-Reply-To: <03D7F0EB-3BC2-4988-B67F-09C4225EAE13@uiuc.edu> References: <03D7F0EB-3BC2-4988-B67F-09C4225EAE13@uiuc.edu> <46CEAD83.2050904@arcor.de> Message-ID: <9824900.1187973171940.JavaMail.ngmail@webmail17> > The new package: How about Bio::Expression::Tools::MitoChip? 
My > reasoning: I don't think it's necessary to define a new > Bio::Resequencing namespace for just one module; my inclination is > towards using Bio::Expression namespace as Bio::Tools have been > traditionally reserved for output parsers. I am unsure what the > Bio::Expression status is (very little is documented, no tests are > written, nothing on the mail list archives); maybe Allen can answer > that? I don't see anything that precludes you from using that > namespace as long as your tools are fairly well-defined (they are) > and have tests (they do). The problem I see, with Bio::Expression, is that Resequencing chips are not belongs to Expression chips. (Expression chips are designed to hybridisize RNA strands and hence measure RNA expression levels, on the other hand a resequencing chip is based on DNA, also the design and the probe length is quite different). So, from my point of view it make sence to differ between dna and rna chips, at least. > > Also, your module deals with doing one specific thing (extraction and > incorporation of information about redundant fragments) for the Affy > MitoChip. It might be worth genericizing the class a bit so that you > can add new parser or analysis methods w/o having to define new > classes to deal with the same Mitochip data. OK, need to think about that. > > Mail list: The mail list subscription page (http://bioperl.org/ > mailman/listinfo/bioperl-l) allows you to subscribe or change > subscription options (at the bottom of the page). > cleared > Bugzilla: if you are logged into Bugzilla under your old email, there > is an option at the bottom of the page (Edit : Prefs) where you can > change your email address and other preferences. > unfortunatly I dont recieve a mail to confirm the change. did try that twice.. Marian > On Aug 24, 2007, at 5:05 AM, Marian Thieme wrote: > > > Hi, > > > > 2 questions about the naming of the module I did submit > > (see http://bugzilla.open-bio.org/show_bug.cgi?id=2332) > > > > 1.) 
The package: > > because there exists already an expression package I suggest to > > create a > > new package called resequencing > > > > 2.) I would suggest that the module is called RedundantFragments or > > AdditionalFragments > > > > so we would have something like: > > > > Bio::Resequencing::AdditionalFragments > > > > Any other ideas ? > > > > Marian > > > > By the way can anybody change my email adress to > > marian.thieme at arcor.de > > in bugzilla as well as in the bioperl list, please ?!! didnt achieve > > that by my own... > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Fri Aug 24 17:12:25 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 24 Aug 2007 16:12:25 -0500 Subject: [Bioperl-l] SeqFeature/AnnotatableI and rel. 1.6 In-Reply-To: <4F5FD173-FC80-4F70-B294-83DA58FDCE64@uiuc.edu> References: <4F5FD173-FC80-4F70-B294-83DA58FDCE64@uiuc.edu> Message-ID: Okay, I have started a new branch in cvs (tagged featann_rollback). I'll start looking through everything within the next few days to get a general idea of what needs to be done. All I know is the changes were extensive and included modifications to tests.
If anyone has comments I have added a wiki page here: http://www.bioperl.org/wiki/Feature_Annotation_rollback chris On Aug 24, 2007, at 11:23 AM, Chris Fields wrote: > On Aug 23, 2007, at 10:34 PM, Hilmar Lapp wrote: > >> On Aug 23, 2007, at 12:53 AM, Chris Fields wrote: >> >>> There has been little work done over the last 2 1/2 years to undo or >>> rectify problems associated with those additions; I feel like those >>> of us still routinely contributing have been left holding the bag. >> >> Not by intention, but unfortunately that's probably a fair >> assessment. (And I'm one of those guilty of inaction.) > > Not completely. You, Jason, Chris M., and several others expressed > yourselves quite clearly (move the code to a branch and test). I > think that everyone was trying to be diplomatic about it and so never > attempted to do anything except get it working correctly. > >>> [...] >>> I would like to suggest the radical idea of rolling back >>> AnnotatableI/ >>> SeqFeatureI changes to a much simpler rel. 1.4-like behavior (tags >>> are simple scalars) and possibly work in implementing Ewan's >>> SeqFeature::TypedSeqFeatureI for those who want strong data types >>> (i.e. Bio::FeatureIO/Bio::SeqFeature::Annotated). >> >> I fully support this; to me that sounds exactly like the way to go. > > Okay, I'll probably go ahead and get a branch started today. I'll > have to look at Ewan's interface in more detail; it's possible a new > SeqFeature implementation will need to be written up to incorporate > it. > >>> The various AnnotatableI changes, odd inheritance, and operator >>> overloading have >>> really obfuscated the code to the point where no one wants to touch >>> it in case it breaks something important. However, I believe it is >>> the one serious impediment to a new stable release. >> >> Yes, I think you're hitting the nail on the head. >> >> Chris, if you take the lead on this and carry it through we will >> all owe you hugely. 
I'm not sure how many beers that would compare >> to, but I'll throw in some. (Who else do I owe beer? I'm losing >> track. Strangely nobody tried to redeem beer from me in Vienna. >> Maybe in Toronto?) >> >> Seriously, rectifying this problem would lift a huge weight. >> >> -hilmar > > It would be nice to get regular releases started again. I think > this'll help. > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From marian at arcor.de Fri Aug 24 14:48:20 2007 From: marian at arcor.de (marian) Date: Fri, 24 Aug 2007 20:48:20 +0200 Subject: [Bioperl-l] Bio::Expression & Re: ReseqChip, module/package name In-Reply-To: References: <03D7F0EB-3BC2-4988-B67F-09C4225EAE13@uiuc.edu> <46CEAD83.2050904@arcor.de> <9824900.1187973171940.JavaMail.ngmail@webmail17> Message-ID: <46CF27F4.8030608@arcor.de> Hilmar Lapp schrieb: > On Aug 24, 2007, at 1:20 PM, Chris Fields wrote: > > >>>> ... >>>> >>> The problem I see, with Bio::Expression, is that Resequencing chips >>> are not belongs to Expression chips. >>> (Expression chips are designed to hybridisize RNA strands and hence >>> measure RNA expression levels, on the other hand a resequencing >>> chip is based on DNA, also the design and the probe length is quite >>> different). So, from my point of view it make sence to differ >>> between dna and rna chips, at least. >>> >> Then maybe the more generic Bio::Microarray namespace is the way to >> go, with the module name Bio::Microarray::Tools:: MitoChip. If >> needed other tools can be added as needed. >> >> > > Makes sense to me too. Presumably, regardless of DNA or RNA being > hybridized or length of probes, the data that comes out of them is > quite similar in a general nature (namely hybridization signals)? 
>
> -hilmar
>
Bio::Microarray::Tools::MitoChip would be OK with me. I merely meant that it isn't an expression chip, and you also won't/can't analyze expression data with the tool I am talking about.

Marian

From cjfields at uiuc.edu Fri Aug 24 18:36:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 24 Aug 2007 17:36:46 -0500
Subject: [Bioperl-l] undef SeqFeature tag values
Message-ID: <88A352F1-EC1A-44FA-90DA-B869FF965F86@uiuc.edu>

One thing I am noticing with the rollback to tags as strings is that tags with an undefined value are not set; I'm assuming when tags were Bio::AnnotationI they were instantiated regardless with an undef value. When attempting to call an undef tag with get_tag_values() I get:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: asking for tag value that does not exist signalPeptideLength
STACK: Error::throw
STACK: Bio::Root::Root::throw /Users/cjfields/src/featann_rollback/bioperl-live/blib/lib/Bio/Root/Root.pm:357
STACK: Bio::SeqFeature::Generic::get_tag_values /Users/cjfields/src/featann_rollback/bioperl-live/blib/lib/Bio/SeqFeature/Generic.pm:499
STACK: t/targetp.t:189
-----------------------------------------------------------

I personally think of this as a feature (why set a tag at all if it is undef?). However, are there any circumstances where we might want this behavior? Do we want to simply return w/o a value if a tag name isn't found (i.e. remove the exception)?

chris

From hlapp at gmx.net Fri Aug 24 19:02:43 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 24 Aug 2007 19:02:43 -0400
Subject: [Bioperl-l] undef SeqFeature tag values
In-Reply-To: <88A352F1-EC1A-44FA-90DA-B869FF965F86@uiuc.edu>
References: <88A352F1-EC1A-44FA-90DA-B869FF965F86@uiuc.edu>
Message-ID: <7F5FDC98-24A6-4B74-A374-16780F9A5CC9@gmx.net>

You're supposed to call has_tag() first before you can assume that you can call get_tag_values() w/o an exception. That was the original API.
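A minimal sketch of the has_tag()-before-get_tag_values() pattern follows. To keep it runnable without BioPerl installed it uses My::ToyFeature, a hypothetical stand-in that merely mimics the two methods and the exception described in this thread; the helper tag_values_or_none() and the tag name cleavageSite are likewise made up for illustration. With BioPerl available, the same helper should work on a real feature object, since it only relies on has_tag() and get_tag_values().

```perl
use strict;
use warnings;

# Return the values stored under $tag, or an empty list if the tag is absent.
# Only has_tag() and get_tag_values() are assumed to exist on $feat.
sub tag_values_or_none {
    my ($feat, $tag) = @_;
    return $feat->has_tag($tag) ? $feat->get_tag_values($tag) : ();
}

# Hypothetical stand-in for a feature object (NOT a BioPerl class).
{
    package My::ToyFeature;

    sub new {
        my ($class, %tags) = @_;
        return bless { tags => {%tags} }, $class;
    }

    sub has_tag {
        my ($self, $tag) = @_;
        return exists $self->{tags}{$tag};
    }

    # Mimics the behaviour discussed above: asking for a missing tag throws.
    sub get_tag_values {
        my ($self, $tag) = @_;
        die "asking for tag value that does not exist $tag\n"
            unless $self->has_tag($tag);
        return @{ $self->{tags}{$tag} };
    }
}

my $feat = My::ToyFeature->new(cleavageSite => [22]);

# Guarded access never throws.
my @site = tag_values_or_none($feat, 'cleavageSite');
my @len  = tag_values_or_none($feat, 'signalPeptideLength');
print "cleavageSite: @site\n";
print "signalPeptideLength: ", (@len ? "@len" : "(not set)"), "\n";

# Unguarded access to a missing tag throws, as in the stack trace above.
my $ok = eval { $feat->get_tag_values('signalPeptideLength'); 1 };
print "unguarded call threw: $@" unless $ok;
```

Wrapping the check in a small helper keeps calling code short while leaving the exception in place for callers that skip the check.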
-hilmar On Aug 24, 2007, at 6:36 PM, Chris Fields wrote: > One thing I am noticing with the rollback to tag as strings is that > tags with an undefined value are not set; I'm assuming when tags were > Bio::AnnotationI they were instantiated regardless with an undef > value. When attempting to call an undef tag with get_tag_values() I > get: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: asking for tag value that does not exist signalPeptideLength > STACK: Error::throw > STACK: Bio::Root::Root::throw /Users/cjfields/src/featann_rollback/ > bioperl-live/blib/lib/Bio/Root/Root.pm:357 > STACK: Bio::SeqFeature::Generic::get_tag_values /Users/cjfields/src/ > featann_rollback/bioperl-live/blib/lib/Bio/SeqFeature/Generic.pm:499 > STACK: t/targetp.t:189 > ----------------------------------------------------------- > > I personally think of this as a feature (why set a tag at all if it > is undef?). However, are there any circumstances where we might want > this behavior? Do we want to simply return w/o a value if a tag name > isn't found (i.e. remove the exception)? > > chris > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Sat Aug 25 00:05:58 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 24 Aug 2007 23:05:58 -0500 Subject: [Bioperl-l] undef SeqFeature tag values In-Reply-To: <7F5FDC98-24A6-4B74-A374-16780F9A5CC9@gmx.net> References: <88A352F1-EC1A-44FA-90DA-B869FF965F86@uiuc.edu> <7F5FDC98-24A6-4B74-A374-16780F9A5CC9@gmx.net> Message-ID: <6392DF1D-D91B-4B6E-812B-38FC0EA0D234@uiuc.edu> Makes sense. Okay, I'll leave the exception in. Thanks! 
chris On Aug 24, 2007, at 6:02 PM, Hilmar Lapp wrote: > You're supposed to call has_tag() first before you can assume that > you can call get_tag_values() w/o an exception. That was the original > API. > > -hilmar > > On Aug 24, 2007, at 6:36 PM, Chris Fields wrote: > >> One thing I am noticing with the rollback to tag as strings is that >> tags with an undefined value are not set; I'm assuming when tags were >> Bio::AnnotationI they were instantiated regardless with an undef >> value. When attempting to call an undef tag with get_tag_values() I >> get: >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: asking for tag value that does not exist signalPeptideLength >> STACK: Error::throw >> STACK: Bio::Root::Root::throw /Users/cjfields/src/featann_rollback/ >> bioperl-live/blib/lib/Bio/Root/Root.pm:357 >> STACK: Bio::SeqFeature::Generic::get_tag_values /Users/cjfields/src/ >> featann_rollback/bioperl-live/blib/lib/Bio/SeqFeature/Generic.pm:499 >> STACK: t/targetp.t:189 >> ----------------------------------------------------------- >> >> I personally think of this as a feature (why set a tag at all if it >> is undef?). However, are there any circumstances where we might want >> this behavior? Do we want to simply return w/o a value if a tag name >> isn't found (i.e. remove the exception)? >> >> chris >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From n.haigh at sheffield.ac.uk Sat Aug 25 03:50:29 2007 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Sat, 25 Aug 2007 08:50:29 +0100 Subject: [Bioperl-l] undef SeqFeature tag values In-Reply-To: <7F5FDC98-24A6-4B74-A374-16780F9A5CC9@gmx.net> References: <88A352F1-EC1A-44FA-90DA-B869FF965F86@uiuc.edu> <7F5FDC98-24A6-4B74-A374-16780F9A5CC9@gmx.net> Message-ID: <46CFDF45.8030200@sheffield.ac.uk> This sort of highlights a comment I made previously about how do you test for a stable API? It seems to me that unless you have intricate knowledge about the changes that took place, you will find it difficult to know when an API change has occurred. Is it possible to run the 1.4 test suite against existing code to ensure tests pass? What if the 1.4 tests contained bugs? This approach would need good code coverage by the tests to ensure things work the same i.e. test code in HEAD against the test suite from the previous stable release's branch - would/should this work conceptually?** Nath Hilmar Lapp wrote: > You're supposed to call has_tag() first before you can assume that > you can call get_tag_values() w/o an exception. That was the original > API. > > -hilmar > > On Aug 24, 2007, at 6:36 PM, Chris Fields wrote: > > >> One thing I am noticing with the rollback to tag as strings is that >> tags with an undefined value are not set; I'm assuming when tags were >> Bio::AnnotationI they were instantiated regardless with an undef >> value. 
When attempting to call an undef tag with get_tag_values() I >> get: >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: asking for tag value that does not exist signalPeptideLength >> STACK: Error::throw >> STACK: Bio::Root::Root::throw /Users/cjfields/src/featann_rollback/ >> bioperl-live/blib/lib/Bio/Root/Root.pm:357 >> STACK: Bio::SeqFeature::Generic::get_tag_values /Users/cjfields/src/ >> featann_rollback/bioperl-live/blib/lib/Bio/SeqFeature/Generic.pm:499 >> STACK: t/targetp.t:189 >> ----------------------------------------------------------- >> >> I personally think of this as a feature (why set a tag at all if it >> is undef?). However, are there any circumstances where we might want >> this behavior? Do we want to simply return w/o a value if a tag name >> isn't found (i.e. remove the exception)? >> >> chris >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > From cjfields at uiuc.edu Sat Aug 25 10:36:08 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 25 Aug 2007 09:36:08 -0500 Subject: [Bioperl-l] undef SeqFeature tag values In-Reply-To: <46CFDF45.8030200@sheffield.ac.uk> References: <88A352F1-EC1A-44FA-90DA-B869FF965F86@uiuc.edu> <7F5FDC98-24A6-4B74-A374-16780F9A5CC9@gmx.net> <46CFDF45.8030200@sheffield.ac.uk> Message-ID: <3F3C311E-3CD5-436B-987F-FD7695904647@uiuc.edu> The rollback branch is off of HEAD, not 1.4, so any bugs fixed since then and any modules/tests added will be present. So far everything has worked relatively well; you can check the history of this page to track what has happened so far: http://www.bioperl.org/wiki/Feature_Annotation_rollback The only problem code remaining for the first round of changes is a single method in Bio::SeqFeature::Annotated (if the tests are to be trusted) and one test in Bio::SeqFeature::AnnotationAdaptor using Hilmar's original test suite. 
Most of those were tests breaking Feature/Annotation API outlined in the HOWTO (calling get_Annotations directly from a Bio::SeqI or Bio::SeqFeatureI for instance), or examples where has_tag() was not used. I agree good test coverage would probably help catch some of those still silently lingering in code, but I don't think it can find everything; that's the reason I indicate there will need extensive testing. That applies within the suite but also by users in the wild. The SeqFeatureI and AnnotatableI API is defined very specifically in the Feature/Annotation HOWTO, so if anything the introduced changes violated much of that and started a domino effect of users unknowingly violating the API (me among them). Also, just b/c a test passes doesn't mean it is the ->correct<- result; it is very easy to just throw something from Data::Dumper into an is() test and have it pass. As an example, it appears there was a bit of cheating going on with AnnotationAdaptor.t in particular, where expected numbers were changed to conform to results w/o explanation. Which is the correct answer? I trust Hilmar's original test suite over the (rushed) changes. chris On Aug 25, 2007, at 2:50 AM, Nathan S. Haigh wrote: > This sort of highlights a comment I made previously about how do you > test for a stable API? > > It seems to me that unless you have intricate knowledge about the > changes that took place, you will find it difficult to know when an > API > change has occurred. Is it possible to run the 1.4 test suite against > existing code to ensure tests pass? What if the 1.4 tests contained > bugs? This approach would need good code coverage by the tests to > ensure > things work the same i.e. test code in HEAD against the test suite > from > the previous stable release's branch - would/should this work > conceptually?** > > Nath > > Hilmar Lapp wrote: >> You're supposed to call has_tag() first before you can assume that >> you can call get_tag_values() w/o an exception. 
That was the original >> API. >> >> -hilmar >> >> On Aug 24, 2007, at 6:36 PM, Chris Fields wrote: >> >> >>> One thing I am noticing with the rollback to tag as strings is that >>> tags with an undefined value are not set; I'm assuming when tags >>> were >>> Bio::AnnotationI they were instantiated regardless with an undef >>> value. When attempting to call an undef tag with get_tag_values() I >>> get: >>> >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: asking for tag value that does not exist signalPeptideLength >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw /Users/cjfields/src/featann_rollback/ >>> bioperl-live/blib/lib/Bio/Root/Root.pm:357 >>> STACK: Bio::SeqFeature::Generic::get_tag_values /Users/cjfields/src/ >>> featann_rollback/bioperl-live/blib/lib/Bio/SeqFeature/Generic.pm:499 >>> STACK: t/targetp.t:189 >>> ----------------------------------------------------------- >>> >>> I personally think of this as a feature (why set a tag at all if it >>> is undef?). However, are there any circumstances where we might >>> want >>> this behavior? Do we want to simply return w/o a value if a tag >>> name >>> isn't found (i.e. remove the exception)? >>> >>> chris >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> > Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Sat Aug 25 18:12:49 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 25 Aug 2007 17:12:49 -0500 Subject: [Bioperl-l] Feature/Annotation rollback(update) Message-ID: I have finished rolling back most of the specific changes made prior to the 1.5 release and have relevant tests passing: http://www.bioperl.org/wiki/Feature_Annotation_rollback#First_round Operator overloading of Bio::Annotation objects will be trickier to debug as tons of tests fail when the overloading is removed: http://www.bioperl.org/wiki/Feature_Annotation_rollback#Second_round I'll start looking into fixes. I don't like overloads from a personal standpoint (problems w/ long-term code maintenance), but was there a more specific reason for removing them? chris From hlapp at gmx.net Sun Aug 26 00:58:46 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 26 Aug 2007 00:58:46 -0400 Subject: [Bioperl-l] Feature/Annotation rollback(update) In-Reply-To: References: Message-ID: <3BC5C775-0062-4B02-A929-D2D3F8FDD768@gmx.net> The reason was to provide for backward compatibility with the original API in which tag values were scalars, not objects. The idea was that if someone relied on that and treats the object as a scalar (comparison, printing, etc), the operator overloading would take care of that. So by going back to the original API the overloading should become obsolete, at least theoretically. The overloading can cause some very subtle issues that I pointed out in an earlier email. It's one of those really "clever" tricks that just make it very hard for newcomers to understand what's going on in their code. 
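The kind of subtlety at stake can be shown with a few lines of plain Perl. The sketch below uses My::ToyValue, a hypothetical class that is not part of BioPerl, with only the stringification operator overloaded; it is meant to illustrate the general mechanism, not the actual Bio::Annotation implementation.

```perl
use strict;
use warnings;

# Hypothetical value object whose stringification is overloaded.
{
    package My::ToyValue;
    use overload '""' => sub { $_[0]{value} }, fallback => 1;

    sub new {
        my ($class, $value) = @_;
        return bless { value => $value }, $class;
    }
}

my $tag = My::ToyValue->new('TSC0000030');

# The object passes for a plain scalar when printed or compared...
print "value: $tag\n";
print "string comparison succeeds\n" if $tag eq 'TSC0000030';

# ...yet it is still a blessed reference underneath.
print "still an object of class ", ref($tag), "\n";

# Boolean context falls back to the string value, so an object
# wrapping "0" is false even though the object itself exists.
my $zero = My::ToyValue->new('0');
print "object wrapping '0' is false in boolean context\n" unless $zero;
```

Code written as `if ($value) { ... }` therefore behaves differently depending on what the object stringifies to, which is hard to spot when the caller believes it is holding a plain string.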
-hilmar On Aug 25, 2007, at 6:12 PM, Chris Fields wrote: > I have finished rolling back most of the specific changes made prior > to the 1.5 release and have relevant tests passing: > > http://www.bioperl.org/wiki/Feature_Annotation_rollback#First_round > > Operator overloading of Bio::Annotation objects will be trickier to > debug as tons of tests fail when the overloading is removed: > > http://www.bioperl.org/wiki/Feature_Annotation_rollback#Second_round > > I'll start looking into fixes. I don't like overloads from a > personal standpoint (problems w/ long-term code maintenance), but was > there a more specific reason for removing them? > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From n.haigh at sheffield.ac.uk Sun Aug 26 03:35:36 2007 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Sun, 26 Aug 2007 08:35:36 +0100 Subject: [Bioperl-l] please some help In-Reply-To: <20070823233044.BJQ45014@mailstore2.fiu.edu> References: <20070823233044.BJQ45014@mailstore2.fiu.edu> Message-ID: <46D12D48.8080301@sheffield.ac.uk> mcons004 at fiu.edu wrote: > Hello, > I am new to this software and I am having some trouble starting. The version of Bioperl I am working on is v5.8.6. My OS is Unix (Mac OS X). I am trying to use Bioperl with a file called blastParser to process a file which is the output of a "blastall" operation. 
> > The code that gives me error is: >> perl blastParser.pl junk.out 1 1 1.0 > and the error message says: > Can't locate Bio/SearchIO.pm in @INC (@INC contains: /System/Library/Perl/5.8.6/darwin-thread-multi-2level > > You online info says I probably means that the module Bio::SearchIO.pm is not instaled and I can either install Bundle::Bioperl or install that specific module by hand. Could you give me some tips in this? I am new working with Unix, and Bioperl so I am a little confused. Any information will be helpful for me. Thanks > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From what you have said, it appears you need some basic info to understand what you are trying to achieve. The Perl programming language requires the Perl interpreter in order to execute a Perl script. The Perl interpreter is usually installed as standard with Unix/Linux based Operating Systems. The version you mention (5.8.6) will not be the version of Bioperl but the version of the Perl interpreter you have installed - you can check this by typing "perl -v" at a command prompt. Given your apparent lack of understanding of the Unix OS, it is likely that you don't have Bioperl installed. You should have a look at: http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink Nath From cjfields at uiuc.edu Sun Aug 26 15:22:24 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 26 Aug 2007 14:22:24 -0500 Subject: [Bioperl-l] Feature/Annotation rollback(update) In-Reply-To: <3BC5C775-0062-4B02-A929-D2D3F8FDD768@gmx.net> References: <3BC5C775-0062-4B02-A929-D2D3F8FDD768@gmx.net> Message-ID: I managed to find your comments (as well as ones from Ewan, Jason, and a few others) on the mail list archives, so I'll link to them. The problem will be fixing the several places where overloading is assumed but no longer exists (i.e. 
in write_* methods), but we can probably pinpoint those by throwing or warning when overloading is assumed.

My thought is to either modify as_text() or add a new display_text() method to all AnnotationI that explicitly does what the overloading implied (print the annotation in a specified or assumed way). We could then delegate to that in the stringification overload (with appropriate deprecation warnings) until 1.6, where we remove it completely. Something like:

my $link1 = Bio::Annotation::DBLink->new(-database   => 'TSC',
                                         -primary_id => 'TSC0000030',
                                         -tagname    => "tag2");

# either
print $link1->display_text(),"\n";
# or ...
print $link1->as_text("display"),"\n";
# prints "TSC:TSC0000030"

# default human readable
print $link1->as_text(),"\n";
# prints "Direct database link to TSC0000030 in database TSC"

print "$link1\n";
# gets a deprecation warning for now, removed completely for 1.6

chris

On Aug 25, 2007, at 11:58 PM, Hilmar Lapp wrote:

> The reason was to provide for backward compatibility with the
> original API in which tag values were scalars, not objects. The
> idea was that if someone relied on that and treats the object as a
> scalar (comparison, printing, etc), the operator overloading would
> take care of that.
>
> So by going back to the original API the overloading should become
> obsolete, at least theoretically.
>
> The overloading can cause some very subtle issues that I pointed
> out in an earlier email. It's one of those really "clever" tricks
> that just make it very hard for newcomers to understand what's
> going on in their code.
> > -hilmar > > On Aug 25, 2007, at 6:12 PM, Chris Fields wrote: > >> I have finished rolling back most of the specific changes made prior >> to the 1.5 release and have relevant tests passing: >> >> http://www.bioperl.org/wiki/Feature_Annotation_rollback#First_round >> >> Operator overloading of Bio::Annotation objects will be trickier to >> debug as tons of tests fail when the overloading is removed: >> >> http://www.bioperl.org/wiki/Feature_Annotation_rollback#Second_round >> >> I'll start looking into fixes. I don't like overloads from a >> personal standpoint (problems w/ long-term code maintenance), but was >> there a more specific reason for removing them? >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From hlapp at gmx.net Sun Aug 26 16:57:37 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 26 Aug 2007 16:57:37 -0400 Subject: [Bioperl-l] Feature/Annotation rollback(update) In-Reply-To: References: <3BC5C775-0062-4B02-A929-D2D3F8FDD768@gmx.net> Message-ID: <503E47B9-EB4E-4442-8A56-D1513489EEA3@gmx.net> The thing that I actually never quite understood (and predates the API changes) is why $ann->as_text() needs to include explanatory text such as 'Direct database link to blah in database foo.' I would have said that "TSC:TSC0000030" is human readable enough, unless you present it without any context so that one would have no clue that it is a database cross-reference. The as_text() method shouldn't be meant for the sole purpose of debugging annotation collections. 
However, I'm not sure for what else you could use it for, given that there are no guidelines for what to expect. In fact, I do use as_text() a lot for a real purpose, namely as a surrogate unique key. For example, making a collection of dblinks unique is quite simple using the as_text() method: my %dbhash = map { ($_->as_text(), $_) } $anncoll->remove_Annotations ('dblink'); $anncoll->add_Annotation('dblink',$_) foreach (values %dbhash); This is a common task when harvesting annotation from various places and then integrating it. However, there is nothing in the API documentation that suggests that this might be a reliable or even expected property such that you could omit the 'dblink' tag above. I agree that having a conceptual equivalent to $feature->display_name and $seq->display_id would be good, but these methods have no claim to returning something that's unique in any way. I guess I've now raised more questions than I answered (in fact I didn't answer any). Sorry 'bout that. -hilmar On Aug 26, 2007, at 3:22 PM, Chris Fields wrote: > I managed to find your comments (as well as ones from Ewan, Jason, > and a few others) on the mail list archives, so I'll link to them. > The problem will be fixing the several places where overloading is > assumed but no longer exists (i.e. in write_* methods), but we can > probably pinpoint those by throwing or warning when overloading is > assumed. > > My thought is to either modify as_text() or add a new display_text > () method to all AnnotationI that explicitly does what the > overloading implied (print the annotation in a specified or assumed > way). We could then delegate to that in the stringification > overload (with appropriate deprecation warnings) until 1.6, where > we remove it completely. Something like: > > my $link1 = Bio::Annotation::DBLink->new(-database => 'TSC', > -primary_id => 'TSC0000030', > -tagname => "tag2); > > # either > print $link1->display_text(),"\n"; > # or ... 
> print $link1->as_text("display"),"\n"; > # prints "TSC:TSC0000030" > > # default human readable > print $link1->as_text(),"\n"; > # prints "Direct database link to TSC0000030 in database TSC" > > print "$link1\n"; > # gets a deprecation warning for now, removed completely for 1.6 > > chris > > On Aug 25, 2007, at 11:58 PM, Hilmar Lapp wrote: > >> The reason was to provide for backward compatibility with the >> original API in which tag values were scalars, not objects. The >> idea was that if someone relied on that and treats the object as a >> scalar (comparison, printing, etc), the operator overloading would >> take care of that. >> >> So by going back to the original API the overloading should become >> obsolete, at least theoretically. >> >> The overloading can cause some very subtle issues that I pointed >> out in an earlier email. It's one of those really "clever" tricks >> that just make it very hard for newcomers to understand what's >> going on in their code. >> >> -hilmar >> >> On Aug 25, 2007, at 6:12 PM, Chris Fields wrote: >> >>> I have finished rolling back most of the specific changes made prior >>> to the 1.5 release and have relevant tests passing: >>> >>> http://www.bioperl.org/wiki/Feature_Annotation_rollback#First_round >>> >>> Operator overloading of Bio::Annotation objects will be trickier to >>> debug as tons of tests fail when the overloading is removed: >>> >>> http://www.bioperl.org/wiki/Feature_Annotation_rollback#Second_round >>> >>> I'll start looking into fixes. I don't like overloads from a >>> personal standpoint (problems w/ long-term code maintenance), but >>> was >>> there a more specific reason for removing them? 
>>> >>> chris >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> >> > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Sun Aug 26 18:47:41 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 26 Aug 2007 17:47:41 -0500 Subject: [Bioperl-l] Feature/Annotation rollback(update) In-Reply-To: <503E47B9-EB4E-4442-8A56-D1513489EEA3@gmx.net> References: <3BC5C775-0062-4B02-A929-D2D3F8FDD768@gmx.net> <503E47B9-EB4E-4442-8A56-D1513489EEA3@gmx.net> Message-ID: Either way I implement, it would be used simply as a generic convenience method to replicate output via stringification overloading, using a common method name for all AnnotationI; there seem to be several instances where this is used for generating output (i.e. SeqIO::genbank). So, for instance, when formatting output you could just call as_text('display') or display_text() and you would get the most common formatting for that particular annotation type. chris On Aug 26, 2007, at 3:57 PM, Hilmar Lapp wrote: > The thing that I actually never quite understood (and predates the > API changes) is why $ann->as_text() needs to include explanatory > text such as 'Direct database link to blah in database foo.' I > would have said that "TSC:TSC0000030" is human readable enough, > unless you present it without any context so that one would have no > clue that it is a database cross-reference. 
> > The as_text() method shouldn't be meant for the sole purpose of > debugging annotation collections. However, I'm not sure for what > else you could use it for, given that there are no guidelines for > what to expect. > > In fact, I do use as_text() a lot for a real purpose, namely as a > surrogate unique key. For example, making a collection of dblinks > unique is quite simple using the as_text() method: > > my %dbhash = map { ($_->as_text(), $_) } $anncoll- > >remove_Annotations('dblink'); > $anncoll->add_Annotation('dblink',$_) foreach (values %dbhash); > > This is a common task when harvesting annotation from various > places and then integrating it. However, there is nothing in the > API documentation that suggests that this might be a reliable or > even expected property such that you could omit the 'dblink' tag > above. > > I agree that having a conceptual equivalent to $feature- > >display_name and $seq->display_id would be good, but these methods > have no claim to returning something that's unique in any way. > > I guess I've now raised more questions than I answered (in fact I > didn't answer any). Sorry 'bout that. > > -hilmar > > On Aug 26, 2007, at 3:22 PM, Chris Fields wrote: > >> I managed to find your comments (as well as ones from Ewan, Jason, >> and a few others) on the mail list archives, so I'll link to >> them. The problem will be fixing the several places where >> overloading is assumed but no longer exists (i.e. in write_* >> methods), but we can probably pinpoint those by throwing or >> warning when overloading is assumed. >> >> My thought is to either modify as_text() or add a new display_text >> () method to all AnnotationI that explicitly does what the >> overloading implied (print the annotation in a specified or >> assumed way). We could then delegate to that in the >> stringification overload (with appropriate deprecation warnings) >> until 1.6, where we remove it completely. 
Something like: >> >> my $link1 = Bio::Annotation::DBLink->new(-database => 'TSC', >> -primary_id => 'TSC0000030', >> -tagname => "tag2); >> >> # either >> print $link1->display_text(),"\n"; >> # or ... >> print $link1->as_text("display"),"\n"; >> # prints "TSC:TSC0000030" >> >> # default human readable >> print $link1->as_text(),"\n"; >> # prints "Direct database link to TSC0000030 in database TSC" >> >> print "$link1\n"; >> # gets a deprecation warning for now, removed completely for 1.6 >> >> chris >> >> On Aug 25, 2007, at 11:58 PM, Hilmar Lapp wrote: >> >>> The reason was to provide for backward compatibility with the >>> original API in which tag values were scalars, not objects. The >>> idea was that if someone relied on that and treats the object as >>> a scalar (comparison, printing, etc), the operator overloading >>> would take care of that. >>> >>> So by going back to the original API the overloading should >>> become obsolete, at least theoretically. >>> >>> The overloading can cause some very subtle issues that I pointed >>> out in an earlier email. It's one of those really "clever" tricks >>> that just make it very hard for newcomers to understand what's >>> going on in their code. >>> >>> -hilmar >>> >>> On Aug 25, 2007, at 6:12 PM, Chris Fields wrote: >>> >>>> I have finished rolling back most of the specific changes made >>>> prior >>>> to the 1.5 release and have relevant tests passing: >>>> >>>> http://www.bioperl.org/wiki/Feature_Annotation_rollback#First_round >>>> >>>> Operator overloading of Bio::Annotation objects will be trickier to >>>> debug as tons of tests fail when the overloading is removed: >>>> >>>> http://www.bioperl.org/wiki/ >>>> Feature_Annotation_rollback#Second_round >>>> >>>> I'll start looking into fixes. I don't like overloads from a >>>> personal standpoint (problems w/ long-term code maintenance), >>>> but was >>>> there a more specific reason for removing them? 
>>>> >>>> chris >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>> =========================================================== >>> >>> >>> >>> >>> >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From hlapp at gmx.net Sun Aug 26 19:01:03 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 26 Aug 2007 19:01:03 -0400 Subject: [Bioperl-l] Feature/Annotation rollback(update) In-Reply-To: References: <3BC5C775-0062-4B02-A929-D2D3F8FDD768@gmx.net> <503E47B9-EB4E-4442-8A56-D1513489EEA3@gmx.net> Message-ID: <35BBCF3B-BA1B-4C8D-8753-2A27AB3B423C@gmx.net> On Aug 26, 2007, at 6:47 PM, Chris Fields wrote: > just call as_text('display') or display_text() The latter is more obvious, and can be better tested for presence and implementation, though in the world of perl that's of course not strictly true. 
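
A rough sketch of what testing for presence could look like with a separate method name, in plain Perl. The Toy::* classes and the text_for() helper are hypothetical and only mimic the shape of the proposal; they are not the BioPerl implementations:

```perl
use strict;
use warnings;

# Hypothetical classes for illustration; not the BioPerl implementations.
package Toy::DBLink;
sub new { my ($class, %args) = @_; return bless {%args}, $class }
sub as_text {
    my $self = shift;
    return "Direct database link to $self->{primary_id} in database $self->{database}";
}
sub display_text {
    my $self = shift;
    return "$self->{database}:$self->{primary_id}";
}

package Toy::Legacy;
sub new { my ($class, %args) = @_; return bless {%args}, $class }
sub as_text { my $self = shift; return "Value: $self->{value}" }

package main;

# Prefer display_text() when the object provides it, else fall back to as_text().
sub text_for {
    my ($ann) = @_;
    return $ann->can('display_text') ? $ann->display_text : $ann->as_text;
}

my $link = Toy::DBLink->new(database => 'TSC', primary_id => 'TSC0000030');
my $old  = Toy::Legacy->new(value => 'foo');

print text_for($link), "\n";   # TSC:TSC0000030
print text_for($old),  "\n";   # Value: foo
```

An as_text('display') variant cannot be detected this way, since can() sees only method names and not the arguments a method understands. Even so, can() only reports that a method exists, not what it does.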
-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From zeroliu at 163.com Mon Aug 27 07:49:53 2007
From: zeroliu at 163.com (zeroliu)
Date: Mon, 27 Aug 2007 19:49:53 +0800 (CST)
Subject: [Bioperl-l] Problems of parse emboss water result by Bio::AlignIO
Message-ID: <534546299.525411188215393753.JavaMail.coremail@bj163app118.163.com>

Hello,

I'm trying to parse the output of water (EMBOSS 5.0.0) with Bio::AlignIO (Bioperl-1.4) and have run into some problems.

1. What does Bio::AlignIO->next_aln() return? Does it return a Bio::Align::AlignI or a Bio::SimpleAlign object, or does it depend on the alignment file format?

2. How can I get the "score" property of a water alignment result? There is a score method in Bio::SimpleAlign but not in Bio::AlignIO. In 2004, Jason mentioned:
Scores are set by the Alignment parser - we separate the 'running' from the 'parsing'.
Bio::AlignIO::emboss had to be updated.
(http://article.gmane.org/gmane.comp.lang.perl.bio.general/7156/match=alignio+water)
How can I find out how to use it?

Thank you very much!

From cjfields at uiuc.edu Mon Aug 27 13:13:13 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 27 Aug 2007 12:13:13 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated status
Message-ID: <6DC5ECA8-3DF1-4B84-914C-4F2B3B44E29A@uiuc.edu>

What is the current status on maintenance of Bio::SeqFeature::Annotated? From what I gather (based on the code and past mail list posts) the intent of the module seems to be to store any SeqFeature-specific data (tags, score, source, primary_tag, etc) in a Bio::AnnotationCollectionI as strongly typed data. However there are several inconsistencies, such as objects being returned when a string is expected (score(), source()).
Also, several methods appear half-implemented, aren't consistent with the SeqFeatureI API or similar methods in other SeqFeatureI's, and there are no docs explaining what is expected.

If no one speaks up on it, I'll do my best with maintaining it myself, but don't expect the API to stay as it is.

chris

From cjfields at uiuc.edu Mon Aug 27 18:31:01 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 27 Aug 2007 17:31:01 -0500
Subject: [Bioperl-l] Bio::Ontology::Term (rollback question)
Message-ID:

This is related to the ongoing Feature/Annotation rollback. I have found that a few Ontology-related modules are (either directly or indirectly) passing strings instead of Bio::Annotation::DBLinks to Bio::Ontology::Term::new(), add_dblink(), or add_dblink_context() (the last is where the error occurs).

If needed we could allow strings to be passed but this isn't consistent with the API. Any thoughts on what to do here?

chris

From hlapp at gmx.net Mon Aug 27 19:07:12 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 27 Aug 2007 19:07:12 -0400
Subject: [Bioperl-l] Bio::Ontology::Term (rollback question)
In-Reply-To:
References:
Message-ID: <01A56BFB-DE36-4C95-9BD3-DB35A706BD87@gmx.net>

The B::O::TermI interface actually says that get_dblinks() would return scalars. That's why the add_dblink methods accept strings. I also agree that this is inconsistent with the rest of BioPerl.

Oddly enough, Term::add_dblink_context() does ask for DBLink objects, though it doesn't seem to be enforced, even though Term::get_dblink_context() is advertised as returning scalars.

So it does seem this is messed up design-wise. It seems to me that to really fix this would inevitably break the API, and I don't see how you would make this backwards compatible w/o creating a lot of messy code, the sole purpose of which would be backwards compatibility.

One could only fix Term::add_dblink_context() as it's not in the interface but that wouldn't contribute anything to improving consistency.
So the alternative to breaking the API in a non-backwards compatible fashion would be to add to it, map the existing dblink methods onto the added ones, and start deprecating them. For example, you could add methods get_dbxrefs() (also on the interface), add_dbxref(), etc, and build in a context argument so we don't need another set of methods for that. They would accept and return DBLink objects, and the get_dblink() methods could be changed to map those to scalars while also getting slated for deprecation. Does this make sense? -hilmar On Aug 27, 2007, at 6:31 PM, Chris Fields wrote: > This is related to the ongoing Feature/Annotation rollback. I have > found that a few Ontology-related modules are (either directly or > indirectly) passing strings instead of Bio::Annotation::DBLinks to > Bio::Ontology::Term::new(), add_dblink(), or add_dblink_context() > (thelast is where the error occurs). > > If needed we could allow strings to be passed but this isn't > consistent with the API. Any thoughts on what to do here? > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Mon Aug 27 21:12:35 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 27 Aug 2007 20:12:35 -0500 Subject: [Bioperl-l] Bio::Ontology::Term (rollback question) In-Reply-To: <01A56BFB-DE36-4C95-9BD3-DB35A706BD87@gmx.net> References: <01A56BFB-DE36-4C95-9BD3-DB35A706BD87@gmx.net> Message-ID: On Aug 27, 2007, at 6:07 PM, Hilmar Lapp wrote: > The B::O::TermI interface actually says that get_dblinks() would > return scalars. That's why the add_dblink methods accept strings. I > also agree that this is inconsistent with with the rest of BioPerl. 
> > Oddly enough, Term::add_dblink_context() does ask for DBLink > objects, though it doesn't seem to be enforced, even though > Term::get_dblink_context() is advertised as returning scalars. This happened b/c of stringification and 'eq' overloading. Just removing the overloads didn't reveal this problem; I had to add exceptions to them to fish this out. > So it does seem this is messed up design-wise. It seems to me that > to really fix this would inevitably break the API, and I don't see > how you would make this backwards compatible w/o creating a lot of > messy code, the sole purpose of which would be backwards > compatibility. > > One could only fix Term::add_dblink_context() as it's not in the > interface but that wouldn't contribute anything to improving > consistency. Agreed; in fact it may make it more confusing. > So the alternative to breaking the API in a non-backwards > compatible fashion would be to add to it, map the existing dblink > methods onto the added ones, and start deprecating them. For > example, you could add methods get_dbxrefs() (also on the > interface), add_dbxref(), etc, and build in a context argument so > we don't need another set of methods for that. They would accept > and return DBLink objects, and the get_dblink() methods could be > changed to map those to scalars while also getting slated for > deprecation. > > Does this make sense? > > -hilmar I think so; I'll have to look over the code to see how we would implement this, though I'm guessing everything would be stored as DBLink objects by default. Any changes will probably need to wait until after I fish out any remaining spots in the code where overloading is being used, but at least we have a direction on where to go. 
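
One way the suggestion quoted above might look, as a minimal sketch in plain Perl. Toy::Term and Toy::DBLink are hypothetical stand-ins, and the '_default' context name and the choice to die on non-objects are assumptions made for illustration; the real Bio::Ontology::Term and Bio::Annotation::DBLink may end up quite different:

```perl
use strict;
use warnings;

# Hypothetical stand-ins; not the real BioPerl classes.
package Toy::DBLink;
sub new {
    my ($class, %args) = @_;
    return bless { database   => $args{database},
                   primary_id => $args{primary_id} }, $class;
}
sub display_text {
    my $self = shift;
    return "$self->{database}:$self->{primary_id}";
}

package Toy::Term;
use Scalar::Util qw(blessed);

sub new {
    my $class = shift;
    return bless { dbxrefs => {} }, $class;
}

# New-style method: stores objects, grouped by an optional context.
sub add_dbxref {
    my ($self, $link, $context) = @_;
    $context = '_default' unless defined $context;   # assumed default name
    die "add_dbxref() expects a Toy::DBLink object\n"
        unless blessed($link) && $link->isa('Toy::DBLink');
    push @{ $self->{dbxrefs}{$context} }, $link;
    return;
}

# New-style method: returns the objects stored for one context.
sub get_dbxrefs {
    my ($self, $context) = @_;
    $context = '_default' unless defined $context;
    return @{ $self->{dbxrefs}{$context} || [] };
}

# Old-style method kept for backward compatibility: maps objects to scalars.
sub get_dblinks {
    my $self = shift;
    warn "get_dblinks() is deprecated; use get_dbxrefs()\n";
    return map { $_->display_text } $self->get_dbxrefs(@_);
}

package main;

my $term = Toy::Term->new;
$term->add_dbxref(Toy::DBLink->new(database => 'TSC', primary_id => 'TSC0000030'));

my @objects = $term->get_dbxrefs;   # DBLink-like objects
my @strings = $term->get_dblinks;   # plain scalars, plus a deprecation warning

print ref($objects[0]), "\n";
print "$strings[0]\n";
```

Storing only objects keeps a single internal representation, while the deprecated accessor still hands back scalars to older callers without any operator overloading.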
chris

From cjfields at uiuc.edu Tue Aug 28 00:18:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 27 Aug 2007 23:18:19 -0500
Subject: [Bioperl-l] Feature/Annotation rollback (update #2)
Message-ID:

Okay, the planned rollback is pretty much complete with a few exceptions. I'll probably merge back to bioperl-live within the next few days once the following issues are addressed:

1) Bio::Ontology::Term - several classes are using Bio::Ontology::Term in ways inconsistent with one another; some are passing Bio::Annotation::DBLink instances and others are passing simple strings. This was somewhat transparent with various operator overloads but now the inconsistencies have really come to the surface. I'll probably work on Hilmar's suggestion on adding extra class methods to give it a more consistent interface and deprecate the older ones. As one might guess this affects much of Bio::Ontology but also Bio::SeqFeature::Annotated; strangely enough FeatureIO tests pass (which may simply mean there isn't enough test coverage for FeatureIO).

2) Bio::SeqFeature::Annotated - no word back on maintenance for this module. It needs to implement Bio::SeqFeature::TypedSeqFeatureI (pretty easy) and needs documentation (not so easy). It's apparently essential for FeatureIO. I'll basically get it up-and-running and clean up the API.

There are a few odds and ends that need to be addressed with roundtripping, but these are already problems on the MAIN trunk so they will be addressed once code is merged back in.

chris

From Frigerio at pierroton.inra.fr Tue Aug 28 03:12:22 2007
From: Frigerio at pierroton.inra.fr (Jean-Marc FRIGERIO)
Date: Tue, 28 Aug 2007 09:12:22 +0200
Subject: [Bioperl-l] Bio::SeqIO::phd_comment objet
Message-ID: <200708280912.22798.Frigerio@pierroton.inra.fr>

Hi,

The Bio::SeqIO::phd module says, speaking about the COMMENT section of a phd file:

# this should be an actual object to assist in serialization
# but I don't have time for this now."
The doc says ( http://www.bioperl.org/wiki/Core_1.5.1_1.5.2_delta) This really needs a "phred_comments" object of some sort so that it will be serializable. Then when java clients get this object they will be able to deserialize it. I volunteer to do this, but need your opinion. Do we really need an object (Bio::phd_comment ? Bio::SeqIO::phd_comment ? Bio::phd_header ? other ?). Or adding few Bio::Seq::SeqWithQuality subs in the Bio::SeqIO::phd module would suffice ? What are the constraints of serialization/deserialization of the java clients ? I was thinking of just adding get/setter for all the comments chromat_file(), abi_thumbprint(), etc. touch() for the timestamp attribute() for new unknown comments write_comment(). others ? -- jmf -- Jean-Marc Frigerio, UMR BIOGECO 69, route d'Arcachon, 33612 CESTAS France Tel : +33(0) 557 122 829 Fax : +33(0) 557 122 881 Frigerio at pierroton.inra.fr http://www.pierroton.inra.fr/biogeco/index.html From jay at jays.net Tue Aug 28 07:14:37 2007 From: jay at jays.net (Jay Hannah) Date: Tue, 28 Aug 2007 06:14:37 -0500 Subject: [Bioperl-l] Problems of parse emboss water result by Bio::AlignIO In-Reply-To: <534546299.525411188215393753.JavaMail.coremail@bj163app118.163.com> References: <534546299.525411188215393753.JavaMail.coremail@bj163app118.163.com> Message-ID: <4CD8B5C2-3C87-495C-894E-17C3C67091DA@jays.net> On Aug 27, 2007, at 6:49 AM, zeroliu wrote: > I'm trying to parse water (EMBOSS 5.0.0) result by Bio::AlignIO > (Bioperl-1.4) and encountered some problems. > 1. What does the Bio::AlignIO->next_aln() return? > Does it return a Bio::Align::AlignI or Bio::SimpleAlign object? > Or it depends on the alignment file format? http://doc.bioperl.org/bioperl-live/Bio/AlignIO.html Title : next_aln Usage : $aln = stream->next_aln Function: reads the next $aln object from the stream Returns : a Bio::Align::AlignI compliant object > 2. How can I get the "score" properity in a water alignment result? 
> There is a score method in Bio::SimpleAlign but not in Bio::AlignIO.
> In 2004, Jason mentioned:
> Scores are set by the Alignment parser - we separate the 'running'
> from
> the 'parsing'.
> Bio::AlignIO::emboss had to be updated.
> (http://article.gmane.org/gmane.comp.lang.perl.bio.general/7156/
> match=alignio+water)
> How could I know it?

Line 480 of t/AlignIO.t seems to walk you through? Here's the block, with the test overhead removed.

# EMBOSS water
use Bio::AlignIO;

my $str = Bio::AlignIO->new('-format' => 'emboss',
                            '-file'   => 'cysprot.water');
my $aln = $str->next_aln();   # $aln is now a Bio::Align::AlignI object
print $aln->score;            # '501.50'

HTH,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

From cjfields at uiuc.edu Tue Aug 28 17:05:10 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 28 Aug 2007 16:05:10 -0500
Subject: [Bioperl-l] Feature/Annotation rollback finished
Message-ID:

I'm now wrapping up the Feature/Annotation rollback. I will probably start merging back to the main branch in the next day or two, as soon as interested parties (*cough*devs*cough*) look over the last batch of changes.

http://www.bioperl.org/wiki/Feature_Annotation_rollback#Fourth_Round

I have also added a small benchmark test which indicates a decrease in parsing time in SeqIO::genbank with all tests passing. I expect this will translate over to any Bio::SeqFeature::Generic-using class (open mouth, prepare to insert foot....).

It is also possible there are still some instances where overloading is expected lurking about in the ~1000 or so modules, so I'll leave the exceptions I added to all Bio::AnnotationI; we can remove them down the line, maybe prior to rel1.6, after more tests are added or if they get particularly annoying. My guess is I caught 99.99% of them (prepare to insert other foot....).
The key change in this last round is the addition of several *dbxref* methods to Bio::Ontology::Term and Bio::Annotation::OntologyTerm, all of which are capable of working with either DBLink instances or simple scalars. This was primarily done in order to clear up inconsistencies in the older *dblink* methods, which were ambiguous (some indicated simple scalar arguments, others DBLink objects); operator overloading was used extensively in these cases, which led to several issues. I have added deprecation warnings to the older methods, which now map to the newer methods.

All tests pass with the exception of a few already failing on the MAIN branch; the single test which needs to be fixed is a round-tripping error in swiss.t (now a TODO), which can be fixed after merging back.

Please respond to this if there are any questions or if I need to clarify the changes I made a bit more.

chris

From hlapp at duke.edu Tue Aug 28 18:13:32 2007
From: hlapp at duke.edu (Hilmar Lapp)
Date: Tue, 28 Aug 2007 18:13:32 -0400
Subject: [Bioperl-l] Fwd: Announcing Ngila 1.2.1 Alignment Program
References: <20070828070219.DE03668527@evol.biology.mcmaster.ca>
Message-ID: <1F006707-291C-4895-A178-33FDFBDE6AE6@duke.edu>

Is anyone thinking about adding support for this as an aligner option? I'm not sure whether aside from a Bio::Tools::Run module we'd also need a format parser - it sounds like it's emitting clustalw format?

-hilmar

Begin forwarded message:

> From: evoldir at evol.biology.mcmaster.ca
> Date: August 28, 2007 3:02:19 AM EDT
> To: hlapp at duke.edu
> Subject: Other: Announcing Ngila 1.2.1 Alignment Program
> Reply-To: racartwr at ncsu.edu
>
>
> Ngila is a global, pairwise alignment program that uses logarithmic
> and
> affine gap costs, i.e. C(g) = a+b*g+c*ln(g). These gap costs are more
> biologically realistic than the more popular (and efficient) affine
> gap
> cost model.
>
> I have recently completed updating the program to version 1.2.1.
The > new version includes two new, evolutionary alignment models based > on my > current research. These models allow you to find the maximum > alignment > of two sequences based on biological, evolutionary parameters---no > more > guessing at biological costs. Additional changes are noted on the > website. > > Website & Manual: > > http://scit.us/projects/ngila/ > > Windows Binary: > > http://scit.us/projects/files/ngila/Releases/ngila-release-win32.zip > > Unix/Mac Source Code: > > http://scit.us/projects/files/ngila/Releases/ngila-release.tar.gz > > I'll be happy to answer any questions users have about the new > models or > the program. > > -- > ********************************************************* > Reed A. Cartwright, PhD http://scit.us/ > Postdoctoral Researcher http://www.dererumnatura.us/ > Department of Genetics http://www.pandasthumb.org/ > > Bioinformatics Research Center > North Carolina State University > Campus Box 7566 > Raleigh, NC 27695-7566 > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu : ===========================================================
From hlapp at gmx.net Tue Aug 28 19:09:46 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 28 Aug 2007 19:09:46 -0400 Subject: [Bioperl-l] Fwd: Announcing Ngila 1.2.1 Alignment Program In-Reply-To: References: Message-ID: Sorry for the double post, BTW. I had erroneously assumed that the first email would be held for post by non-member.
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Wed Aug 29 00:01:13 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 28 Aug 2007 23:01:13 -0500 Subject: [Bioperl-l] Fwd: Announcing Ngila 1.2.1 Alignment Program In-Reply-To: References: Message-ID: It probably wouldn't be hard to write one up, particularly if it's got already parsable format. We could probably base it off the current clustalw wrapper unless someone else thinks there is a better way. chris On Aug 28, 2007, at 5:13 PM, Hilmar Lapp wrote: > Is anyone thinking about adding support for this as an aligner > option? I'm not sure whether aside from a Bio::Tools::Run module we'd > also need a format parser - it sounds like it's emitting clustalw > format? > > -hilmar > > Begin forwarded message: > >> From: evoldir at evol.biology.mcmaster.ca >> Date: August 28, 2007 3:02:19 AM EDT >> Subject: Other: Announcing Ngila 1.2.1 Alignment Program >> Reply-To: racartwr at ncsu.edu >> >> >> Ngila is a global, pairwise alignment program that uses logarithmic >> and >> affine gap costs, i.e. C(g) = a+b*g+c*ln(g). These gap costs are >> more >> biologically realistic than the more popular (and efficient) affine >> gap >> cost model. >> >> I have recently completed updating the program to version 1.2.1. The >> new version includes two new, evolutionary alignment models based >> on my >> current research. These models allow you to find the maximum >> alignment >> of two sequences based on biological, evolutionary parameters---no >> more >> guessing at biological costs. Additional changes are noted on the >> website. 
>>
>> Website & Manual:
>>
>> http://scit.us/projects/ngila/
>>
>> Windows Binary:
>>
>> http://scit.us/projects/files/ngila/Releases/ngila-release-win32.zip
>>
>> Unix/Mac Source Code:
>>
>> http://scit.us/projects/files/ngila/Releases/ngila-release.tar.gz
>>
>> I'll be happy to answer any questions users have about the new models or the program.
>>
>> --
>> *********************************************************
>> Reed A. Cartwright, PhD http://scit.us/
>> Postdoctoral Researcher http://www.dererumnatura.us/
>> Department of Genetics http://www.pandasthumb.org/
>>
>> Bioinformatics Research Center
>> North Carolina State University
>> Campus Box 7566
>> Raleigh, NC 27695-7566
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Wed Aug 29 12:03:07 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 29 Aug 2007 11:03:07 -0500
Subject: [Bioperl-l] remote SwissProt server problems
Message-ID: <6805F552-9947-4C28-B846-47B5501B31DF@uiuc.edu>

Just as a notice, DBFetch is currently retrieving only single records for the UniProtKB database (where Bio::DB::SwissProt fetches sequences). If anyone runs remote server tests and DB.t in the test suite, you'll see a failure towards the end which indicates this. I've posted a notice to the server help desk and will respond when I hear more.
chris

From cain.cshl at gmail.com Wed Aug 29 15:45:48 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Wed, 29 Aug 2007 15:45:48 -0400
Subject: [Bioperl-l] Feature/Annotation rollback finished
In-Reply-To:
References:
Message-ID: <1188416748.2567.36.camel@localhost.localdomain>

Hi Chris,

I just wanted to let you know that I was out of town for a few days, but now I'm back and I'm doing testing of GMOD software based on the branch you are working on. I'll let you know how it goes, but don't let me stop you if you are confident of your changes. I'm sure whatever goes wrong, it will just point out holes in the FeatureIO tests (I'm sure there are plenty) and will require hopefully minimal changes on my end.

Thanks for your considerable efforts on this! (Regardless of how much work it makes for me :-)

Scott

On Tue, 2007-08-28 at 16:05 -0500, Chris Fields wrote:
> I'm now wrapping up the Feature/Annotation rollback. I will probably start merging back to the main branch in the next day or two, as soon as interested parties (*cough*devs*cough*) look over the last batch of changes.
>
> http://www.bioperl.org/wiki/Feature_Annotation_rollback#Fourth_Round
>
> I have also added a small benchmark test which indicates a decrease in parsing time in SeqIO::genbank with all tests passing. I expect this will translate over to any Bio::SeqFeature::Generic-using class (open mouth, prepare to insert foot....).
>
> It is also possible there are still some instances where overloading is expected lurking about in the ~1000 or so modules, so I'll leave the exceptions I added to all Bio::AnnotationI; we can remove them down the line, maybe prior to rel1.6, after more tests are added or if they get particularly annoying. My guess is I caught 99.99% of them (prepare to insert other foot....).
> > The key change in this last round is the addition of several class > *dbxref* methods to Bio::Ontology::Term and > Bio::Annotation::OntologyTerm, all of which are capable of working > with either DBLink instances or simple scalars. This was primarily > done in order to clear up inconsistencies in the older *dblink* > methods, which were ambiguous (some indicates simple scalar > arguments, others DBLink objects); operator overloading was used > extensively in these cases, which led to several issues. I have > added deprecation warnings to the older methods which now map to > using the newer methods. All tests pass with the exception of a few > already failing on the MAIN branch; the single test which needs to be > fixed is a round-tripping error in swiss.t (now a TODO), which can be > fixed after merging back. > > Please respond to this if there are any questions or if I need to > clarify the changes I made a bit more. > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070829/f8433568/attachment.bin

From cjfields at uiuc.edu Wed Aug 29 16:13:17 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 29 Aug 2007 15:13:17 -0500
Subject: [Bioperl-l] Feature/Annotation rollback finished
In-Reply-To: <1188416748.2567.36.camel@localhost.localdomain>
References: <1188416748.2567.36.camel@localhost.localdomain>
Message-ID: <8BA0C799-9899-4926-B7F5-24180F764146@uiuc.edu>

I'll probably go ahead and start merging this stuff over to CVS HEAD then. There haven't been any objections so far.

The page I posted outlines the more critical fixes, primarily the changes to Bio::Ontology::Term methods (along with relevant code) due to inconsistencies in the interface. The Bio::Annotation classes also now throw if you attempt to use them in an overloaded context. I also split off SeqFeature::Annotated tests into its own test suite (SeqFeatAnnotated.t).

Let me know if there are any problems along the way!

chris

On Aug 29, 2007, at 2:45 PM, Scott Cain wrote:

> Hi Chris,
>
> I just wanted to let you know that I was out of town for a few days, but now I'm back and I'm doing testing of GMOD software based on the branch you are working on. I'll let you know how it goes, but don't let me stop you if you are confident of your changes. I'm sure whatever goes wrong, it will just point out holes in the FeatureIO tests (I'm sure there are plenty) and will require hopefully minimal changes on my end.
>
> Thanks for your considerable efforts on this! (Regardless of how much work it makes for me :-)
> Scott
>
> On Tue, 2007-08-28 at 16:05 -0500, Chris Fields wrote:
>> I'm now wrapping up the Feature/Annotation rollback.
I will probably >> start merging back to the main branch in the next day or two., as >> soon as interested parties (*cough*devs*cough*) look over the last >> batch of changes. >> >> http://www.bioperl.org/wiki/Feature_Annotation_rollback#Fourth_Round >> >> I have also added a small benchmark test which indicates a decrease >> in parsing time in SeqIO::genbank with all tests passing. I expect >> this will translate over to any Bio::SeqFeature::Generic-using class >> (open mouth, prepare to insert foot....). >> >> It is also possible there are still some instances where overloading >> is expected lurking about in the ~1000 or so modules, so I'll leave >> the exceptions I added to all Bio::AnnotationI; we can remove them >> down the line, maybe prior to rel1.6, after more tests are added or >> if they get particularly annoying. My guess is I caught 99.99% of >> them (prepare to insert other foot....). >> >> The key change in this last round is the addition of several class >> *dbxref* methods to Bio::Ontology::Term and >> Bio::Annotation::OntologyTerm, all of which are capable of working >> with either DBLink instances or simple scalars. This was primarily >> done in order to clear up inconsistencies in the older *dblink* >> methods, which were ambiguous (some indicates simple scalar >> arguments, others DBLink objects); operator overloading was used >> extensively in these cases, which led to several issues. I have >> added deprecation warnings to the older methods which now map to >> using the newer methods. All tests pass with the exception of a few >> already failing on the MAIN branch; the single test which needs to be >> fixed is a round-tripping error in swiss.t (now a TODO), which can be >> fixed after merging back. >> >> Please respond to this if there are any questions or if I need to >> clarify the changes I made a bit more. 
>> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- > ---------------------------------------------------------------------- > -- > Scott Cain, Ph. D. > cain at cshl.edu > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From jay at jays.net Wed Aug 29 18:11:55 2007 From: jay at jays.net (Jay Hannah) Date: Wed, 29 Aug 2007 17:11:55 -0500 Subject: [Bioperl-l] Bio::Seq -> Solr (Lucene) ? Message-ID: <46D5EF2B.5000101@jays.net> Please slap me if I'm hysterical. I'm seeking a broad bioinformatics search engine platform. I want to take gobs of data in gobs of formats and allow people to search it on the web. - Entrez is awesome. Unfortunately I don't see anything in the NCBI toolkit that helps me run my own version of it. Even a tiny one. After an initial "check out our toolkit" response from NCBI I don't seem to be getting anywhere. Maybe I'm not communicating enough or well enough. - EB-eye Search is slick. I don't see any developer kit or source code of any kind and I've gotten no response to my emails to them. - LuceGene is very cool. But it looks like no one has touched it in 2.5 years and I've gotten no response from their contact email address. I'm especially intrigued by their src/LuceGene/src/org/eugenes/index/LuceneReadseqIndexer.java which seems to use the rather popular(?) Java Readseq to populate Lucene with source data in all sorts of different formats. I don't know Java. - Solr is really neat. It's easy to install and gives a simple/powerful XML API to populate a Lucene index. ... so ... 
I'm thinking BioPerl knows how to parse lots of formats into a Bio::Seq. I'm thinking I could write Perl which would take a Bio::Seq object and convert it to an XML file which Solr would happily inject into Lucene for me. If I could do that I'm thinking that any of the many formats that Bio::SeqIO can slurp could magically be sent into a Lucene index for searching. I'm thinking that would be really cool and I'm going to write it. Now's your chance to slap me. Since I haven't started yet, what would I call this thing? Bio::SeqIO::Solr? (and I wouldn't implement the I part?) Thanks, Jay Hannah http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah More notes: http://clab.ist.unomaha.edu/CLAB/index.php/RT11 From hlapp at gmx.net Wed Aug 29 21:37:59 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 29 Aug 2007 21:37:59 -0400 Subject: [Bioperl-l] Bio::Seq -> Solr (Lucene) ? In-Reply-To: <46D5EF2B.5000101@jays.net> References: <46D5EF2B.5000101@jays.net> Message-ID: On Aug 29, 2007, at 6:11 PM, Jay Hannah wrote: > [...] > > I'm thinking I could write Perl which would take a Bio::Seq object and > convert it to an XML file which Solr would happily inject into Lucene > for me. > > If I could do that I'm thinking that any of the many formats that > Bio::SeqIO can slurp could magically be sent into a Lucene index for > searching. > > [...] > Since I haven't started yet, what would I call this thing? > Bio::SeqIO::Solr? (and I wouldn't implement the I part?) Would this be a Solr-specific XML writer? Or could you use an existing XML format for sequences? 
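For concreteness, a Solr-specific writer could be quite small, since Solr's XML update message is an <add> element wrapping <doc> elements built from named <field> entries. What follows is only a sketch in plain Perl under stated assumptions: the field names (id, description, sequence) are hypothetical and would have to match the Solr schema in use, and a plain hash stands in for the values one would read from a Bio::Seq object (for example via display_id, desc and seq).

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text and attributes.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build a Solr <add> update message from a list of records.
# Each record is a hash ref of field name => value. The field
# names are hypothetical and must exist in the Solr schema.
sub solr_add_xml {
    my (@records) = @_;
    my $xml = "<add>\n";
    for my $rec (@records) {
        $xml .= "  <doc>\n";
        for my $name (sort keys %$rec) {
            next unless defined $rec->{$name};   # skip missing values
            $xml .= sprintf qq{    <field name="%s">%s</field>\n},
                xml_escape($name), xml_escape($rec->{$name});
        }
        $xml .= "  </doc>\n";
    }
    $xml .= "</add>\n";
    return $xml;
}

# Stand-in for values pulled from a Bio::Seq object.
my $record = {
    id          => 'seq1',
    description => 'example <test> sequence',
    sequence    => 'ATGCATGC',
};

print solr_add_xml($record);
```

Reusing an established sequence XML format instead would trade this simplicity for a transformation step on the Solr side, so which route is less work likely depends on how the index schema is defined.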
(as an aside, if you do need a Solr-specific format writer, my suggestion would be to name it solrxml [lowercase]) -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Wed Aug 29 22:01:45 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 29 Aug 2007 21:01:45 -0500 Subject: [Bioperl-l] Bio::Seq -> Solr (Lucene) ? In-Reply-To: <46D5EF2B.5000101@jays.net> References: <46D5EF2B.5000101@jays.net> Message-ID: <0FF63232-25DE-4676-8C06-B9B00BE28349@uiuc.edu> On Aug 29, 2007, at 5:11 PM, Jay Hannah wrote: > Please slap me if I'm hysterical. > > I'm seeking a broad bioinformatics search engine platform. I want to > take gobs of data in gobs of formats and allow people to search it on > the web. > > - Entrez is awesome. Unfortunately I don't see anything in the NCBI > toolkit that helps me run my own version of it. Even a tiny one. After > an initial "check out our toolkit" response from NCBI I don't seem > to be > getting anywhere. Maybe I'm not communicating enough or well enough. No. I have had non-responses before from NCBI; they may just be too busy. Warnock probably applies. > - EB-eye Search is slick. I don't see any developer kit or source code > of any kind and I've gotten no response to my emails to them. Not sure of this one personally. > - LuceGene is very cool. > ... > I don't know Java. ...but you could write a (perl) wrapper around it. You can try contacting Don Gilbert about it, though I think he's been trying out Chado. > - Solr is really neat. It's easy to install and gives a simple/ > powerful > XML API to populate a Lucene index. > ... so ... > > I'm thinking BioPerl knows how to parse lots of formats into a > Bio::Seq. > > ... > > I'm thinking that would be really cool and I'm going to write it. > > Now's your chance to slap me. No need. 
> Since I haven't started yet, what would I call this thing? > Bio::SeqIO::Solr? (and I wouldn't implement the I part?) > > Thanks, > > Jay Hannah > http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah > > More notes: > http://clab.ist.unomaha.edu/CLAB/index.php/RT11 The way I would go about it is use an established XML schema as a starting point and implement a writer (if bioperl doesn't already support it). It's better than reinventing (a constantly reinvented) wheel and starting up a brand-new schema of your own. INSDSeq (http://www.insdc.org/page.php?page=xmlstatus) is one I've been wanting to add for a while but haven't had time to work on; there are several other examples. Note that a few of the currently supported ones in bioperl, such as bsml and game, have had very little to no development over the years in favor of newer (better?) XML flavors, so it likely isn't worth working with those. chris From hlapp at gmx.net Wed Aug 29 22:02:45 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 29 Aug 2007 22:02:45 -0400 Subject: [Bioperl-l] Feature/Annotation rollback finished In-Reply-To: References: Message-ID: On Aug 28, 2007, at 5:05 PM, Chris Fields wrote: > I'm now wrapping up the Feature/Annotation rollback. I will probably > start merging back to the main branch in the next day or two., as > soon as interested parties (*cough*devs*cough*) look over the last > batch of changes. > > http://www.bioperl.org/wiki/Feature_Annotation_rollback#Fourth_Round > > [...] > It is also possible there are still some instances where overloading > is expected lurking about in the ~1000 or so modules, so I'll leave > the exceptions I added to all Bio::AnnotationI Keep in mind that code such as if ($ann) { ... } is mostly not b/c someone wanted to use overloading, but rather someone was lazy and really meant to say if (defined($ann)) { ... } In the absence of eq overloading, these will behave identically. 
So if you leave the exceptions in it is sort-of policing lazy programmers, which I guess is fine in principle, but is guaranteed to trip up a lot of script code. I'd take it out if you're reasonably sure that at least within BioPerl itself those lazy programming incidents are removed. > [...] > The key change in this last round is the addition of several class > *dbxref* methods to Bio::Ontology::Term and > Bio::Annotation::OntologyTerm, all of which are capable of working > with either DBLink instances or simple scalars. I don't think you need the code here to deal with both scalars and objects. It is fine I think to define the new methods from the outset to consistently accept and return DBLink objects, and period. The backwards compatibility logic should rather be in the *_dblink*() methods; i.e., instead of simple aliases they should have the code to map to and from the new API. That way, once the deprecation cycle ends, they can be removed, and with them all the legacy code that now is no longer needed, whereas if you have that in the new methods, it keeps bothering the maintainers. You also mention a add_dbxref_context() on the wiki page - I'm not sure why that would be needed given that you build in the -context option to add_dbxref() from the outset. But maybe I've glossed over some detail. Once this is merged back to the main trunk, I guess we need to give Bio::SeqFeature::TypedSeqFeatureI a thorough look and make sure it makes real sense. Thanks Chris for this effort, this clears a monumental roadblock. 
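The difference between the two tests while the overload exceptions are in place can be sketched in plain Perl. My::Ann below is a hypothetical stand-in, not the actual Bio::AnnotationI code; it assumes only that boolean overloading throws, as described in this thread.

```perl
use strict;
use warnings;

package My::Ann;   # hypothetical class, for illustration only

# Throw whenever an object is used in boolean context.
use overload 'bool' => sub { die "boolean overloading is not supported\n" };

sub new {
    my ($class, %args) = @_;
    return bless { value => $args{value} }, $class;
}

sub value { return $_[0]->{value} }

package main;

my $ann = My::Ann->new(value => 'kinase');

# defined() never invokes overloading, so this test is always safe.
my $is_defined = defined($ann) ? 1 : 0;
print "defined test: $is_defined\n";

# A bare truth test calls the 'bool' overload, which throws here.
my $truth = eval { $ann ? 1 : 0 };
my $error = $@;
print "truth test died: $error" unless defined $truth;

# For plain scalars the two tests differ on defined-but-false values.
for my $v ('', 0) {
    printf "value '%s': true=%d defined=%d\n",
        $v, ($v ? 1 : 0), (defined $v ? 1 : 0);
}
```

Without such an overload, both tests give the same answer when the variable holds either an object reference or undef, which is the case being discussed for annotation objects.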
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Wed Aug 29 23:23:14 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 29 Aug 2007 22:23:14 -0500 Subject: [Bioperl-l] Feature/Annotation rollback finished In-Reply-To: References: Message-ID: On Aug 29, 2007, at 9:02 PM, Hilmar Lapp wrote: > > On Aug 28, 2007, at 5:05 PM, Chris Fields wrote: > >> I'm now wrapping up the Feature/Annotation rollback. I will probably >> start merging back to the main branch in the next day or two., as >> soon as interested parties (*cough*devs*cough*) look over the last >> batch of changes. >> >> http://www.bioperl.org/wiki/Feature_Annotation_rollback#Fourth_Round >> >> [...] >> It is also possible there are still some instances where overloading >> is expected lurking about in the ~1000 or so modules, so I'll leave >> the exceptions I added to all Bio::AnnotationI > > Keep in mind that code such as > > if ($ann) { ... } > > is mostly not b/c someone wanted to use overloading, but rather > someone was lazy and really meant to say > > if (defined($ann)) { ... } Agreed. > In the absence of eq overloading, these will behave identically. So > if you leave the exceptions in it is sort-of policing lazy > programmers, which I guess is fine in principle, but is guaranteed to > trip up a lot of script code. I'd take it out if you're reasonably > sure that at least within BioPerl itself those lazy programming > incidents are removed. I agree the overload exceptions shouldn't be left in. The problem is I'm not certain we have caught most implicit overload calls (just the ones tested for). Scott's checking everything against GMOD, though, so we can remove them after that. >> [...] 
>> The key change in this last round is the addition of several class >> *dbxref* methods to Bio::Ontology::Term and >> Bio::Annotation::OntologyTerm, all of which are capable of working >> with either DBLink instances or simple scalars. > > I don't think you need the code here to deal with both scalars and > objects. It is fine I think to define the new methods from the outset > to consistently accept and return DBLink objects, and period. > > The backwards compatibility logic should rather be in the *_dblink*() > methods; i.e., instead of simple aliases they should have the code to > map to and from the new API. That way, once the deprecation cycle > ends, they can be removed, and with them all the legacy code that now > is no longer needed, whereas if you have that in the new methods, it > keeps bothering the maintainers. That should be easy enough to fix and would be more consistent. I can look over the various calls to dbxref methods and see what needs to be done, then fix that in cvs. > You also mention a add_dbxref_context() on the wiki page - I'm not > sure why that would be needed given that you build in the -context > option to add_dbxref() from the outset. But maybe I've glossed over > some detail. The -context parameter was in get_dbxref(), to grab those DBLinks in a particular context. We could do the same with add_dbxref() (pass DBLinks in first arg as array ref, context as second arg). That would then obviate the need for add_dbxref_context(). I'll also change the parameter passing in get_dbxref() to just accept context as an single optional argument since we're dealing with only DBLink instances now. > Once this is merged back to the main trunk, I guess we need to give > Bio::SeqFeature::TypedSeqFeatureI a thorough look and make sure it > makes real sense. It describes one method, ontology_term(), which returns a Bio::Ontology::TermI. 
This is similar to SeqFeature::Annotated::type (), which returns a Bio::Annotation::OntologyTerm (a Bio::Ontology::TermI). My thought is to simply deprecate type() in favor of TypedSeqFeatureI::ontology_term(). > Thanks Chris for this effort, this clears a monumental roadblock. > > -hilmar No problem. It just needed to be done. chris From florent.angly at gmail.com Wed Aug 29 23:44:58 2007 From: florent.angly at gmail.com (Florent Angly) Date: Wed, 29 Aug 2007 20:44:58 -0700 Subject: [Bioperl-l] Feature/Annotation rollback finished In-Reply-To: References: Message-ID: <46D63D3A.6050308@gmail.com> Hilmar Lapp wrote: > Keep in mind that code such as > > if ($ann) { ... } > > is mostly not b/c someone wanted to use overloading, but rather > someone was lazy and really meant to say > > if (defined($ann)) { ... } > > In the absence of eq overloading, these will behave identically. So > if you leave the exceptions in it is sort-of policing lazy > programmers, which I guess is fine in principle, but is guaranteed to > trip up a lot of script code. I'd take it out if you're reasonably > sure that at least within BioPerl itself those lazy programming > incidents are removed. if ($ann) { ... } and if (defined($ann)) { ... } are not the same. if ($ann) is evaluated false for an empty string like $ann = ''; and for a value of zero, i.e. $ann = 0; while defined($ann) returns true in these 2 cases. Florent From cjfields at uiuc.edu Wed Aug 29 23:54:05 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 29 Aug 2007 22:54:05 -0500 Subject: [Bioperl-l] Feature/Annotation rollback finished In-Reply-To: <46D63D3A.6050308@gmail.com> References: <46D63D3A.6050308@gmail.com> Message-ID: <90C3DE31-12FD-4BF3-B9F7-0FB5E1DE2A28@uiuc.edu> On Aug 29, 2007, at 10:44 PM, Florent Angly wrote: > Hilmar Lapp wrote: >> Keep in mind that code such as >> >> if ($ann) { ... 
} >> >> is mostly not b/c someone wanted to use overloading, but rather >> someone was lazy and really meant to say >> >> if (defined($ann)) { ... } >> >> In the absence of eq overloading, these will behave identically. >> So if you leave the exceptions in it is sort-of policing lazy >> programmers, which I guess is fine in principle, but is guaranteed >> to trip up a lot of script code. I'd take it out if you're >> reasonably sure that at least within BioPerl itself those lazy >> programming incidents are removed. > if ($ann) { ... } > > and > if (defined($ann)) { ... } > > are not the same. > > if ($ann) > > is evaluated false for an empty string like > > $ann = ''; > > and for a value of zero, i.e. > > $ann = 0; > > while > > defined($ann) > > returns true in these 2 cases. > > Florent I agree, but we're talking about the context in which this test is performed, where $ann is either an instance of a Bio::AnnotationI or undef (not a scalar value or ''). In this case it works both as 'if ($ann)' or 'if (defined($ann))', though the latter is preferred. Never underestimate laziness! chris From cain.cshl at gmail.com Wed Aug 29 23:59:11 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Wed, 29 Aug 2007 23:59:11 -0400 Subject: [Bioperl-l] Feature/Annotation rollback finished In-Reply-To: <46D63D3A.6050308@gmail.com> References: <46D63D3A.6050308@gmail.com> Message-ID: <1188446351.2567.55.camel@localhost.localdomain> Hi Florent, Of course what you wrote below is true, but what Hilmar was writing about was lazy programmers (like me) who assume that the empty string and 0 value cases aren't going to happen (because we happen to know they never should in certain contexts), and so use 'if ($ann)'. Of course, at the moment, I am in the process of de-lazifying my code (though I tended to think of it as being efficent :-) Scott On Wed, 2007-08-29 at 20:44 -0700, Florent Angly wrote: > Hilmar Lapp wrote: > > Keep in mind that code such as > > > > if ($ann) { ... 
} > > > > is mostly not b/c someone wanted to use overloading, but rather > > someone was lazy and really meant to say > > > > if (defined($ann)) { ... } > > > > In the absence of eq overloading, these will behave identically. So > > if you leave the exceptions in it is sort-of policing lazy > > programmers, which I guess is fine in principle, but is guaranteed to > > trip up a lot of script code. I'd take it out if you're reasonably > > sure that at least within BioPerl itself those lazy programming > > incidents are removed. > if ($ann) { ... } > > and > > if (defined($ann)) { ... } > > are not the same. > > if ($ann) > > is evaluated false for an empty string like > > $ann = ''; > > and for a value of zero, i.e. > > $ann = 0; > > while > > defined($ann) > > returns true in these 2 cases. > > Florent > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070829/27872681/attachment.bin From cain.cshl at gmail.com Thu Aug 30 00:05:06 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Thu, 30 Aug 2007 00:05:06 -0400 Subject: [Bioperl-l] Feature/Annotation rollback finished In-Reply-To: <8BA0C799-9899-4926-B7F5-24180F764146@uiuc.edu> References: <1188416748.2567.36.camel@localhost.localdomain> <8BA0C799-9899-4926-B7F5-24180F764146@uiuc.edu> Message-ID: <1188446706.2567.59.camel@localhost.localdomain> Hi Chris, Is there a reason that the value method of the Bio::Annotation::SimpleValue (and possibly some of its siblings) returning "Value: $value"? It didn't used to have the "Value: " before, did it? Thanks, Scott On Wed, 2007-08-29 at 15:13 -0500, Chris Fields wrote: > I'll probably go ahead and start merging this stuff over to CVS HEAD > then. There haven't been any objections so far. > > The page I posted outlines the more critical fixes, primarily the > changes to Bio::Ontology::Term methods (along with relevant code) due > to inconsistencies in the interface. The Bio::Annotation classes > also now throw if you attempt to use them in an overloaded context. > I also split off SeqFeature::Annotated tests into it's own test suite > (SeqFeatAnnotated.t). > > Let me know if there are any problems along the way! > > chris > > On Aug 29, 2007, at 2:45 PM, Scott Cain wrote: > > > Hi Chris, > > > > I just wanted to let you know that I was out of town for a few > > days, but > > now I'm back and I'm doing testing of GMOD software based on the > > branch > > you are working on. I'll let you know how it goes, but don't let me > > stop you if you confident of your changes. I'm sure whatever goes > > wrong, it will just point out holes in the FeatureIO tests (I'm sure > > there are plenty) and will require hopefully minimal changes on my > > end. 
> > > > Thanks for your considerable efforts on this! (Regardless of how much > > work it makes for me :-) > > Scott > > > > > > On Tue, 2007-08-28 at 16:05 -0500, Chris Fields wrote: > >> I'm now wrapping up the Feature/Annotation rollback. I will probably > >> start merging back to the main branch in the next day or two., as > >> soon as interested parties (*cough*devs*cough*) look over the last > >> batch of changes. > >> > >> http://www.bioperl.org/wiki/Feature_Annotation_rollback#Fourth_Round > >> > >> I have also added a small benchmark test which indicates a decrease > >> in parsing time in SeqIO::genbank with all tests passing. I expect > >> this will translate over to any Bio::SeqFeature::Generic-using class > >> (open mouth, prepare to insert foot....). > >> > >> It is also possible there are still some instances where overloading > >> is expected lurking about in the ~1000 or so modules, so I'll leave > >> the exceptions I added to all Bio::AnnotationI; we can remove them > >> down the line, maybe prior to rel1.6, after more tests are added or > >> if they get particularly annoying. My guess is I caught 99.99% of > >> them (prepare to insert other foot....). > >> > >> The key change in this last round is the addition of several class > >> *dbxref* methods to Bio::Ontology::Term and > >> Bio::Annotation::OntologyTerm, all of which are capable of working > >> with either DBLink instances or simple scalars. This was primarily > >> done in order to clear up inconsistencies in the older *dblink* > >> methods, which were ambiguous (some indicates simple scalar > >> arguments, others DBLink objects); operator overloading was used > >> extensively in these cases, which led to several issues. I have > >> added deprecation warnings to the older methods which now map to > >> using the newer methods. 
All tests pass with the exception of a few > >> already failing on the MAIN branch; the single test which needs to be > >> fixed is a round-tripping error in swiss.t (now a TODO), which can be > >> fixed after merging back. > >> > >> Please respond to this if there are any questions or if I need to > >> clarify the changes I made a bit more. > >> > >> chris > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > > ---------------------------------------------------------------------- > > -- > > Scott Cain, Ph. D. > > cain at cshl.edu > > GMOD Coordinator (http://www.gmod.org/) > > 216-392-3087 > > Cold Spring Harbor Laboratory > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070830/b03eef7e/attachment.bin

From cjfields at uiuc.edu Thu Aug 30 00:17:18 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 29 Aug 2007 23:17:18 -0500
Subject: [Bioperl-l] Feature/Annotation rollback finished
In-Reply-To: <1188446706.2567.59.camel@localhost.localdomain>
References: <1188416748.2567.36.camel@localhost.localdomain>
	<8BA0C799-9899-4926-B7F5-24180F764146@uiuc.edu>
	<1188446706.2567.59.camel@localhost.localdomain>
Message-ID:

It shouldn't; that sounds like the output for as_text(). value()
should just return the scalar value.

As a note, I added a new method, display_text(), for all
Bio::AnnotationI classes which by default replicates the same output
that stringification overloads produced. So you should be able to
explicitly call $ann->display_text for any Bio::AnnotationI where you
once used an implicit call:

# old
print "$ann\n";

# new
print $ann->display_text,"\n";

chris

On Aug 29, 2007, at 11:05 PM, Scott Cain wrote:

> Hi Chris,
>
> Is there a reason that the value method of the
> Bio::Annotation::SimpleValue (and possibly some of its siblings)
> is returning "Value: $value"? It didn't used to have the "Value: "
> before,
> did it?
>
> Thanks,
> Scott
>
>
> On Wed, 2007-08-29 at 15:13 -0500, Chris Fields wrote:
>> I'll probably go ahead and start merging this stuff over to CVS HEAD
>> then. There haven't been any objections so far.
>>
>> The page I posted outlines the more critical fixes, primarily the
>> changes to Bio::Ontology::Term methods (along with relevant code) due
>> to inconsistencies in the interface. The Bio::Annotation classes
>> also now throw if you attempt to use them in an overloaded context.
>> I also split off SeqFeature::Annotated tests into its own test suite
>> (SeqFeatAnnotated.t).
>>
>> Let me know if there are any problems along the way!
>>
>> chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From neetisomaiya at gmail.com Thu Aug 30 00:47:53 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Thu, 30 Aug 2007 10:17:53 +0530
Subject: [Bioperl-l] kegg xml parsing
Message-ID: <764978cf0708292147q4ead37b0i782b83ecda8ce3da@mail.gmail.com>

Hi,

Has anyone used XML::Twig for parsing of kegg xml data?
I was looking for some small example code of the same.

Thanks.

--
-Neeti
Even my blood says, B positive

From sdavis2 at mail.nih.gov Thu Aug 30 06:16:54 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu, 30 Aug 2007 06:16:54 -0400
Subject: [Bioperl-l] Bio::Seq -> Solr (Lucene) ?
In-Reply-To: <0FF63232-25DE-4676-8C06-B9B00BE28349@uiuc.edu>
References: <46D5EF2B.5000101@jays.net>
	<0FF63232-25DE-4676-8C06-B9B00BE28349@uiuc.edu>
Message-ID: <46D69916.4060202@mail.nih.gov>

Chris Fields wrote:
> On Aug 29, 2007, at 5:11 PM, Jay Hannah wrote:
>
>> Please slap me if I'm hysterical.
>>
>> I'm seeking a broad bioinformatics search engine platform. I want to
>> take gobs of data in gobs of formats and allow people to search it on
>> the web.

Not sure how it might or might not meet your needs, but have you looked
at SRS (Sequence Retrieval System)? I have never tried to use it,
personally, though.

Sean

From cjfields at uiuc.edu Thu Aug 30 09:17:17 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 30 Aug 2007 08:17:17 -0500
Subject: [Bioperl-l] remote SwissProt server problems
In-Reply-To: <6805F552-9947-4C28-B846-47B5501B31DF@uiuc.edu>
References: <6805F552-9947-4C28-B846-47B5501B31DF@uiuc.edu>
Message-ID: <62B4DE62-C11E-4E75-837C-6C1005FB12A4@uiuc.edu>

This should be fixed now (DBFetch-related tests pass, though MeSH
tests are now failing!).
chris

On Aug 29, 2007, at 11:03 AM, Chris Fields wrote:

> Just as a notice, DBFetch is currently retrieving only single records
> for the UniProtKB database (where Bio::DB::SwissProt fetches
> sequences). If anyone runs remote server tests and DB.t in the test
> suite you'll see a failure towards the end which indicates this.
> I've posted a notice to the server help desk and will respond when I
> hear more.
>
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cain.cshl at gmail.com Thu Aug 30 10:39:59 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 30 Aug 2007 10:39:59 -0400
Subject: [Bioperl-l] Feature/Annotation rollback finished
In-Reply-To:
References: <1188416748.2567.36.camel@localhost.localdomain>
	<8BA0C799-9899-4926-B7F5-24180F764146@uiuc.edu>
	<1188446706.2567.59.camel@localhost.localdomain>
Message-ID: <1188484799.2567.84.camel@localhost.localdomain>

Hi Chris,

I see--I was using as_text and getting the "Value: $value"; there are
places in my code where I have always used ->value and I thought that
the way it was working had changed.

What is the use case for having the as_text method work the way it does?

Thanks,
Scott

On Wed, 2007-08-29 at 23:17 -0500, Chris Fields wrote:
> It shouldn't; that sounds like the output for as_text(). value()
> should just return the scalar value.
>
> As a note, I added a new method, display_text(), for all
> Bio::AnnotationI classes which by default replicates the same output
> that stringification overloads produced.
> So you should be able to
> explicitly call $ann->display_text for any Bio::AnnotationI where you
> once used an implicit call:
>
> # old
> print "$ann\n";
>
> # new
> print $ann->display_text,"\n";
>
> chris

--
------------------------------------------------------------------------
Scott Cain, Ph. D.
cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)
216-392-3087
Cold Spring Harbor Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070830/f2f5159f/attachment.bin

From cain.cshl at gmail.com Thu Aug 30 11:46:24 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 30 Aug 2007 11:46:24 -0400
Subject: [Bioperl-l] Feature/Annotation rollback finished
In-Reply-To:
References: <1188416748.2567.36.camel@localhost.localdomain>
	<8BA0C799-9899-4926-B7F5-24180F764146@uiuc.edu>
	<1188446706.2567.59.camel@localhost.localdomain>
Message-ID: <1188488785.2567.93.camel@localhost.localdomain>

Hi Chris,

Good news! I only had to add a few defined() checks and a few
display_text calls and I was able to successfully create a database and
load the yeast GFF3 file. While I want to do more testing with GFF from
other sources, clearly, I am 95% of the way there with relatively little
work.

Nice job and Thanks!
Scott

On Wed, 2007-08-29 at 23:17 -0500, Chris Fields wrote:
> It shouldn't; that sounds like the output for as_text(). value()
> should just return the scalar value.
>
> As a note, I added a new method, display_text(), for all
> Bio::AnnotationI classes which by default replicates the same output
> that stringification overloads produced. So you should be able to
> explicitly call $ann->display_text for any Bio::AnnotationI where you
> once used an implicit call:
>
> # old
> print "$ann\n";
>
> # new
> print $ann->display_text,"\n";
>
> chris
>
> On Aug 29, 2007, at 11:05 PM, Scott Cain wrote:
>
> > Hi Chris,
> >
> > Is there a reason that the value method of the
> > Bio::Annotation::SimpleValue (and possibly some of its siblings)
> > is returning "Value: $value"? It didn't used to have the "Value: "
> > before,
> > did it?
> >
> > Thanks,
> > Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.
cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)
216-392-3087
Cold Spring Harbor Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070830/ec7a594e/attachment.bin From hlapp at gmx.net Thu Aug 30 12:07:18 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 30 Aug 2007 12:07:18 -0400 Subject: [Bioperl-l] Feature/Annotation rollback finished In-Reply-To: <1188488785.2567.93.camel@localhost.localdomain> References: <1188416748.2567.36.camel@localhost.localdomain> <8BA0C799-9899-4926-B7F5-24180F764146@uiuc.edu> <1188446706.2567.59.camel@localhost.localdomain> <1188488785.2567.93.camel@localhost.localdomain> Message-ID: <0545DE1A-F2E2-4FA8-BE7C-436EE25C7D92@gmx.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 30, 2007, at 11:46 AM, Scott Cain wrote: > Good news! I only had to add a few defineds and a few > display_texts and > I was able to successfully create a database and load the yeast GFF3 Scott - I'm a little worried - what are you using the display_text() calls for? There is no method to set a property that would be returned here, so you only have control over that if you override the method in a custom AnnotationI class. 
-hilmar - -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.3 (Darwin) iD8DBQFG1us5uV6N2JxL7qsRAicFAKCFCHPORyK9273X8u2/gbaZCNpEHgCeMovA OtZghop1tET5iMqnwXzL+lk= =NVrK -----END PGP SIGNATURE----- From hlapp at gmx.net Thu Aug 30 12:10:14 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 30 Aug 2007 12:10:14 -0400 Subject: [Bioperl-l] Feature/Annotation rollback finished In-Reply-To: <1188484799.2567.84.camel@localhost.localdomain> References: <1188416748.2567.36.camel@localhost.localdomain> <8BA0C799-9899-4926-B7F5-24180F764146@uiuc.edu> <1188446706.2567.59.camel@localhost.localdomain> <1188484799.2567.84.camel@localhost.localdomain> Message-ID: <49824C75-3FA5-4E59-8F99-BC0E974E9652@gmx.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 30, 2007, at 10:39 AM, Scott Cain wrote: > What is the use case for having the as_text method work the way it > does? That's a bit nebulous as I tried to point out the other day. It's just a textual representation of the annotation, but you don't really have control over what the particular Annotation class considers to fulfill that purpose. So, it's fine to expect a printable meaningful string to be returned, but don't try to parse it or rely on exactly what it is going to look like. 
-hilmar - -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.3 (Darwin) iD8DBQFG1uvnuV6N2JxL7qsRAn+dAKC9iLj93El38uv7kjprdZDo0sXC6wCgqwhm 0/tF89/FO1a4CWAf1bahd+8= =I7SM -----END PGP SIGNATURE----- From hlapp at gmx.net Thu Aug 30 12:20:18 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 30 Aug 2007 12:20:18 -0400 Subject: [Bioperl-l] Feature/Annotation rollback finished In-Reply-To: References: Message-ID: On Aug 29, 2007, at 11:23 PM, Chris Fields wrote: >> Once this is merged back to the main trunk, I guess we need to give >> Bio::SeqFeature::TypedSeqFeatureI a thorough look and make sure it >> makes real sense. > > It describes one method, ontology_term(), which returns a > Bio::Ontology::TermI. This is similar to > SeqFeature::Annotated::type(), which returns a > Bio::Annotation::OntologyTerm (a Bio::Ontology::TermI). My thought > is to simply deprecate type() in favor of > TypedSeqFeatureI::ontology_term(). I think we'll want to think about that. type() gives me some indication of what the returned value might represent, whereas ontology_term() only tells me about the type of the returned object. You could make ontology_term() accept a context argument, such as my $feature_type = $typedFeat->ontology_term(-context => -type); Or you could name the method(s) more explicitly, such as my $feature_type = $typedFeat->type_term(); my $feature_source = $typedFeat->source_term(); my @annTerms = $typedFeat->get_Annotations('Gene Ontology'); Am I making sense? 
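
To make the naming suggestion above concrete, here is a minimal sketch of what explicitly named accessors could look like. Everything in it is an assumption for illustration: My::TypedFeature is a hypothetical class that is not part of BioPerl, and the terms are plain strings where a real implementation would presumably hold Bio::Ontology::TermI objects.

```perl
use strict;
use warnings;

# Hypothetical class for illustration only; not part of BioPerl.
package My::TypedFeature;

sub new {
    my ($class, %args) = @_;
    return bless {
        type_term   => $args{-type_term},
        source_term => $args{-source_term},
    }, $class;
}

# Getter/setter for the term that says what kind of feature this is.
sub type_term {
    my $self = shift;
    $self->{type_term} = shift if @_;
    return $self->{type_term};
}

# Getter/setter for the term that says where the feature came from.
sub source_term {
    my $self = shift;
    $self->{source_term} = shift if @_;
    return $self->{source_term};
}

package main;

# Made-up example values.
my $feat = My::TypedFeature->new(
    -type_term   => 'gene',
    -source_term => 'SGD',
);

print "type:   ", $feat->type_term,   "\n";
print "source: ", $feat->source_term, "\n";
```

With names like these the caller can tell from the method alone what the returned term represents, which a generic ontology_term() does not convey.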
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cain.cshl at gmail.com Thu Aug 30 12:28:47 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Thu, 30 Aug 2007 12:28:47 -0400 Subject: [Bioperl-l] Feature/Annotation rollback finished In-Reply-To: <0545DE1A-F2E2-4FA8-BE7C-436EE25C7D92@gmx.net> References: <1188416748.2567.36.camel@localhost.localdomain> <8BA0C799-9899-4926-B7F5-24180F764146@uiuc.edu> <1188446706.2567.59.camel@localhost.localdomain> <1188488785.2567.93.camel@localhost.localdomain> <0545DE1A-F2E2-4FA8-BE7C-436EE25C7D92@gmx.net> Message-ID: <1188491327.2567.101.camel@localhost.localdomain> Hi Hilmar, I'm using it as Chris suggested: where I had be depending on "" overloading. I think in most places, I am using it on Bio::Annotation::SimpleValue to get the string that is the simple value. On more complex data types, I am using other methods built into those classes to extract useful stuff for inserting into the database. Scott On Thu, 2007-08-30 at 12:07 -0400, Hilmar Lapp wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > > On Aug 30, 2007, at 11:46 AM, Scott Cain wrote: > > > Good news! I only had to add a few defineds and a few > > display_texts and > > I was able to successfully create a database and load the yeast GFF3 > > Scott - I'm a little worried - what are you using the display_text() > calls for? There is no method to set a property that would be > returned here, so you only have control over that if you override the > method in a custom AnnotationI class. 
> > -hilmar > - -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.3 (Darwin) > > iD8DBQFG1us5uV6N2JxL7qsRAicFAKCFCHPORyK9273X8u2/gbaZCNpEHgCeMovA > OtZghop1tET5iMqnwXzL+lk= > =NVrK > -----END PGP SIGNATURE----- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070830/1d98e384/attachment.bin From hlapp at gmx.net Thu Aug 30 12:52:14 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 30 Aug 2007 12:52:14 -0400 Subject: [Bioperl-l] Feature/Annotation rollback finished In-Reply-To: <1188491327.2567.101.camel@localhost.localdomain> References: <1188416748.2567.36.camel@localhost.localdomain> <8BA0C799-9899-4926-B7F5-24180F764146@uiuc.edu> <1188446706.2567.59.camel@localhost.localdomain> <1188488785.2567.93.camel@localhost.localdomain> <0545DE1A-F2E2-4FA8-BE7C-436EE25C7D92@gmx.net> <1188491327.2567.101.camel@localhost.localdomain> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 30, 2007, at 12:28 PM, Scott Cain wrote: > I think in most places, I am using it on > Bio::Annotation::SimpleValue to get the string that is the simple > value. You should be using $ann->value() for that, unless I'm missing something. 
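
For anyone following along, a minimal sketch of the distinction, assuming BioPerl is installed and using a made-up tag name and value: value() hands back the bare scalar, while as_text() is only a printable rendering (the thread reports "Value: ..." for SimpleValue) and should not be parsed.

```perl
use strict;
use warnings;
use Bio::Annotation::SimpleValue;

# Tag name and value are made-up illustration data.
my $ann = Bio::Annotation::SimpleValue->new(
    -tagname => 'note',
    -value   => 'hypothetical protein',
);

# value() returns the bare scalar, so it is the one to store or compare.
my $stored = $ann->value;
print "for the database: $stored\n";

# as_text() is meant for human eyes; print it, but do not parse it.
print "for display: ", $ann->as_text, "\n";
```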
-hilmar
- --
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (Darwin)

iD8DBQFG1vXCuV6N2JxL7qsRAkcJAKCICRtOSlPLVYYKCbOTvDIf4idb3wCgkxYM
seeaNvSsFY/4bHLGZ9dum2Q=
=E35w
-----END PGP SIGNATURE-----

From cain.cshl at gmail.com Thu Aug 30 13:16:09 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 30 Aug 2007 13:16:09 -0400
Subject: [Bioperl-l] Feature/Annotation rollback finished
In-Reply-To:
References: <1188416748.2567.36.camel@localhost.localdomain>
	<8BA0C799-9899-4926-B7F5-24180F764146@uiuc.edu>
	<1188446706.2567.59.camel@localhost.localdomain>
	<1188488785.2567.93.camel@localhost.localdomain>
	<0545DE1A-F2E2-4FA8-BE7C-436EE25C7D92@gmx.net>
	<1188491327.2567.101.camel@localhost.localdomain>
Message-ID: <1188494169.2567.109.camel@localhost.localdomain>

Well, in the instances where I was using it, ->value seems to work
exactly the same, so I changed it to value to be more consistent with
other code I'd written. I'd used display_text without really thinking
about it.

Thanks,
Scott

On Thu, 2007-08-30 at 12:52 -0400, Hilmar Lapp wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> On Aug 30, 2007, at 12:28 PM, Scott Cain wrote:
>
> > I think in most places, I am using it on
> > Bio::Annotation::SimpleValue to get the string that is the simple
> > value.
>
> You should be using $ann->value() for that, unless I'm missing
> something.
> > -hilmar > - -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.3 (Darwin) > > iD8DBQFG1vXCuV6N2JxL7qsRAkcJAKCICRtOSlPLVYYKCbOTvDIf4idb3wCgkxYM > seeaNvSsFY/4bHLGZ9dum2Q= > =E35w > -----END PGP SIGNATURE----- -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From cjfields at uiuc.edu Thu Aug 30 13:27:46 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 30 Aug 2007 12:27:46 -0500 Subject: [Bioperl-l] Feature/Annotation rollback finished In-Reply-To: <1188491327.2567.101.camel@localhost.localdomain> References: <1188416748.2567.36.camel@localhost.localdomain> <8BA0C799-9899-4926-B7F5-24180F764146@uiuc.edu> <1188446706.2567.59.camel@localhost.localdomain> <1188488785.2567.93.camel@localhost.localdomain> <0545DE1A-F2E2-4FA8-BE7C-436EE25C7D92@gmx.net> <1188491327.2567.101.camel@localhost.localdomain> Message-ID: <6E9B07D0-AB37-4439-AA9D-9268AB5A38C0@uiuc.edu> display_text() is really a hack for explicitly getting the same output one would have expected from the stringification overload for any Bio::AnnotationI (you can also pass it a callback to customize the output if needed, but that's not important here). How well it works depends on the context of what you're trying to accomplish, but it might be best to use value() specifically in places where you expect to be dealing only with Bio::Annotation::SimpleValue.
chris On Aug 30, 2007, at 11:28 AM, Scott Cain wrote: > Hi Hilmar, > > I'm using it as Chris suggested: where I had been depending on "" > overloading. I think in most places, I am using it on > Bio::Annotation::SimpleValue to get the string that is the simple > value. > On more complex data types, I am using other methods built into those > classes to extract useful stuff for inserting into the database. > > Scott > > > > On Thu, 2007-08-30 at 12:07 -0400, Hilmar Lapp wrote: >> On Aug 30, 2007, at 11:46 AM, Scott Cain wrote: >> >>> Good news! I only had to add a few defineds and a few >>> display_texts and >>> I was able to successfully create a database and load the yeast GFF3 >> >> Scott - I'm a little worried - what are you using the display_text() >> calls for? There is no method to set a property that would be >> returned here, so you only have control over that if you override the >> method in a custom AnnotationI class. >> >> -hilmar >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== > -- > ---------------------------------------------------------------------- > -- > Scott Cain, Ph. D. > cain.cshl at gmail.com > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Thu Aug 30 13:45:44 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 30 Aug 2007 12:45:44 -0500 Subject: [Bioperl-l] Feature/Annotation rollback finished In-Reply-To: <1188488785.2567.93.camel@localhost.localdomain> References: <1188416748.2567.36.camel@localhost.localdomain> <8BA0C799-9899-4926-B7F5-24180F764146@uiuc.edu> <1188446706.2567.59.camel@localhost.localdomain> <1188488785.2567.93.camel@localhost.localdomain> Message-ID: Sounds good but I have yet to commit some of the Ontology changes Hilmar and I discussed (whereupon our brave heroes deprecate dblinks methods in favor of dbxrefs). These should be committed fairly soon (hour or two). My guess is the change will be fairly transparent so shouldn't affect anything unless you have scripts using those methods directly. chris On Aug 30, 2007, at 10:46 AM, Scott Cain wrote: > Hi Chris, > > Good news! I only had to add a few defineds and a few > display_texts and > I was able to successfully create a database and load the yeast GFF3 > file. While I want to do more testing with GFF from other sources, > clearly, I am 95% of the way there with relatively little work. > > Nice job and Thanks! > Scott > > > On Wed, 2007-08-29 at 23:17 -0500, Chris Fields wrote: >> It shouldn't, that sounds like the output for as_text(). value() >> should just return the scalar value. >> >> As a note, I added a new method, display_text(), for all >> Bio::AnnotationI classes which by default replicates the same output >> that stringification overloads produced.
So you should be able to >> explicitly call $ann->display_text for any Bio::AnnotationI where you >> once used an implicit call: >> >> # old >> print "$ann\n"; >> >> # new >> print $ann->display_text,"\n"; >> >> chris >> >> On Aug 29, 2007, at 11:05 PM, Scott Cain wrote: >> >>> Hi Chris, >>> >>> Is there a reason that the value method of the >>> Bio::Annotation::SimpleValue (and possibly some of its siblings) >>> is returning "Value: $value"? It didn't use to have the "Value: " >>> before, >>> did it? >>> >>> Thanks, >>> Scott >>> >>> >>> On Wed, 2007-08-29 at 15:13 -0500, Chris Fields wrote: >>>> I'll probably go ahead and start merging this stuff over to CVS >>>> HEAD >>>> then. There haven't been any objections so far. >>>> >>>> The page I posted outlines the more critical fixes, primarily the >>>> changes to Bio::Ontology::Term methods (along with relevant >>>> code) due >>>> to inconsistencies in the interface. The Bio::Annotation classes >>>> also now throw if you attempt to use them in an overloaded context. >>>> I also split off SeqFeature::Annotated tests into their own test >>>> suite >>>> (SeqFeatAnnotated.t). >>>> >>>> Let me know if there are any problems along the way! >>>> >>>> chris >>>> >>>> On Aug 29, 2007, at 2:45 PM, Scott Cain wrote: >>>> >>>>> Hi Chris, >>>>> >>>>> I just wanted to let you know that I was out of town for a few >>>>> days, but >>>>> now I'm back and I'm doing testing of GMOD software based on the >>>>> branch >>>>> you are working on. I'll let you know how it goes, but don't >>>>> let me >>>>> stop you if you are confident of your changes. I'm sure whatever goes >>>>> wrong, it will just point out holes in the FeatureIO tests (I'm >>>>> sure >>>>> there are plenty) and will require hopefully minimal changes on my >>>>> end. >>>>> >>>>> Thanks for your considerable efforts on this!
(Regardless of how >>>>> much >>>>> work it makes for me :-) >>>>> Scott >>>>> >>>>> >>>>> On Tue, 2007-08-28 at 16:05 -0500, Chris Fields wrote: >>>>>> I'm now wrapping up the Feature/Annotation rollback. I will >>>>>> probably >>>>>> start merging back to the main branch in the next day or two, as >>>>>> soon as interested parties (*cough*devs*cough*) look over the >>>>>> last >>>>>> batch of changes. >>>>>> >>>>>> http://www.bioperl.org/wiki/Feature_Annotation_rollback#Fourth_Round >>>>>> >>>>>> I have also added a small benchmark test which indicates a >>>>>> decrease >>>>>> in parsing time in SeqIO::genbank with all tests passing. I >>>>>> expect >>>>>> this will translate over to any Bio::SeqFeature::Generic-using >>>>>> class >>>>>> (open mouth, prepare to insert foot....). >>>>>> >>>>>> It is also possible there are still some instances where >>>>>> overloading >>>>>> is expected lurking about in the ~1000 or so modules, so I'll >>>>>> leave >>>>>> the exceptions I added to all Bio::AnnotationI; we can remove >>>>>> them >>>>>> down the line, maybe prior to rel1.6, after more tests are >>>>>> added or >>>>>> if they get particularly annoying. My guess is I caught >>>>>> 99.99% of >>>>>> them (prepare to insert other foot....). >>>>>> >>>>>> The key change in this last round is the addition of several >>>>>> class >>>>>> *dbxref* methods to Bio::Ontology::Term and >>>>>> Bio::Annotation::OntologyTerm, all of which are capable of >>>>>> working >>>>>> with either DBLink instances or simple scalars. This was >>>>>> primarily >>>>>> done in order to clear up inconsistencies in the older *dblink* >>>>>> methods, which were ambiguous (some indicated simple scalar >>>>>> arguments, others DBLink objects); operator overloading was used >>>>>> extensively in these cases, which led to several issues. I have >>>>>> added deprecation warnings to the older methods, which now map to >>>>>> the newer methods.
All tests pass with the exception of a >>>>>> few >>>>>> already failing on the MAIN branch; the single test which needs >>>>>> to be >>>>>> fixed is a round-tripping error in swiss.t (now a TODO), which >>>>>> can be >>>>>> fixed after merging back. >>>>>> >>>>>> Please respond to this if there are any questions or if I need to >>>>>> clarify the changes I made a bit more. >>>>>> >>>>>> chris >>>>> -- >>>>> ------------------------------------------------------------------ >>>>> -- >>>>> -- >>>>> -- >>>>> Scott Cain, Ph. D. >>>>> cain at cshl.edu >>>>> GMOD Coordinator (http://www.gmod.org/) >>>>> 216-392-3087 >>>>> Cold Spring Harbor Laboratory >>>> >>>> Christopher Fields >>>> Postdoctoral Researcher >>>> Lab of Dr. Robert Switzer >>>> Dept of Biochemistry >>>> University of Illinois Urbana-Champaign >>>> >>>> >>>> >>> -- >>> -------------------------------------------------------------------- >>> -- >>> -- >>> Scott Cain, Ph. D. >>> cain.cshl at gmail.com >>> GMOD Coordinator (http://www.gmod.org/) >>> 216-392-3087 >>> Cold Spring Harbor Laboratory >>> >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign > -- > ---------------------------------------------------------------------- > -- > Scott Cain, Ph. D.
> cain.cshl at gmail.com > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Thu Aug 30 14:03:29 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 30 Aug 2007 13:03:29 -0500 Subject: [Bioperl-l] Feature/Annotation rollback finished In-Reply-To: References: Message-ID: On Aug 30, 2007, at 11:20 AM, Hilmar Lapp wrote: >> ...It describes one method, ontology_term(), which returns a >> Bio::Ontology::TermI. This is similar to >> SeqFeature::Annotated::type(), which returns a >> Bio::Annotation::OntologyTerm (a Bio::Ontology::TermI). My >> thought is to simply deprecate type() in favor of >> TypedSeqFeatureI::ontology_term(). > > I think we'll want to think about that. type() gives me some > indication of what the returned value might represent, whereas > ontology_term() only tells me about the type of the returned object. > > You could make ontology_term() accept a context argument, such as > > my $feature_type = $typedFeat->ontology_term(-context => -type); > > Or you could name the method(s) more explicitly, such as > > my $feature_type = $typedFeat->type_term(); > my $feature_source = $typedFeat->source_term(); > my @annTerms = $typedFeat->get_Annotations('Gene Ontology'); > > Am I making sense? > > -hilmar I think so; I'll have to look at what is returned from type() in some more detail. It appears that the two main culprits for passing strings off to Ontology::Term are the Bio::OntologyIO::obo and Bio::OntologyIO::dagflat parsers. I can add some code in there to change those to DBLinks prior to creating Ontology::Term instances, which should clean that up. 
chris From cjfields at uiuc.edu Thu Aug 30 20:57:15 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 30 Aug 2007 19:57:15 -0500 Subject: [Bioperl-l] Bio::Expression & Re: ReseqChip, module/package name In-Reply-To: <46CF27F4.8030608@arcor.de> References: <03D7F0EB-3BC2-4988-B67F-09C4225EAE13@uiuc.edu> <46CEAD83.2050904@arcor.de> <9824900.1187973171940.JavaMail.ngmail@webmail17> <46CF27F4.8030608@arcor.de> Message-ID: <4ED2E2B0-8E36-4500-A4C9-B8C333E14614@uiuc.edu> On Aug 24, 2007, at 1:48 PM, marian wrote: > ... > Bio::Microarray::Tools::MitoChip would be OK with me. I merely meant > that it > isn't an expression chip and you also won't/can't analyze expression > data with > the tool I am talking about. > > Marian Okay, I have everything working from bugzilla: http://bugzilla.open-bio.org/show_bug.cgi?id=2332 I suppose what we need to do next is get a test script going. I'll look at the script attached to see if we can get something going that is fairly quick. chris From avilella at gmail.com Fri Aug 31 05:29:43 2007 From: avilella at gmail.com (Albert Vilella) Date: Fri, 31 Aug 2007 10:29:43 +0100 Subject: [Bioperl-l] display protein/CDS multiple sequence alignments with exon boundaries Message-ID: <358f4d650708310229i421bb1d7w5f64e3ded5f92618@mail.gmail.com> Hi, Probably a bit of a long shot, but does anyone have code for displaying protein or CDS multiple sequence alignments with the exon boundaries of each gene in the alignment? Something in the bioperl world without funky external dependencies. I think it would be an awesome addition to the howtos. Currently, the Bio::Graphics howto has cDNA-to-genome mapping scripts and BLAST output scripts, but I couldn't find code for dealing with multiple sequence alignments. Cheers, Albert.
From neetisomaiya at gmail.com Fri Aug 31 05:41:51 2007 From: neetisomaiya at gmail.com (neeti somaiya) Date: Fri, 31 Aug 2007 15:11:51 +0530 Subject: [Bioperl-l] need help Message-ID: <764978cf0708310241i1baf6feeoc808c396125c078e@mail.gmail.com> Hi, I am trying to parse the compound (ftp://ftp.genome.jp/pub/kegg/ligand/compound/compound) and glycan (ftp://ftp.genome.jp/pub/kegg/ligand/glycan/glycan) files of KEGG using bioperl. I just want the KEGG id of the compound/glycan and its names and synonyms, if any. Bio::SeqIO is giving me problems: I am not able to fetch the id and name. Can someone help me with this? Thanks. -- -Neeti Even my blood says, B positive From cjfields at uiuc.edu Fri Aug 31 10:51:51 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 31 Aug 2007 09:51:51 -0500 Subject: [Bioperl-l] need help In-Reply-To: <764978cf0708310241i1baf6feeoc808c396125c078e@mail.gmail.com> References: <764978cf0708310241i1baf6feeoc808c396125c078e@mail.gmail.com> Message-ID: I don't believe Bio::SeqIO::kegg will parse those files (they aren't sequence files). The format it recognizes is: http://www.bioperl.org/wiki/KEGG_sequence_format for the files found in the subdirectories here: ftp://ftp.genome.ad.jp/pub/kegg/genes/organisms I would just build a custom parser if all you're interested in is id/names/synonyms. It'll be much faster. chris On Aug 31, 2007, at 4:41 AM, neeti somaiya wrote: > Hi, > > I am trying to parse the compound ( > ftp://ftp.genome.jp/pub/kegg/ligand/compound/compound) and glycan ( > ftp://ftp.genome.jp/pub/kegg/ligand/glycan/glycan) files of KEGG using > bioperl. > I just want the kegg id of the compound/glycan and its names and > synonyms if > any. > Bio::SeqIO is giving some problem, I am not able to fetch the id > and name. > Can someone help me with this. > > Thanks.
> > -- > -Neeti > Even my blood says, B positive Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign
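For anyone following up on the "need help" thread: the custom parser Chris suggests could look roughly like the sketch below. It is only a sketch and uses core Perl, not BioPerl. It assumes the usual KEGG flat-file layout for the compound and glycan files, which the thread itself does not spell out: a keyword such as ENTRY or NAME in column 1, indented continuation lines, names separated by a trailing ";", and "///" closing each record. The helper name parse_kegg_ligand and the embedded sample records are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: read KEGG LIGAND flat-file
# records from a filehandle and return one hashref per record holding
# the KEGG id and the list of names/synonyms.
sub parse_kegg_ligand {
    my ($fh) = @_;
    my (@entries, $entry, $field);

    while (my $line = <$fh>) {
        chomp $line;

        # "///" closes the current record.
        if ($line =~ m{^///}) {
            push @entries, $entry if $entry;
            ($entry, $field) = (undef, undef);
            next;
        }

        # A keyword in column 1 starts a new field; an indented line
        # continues the field that is currently open.
        my $text;
        if ($line =~ /^(\S+)\s*(.*)$/) {
            ($field, $text) = ($1, $2);
        }
        elsif ($line =~ /^\s+(\S.*)$/) {
            $text = $1;
        }
        next unless defined $field && defined $text;

        if ($field eq 'ENTRY') {
            my ($id) = split ' ', $text;    # first token is the KEGG id
            $entry = { id => $id, names => [] };
        }
        elsif ($field eq 'NAME' && $entry) {
            $text =~ s/;?\s*$//;            # drop the ";" separator
            push @{ $entry->{names} }, $text if length $text;
        }
    }
    push @entries, $entry if $entry;        # file without a final "///"
    return @entries;
}

# Two made-up sample records in the assumed layout.
my $sample = <<'END';
ENTRY       C00001                      Compound
NAME        H2O;
            Water
FORMULA     H2O
///
ENTRY       C00002                      Compound
NAME        ATP;
            Adenosine 5'-triphosphate
FORMULA     C10H16N5O13P3
///
END

open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
for my $e (parse_kegg_ligand($fh)) {
    print join("\t", $e->{id}, join('; ', @{ $e->{names} })), "\n";
}
close $fh;
```

To run it against the real files, open the downloaded compound or glycan file instead of the in-memory sample. Glycan records without a NAME field simply come back with an empty names list, and a very long name that wraps across lines would be split into separate entries, so that case would need extra handling.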