From cuiw at ncbi.nlm.nih.gov Thu Feb 1 09:47:38 2007
From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C])
Date: Thu, 1 Feb 2007 09:47:38 -0500
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
In-Reply-To: <45C1059D.1070100@tbi.univie.ac.at>
References: <45C1059D.1070100@tbi.univie.ac.at>
Message-ID: <18C407FD4FFB424292D769FBD68C1987020BB960@NIHCESMLBX8.nih.gov>

This is a simple test that goes from gene ID 3632373 (the protein is 46100068) to contig coordinates:

perl -MLWP::Simple -e 'map {print $_, "\n" if /<(Gene-source_src.*?>)(.*)?<$1/} (split "\n", get(q{http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=3632373&retmode=xml}))'

You need to translate the protein id to a gene id first, though.

If the genome is available in Map Viewer, try the following (the contig name, NW_101115, comes from the previous step):

http://www.ncbi.nlm.nih.gov/mapview/maps.cgi?taxid=5270&gnl=NW_101115&MAPS=genes&cmd=txt

Wenwu Cui, PhD

-----Original Message-----
From: Rainer Machne [mailto:raim at tbi.univie.ac.at]
Sent: Wednesday, January 31, 2007 4:10 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?

Dear Bioperl list,

hoping not to be on the wrong email list, I have a short question:

Is there a standard way, or are there nice (Bioperl) tools, to get from a gene id (gi) or other ids (see below) to the genomic coordinates of the respective gene?

We have Fasta files retrieved from NCBI protein Blast in fungal genomes:

>gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago maydis 521]

or

>gi|50292953|ref|XP_448909.1| unnamed protein product [Candida glabrata]

(we only have gi, ref and gb in my set).
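The deflines above pack several identifiers into one string: the gi number, a secondary database tag (gb or ref) and an accession.version. Below is a minimal sketch, assuming every header follows the `gi|NUMBER|DB|ACCESSION|` layout shown here, that splits them apart so the gi or accession can be passed on to an Entrez lookup such as the efetch one-liner. The subroutine name `parse_ncbi_defline` and the script name are hypothetical; headers in any other layout are simply skipped.

```perl
use strict;
use warnings;

# Split an NCBI-style FASTA defline such as
#   >gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago maydis 521]
# into gi number, secondary database tag (gb, ref, ...), accession.version,
# description and the organism named in the trailing square brackets.
# Returns undef for headers that do not start with gi|NUMBER|DB|ACCESSION.
sub parse_ncbi_defline {
    my ($line) = @_;
    chomp $line;
    $line =~ s/^>//;                         # drop the FASTA marker
    my ($ids, $desc) = split /\s+/, $line, 2;
    $desc = '' unless defined $desc;
    my @f = split /\|/, $ids;                # trailing empty field is dropped
    return unless @f >= 4 && $f[0] eq 'gi';
    my ($organism) = $desc =~ /\[([^\]]+)\]\s*$/;
    return {
        gi          => $f[1],
        db          => $f[2],
        accession   => $f[3],
        description => $desc,
        organism    => $organism,
    };
}

# Usage (hypothetical file name): perl gi_fields.pl proteins.fasta
# prints gi, db tag and accession, tab-separated, one header per line.
if (@ARGV) {
    while (my $line = <>) {
        next unless $line =~ /^>/;
        my $rec = parse_ncbi_defline($line) or next;
        print join("\t", @{$rec}{qw(gi db accession)}), "\n";
    }
}
```

Mixed gb/ref sets come out in one table this way, which makes it easier to route each id to whatever lookup applies to it.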
I retrieved all my fasta files from whole fungal genomes with available protein sequences at
http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=fungi

As I only searched whole finished genomes (not shotgun), I thought it would then be easy to get the genomic coordinates and retrieve upstream sequences, but so far we have failed to find a consistent way to do this automatically. Many of the gi entries refer to mRNAs or partial mRNAs, and the way to the coordinates seems to differ for each case.

Any suggestions would be appreciated.

with kind regards,
Rainer Machne

University of Vienna
Department for Theoretical Chemistry
Theoretical Biochemistry Group
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From raim at tbi.univie.ac.at Thu Feb 1 07:54:21 2007
From: raim at tbi.univie.ac.at (Rainer Machne)
Date: Thu, 01 Feb 2007 13:54:21 +0100
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
In-Reply-To:
References: <45C1059D.1070100@tbi.univie.ac.at>
Message-ID: <45C1E2FD.3070709@tbi.univie.ac.at>

Barry and Jason,

thanks for your quick and very helpful replies.

I guess we should have done (or should repeat) our blast search at http://fungal.genome.duke.edu/ to get a better mapping from proteins to genomes?

As I retrieved all my proteins via whole genome blasts, we should find (most of) them in the genbank files ... a good opportunity for me to learn some Bioperl and the other packages you mentioned, in case we want to do more complex analysis later :-)

Thank you very much!

Rainer

Barry Moore wrote:
> Rainer,
>
> We use a perl library called CGL written by Mark Yandell and colleagues (which in turn uses Chris Mungall's BioChaos and Unflattener.pm referred to by Jason) for this type of task.
> The basic pipeline is to convert GenBank files to Chaos XML, then use CGL with those XML files to get nice object-oriented access to exons, transcripts, proteins, coordinates and more for those genes. I am currently using this with good success on most GenBank genomes (unfortunately I haven't been working with the fungal genomes, but it should work fine). The Ensembl API provides similar functionality for Ensembl genomes - but there are not very many fungi there.
>
> http://www.yandell-lab.org/cgl/
> http://www.ensembl.org/info/software/core/core_tutorial.html
>
> Feel free to contact Mark or myself directly if you are interested in using CGL.
>
> Barry

From cjfields at uiuc.edu Thu Feb 1 12:55:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Feb 2007 11:55:27 -0600
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
In-Reply-To: <45C1E2FD.3070709@tbi.univie.ac.at>
References: <45C1059D.1070100@tbi.univie.ac.at> <45C1E2FD.3070709@tbi.univie.ac.at>
Message-ID:

On Feb 1, 2007, at 6:54 AM, Rainer Machne wrote:

> I guess we should have done (or repeat) our blast search at http://fungal.genome.duke.edu/ to get better mapping from proteins to genomes ?

If the data is available in GenBank you could run the BLAST searches at NCBI and limit the search with an Entrez query:

http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#entrez_query

Most (all?) genome files are tagged as complete

I'm not sure, but there might be a way of doing this via Bio::Tools::Run::RemoteBlast. Jason, any ideas?

chris

From cjfields at uiuc.edu Thu Feb 1 13:09:16 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Feb 2007 12:09:16 -0600
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
In-Reply-To:
References: <45C1059D.1070100@tbi.univie.ac.at> <45C1E2FD.3070709@tbi.univie.ac.at>
Message-ID: <748CC48E-D224-4234-A5C4-E33968F17418@uiuc.edu>

> If the data is available in GenBank you could run the BLAST searches at NCBI and limit the search with an Entrez query:
>
> http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#entrez_query
>
> Most (all?) genome files are tagged as complete

sorry, didn't finish that...

"Most (all?) genome files are tagged as complete, wgs, in progress, etc. and can be limited by taxonomy using Fungi[ORGN] or similar."

chris

From jason at bioperl.org Thu Feb 1 13:36:02 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Feb 2007 10:36:02 -0800
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
In-Reply-To:
References: <45C1059D.1070100@tbi.univie.ac.at> <45C1E2FD.3070709@tbi.univie.ac.at>
Message-ID:

On Feb 1, 2007, at 9:55 AM, Chris Fields wrote:

> On Feb 1, 2007, at 6:54 AM, Rainer Machne wrote:
>
>> Barry and Jason,
>>
>> thanks for your quick and very helpful replies.
>>
>> I guess we should have done (or repeat) our blast search at http://fungal.genome.duke.edu/ to get better mapping from proteins to genomes ?

Well, I'm not quite sure of your exact goals. Is it to find upstream regions of known genes, or to look at upstream regions of orthologous genes? You can first figure out orthologs based on protein similarities, then go in and extract upstream regions for the orthologous genes (I provide a link to a big all-vs-all FASTA result at the bottom of the page if you want those results, as well as some pairwise orthology assignments, although you may want more or less stringent parameters).

All the GFF and AA data is freely available for download on the site for each genome we've annotated, or for annotation we've re-formatted, so you can do things locally and/or modify it to your liking.

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/

From reenayadav at gmail.com Thu Feb 1 13:38:03 2007
From: reenayadav at gmail.com (Reena Yadav)
Date: Fri, 2 Feb 2007 00:08:03 +0530
Subject: [Bioperl-l] pdb parser
Message-ID: <76f897dd0702011038v7afe0207gb05465478e026205@mail.gmail.com>

Hi,

I need to extract the atomic coordinates from the PDB entry 1ake and do certain calculations. I am going stepwise; the steps involved are:

(1) reading the atomic coordinates
(2) writing the result to a file.

I need to understand how to write the whole xyz line to another file. Could someone help?

R.

From jason at bioperl.org Thu Feb 1 08:06:42 2007
From: jason at bioperl.org (sandhya khatal)
Date: Thu, 1 Feb 2007 13:06:42 +0000
Subject: [Bioperl-l] Regarding Bioperl program
Message-ID: <75899ED1-72C6-4272-8CAC-028CF133A0B4@gmail.com>

Respected Sir,

I want to write a program that produces a dendrogram, like UPGMA, a clustering method, but I want the dendrogram to be built with the single-linkage or centroid method. Can you help me with this? You have given the code for a tree, but I want a dendrogram as output using either of the above methods.

Thanks in anticipation.

Regards,
Sandhya Khatal.
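For the PDB question above, one dependency-free route is to read the fixed-width ATOM/HETATM records directly: in the PDB format the x, y and z coordinates occupy columns 31-38, 39-46 and 47-54. What follows is a minimal sketch under that assumption. It ignores alternate locations and multiple MODELs, and the subroutine names and the file names 1ake.pdb / 1ake.xyz are hypothetical. BioPerl's Bio::Structure::IO also reads PDB files and may suit the later calculations better.

```perl
use strict;
use warnings;

# Parse one PDB ATOM/HETATM record. Column positions (1-based, inclusive):
# atom name 13-16, residue name 18-20, chain 22, residue number 23-26,
# x 31-38, y 39-46, z 47-54. The substr() offsets below are 0-based.
sub parse_atom_line {
    my ($line) = @_;
    return unless $line =~ /^(?:ATOM  |HETATM)/;
    return unless length($line) >= 54;
    my $trim = sub { my $s = shift; $s =~ s/^\s+|\s+$//g; return $s };
    return {
        name    => $trim->(substr($line, 12, 4)),
        resname => $trim->(substr($line, 17, 3)),
        chain   => substr($line, 21, 1),
        resseq  => $trim->(substr($line, 22, 4)) + 0,
        x       => $trim->(substr($line, 30, 8)) + 0,
        y       => $trim->(substr($line, 38, 8)) + 0,
        z       => $trim->(substr($line, 46, 8)) + 0,
    };
}

# Steps (1) and (2): read a PDB file and write one line per atom (name,
# residue, chain, residue number, x, y, z) to another file.
# Returns the number of atoms written.
sub write_xyz {
    my ($pdb_file, $out_file) = @_;
    open my $in,  '<', $pdb_file or die "Cannot read $pdb_file: $!";
    open my $out, '>', $out_file or die "Cannot write $out_file: $!";
    my $n = 0;
    while (my $line = <$in>) {
        my $atom = parse_atom_line($line) or next;
        printf {$out} "%-4s %-3s %s %4d %8.3f %8.3f %8.3f\n",
            @{$atom}{qw(name resname chain resseq x y z)};
        $n++;
    }
    close $in;
    close $out or die "Cannot close $out_file: $!";
    return $n;
}

# Usage (hypothetical file names): perl pdb_xyz.pl 1ake.pdb 1ake.xyz
write_xyz(@ARGV) if @ARGV == 2;
```

Keeping the parsed atoms as hashes makes the "certain calculations" step a matter of looping over the same records instead of re-reading the output file.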

From jason at bioperl.org Thu Feb 1 19:55:26 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Feb 2007 16:55:26 -0800
Subject: [Bioperl-l] Fwd: Regarding Bioperl program
References: <394d31ee0702010506j4bbd79dck41d5ac2162eaafdd@mail.gmail.com>
Message-ID: <40020502-3421-407D-85EB-24F420AB699C@bioperl.org>

re-forwarding Sandhya's email to the list so the email address is visible.

The approach that is coded in bioperl is for distance-based data, such as evolutionary distances between DNA or protein sequences - I assume you are talking about clustering expression data? You may want to focus on the available literature and toolkits for expression data, which BioPerl doesn't deliberately focus on right now.

-jason

Begin forwarded message:

> From: "sandhya khatal"
> Date: February 1, 2007 5:06:42 AM PST
> To: jason at bioperl.org
> Subject: Regarding Bioperl program

From lzhtom at hotmail.com Thu Feb 1 22:20:10 2007
From: lzhtom at hotmail.com (zhihua li)
Date: Fri, 02 Feb 2007 03:20:10 +0000
Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
Message-ID:

From lzhtom at hotmail.com Thu Feb 1 22:27:39 2007
From: lzhtom at hotmail.com (zhihua li)
Date: Fri, 02 Feb 2007 03:27:39 +0000
Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
Message-ID:

Sorry guys, the former empty mail was sent out by mistake.

I'm using Bio::Index::Fasta to index a file containing lots of sequences in fasta format. All is fine except one thing.

According to the bioperl tutorial and the documentation, the following code will make an indexed file:

    my $inx = Bio::Index::Fasta->new(-filename   => "test.fasta.idx",
                                     -write_flag => 1);
    $inx->make_index("test.fasta");

And in another script I can access the indexed file by saying:

    $ENV{BIOPERL_INDEX} = ".";    # find index in current directory
    my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx");
    my $seq = $inx->fetch("ent1001");    # fetch the sequence named ent1001

However, after running the first script, I cannot find a new file test.fasta.idx in my current directory. And, not surprisingly, when I ran the second script, perl told me it couldn't find "test.fasta.idx".

What's going on here?

Thanks a lot!

From jason at bioperl.org Fri Feb 2 01:24:44 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Feb 2007 22:24:44 -0800
Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
In-Reply-To:
References:
Message-ID:

I don't think BIOPERL_INDEX does anything in the module, so that documentation is not quite right. The env variable is used in the scripts/index/bp_index and bp_fetch scripts, so maybe a cut+paste job went bad somewhere.

You need to specify the full path you want with -filename; you can just prepend BIOPERL_INDEX to the filename, like:

    -filename => $ENV{BIOPERL_INDEX}."/$index"

-jason
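Jason's suggestion, as a minimal sketch: build the full index path yourself and pass it as -filename, falling back to the current directory when BIOPERL_INDEX is unset or empty. `index_path`, `make_fasta_index` and `fetch_from_index` are hypothetical helper names. The Bio::Index::Fasta calls (`new` with `-filename`/`-write_flag`, `make_index`, `fetch`) are the ones from zhihua's message, and the module is only loaded when those helpers actually run.

```perl
use strict;
use warnings;
use File::Spec;

# Bio::Index::Fasta does not consult $ENV{BIOPERL_INDEX} itself (per Jason's
# reply), so the directory has to be part of -filename.
sub index_path {
    my ($name) = @_;
    my $dir = (defined $ENV{BIOPERL_INDEX} && length $ENV{BIOPERL_INDEX})
        ? $ENV{BIOPERL_INDEX}
        : File::Spec->curdir;
    return File::Spec->catfile($dir, $name);
}

# First script: build the index for a FASTA file.
sub make_fasta_index {
    my ($idx_name, $fasta) = @_;
    require Bio::Index::Fasta;
    my $inx = Bio::Index::Fasta->new(-filename   => index_path($idx_name),
                                     -write_flag => 1);
    $inx->make_index($fasta);
    return;
}

# Second script: reopen the index under the SAME path and fetch one entry.
sub fetch_from_index {
    my ($idx_name, $id) = @_;
    require Bio::Index::Fasta;
    my $inx = Bio::Index::Fasta->new(-filename => index_path($idx_name));
    return $inx->fetch($id);
}

# Usage, e.g.:
#   make_fasta_index('test.fasta.idx', 'test.fasta');
#   my $seq = fetch_from_index('test.fasta.idx', 'ent1001');
#   print $seq->seq, "\n" if $seq;
```

Routing both scripts through one path helper means the writer and the reader cannot disagree about where the index lives.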

From marian.thieme at lycos.de Fri Feb 2 05:06:09 2007
From: marian.thieme at lycos.de (marian thieme)
Date: Fri, 2 Feb 2007 10:06:09 +0000
Subject: [Bioperl-l] seqDiff
Message-ID: <101051013116870@lycos-europe.com>

An HTML attachment was scrubbed...
URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070202/cb3feed1/attachment.html

From marian.thieme at lycos.de Fri Feb 2 06:37:05 2007
From: marian.thieme at lycos.de (marian thieme)
Date: Fri, 2 Feb 2007 11:37:05 +0000
Subject: [Bioperl-l] susp. header
Message-ID: <188661178024725@lycos-europe.com>

An HTML attachment was scrubbed...
URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070202/d3c3535c/attachment.html

From lubapardo at gmail.com Fri Feb 2 09:31:06 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Fri, 2 Feb 2007 15:31:06 +0100
Subject: [Bioperl-l] Problem using Bio::DB::Query::GenBank;
Message-ID: <58ff33550702020631l4e7bc59dmabcf8c72fa67a6d5@mail.gmail.com>

Hello,
(I am using bioperl-1.5.2_100, linux machine)
I am trying to get the ids of a list of genes using the module Bio::DB::Query::GenBank. I have the following code:

    use Bio::DB::Query::GenBank;
    use strict;
    use warnings;

    open (READER_1,"list.txt") || die "\n I can't open the file READER_1 !!\n";
    my @a1=<READER_1>;
    close (READER_1);

    for (my $i=0; $i<=$#a1;$i=$i+1 ) {
        my @a1_s=split/\s+/,$a1[$i];
        my $query_string = ' Homo Sapiens[Organism] AND $a1[$i] ';
        my $query = Bio::DB::Query::GenBank->new(-db=>'Protein',
                                                 -query=>$query_string );
        my $count = $query->count;
        my @ids = $query->ids;
        print " gene: $a1[$i] first id is $ids[0] o no? \n";

I want the program to take all the genes listed in the file list.txt and retrieve their ids from GenBank. However, the program gives me the following error:

    ------------EXCEPTION: Bio::Root::Exception -------------
    MSG: Id list has been truncated even after maxids requested
    STACK: Error::throw
    STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359
    STACK: Bio::DB::Query::WebQuery::_fetch_ids /usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:236
    STACK: Bio::DB::Query::WebQuery::ids /usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:200
    STACK: query.pl:27
    ------------------

Is it a problem to use $a1[$i] in place of the gene name?

I thank you beforehand for the attention you may pay to this message.

Regards,
Luba Pardo

From hlapp at gmx.net Fri Feb 2 10:44:02 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 2 Feb 2007 10:44:02 -0500
Subject: [Bioperl-l] susp. header
In-Reply-To: <188661178024725@lycos-europe.com>
References: <188661178024725@lycos-europe.com>
Message-ID:

You are sending HTML emails. You should configure your mailer to, ideally, just send plain text. If you really must have fancy formatted emails (i.e., HTML-formatted emails), then configure it so that the mailer sends both a plain-text and an HTML version. (Many spam filters will flag email whose body consists only of an HTML attachment.)

-hilmar

On Feb 2, 2007, at 6:37 AM, marian thieme wrote:

> why is each message I send to this list considered to have a susp. header?
>
> Marian

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cain.cshl at gmail.com Fri Feb 2 11:03:16 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 02 Feb 2007 11:03:16 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
References: <1170359746.2706.622.camel@localhost.localdomain> <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
Message-ID: <1170432196.2706.661.camel@localhost.localdomain>

Hi Hilmar,

That is a good idea; when I started down this road, it felt like there would only be a few things that I might want to allow to be different, but I think you are right that having one standard implementation that can be subclassed for legacy systems is a good thing.

Scott

On Fri, 2007-02-02 at 10:09 -0500, Hilmar Lapp wrote:
> On Feb 1, 2007, at 2:55 PM, Scott Cain wrote:
>
> > The second main change was to introduce a -flybase_compat argument when initializing the Bio::SeqIO writer, so that 'old style' cv and cvterms (that are compatible with flybase) will be used, but now the default will be to use current standards:
>
> Just my $0.02 ... obviously, Flybase may be the only organization that uses an 'old style' or any other way not compliant with 'current standards' (presumably SO), but if it's not the only one then this approach won't scale.
>
> Also, an argument -flybase_compat suggests to the unsuspecting that this is an endorsed flavor of the standard and fine to use for everyone else too.
>
> If Flybase is idiosyncratic in this way, why not make chadoxml.pm compliant with the standard as we all want it, keep it free from litter caused by usage of old versions of SO, and create a second module fb-chadoxml.pm that inherits from the first and merely overrides a few things so that it works for Flybase. This way, other organizations with similar needs can follow the path and create their own xyz-chadoxml.pm, rather than having to muck around in the chadoxml.pm that comes with the distribution.
>
> I'm not sure I fully grasp the underlying issue, so I may not make much sense here. Apologies if that's the case ...
>
> -hilmar
--
------------------------------------------------------------------------
Scott Cain, Ph. D.
cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)
216-392-3087
Cold Spring Harbor Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070202/2488afc4/attachment.bin

From bosborne11 at verizon.net Fri Feb 2 10:27:44 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 02 Feb 2007 10:27:44 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
Message-ID:

Hilmar,

I second your motion, good idea. Let's keep the standard module nice and clean.

Brian O.

On 2/2/07 10:09 AM, "Hilmar Lapp" wrote:

> and create a second module fb-chadoxml.pm that inherits from the first and merely overrides a few things so that it works for Flybase

From Kevin.M.Brown at asu.edu Fri Feb 2 10:52:20 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Fri, 2 Feb 2007 08:52:20 -0700
Subject: [Bioperl-l] Problem using Bio::DB::Query::GenBank;
References: <58ff33550702020631l4e7bc59dmabcf8c72fa67a6d5@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B402AABA1C@EX02.asurite.ad.asu.edu>

It looks like you have some problems with the code you posted.

    use Bio::DB::Query::GenBank;
    use strict;
    use warnings;

    open (READER_1,"list.txt") || die "\n I can't open the file READER_1 !!\n";
    my @a1=<READER_1>;
    close (READER_1);
    chomp @a1;    # strip the trailing newline from each gene name

    for (my $i=0; $i < @a1;$i++ ) {
        # is this necessary, as you don't seem to use it anywhere later in your code?
        my @a1_s=split/\s+/,$a1[$i];
        # you enclosed the variable in '' which means perl won't interpolate it
        # changed the query so that perl can evaluate the variable
        my $query_string = ' Homo Sapiens[Organism] AND '.$a1[$i] .' ';
        my $query = Bio::DB::Query::GenBank->new(-db=>'Protein',
                                                 -query=>$query_string );
        my $count = $query->count;
        my @ids = $query->ids;
        print " gene: $a1[$i] first id is $ids[0] o no? \n";
    }

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Luba Pardo
Sent: Friday, February 02, 2007 7:31 AM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Problem using Bio::DB::Query::GenBank;

Hello,
(I am using bioperl-1.5.2_100, linux machine)
I am trying to get the ids of a list of genes using the module Bio::DB::Query::GenBank. I have the following code:

    use Bio::DB::Query::GenBank;
    use strict;
    use warnings;

    open (READER_1,"list.txt") || die "\n I can't open the file READER_1 !!\n";
    my @a1=<READER_1>;
    close (READER_1);

    for (my $i=0; $i<=$#a1;$i=$i+1 ) {
        my @a1_s=split/\s+/,$a1[$i];
        my $query_string = ' Homo Sapiens[Organism] AND $a1[$i] ';
        my $query = Bio::DB::Query::GenBank->new(-db=>'Protein',
                                                 -query=>$query_string );
        my $count = $query->count;
        my @ids = $query->ids;
        print " gene: $a1[$i] first id is $ids[0] o no? \n";

I want to tell the program to get all the genes contained in the file list.txt and to retrieve the ids from GenBank. However the program gives me the following error:

    ------------EXCEPTION: Bio::Root::Exception -------------
    MSG: Id list has been truncated even after maxids requested
    STACK: Error::throw
    STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359
    STACK: Bio::DB::Query::WebQuery::_fetch_ids /usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:236
    STACK: Bio::DB::Query::WebQuery::ids /usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:200
    STACK: query.pl:27
    ------------------

Is that a problem if I try to use the $a1[$i] to replace the name of the gene?
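On the question above: Kevin's diagnosis can be checked without touching the network. Inside single quotes Perl passes the literal text `$a1[$i]` to Entrez instead of the gene name. What follows is a minimal sketch that builds the query by interpolation after trimming the newline each line read from list.txt still carries. `gene_query` and `first_protein_ids` are hypothetical helper names. Whether a correct query also avoids the "Id list has been truncated" exception depends on how many ids Entrez returns for it, which this sketch does not establish.

```perl
use strict;
use warnings;

# Build the Entrez query for one line of list.txt.
# A double-quoted string interpolates $gene; ' ... $a1[$i] ' would not.
sub gene_query {
    my ($gene) = @_;
    $gene =~ s/^\s+|\s+$//g;      # remove the newline and stray blanks
    return unless length $gene;    # skip empty lines
    return "Homo sapiens[Organism] AND $gene";
}

# Query GenBank protein for every gene in the list and report the first id.
sub first_protein_ids {
    my ($list_file) = @_;
    require Bio::DB::Query::GenBank;
    open my $fh, '<', $list_file or die "Cannot open $list_file: $!";
    while (my $line = <$fh>) {
        my $q = gene_query($line) or next;
        my $query = Bio::DB::Query::GenBank->new(-db    => 'protein',
                                                 -query => $q);
        my ($first) = $query->ids;
        printf "%s\t%d hits\tfirst id: %s\n",
            $q, $query->count, defined $first ? $first : 'none';
    }
    close $fh;
}

first_protein_ids($ARGV[0]) if @ARGV == 1;    # e.g. perl query.pl list.txt
```

Printing the query string next to the hit count also makes a malformed query visible before any ids are fetched.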

I thank you beforehand for the attention you may pay to this message.

Regards,
Luba Pardo

From cjfields at uiuc.edu Fri Feb 2 11:37:49 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Feb 2007 10:37:49 -0600
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <1170432196.2706.661.camel@localhost.localdomain>
References: <1170359746.2706.622.camel@localhost.localdomain> <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu> <1170432196.2706.661.camel@localhost.localdomain>
Message-ID: <64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu>

I was going to suggest maybe allowing one to switch out XML handlers/writers based on the style (à la XML::SAX), but I see that chadoxml currently uses XML::Writer and there is no next_seq() implemented. Oh well...

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From hlapp at gmx.net Fri Feb 2 11:45:30 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 2 Feb 2007 11:45:30 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu>
References: <1170359746.2706.622.camel@localhost.localdomain> <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu> <1170432196.2706.661.camel@localhost.localdomain> <64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu>
Message-ID: <3A3FF1B0-129A-410D-9DE2-D1CB28015C9A@gmx.net>

There must be at least a stub for next_seq(). It may throw a not-implemented exception, but it should not just be absent.

-hilmar

From cain.cshl at gmail.com Fri Feb 2 12:02:32 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 02 Feb 2007 12:02:32 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <3A3FF1B0-129A-410D-9DE2-D1CB28015C9A@gmx.net>
References: <1170359746.2706.622.camel@localhost.localdomain> <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu> <1170432196.2706.661.camel@localhost.localdomain> <64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu> <3A3FF1B0-129A-410D-9DE2-D1CB28015C9A@gmx.net>
Message-ID: <1170435752.2706.676.camel@localhost.localdomain>

Ah, I'll go ahead and add one, though it will just throw an exception because this is a write-only adapter.

Scott
> > -hilmar > > On Feb 2, 2007, at 11:37 AM, Chris Fields wrote: > > > I was going to suggest maybe allowing one to switch out XML > > handlers/writers based on the style (ala XML::SAX), but I see that > > chadoxml currently uses XML::Writer and there is no next_seq() > > implemented. Oh well... > > > > chris > > > > On Feb 2, 2007, at 10:03 AM, Scott Cain wrote: > > > >> Hi Hilmar, > >> > >> That is a good idea; when I started down this road, it felt like > >> there > >> would only be a few things that I might want to allow to be > >> different, > >> but I think you are right that having one standard implementation > >> that > >> can be subclassed for legacy systems is a good thing. > >> > >> Scott > >> > >> > >> On Fri, 2007-02-02 at 10:09 -0500, Hilmar Lapp wrote: > >>> On Feb 1, 2007, at 2:55 PM, Scott Cain wrote: > >>> > >>>> The second main change was to introduce a -flybase_compat argument > >>>> when > >>>> initializing the Bio::SeqIO writer, so that 'old style' cv and > >>>> cvterms > >>>> (that are compatable with flybase) will be used, but now the > >>>> default > >>>> will be to use current standards: > >>> > >>> Just my $0.02 ... obviously, Flybase may be the only organization > >>> that uses an 'old style' or any other way not compliant with > >>> 'current > >>> standards' (presumably SO), but if it's not the only one then this > >>> approach won't scale. > >>> > >>> Also, an argument -flybase_compat suggests to the unsuspecting that > >>> this is an endorsed flavor of the standard and fine to use for > >>> everyone else too. > >>> > >>> If Flybase is idiosyncratic in this way, why not make chadoxml.pm > >>> compliant with the standard as we all want it, keep it free from > >>> litter caused by usage of old versions of SO, and create a second > >>> module fb-chadoxml.pm that inherits from the first and merely > >>> overrides a few things so that it works for Flybase. 
This way, other > >>> organizations with similar needs can follow the path and create > >>> their > >>> own xyz-chadoxml.pm, rather than having to muck around in the > >>> chadoxml.pm that comes with the distribution. > >>> > >>> I'm not sure I fully grasp the underlying issue, so I may not make > >>> much sense here. Apologies if that's the case ... > >>> > >>> -hilmar > >> -- > >> --------------------------------------------------------------------- > >> --- > >> Scott Cain, Ph. D. > >> cain.cshl at gmail.com > >> GMOD Coordinator (http://www.gmod.org/) > >> 216-392-3087 > >> Cold Spring Harbor Laboratory > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > Christopher Fields > > Postdoctoral Researcher > > Lab of Dr. Robert Switzer > > Dept of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From peili at morgan.harvard.edu Fri Feb 2 10:56:56 2007 From: peili at morgan.harvard.edu (Peili Zhang) Date: Fri, 02 Feb 2007 10:56:56 -0500 Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml In-Reply-To: References: Message-ID: <1170431816.6583.47.camel@jacks> i 'third' Hilmar's opinion. flybase's fingerprint is shown in the module because i wrote it for fb's data loading task. no need to worry about flybase compatibility in making the module generic.
in fact, at flybase, i tweak the module frequently to make it work for different scenarios. cheers, peili On Fri, 2007-02-02 at 10:27, Brian Osborne wrote: > Hilmar, > > I second your motion, good idea. Let's keep the standard module nice and > clean. > > Brian O. > > > On 2/2/07 10:09 AM, "Hilmar Lapp" wrote: > > > and create a second > > module fb-chadoxml.pm that inherits from the first and merely > > overrides a few things so that it works for Flybase > > > > _______________________________________________ > Gmod-schema mailing list > Gmod-schema at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-schema > From cain.cshl at gmail.com Fri Feb 2 13:05:47 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Fri, 02 Feb 2007 13:05:47 -0500 Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml In-Reply-To: <1170431816.6583.47.camel@jacks> References: <1170431816.6583.47.camel@jacks> Message-ID: <1170439549.2706.683.camel@localhost.localdomain> Hi Peili, A little bit ago I checked in Bio::SeqIO::flybase_chadoxml, which is fairly simple. My suggestion is that, when you make tweaks for different scenarios, you turn the things you are tweaking into methods in BSIO::chadoxml and then override them in flybase_chadoxml (and commit at least the chadoxml module), to make it more flexible when other people have similar scenarios. Scott On Fri, 2007-02-02 at 10:56 -0500, Peili Zhang wrote: > i 'third' Hilmar's opinion. flybase's fingerprint is shown in the module > because i wrote it for fb's data loading task.
no need to worry about > flybase compatibility in making the module generic. in fact, at flybase, > i tweak the module frequently to make it work for different scenarios. > > cheers, > peili > > On Fri, 2007-02-02 at 10:27, Brian Osborne wrote: > > Hilmar, > > > > I second your motion, good idea. Let's keep the standard module nice and > > clean. > > > > Brian O. > > > > > > On 2/2/07 10:09 AM, "Hilmar Lapp" wrote: > > > > > and create a second > > > module fb-chadoxml.pm that inherits from the first and merely > > > overrides a few things so that it works for Flybase > > > > > > > > _______________________________________________ > > Gmod-schema mailing list > > Gmod-schema at lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/gmod-schema > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory
From cjfields at uiuc.edu Fri Feb 2 15:33:46 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 2 Feb 2007 14:33:46 -0600 Subject: [Bioperl-l] seqDiff In-Reply-To: <101051013116870@lycos-europe.com> References: <101051013116870@lycos-europe.com> Message-ID: Judging by the code, you'll have to recreate the SeqDiff while iterating through the various alleles; there is no method to remove particular variants or purge them (at least I couldn't find one). I also noticed SeqDiff doesn't support deletions/insertions either; using a null allele (no seq) or leaving out either the mutant or original allele leads to errors. I'll look into the latter, and I may try to add a method to at least purge variants and reset dna_mut(). chris On Feb 2, 2007, at 4:06 AM, marian thieme wrote: > Hi, > > is there a way to output all mutated sequences of a SeqDiff object? > Suppose I add some variants via: > > $dnamut->add_Allele($a2); > $dnamut->add_Allele($a3); > $seqDiff->add_Variant($dnamut); > > and afterwards want to access the alternative sequences via > $seqDiff->dna_mut() > > which allele is chosen when using dna_mut(), and can I > control whether to access the first or the second alternate sequence? > If yes, how can I do this? > > Regards, > Marian > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From MEC at stowers-institute.org Fri Feb 2 16:47:08 2007 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Fri, 2 Feb 2007 15:47:08 -0600 Subject: [Bioperl-l] Bio::DB::SeqFeature treamtent of tags and annotations Message-ID: Lincoln, I don't think that adding this directive is a good idea after all either. But I see that you remap the ID= to a load_id attribute, which is preserved in the Bio::DB::SeqFeature::Store database and then squelched during GFF production by NormalizedFeature::format_attributes. However, if ID is prone to clashes, then simply renaming the attribute to load_id does not prevent clashes from happening, and only courts disaster. Don't you think? I'm a little blurry on the GFF3Loader, but it looks like you're using load_id to facilitate loading parent/child features out of order. Is that right? If so, I suggest you delete all load_id features immediately after performing a load. It has no further use. Or, instead of a `round-trip-ids` directive, you might consider giving the GFF3Loader an IDAttribute option which would allow the use of the loader to preserve the ID values, but to use a named In my case, munging flybase gff, I would then use it like this: bp_seqfeature_load.PLS --fast --IDAttribute flybaseID which would preserve the ID values in the database but under the FlybaseID attribute for features so loaded. --------------------------------------------- On a related topic: I just committed this patch to Bio::DB::SeqFeature::NormalizedFeature _create_subfeatures : ensure that subfeatures get the `source` of their parent While doing this I wonder: what is the -class that subfeatures are getting from their parent...??? I left it in place. Please advise! Fix my thinking.... ---------------------------------------------- Further, I observe that Bio::Graphics::FeatureBase::new handles the -segments option by calling add_segment.
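Malcolm expected add_segment() to take the parent's start and stop from the minimum and maximum of its segments. That is plain min/max arithmetic. The minimal sketch below shows it for the same coordinates as the -segments example in this message. It uses a hypothetical segment_bounds() helper, which is not part of Bio::Graphics::FeatureBase or Bio::DB::SeqFeature.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: given segments as [start, end] pairs, return the
# (start, end) a parent feature needs in order to cover all of them.
sub segment_bounds {
    my @segments = @_;
    die "no segments given\n" unless @segments;
    # Take min/max within each pair so reversed coordinates are tolerated.
    my @starts = map { min(@{$_}) } @segments;
    my @ends   = map { max(@{$_}) } @segments;
    return (min(@starts), max(@ends));
}

my ($start, $end) = segment_bounds([100, 200], [300, 400]);
print "parent spans $start..$end\n";    # prints "parent spans 100..400"
```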
So, when I create a new Bio::DB::SeqFeature with -segments => [[100,200],[300,400]], the -segments option gets handled by Bio::Graphics::FeatureBase::new, which, as mentioned, calls add_segment. The surprising thing to me when trying to trace through the class modules and understand what is going on is that what gets run at this point is not Bio::Graphics::FeatureBase::add_segment, but rather Bio::DB::SeqFeature::add_segment, whose semantics are different in at least one regard, namely, that it does not set the start and stop of the parent feature from the min and max of the segments. I have committed a patch to Bio::Graphics::FeatureBase with a comment to this effect, and have also patched its add_segment method to copy the parent's source into the segment. I hope my commits and suggestions further the cause. Let me know if not! -- Malcolm ________________________________ From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein Sent: Tuesday, January 30, 2007 4:46 PM To: Cook, Malcolm Cc: bioperl list; lstein at cshl.org Subject: Re: Bio::DB::SeqFeature treamtent of tags and annotations I've fixed the first issue in CVS. Sorry for the inconsistency. add_tag_value(), delete_tag_value() and get_Annotations() now all work as expected. The problem with the ID column is that it is supposed to be LOCAL to the GFF3 file and is not intended to be stored in the database. In contrast, Name can survive roundtripping. Perhaps the thing to do is to add a flag to the GFF3 file that turns on ID round-tripping, e.g. ##round-trip-ids: 1 If you like this idea, I can implement it. Lincoln On 1/29/07, Cook, Malcolm < MEC at stowers-institute.org > wrote: Lincoln, Thanks for your suggestions on approach to my problems augmenting Flybase annotation. I am trying to follow them and finding the following oddities. The first issue relates to the intermix of 'annotations' and 'tag values'.
I find that Bio::DB::SeqFeature implements some of the 'tag' methods and some of the 'Annotation' methods. Here is a perl one-liner that shows values stored using add_tag_value are not retrieved with get_tag_values, but rather with get_Annotations. > perl -MBio::DB::SeqFeature -e 'my $f=Bio::DB::SeqFeature->new; $f->add_tag_value("x",666); print "get_tag_values:\t" . $f->get_tag_values("x") . "\nget_Annotations:\t" . $f->get_Annotations("x");' whose output is: get_tag_values: get_Annotations: 666 Tracing this shows me that this results from the fact that: Bio::DB::SeqFeature makes use of Bio::Graphics::FeatureBase (via Bio::DB::SeqFeature::NormalizedFeature), which does not support -tags in ->new but rather -attributes, viz: -attributes a hashref of tag value attributes, in which the key is the tag and the value is an array reference of values And though Bio::Graphics::FeatureBase purports to implement Bio::SeqFeatureI, it only partially implements the 'tag' methods (now deprecated and relegated to Bio::AnnotatableI). In particular, the '*' methods of Bio::SeqFeatureI are not implemented in Bio::Graphics::FeatureBase: has_tag * add_tag_value get_tag_values get_all_tags * remove_tag get_tagset_values get_Annotations As a result, add_tag_value and remove_tag are inherited from different modules whose understanding of tags is not the same! This one-liner : >perl -MClass::ISA -MClass::Inspector -MBio::DB::SeqFeature -e 'my @c = Class::ISA::self_and_super_path("Bio::DB::SeqFeature"); foreach my $fn (qw(add_tag_value get_tag_values)) {print "\n$fn:\t", join "\t", (grep {Class::Inspector->function_exists($_, $fn)} @c)}' confirms that they are defined in different packages, namely: add_tag_value: Bio::AnnotatableI get_tag_values: Bio::Graphics::FeatureBase Bio::AnnotatableI Proposed solution... hmmmm ..... I dunno....
maybe the following patch to Bio::Graphics::FeatureBase->add_tag_value : sub add_tag_value { my ($self,$tag,@vals) = @_; push @{$self->{attributes}{$tag}}, @vals; } It fixes my use case for now but I'm still concerned and confused about this variety of methods. Suggestions? ------------------------------------------------------------------------ - Also, I think that any "ID" in column 9 of a GFF3 flat file should be preserved through a round-trip through a Bio::DB::SeqFeature store, but this is not yet possible since any ID attribute in GFF3 column 9 is being lost by GFF3Loader, causing me to locally patch the GFF3Loader::handle_feature method to add the following: # mec at stowers-institute.org , wondering why not all attributes are # carried forward, adds ID tag in particular in service of # round-tripping ID, which, though present in database as load_id # attribute, was getting lost as itself $unreserved->{ID}= $reserved->{ID} if exists $reserved->{ID}; Poised to patch.... what d'you think? Malcolm Cook Stowers Institute for Medical Research - Kansas City, Missouri ________________________________ From: lincoln.stein at gmail.com [mailto: lincoln.stein at gmail.com ] On Behalf Of Lincoln Stein Sent: Tuesday, December 19, 2006 3:58 PM To: Cook, Malcolm Cc: bioperl list; lstein at cshl.org Subject: Re: bp_seqfeature_load / Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting Flybase annotation Hi Malcolm, Your second guess was right. The use case of augmenting an existing gene with additional splice forms isn't provided for.
You can get the functionality by making direct calls to Bio::DB::SeqFeature::Store methods: my @genes = $db->get_features_by_name('FBgn0017545'); @genes == 1 or die "Didn't get exactly one gene"; my $parent = $genes[0]; my $chr = $parent->seq_id; my $start = $parent->start; my $end = $parent->end; my $strand = $parent->strand; my $new_splice_form = $db->new_feature(-primary_tag => 'mRNA', -source => 'added', -seq_id => $chr, -strand => $strand, -start => $start+10, -end => $end, ); $parent->add_SeqFeature($new_splice_form); for my $pos ([$start+10,$start+100],[$start+200,$end]) { my ($e_start,$e_end) = @$pos; my $exon = Bio::DB::SeqFeature->new(-primary_tag => 'exon', -store => $db, -seq_id => $chr, -strand => $strand, -start => $e_start, -end => $e_end); $new_splice_form->add_SeqFeature($exon); } I found a bug in updating the seqfeature database when I wrote this script, so you'll have to get the latest bioperl-live. I think you can use this to write a splice form updating script. In order to support the idea of adding new splice forms to an existing gene using the GFF3 format, I will have to either modify the loader, or write a separate script (probably better to do the latter). It shouldn't be hard if you'd like to give it a try.
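A separate script could start by finding which Parent references in a new GFF3 file cannot be resolved within that file. Those references appear to be what triggers the "doesn't have a primary id" exception quoted in this thread when the parent gene was loaded in an earlier run. The plain-Perl sketch below uses a hypothetical dangling_parents() helper. It handles only simple tag=value pairs in column 9 and ignores URL escaping and the rest of the GFF3 specification. The sequence name and coordinates in the example are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the Parent IDs referenced in a batch of GFF3
# lines that are not defined by an ID attribute anywhere in the same batch.
sub dangling_parents {
    my @lines = @_;
    my (%defined, %wanted);
    for my $line (@lines) {
        my $row = $line;
        chomp $row;
        next if $row =~ /^#/ || $row !~ /\S/;    # skip directives, comments, blank lines
        my @cols = split /\t/, $row;
        next unless @cols >= 9;                  # not a feature line
        for my $pair (split /;/, $cols[8]) {
            my ($tag, $value) = split /=/, $pair, 2;
            next unless defined $value;
            if ($tag eq 'ID') {
                $defined{$value} = 1;
            }
            elsif ($tag eq 'Parent') {
                # A feature may have several parents, separated by commas.
                $wanted{$_} = 1 for split /,/, $value;
            }
        }
    }
    return sort { $a cmp $b } grep { !$defined{$_} } keys %wanted;
}

# Example batch: an analysis feature whose parent gene was loaded earlier.
my @batch = (
    "##gff-version 3\n",
    join("\t", '4', 'analysis', 'intron', 1000, 1100, '.', '+', '.',
         'ID=intron1;Parent=FBgn0017545') . "\n",
);
print "unresolved Parent: $_\n" for dangling_parents(@batch);
```

If the parent gene's own line is included in the same batch, the list comes back empty. This mirrors the observation that loading both files together succeeds.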
Lincoln On 12/19/06, Cook, Malcolm > wrote: Lincoln and fellow Bio::DB::SeqFeature travelers, I find that using bp_seqfeature_load.PLS to load subfeatures of genes already loaded using bp_seqfeature_load.PLS fails with ------------- EXCEPTION ------------- MSG: FBgn0017545 doesn't have a primary id STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682 STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663 STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372 STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345 STACK Bio::DB::SeqFeature::Store::GFF3Loader::load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242 STACK toplevel /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_load.PLS:76 Where FBgn0017545 is the ID of a gene previously loaded. I am unsure how to remedy my situation and welcome any advice on a correct or improved approach to my problem. Here's some detail if it helps. I am developing a pipeline to design microarray probes capable of distinguishing among splice variants in drosophila (using the latest Flybase dmel_r5.1 annotation). So I 1) load a filtered selection of Flybase annotation using bp_seqfeature_load. (for testing purposes, I am using a single gene's worth of annotation, FBgn0017545.gff, attached). This is done as follows: > bp_seqfeature_load.PLS --create FBgn0017545.gff 2) analyze all the genes in the database, and create GFF3 output in which each feature has a 'Parent' that is a previously loaded gene (i.e. FBgn0017545). (These features represent the unique introns, splice sites, and exonic design targets.
Output of this analysis, FBgn0017545_matd.gff, is also attached) 3) load these analysis results into the same database, as follows: > bp_seqfeature_load.PLS FBgn0017545_matd.gff It is at this point that I get the above error. However, I don't get any error and the data loads fine if I load the two files together, as follows: > bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff FBgn0017545_matd.gff) So, I suspect that either I am misunderstanding when/how to use bp_seqfeature_load.PLS or else this use case has not yet arisen and must be provided for somehow. I am running against bioperl-live Thanks for your thoughts and assistance, Malcolm Cook Database Applications Manager - Bioinformatics Stowers Institute for Medical Research - Kansas City, Missouri -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From neha_bafs at yahoo.co.in Mon Feb 5 12:59:03 2007 From: neha_bafs at yahoo.co.in (Neha Nahar) Date: Mon, 5 Feb 2007 17:59:03 +0000 (GMT) Subject: [Bioperl-l] Convert newick to nexus format Message-ID: <20070205175903.30379.qmail@web8701.mail.in.yahoo.com> Hello everyone, I am trying to convert newick tree to nexus format. 
Using the script (refered from and email from George dated Wed Sep 22 11:52:47 EDT 2004) : /*------------------------------------------------------------*/ $ cat nexus.pl #!/usr/bin/perl -w use Bio::TreeIO; ($NEWICKFILE, $NEXUSFILE) = @ARGV; print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n"; my $treeio = new Bio::TreeIO(-format => 'newick', -file => "$NEWICKFILE"); my $treeout = new Bio::TreeIO(-format => 'nexus', -file => ">$NEXUSFILE"); while(my $tree = $treeio->next_tree) { $treeout->write_tree($treeout); } exit 0; /*------------------------------------------------------------*/ Running the script through command line: Gives the following error: $ ./nexus.pl mrp-input.txt nexus.out newickfile=mrp-input.txt, nexusfile=nexus.out ------------- EXCEPTION ------------- MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170 STACK toplevel ./nexus.pl:23 -------------------------------------- Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm Questions:- 1. Please let me know if I am using the correct version. If not, please point me to the latest one. 2. Provided that the version I am using is the right one, please let me know what is wrong with the script. Thank you. Regards, Neha. -Neha Nahar " Work for cause and not for applause, live to express and not to impress !" From jason at bioperl.org Mon Feb 5 13:10:42 2007 From: jason at bioperl.org (Jason Stajich) Date: Mon, 5 Feb 2007 10:10:42 -0800 Subject: [Bioperl-l] Convert newick to nexus format In-Reply-To: <20070205175903.30379.qmail@web8701.mail.in.yahoo.com> References: <20070205175903.30379.qmail@web8701.mail.in.yahoo.com> Message-ID: <46219DCD-8C6E-4DBE-82F2-D4B58207AD54@bioperl.org> you want to write the TREE out not the TREE WRITER.
$treeout->write_tree($tree) not $treeout->write_tree($treeout); On Feb 5, 2007, at 9:59 AM, Neha Nahar wrote: > Hello everyone, > > I am trying to convert newick tree to nexus format. > Using the script (refered from and email from George dated Wed Sep > 22 11:52:47 EDT 2004) : > > /*------------------------------------------------------------*/ > > $ cat nexus.pl > #!/usr/bin/perl -w > > use Bio::TreeIO; > > ($NEWICKFILE, $NEXUSFILE) = @ARGV; > print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n"; > my $treeio = new Bio::TreeIO(-format => 'newick', -file => > "$NEWICKFILE"); > my $treeout = new Bio::TreeIO(-format => 'nexus', -file => "> > $NEXUSFILE"); > while(my $tree = $treeio->next_tree) { > $treeout->write_tree($treeout); > } > > exit 0; > > > /*------------------------------------------------------------*/ > > Running the script through command line: > Gives the following error: > > $ ./nexus.pl mrp-input.txt nexus.out > newickfile=mrp-input.txt, nexusfile=nexus.out > > ------------- EXCEPTION ------------- > MSG: Cannot call method write_tree on Bio::TreeIO object must use a > subclass > STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/ > 5.8.8/Bio/TreeIO/nexus.pm:170 > STACK toplevel ./nexus.pl:23 > > -------------------------------------- > > > Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/ > ~sendu/bioperl/Bio/TreeIO.pm > > Questions:- > > 1. Please let me know if I am using the correct version. > If not, please point me to the latest one. > > 2. Provided that the version I am using is the right one, please > let me know what is wrong with the script. > > Thank you. > Regards, > Neha. > > > > > > -Neha Nahar > " Work for cause and not for applause, live to express and not to > impress !"
-- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From nehadnahar at yahoo.co.in Mon Feb 5 13:05:26 2007 From: nehadnahar at yahoo.co.in (Neha Nahar) Date: Mon, 5 Feb 2007 18:05:26 +0000 (GMT) Subject: [Bioperl-l] Convert newick to nexus format Message-ID: <288335.22352.qm@web8412.mail.in.yahoo.com> Hello everyone, I am trying to convert newick tree to nexus format. Using the script (refered from and email from George dated Wed Sep 22 11:52:47 EDT 2004) : /*------------------------------------------------------------*/ $ cat nexus.pl #!/usr/bin/perl -w use Bio::TreeIO; ($NEWICKFILE, $NEXUSFILE) = @ARGV; print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n"; my $treeio = new Bio::TreeIO(-format => 'newick', -file => "$NEWICKFILE"); my $treeout = new Bio::TreeIO(-format => 'nexus', -file => ">$NEXUSFILE"); while(my $tree = $treeio->next_tree) { $treeout->write_tree($treeout); } exit 0; /*------------------------------------------------------------*/ Running the script through command line: Gives the following error: $ ./nexus.pl mrp-input.txt nexus.out newickfile=mrp-input.txt, nexusfile=nexus.out ------------- EXCEPTION ------------- MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170 STACK toplevel ./nexus.pl:23 -------------------------------------- Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm Questions:- 1. Please let me know if I am using the correct version. If not, please point me to the latest one. 2. Provided that the version I am using is the right one, please let me know what is wrong with the script. Thank you. Regards, Neha. -Neha Nahar " Work for cause and not for applause, live to express and not to impress !"
From hlapp at duke.edu Fri Feb 2 10:09:57 2007 From: hlapp at duke.edu (Hilmar Lapp) Date: Fri, 2 Feb 2007 10:09:57 -0500 Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml In-Reply-To: <1170359746.2706.622.camel@localhost.localdomain> References: <1170359746.2706.622.camel@localhost.localdomain> Message-ID: <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu> On Feb 1, 2007, at 2:55 PM, Scott Cain wrote: > The second main change was to introduce a -flybase_compat argument > when > initializing the Bio::SeqIO writer, so that 'old style' cv and cvterms > (that are compatable with flybase) will be used, but now the default > will be to use current standards: Just my $0.02 ... obviously, Flybase may be the only organization that uses an 'old style' or any other way not compliant with 'current standards' (presumably SO), but if it's not the only one then this approach won't scale. Also, an argument -flybase_compat suggests to the unsuspecting that this is an endorsed flavor of the standard and fine to use for everyone else too. If Flybase is idiosyncratic in this way, why not make chadoxml.pm compliant with the standard as we all want it, keep it free from litter caused by usage of old versions of SO, and create a second module fb-chadoxml.pm that inherits from the first and merely overrides a few things so that it works for Flybase. This way, other organizations with similar needs can follow the path and create their own xyz-chadoxml.pm, rather than having to muck around in the chadoxml.pm that comes with the distribution. I'm not sure I fully grasp the underlying issue, so I may not make much sense here. Apologies if that's the case ...
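Hilmar proposes a standards-compliant base module plus a thin FlyBase subclass that overrides only what differs. This is the usual template-method pattern: the base class routes each site-specific detail through a small method, and the subclass redefines just those methods. The sketch below is plain Perl with hypothetical names (My::ChadoXML, My::ChadoXML::FlyBase, feature_type_cv). It is not the actual Bio::SeqIO::chadoxml code, and the vocabulary names and XML shape are placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical standards-compliant base writer (not the real Bio::SeqIO::chadoxml).
package My::ChadoXML;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Hook method: the one detail a site-specific subclass may need to change.
# 'standard_cv' is a placeholder name, not a real Chado vocabulary.
sub feature_type_cv { return 'standard_cv' }

# Main routine: calls the hook rather than hard-coding the value, so a
# subclass changes the output without copying this method.
# The XML shape is simplified and made up for illustration.
sub feature_xml {
    my ($self, $type) = @_;
    return sprintf '<type_id cv="%s">%s</type_id>', $self->feature_type_cv, $type;
}

# Hypothetical FlyBase-flavoured subclass: overrides only the hook.
package My::ChadoXML::FlyBase;
our @ISA = ('My::ChadoXML');

sub feature_type_cv { return 'legacy_cv' }    # placeholder for an older vocabulary

package main;

for my $class ('My::ChadoXML', 'My::ChadoXML::FlyBase') {
    print $class->new->feature_xml('gene'), "\n";
}
```

Running it prints one line per class. Only the cv attribute differs, because feature_xml() is inherited unchanged; another site could add its own subclass the same way without touching the base module.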
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu : =========================================================== From jason at bioperl.org Mon Feb 5 14:43:09 2007 From: jason at bioperl.org (Jason Stajich) Date: Mon, 5 Feb 2007 11:43:09 -0800 Subject: [Bioperl-l] Convert newick to nexus format In-Reply-To: <209988.63723.qm@web8715.mail.in.yahoo.com> References: <209988.63723.qm@web8715.mail.in.yahoo.com> Message-ID: <9E477447-67F5-46CA-BCC1-47BB4170EC76@bioperl.org> please cc the mailing list when asking a question or followup. Sorry I don't know what you are doing wrong - you didn't resend your code so I don't know if you still have a typo. This code works fine for me use Bio::TreeIO; use strict; my ($filein,$fileout) = @ARGV; my ($format,$oformat) = qw(newick nexus); my $in = Bio::TreeIO->new(-file => $filein, -format => $format); my $out= Bio::TreeIO->new(-format => $oformat, -file => ">$fileout"); while( my $t = $in->next_tree ) { $out->write_tree($t); } On Feb 5, 2007, at 11:24 AM, Neha Nahar wrote: > Thank you very much for the reply. > > I fixed the code as per your suggestion,but now am getting a > different error: > > $ ./nexus.pl mrp-input.txt nexus.out > newickfile=mrp-input.txt, nexusfile=nexus.out > > ------------- EXCEPTION ------------- > MSG: Cannot call method write_tree on Bio::TreeIO object must use a > subclass > STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/ > 5.8.8/Bio/TreeIO/nexus.pm:170 > STACK toplevel ./nexus.pl:23 > > -------------------------------------- > > Please help me out with this script. > > Thank you. > Regards, > Neha > > > > > Jason Stajich wrote: you want to write the TREE > out not the TREE WRITER. > > > $treeout->write_tree($tree) > > not > $treeout->write_tree($treeout); > > On Feb 5, 2007, at 9:59 AM, Neha Nahar wrote: > > Hello everyone, > > > I am trying to convert newick tree to nexus format. 
> Using the script (refered from and email from George dated Wed Sep > 22 11:52:47 EDT 2004) : > > > /*------------------------------------------------------------*/ > > > $ cat nexus.pl > #!/usr/bin/perl -w > > > use Bio::TreeIO; > > > ($NEWICKFILE, $NEXUSFILE) = @ARGV; > print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n"; > my $treeio = new Bio::TreeIO(-format => 'newick', -file => > "$NEWICKFILE"); > my $treeout = new Bio::TreeIO(-format => 'nexus', -file => "> > $NEXUSFILE"); > while(my $tree = $treeio->next_tree) { > $treeout->write_tree($treeout); > } > > > exit 0; > > > > > /*------------------------------------------------------------*/ > > > Running the script through command line: > Gives the following error: > > > $ ./nexus.pl mrp-input.txt nexus.out > newickfile=mrp-input.txt, nexusfile=nexus.out > > > ------------- EXCEPTION ------------- > MSG: Cannot call method write_tree on Bio::TreeIO object must use a > subclass > STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/ > 5.8.8/Bio/TreeIO/nexus.pm:170 > STACK toplevel ./nexus.pl:23 > > > -------------------------------------- > > > > > Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/ > ~sendu/bioperl/Bio/TreeIO.pm > > > Questions:- > > > 1. Please let me know if I am using the correct version. > If not, please point me to the latest one. > > > 2. Provided that the version I am using is the right one, please > let me know what is wrong with the script. > > > Thank you. > Regards, > Neha. > > > > > > > > > > > -Neha Nahar > " Work for cause and not for applause, live to express and not to > impress !"
> > > -- > Jason Stajich > Miller Research Fellow > University of California, Berkeley > lab: 510.642.8441 > > http://pmb.berkeley.edu/~taylor/people/js.html > http://fungalgenomes.org/ > > > > > > > -Neha Nahar > " Work for cause and not for applause, live to express and not to > impress !" -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From nehadnahar at yahoo.co.in Mon Feb 5 14:58:08 2007 From: nehadnahar at yahoo.co.in (Neha Nahar) Date: Mon, 5 Feb 2007 19:58:08 +0000 (GMT) Subject: [Bioperl-l] Convert newick to nexus format In-Reply-To: <99196.23114.qm@web8711.mail.in.yahoo.com> Message-ID: <36024.1212.qm@web8405.mail.in.yahoo.com> Hi, Thank you for the code. I tried it but I still get the same exception. ------------- EXCEPTION ------------- MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170 STACK toplevel ./nexus1.pl:18 Please find attached the perl file (nexus.pl). Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm Please let me know if I am using the correct version. If not, please point me to the latest one. Thank you. Regards, nnahar Jason Stajich wrote: please cc the mailing list when asking a question or followup. Sorry I don't know what you are doing wrong - you didn't resend your code so I don't know if you still have a typo.
This code works fine for me use Bio::TreeIO; use strict; my ($filein,$fileout) = @ARGV; my ($format,$oformat) = qw(newick nexus); my $in = Bio::TreeIO->new(-file => $filein, -format => $format);my $out= Bio::TreeIO->new(-format => $oformat, -file => ">$fileout"); while( my $t = $in->next_tree ) { $out->write_tree($t); } On Feb 5, 2007, at 11:24 AM, Neha Nahar wrote: Thank you very much for the reply. I fixed the code as per your suggestion,but now am getting a different error: $ ./nexus.pl mrp-input.txt nexus.out newickfile=mrp-input.txt, nexusfile=nexus.out ------------- EXCEPTION ------------- MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170 STACK toplevel ./nexus.pl:23 -------------------------------------- Please help me out with this script. Thank you. Regards, Neha Jason Stajich wrote: you want to write the TREE out not the TREE WRITER. $treeout->write_tree($tree) not $treeout->write_tree($treeout); On Feb 5, 2007, at 9:59 AM, Neha Nahar wrote: Hello everyone, I am trying to convert newick tree to nexus format. 
Using the script (refered from and email from George dated Wed Sep 22 11:52:47 EDT 2004) : /*------------------------------------------------------------*/ $ cat nexus.pl #!/usr/bin/perl -w use Bio::TreeIO; ($NEWICKFILE, $NEXUSFILE) = @ARGV; print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n"; my $treeio = new Bio::TreeIO(-format => 'newick', -file => "$NEWICKFILE"); my $treeout = new Bio::TreeIO(-format => 'nexus', -file => ">$NEXUSFILE"); while(my $tree = $treeio->next_tree) { $treeout->write_tree($treeout); } exit 0; /*------------------------------------------------------------*/ Running the script through command line: Gives the following error: $ ./nexus.pl mrp-input.txt nexus.out newickfile=mrp-input.txt, nexusfile=nexus.out ------------- EXCEPTION ------------- MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170 STACK toplevel ./nexus.pl:23 -------------------------------------- Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm Questions:- 1. Please let me know if I am using the correct version. If not, please point me to the latest one. 2. Provided that the version I am using is the right one, please let me know what is wrong with the script. Thank you. Regards, Neha. -Neha Nahar " Work for cause and not for applause, live to express and not to impress !" --------------------------------- Here?s a new way to find what you're looking for - Yahoo! Answers -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ -Neha Nahar " Work for cause and not for applause, live to express and not to impress !" --------------------------------- Here?s a new way to find what you're looking for - Yahoo! 
Answers -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ -Neha Nahar " Work for cause and not for applause, live to express and not to impress !" --------------------------------- Here?s a new way to find what you're looking for - Yahoo! Answers -Neha Nahar " Work for cause and not for applause, live to express and not to impress !" --------------------------------- Here?s a new way to find what you're looking for - Yahoo! Answers -------------- next part -------------- A non-text attachment was scrubbed... Name: nexus.pl Type: application/x-perl Size: 811 bytes Desc: 1389215665-nexus.pl Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070205/c6453dcf/attachment.bin From jason at bioperl.org Mon Feb 5 17:15:52 2007 From: jason at bioperl.org (Jason Stajich) Date: Mon, 5 Feb 2007 14:15:52 -0800 Subject: [Bioperl-l] Convert newick to nexus format In-Reply-To: <36024.1212.qm@web8405.mail.in.yahoo.com> References: <36024.1212.qm@web8405.mail.in.yahoo.com> Message-ID: <84703383-600F-42F4-A860-DD0D1C43EE83@bioperl.org> Something is wrong with your install I am guessing - can you run the tests? Go to bioperl directory: $ perl t/TreeIO.t can you describe how you installed bioperl? On Feb 5, 2007, at 11:58 AM, Neha Nahar wrote: > > Hi, > Thank you for the code. > I tried it but I still get the same exception. > > ------------- EXCEPTION ------------- > MSG: Cannot call method write_tree on Bio::TreeIO object must use a > subclass > STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/ > 5.8.8/Bio/TreeIO/nexus.pm:170 > STACK toplevel ./nexus1.pl:18 > > > Please find attached the perl file(nexus.pl). > > > Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/ > ~sendu/bioperl/Bio/TreeIO.pm > > Please let me know if I am using the correct version.If not, please > point me to the latest one. > > Thank you. 
> Regards, > nnahar > > > > > > Jason Stajich wrote:please cc the mailing list > when asking a question or followup. > > Sorry I don't know what you are doing wrong - you didn't resend > your code so I don't know if you still have a typo. > > This code works fine for me > > use Bio::TreeIO; > use strict; > my ($filein,$fileout) = @ARGV; > my ($format,$oformat) = qw(newick nexus); > my $in = Bio::TreeIO->new(-file => $filein, -format => $format);my > $out= Bio::TreeIO->new(-format => $oformat, -file => ">$fileout"); > > > while( my $t = $in->next_tree ) { > $out->write_tree($t); > } > > > > On Feb 5, 2007, at 11:24 AM, Neha Nahar wrote: > > Thank you very much for the reply. > > > I fixed the code as per your suggestion,but now am getting a > different error: > > > $ ./nexus.pl mrp-input.txt nexus.out > newickfile=mrp-input.txt, nexusfile=nexus.out > > > ------------- EXCEPTION ------------- > MSG: Cannot call method write_tree on Bio::TreeIO object must use a > subclass > STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/ > 5.8.8/Bio/TreeIO/nexus.pm:170 > STACK toplevel ./nexus.pl:23 > > > -------------------------------------- > > > Please help me out with this script. > > > Thank you. > Regards, > Neha > > > > > > > > > Jason Stajich wrote: you want to write the TREE > out not the TREE WRITER. > > > > > $treeout->write_tree($tree) > > > not > $treeout->write_tree($treeout); > > > On Feb 5, 2007, at 9:59 AM, Neha Nahar wrote: > > > Hello everyone, > > > > > I am trying to convert newick tree to nexus format. 
> Using the script (refered from and email from George dated Wed Sep > 22 11:52:47 EDT 2004) : > > > > > /*------------------------------------------------------------*/ > > > > > $ cat nexus.pl > #!/usr/bin/perl -w > > > > > use Bio::TreeIO; > > > > > ($NEWICKFILE, $NEXUSFILE) = @ARGV; > print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n"; > my $treeio = new Bio::TreeIO(-format => 'newick', -file => > "$NEWICKFILE"); > my $treeout = new Bio::TreeIO(-format => 'nexus', -file => "> > $NEXUSFILE"); > while(my $tree = $treeio->next_tree) { > $treeout->write_tree($treeout); > } > > > > > exit 0; > > > > > > > > > /*------------------------------------------------------------*/ > > > > > Running the script through command line: > Gives the following error: > > > > > $ ./nexus.pl mrp-input.txt nexus.out > newickfile=mrp-input.txt, nexusfile=nexus.out > > > > > ------------- EXCEPTION ------------- > MSG: Cannot call method write_tree on Bio::TreeIO object must use a > subclass > STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/ > 5.8.8/Bio/TreeIO/nexus.pm:170 > STACK toplevel ./nexus.pl:23 > > > > > -------------------------------------- > > > > > > > > > Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/ > ~sendu/bioperl/Bio/TreeIO.pm > > > > > Questions:- > > > > > 1. Please let me know if I am using the correct version. > If not, please point me to the latest one. > > > > > 2. Provided that the version I am using is the right one, please > let me know what is wrong with the script. > > > > > Thank you. > Regards, > Neha. > > > > > > > > > > > > > > > > > > > > > -Neha Nahar > " Work for cause and not for applause, live to express and not to > impress !" > > > > > --------------------------------- > Here?s a new way to find what you're looking for - Yahoo! 
Answers > > > > > -- > Jason Stajich > Miller Research Fellow > University of California, Berkeley > lab: 510.642.8441 > > > http://pmb.berkeley.edu/~taylor/people/js.html > http://fungalgenomes.org/ > > > > > > > > > > > > > -Neha Nahar > " Work for cause and not for applause, live to express and not to > impress !" > > > --------------------------------- > Here?s a new way to find what you're looking for - Yahoo! Answers > > > -- > Jason Stajich > Miller Research Fellow > University of California, Berkeley > lab: 510.642.8441 > > http://pmb.berkeley.edu/~taylor/people/js.html > http://fungalgenomes.org/ > > > > > > > > > -Neha Nahar > " Work for cause and not for applause, live to express and not > to impress !" > > --------------------------------- > Here?s a new way to find what you're looking for - Yahoo! Answers > > > -Neha Nahar > " Work for cause and not for applause, live to express and not to > impress !" > > --------------------------------- > Here?s a new way to find what you're looking for - Yahoo! Answers > From lzhtom at hotmail.com Mon Feb 5 22:31:56 2007 From: lzhtom at hotmail.com (zhihua li) Date: Tue, 06 Feb 2007 03:31:56 +0000 Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file? In-Reply-To: Message-ID: Thanks a lot! After checking out the script bp_index, I changed the syntax to: my $inx = Bio::Index::Fasta->new("test.fasta.idx", 'WRITE'); $inx->make_index("test.fasta"); Now I have a index file test.fasta.idx in my current directory. And I can use it in my later script by saying my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx"); So now everything is OK. But I don't understand why I have to use that syntax. And why the syntax provided in the document didn't work? >From: Jason Stajich >To: zhihua li >CC: bioperl-l at lists.open-bio.org, arokfl at yahoo.com >Subject: Re: [Bioperl-l] Bio::index::Fasta- where's the indexed file? 
>Date: Thu, 1 Feb 2007 22:24:44 -0800
>
>I don't think BIOPERL_INDEX does anything in the module, so that
>documentation is not quite right. The env variable is used in the
>scripts/index/bp_index and bp_fetch scripts, so maybe a cut+paste job
>went bad somewhere.
>
>You need to specify the full path you want with -filename; you can
>just prepend BIOPERL_INDEX to the filename, like:
>-filename => $ENV{BIOPERL_INDEX}."/$index"
>
>-jason
>On Feb 1, 2007, at 7:27 PM, zhihua li wrote:
>
> > Sorry guys, the former empty mail was sent out by mistake.
> >
> > I'm using Bio::Index::Fasta to index a file containing lots of
> > sequences in fasta format. All is fine except one thing.
> >
> > According to the bioperl tutorial and the documentation, the following
> > code will make an indexed file:
> >
> > my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx",
> >                                  -write_flag => 1);
> > $inx->make_index("test.fasta");
> >
> > And in another script I can access the indexed file by saying
> >
> > $ENV{BIOPERL_INDEX} = "."; # find index in current directory
> > my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx");
> > my $seq = $inx->fetch("ent1001"); # fetch the sequence named ent1001
> >
> > However, after running the first script, I cannot find a new file
> > test.fasta.idx in my current directory. And, not surprisingly, when
> > I ran the second script, perl told me it couldn't find
> > "test.fasta.idx".
> >
> > What's going on here?
> >
> > Thanks a lot!
>
>--
>Jason Stajich
>Miller Research Fellow
>University of California, Berkeley
>lab: 510.642.8441
>http://pmb.berkeley.edu/~taylor/people/js.html
>http://fungalgenomes.org/

From johnston at biochem.ucl.ac.uk Tue Feb 6 06:52:08 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Tue, 6 Feb 2007 11:52:08 +0000 (GMT)
Subject: [Bioperl-l] RNA folding
Message-ID:

Hello,

I've just joined the list. I'm a Bioinformatics PhD student at Essex
University doing transcriptomics-related things: mainly microarray
analysis and, more recently, RNA structure prediction.

I was thinking about having a go at writing a bioperl-run wrapper around
some of the structure prediction stuff, but according to the wiki this is
already being done (at least for the Vienna tools). I spoke to Albert
Vilella at the EBI the other day and he said Chris Fields was the man to
speak to. So could he (or anyone) let me know what the current state of
RNA structure prediction tools in bioperl is?
Cheers,
Cass xx

From marian.thieme at lycos.de Tue Feb 6 08:52:10 2007
From: marian.thieme at lycos.de (Marian Thieme)
Date: Tue, 06 Feb 2007 14:52:10 +0100
Subject: [Bioperl-l] dbSNP
Message-ID: <45C8880A.7030702@lycos.de>

Hello all,

I looked in the documentation for a method/class/function/script that can
generate a SNP assay suitable for submission to dbSNP
(http://www.ncbi.nlm.nih.gov/projects/SNP/
http://www.ncbi.nlm.nih.gov/projects/SNP/how_to_submit.html).

I didn't find such code, but I noticed that there is at least an XML
parser to read dbSNP reports. Does anybody know if there is also an
output class to generate dbSNP reports? I think at least the SNP assay
section would be worth implementing. This example is given by NCBI:

TYPE:SNPASSAY
HANDLE:WI
BATCH: 1.98
MOLTYPE:Genomic
METHOD:RESEQ
SYN NAMES:WI-SNP,DnaId,MapDna
COMMENT: Here is where some public comment that applies to the entire batch of SNPS could be put.
PRIVATE: Here is where a note to NCBI regarding processing that would not be seen by the outside, could be put. Note that these are is not exactly real SNPs, as the data were modified.
||
SNP:WI|WIAF-1234567
SYNONYM:EST4291092,EST8291092,EST7291092
ACCESSION:H30533
LENGTH:101
5'_ASSAY:GGCAGGGAAGGAAAATCCTAGGGNCAGCATTGGGGAGGGGGGGACTCTG
OBSERVED:C/T
3'_ASSAY:TAAATTTATTGGGCAACAGGCTGCAGGTGAGGGGGCTGACAGGAGGAGGGA
||
SNP:WI|WIAF-1722
SYNONYM:STS-T17494,STS-T17494,STS-T17494
ACCESSION:T17494
LENGTH:269
5'_FLANK:CTTTCCCTCATCCCCTCTTCCACCACACCATCCCGGAACAAGTGCTCCAGGATT
5'_ASSAY:CCCTGCCCACTGGCCATTTTGGAGTGTGTCC
OBSERVED:A/T
3'_ASSAY:GTGGGTAGCAATGTGGAAACCACCAGGGCCTTTGTGGAGAAAA
3'_FLANK:TGGAGGGGGTTGAGGGAGTCCCAGGAGGGGCTTATTTGAGGGCCTTTGCCACTT
GCTCATAGGCGAGCTCGATCTCCTCATCATCTGGACAGGTGGAAGCGAATTCTT
CCCGGGCGTAGGCATTGCTCAAGTACCGAT
||

Regards,
Marian

P.S. This is not in contradiction to my first request about the bracket
notation. We need both formats.
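To make the request concrete, here is a rough sketch of what rendering the per-SNP part of the NCBI example might look like. It is not an existing BioPerl class: format_snp_record is a hypothetical helper, the field names and their order are copied from the two SNP records shown above, and the batch header (TYPE, HANDLE, BATCH, ...) as well as any validation of LENGTH or the sequences are deliberately left out.

```perl
use strict;
use warnings;

# Field order copied from the SNP records in the NCBI example above.
# 5'_FLANK and 3'_FLANK only appear in the second record there, so any
# field missing from the input hash is simply skipped.
my @SNP_FIELDS = ('SNP', 'SYNONYM', 'ACCESSION', 'LENGTH',
                  "5'_FLANK", "5'_ASSAY", 'OBSERVED', "3'_ASSAY", "3'_FLANK");

# Hypothetical helper (not a BioPerl method): render one SNP record as
# "FIELD:value" lines followed by the "||" record separator.
sub format_snp_record {
    my ($snp) = @_;
    my $text = '';
    for my $field (@SNP_FIELDS) {
        next unless defined $snp->{$field};
        $text .= "$field:$snp->{$field}\n";
    }
    return $text . "||\n";
}

# Usage: the first record from the NCBI example.
my $record = format_snp_record({
    'SNP'       => 'WI|WIAF-1234567',
    'SYNONYM'   => 'EST4291092,EST8291092,EST7291092',
    'ACCESSION' => 'H30533',
    'LENGTH'    => 101,
    "5'_ASSAY"  => 'GGCAGGGAAGGAAAATCCTAGGGNCAGCATTGGGGAGGGGGGGACTCTG',
    'OBSERVED'  => 'C/T',
    "3'_ASSAY"  => 'TAAATTTATTGGGCAACAGGCTGCAGGTGAGGGGGCTGACAGGAGGAGGGA',
});
print $record;
```

Driving the output from a fixed field list keeps the record order stable and makes optional fields (the flanks) cheap to support; a real writer would presumably also need to emit the batch header and check the data against NCBI's submission rules.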
From cjfields at uiuc.edu Tue Feb 6 11:45:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Feb 2007 10:45:36 -0600
Subject: [Bioperl-l] RNA folding
In-Reply-To:
References:
Message-ID:

Actually, the only RNA tool wrappers I have written are for ERPIN,
RNAMotif, and Infernal (only the RNAMotif one is in bioperl-run CVS at
this time). I am planning on writing wrappers for Vienna, UNAFold, and a
few others, but haven't really started on them.

Here's where I'm at right now...

I am writing a new set of AnnotationI classes that describe data
positionally (Meta), which I hope will help deal with this stuff. These
would be similar in nature to Heikki's Bio::Seq::Meta classes:

http://bioperl.org/pipermail/bioperl-l/2006-December/024414.html

I would use a regular Bio::SeqI and store the structural data and
anything else (such as energy calculations) as Annotation objects in an
AnnotationCollection, and then write a series of SeqIO modules to get
data into and out of the designated structure formats, like UNAFold ct,
RNAML, and so on.
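As a rough illustration of the per-position layout Chris describes for this data (an array of hashes where $array[0] is nt 1, keyed by the type of interaction and the partner base), the sketch below turns a WUSS/dot-bracket secondary-structure string into that shape. parse_wuss is a hypothetical helper, not a BioPerl API. Bracket notation records which bases pair but not the kind of pair, so it uses a generic 'pair' tag where Chris's example has 'WC'. Pseudoknots are read from WUSS-style upper/lower-case letter pairs (A...a), and every other character is treated as single-stranded ('SS').

```perl
use strict;
use warnings;

# Closing bracket => matching opening bracket (WUSS pair symbols).
my %OPEN_FOR = (')' => '(', '>' => '<', ']' => '[', '}' => '{');

# Hypothetical helper: WUSS/dot-bracket string -> array ref of hashes,
# where element 0 describes nucleotide 1 and 'base' is the 1-based partner.
sub parse_wuss {
    my ($structure) = @_;
    my @chars     = split //, $structure;
    my @positions = map { +{ interaction => 'SS' } } @chars;   # default: single-stranded
    my %stack;    # opening symbol => stack of 1-based positions awaiting a partner

    for my $i (0 .. $#chars) {
        my $c = $chars[$i];
        my $n = $i + 1;
        if ($c =~ /[(<\[{A-Z]/) {                          # opening bracket or pseudoknot letter
            push @{ $stack{$c} }, $n;
        }
        elsif (exists $OPEN_FOR{$c} || $c =~ /[a-z]/) {    # closing symbol
            my $open    = exists $OPEN_FOR{$c} ? $OPEN_FOR{$c} : uc $c;
            my $partner = pop @{ $stack{$open} || [] };
            die "Unbalanced '$c' at position $n\n" unless defined $partner;
            $positions[$n - 1]       = { interaction => 'pair', base => $partner };
            $positions[$partner - 1] = { interaction => 'pair', base => $n };
        }
        # Any other character ('.', ',', ':', '_', '-', '~') stays 'SS'.
    }
    for my $open (sort keys %stack) {
        die "Unbalanced '$open'\n" if @{ $stack{$open} };
    }
    return \@positions;
}

# Usage: print each position's partner, or '-' if it is unpaired.
my $example = parse_wuss('((..))');
print join(' ', map { defined $_->{base} ? $_->{base} : '-' } @$example), "\n";
```

For '((..))' this prints "6 5 - - 2 1". Extra keys (modifications, tertiary contacts, and so on) could be added to each hash without changing the array layout, which is the scalability point made in this message.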
Each sequence would then be capable of holding more than one structural
Annotation (i.e. it could represent different folding pathways,
alternative RNA folds, and so on). At this point I represent the data as
an array of hashes, where $array[0] is nt 1 and the hash keys indicate
the type of interaction, the base interacted with, etc. The default text
representation would be the simple Eddy WUSS (Rfam-like) format, which
can represent some complex data (pseudoknots, for instance), is compact,
and is documented (in the Infernal manual). Tags will probably switch to
more ontologically relevant terms (probably from RNAML or the RNA
Ontology), but in general it is something like this:

[
  {'interaction' => 'WC', 'base' => 24},
  {'interaction' => 'WC', 'base' => 23},
  {'interaction' => 'SS'},
  ...
]

In this implementation every seq position would have some kind of
interaction designation, though that's open for debate, as it could just
be simple text or undef for single-stranded regions. This also scales
with the complexity of the data: if one wanted to add tertiary/quaternary
interactions, location, base modifications, remote sequence
interactions, etc., extra key/value pairs could be used. Conversely, if
one only wanted secondary structure (for drawing RNA structures, for
example), then only that data would be parsed.

If you (or anyone listening) have any suggestions, I would greatly
appreciate them.

chris

From johnsonm at gmail.com Tue Feb 6 18:53:49 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Tue, 6 Feb 2007 17:53:49 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To:
References: <451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
Message-ID:

Okay, I need to get something going for a project I'm working on.
Options:

1) Stick it all in one module: This can get a bit ugly, as Glimmer, as
opposed to GlimmerM and GlimmerHMM, does not explicitly identify itself
in the prediction report.
You can pick up on some unique things in the output file, but you don't
know what you've got until you're actually parsing it, unless you require
a format argument up front. Then you can split the parsing code up into
different functions.

2) Two modules, one for GlimmerM/GlimmerHMM and one for Glimmer2/Glimmer3,
with or without an abstract dispatch front end.

I suppose at this point, after getting my hands dirty, I'd prefer 1), with
an explicit -format => Glimmer2/3/M/HMM argument required in the
constructor. Though I'm not opposed to 2) if that is what it takes to get
it into Bioperl.

If we can achieve some sort of consensus without too much bloodshed, I'll
shoot y'all some patches and we can consider this issue checked off the
list.

On 9/20/06, Mark Johnson wrote:
>
>     I think it's going to be at least two modules, one for the
> prokaryotic stuff and one for the eukaryotic. And really, the
> prokaryotic stuff is different enough to warrant two modules. So three
> different parsers. Could do it in one, but it would be ugly and
> nasty. However, this does not preclude three parsers and one abstract
> interface, which is your excellent suggestion.
>     Oh, and excuse me, but I have a bit of a rant here, after dealing
> with parsers and pipelines for the last few months. Parsers should
> not load the whole input file into RAM to parse it. And pipelines
> using the parsers (Ensembl / biopipe) should not stuff the whole
> result set from the parser into a single array. When you're trying to
> annotate assemblies, it sucks to have to split up contigs/supercontigs
> because the whole result set won't fit into RAM on a 12 gig blade.
> Sheesh. Though this doesn't matter for bacterial genomes, as they're
> tiny (by comparison to vertebrates). There, sorry, been saving up
> that frustration for a while. No offense meant, hope I didn't tick
> anybody off. 8)
>     Torsten: You sound like you know what you're doing with respect
> to Bioperl more than I do, and I know I don't have CVS access, so I'll
> defer to you. I'd be happy to help out, though.
>
> On 9/20/06, Hilmar Lapp wrote:
> >
> > On Sep 19, 2006, at 9:13 PM, Torsten Seemann wrote:
> >
> > > I'm not sure whether to
> > >
> > > 1. parse them all under the same module, perhaps with a
> > > -format=>'glimmerXXX' parameter
> > >
> > > 2. create a single new module Glimmer2 and Glimmer3
> > >
> > > 3. create two new modules, one for Glimmer2 and one for Glimmer3,
> > > given they are different outputs both in syntax and number of
> > > output files
> > >
> > > Any advice from Bioperl 'old timers' appreciated ;-)
> >
> > If at all possible I'd favor 1), with e.g. Bio::Tools::GFF being an
> > example for how this can work.
> >
> > If this would amount to basically 4 modules stringed together into
> > one file (because the parsing code can't share much if anything
> > between the flavors), it'd still be advantageous to have a single
> > frontend module that would then dispatch.
> >
> >     -hilmar
> >
> > --
> > ===========================================================
> > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> > ===========================================================

From jason at bioperl.org Tue Feb 6 19:33:11 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 6 Feb 2007 16:33:11 -0800
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To:
References: <451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
Message-ID: <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>

I definitely vote for 1). Worst case, you have 4 separate methods if
there is no good way to condense the parsing for each format, and you
require the user to specify the format.

I have no problem with requiring the user to specify what program she
used; if we can be fancy and guess the format later (i.e.
guess format in SeqIO), then that's icing.

-jason

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/

From torsten.seemann at infotech.monash.edu.au Tue Feb 6 21:36:54 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 7 Feb 2007 13:36:54 +1100
Subject: Re: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
References: <451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
Message-ID:

> I definitely vote for 1) - worst case you have 4 separate methods if
> there is no good way to condense the parsing for each format and
> require the user to specify the format.

And make the default -format be what it currently parses, i.e.
GlimmerM/GlimmerHMM.

> I have no problem with requiring user to specify what program she
> used - if we can be fancy and guess the format later (i.e. guess
> format in SeqIO) -then that's icing.

Agreed.

>> Okay, I need to get something going for a project I'm working on.

I would normally try to help, but I am so swamped with work-work at the
moment. Just a reminder that last year I added examples of the different
Glimmer outputs to the CVS repository:

./t/data/Glimmer3.predict
./t/data/Glimmer3.detail
./t/data/GlimmerHMM.out
./t/data/Glimmer2.out
./t/data/GlimmerM.out
./t/data/glimmer.out (this was the original one)

Thanks for taking this on!
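A minimal sketch of the single front-end approach voted for in this thread (option 1), assuming a constructor that takes -format and, as Torsten suggests, defaults to GlimmerM when none is given. The package name My::GlimmerParser and the _parse_* method names are hypothetical placeholders, not the actual Bio::Tools::Glimmer API, and the per-format parsing routines themselves are omitted.

```perl
use strict;
use warnings;

package My::GlimmerParser;    # hypothetical name, not a BioPerl module

# Map each accepted -format value (compared case-insensitively) to the
# private routine that would parse that flavour of output.
my %PARSER_FOR = (
    glimmerm   => '_parse_eukaryotic',    # GlimmerM/GlimmerHMM grouped, as in option 2)
    glimmerhmm => '_parse_eukaryotic',
    glimmer2   => '_parse_glimmer2',
    glimmer3   => '_parse_glimmer3',
);

sub new {
    my ($class, %args) = @_;
    # Default to the flavour the existing parser already handles.
    my $format = defined $args{-format} ? $args{-format} : 'GlimmerM';
    my $key    = lc $format;
    die "Unknown Glimmer format '$format'\n" unless exists $PARSER_FOR{$key};
    return bless { format => $key }, $class;
}

sub glimmer_format { return $_[0]{format} }
sub parse_method   { return $PARSER_FOR{ $_[0]{format} } }

package main;

# Usage: show which routine each constructor call would dispatch to.
for my $fmt (undef, 'Glimmer3', 'GlimmerHMM') {
    my $parser = My::GlimmerParser->new(defined $fmt ? (-format => $fmt) : ());
    printf "%-10s -> %s\n", $parser->glimmer_format, $parser->parse_method;
}
```

Keeping the format-to-routine mapping in one table means guessing the format later (the "icing" Jason mentions) would only need to pick a key before the same dispatch runs.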
--Torsten

From mitch_skinner at berkeley.edu Tue Feb 6 23:37:35 2007
From: mitch_skinner at berkeley.edu (Mitch Skinner)
Date: Tue, 06 Feb 2007 20:37:35 -0800
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
Message-ID: <45C9578F.2060802@berkeley.edu>

Hello,

I'm working on an AJAX version of GBrowse (http://genome.biowiki.org),
where we're pre-rendering entire chromosomes by breaking them up into
tiles. One of the problems we have is that it takes a long time to
render all those tiles. One of the things slowing the process down (and
using lots of RAM) is rendering the gridlines, and it would make things a
lot easier (and faster) for us if we could assume that the gridlines were
the same for each tile. Since we're only rendering at a particular set of
zoom levels (which we control), I think this is a reasonable assumption.

Given the right set of zoom levels, the assumption works almost all the
time, except for one specific case. It has to do with the way draw_grid
and map_pt in Bio::Graphics::Panel handle the very first gridline.

Here's how draw_grid (in CVS HEAD) computes the first gridline:

  my $first_tick = $minor * int($self->start/$minor);

$first_tick, $minor and $self->start are in base-pair space, which is
1-based. However, if ($self->start < $minor) then $first_tick is 0.
This might not be a problem, except that $first_tick is translated into
pixel coordinates in map_pt, which expects 1-based bp coordinates. Here
are the relevant lines in map_pt:

  my $val = $flip
          ? int (0.5 + $pr - ($length - ($_- 1)) * $scale)
          : int (0.5 + ($_-$offset-1) * $scale);

This style of rounding only works for positive numbers: rounding 0.6 by
doing int(0.5 + 0.6) gives you 1 as expected, but rounding -0.6 by doing
int(0.5 + -0.6) gives you 0 instead of -1, because Perl's int() truncates
toward zero. So if the first three gridlines are at 0, 10, and 20 bp,
then (assuming $scale is 1, $offset is 0, $flip evaluates false, and the
left padding is 0) they're drawn at pixels 0, 9, and 19.
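The off-by-one described above can be reproduced in plain Perl. The sketch below is a simplified stand-in for draw_grid and map_pt, not the Bio::Graphics::Panel code itself: it assumes $scale is 1, $offset is 0, no flip and no left padding, and the helper names (round_old, round_new, bp_to_px, first_tick_old, first_tick_new, grid_pixels) are made up for illustration. It also applies the revised first_tick formula and the sign-aware rounding that this message goes on to propose, so the two schemes can be compared side by side.

```perl
use strict;
use warnings;

# Rounding as currently written in map_pt: add 0.5, then truncate.
# Perl's int() truncates toward zero, so negative values round the wrong way.
sub round_old { my ($v) = @_; return int(0.5 + $v) }

# Sign-aware rounding (half away from zero), as in the proposed map_pt change.
sub round_new { my ($v) = @_; return int($v + 0.5 * ($v <=> 0)) }

# Simplified map_pt: 1-based bp coordinate -> pixel, assuming $scale = 1,
# $offset = 0, no flip and no left padding.
sub bp_to_px { my ($bp, $round) = @_; return $round->($bp - 1) }

# First gridline: current and proposed draw_grid formulas.
sub first_tick_old { my ($start, $minor) = @_; return $minor * int($start / $minor) }
sub first_tick_new { my ($start, $minor) = @_; return 1 + $minor * int(($start - 1) / $minor) }

# Pixel positions of the first three gridlines for a panel starting at bp 1,
# with minor ticks every 10 bp.
sub grid_pixels {
    my ($first_tick, $round) = @_;
    my $first = $first_tick->(1, 10);
    return map { bp_to_px($first + 10 * $_, $round) } 0 .. 2;
}

print "current:  ", join(', ', grid_pixels(\&first_tick_old, \&round_old)), "\n";
print "proposed: ", join(', ', grid_pixels(\&first_tick_new, \&round_new)), "\n";
```

This prints "current:  0, 9, 19" and "proposed: 0, 10, 20". Under the stated assumptions, the current scheme's uneven first interval comes from the tick at bp 0 mapping to -1 and being rounded up to pixel 0; starting the ticks at bp 1 keeps every interval at 10 pixels.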
I think that there should be gridlines at pixels 0, 10, and 20. The fact that currently the first interval is 9 pixels and the second is 10 pixels is breaking my hopeful assumption about the gridlines.

AFAICT my problems are solved if we make two changes. Change the above line from draw_grid to this:

    my $first_tick = 1 + $minor * int(($start - 1)/$minor);

and change the lines from map_pt to this:

    my $val = $flip
        ? ($pr - ($length - ($_- 1)) * $scale)
        : (($_-$offset-1) * $scale);
    $val = int($val + .5 * ($val <=> 0));

Does this make sense? If people agree that these changes are right then I can also produce a proper patch if y'all would prefer that.

Regards,
Mitch

From lstein at cshl.edu Wed Feb 7 07:17:22 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 7 Feb 2007 07:17:22 -0500
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <45C9578F.2060802@berkeley.edu>
References: <45C9578F.2060802@berkeley.edu>
Message-ID: <6dce9a0b0702070417k692fedcchfac5b1f0e01f72bd@mail.gmail.com>

Hi Mitch,

Zero is not a forbidden coordinate, since gbrowse also works on genetic maps which have negative and floating point coordinates. You've simply picked up a boundary case where the rounding isn't working properly. I will fix this now.

Lincoln

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From lstein at cshl.edu Wed Feb 7 07:18:40 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 7 Feb 2007 07:18:40 -0500
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <45C9578F.2060802@berkeley.edu>
References: <45C9578F.2060802@berkeley.edu>
Message-ID: <6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com>

However, I'm also very interested in why grid-drawing takes so long. When I've profiled drawing, neither grid drawing nor map_pt() consume any significant amount of time.

Lincoln

From johnsonm at gmail.com Wed Feb 7 11:50:05 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 7 Feb 2007 10:50:05 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
Message-ID:

Well, each format has some unique features. If the user declines to specify the format, I can figure it out, but it will probably involve scanning the input file twice. I'll take a look.

I can do all the parsing in one function; in fact I have, just to see how nasty it would end up being. I just can't stomach having the code that tightly coupled and hard to read. In the end it'll probably be three functions: GlimmerM/HMM are pretty close. Maybe two: Glimmer2 and Glimmer3 aren't *that* different, either.

On 2/6/07, Jason Stajich wrote:
>
> I definitely vote for 1) - worst case you have 4 separate methods if there
> is no good way to condense the parsing for each format and require the user
> to specify the format.
>
> I have no problem with requiring user to specify what program she used -
> if we can be fancy and guess the format later (i.e. guess format in SeqIO)
> -then that's icing.
>
> -jason

From adsj at novozymes.com Wed Feb 7 12:11:32 2007
From: adsj at novozymes.com (Adam Sjøgren)
Date: Wed, 07 Feb 2007 18:11:32 +0100
Subject: [Bioperl-l] Bioperl 1.4 to 1.5.2, adding qualifiers to Bio::Seq-objects
Message-ID: <8764adoptn.fsf@topper.koldfront.dk>

Hi.
I am transitioning from Bioperl 1.4 to 1.5.2, and the qualifiers I add to features in Bio::Seq objects have stopped appearing when I output them as EMBL or GenBank files.

Below is a test script that exercises the problem.

I guess I should be doing something else when adding qualifiers now with 1.5.2 (reading an EMBL file with Bio::SeqIO and outputting it again of course works perfectly). But I can't deduce what from perldoc Bio::SeqFeature::Generic: it still lists the add_tag_value method, and calling it doesn't croak or warn.

I have found some comments on this in the release notes of 1.5.0? on the Bioperl wiki, but I must admit I wasn't able to extract what methods I should be calling instead.

If someone could point me to the relevant documentation or tell me what method to use instead, I would be happy as a clam.

  Best regards,

    Adam

===

use Test::More tests=>2;

use strict;
use warnings;

use Bio::Seq;
use Bio::SeqFeature::Generic;
use IO::String;
use Bio::SeqIO;

my $seq=Bio::Seq->new(
                      -seq=>'actgactgactg',
                     );
$seq->display_id('D27');
$seq->accession_number('DB:D27');

my $seq_feature=Bio::SeqFeature::Generic->new(
                                              -strand=>1,
                                              -primary=>'source',
                                             );
$seq_feature->set_attributes(-start=>2, -end=>8);
$seq_feature->add_tag_value(note=>'TEST');
$seq_feature->add_tag_value(db_xref=>'DB:D27');
$seq->add_SeqFeature($seq_feature);

my $raw='';
my $fh=IO::String->new($raw);
my $out=Bio::SeqIO->new(-format=>'EMBL', -fh=>$fh);
$out->write_seq($seq);

ok($raw=~m!/note!, 'Qualifier note found');
ok($raw=~m!/db_xref!, 'Qualifier db_xref found');

===

 ?

--
Adam Sjøgren
adsj at novozymes.com

From cjfields at uiuc.edu Wed Feb 7 12:50:13 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Feb 2007 11:50:13 -0600
Subject: [Bioperl-l] Bioperl 1.4 to 1.5.2, adding qualifiers to Bio::Seq-objects
In-Reply-To: <8764adoptn.fsf@topper.koldfront.dk>
References: <8764adoptn.fsf@topper.koldfront.dk>
Message-ID:

On Feb 7, 2007, at 11:11 AM, Adam Sjøgren wrote:

> Hi.
> ...

This works for me using bioperl-live (Mac OS X):

ok 1 - Qualifier note found
ok 2 - Qualifier db_xref found

If I print the string I get:

ID   DB:D27; SV 1; linear; unassigned DNA; STD; UNC; 12 BP.
XX
AC   DB:D27;
XX
XX
FH   Key             Location/Qualifiers
FH
FT   source          2..8
FT                   /db_xref="DB:D27"
FT                   /note="TEST"
XX
SQ   Sequence 12 BP; 3 A; 3 C; 3 G; 3 T; 0 other;
     actgactgac tg                                                            12
//

GenBank also works:

LOCUS       D27                       12 bp    dna     linear   UNK
ACCESSION   DB:D27
FEATURES             Location/Qualifiers
     source          2..8
                     /db_xref="DB:D27"
                     /note="TEST"
BASE COUNT        3 a      3 c      3 g      3 t
ORIGIN
        1 actgactgac tg
//

If you haven't uninstalled 1.4, make sure you aren't running 1.4 or mixing the two versions (you can check by using 'perldoc -l Bio::Root::Root').
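`perldoc -l` shows which file perldoc finds. A complementary check is to ask the running interpreter which file it actually loaded, via `%INC` after `require`. That answer also reflects any `use lib` or PERL5LIB in effect for the script, so it would catch a stray patched copy earlier in @INC. Below is a minimal sketch; `loaded_from` is a made-up helper. It uses core modules so it runs without BioPerl. On a BioPerl machine one would pass e.g. Bio::Seq, Bio::SeqFeature::Generic and Bio::Root::Root instead.

```perl
use strict;
use warnings;

# Return { Module::Name => path actually loaded } for each module given.
sub loaded_from {
    my (@modules) = @_;
    my %path;
    for my $module (@modules) {
        (my $file = "$module.pm") =~ s{::}{/}g;   # Bio::Seq -> Bio/Seq.pm
        require $file;                            # dies if the module is not found
        $path{$module} = $INC{$file};             # file Perl really used
    }
    return \%path;
}

# Core modules so the sketch is runnable anywhere.
my $paths = loaded_from('List::Util', 'File::Spec');
printf "%-12s %s\n", $_, $paths->{$_} for sort keys %$paths;
```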
chris

From cjfields at uiuc.edu Wed Feb 7 13:04:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Feb 2007 12:04:33 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To:
References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
Message-ID: <91A3A651-C0D5-495F-941F-05B8AA0DDA60@uiuc.edu>

On Feb 7, 2007, at 10:50 AM, Mark Johnson wrote:

> Well, each format has some unique features. If the user declines to
> specify the format, I can figure it out, but it will probably involve
> scanning the input file twice. I'll take a look.
> ...

I don't see a problem with passing off the parse to a defined class method, either right off or mid-parse. I'm doing something like this with a revamped GenBank parser:

    # declare local to module
    my %GLIMMER_METHODS = (
        'GlimmerHMM' => '_parsehmm',
        'Glimmer'    => '_parsenormal',
        # ....others if needed
        '_DEFAULT_'  => '_parseabnormal',
    );

Then either preparse part of the file using _readline() to determine the format, or use -format and bypass preparsing:

    sub next_thingy {
        my $self = shift;
        # ... ($format comes from -format, if the user supplied one)
        if (!$format) {
            while (my $line = $self->_readline()) {
                if ($line =~ m{(something)}) {
                    $format = $1;
                    $self->_pushback($line);
                    last;
                }
            }
        }
        my $method = (exists $GLIMMER_METHODS{$format})
                   ? $GLIMMER_METHODS{$format}
                   : $GLIMMER_METHODS{'_DEFAULT_'};  # fallback to this one
        return $self->$method();  # hand off parsing flow to the proper parser
    }

    # all parser variants would have this structure:
    sub _parsehmm {
        my $self = shift;
        # ... init stuff here
        while (my $line = $self->_readline()) {
            # ... do stuff until END of next prediction/report
        }
        # ... return data if any
    }

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From johnston at biochem.ucl.ac.uk Wed Feb 7 13:56:52 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Wed, 7 Feb 2007 18:56:52 +0000 (GMT)
Subject: [Bioperl-l] RNA folding
In-Reply-To:
References:
Message-ID:

Thanks Chris.

Storing the interaction data as a hash according to an ontology and using an extended bracket notation as the string representation seems to make sense, but I'm still unsure how this is supposed to be attached to the Seq objects. You reckon it should be an AnnotationI?

I'm not sure I understand the distinction between annotations and features. From the docs I got the impression that Features were like annotation on bits of sequences and had a reference to the sequence to which they belong, whereas annotations don't. If that's the case though, why would RNA structure be an annotation rather than a feature? If not, what is the distinction between them? Are the positional Annotation subclasses you're developing intended to replace features? Have I got the wrong end of the stick entirely?
Cheers,
Cass

On Tue, 6 Feb 2007, Chris Fields wrote:

> Actually, the only RNA tool wrappers I have made are ones for ERPIN,
> RNAMotif, and Infernal (the only one in bioperl-run CVS at this time
> is RNAMotif). I am planning on writing up wrappers for Vienna,
> UNAFold, and a few others but haven't really started in. Here's
> where I'm at right now...
>
> I am writing up a new set of AnnotationI classes which positionally
> describe data (Meta) which I hope will help deal with this stuff.
> These would be similar in nature to Heikki's Bio::Seq::Meta classes:
>
> http://bioperl.org/pipermail/bioperl-l/2006-December/024414.html
>
> I would use a regular Bio::SeqI and store the structural data and
> anything else (such as energy calculations, etc) as Annotation
> objects in an AnnotationCollection, and then write up a series of
> SeqIO modules to get data into/out of the designated structure
> formats, like UNAfold ct, RNAML, and so on. Each sequence would then
> be capable of holding more than one structural Annotation (i.e. could
> represent different folding pathways, alternative RNA folds, and so on).
>
> At this point I represent the data as an array of hashes where $array[0]
> is nt 1 and the hash keys indicate the type of interaction, base
> interacted with, etc. The text representation would be as simple
> Eddy WUSS (Rfam-like) format by default, which is capable of
> representing some complex data (pseudoknots, for instance), is
> compact, and is documented (via the Infernal manual). Tags will
> probably switch to more ontologically relevant terms (probably from
> RNAML or RNA Ontology), but in general it is something like this:
>
> [
>  {'interaction' => 'WC',
>   'base'        => 24},
>  {'interaction' => 'WC',
>   'base'        => 23},
>  {'interaction' => 'SS'},
>  ...
> ]
>
> In this implementation every seq position would have some kind of
> interaction designation, though that's open for debate as it could
> just be simple text or undef for single-stranded regions.
>
> This is also scalable based on complexity of the data: if one wanted
> to add tert/quaternary interactions, location, base modifications,
> remote sequence interactions, etc., extra key/value pairs could be
> used. Conversely, if one only wanted sec structure (for drawing RNA
> structures, for example), then only that data would be parsed.
>
> If you (or anyone listening) have any suggestions I would greatly
> appreciate them.
>
> chris

From cjfields at uiuc.edu Wed Feb 7 17:15:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Feb 2007 16:15:44 -0600
Subject: [Bioperl-l] RNA folding
In-Reply-To:
References:
Message-ID: <7360B66F-6AF3-4CB1-8343-0A19E42AD7F8@uiuc.edu>

On Feb 7, 2007, at 12:56 PM, Caroline Johnston wrote:

> Thanks Chris.
>
> Storing the interaction data as a hash according to an ontology and
> using an extended bracket notation as the string representation seems
> to make sense, but I'm still unsure how this is supposed to be
> attached to the Seq objects. You reckon it should be an AnnotationI?

As long as it describes everything in the object and there is a reasonable way of textually representing the data, I think you can attach anything as annotation. A recent example is the addition of trees as annotation. Also, Annotation can be used to describe alignments (such as the structure consensus string in Rfam alignments), or added to SeqFeatures. The class just needs to implement AnnotatableI.

> I'm not sure I understand the distinction between annotations and
> features. From the docs I got the impression that Features were like
> annotation on bits of sequences and had a reference to the sequence to
> which they belong, whereas annotations don't. If that's the case
> though, why would RNA structure be an annotation rather than a feature?
> If not, what is the distinction between them? Are the positional Annotation
> subclasses you're developing intended to replace features?
> Have I got the wrong end of the stick entirely?
>
> Cheers,
> Cass

The key distinction between seqfeatures and annotations is that annotations are normally associated with the entire sequence record, while seqfeatures normally describe a part of the sequence (and thus have a location on the sequence). There are a few exceptions, but in general that's the case. The HOWTO gives a bit more background:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

Using annotations or seqfeatures in a case like this may be completely dependent on one's point of view. For instance, one implementation I had considered was adding an interface to Bio::Seq which would allow Seq objects to also have Bio::Structure objects, since my view is that any sequence could (optionally) have a structure associated with it. However, I reasoned that a sequence could actually have multiple structures (RNA, ssDNA, and protein can have several alternative folds or different folding pathways, for instance). Instead of splitting up each structure into individual seqfeatures (each of which would have to be tagged with the relevant structure and score info), I could have one class encompass all of that data in a reasonable way. Hence I used Annotation.

BTW, this isn't meant to replace features in any way. It would be primarily used to describe (1) a sequence as a whole, such as a tRNA sequence, (2) a seqfeature, such as a tRNA, rRNA, riboswitch, etc. in a genome sequence, or (3) a conserved structure in an alignment, such as Rfam Stockholm output.

I'll add that the option of splitting the data into seqfeatures isn't ruled out. It would be a matter of using a helper method, maybe in SeqUtils or directly in Annotation::Meta or whatever I end up calling it. I plan on adding something along those lines at some point.
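The per-position array-of-hashes layout discussed in this thread can be made concrete with a minimal sketch that converts it to and from plain dot-bracket notation. Assumptions: only nested `(`/`)` pairs and `.` are handled, so there are no pseudoknots and none of the extra WUSS bracket types. Every pair is tagged 'WC' and every unpaired base 'SS', following the example keys above, and `base` is the 1-based partner position. `to_dot_bracket` and `from_dot_bracket` are hypothetical names, not BioPerl methods.

```perl
use strict;
use warnings;

# Array-of-hashes -> dot-bracket. Index 0 is nucleotide 1; 'base' is 1-based.
sub to_dot_bracket {
    my ($pairs) = @_;
    my $str = '';
    for my $i (0 .. $#$pairs) {
        my $partner = $pairs->[$i]{base};
        if    (!defined $partner)  { $str .= '.' }   # single-stranded
        elsif ($partner > $i + 1)  { $str .= '(' }   # pairs downstream
        else                       { $str .= ')' }   # pairs upstream
    }
    return $str;
}

# Dot-bracket -> array-of-hashes, matching brackets with a stack.
sub from_dot_bracket {
    my ($str) = @_;
    my (@pairs, @open);
    my @chars = split //, $str;
    for my $i (0 .. $#chars) {
        my $c = $chars[$i];
        if ($c eq '(') {
            push @open, $i;
            $pairs[$i] = { interaction => 'WC' };    # partner filled in later
        }
        elsif ($c eq ')') {
            my $j = pop @open;
            die "Unbalanced ')' at position ", $i + 1, "\n" unless defined $j;
            $pairs[$i] = { interaction => 'WC', base => $j + 1 };
            $pairs[$j]{base} = $i + 1;
        }
        else {
            $pairs[$i] = { interaction => 'SS' };
        }
    }
    die "Unbalanced '(' remaining\n" if @open;
    return \@pairs;
}

my $hairpin = from_dot_bracket('(((....)))');
print to_dot_bracket($hairpin), "\n";                  # (((....)))
print "nt 1 pairs with nt $hairpin->[0]{base}\n";      # nt 1 pairs with nt 10
```

A stack is enough here because nested brackets close in last-opened order. Pseudoknots need separate bracket types and one stack per type.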
chris

From mitch_skinner at berkeley.edu Wed Feb 7 18:26:53 2007
From: mitch_skinner at berkeley.edu (Mitch Skinner)
Date: Wed, 07 Feb 2007 15:26:53 -0800
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <6dce9a0b0702070417k692fedcchfac5b1f0e01f72bd@mail.gmail.com>
References: <45C9578F.2060802@berkeley.edu> <6dce9a0b0702070417k692fedcchfac5b1f0e01f72bd@mail.gmail.com>
Message-ID: <45CA603D.1070901@berkeley.edu>

Lincoln Stein wrote:
> Zero is not a forbidden coordinate, since gbrowse also works on
> genetic maps which have negative and floating point coordinates.
> You've simply picked up a boundary case where the rounding isn't
> working properly. I will fix this now.

Thanks for the fix. What do you think of the following case? This is something I actually ran into. Suppose you have the original draw_grid:

    my $first_tick = $minor * int($self->start/$minor);

and my version of map_pt:

    my $val = $flip
        ? ($pr - ($length - ($_- 1)) * $scale)
        : (($_-$offset-1) * $scale);
    $val = int($val + .5 * ($val <=> 0));

and scale=0.5, offset=0, pad_left=0, flip=0, and minor=10. Our tiles are currently 1000px wide. So the first gridline will be at 0bp => -1px and the gridline at 2000bp => 1000px. So the first tile will not have a gridline at its 0th pixel but the second tile will have one there.

Last night I was thinking that this was an artifact of having gridlines start at 0bp, but now I'm thinking this is just because rounding half-pixels leaves an extra space when crossing zero. Which is not unreasonable; it just invalidates the assumption I was hoping to make that the gridlines are the same for each tile. Maybe it's just unreasonable to think that floating point calculations will give pixel-exact results. Or I may just be barking up the wrong tree entirely. Perhaps it's time to reconsider at a higher level (see my next message).
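The tile arithmetic in this message can be checked with a short sketch, using the stated assumptions: scale 0.5, offset 0, pad_left 0, no flip, minor 10 and 1000px tiles. `tick_pixels` and `tile_offsets` are hypothetical helpers, not Panel methods. In this configuration the sketch suggests the half-pixel tick at 0bp is what breaks the pattern. With ticks at 1, 11, 21, ... every `(bp - 1) * 0.5` is a whole pixel, and each tile starts with a gridline at offset 0.

```perl
use strict;
use warnings;

# Symmetric rounding from the proposed map_pt change.
sub round_symmetric { my ($x) = @_; return int($x + 0.5 * ($x <=> 0)) }

# Pixel of every minor tick from first_bp to last_bp, using the unflipped
# map_pt arithmetic with offset 0 and pad_left 0 (assumed).
sub tick_pixels {
    my (%arg) = @_;
    my @px;
    for (my $bp = $arg{first_bp}; $bp <= $arg{last_bp}; $bp += $arg{minor}) {
        push @px, round_symmetric(($bp - 1) * $arg{scale});
    }
    return @px;
}

# Gridline offsets relative to the left edge of tile number $n.
sub tile_offsets {
    my ($n, $width, @px) = @_;
    my $left = $n * $width;
    return map { $_ - $left } grep { $_ >= $left && $_ < $left + $width } @px;
}

my %common = (last_bp => 4000, minor => 10, scale => 0.5);
my @from0 = tick_pixels(first_bp => 0, %common);   # original draw_grid ticks
my @from1 = tick_pixels(first_bp => 1, %common);   # proposed 1-based ticks

for my $n (0, 1) {
    my @old_tile = tile_offsets($n, 1000, @from0);
    my @new_tile = tile_offsets($n, 1000, @from1);
    printf "tile %d: from 0bp %d lines, first at %d; from 1bp %d lines, first at %d\n",
        $n, scalar @old_tile, $old_tile[0], scalar @new_tile, $new_tile[0];
}
# tile 0: from 0bp 199 lines, first at 5; from 1bp 200 lines, first at 0
# tile 1: from 0bp 200 lines, first at 0; from 1bp 200 lines, first at 0
```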
Mitch

From mitch_skinner at berkeley.edu Wed Feb 7 18:28:11 2007
From: mitch_skinner at berkeley.edu (Mitch Skinner)
Date: Wed, 07 Feb 2007 15:28:11 -0800
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com>
References: <45C9578F.2060802@berkeley.edu> <6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com>
Message-ID: <45CA608B.80907@berkeley.edu>

Lincoln Stein wrote:
> However, I'm also very interested in why grid-drawing takes so long.
> When I've profiled drawing, neither grid drawing nor map_pt() consume
> any significant amount of time.

Well, the approach that we've been taking is to hand Bio::Graphics::Panel a fake GD object that stores all of the graphical primitives (line, rectangle, filledRectangle, etc. + their parameters) and then draws them later in chunks (for each tile, we draw all the primitives that overlap its pixel boundaries). We're doing this because trying to create a real GD object that's hundreds of millions of pixels wide takes too much RAM. But storing all the gridlines (for a whole chromosome, at a high zoom level) also takes a lot of RAM, and getting the gridlines for the current tile and translating their coordinates into the coordinate space of the tile also takes a fair amount of CPU. The gridline hack I've been experimenting with (that prompted these emails) was motivated by the hope that the gridlines were regular enough that we wouldn't have to store them explicitly, but just draw the same gridlines over and over again. It runs almost twice as fast as the version that explicitly stores the gridlines.

So the main slowdown is not in draw_grid or map_pt, but in our code that's storing/retrieving and translating the gridlines. Which we are also looking into speeding up. But the memory usage is harder to reduce; I've experimented with trying to compress the gridline data but it seems easier to just have the panel draw the grid directly.
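Below is a minimal sketch of the recording idea described above. The object accepts a few GD::Image-style drawing calls (line, rectangle and filledRectangle, each taking x1, y1, x2, y2, color), stores them, and returns the ones that overlap a tile, shifted into tile coordinates. `PrimitiveRecorder` and `ops_for_tile` are hypothetical names. A real stand-in for GD would need many more methods (string, polygon, colorAllocate, ...), and colors here are plain strings rather than GD color indices.

```perl
use strict;
use warnings;

package PrimitiveRecorder;

sub new { my ($class) = @_; return bless { ops => [] }, $class }

# Store the call name and its arguments instead of drawing anything.
sub _record {
    my ($self, $name, @args) = @_;
    push @{ $self->{ops} }, [ $name, @args ];
    return;
}

sub line            { my $self = shift; return $self->_record(line            => @_) }
sub rectangle       { my $self = shift; return $self->_record(rectangle       => @_) }
sub filledRectangle { my $self = shift; return $self->_record(filledRectangle => @_) }

# Primitives whose x-extent overlaps [$x0, $x1), with x shifted into tile space.
sub ops_for_tile {
    my ($self, $x0, $x1) = @_;
    my @out;
    for my $op (@{ $self->{ops} }) {
        my ($name, $xa, $ya, $xb, $yb, @rest) = @$op;
        my ($lo, $hi) = $xa < $xb ? ($xa, $xb) : ($xb, $xa);
        next if $hi < $x0 || $lo >= $x1;          # no horizontal overlap
        push @out, [ $name, $xa - $x0, $ya, $xb - $x0, $yb, @rest ];
    }
    return @out;
}

package main;

my $gd = PrimitiveRecorder->new;
$gd->line(10, 0, 10, 50, 'black');              # gridline in tile 0
$gd->filledRectangle(990, 5, 1010, 15, 'red');  # glyph spanning tiles 0 and 1
$gd->rectangle(1500, 0, 1600, 20, 'blue');      # glyph in tile 1

print join(' ', @$_), "\n" for $gd->ops_for_tile(1000, 2000);
# filledRectangle -10 5 10 15 red
# rectangle 500 0 600 20 blue
```

The linear scan in `ops_for_tile` touches every stored primitive for each tile. Bucketing primitives by tile index at record time would avoid that, at the cost of storing spanning primitives more than once.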
The more I read the Panel code, the more I think it would be nice to make more use of it. One of the reasons that we're trying to fool it right now is that there seem to be a number of behaviors in it (and/or in the glyphs?) that take the current image boundaries into account (drawing an arrow where a feature runs off the edge of the image, etc.). But in our browser each tile is supposed to mesh seamlessly with its neighbor, so if there's an easy way to turn off those edge-aware behaviors that would be pretty interesting. Ian has also suggested that it might be better to store less information than the full set of graphics primitives. For example, we could just store the Panel's glyph boxes and use their (pixel bound)->feature information to decide which features need to be drawn for each tile. I'm going to be spending some time reading the Bio::Graphics code in more depth. I'd also welcome suggestions from you or anyone on the list. Thanks, Mitch From sdbrown at annular.org Wed Feb 7 18:41:13 2007 From: sdbrown at annular.org (Steven Brown) Date: Wed, 7 Feb 2007 15:41:13 -0800 Subject: [Bioperl-l] Bio::Restriction::Analysis cut site problem in 1.5.2 Message-ID: <3828F5B9-BFF8-47F4-8228-3BEA036513A0@annular.org> The module seems to have trouble handling the cut-site specifiers that surround the sequence that the enzyme is specific for. The error: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Bad end parameter (22). 
End must be less than the total length of sequence (total=6) STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.8.6/ Bio/Root/Root.pm:328 STACK: Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.8.6/ Bio/PrimarySeq.pm:371 STACK: Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/perl5/ site_perl/5.8.6/Bio/Restriction/Analysis.pm:884 STACK: Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/ site_perl/5.8.6/Bio/Restriction/Analysis.pm:785 STACK: Bio::Restriction::Analysis::cut /usr/local/lib/perl5/site_perl/ 5.8.6/Bio/Restriction/Analysis.pm:369 STACK: Bio::Restriction::Analysis::cutters /usr/local/lib/perl5/ site_perl/5.8.6/Bio/Restriction/Analysis.pm:678 ---snip (my script line)--- ----------------------------------------------------------- The offending enzyme: ---snip--- <1>AcuI <2>Eco57I,Bsp6II,BspD6II,BspKT5I,Eco112I,Eco125I,FsfI <3>CTGAAG(16/14) ---snip--- If I get rid of the (16/14) the error disappears and the right sequence site is matched. It seems like maybe a decision was made not analyze enzymes with remote cut positions, or the code wouldn't throw the error...? Any information on this would be helpful. Thanks, Steve From adsj at novozymes.com Thu Feb 8 03:55:50 2007 From: adsj at novozymes.com (Adam =?iso-8859-1?Q?Sj=F8gren?=) Date: Thu, 08 Feb 2007 09:55:50 +0100 Subject: [Bioperl-l] Bioperl 1.4 to 1.5.2, adding qualifiers to Bio::Seq-objects References: <8764adoptn.fsf@topper.koldfront.dk> Message-ID: <87fy9hqb8p.fsf@topper.koldfront.dk> On Wed, 7 Feb 2007 11:50:13 -0600, Chris wrote: > This works for me using bioperl-live (Mac OS X): > ok 1 - Qualifier note found > ok 2 - Qualifier db_xref found *slaps forehead* Thanks for the test - your diagnose was spot on: > If you haven't uninstalled 1.4, make sure you aren't running 1.4 or > mixing the two versions (you can check by using 'perldoc -l > Bio::Root::Root'). 
I had a modified version of Bio::Seq and Bio::SeqFeature::Generic in my @INC (added, and promptly forgotten, writing the patch mentioned here: ). Removing those and patching 1.5.2 fixed my self-inflicted problem. Thanks again! Adam -- Adam Sj?gren adsj at novozymes.com From heikki at sanbi.ac.za Thu Feb 8 04:39:47 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Thu, 8 Feb 2007 11:39:47 +0200 Subject: [Bioperl-l] Bio::Restriction::Analysis cut site problem in 1.5.2 In-Reply-To: <3828F5B9-BFF8-47F4-8228-3BEA036513A0@annular.org> References: <3828F5B9-BFF8-47F4-8228-3BEA036513A0@annular.org> Message-ID: <200702081139.48125.heikki@sanbi.ac.za> The error comes from Bio::PrimarySeq::subseq when it tries to cut beyond an existing sequence. Maybe your sequence has a restriction site that is near the end of your sequence? This is a special case which has not been into account in Bio::Restriction::Analysis::_cuts method. The question is : should the site be be detected if its cut site is not within the studied sequence? Please submit a bugzilla bug, so this gets solved. I probably do not have time to tweak the code myself. -Heikki On Thursday 08 February 2007 01:41:13 Steven Brown wrote: > The module seems to have trouble handling the cut-site specifiers > that surround the sequence that the enzyme is specific for. The error: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Bad end parameter (22). 
End must be less than the total length > of sequence (total=6) > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.8.6/ > Bio/Root/Root.pm:328 > STACK: Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.8.6/ > Bio/PrimarySeq.pm:371 > STACK: Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/perl5/ > site_perl/5.8.6/Bio/Restriction/Analysis.pm:884 > STACK: Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/ > site_perl/5.8.6/Bio/Restriction/Analysis.pm:785 > STACK: Bio::Restriction::Analysis::cut /usr/local/lib/perl5/site_perl/ > 5.8.6/Bio/Restriction/Analysis.pm:369 > STACK: Bio::Restriction::Analysis::cutters /usr/local/lib/perl5/ > site_perl/5.8.6/Bio/Restriction/Analysis.pm:678 > ---snip (my script line)--- > ----------------------------------------------------------- > > The offending enzyme: > > ---snip--- > <1>AcuI > <2>Eco57I,Bsp6II,BspD6II,BspKT5I,Eco112I,Eco125I,FsfI > <3>CTGAAG(16/14) > ---snip--- > > If I get rid of the (16/14) the error disappears and the right > sequence site is matched. It seems like maybe a decision was made > not analyze enzymes with remote cut positions, or the code wouldn't > throw the error...? Any information on this would be helpful. 
> > Thanks, > Steve > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From cjfields at uiuc.edu Thu Feb 8 09:20:26 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 8 Feb 2007 08:20:26 -0600 Subject: [Bioperl-l] BLASTXML changes (good this time!) Message-ID: All, BLAST XML parsing should now work for any CPAN-based XML::SAX parser! XML::SAX::PurePerl (comes with XML::SAX, the slowest) XML::SAX::Expat XML::SAX::ExpatXS (the fastest) XML::LibXML::SAX XML::LibXML::SAX::Parser Grant MacLean has updated XML::SAX on CPAN to fix an XML::SAX::PurePerl bug, so using that parser will necessitate an XML::SAX upgrade. I had also found a bug in the SAX handler which chopped off a large chunk of the description for hits which is now fixed in CVS. If Sendu is out there, I think we can safely remove any dependencies beyond XML::SAX 0.15 for the next release. Should I go ahead and modify Build.PL? chris From lstein at cshl.edu Thu Feb 8 10:51:49 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 8 Feb 2007 10:51:49 -0500 Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels In-Reply-To: <45CA608B.80907@berkeley.edu> References: <45C9578F.2060802@berkeley.edu> <6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com> <45CA608B.80907@berkeley.edu> Message-ID: <6dce9a0b0702080751m210e4d44k3e5c38bfdd3ee9ea@mail.gmail.com> Hi, I like the approach you're taking (creating a fake GD object that stores the graphics primitives).
Perhaps the best thing to do is to subclass Panel itself so that it doesn't draw the gridlines (or turn gridlines off completely). Then you can draw gridlines after the fact in each tile as needed. Lincoln On 2/7/07, Mitch Skinner wrote: > > Lincoln Stein wrote: > > However, I'm also very interested in why grid-drawing takes so long. > > When I've profiled drawing, neither grid drawing nor map_pt() consume > > any significant amount of time. > Well, the approach that we've been taking is to hand > Bio::Graphics::Panel a fake GD object that stores all of the graphical > primitives (line, rectangle, filledRectangle, etc. + their parameters) > and then draws them later in chunks (for each tile, we draw all the > primitives that overlap its pixel boundaries). We're doing this because > trying to create a real GD object that's hundreds of millions of pixels > wide takes too much RAM. But storing all the gridlines (for a whole > chromosome, at a high zoom level) also takes a lot of RAM, and getting > the gridlines for the current tile and translating their coordinates > into the coordinate space of the tile also takes a fair amount of CPU. > The gridline hack I've been experimenting with (that prompted these > emails) was motivated by the hope that the gridlines were regular enough > that we wouldn't have to store them explicitly, but just draw the same > gridlines over and over again. It runs almost twice as fast as the > version that explicitly stores the gridlines. > > So the main slowdown is not in draw_grid or map_pt, but in our code > that's storing/retrieving and translating the gridlines. Which we are > also looking into speeding up. But the memory usage is harder to > reduce; I've experimented with trying to compress the gridline data but > it seems easier to just have the panel draw the grid directly. > > The more I read the Panel code, the more I think it would be nice to > make more use of it. 
One of the reasons that we're trying to fool it > right now is that there seem to be a number of behaviors in it (and/or > in the glyphs?) that take the current image boundaries into account > (drawing an arrow where a feature runs off the edge of the image, > etc.). But in our browser each tile is supposed to mesh seamlessly with > its neighbor, so if there's an easy way to turn off those edge-aware > behaviors that would be pretty interesting. > > Ian has also suggested that it might be better to store less information > than the full set of graphics primitives. For example, we could just > store the Panel's glyph boxes and use their (pixel bound)->feature > information to decide which features need to be drawn for each tile. > > I'm going to be spending some time reading the Bio::Graphics code in > more depth. I'd also welcome suggestions from you or anyone on the list. > > Thanks, > Mitch > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From Kevin.M.Brown at asu.edu Thu Feb 8 10:28:30 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 8 Feb 2007 08:28:30 -0700 Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels References: <45C9578F.2060802@berkeley.edu><6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com> <45CA608B.80907@berkeley.edu> Message-ID: <1A4207F8295607498283FE9E93B775B402AAC1D0@EX02.asurite.ad.asu.edu> > The more I read the Panel code, the more I think it would be > nice to make more use of it. One of the reasons that we're > trying to fool it right now is that there seem to be a number > of behaviors in it (and/or in the glyphs?) that take the > current image boundaries into account (drawing an arrow where > a feature runs off the edge of the image, etc.). 
But in our > browser each tile is supposed to mesh seamlessly with its > neighbor, so if there's an easy way to turn off those > edge-aware behaviors that would be pretty interesting. I think the glyphs try to deal with edges because if they didn't, then they would flow out into whatever right or left padding had been placed around the image when the panel was created. Something I've noticed is that when I create tiles for the chromosomes I'm working on the panels don't line up because the bump position in one panel is not accounted for when the next panel is drawn. From sheris at eps.berkeley.edu Thu Feb 8 12:42:27 2007 From: sheris at eps.berkeley.edu (Sheri Simmons) Date: Thu, 08 Feb 2007 09:42:27 -0800 Subject: [Bioperl-l] bioperl newbie needs help with extracting cds info Message-ID: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu> Hi, I'm a newbie to BioPerl so apologies if this is a very basic question. I am trying to parse GenBank files with the goal of creating concatenated gene lists in nucleic acid or amino acid format. It is working fine, except for one thing: I need to create gene labels incorporating information on whether the gene is on the complementary strand or not ("complement" in the CDS tag). How can I get Bioperl to tell me whether the CDS tag value includes the word "complement"? Thanks Sheri From george.heller at yahoo.com Thu Feb 8 13:54:41 2007 From: george.heller at yahoo.com (George Heller) Date: Thu, 8 Feb 2007 10:54:41 -0800 (PST) Subject: [Bioperl-l] Perl script to extract from ncbi Message-ID: <178139.85769.qm@web56506.mail.re3.yahoo.com> Hi all, I have a question regarding extracting data from Ncbi. I have a database to store the sequence data, but the files I have loaded into it, dont have a proper description line specified. Based on the accession number, I need to find out what is the genus and species name (organism name) from ncbi. I have about 1500 records for which I need to extract the names from ncbi. 
Any ideas of how I can go about writing a perl script for extracting this information from ncbi? Thanks! George. From Kevin.M.Brown at asu.edu Thu Feb 8 14:11:50 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 8 Feb 2007 12:11:50 -0700 Subject: [Bioperl-l] bioperl newbie needs help with extracting cds info References: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu> Message-ID: <1A4207F8295607498283FE9E93B775B402AAC29A@EX02.asurite.ad.asu.edu> When you extract the features, just call the strand method on each returned feature to find out. my @features = $seq->all_SeqFeatures; # keep only CDS features and print their strand for my $f (@features) { my $tag = $f->primary_tag; if ($tag eq 'CDS') { print $f->strand ."\n"; } } > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Sheri Simmons > Sent: Thursday, February 08, 2007 10:42 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] bioperl newbie needs help with > extracting cds info > > Hi, > I'm a newbie to BioPerl so apologies if this is a very basic > question. I am trying to parse GenBank files with the goal of > creating concatenated gene lists in nucleic acid or amino > acid format. It is working fine, except for one thing: I need > to create gene labels incorporating information on whether > the gene is on the complementary strand or not ("complement" > in the CDS tag). How can I get Bioperl to tell me whether the > CDS tag value includes the word "complement"?
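As Kevin's reply says, the strand is already parsed for you. A GenBank location written as complement(...) gives a feature whose strand() returns -1, so there is no need to search the location text for the word "complement". The core-Perl sketch below only shows the label-building step. The helper names, the `_c` suffix and the gene names are hypothetical. In practice, the stand-in pairs would come from each CDS feature's strand() and its 'gene' tag values.

```perl
use strict;
use warnings;

# BioPerl features report strand as 1 (forward), -1 (reverse/complement)
# or 0/undef (unknown). Map that number to a readable label.
sub strand_label {
    my ($strand) = @_;
    return 'unknown' unless defined $strand && $strand != 0;
    return $strand < 0 ? 'complement' : 'forward';
}

# Hypothetical label format: append "_c" to reverse-strand gene names.
sub gene_label {
    my ($name, $strand) = @_;
    return strand_label($strand) eq 'complement' ? "${name}_c" : $name;
}

# Stand-in data: [gene name, strand] pairs as a CDS loop might collect them.
my @cds = ([dnaN => 1], [recF => -1], [orfX => 0]);
print join("\t", gene_label(@$_), strand_label($_->[1])), "\n" for @cds;
```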
> > Thanks > Sheri > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From barry.moore at genetics.utah.edu Thu Feb 8 14:35:03 2007 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Thu, 8 Feb 2007 12:35:03 -0700 Subject: [Bioperl-l] bioperl newbie needs help with extracting cds info In-Reply-To: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu> References: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu> Message-ID: Sheri- The Bio::SeqFeature::Generic object has a 'strand' method, so you can just call strand on the CDS (or any other) feature like this. my @features = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures(); for my $feature (@features) { my $strand = $feature->strand; # -1 means complement } Barry On Feb 8, 2007, at 10:42 AM, Sheri Simmons wrote: > Hi, > I'm a newbie to BioPerl so apologies if this is a very basic > question. I am trying to parse GenBank files with the goal of > creating concatenated gene lists in nucleic acid or amino acid > format. It is working fine, except for one thing: I need to create > gene labels incorporating information on whether the gene is on the > complementary strand or not ("complement" in the CDS tag). How can I > get Bioperl to tell me whether the CDS tag value includes the word > "complement"? > > Thanks > Sheri > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From torsten.seemann at infotech.monash.edu.au Thu Feb 8 23:18:33 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 9 Feb 2007 15:18:33 +1100 Subject: [Bioperl-l] BLASTXML changes (good this time!) In-Reply-To: References: Message-ID: Chris, > BLAST XML parsing should now work for any CPAN-based XML::SAX parser!
> XML::SAX::PurePerl (comes with XML::SAX, the slowest) > XML::SAX::Expat > XML::SAX::ExpatXS (the fastest) > XML::LibXML::SAX > XML::LibXML::SAX::Parser That's excellent news - thanks for all the work you have put in on this one. I'm impressed. This is a good opportunity to encourage people who use Bio::SearchIO for BLAST parsing to switch to 'blastxml' format over 'blast'. Although the latter is more human readable, it perennially requires parser source changes to cope with the variations and new formatting introduced with each new NCBI BLAST release. Best to use "-m 7" XML format, and convert as appropriate using one of the Bio::SearchIO::Writer:: classes. --Torsten From cjfields at uiuc.edu Fri Feb 9 08:58:24 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 9 Feb 2007 07:58:24 -0600 Subject: [Bioperl-l] BLASTXML changes (good this time!) In-Reply-To: References: Message-ID: <4FC966A7-7A03-41D9-ABF7-6ACD888720FB@uiuc.edu> On Feb 8, 2007, at 10:18 PM, Torsten Seemann wrote: > Chris, > >> BLAST XML parsing should now work for any CPAN-based XML::SAX parser! >> XML::SAX::PurePerl (comes with XML::SAX, the slowest) >> XML::SAX::Expat >> XML::SAX::ExpatXS (the fastest) >> XML::LibXML::SAX >> XML::LibXML::SAX::Parser > > That's excellent news - thanks for all the work you have put in on > this one. I'm impressed. Jason did most of the hard work; I just tinkered with it until it worked (and pestered a few perl XML guys along the way). Thanks Grant and Bj?rn! > This is a good opportunity to encourage people who use Bio::SearchIO > for BLAST parsing to switch to 'blastxml' format over 'blast'. > Although the latter is more human readable, it perennially requires > parser source changes to cope with the variations and new formatting > introduced with each new NCBI BLAST release. Best to use "-m 7" XML > format, and convert as appropriate using one of the > Bio::SearchIO::Writer:: classes.
> > --Torsten I'll try getting some benchmarks for the different parsers up today on the wiki if I have time. Strangely enough, NCBI changed a few things about BLAST XML a few releases back w/o mentioning it to anyone (it was a silent bug in BLAST XML parsing which I fixed recently). If you sent in multiple queries in older versions of BLAST you would get all of the BLAST XML reports concatenated together, which required preparsing the reports to carve up the XML prior to parsing. Now they treat it like PSI-BLAST where multiple queries = multiple iterations, so you get one long XML BLAST report where each iteration=Result. The current parser should handle both as it just caches the other results and returns them one at a time prior to new parses, but I wouldn't recommend parsing a huge BLAST XML file with hundreds of queries as you'll quickly run out of memory! If they get Perl SAX2 up to date with Expat they'll eventually add parse_chunk() and pause_parse() for each parser. Until then... chris Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cuiw at ncbi.nlm.nih.gov Fri Feb 9 09:20:10 2007 From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C]) Date: Fri, 9 Feb 2007 09:20:10 -0500 Subject: [Bioperl-l] Perl script to extract from ncbi In-Reply-To: <178139.85769.qm@web56506.mail.re3.yahoo.com> References: <178139.85769.qm@web56506.mail.re3.yahoo.com> Message-ID: <18C407FD4FFB424292D769FBD68C1987020BBC58@NIHCESMLBX8.nih.gov> This is an example for fetching two GenBank records (id=124504630,110665734) in XML format. Organism names like 'Rattus norvegicus' can be parsed from the XML.
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=124504630,110665734&retmode=xml&rettype=gb Or you can get TaxIds and translate them into real names: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=124504630,110665734&retmode=xml Wenwu Cui, PhD -----Original Message----- From: George Heller [mailto:george.heller at yahoo.com] Sent: Thursday, February 08, 2007 1:55 PM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Perl script to extract from ncbi Hi all, I have a question regarding extracting data from Ncbi. I have a database to store the sequence data, but the files I have loaded into it don't have a proper description line specified. Based on the accession number, I need to find out what is the genus and species name (organism name) from ncbi. I have about 1500 records for which I need to extract the names from ncbi. Any ideas of how I can go about writing a perl script for extracting this information from ncbi? Thanks! George. _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Fri Feb 9 12:51:19 2007 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 09 Feb 2007 12:51:19 -0500 Subject: [Bioperl-l] Perl script to extract from ncbi In-Reply-To: <178139.85769.qm@web56506.mail.re3.yahoo.com> Message-ID: George, http://www.bioperl.org/wiki/HOWTO:Beginners#Retrieving_a_sequence_from_a_database Brian O. On 2/8/07 1:54 PM, "George Heller" wrote: > Hi all, > > I have a question regarding extracting data from Ncbi. I have a database to > store the sequence data, but the files I have loaded into it don't have a > proper description line specified.
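For George's ~1500 accessions, Wenwu's efetch URL can be generated in batches, with the organism read from each returned record. This is a sketch under stated assumptions. The batch size of 200 is arbitrary. I'm assuming rettype=gb with retmode=text returns plain GenBank flat files, whereas the URLs in Wenwu's message request XML. The helper names are made up for illustration. The parser reads the ACCESSION and ORGANISM lines of each flat-file record; the ORGANISM line holds the genus and species.

```perl
use strict;
use warnings;

# Build efetch URLs with at most $batch_size comma-joined IDs each.
# Batch size and rettype/retmode choice are assumptions, not NCBI advice.
sub efetch_urls {
    my ($batch_size, @ids) = @_;
    my @urls;
    while (my @chunk = splice @ids, 0, $batch_size) {
        push @urls, 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'
                  . '?db=nucleotide&id=' . join(',', @chunk)
                  . '&rettype=gb&retmode=text';
    }
    return @urls;
}

# Return accession => organism pairs from GenBank flat-file text, where
# each record ends with a line containing only "//".
sub organisms_from_genbank {
    my ($text) = @_;
    my %org;
    for my $record (split m{^//[ \t\r]*$}m, $text) {
        my ($acc)  = $record =~ /^ACCESSION[ \t]+(\S+)/m;
        my ($name) = $record =~ /^[ \t]+ORGANISM[ \t]+(.+?)[ \t\r]*$/m;
        $org{$acc} = $name if defined $acc && defined $name;
    }
    return %org;
}
```

Each URL could be fetched with LWP::Simple's get(), and the returned text passed to organisms_from_genbank(). Alternatively, the Bio::DB::GenBank route in Brian's HOWTO link returns sequence objects whose species information is already parsed.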
Based on the accession number, I need to > find out what is the genus and species name (organism name) from ncbi. > > I have about 1500 records for which I need to extract the names from ncbi. > > Any ideas of how I can go about writing a perl script for extracting this > information from ncbi? > > Thanks! > George. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From johnston at biochem.ucl.ac.uk Fri Feb 9 14:23:41 2007 From: johnston at biochem.ucl.ac.uk (Caroline Johnston) Date: Fri, 9 Feb 2007 19:23:41 +0000 (GMT) Subject: [Bioperl-l] WrapperBase Message-ID: Hi, Could WrapperBase::executable warn you if it doesn't find the exe in program_path? At the moment it just silently goes ahead and uses one in the system path if it exists. Cass. I've never used diff, so not sure if this is right, but: 305,308c305,314 < if( $prog_path && -e $prog_path && -x $prog_path ) { < $self->{'_pathtoexe'} = $prog_path; < } else { < my $exe; --- > if($prog_path){ > if(-e $prog_path && -x $prog_path){ > $self->{'_pathtoexe'} = $prog_path; > } > else{ > $self->warn("executable not found in $prog_path, trying system path...") if $warn; > } > } > unless ($self->{'_pathtoexe'}){ > my $exe; 335a342 From bix at sendu.me.uk Fri Feb 9 17:38:59 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 09 Feb 2007 22:38:59 +0000 Subject: [Bioperl-l] WrapperBase In-Reply-To: References: Message-ID: <45CCF803.9030004@sendu.me.uk> Caroline Johnston wrote: > Hi, > > Could WrapperBase::executable warn you if it doesn't find the exe in > program_path? At the moment it just silently goes ahead and uses one in > the system path if it exists. No, I think not.
That would be very annoying when using wrappers for programs that you just have in your system path. What specific problem are you encountering with the current behaviour? From bix at sendu.me.uk Fri Feb 9 17:40:33 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 09 Feb 2007 22:40:33 +0000 Subject: [Bioperl-l] BLASTXML changes (good this time!) In-Reply-To: References: Message-ID: <45CCF861.8030000@sendu.me.uk> Chris Fields wrote: > If Sendu is out there, I think we can safely remove any dependencies > beyond XML::SAX 0.15 for the next release. Should I go ahead and > modify Build.PL? Sure, good to hear. From cjfields at uiuc.edu Fri Feb 9 22:42:24 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 9 Feb 2007 21:42:24 -0600 Subject: [Bioperl-l] BLASTXML changes (good this time!) In-Reply-To: <45CCF861.8030000@sendu.me.uk> References: <45CCF861.8030000@sendu.me.uk> Message-ID: On Feb 9, 2007, at 4:40 PM, Sendu Bala wrote: > Chris Fields wrote: >> If Sendu is out there, I think we can safely remove any dependencies >> beyond XML::SAX 0.15 for the next release. Should I go ahead and >> modify Build.PL? > > Sure, good to hear. I added a version dependency for XML::SAX (v. 0.15) for the PurePerl fix. That likely obviates the need for a Bundle for XML::Simple. Not too pressing; we can determine that before the next release. chris From johnston at biochem.ucl.ac.uk Sat Feb 10 11:27:53 2007 From: johnston at biochem.ucl.ac.uk (Caroline Johnston) Date: Sat, 10 Feb 2007 16:27:53 +0000 (GMT) Subject: [Bioperl-l] WrapperBase In-Reply-To: <45CCF803.9030004@sendu.me.uk> References: <45CCF803.9030004@sendu.me.uk> Message-ID: > No, I think not. That would be very annoying when using wrappers for > programs that you just have in your system path. > Hmm, maybe I misunderstood what the program_path was for?
The executable method goes straight to the system path unless program_path is set, so I assumed you would only set program_path if you specifically wanted it to look somewhere else. You wouldn't get a warning if you didn't specify a program_path and just left it to look in the system path. > What specific problem are you encountering with the current behaviour? One version of an executable in /usr/local, another version - which I wanted to use in my home directory. The program_path method gets a path from an environment variable, which was set to ~/. I didn't realise I had the wrong permissions on the executable though, and it was silently failing to use my version and using the one in /usr/local instead. Cass From george.heller at yahoo.com Sat Feb 10 15:35:18 2007 From: george.heller at yahoo.com (George Heller) Date: Sat, 10 Feb 2007 12:35:18 -0800 (PST) Subject: [Bioperl-l] Error while parsing Message-ID: <162150.76282.qm@web56511.mail.re3.yahoo.com> Hi all, I am in the process of parsing a few files, actually blast results, but happen to get the following error: ------------- EXCEPTION ------------- MSG: Can't get HSPs: data not collected. STACK Bio::Search::Hit::GenericHit::hsp /usr/lib/perl5/site_perl/5.8.5/Bio/Search/Hit/GenericHit.pm:649 STACK toplevel parser.pl:31 -------------------------------------- I am not sure if this is a bug, or is there something I am doing wrong. Any pointers are appreciated. Thanks! George.
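Caroline's WrapperBase report above comes down to lookup order. The following is a hypothetical, standalone sketch of the behaviour she proposes, not the actual Bio::Tools::Run::WrapperBase code. An explicitly configured directory is tried first. A warning is issued only when that directory was given but holds no executable file of that name, so users who rely on PATH alone (Sendu's concern) see nothing. PATH is searched afterwards. A file with the wrong permissions, as in her case, then produces a warning instead of a silent fallback.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical lookup: explicit directory first (warn if unusable), then PATH.
sub find_executable {
    my ($name, $program_dir) = @_;
    if (defined $program_dir) {
        my $candidate = File::Spec->catfile($program_dir, $name);
        return $candidate if -f $candidate && -x _;   # _ reuses the stat
        warn "$candidate is missing or not executable; searching PATH\n";
    }
    for my $dir (File::Spec->path) {                 # directories from $ENV{PATH}
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return;                                          # not found anywhere
}
```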
From cjfields at uiuc.edu Sat Feb 10 17:56:19 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 10 Feb 2007 16:56:19 -0600 Subject: [Bioperl-l] Error while parsing In-Reply-To: <162150.76282.qm@web56511.mail.re3.yahoo.com> References: <162150.76282.qm@web56511.mail.re3.yahoo.com> Message-ID: On Feb 10, 2007, at 2:35 PM, George Heller wrote: > Hi all, > > I am in the process of parsing a few files, actually blast > results, but happen to get the following error: > > ------------- EXCEPTION ------------- > MSG: Can't get HSPs: data not collected. > STACK Bio::Search::Hit::GenericHit::hsp /usr/lib/perl5/site_perl/ > 5.8.5/Bio/Search/Hit/GenericHit.pm:649 > STACK toplevel parser.pl:31 > -------------------------------------- > > I am not sure if this is a bug, or is there something I am doing > wrong. Any pointers are appreciated. > > Thanks! > George. We'll need more to go on than that. If the bioperl version is v1.5.2, please file a bug via the bioperl bugzilla: http://bugzilla.open-bio.org/ Don't forget to attach a test file which triggers the bug using the 'Create a new attachment' link after the report has been filed. chris From sac at bioperl.org Sat Feb 10 22:56:10 2007 From: sac at bioperl.org (Steve Chervitz) Date: Sat, 10 Feb 2007 19:56:10 -0800 Subject: [Bioperl-l] Error while parsing In-Reply-To: <162150.76282.qm@web56511.mail.re3.yahoo.com> References: <162150.76282.qm@web56511.mail.re3.yahoo.com> Message-ID: <8f200b4c0702101956h53fea96dm241126c680d64ab4@mail.gmail.com> Your report may be lacking HSP alignments for the hit you are attempting to process. Note that by default, blast will report twice as many one-line descriptions as it will alignments: -v Number of database sequences to show one-line descriptions for (V) [Integer] default = 500 -b Number of database sequence to show alignments for (B) [Integer] default = 250 Verify that this isn't the case for your error. If not, go ahead and file a bug report. 
Attach the report (zipped if big) as well as the relevant portion of your processing script. Steve On 2/10/07, George Heller wrote: > > Hi all, > > I am in the process of parsing a few files, actually blast results, but > happen to get the following error: > > ------------- EXCEPTION ------------- > MSG: Can't get HSPs: data not collected. > STACK Bio::Search::Hit::GenericHit::hsp > /usr/lib/perl5/site_perl/5.8.5/Bio/Search/Hit/GenericHit.pm:649 > STACK toplevel parser.pl:31 > -------------------------------------- > > I am not sure if this is a bug, or is there something I am doing wrong. > Any pointers are appreciated. > > Thanks! > George. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jay at jays.net Sun Feb 11 09:24:55 2007 From: jay at jays.net (Jay Hannah) Date: Sun, 11 Feb 2007 08:24:55 -0600 Subject: [Bioperl-l] wiki: Database error when attempting to change preferences (1054: Unknown column 'user_newpass_time') Message-ID: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net> Just a heads-up -- I wanted to check the "E-mail me when a page I'm watching is changed" box in my preferences http://www.bioperl.org/wiki/Special:Preferences But I can't. Even if I change nothing and hit the Save button I get this: ---------- Database error A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was: (SQL query hidden) from within function "User::saveSettings". MySQL returned error "1054: Unknown column 'user_newpass_time' in 'field list' (localhost)". ---------- (Yes, it literally says "(SQL query hidden)". That wasn't me for the purposes of this email.
-grin-) Thanks, j seqlab.net http://www.bioperl.org/wiki/User:Jhannah Username: Jhannah User ID: 51 From jay at jays.net Sun Feb 11 10:16:13 2007 From: jay at jays.net (Jay Hannah) Date: Sun, 11 Feb 2007 09:16:13 -0600 Subject: [Bioperl-l] wiki: Database error when attempting to change preferences (1054: Unknown column 'user_newpass_time') In-Reply-To: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net> References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net> Message-ID: <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net> Hmm.... The error appears to not be limited to changing preferences. I tried to update a couple different pages and got errors like this: ------ Database error A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was: (SQL query hidden) from within function "Article::updateRedirectOn". MySQL returned error "1146: Table 'perlwikidb.redirect' doesn't exist (localhost)". ------ So all changes to the wiki aren't working right now? j seqlab.net http://www.bioperl.org/wiki/User:Jhannah From jason at bioperl.org Sun Feb 11 15:18:15 2007 From: jason at bioperl.org (Jason Stajich) Date: Sun, 11 Feb 2007 12:18:15 -0800 Subject: [Bioperl-l] wiki: Database error when attempting to change preferences (1054: Unknown column 'user_newpass_time') In-Reply-To: <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net> References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net> <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net> Message-ID: <3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org> Should be fine now - I did an upgrade to mediawiki 1.9 this weekend and i think the upgrade script didn't finish. In the future system support requests should go to support - AT - open-bio.org so we can track them. -jason On Feb 11, 2007, at 7:16 AM, Jay Hannah wrote: > Hmm.... The error appears to not be limited to changing preferences. 
> I tried to update a couple different pages and got errors like this: > > ------ > Database error > A database query syntax error has occurred. This may indicate a bug > in the software. The last attempted database query was: > > (SQL query hidden) > > from within function "Article::updateRedirectOn". MySQL returned > error "1146: Table 'perlwikidb.redirect' doesn't exist (localhost)". > ------ > > So all changes to the wiki aren't working right now? > > j > seqlab.net > http://www.bioperl.org/wiki/User:Jhannah > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From cjfields at uiuc.edu Sun Feb 11 15:51:53 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 11 Feb 2007 14:51:53 -0600 Subject: [Bioperl-l] wiki: Database error when attempting to change preferences (1054: Unknown column 'user_newpass_time') In-Reply-To: <3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org> References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net> <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net> <3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org> Message-ID: Is there a good place on the main wiki page to prominently display this? I wanted to place something at the top of the main page but I didn't know if we wanted to post the support email address on the page itself. chris On Feb 11, 2007, at 2:18 PM, Jason Stajich wrote: > Should be fine now - I did an upgrade to mediawiki 1.9 this weekend > and i think the upgrade script didn't finish. > > In the future system support requests should go to support - AT - > open-bio.org so we can track them. > > -jason > On Feb 11, 2007, at 7:16 AM, Jay Hannah wrote: > >> Hmm.... The error appears to not be limited to changing preferences. 
>> I tried to update a couple different pages and got errors like this: >> >> ------ >> Database error >> A database query syntax error has occurred. This may indicate a bug >> in the software. The last attempted database query was: >> >> (SQL query hidden) >> >> from within function "Article::updateRedirectOn". MySQL returned >> error "1146: Table 'perlwikidb.redirect' doesn't exist (localhost)". >> ------ >> >> So all changes to the wiki aren't working right now? >> >> j >> seqlab.net >> http://www.bioperl.org/wiki/User:Jhannah >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Miller Research Fellow > University of California, Berkeley > lab: 510.642.8441 > http://pmb.berkeley.edu/~taylor/people/js.html > http://fungalgenomes.org/ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From jay at jays.net Sun Feb 11 15:56:53 2007 From: jay at jays.net (Jay Hannah) Date: Sun, 11 Feb 2007 14:56:53 -0600 Subject: [Bioperl-l] wiki: Database error when attempting to change preferences (1054: Unknown column 'user_newpass_time') In-Reply-To: References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net> <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net> <3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org> Message-ID: On Feb 11, 2007, at 2:51 PM, Chris Fields wrote: > Is there a good place on the main wiki page to prominently display > this? I wanted to place something at the top of the main page but > I didn't know if we wanted to post the support email address on the > page itself. 
I added it here: http://www.bioperl.org/wiki/About_site Which is linked from all pages via the left-hand bar: community | About this site j seqlab.net http://www.bioperl.org/wiki/User:Jhannah From agd27 at cornell.edu Sun Feb 11 12:47:03 2007 From: agd27 at cornell.edu (Adam Diehl) Date: Sun, 11 Feb 2007 12:47:03 -0500 Subject: [Bioperl-l] Getting GFF output in UCSC-specific format Message-ID: <45CF5697.60703@cornell.edu> Good morning folks, I've got sort of a newbie question regarding how to get gff's out of Bio::Tools::GFF objects that are formatted according to the UCSC browser conventions, described here: http://genome.ucsc.edu/goldenPath/help/customTrack.html#GFF (Ignore the custom track headers and what-not. I just need the fields to be set up according to the descriptions in 1 - 9). The write_feature($feature) method isn't doing it for me, as I get lines like the following (newlines excepted): chr1 EMBL/GenBank/SwissProt gene 1712 2848 . + . db_xref=GeneID:4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002 chr1 EMBL/GenBank/SwissProt CDS 1712 2848 . + . EC_number=2.7.7.7;codon_start=1;db_xref=GI:94989511,GeneID:4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002;product=DNA+polymerase+III%2C+beta+chain;protein_id=YP_597611.1;transl_table=11;translation=MIQFSINRTLFIHALNATKRAISTKNAIPILSSIKIEVTSTGVTLTGSNGQISIENTIPVSNENAGLLITSPGAILLEASFFINIISSLPDISINVKEIEQHQVVLTSGKSEITLKGKDVDQYPRLQEVSTENPLILKTKLLKSIIAETAFAASLQESRPILTGVHIVLSNHKDFKAVATDSHRMSQRLITLENTSADFDVVIPSKSLREFSAVFTDDIETVEVFFSPSQILFRSEHISFYTRLLEGNYPDTDRLLMTEFETEVVFNTQSLRHAMERAFLISNATQNGTVKLEITQNHISAHVNSPEVGKVNEDLDIVSQSGSDLTISFNPTYLIESLKAIKSETVKIHFLSPVRPFTLTPGDEEESFIQLITPVRTN As you can see, field 8, which should be frame according to UCSC conventions, is blank, and field 9, group according to UCSC, has frame (as codon_start), along with ID, etc. All this extra stuff causes the UCSC browser to choke.
First off, it can't identify which features are the same (it does this by
matching the group field), and second, it can't translate the CDSs into
proteins because it lacks frame data.

Basically, what I need to do for CDS features is extract the frame (or
codon_start, as it were) from the last field, parse out the integer value and
store it in field 8 (as the frame), then parse out locus_tag from the last
field, clear out everything else and store only the locus_tag in that field
(preferably without the locus_tag= qualifier). For features of type gene, I
just want to do the last step, so that the gene and CDS features for the same
locus have matching group fields that are as simple as possible. Let me know
if this is not clear.

The way I've been trying to do this is by stringifying each GFF object,
splitting it into an array, @tmp1, and splitting $tmp1[8] into @tmp2 with
the following code:

my @tmp2 = split /;/, $tmp1[8];

and finally trying to parse out the bits I need with regular expressions and
store them back into $tmp1[n]. This does not work, because Perl wants to
interpret every /, +, etc. as a metacharacter!

I am assuming there's a simple way to get at each value in the last field of
the GFF object using methods supplied by Bio::Tools::GFF, but the API docs
seem a bit lacking in this area. Could anyone steer me towards what I need to
know to do this? Please let me know if I can clarify any details!

Cheers,
Adam Diehl

From jason at bioperl.org Sun Feb 11 18:29:16 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 11 Feb 2007 15:29:16 -0800
Subject: [Bioperl-l] Getting GFF output in UCSC-specific format
In-Reply-To: <45CF5697.60703@cornell.edu>
References: <45CF5697.60703@cornell.edu>
Message-ID:

I assume you are getting your features from a Bio::SeqIO parse of a GenBank
file? You get back Bio::SeqFeature::Generic objects, so you want to look at
the docs for that module to see what the API is.
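The transformation Adam describes can also be done on the text lines in core Perl without fighting regex metacharacters, because neither `;` nor `=` is special in a `split` pattern. This is only an illustrative sketch: it assumes tab-separated GFF lines with `key=value` pairs in column 9 (as in the output above) and assumes that frame = codon_start - 1 (codon_start counts from 1, GFF frame from 0). The helper name `to_ucsc_gff` is hypothetical; within BioPerl, the tag methods in Jason's reply are the more direct route.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: rewrite one GFF line (key=value pairs in column 9)
# into the UCSC-style layout described above. Assumes tab-separated input.
sub to_ucsc_gff {
    my ($line) = @_;
    chomp $line;
    my @col = split /\t/, $line, 9;
    die "expected 9 tab-separated columns\n" unless @col == 9;

    # Column 9 is a ';'-separated list of key=value pairs. Neither ';' nor
    # '=' is a regex metacharacter, so no escaping is needed in split.
    my %attr;
    for my $pair (split /;/, $col[8]) {
        next unless length $pair;              # tolerate a trailing ';'
        my ($key, $value) = split /=/, $pair, 2;
        $attr{$key} = $value;
    }

    # Assumed mapping: codon_start counts from 1, GFF frame from 0.
    if ($col[2] eq 'CDS' && defined $attr{codon_start}) {
        $col[7] = $attr{codon_start} - 1;
    }

    # Keep only the bare locus_tag so gene and CDS lines share a group.
    $col[8] = defined $attr{locus_tag} ? $attr{locus_tag} : '.';

    return join("\t", @col) . "\n";
}

my $cds = join "\t", 'chr1', 'EMBL/GenBank/SwissProt', 'CDS', 1712, 2848,
    '.', '+', '.',
    'EC_number=2.7.7.7;codon_start=1;gene=dnaN;locus_tag=MGAS10270_Spy0002';
print to_ucsc_gff($cds);
# prints (tab-separated): chr1 EMBL/GenBank/SwissProt CDS 1712 2848 . + 0 MGAS10270_Spy0002
```

To process a whole file, loop over its lines, skip `#` header lines, and print `to_ucsc_gff($line)` for each of the rest.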
You will need to set the frame via $feature->frame($frame). You are going to
have to determine the frame yourself if that isn't part of the feature; we
don't calculate it for you.

For the 9th column, the data is available through the tag methods has_tag,
add_tag_value, get_tag_values, get_all_tags and remove_tag, so you can remove
all the tags you don't want through remove_tag (or remove them all, as here):

my $locus;
for my $tag ( $feature->get_all_tags ) {
    if ( $tag eq 'locus_tag' ) {
        # save the locus_tag when we see it
        ($locus) = $feature->get_tag_values($tag);
    }
    $feature->remove_tag($tag);
}

You will also want to set the GFF version when you create the
Bio::Tools::GFF object. I think the UCSC site only supports GFF1; I don't
know exactly how the tag is written then, since the values aren't paired as
key=>value, but you'll need to set the tag to 'group', so:

$feature->add_tag_value('group', $locus);

If this is all unsatisfactory, you can easily write your own GFF writer for
your flavor of the data along these lines:

print join("\t",
           $feature->seq_id,
           $feature->source_tag,
           $feature->primary_tag,
           $feature->start,
           $feature->end,
           defined $feature->score ? $feature->score : '.',
           $feature->strand > 0 ? '+' : $feature->strand < 0 ? '-' : '.',
           defined $feature->frame ? $feature->frame : '.',
           $locus), "\n";

-jason
On Feb 11, 2007, at 9:47 AM, Adam Diehl wrote:

> Good morning folks,
>
> I've got sort of a newbie question regarding how to get gff's out of
> Bio::Tools:GFF objects that are formatted according to the UCSC
> browser
> conventions, described here:
>
> http://genome.ucsc.edu/goldenPath/help/customTrack.html#GFF
> (Ignore the custom track headers and what-not. I just need the
> fields to
> be set up according to the descriptions in 1 - 9).
>
> The write_feature($feature) method isn't doing it for me, as I get
> lines
> like the following (newlines excepted):
>
> chr1 EMBL/GenBank/SwissProt gene 1712 2848 . +
> . db_xref=GeneID:4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002
> chr1 EMBL/GenBank/SwissProt CDS 1712 2848 . +
> .
> EC_number=2.7.7.7;codon_start=1;db_xref=GI:94989511,GeneID:
> 4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002;product=DNA+polymerase
> +III%2C+beta+chain;protein_
> id=YP_597611.1;transl_table=11;translation=MIQFSINRTLFIHALNATKRAISTKNA
> IPILSSIKIEVTSTGVTLTGSNGQISIENTIPVSNENAGLLITSPGAILLEASFFINIISSLPDISINVK
> EIEQHQVVLTSGKSEITLKGKDVDQYPRLQEVSTENPLILKTKLLKSIIAETAFAASLQESRPILTGVHI
> VLSNHKDFKAVATDSHRMSQRLIT
> LENTSADFDVVIPSKSLREFSAVFTDDIETVEVFFSPSQILFRSEHISFYTRLLEGNYPDTDRLLMTEFE
> TEVVFNTQSLRHAMERAFLISNATQNGTVKLEITQNHISAHVNSPEVGKVNEDLDIVSQSGSDLTISFNP
> TYLIESLKAIKSETVKIHFLSPVRPFTLTPGDEEESFIQLITPVRTN
>
> As you can see, field 8, which should be frame according to UCSC
> conventions is blank, and field 9, group according to UCSC, has frame,
> along with ID, etc. All this extra stuff causes the UCSC browser to
> choke. First off, it can't identify which features are the same (it
> does
> this by matching the group field), and second, it can't interpret the
> CDS's into translated proteins because it lacks frame data.
>
> Basically what I need to do is, for CDS features, extract frame (or
> codon_start, as it were), from the last field, parse out the integer
> value and store that in field 8 (as frame), then parse out locus_tag
> from the last field, clear out everything else and store the locus_tag
> only in that field (preferably without the qualifier locus_tag=). For
> feature type gene, I just want to do the last step, so that the
> gene and
> CDS features for the same feature have matching group fields that
> are as
> simple as possible. Let me know if this is not clear.
>
> The way I've been trying to do this is by stringifying each gff
> object,
> splitting into an array, @tmp1, splitting @tmp1[8] into @tmp2 with the
> following code: my @tmp2 = split /\;\, $tmp1[8]; and finally,
> trying to
> parse out the bits I need with regular expressions and store back to
> @tmp1[n]. -- This does not work, because perl wants to interpret
> every
> / + etc. as a metacharacter!
>
> I am assuming there's a simple way to get at each value in the last
> field of the gff object using methods supplied by Bio::Tools::GFF, but
> the API docs seem a bit lacking in this area. Could anyone steer me
> towards what I need to know to do this? Please let me know if I can
> clarify any details!
>
> Cheers,
> Adam Diehl
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/

From bix at sendu.me.uk Sun Feb 11 18:39:15 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 11 Feb 2007 23:39:15 +0000
Subject: [Bioperl-l] WrapperBase
In-Reply-To:
References: <45CCF803.9030004@sendu.me.uk>
Message-ID: <45CFA923.8010201@sendu.me.uk>

Caroline Johnston wrote:
>> No, I think not. That would be very annoying when using wrappers for
>> programs that you just have in your system path.
>
> Hmm, maybe I misunderstood what the program_path was for? The executable
> method goes straight to the system path unless program_path is set, so I
> assumed you would only set program_path if you specifically wanted it to
> look somewhere else. You wouldn't get a warning if you didn't specify a
> program_path and just left it to look in the system path.

Yes, sorry. Having now actually looked at your patch, it seems fine. I'll
commit it unless someone beats me to it.

From flope004 at hotmail.com Sun Feb 11 21:40:08 2007
From: flope004 at hotmail.com (Wolverine Fran)
Date: Mon, 12 Feb 2007 03:40:08 +0100
Subject: [Bioperl-l] TreeIO, how it works?
Message-ID:

Hi,

I have a problem. I don't understand how TreeIO reads the trees:
my input: ((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2));

An unrooted tree with 4 tips and 2 internal nodes.
when I asked for:

print "Total number of nodes ",$tree->number_nodes;

I get 6, but when I ask for:

foreach my $node (@nodes) {
    print $node->internal_id,",";
}

I get 6,0,1,2,3,4,5. Total 7.

The root is number 6, and 2 and 5 are my internal nodes.
If I set the root to be number 5, this node 6 is still present.
Why? What is node 6?

When I try the following:

$node5 = $tree->find_node(-internal_id => '5');
$node6 = $tree->find_node(-internal_id => '6');
$node2 = $tree->find_node(-internal_id => '2');
$distance1 = $tree->distance(-nodes =>[$node5,$node2]);
$distance2 = $tree->distance(-nodes =>[$node5,$node6]);
$distance3 = $tree->distance(-nodes =>[$node2,$node6]);

or any other distance, I get 2 warnings:

-------------------- WARNING ---------------------
MSG: Must provide a valid array reference for -nodes
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Could not find distance!
---------------------------------------------------

What am I doing incorrectly?

I am practicing with AlignIO and TreeIO to calculate the maximum likelihood
for a given tree, so any other information about that would be of great
help. I am practicing with this to see how Bioperl can help me with more
complex problems.

Thank you very much for your help!

From jason at bioperl.org Sun Feb 11 22:05:18 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 11 Feb 2007 19:05:18 -0800
Subject: [Bioperl-l] TreeIO, how (does) it work?
In-Reply-To:
References:
Message-ID: <60D2DD3B-2ED1-4A92-A9FA-2875FEAA28CC@bioperl.org>

On Feb 11, 2007, at 6:40 PM, Wolverine Fran wrote:

> Hi,
>
> I have a problem. I don't understand how TreeIO reads the trees:
> my input: ((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2));
>
> An unrooted tree with 4 tips and 2 internal nodes.
> when I asked for:
> print "Total number of nodes ",$tree->number_nodes;
>
> I get 6 but when I ask for:
> foreach my $node (@nodes) {
> print $node->internal_id,",";
> }
> I get 6,0,1,2,3,4,5. Total 7.
>
> The root is number 6 and 2 and 5 are my internal nodes.
> If I set the root to be number 5 this node 6 is still present.
> Why? what is the node 6?

Node 6 is to hold the root, or a fake root with a trifurcation for unrooted
trees. Did you actually call the reroot method to set the root to node 5?

>
> when I try the following:
> $node5 = $tree->find_node(-internal_id => '5');
> $node6 = $tree->find_node(-internal_id => '6');
> $node2 = $tree->find_node(-internal_id => '2');
> $distance1 = $tree->distance(-nodes =>[$node5,$node2]);
> $distance2 = $tree->distance(-nodes =>[$node5,$node6]);
> $distance3 = $tree->distance(-nodes =>[$node2,$node6]);
> or any other distance I get 2 warnings:
> -------------------- WARNING ---------------------
> MSG: Must provide a valid array reference for -nodes
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
> MSG: Could not find distance!
> ---------------------------------------------------
> What am I doing incorrectly?
>

The distance method is just summing branch lengths on the path between two
nodes. Is that what you are trying to do?

The error message you report doesn't make sense, as
"Must provide a valid array reference for -nodes"
is only printed when you call is_monophyletic or is_paraphyletic, as far as
I can tell. What version of bioperl are you using?

> I am practicing with AlignIO and TreeIO to calculate the maximum
> likelihood
> for a given tree. So,other information about that would be of great
> help. I am practicing with
> this to see how Bioperl can help me with more complex problems.
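Two points in this exchange can be checked without BioPerl. In Newick, every parenthesised group is itself a node, including the outermost pair, so `((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2));` already describes 4 tips plus 3 internal nodes, 7 in total. And, as Jason says, a tree distance is just the sum of branch lengths on the path between two nodes. The core-Perl sketch below illustrates both. It is a deliberately minimal parser (no quoted labels, comments or whitespace inside labels), its node numbering is its own pre-order scheme rather than TreeIO's internal_id values, and `parse_newick`/`tree_distance` are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical minimal Newick parser: nested parentheses, optional labels
# and optional ":length" branch lengths. Node ids are assigned in pre-order
# (root = 0). Quoted labels and Newick comments are not supported.
sub parse_newick {
    my ($str) = @_;
    $str =~ s/\s+//g;
    $str =~ s/;\z//;
    my (%parent, %length, %label);
    my ($next_id, $pos) = (0, 0);

    my $parse_node;
    $parse_node = sub {
        my ($parent_id) = @_;
        my $id = $next_id++;
        $parent{$id} = $parent_id;            # undef for the root
        if (substr($str, $pos, 1) eq '(') {   # internal node: parse children
            $pos++;
            while (1) {
                $parse_node->($id);
                my $c = substr($str, $pos++, 1);
                last if $c eq ')';
                die "expected ',' or ')' at position $pos\n" unless $c eq ',';
            }
        }
        # optional label, then optional ":length" (this pattern always matches)
        substr($str, $pos) =~ /\A([^:,()]*)(?::([-+0-9.eE]+))?/;
        $label{$id}  = $1 if length $1;
        $length{$id} = defined $2 ? $2 : 0;   # a missing length counts as 0
        $pos += $+[0];
        return $id;
    };
    my $root = $parse_node->(undef);
    undef $parse_node;                        # break the closure's self-reference
    return { root => $root, parent => \%parent,
             length => \%length, label => \%label };
}

# Sum of branch lengths on the path between two nodes, via their lowest
# common ancestor.
sub tree_distance {
    my ($tree, $x, $y) = @_;
    my %ancestor_of_x;
    for (my $n = $x; defined $n; $n = $tree->{parent}{$n}) {
        $ancestor_of_x{$n} = 1;
    }
    my $dist = 0;
    my $lca  = $y;
    until ($ancestor_of_x{$lca}) {            # climb from y up to the LCA
        $dist += $tree->{length}{$lca};
        $lca = $tree->{parent}{$lca};
    }
    for (my $n = $x; $n != $lca; $n = $tree->{parent}{$n}) {  # climb from x
        $dist += $tree->{length}{$n};
    }
    return $dist;
}

my $tree  = parse_newick('((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2));');
my %id_of = reverse %{ $tree->{label} };
printf "%d nodes\n", scalar(keys %{ $tree->{parent} });                       # 7 nodes
printf "dog-human: %.2f\n", tree_distance($tree, $id_of{dog}, $id_of{human}); # 0.31
```

The same count applies to the "rooted" version `(((dog:0.04,cat:0.08):0.12,human:0.15):0.1,mouse:0.1);` used later in the thread: three parenthesised groups plus four tips, again 7 nodes.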
>

Are you trying to calculate the likelihood of a tree, or are you trying to
generate an ML tree from an alignment?

> Thank you very much for your help!
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/

From er at xs4all.nl Mon Feb 12 08:03:06 2007
From: er at xs4all.nl (Erik)
Date: Mon, 12 Feb 2007 14:03:06 +0100 (CET)
Subject: [Bioperl-l] bioperl wiki changes rss / atom
In-Reply-To:
References: <162150.76282.qm@web56511.mail.re3.yahoo.com>
Message-ID: <20181.156.83.0.124.1171285386.squirrel@webmail.xs4all.nl>

Hi,

The bioperl wiki changes RSS / Atom feed has two leading empty lines, which
invalidate the XML:

XML Parsing Error: xml declaration not at start of external entity
Location:
http://www.bioperl.org/w/index.php?title=Special:Recentchanges&feed=rss
Line Number 3, Column 1:
^

Could those be removed? (I didn't see a way to do it myself.) It might be a
useful feed :)

thanks,

Erik

From cjfields at uiuc.edu Mon Feb 12 09:52:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 12 Feb 2007 08:52:44 -0600
Subject: [Bioperl-l] bioperl wiki changes rss / atom
In-Reply-To: <20181.156.83.0.124.1171285386.squirrel@webmail.xs4all.nl>
References: <162150.76282.qm@web56511.mail.re3.yahoo.com> <20181.156.83.0.124.1171285386.squirrel@webmail.xs4all.nl>
Message-ID:

I have forwarded this to support at open-bio.org, which should take care of it.
chris

On Feb 12, 2007, at 7:03 AM, Erik wrote:

> Hi,
>
>
> The bioperl wiki changes rss / atom feed has two leading empty
> lines which
> invalidate the xml:
>
> XML Parsing Error: xml declaration not at start of external entity
> Location:
> http://www.bioperl.org/w/index.php?
> title=Special:Recentchanges&feed=rss
> Line Number 3, Column 1:
> ^
>
> Could those be removed? (I didn't see a way to do it myself). Might
> be a
> useful feed :)
>
>
> thanks,
>
> Erik
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From sm8 at sanger.ac.uk Mon Feb 12 12:12:00 2007
From: sm8 at sanger.ac.uk (Stephen Montgomery)
Date: Mon, 12 Feb 2007 17:12:00 -0000
Subject: [Bioperl-l] subtract for Bio::RangeI.pm
Message-ID:

Hi -

Here is a subtract function for the Bio::RangeI class (to be added if
interested).

All the best!
Stephen Montgomery

//ADD TO BIO::RANGEI

=head2 subtract

 Title   : subtract
 Usage   : my $subtracted = $r1->subtract($r2)
 Function: Subtract range r2 from range r1
 Args    : arg #1 = a range to subtract from this one (mandatory)
           arg #2 = strand option ('strong', 'weak', 'ignore') (optional)
 Returns : undef if they do not overlap or r2 contains this RangeI,
           or an arrayref of Range objects (this is an array since some
           instances where the subtract range is enclosed within this range
           will result in the creation of two new disjoint ranges)

=cut

sub subtract {
    my ($self, $range, $so) = @_;
    $self->throw("missing arg: you need to pass in another feature")
        unless $range;
    return unless $self->_testStrand($range, $so);

    if ($self eq "Bio::RangeI") {
        $self = "Bio::Range";
        $self->warn("calling static methods of an interface is deprecated; use $self instead");
    }
    $range->throw("Input a Bio::RangeI object") unless $range->isa('Bio::RangeI');

    if (!$self->overlaps($range)) {
        return undef;
    }

    ##Subtracts everything
    if ($range->contains($self)) {
        return undef;
    }

    my ($start, $end, $strand) = $self->intersection($range, $so);

    ##Subtract intersection from $self range
    my @outranges = ();
    if ($self->start < $start) {
        push(@outranges, $self->new('-start'  => $self->start,
                                    '-end'    => $start - 1,
                                    '-strand' => $self->strand,
                                   ));
    }
    if ($self->end > $end) {
        push(@outranges, $self->new('-start'  => $end + 1,
                                    '-end'    => $self->end,
                                    '-strand' => $self->strand,
                                   ));
    }
    return \@outranges;
}

//UNIT TEST

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqFeature::Generic;
use Test;
plan tests => 13;

my $feature1 = Bio::SeqFeature::Generic->new(-start  => 1,
                                             -end    => 1000,
                                             -strand => 1);
my $feature2 = Bio::SeqFeature::Generic->new(-start  => 100,
                                             -end    => 900,
                                             -strand => -1);

my $subtracted = $feature1->subtract($feature2);
ok(defined($subtracted));
ok(scalar(@$subtracted) == 2);
foreach my $range (@$subtracted) {
    ok($range->start == 1 || $range->start == 901);
    ok($range->end == 99 || $range->end == 1000);
}

$subtracted = $feature2->subtract($feature1);
ok(!defined($subtracted));
$subtracted = $feature1->subtract($feature2, 'weak');
ok(!defined($subtracted));
$subtracted = $feature1->subtract($feature2, 'strong');
ok(!defined($subtracted));

my $feature3 = Bio::SeqFeature::Generic->new(-start  => 500,
                                             -end    => 1500,
                                             -strand => 1);
$subtracted = $feature1->subtract($feature3);
ok(defined($subtracted));
ok(scalar(@$subtracted) == 1);
my $subtracted_i = $subtracted->[0];
ok($subtracted_i->start == 1);
ok($subtracted_i->end == 499);

From sm8 at sanger.ac.uk Mon Feb 12 11:04:41 2007
From: sm8 at sanger.ac.uk (Stephen Montgomery)
Date: Mon, 12 Feb 2007 16:04:41 -0000
Subject: [Bioperl-l] subtract for Bio::RangeI.pm
Message-ID:

Hi -

Here is a subtract function for the Bio::RangeI class (to be added if
interested).

All the best!

Stephen Montgomery

//ADD TO BIO::RANGEI

=head2 subtract

 Title   : subtract
 Usage   : my $subtracted = $r1->subtract($r2)
 Function: Subtract range r2 from range r1
 Args    : arg #1 = a range to subtract from this one (mandatory)
           arg #2 = strand option ('strong', 'weak', 'ignore') (optional)
 Returns : undef if they do not overlap or r2 contains this RangeI,
           or an arrayref of Range objects (this is an array since some
           instances where the subtract range is enclosed within this range
           will result in the creation of two new disjoint ranges)

=cut

sub subtract {
    my ($self, $range, $so) = @_;
    $self->throw("missing arg: you need to pass in another feature")
        unless $range;
    return unless $self->_testStrand($range, $so);

    if ($self eq "Bio::RangeI") {
        $self = "Bio::Range";
        $self->warn("calling static methods of an interface is deprecated; use $self instead");
    }
    $range->throw("Input a Bio::RangeI object") unless $range->isa('Bio::RangeI');

    if (!$self->overlaps($range)) {
        return undef;
    }

    ##Subtracts everything
    if ($range->contains($self)) {
        return undef;
    }

    my ($start, $end, $strand) = $self->intersection($range, $so);

    ##Subtract intersection from $self range
    my @outranges = ();
    if ($self->start < $start) {
        push(@outranges, $self->new('-start'  => $self->start,
                                    '-end'    => $start - 1,
                                    '-strand' => $self->strand,
                                   ));
    }
    if ($self->end > $end) {
        push(@outranges, $self->new('-start'  => $end + 1,
                                    '-end'    => $self->end,
                                    '-strand' => $self->strand,
                                   ));
    }
    return \@outranges;
}

//UNIT TEST

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqFeature::Generic;
use Test;
plan tests => 13;

my $feature1 = Bio::SeqFeature::Generic->new(-start  => 1,
                                             -end    => 1000,
                                             -strand => 1);
my $feature2 = Bio::SeqFeature::Generic->new(-start  => 100,
                                             -end    => 900,
                                             -strand => -1);

my $subtracted = $feature1->subtract($feature2);
ok(defined($subtracted));
ok(scalar(@$subtracted) == 2);
foreach my $range (@$subtracted) {
    ok($range->start == 1 || $range->start == 901);
    ok($range->end == 99 || $range->end == 1000);
}

$subtracted = $feature2->subtract($feature1);
ok(!defined($subtracted));
$subtracted = $feature1->subtract($feature2, 'weak');
ok(!defined($subtracted));
$subtracted = $feature1->subtract($feature2, 'strong');
ok(!defined($subtracted));

my $feature3 = Bio::SeqFeature::Generic->new(-start  => 500,
                                             -end    => 1500,
                                             -strand => 1);
$subtracted = $feature1->subtract($feature3);
ok(defined($subtracted));
ok(scalar(@$subtracted) == 1);
my $subtracted_i = $subtracted->[0];
ok($subtracted_i->start == 1);
ok($subtracted_i->end == 499);

From flope004 at hotmail.com Mon Feb 12 13:07:12 2007
From: flope004 at hotmail.com (Wolverine Fran)
Date: Mon, 12 Feb 2007 19:07:12 +0100
Subject: [Bioperl-l] TreeIO, how (does) it work?
In-Reply-To: <60D2DD3B-2ED1-4A92-A9FA-2875FEAA28CC@bioperl.org>
Message-ID:

thanks for your reply!

I am using Bioperl 1.4.

>Node 6 is to hold the root or a fake root with a trifurcation for
>unrooted trees. Did you actually call the reroot method to set the
>root to node 5?
Yes, I tried the following with the same result:

$tree->reroot($tree->find_node(-internal_id => '5'));
or
$tree->set_root_node($tree->find_node(-internal_id => '5'));

Even if I use a rooted tree: (((dog:0.04,cat:0.08):0.12,human:0.15):0.1,mouse:0.1);
I get node #6. So, is it always present? Am I not representing a rooted tree
properly in Newick format?

>The distance method is just summing branch lengths on the path
>between two nodes. Is that what are you trying to do?
>
>The error message you report doesn't make sense as
>"Must provide a valid array reference for -nodes"
>is only printed when you call is_monophyletic or is_paraphyletic as
>far as I can tell.

I do not know yet what I was doing incorrectly, but now it works.
Yes, I was using the distance method to find out where node 6 was located.
For the unrooted tree, node 6 was node 5 (an internal node), and for the
rooted tree node 6 was 0.1 from the mouse leaf and the internal node (root).
The error message "Must provide a valid array reference for -nodes" is shown
if I indicate a node which is not present in the tree.

>You are trying to calculate the likelihood of a tree or are you
>trying to generate a ML tree from an alignment?

I am trying to calculate the likelihood of a tree, as practice. There are
probably other bioperl modules, besides AlignIO and TreeIO, which can help me
in the process that I do not know about.

Again, thank you for your time!

From dmessina at wustl.edu Mon Feb 12 12:49:49 2007
From: dmessina at wustl.edu (David Messina)
Date: Mon, 12 Feb 2007 11:49:49 -0600
Subject: [Bioperl-l] subtract for Bio::RangeI.pm
In-Reply-To:
References:
Message-ID: <1574ACCF-92D5-4DEC-AD04-14EB7767F22A@wustl.edu>

Stephen,

Great, thanks for this.
Could you submit it to Bugzilla as an enhancement?

http://bugzilla.open-bio.org/

Thanks,
Dave

From jason at bioperl.org Mon Feb 12 13:38:11 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 12 Feb 2007 10:38:11 -0800
Subject: [Bioperl-l] TreeIO, how (does) it work?
In-Reply-To:
References:
Message-ID:

I would definitely suggest getting ahold of bioperl 1.5.2, as I seem to
remember there are several fixes in the tree module code for re-rooting a
tree.

-jason
On Feb 12, 2007, at 10:07 AM, Wolverine Fran wrote:

> thanks for your reply!
>
> I am using Bioperl 1.4.
>
>> Node 6 is to hold the root or a fake root with a trifurcation for
>> unrooted trees. Did you actually call the reroot method to set the
>> root to node 5?
>
> Yes, I tried the following with the same result:
> $tree->reroot($tree->find_node(-internal_id => '5'));
> or
> $tree->set_root_node($tree->find_node(-internal_id => '5'));
>
> Even if I use a rooted tree: (((dog:0.04,cat:0.08):0.12,human:0.15):
> 0.1,mouse:0.1);
> I get the node #6. So, is it always present? Am I not representing
> properly a rooted tree in newick format?
>
>> The distance method is just summing branch lengths on the path
>> between two nodes. Is that what are you trying to do?
>>
>> The error message you report doesn't make sense as
>> "Must provide a valid array reference for -nodes"
>> is only printed when you call is_monophyletic or is_paraphyletic as
>> far as I can tell.
>
> I do not know yet what I was doing incorrectly but now It works.
> Yes, I was using the distance method to know where the node 6 was
> located. For the unrooted tree, node 6 was node 5 (an internal
> node) and for the rooted tree node 6 was 0.1 from the mouse leaf
> and the internal node (root).
> The error message: "Must provide a valid array reference for -
> nodes" is shown if I indicate a node which is not present in the tree.
>
>
>> You are trying to calculate the likelihood of a tree or are you
>> trying to generate a ML tree from an alignment?
>
> I am trying to calculate the likelihood of a tree, as a practice.
> Probably there are other bioperl modules, besides AlignIO and
> TreeIO, which can help me in the process and I do not know them.
>
> Again, thank you for your time!
>

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/

From johnsonm at gmail.com Mon Feb 12 18:13:09 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 12 Feb 2007 17:13:09 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To:
References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
Message-ID:

On 2/7/07, Mark Johnson wrote:
>
> Well, each format has some unique features. If the user declines to
> specify the format, I can figure it out, but it will probably involve
> scanning the input file twice. I'll take a look.
> I can do all the parsing in one function, in fact I have, just to see
> how nasty it would end up being. I just can't stomach having the code that
> tightly coupled and hard to read. In the end it'll probably be three
> functions. GlimmerM/HMM are pretty close. Maybe two, Glimmer2 and
> Glimmer3 aren't *that* different, either.

I've got a 4-in-1 parser roughed in per Chris Fields' suggestion, with two
actual parsing routines (prokaryotic and eukaryotic). You can specify
-format as an arg to the constructor (Glimmer, GlimmerM, GlimmerHMM), or it
will look through the input until it can figure out what it is looking at.
I've got one main issue to solve; the rest is just stuff like updating the
POD. Torsten Seemann very helpfully added example output for all 4 formats
to t/data. Looking at GlimmerHMM.out, the first line is 'GlimmerHMM'.
However, I think there is a bug in the existing _parse_predictions.
Shouldn't this:

} elsif( /^(Glimmer\S*)$/ ) { # GlimmerHMM has no version
    $source = $1;
    next;
}

be this instead:

} elsif( /^(GlimmerHMM\S*)$/ ) { # GlimmerHMM has no version
    $source = $1;
    next;
}

I lifted that bit of code to do format detection... we don't have GlimmerHMM
installed locally, so I'm assuming Torsten's output is correct and the above
is a bug. Guess I'll go check bugzilla...

From torsten.seemann at infotech.monash.edu.au Mon Feb 12 21:07:40 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 13 Feb 2007 13:07:40 +1100
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To:
References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
Message-ID:

Mark,

> I've got one main issue to solve, the rest is just stuff like updating
> the POD. Torsten Seemann very helpfully added example output for all 4
> formats to t/data. Looking at GlimmerHMM.out, the first line is
> 'GlimmerHMM'. However, I think there is a bug in the existing
> _parse_predictions:
> Shouldn't this:
> } elsif( /^(Glimmer\S*)$/ ) { # GlimmerHMM has no version
> be this instead:
> } elsif( /^(GlimmerHMM\S*)$/ ) { # GlimmerHMM has no version

I think it should be what it says, or perhaps now /^(Glimmer(M|HMM))/.
Here's why:

I came onto the scene at Glimmer.pm rev 1.4. At that stage it only parsed
GlimmerM. I noted that GlimmerHMM had the same output format as GlimmerM,
except for the first line. So in rev 1.5 I modified the regexp to match both,
i.e. \S*. This would also hopefully match any other Glimmer-clone formats
that arose. I also fixed the pdocs to say this, and added tests to
t/Genpred.t.
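Torsten's point is easy to check: `\S*` (zero or more non-whitespace characters) lets the existing pattern accept both the GlimmerM and GlimmerHMM header lines, whereas the proposed `GlimmerHMM\S*` rejects GlimmerM output (and `\s*`, the whitespace class Mark later says he confused it with, would be a different pattern altogether). The small core-Perl check below uses only the two header strings named in this thread; the helper name `source_from_header` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The three header patterns discussed in this thread.
my %pattern = (
    existing  => qr/^(Glimmer\S*)$/,      # current test in _parse_predictions
    proposed  => qr/^(GlimmerHMM\S*)$/,   # the change Mark asked about
    suggested => qr/^(Glimmer(M|HMM))/,   # Torsten's alternative
);

# Hypothetical helper: return the captured source name, or undef.
sub source_from_header {
    my ($line, $re) = @_;
    return $line =~ $re ? $1 : undef;
}

for my $header ('GlimmerM', 'GlimmerHMM') {
    for my $name (qw(existing proposed suggested)) {
        my $src = source_from_header($header, $pattern{$name});
        printf "%-10s %-9s %s\n", $header, $name,
            defined $src ? "matches ($src)" : 'no match';
    }
}
```

Only the `proposed` pattern fails on the GlimmerM header, which is why the existing code already handles both formats.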
% cvs diff -r 1.4 -r 1.5 Bio/Tools/Glimmer.pm
% cvs diff -r 1.15 -r 1.16 t/Genpred.t

I then planned to extend support to Glimmer2 and Glimmer3. I added the 4
test files (t/Glimmer*.out) but never wrote the code. This is where you come
in, Mark :-)

> I lifted that bit of code to do format detection...we don't have GlimmerHMM
> installed locally, so I'm assuming Torsten's output is correct and the above
> is a bug. Guess I'll go check bugzilla...

I'm pretty sure my 4 test files are correct - I spent a lot of time ensuring
they were consistent etc., as I was getting very confused by the different
"glimmer" versions!

Hope this all helps,

--Torsten

From avilella at gmail.com Tue Feb 13 08:20:15 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 13 Feb 2007 13:20:15 +0000
Subject: [Bioperl-l] number of gaps for the other sequences in an alignment
Message-ID: <358f4d650702130520n269419cfkb9cb6dac8feaaa5c@mail.gmail.com>

Hi,

It would be great if we could have a method to count, given one sequence in
an alignment, the number of gaps present in the rest of the sequences of the
alignment. That is, for each nucleotide/amino acid position of the sequence
of interest, look at the column in the alignment, count the gaps, then sum
these counts over all the columns where the sequence of interest is not
gapped.

Has anyone tried this before?

My idea is to end up with a coefficient of indel contribution for each of
the sequences in the alignment, with this coefficient being high when one
sequence forces a lot of gaps to be inserted in the final alignment in order
to accommodate it.

I would say that the best place for this is either using methods already
available in SimpleAlign, or adding something new there.

Looking forward to your comments,

Cheers,

Albert.
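Albert's proposed count can be prototyped in core Perl before deciding where it belongs. The sketch below is an illustration under stated assumptions, not an existing SimpleAlign method: it takes the alignment as a hash of id => aligned string (with a Bio::SimpleAlign one might build that hash from `each_seq`, using each sequence's `display_id` and `seq`), treats `-` and `.` as gap characters, and, for each sequence, sums the gaps found in the other sequences at every column where that sequence has a residue. Dividing by the sequence's residue count would give one possible normalised coefficient. The name `gaps_forced_by` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper. $aln: hashref of id => aligned (gapped) string.
# Returns a hashref of id => number of gaps in the OTHER sequences at the
# columns where this sequence has a residue.
sub gaps_forced_by {
    my ($aln) = @_;
    my @ids = sort keys %$aln;
    my $len = length $aln->{ $ids[0] };
    for my $id (@ids) {
        die "sequence $id has a different aligned length\n"
            unless length($aln->{$id}) == $len;
    }

    my %count = map { $_ => 0 } @ids;
    for my $col (0 .. $len - 1) {
        # which sequences have a gap ('-' or '.') in this column?
        my %is_gap;
        for my $id (@ids) {
            $is_gap{$id} = substr($aln->{$id}, $col, 1) =~ /[-.]/ ? 1 : 0;
        }
        my $gaps_in_col = 0;
        $gaps_in_col += $is_gap{$_} for @ids;

        # a sequence with a residue here is not itself a gap, so every gap
        # in the column belongs to one of the other sequences
        for my $id (@ids) {
            $count{$id} += $gaps_in_col unless $is_gap{$id};
        }
    }
    return \%count;
}

my $counts = gaps_forced_by({
    seq1 => 'ACGTAC',
    seq2 => 'AC--AC',
    seq3 => 'ACG-AC',
});
print "$_\t$counts->{$_}\n" for sort keys %$counts;
# seq1 3, seq2 0, seq3 1
```

In this toy alignment seq1 scores highest because its G and T force the gaps in seq2 and seq3, which is the behaviour Albert's coefficient is meant to capture.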
From bix at sendu.me.uk Tue Feb 13 11:09:09 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 13 Feb 2007 16:09:09 +0000
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts
Message-ID: <45D1E2A5.6060104@sendu.me.uk>

I have some raw sequences in a Bio::DB::SeqFeature::Store mysql database and
wanted to associate some basic information with them, like exon positions. I
thought of creating Bio::SeqFeature::Gene::Transcript objects and storing
them so I could later use features() to see what other features overlapped
exons. I ran into a fatal error that can be replicated with the following
simplified one-liner:

perl -MBio::DB::SeqFeature::Store -MBio::SeqFeature::Gene::Transcript -e '$db = Bio::DB::SeqFeature::Store->new(-adaptor => "DBI::mysql", -dsn => "dbi:mysql:test"); $trans = Bio::SeqFeature::Gene::Transcript->new(-start => 1, -end => 2, -seq_id => "test"); $db->store($trans); @trans = $db->features(-seqid => $id, -type => "transcript"); print "@trans\n";'

code sub {
    package Bio::SeqFeature::Generic;
    use strict 'refs';
    my $self = shift @_;
    foreach my $f (@{[] unless $$self{'_gsf_sub_array'};}) {
        $f = undef;
    }
    $$self{'_gsf_seq'} = undef;
    foreach my $t (keys %{{} unless $$self{'_gsf_tag_hash'};}) {
        $$self{'_gsf_tag_hash'}{$t} = undef;
        delete $$self{'_gsf_tag_hash'}{$t};
    }
} did not evaluate to a subroutine reference, at /.../Bio/DB/SeqFeature/Store.pm line 2280

Is this a bug? Or am I taking the wrong approach?

From johnsonm at gmail.com Tue Feb 13 15:10:23 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Tue, 13 Feb 2007 14:10:23 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To:
References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
Message-ID:

You're quite correct. I wasn't paying enough attention. That does work just
fine.
I fat-fingered something somewhere else, broke my version of the module for GlimmerHMM, hallucinated and confused \S and \s. 8) All I have left now is to fix up the POD documentation and such and then I can send the module along and somebody can make whatever tweaks and check it in. Shall I open a ticket in Bugzilla for this and attach diffs, or just send them along to somebody to take care of directly? Oh, one thing I have not mentioned. I also added a -seqname argument. Glimmer2 does not provide any kind of sequence identifier in the output, and only processes the first sequence in a fasta file. It would be tedious to have to code around this by fixing up the predictions after they are produced, so I added the option to provide this missing info up front, so that downstream code hopefully does not have to care as much or carry a special case for fixing up Glimmer2 predictions. On 2/12/07, Torsten Seemann wrote: > I think it should be what it says, or perhaps now /^(Glimmer(M|HMM))/. > Here's why: > > I came onto the scene at Glimmer.pm rev 1.4. At that stage it only > parse GlimmerM. I noted that GlimmerHMM was the same output format as > GlimmerM, except for the first line. So in rev 1.5 I modified the > regexp to match both ie. \S* . This would also hopefully match any > other Glimmer-clone formats that arose. I also fixed the pdocs to say > this, and added tests to t/Genpred.t. > % cvs diff -r 1.4 -r 1.5 Bio/Tools/Glimmer.pm > % cvs diff -r 1.15 -r 1.16 t/Genpred.t > > I then planned to extend support to Glimmer2 and Glimmer3. I added the > 4 test files (t/Glimmer*.out) but never wrote the code. This is where > you have come in Mark :-) > > > I lifted that bit of code to do format detection...we don't have > GlimmerHMM > > installed locally, so I'm assuming Torsten's output is correct and the > above > > is a bug. Guess I'll go check bugzilla...
> > I'm pretty sure my 4 test files are correct - I spent a lot of time > ensuring they were consistent etc, as I was getting very confused with > the different "glimmer" versions! > > Hope this all helps, > > --Torsten > From cjfields at uiuc.edu Tue Feb 13 15:47:19 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 13 Feb 2007 14:47:19 -0600 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> Message-ID: You'll also want to update whatever relevant tests there are for Glimmer; looks like they are in GenPred.t. chris On Feb 13, 2007, at 2:10 PM, Mark Johnson wrote: > You're quite correct. I wasn't paying enough attention. That > does work > just fine. I fat-fingered something somewhere else, broke my > version of the > module for GlimmerHMM, hallucinated and confused \S and \s. 8) > All I have left now is to fixup the POD documentation and such > and then > I can send the module along and somebody can make whatever tweaks > and check > it in. Shall I open a ticket in Bugzilla for this and attach > diffs, or just > send them along to somebody to take care of directly? > Oh, one thing I have not mentioned. I also added a -seqname > argument. > Glimmer2 does not provide any kind of sequence identifier in the > output, and > only processes the first sequence in a fasta file. It would be > tedious to > have to code around this by fixing up the predictions after they are > produced, so I added the option to provide this missing info up front, > hopefully allowing downstream code to not have to care as much and > have a > special case for fixing up Glimmer2 predictions. > > On 2/12/07, Torsten Seemann > wrote: > >> I think it should be what it says, or perhaps now /^(Glimmer(M| >> HMM))/. >> Here's why: >> >> I came onto the scene at Glimmer.pm rev 1.4. At that stage it only >> parse GlimmerM. 
I noted that GlimmerHMM was the same output format as >> GlimmerM, except for the first line. So in rev 1.5 I modified the >> regexp to match both ie. \S* . This would also hopefully match any >> other Glimmer-clone formats that arose. I also fixed the pdocs to say >> this, and added tests to t/Genpred.t. >> % cvs diff -r 1.4 -r 1.5 Bio/Tools/Glimmer.pm >> % cvs diff -r 1.15 -r 1.16 t/Genpred.t >> >> I then planned to extend support to Glimmer2 and Glimmer3. I added >> the >> 4 test files (t/Glimmer*.out) but never wrote the code. This is where >> you have come in Mark :-) >> >>> I lifted that bit of code to do format detection...we don't have >> GlimmerHMM >>> installed locally, so I'm assuming Torsten's output is correct >>> and the >> above >>> is a bug. Guess I'll go check bugzilla... >> >> I'm pretty sure my 4 test files are correct - I spent a lot of time >> ensuring they were consistent etc, as I was getting very confused >> with >> the different "glimmer" versions! >> >> Hope this all helps, >> >> --Torsten >> Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From thokeller at gmail.com Tue Feb 13 17:00:06 2007 From: thokeller at gmail.com (Thomas Keller) Date: Tue, 13 Feb 2007 14:00:06 -0800 Subject: [Bioperl-l] update/install problem Message-ID: Could someone suggest a workaround or fix for this error? $ sudo fink update bioperl-pm586 Information about 5850 packages read in 2 seconds. The package 'bioperl-pm586' will be built and installed. The package 'xml-sax-pm586' will be installed. The package 'xml-sax-writer-pm586' will be built and installed. The package 'xml-filter-buffertext-pm586' will be built and installed.
The following package will be installed or updated: bioperl-pm586 The following 3 additional packages will be installed: xml-filter-buffertext-pm586 xml-sax-pm586 xml-sax-writer-pm586 Do you want to continue? [Y/n] Y /sw/bin/dpkg-lockwait -i /sw/fink/dists/unstable/main/binary-darwin-powerpc/libs/perlmods/xml-sax-pm586_0.13-2_darwin-powerpc.deb (Reading database ... 48029 files and directories currently installed.) Preparing to replace xml-sax-pm586 0.13-2 (using .../xml-sax-pm586_0.13-2_darwin-powerpc.deb) ... Unpacking replacement xml-sax-pm586 ... Setting up xml-sax-pm586 (0.13-2) ... update-perl586-sax-parsers: adding Perl SAX parser module info file of XML::SAX::PurePerl... Can't locate object method "save_parsers_debian" via package "XML::SAX" at /sw/sbin/update-perl586-sax-parsers line 96. /sw/bin/dpkg: error processing xml-sax-pm586 (--install): subprocess post-installation script returned error exit status 22 Errors were encountered while processing: xml-sax-pm586 ### execution of /sw/bin/dpkg-lockwait failed, exit code 1 Failed: can't install package xml-sax-pm586-0.13-2 -- Tom Keller "Ecrasez l'Infame!" -- Voltaire From sac at bioperl.org Tue Feb 13 18:00:46 2007 From: sac at bioperl.org (Steve Chervitz) Date: Tue, 13 Feb 2007 15:00:46 -0800 Subject: [Bioperl-l] Bio::Root::Utilities.pm Message-ID: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com> I noticed that Bio::Root::Utilities was purged from bioperl-live for the 1.5.2 release, but I'd like us to consider adding it back. I agree that the other purged Root modules were ancient relics of the past, but Bio::Root::Utilities.pm still has signs of life (at least I still find occasion to use it, or refer to code in it). I know that it's not currently used by any other modules in Bioperl, but there are likely some legacy scripts out there that rely on it.
Probably most of those scripts are ones I've written, but there have been substantive commits by others in the not-too-distant past (Dec 2005), so at least some folks besides myself are using it and may hesitate to upgrade their bioperl installation if it's absent. I'm all for avoiding bloat in the codebase and am eager to see Bioperl be more lean and mean, but I'd like to keep this module around. I'll agree to add some tests for it as well as clean some things up (e.g., use Bio::Root::IO to get a temp file name). Cheers, Steve -- Steve Chervitz sac at bioperl.org From cjfields at uiuc.edu Tue Feb 13 20:29:03 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 13 Feb 2007 19:29:03 -0600 Subject: [Bioperl-l] update/install problem In-Reply-To: References: Message-ID: On Feb 13, 2007, at 4:00 PM, Thomas Keller wrote: > Could someone suggest a workaround or fix for this error? > > $ sudo fink update bioperl-pm586 > Information about 5850 packages read in 2 seconds. > The package 'bioperl-pm586' will be built and installed. > The package 'xml-sax-pm586' will be installed. > The package 'xml-sax-writer-pm586' will be built and installed. > The package 'xml-filter-buffertext-pm586' will be built and installed. > The following package will be installed or updated: > bioperl-pm586 > The following 3 additional packages will be installed: > xml-filter-buffertext-pm586 xml-sax-pm586 xml-sax-writer-pm586 > Do you want to continue? [Y/n] Y > /sw/bin/dpkg-lockwait -i > /sw/fink/dists/unstable/main/binary-darwin-powerpc/libs/perlmods/ > xml-sax-pm586_0.13-2_darwin- > powerpc.deb > (Reading database ... 48029 files and directories currently > installed.) > Preparing to replace xml-sax-pm586 0.13-2 (using > .../xml-sax-pm586_0.13-2_darwin-powerpc.deb) ... > Unpacking replacement xml-sax-pm586 ... > Setting up xml-sax-pm586 (0.13-2) ... > update-perl586-sax-parsers: adding Perl SAX parser module info file of > XML::SAX::PurePerl...
> Can't locate object method "save_parsers_debian" via package > "XML::SAX" at > /sw/sbin/update-perl586-sax-parsers line 96. > /sw/bin/dpkg: error processing xml-sax-pm586 (--install): > subprocess post-installation script returned error exit status 22 > Errors were encountered while processing: > xml-sax-pm586 > ### execution of /sw/bin/dpkg-lockwait failed, exit code 1 > Failed: can't install package xml-sax-pm586-0.13-2 The fink installation seems to be hanging on XML::SAX, not bioperl. You could try installing XML::SAX (now at v. 0.15) via CPAN using 'sudo cpan'; I updated just recently w/o problems. As an aside, you could similarly install bioperl directly from CPAN (which I also haven't had any problems with). The installation allows for installing optional modules. chris From cjfields at uiuc.edu Tue Feb 13 22:41:31 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 13 Feb 2007 21:41:31 -0600 Subject: [Bioperl-l] Bio::Root::Utilities.pm In-Reply-To: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com> References: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com> Message-ID: <1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu> On Feb 13, 2007, at 5:00 PM, Steve Chervitz wrote: > I noticed that Bio::Root::Utilities was purged from bioperl-live > for the > 1.5.2 release, but I'd like us to consider adding it back. I agree > that the > other purged Root modules were ancient relics of the past, but > Bio::Root:: > Utilities.pm still has signs of life (at least I still find > occasion to use > it, or refer to code in it). > > I know that it's not currently used by any other modules in > Bioperl, but > there are likely some legacy scripts out there that rely on it. > Probably > most of those scripts are ones I've written, but there have been > substantive > commits by others in the not-to-distant past (Dec 2005), so at > least some > folks besides myself are using it and may hesitate to upgrade their > bioperl > installation if it's absent. 
> > I'm all for avoiding bloat in the codebase and am eager to see > Bioperl be > more lean and mean, but I'd like to keep this module around. I'll > agree to > add some tests for it as well as clean some things up (e.g., use > Bio::Root::IO to get temp file name). > > Cheers, > Steve > -- > Steve Chervitz > sac at bioperl.org I don't have a problem with adding it back, esp. if tests are added. Everything in Bio::Root* not tied to a module was yanked out when no one spoke up about cleaning up Bio::Root* modules: http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12792/focus=12839 Maybe others disagree? chris From bix at sendu.me.uk Wed Feb 14 03:00:35 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 14 Feb 2007 08:00:35 +0000 Subject: [Bioperl-l] update/install problem In-Reply-To: References: Message-ID: <45D2C1A3.9060300@sendu.me.uk> Chris Fields wrote: > As an aside, you could similarly install bioperl directly from CPAN > (which I also haven't had any problems with). Indeed. If you follow the unix instructions at http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix you should have a problem-free complete install under Mac OS X. From bix at sendu.me.uk Wed Feb 14 09:08:22 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 14 Feb 2007 14:08:22 +0000 Subject: [Bioperl-l] BLASTXML changes (good this time!) In-Reply-To: References: <45CCF861.8030000@sendu.me.uk> Message-ID: <45D317D6.5070903@sendu.me.uk> Chris Fields wrote: > > On Feb 9, 2007, at 4:40 PM, Sendu Bala wrote: > >> Chris Fields wrote: >>> If Sendu is out there, I think we can safely remove any dependencies >>> beyond XML::SAX 0.15 for the next release. Should I go ahead and >>> modify Build.PL? >> >> Sure, good to hear. > > I added a version dependency for XML::SAX (v. 0.15) for the PurePerl > fix. That likely obviates the need for a Bundle for XML::Simple. Not > too pressing; we can determine that before the next release. The bundle is now obsolete.
Does anything in Bioperl, or any of its dependencies, now make use of the expat library? If not, I can remove mention of it from the install documentation. From bix at sendu.me.uk Wed Feb 14 09:02:39 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 14 Feb 2007 14:02:39 +0000 Subject: [Bioperl-l] DB.t failures Message-ID: <45D3167F.2000608@sendu.me.uk> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer getting sequences back from NCBI in the order we requested them in batch mode. Is this a change at NCBI? Is there some way we can make sure to return the sequences in the expected order? Or shouldn't the order be expected (should the test script be altered)? From cjfields at uiuc.edu Wed Feb 14 09:37:07 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 14 Feb 2007 08:37:07 -0600 Subject: [Bioperl-l] DB.t failures In-Reply-To: <45D3167F.2000608@sendu.me.uk> References: <45D3167F.2000608@sendu.me.uk> Message-ID: <49A5C7D3-8D63-452C-B0EA-6F7144F85E35@uiuc.edu> Confirmed on this end. It's possible that the default sort order from eutils is different now though I haven't seen anything on the eutils mail list. There may be a way to set the sort order via the base URL; I'll check into it later today; I'm still digging myself out from the midwest blizzard. chris On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: > DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer > getting sequences back from NCBI in the order we requested them in > batch > mode. > > Is this a change at NCBI? Is there some way we can make sure to return > the sequences in the expected order? Or shouldn't the order be > expected > (should the test script be altered)? Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Wed Feb 14 09:42:05 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 14 Feb 2007 08:42:05 -0600 Subject: [Bioperl-l] BLASTXML changes (good this time!) In-Reply-To: <45D317D6.5070903@sendu.me.uk> References: <45CCF861.8030000@sendu.me.uk> <45D317D6.5070903@sendu.me.uk> Message-ID: On Feb 14, 2007, at 8:08 AM, Sendu Bala wrote: > Chris Fields wrote: >> >> On Feb 9, 2007, at 4:40 PM, Sendu Bala wrote: >> >>> Chris Fields wrote: >>>> If Sendu is out there, I think we can safely remove any >>>> dependencies >>>> beyond XML::SAX 0.15 for the next release. Should I go ahead and >>>> modify Build.PL? >>> >>> Sure, good to hear. >> >> I added a version dependency for XML::SAX (v. 0.15) for the PurePerl >> fix. That likely obviates the need for a Bundle for XML::Simple. >> Not >> too pressing; we can determine that before the next release. > > The bundle is now obsolete. Does anything in Bioperl, or any of its > dependencies, now make use of the expat library? If not, I can remove > mention of it from the install documentation. I'll try getting something up about XML::SAX on the wiki today. XML::Parser, though, still requires expat AFAIK: http://www.bioperl.org/wiki/BioPerl_Dependencies chris From kellert at ohsu.edu Tue Feb 13 17:43:24 2007 From: kellert at ohsu.edu (Thomas J Keller) Date: Tue, 13 Feb 2007 14:43:24 -0800 Subject: [Bioperl-l] HowTo:SearchIO Message-ID: Greetings, I've been away from programming and informatics for many months. Hoping to get back into it, I thought it would be good to review the tutorials. I tried the code in the tutorial on the sample blast report in the tutorial and it worked fine. So I ran a blastx search and saved the results and tried to parse them: It gave the "... parsing" message, but no other results get reported. Any suggestions? Thanks, Tom Tom Keller, Ph.D. 
kellert at ohsu.edu 503-494-2442 6339b Basic Science Bldg http://www.ohsu.edu/research/core From mrouard at gmail.com Wed Feb 14 06:23:47 2007 From: mrouard at gmail.com (Mathieu Rouard) Date: Wed, 14 Feb 2007 12:23:47 +0100 Subject: [Bioperl-l] get the sequence of a column in a multiple alignment Message-ID: Dear all, I am starting to use the bioperl API to parse multiple alignments and I am wondering what is the most efficient way to extract all the columns from an alignment (all the amino acids at position 1, position 2, etc.). I quickly implemented this simple code but it becomes quite slow as the length of the sequences increases.

    use Bio::AlignIO;

    my $stream = Bio::AlignIO->new(-file => $inputfilename,
                                   '-format' => 'stockholm');
    my $aln    = $stream->next_aln();
    my $length = $aln->length();
    my %column;    # column number => residues in that column

    for my $i (1 .. $length) {
        my $aa = '';
        foreach my $seq ($aln->each_seq()) {
            # one-residue subsequence of this row at column $i
            my $obj = $seq->trunc($i, $i);
            $aa .= $obj->seq;
        }
        # track the column number and the sequence of the column
        $column{$i} = $aa;
    }

Would you have any other suggestion? thanks Mathieu From avilella at gmail.com Wed Feb 14 10:29:02 2007 From: avilella at gmail.com (Albert Vilella) Date: Wed, 14 Feb 2007 15:29:02 +0000 Subject: [Bioperl-l] get the sequence of a column in a multiple alignment In-Reply-To: References: Message-ID: <358f4d650702140729u4dae2847qc8eeeb45b20faca4@mail.gmail.com> there is a slice method: $mini_aln = $aln->slice(20,30); # get a block of columns Title : slice Usage : $aln2 = $aln->slice(20,30) Function : Creates a slice from the alignment inclusive of start and end columns, and the first column in the alignment is denoted 1. Sequences with no residues in the slice are excluded from the new alignment and a warning is printed. Slice beyond the length of the sequence does not do padding. Returns : A Bio::SimpleAlign object Args : Positive integer for start column, positive integer for end column, optional boolean which if true will keep gap-only columns in the newly created slice.
Example: $aln2 = $aln->slice(20,30,1) but I don't know how well it behaves for lots of sequences :) On 2/14/07, Mathieu Rouard wrote: > Dear all, > > I am starting to use the bioperl API to parse multiple alignments and I am > wondering what is the most effective way to splice all the columns from an > alignment (all the AA at the postion 1, position 2 etc.). I quickly > implemented this simple code but it becomes quite slow when the length of > sequences increases. > > my $stream = Bio::AlignIO->new(-file => $inputfilename, > '-format' => 'stockholm'); > > my $aln = $stream->next_aln(); > > my $length = $aln->length(); > my %column; > > for (my $i=1;$i<=$length;$i++) { > my $aa; > foreach my $seq ($aln->each_seq()) { > my $obj = $seq->trunc($i,$i); > $aa .=$obj->seq; > } > # need to track the column number and the sequence of the column > push $column, $aa; > } > > Would you have any other suggestion? > > thanks > Mathieu From jason at bioperl.org Wed Feb 14 11:59:49 2007 From: jason at bioperl.org (Jason Stajich) Date: Wed, 14 Feb 2007 08:59:49 -0800 Subject: [Bioperl-l] HowTo:SearchIO In-Reply-To: References: Message-ID: As always, reporting the version of BLAST and Bioperl you have installed will help someone diagnose if this is a fixed problem or not. If you trawl through the list archives you'll see that Chris and others have been playing cat and mouse with the text output from NCBI BLAST, which appears to be an ever-evolving beast. So the best advice right now is to get the latest bioperl from CVS to ensure you have all the patches needed to parse this version. If it still fails then the standard response will be to submit the report as an attachment to a new bug report on the bugzilla.
thanks, -jason On Feb 13, 2007, at 2:43 PM, Thomas J Keller wrote: > Greetings, > I've been away from programming and informatics for many months. > Hoping to get back into it, I thought it would be good to review the > tutorials. > I tried the code in the tutorial on the sample blast report in the > tutorial and it worked fine. So I ran a blastx search and saved the > results and tried to parse them: It gave the "... parsing" message, > but no other results get reported. > > Any suggestions? > > Thanks, > Tom > > Tom Keller, Ph.D. > kellert at ohsu.edu > 503-494-2442 > 6339b Basic Science Bldg > http://www.ohsu.edu/research/core -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From dmessina at wustl.edu Wed Feb 14 11:58:45 2007 From: dmessina at wustl.edu (David Messina) Date: Wed, 14 Feb 2007 10:58:45 -0600 Subject: [Bioperl-l] HowTo:SearchIO In-Reply-To: References: Message-ID: <6E3CAB6B-9F9E-46FD-9021-50D7FE011860@wustl.edu> Hi Tom, Could you tell us what version of BioPerl you are using, and what specific example is failing for you? And could you post your code? That would make it easier to diagnose the problem. Thanks, Dave -- Dave Messina Senior Programmer/Analyst, Assembly Group WashU Genome Sequencing Center dmessina a t wustl.edu 314-286-1415 From cjfields at uiuc.edu Wed Feb 14 12:28:24 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 14 Feb 2007 11:28:24 -0600 Subject: [Bioperl-l] HowTo:SearchIO In-Reply-To: References: Message-ID: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu> I would also strongly encourage switching to using XML-based parsing, which is much more stable now.
Here's the link to the NCBI response re: BLAST report parsing: http://bioperl.org/wiki/NCBI_Blast_email chris (taking a break from shoveling snow...) On Feb 14, 2007, at 10:59 AM, Jason Stajich wrote: > As always, reporting the version of BLAST and Bioperl you have > installed will help someone diagnose if this is a fixed problem or > not. If you trawl through the list archives you'll chris and others > have been playing cat and mouse with the text version output from > NCBI BLAST which appears to be an ever evolving beast. > > So the best advice right now is to get the latest bioperl from CVS > to insure you have all the patches that might parse this version. If > it still fails then the standard response will be to submit the > report as an attachment to a new bug report on the bugzilla. > > thanks, > -jason > > > On Feb 13, 2007, at 2:43 PM, Thomas J Keller wrote: > >> Greetings, >> I've been away from programming and informatics for many months. >> Hoping to get back into it, I thought it would be good to review the >> tutorials. >> I tried the code in the tutorial on the sample blast report in the >> tutorial and it worked fine. So I ran a blastx search and saved the >> results and tried to parse them: It gave the "... parsing" message, >> but no other results get reported. >> >> Any suggestions? >> >> Thanks, >> Tom >> >> Tom Keller, Ph.D. 
>> kellert at ohsu.edu >> 503-494-2442 >> 6339b Basic Science Bldg >> http://www.ohsu.edu/research/core > > -- > Jason Stajich > Miller Research Fellow > University of California, Berkeley > lab: 510.642.8441 > http://pmb.berkeley.edu/~taylor/people/js.html > http://fungalgenomes.org/ Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From sac at bioperl.org Wed Feb 14 13:20:17 2007 From: sac at bioperl.org (Steve Chervitz) Date: Wed, 14 Feb 2007 10:20:17 -0800 Subject: [Bioperl-l] Bio::Root::Utilities.pm In-Reply-To: <1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu> References: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com> <1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu> Message-ID: <8f200b4c0702141020x5815018bvd69873222582d9d5@mail.gmail.com> On 2/13/07, Chris Fields wrote: > > > On Feb 13, 2007, at 5:00 PM, Steve Chervitz wrote: > > > I noticed that Bio::Root::Utilities was purged from bioperl-live > > for the > > 1.5.2 release, but I'd like us to consider adding it back. I agree > > that the > > other purged Root modules were ancient relics of the past, but > > Bio::Root:: > > Utilities.pm still has signs of life (at least I still find > > occasion to use > > it, or refer to code in it). > > > > I know that it's not currently used by any other modules in > > Bioperl, but > > there are likely some legacy scripts out there that rely on it.
> > Probably > > most of those scripts are ones I've written, but there have been > > substantive > > commits by others in the not-to-distant past (Dec 2005), so at > > least some > > folks besides myself are using it and may hesitate to upgrade their > > bioperl > > installation if it's absent. > > > > I'm all for avoiding bloat in the codebase and am eager to see > > Bioperl be > > more lean and mean, but I'd like to keep this module around. I'll > > agree to > > add some tests for it as well as clean some things up (e.g., use > > Bio::Root::IO to get temp file name). > > > > Cheers, > > Steve > > -- > > Steve Chervitz > > sac at bioperl.org > > I don't have a problem with adding it back, esp. if tests are added. > Everything in Bio::Root* not tied to a module was yanked out when no > one spoke up about cleaning up Bio::Root* modules: > > http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12792/ > focus=12839 > > Maybe others disagree? > > chris > Sorry I missed out on that thread. I had some trouble with my bioperl-l email delivery getting disabled due to excessive bounces, and it took me a while to catch it. Bio::Root::Utilities is quite a grab bag of miscellaneous general functions that are occasionally useful for perl scripting (e.g., determining end-of-line characters, sending email, etc.). The code could definitely use a review, and maybe an example script to advertise it. I can look into this, and suggestions are welcome. 
Steve From dmessina at wustl.edu Wed Feb 14 13:55:18 2007 From: dmessina at wustl.edu (David Messina) Date: Wed, 14 Feb 2007 12:55:18 -0600 Subject: [Bioperl-l] HowTo:SearchIO In-Reply-To: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu> References: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu> Message-ID: <49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu> On Feb 14, 2007, at 11:28 AM, Chris Fields wrote: > I would also strongly encourage switching to using XML-based parsing, Unless anyone objects, I would be happy to update the HOWTO to suggest that people make the switch and give an example of XML parsing. The Bio::SearchIO synopsis is already an XML example. However, there's no warning about text-based parsing nor a suggestion to use XML that I can see -- perhaps one should be added? Dave From cjfields at uiuc.edu Wed Feb 14 15:12:21 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 14 Feb 2007 14:12:21 -0600 Subject: [Bioperl-l] HowTo:SearchIO In-Reply-To: <49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu> References: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu> <49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu> Message-ID: On Feb 14, 2007, at 12:55 PM, David Messina wrote: > > On Feb 14, 2007, at 11:28 AM, Chris Fields wrote: > >> I would also strongly encourage switching to using XML-based parsing, > > Unless anyone objects, I would be happy to update the HOWTO to > suggest people make the switch and give an example of XML parsing. > > The Bio::SearchIO synopsis is already an XML example. However, > there's no warning about text-based parsing nor a suggestion to use > XML that I can see -- perhaps should be added? > > Dave We should probably add something specifically for BLAST, yes. Other text parsers should be fine. Personally, I use XML or tabular output parsing simply b/c they are faster and do what I need.
I think we'll need to retain the capability for text-based BLAST parsing, but it will become extremely bloated long-term if we plan on continuing support for parsing all versions and flavors of BLAST, particularly if NCBI continues to change the output. chris From dmessina at wustl.edu Wed Feb 14 15:46:31 2007 From: dmessina at wustl.edu (David Messina) Date: Wed, 14 Feb 2007 14:46:31 -0600 Subject: [Bioperl-l] HowTo:SearchIO In-Reply-To: References: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu> <49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu> Message-ID: <136DA052-B9FD-4547-B262-EC6E38B47392@wustl.edu> On Feb 14, 2007, at 2:12 PM, Chris Fields wrote: > We should probably add something specifically for BLAST, yes. > Other text parsers should be fine. Good point -- I'll make it clear it's only pertinent to BLAST. > I think we'll need to retain the capability for text-based BLAST > parsing, Agreed. Through the 1.6 release at least, I would think. > particularly if NCBI continues to change the output. Well, clearly the solution is not to use the NCBI flavor of BLAST. :) Dave (look at my email address) From jay at jays.net Thu Feb 15 08:08:56 2007 From: jay at jays.net (Jay Hannah) Date: Thu, 15 Feb 2007 07:08:56 -0600 Subject: [Bioperl-l] DB.t failures In-Reply-To: <45D3167F.2000608@sendu.me.uk> References: <45D3167F.2000608@sendu.me.uk> Message-ID: On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: > DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer > getting sequences back from NCBI in the order we requested them in > batch > mode. Is this the same result you get? DIED. FAILED tests 59-60, 63-64, 67-68, 71-72 Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 okay, 85.84%) Failed Test Stat Wstat Total Fail Failed List of Failed ------------------------------------------------------------------------ ------- DB.t 8 2048 113 8 7.08% 59-60 63-64 67-68 71-72 8 subtests skipped. 
Thanks, j seqlab.net http://www.bioperl.org/wiki/User:Jhannah From bix at sendu.me.uk Thu Feb 15 08:37:32 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 15 Feb 2007 13:37:32 +0000 Subject: [Bioperl-l] DB.t failures In-Reply-To: References: <45D3167F.2000608@sendu.me.uk> Message-ID: <45D4621C.6040309@sendu.me.uk> Jay Hannah wrote: > On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: >> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer >> getting sequences back from NCBI in the order we requested them in >> batch >> mode. > > Is this the same result you get? > > > DIED. FAILED tests 59-60, 63-64, 67-68, 71-72 > Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 > okay, 85.84%) > Failed Test Stat Wstat Total Fail Failed List of Failed > ------------------------------------------------------------------------ > ------- > DB.t 8 2048 113 8 7.08% 59-60 63-64 67-68 71-72 > 8 subtests skipped. Yes, those fails are all caused by results in the wrong order (I believe). From cjfields at uiuc.edu Thu Feb 15 09:22:09 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 08:22:09 -0600 Subject: [Bioperl-l] DB.t failures In-Reply-To: <45D4621C.6040309@sendu.me.uk> References: <45D3167F.2000608@sendu.me.uk> <45D4621C.6040309@sendu.me.uk> Message-ID: On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote: > Jay Hannah wrote: >> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: >>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer >>> getting sequences back from NCBI in the order we requested them in >>> batch >>> mode. >> >> Is this the same result you get? >> >> >> DIED. FAILED tests 59-60, 63-64, 67-68, 71-72 >> Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 >> okay, 85.84%) >> Failed Test Stat Wstat Total Fail Failed List of Failed >> --------------------------------------------------------------------- >> --- >> ------- >> DB.t 8 2048 113 8 7.08% 59-60 63-64 67-68 71-72 >> 8 subtests skipped. 
> > Yes, those fails are all caused by results in the wrong order (I > believe). I'm fixing those now so it doesn't depend on order and will commit in the next few minutes. chris From bix at sendu.me.uk Thu Feb 15 09:37:00 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 15 Feb 2007 14:37:00 +0000 Subject: [Bioperl-l] DB.t failures In-Reply-To: <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu> References: <45D3167F.2000608@sendu.me.uk> <45D4621C.6040309@sendu.me.uk> <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu> Message-ID: <45D4700C.8020305@sendu.me.uk> Chris Fields wrote: > > On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote: > >> Jay Hannah wrote: >>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: >>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer >>>> getting sequences back from NCBI in the order we requested them in >>>> batch mode. > > Okay, I committed a fix for that. I hope there are many users who > depend on the returned sequence order for anything! s/are/aren't/ ? I suspect there might be, and it's certainly a reasonable assumption to make. Did you not see an easy way of maintaining the order? From cjfields at uiuc.edu Thu Feb 15 09:28:46 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 08:28:46 -0600 Subject: [Bioperl-l] DB.t failures In-Reply-To: <45D4621C.6040309@sendu.me.uk> References: <45D3167F.2000608@sendu.me.uk> <45D4621C.6040309@sendu.me.uk> Message-ID: <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu> On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote: > Jay Hannah wrote: >> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: >>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer >>> getting sequences back from NCBI in the order we requested them in >>> batch >>> mode. >> >> Is this the same result you get? >> >> >> DIED.
FAILED tests 59-60, 63-64, 67-68, 71-72 >> Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 >> okay, 85.84%) >> Failed Test Stat Wstat Total Fail Failed List of Failed >> --------------------------------------------------------------------- >> --- >> ------- >> DB.t 8 2048 113 8 7.08% 59-60 63-64 67-68 71-72 >> 8 subtests skipped. > > Yes, those fails are all caused by results in the wrong order (I > believe). Okay, I committed a fix for that. I hope there are many users who depend on the returned sequence order for anything! chris From michael.watson at bbsrc.ac.uk Thu Feb 15 09:44:27 2007 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu, 15 Feb 2007 14:44:27 -0000 Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk> Hi OK I have some great images out of this glyph, but I can't see the axis, and nor is it labelled (ie does it go from 0 - 100%?) so isn't great for publication. The docs say: "NOTE: -gc_window=>'auto' gives nice results and is recommended for drawing GC content. The GC content axes draw slightly outside the panel, so you may wish to add some extra padding on the right and left. " Any idea how to do this? Basically, I want a nice GC graph with the axis quite clearly labelled, and a nice "%GC" title next to it :) Thanks Mick The information contained in this message may be confidential or legally privileged and is intended solely for the addressee. If you have received this message in error please delete it & notify the originator immediately. Unauthorised use, disclosure, copying or alteration of this message is forbidden & may be unlawful. The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Institute. This email and associated attachments has been checked locally for viruses but we can accept no responsibility once it has left our systems. 
Communications on Institute computers are monitored to secure the effective operation of the systems and for other lawful purposes. From nehadnahar at yahoo.co.in Thu Feb 15 10:28:42 2007 From: nehadnahar at yahoo.co.in (Neha Nahar) Date: Thu, 15 Feb 2007 15:28:42 +0000 (GMT) Subject: [Bioperl-l] Convert newick to nexus format In-Reply-To: <84703383-600F-42F4-A860-DD0D1C43EE83@bioperl.org> Message-ID: <777943.33252.qm@web8404.mail.in.yahoo.com> Thank you Jason. I ran the tests and they failed, so I re-installed the bioperl module and now it works fine. Regards, Neha. Jason Stajich wrote: Something is wrong with your install I am guessing - can you run the tests? Go to bioperl directory: $ perl t/TreeIO.t can you describe how you installed bioperl? On Feb 5, 2007, at 11:58 AM, Neha Nahar wrote: > > Hi, > Thank you for the code. > I tried it but I still get the same exception. > > ------------- EXCEPTION ------------- > MSG: Cannot call method write_tree on Bio::TreeIO object must use a > subclass > STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/ > 5.8.8/Bio/TreeIO/nexus.pm:170 > STACK toplevel ./nexus1.pl:18 > > > Please find attached the perl file(nexus.pl). > > > Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/ > ~sendu/bioperl/Bio/TreeIO.pm > > Please let me know if I am using the correct version.If not, please > point me to the latest one. > > Thank you. > Regards, > nnahar > > > > > > Jason Stajich wrote:please cc the mailing list > when asking a question or followup. > > Sorry I don't know what you are doing wrong - you didn't resend > your code so I don't know if you still have a typo. 
> > This code works fine for me > > use Bio::TreeIO; > use strict; > my ($filein,$fileout) = @ARGV; > my ($format,$oformat) = qw(newick nexus); > my $in = Bio::TreeIO->new(-file => $filein, -format => $format);my > $out= Bio::TreeIO->new(-format => $oformat, -file => ">$fileout"); > > > while( my $t = $in->next_tree ) { > $out->write_tree($t); > } > > > > On Feb 5, 2007, at 11:24 AM, Neha Nahar wrote: > > Thank you very much for the reply. > > > I fixed the code as per your suggestion,but now am getting a > different error: > > > $ ./nexus.pl mrp-input.txt nexus.out > newickfile=mrp-input.txt, nexusfile=nexus.out > > > ------------- EXCEPTION ------------- > MSG: Cannot call method write_tree on Bio::TreeIO object must use a > subclass > STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/ > 5.8.8/Bio/TreeIO/nexus.pm:170 > STACK toplevel ./nexus.pl:23 > > > -------------------------------------- > > > Please help me out with this script. > > > Thank you. > Regards, > Neha > > > > > > > > > Jason Stajich wrote: you want to write the TREE > out not the TREE WRITER. > > > > > $treeout->write_tree($tree) > > > not > $treeout->write_tree($treeout); > > > On Feb 5, 2007, at 9:59 AM, Neha Nahar wrote: > > > Hello everyone, > > > > > I am trying to convert newick tree to nexus format. 
> Using the script (referred from an email from George dated Wed Sep > 22 11:52:47 EDT 2004) : > > > > > /*------------------------------------------------------------*/ > > > > > $ cat nexus.pl > #!/usr/bin/perl -w > > > > > use Bio::TreeIO; > > > > > ($NEWICKFILE, $NEXUSFILE) = @ARGV; > print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n"; > my $treeio = new Bio::TreeIO(-format => 'newick', -file => > "$NEWICKFILE"); > my $treeout = new Bio::TreeIO(-format => 'nexus', -file => "> > $NEXUSFILE"); > while(my $tree = $treeio->next_tree) { > $treeout->write_tree($treeout); > } > > > > > exit 0; > > > > > > > > > /*------------------------------------------------------------*/ > > > > > Running the script through command line: > Gives the following error: > > > > > $ ./nexus.pl mrp-input.txt nexus.out > newickfile=mrp-input.txt, nexusfile=nexus.out > > > > > ------------- EXCEPTION ------------- > MSG: Cannot call method write_tree on Bio::TreeIO object must use a > subclass > STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/ > 5.8.8/Bio/TreeIO/nexus.pm:170 > STACK toplevel ./nexus.pl:23 > > > > > -------------------------------------- > > > > > > > > > Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/ > ~sendu/bioperl/Bio/TreeIO.pm > > > > > Questions:- > > > > > 1. Please let me know if I am using the correct version. > If not, please point me to the latest one. > > > > > 2. Provided that the version I am using is the right one, please > let me know what is wrong with the script. > > > > > Thank you. > Regards, > Neha. > > > > > > > > > > > > > > > > > > > > > -Neha Nahar > " Work for cause and not for applause, live to express and not to > impress !" > > > > > --------------------------------- > Here's a new way to find what you're looking for - Yahoo!
Answers > > > > > -- > Jason Stajich > Miller Research Fellow > University of California, Berkeley > lab: 510.642.8441 > > > http://pmb.berkeley.edu/~taylor/people/js.html > http://fungalgenomes.org/
From cjfields at uiuc.edu Thu Feb 15 10:44:23 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 09:44:23 -0600 Subject: [Bioperl-l] DB.t failures In-Reply-To: <45D4700C.8020305@sendu.me.uk> References: <45D3167F.2000608@sendu.me.uk> <45D4621C.6040309@sendu.me.uk> <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu> <45D4700C.8020305@sendu.me.uk> Message-ID: <809A1A9D-0344-412D-98D7-EA852EFED63D@uiuc.edu> On Feb 15, 2007, at 8:37 AM, Sendu Bala wrote: > Chris Fields wrote: >> >> On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote: >> >>> Jay Hannah wrote: >>>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: >>>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no >>>>> longer >>>>> getting sequences back from NCBI in the order we requested them in >>>>> batch mode. >> >> Okay, I committed a fix for that. I hope there are many users who >> depend on the returned sequence order for anything! > > s/are/aren't/ ? Yes, my oops. > I suspect there might be, and it's certainly a reasonable assumption to > make. Did you not see an easy way of maintaining the order? I haven't looked (been busy the last few days), but I think there is a way via efetch. We could add in something to the default base URL if there is something or (probably better) add a sort_order() method to designate a particular sort order, defaulting to the old order if not set.
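Whatever the server does, a client that depends on the requested order can restore that order itself. The sketch below is a minimal, hypothetical illustration in plain Perl. The hashes stand in for the sequence objects a batch fetch would return, and the IDs are made up. With real Bio::Seq objects you would key the index on whichever identifier you requested, for example `display_id` or `accession_number`; which of the two matches depends on whether GIs or accessions were sent, so treat that choice as an assumption. This is not what the DB.t fix does.

```perl
use strict;
use warnings;

# IDs in the order they were requested (hypothetical values).
my @requested = qw(AAA111 BBB222 CCC333);

# Records as they might come back from a batch query, in a different order.
my @returned = (
    { id => 'CCC333', seq => 'GGCC' },
    { id => 'AAA111', seq => 'ATGC' },
    { id => 'BBB222', seq => 'TTAA' },
);

# Index the returned records by ID, then walk the requested list in order.
my %by_id   = map { $_->{id} => $_ } @returned;
my @ordered = map { $by_id{$_} } grep { exists $by_id{$_} } @requested;

# Collect any requested IDs the server did not return.
my @missing = grep { !exists $by_id{$_} } @requested;

print join(' ', map { $_->{id} } @ordered), "\n";    # prints "AAA111 BBB222 CCC333"
warn "missing: @missing\n" if @missing;
```

Keying on the identifier rather than on position also exposes missing records. A positional match would instead silently pair every later sequence with the wrong ID.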
chris From lstein at cshl.edu Thu Feb 15 13:53:13 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 15 Feb 2007 13:53:13 -0500 Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com> Hi Michael, When you set up the panel, do this: Bio::Graphics::Panel->new(-blah -blah, -pad_left => 20, -pad_right => 20); This will leave enough room on the left and right for you to see the Y axis. Otherwise it runs off the edge of the image (ok, this is a mis-design, but it was the only way to solve a chicken-and-egg problem about who gets to say how wide the panel is) Lincoln On 2/15/07, michael watson (IAH-C) wrote: > > Hi > > OK I have some great images out of this glyph, but I can't see the axis, > and nor is it labelled (ie does it go from 0 - 100%?) so isn't great for > publication. The docs say: > > "NOTE: -gc_window=>'auto' gives nice results and is recommended for > drawing GC content. The GC content axes draw slightly outside the > panel, so you may wish to add some extra padding on the right and > left. " > > Any idea how to do this? > > Basically, I want a nice GC graph with the axis quite clearly labelled, > and a nice "%GC" title next to it :) > > Thanks > > Mick >
-- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From johnsonm at gmail.com Thu Feb 15 14:24:08 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Thu, 15 Feb 2007 13:24:08 -0600 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: References: <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> Message-ID: Done. Bug opened in Bugzilla, diffs attached including new/updated tests: http://bugzilla.open-bio.org/show_bug.cgi?id=2206 Can somebody grab that, take a look, tweak to taste, test and commit? Tests pass on my end presently. On 2/13/07, Chris Fields wrote: > > You'll also want to update whatever relevant tests there are for > Glimmer; looks like they are in GenPred.t. > > chris > From cjfields at uiuc.edu Thu Feb 15 14:37:22 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 13:37:22 -0600 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: References: <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> Message-ID: <4C15214E-AE4B-4D85-A710-60536B08BE86@uiuc.edu> On Feb 15, 2007, at 1:24 PM, Mark Johnson wrote: > Done. Bug opened in Bugzilla, diffs attached including new/updated > tests: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2206 > > Can somebody grab that, take a look, tweak to taste, test and > commit?
Tests > pass on my end presently. > > On 2/13/07, Chris Fields wrote: >> >> You'll also want to update whatever relevant tests there are for >> Glimmer; looks like they are in GenPred.t. >> >> chris Done; everything passed on this end as well, no tweaking necessary. If there are problems we'll definitely hear about it down the road (Glimmer is a popular tool), but I think you'll be fine. Thanks Mark! chris From cjfields at uiuc.edu Thu Feb 15 14:46:07 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 13:46:07 -0600 Subject: [Bioperl-l] DB.t failures In-Reply-To: <809A1A9D-0344-412D-98D7-EA852EFED63D@uiuc.edu> References: <45D3167F.2000608@sendu.me.uk> <45D4621C.6040309@sendu.me.uk> <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu> <45D4700C.8020305@sendu.me.uk> <809A1A9D-0344-412D-98D7-EA852EFED63D@uiuc.edu> Message-ID: On Feb 15, 2007, at 9:44 AM, Chris Fields wrote: > > On Feb 15, 2007, at 8:37 AM, Sendu Bala wrote: > >> Chris Fields wrote: >>> >>> On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote: >>> >>>> Jay Hannah wrote: >>>>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: >>>>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no >>>>>> longer >>>>>> getting sequences back from NCBI in the order we requested >>>>>> them in >>>>>> batch mode. >>> >>> Okay, I committed a fix for that. I hope there are many users who >>> depend on the returned sequence order for anything! >> >> s/are/aren't/ ? > > Yes, my oops. > >> I suspect there might be, and its certainly a reasonable >> assumption to >> make. Did you not see an easy way of maintaining the order? > > I haven't looked (been busy the last few days), but I think there is > a way via efetch. > > We could add in something to the default base URL if there is > something or (probably better) add a sort_order() method to designate > a particular sort order, defaulting to the old order if not set. 
> > chris Delving into it further, the problem only occurs when using get_seq_stream() directly in batch mode, which is likely only used by developers for testing. The sort issue only pops up when eposting IDs using that mode; retrieved seqs are returned in a different order than through a direct efetch query (the default with get_Stream* or get_Seq* methods). No value of the 'sort' parameter gets around that problem, which is not a complete surprise since it is supposed to work only for PubMed, but since the method is rarely used I'll just leave the bullet-proofed tests alone. chris From letondal at pasteur.fr Thu Feb 15 15:23:55 2007 From: letondal at pasteur.fr (Catherine Letondal) Date: Thu, 15 Feb 2007 21:23:55 +0100 Subject: [Bioperl-l] Problem parsing clustalw format ni AlignIO Message-ID: Hi bioperlers, I have a script called protal2dna (http://bioweb.pasteur.fr/seqanal/interfaces/protal2dna.html, see attachment #1) that realigns DNA sequences given their sequences plus the corresponding protein alignment (sequences have to be in the same order or named equivalently). We have a parsing problem reported from the AlignIO class when users enter some clustalw files (see attachment #2 for an example): % protal2dna alig-protal2dna.dat dna-protal2dna.data no alignment available in 'clustalw' format from file 'alig-protal2dna.dat' % I have tried with bioperl 1.4. I have looked in the archive and in the BUGS, but found nothing. Is there any bug fix for this? I also provide the DNA sequences file if you want to test. Thanks a lot in advance, -- Catherine Letondal -- Institut Pasteur www.pasteur.fr/~letondal -------------- next part -------------- A non-text attachment was scrubbed...
Name: protal2dna Type: application/octet-stream Size: 11093 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070215/42a0ba43/attachment.obj -------------- next part -------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: alig-protal2dna.dat Type: application/octet-stream Size: 12022 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070215/42a0ba43/attachment-0001.obj -------------- next part -------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: dna-protal2dna.data Type: application/octet-stream Size: 7739 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070215/42a0ba43/attachment-0002.obj From Kevin.M.Brown at asu.edu Thu Feb 15 16:38:25 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 15 Feb 2007 14:38:25 -0700 Subject: [Bioperl-l] Problem parsing clustalw format ni AlignIO In-Reply-To: References: Message-ID: <1A4207F8295607498283FE9E93B775B402BA7764@EX02.asurite.ad.asu.edu> Did you try Bioperl 1.5.2 to see if updates to it might fix the issue? IIRC 1.4 is nearly 2 years old now. 1.5.2 was released within the last few months. > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Catherine Letondal > Sent: Thursday, February 15, 2007 1:24 PM > To: bioperl-l > Cc: Catherine Letondal; Katja Schuerer > Subject: [Bioperl-l] Problem parsing clustalw format ni AlignIO > > Hi bioperlers, > > I have a script called protal2dna > (http://bioweb.pasteur.fr/seqanal/interfaces/protal2dna.html, > see attachment #1) that realign DNA sequences giving their > sequences + the corresponding protein alignment (sequences > have to be in the same order or named equivalently). 
We have > a parsing problem reported from the AlignIO class when users > enter some clustalw file (see attachment #2 for an example): > > % protal2dna alig-protal2dna.dat dna-protal2dna.data no > alignment available in 'clustalw' format from file > 'alig-protal2dna.dat' > % > > I have tried with bioperl 1.4. I have looked in the archive > and in the BUGS, but found nothing? > Is there any bug fix for this? I also provide the DNA > sequences file if you want to test. > > Thanks a lot in advance, > > -- > Catherine Letondal -- Institut Pasteur > www.pasteur.fr/~letondal > > From cjfields at uiuc.edu Thu Feb 15 16:50:54 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 15:50:54 -0600 Subject: [Bioperl-l] Bio::Root::Utilities.pm In-Reply-To: <8f200b4c0702141020x5815018bvd69873222582d9d5@mail.gmail.com> References: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com> <1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu> <8f200b4c0702141020x5815018bvd69873222582d9d5@mail.gmail.com> Message-ID: On Feb 14, 2007, at 12:20 PM, Steve Chervitz wrote: ... >> >> I don't have a problem with adding it back, esp. if tests are added. >> Everything in Bio::Root* not tied to a module was yanked out when no >> one spoke up about cleaning up Bio::Root* modules: >> >> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12792/ >> focus=12839 >> >> Maybe others disagree? >> >> chris >> > > Sorry I missed out on that thread. I had some trouble with my > bioperl-l > email delivery getting disabled due to excessive bounces, and it > took me a > while to catch it. > > Bio::Root::Utilities is quite a grab bag of miscellaneous general > functions > that are occasionally useful for perl scripting (e.g., determining > end-of-line characters, sending email, etc.). The code could > definitely use > a review, and maybe an example script to advertise it. I can look > into this, > and suggestions are welcome. 
> > Steve Steve, I have added Root::Utilities back to CVS but I didn't know if I should add back the other related Root modules (didn't know what your future plans were for them). Could the Bio::Root::Global and Bio::Root::Object stuff be consolidated into Bio::Root::Utilities or would that be too problematic? None of the other Bio* modules currently use them. Personally, I use Date::Manip for anything that requires date/time manipulation (updating seq records based on dates, for instance). Some of the other utilities could come in handy, though. Don't know if that helps... chris From cjfields at uiuc.edu Thu Feb 15 16:51:58 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 15:51:58 -0600 Subject: [Bioperl-l] XEMBL deprecation Message-ID: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> I have gone ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService both for deprecation in the wiki and in CVS (though I haven't set any timeline): http://www.bioperl.org/wiki/Deprecated_modules The XEMBL web services are no longer available, and it looks like everything is running through DBFetch now. The XEMBL tests are skipped if no server is detected, so they shouldn't cause any problems with Bioperl installations. Lincoln, was there anything to salvage from these? I noticed they used SOAP::Lite, so maybe we could convert these over to a SOAP-based interface to DBFetch web services? chris From johnsonm at gmail.com Thu Feb 15 17:29:37 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Thu, 15 Feb 2007 16:29:37 -0600 Subject: [Bioperl-l] Bio::Tools::Run::Glimmer? Message-ID: Now that I've got Bio::Tools::Glimmer parsing Glimmer2 and Glimmer3 output, I suppose I might as well go and write Bio::Tools::Run::Glimmer. I suspect another 4-in-1 module may be possible. Now that I think about it, I'll need one for GeneMark, too. Comments? Suggestions on a good module to use as a template? 
From hlapp at gmx.net Thu Feb 15 20:18:56 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 15 Feb 2007 20:18:56 -0500 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> Message-ID: <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net> On Feb 15, 2007, at 4:51 PM, Chris Fields wrote: > The XEMBL web services are no longer available What happens if someone invokes the module? Should it maybe return nothing and warn()? I don't think it's a good idea if the module just silently does not function because its backend is no more. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Thu Feb 15 20:48:12 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 19:48:12 -0600 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net> References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net> Message-ID: <00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu> On Feb 15, 2007, at 7:18 PM, Hilmar Lapp wrote: > On Feb 15, 2007, at 4:51 PM, Chris Fields wrote: > >> The XEMBL web services are no longer available > > What happens if someone invokes the module? Should it maybe return > nothing and warn()? I don't think it's a good idea if the module > just silently does not function because its backend is no more. > > -hilmar Yes, I thought the same. I have added a warn() noting the deprecation to the XEMBL constructor and removed XEMBL tests from CVS. The modules are still there for the time being. I actually worry more about the internals; it would be a shame to toss them altogether. Would it be worth it to shift this towards a SOAP-based interface to DBFetch? Or, more precisely, how much trouble would it be to do so? 
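Hilmar's suggestion (warn, then return nothing rather than fail silently) can be sketched in a few lines. The package below is a hypothetical stand-in, not the actual Bio::DB::XEMBL code, and the class name and message text are illustrative only.

```perl
use strict;
use warnings;

package My::XEMBLStub;    # hypothetical stand-in, not the real Bio::DB::XEMBL

sub new {
    my ($class) = @_;
    # Tell the caller why nothing comes back instead of failing silently.
    warn "$class is deprecated: the XEMBL web service is no longer available; ",
         "consider a dbfetch-based module instead\n";
    return;    # undef in scalar context, so no half-working object escapes
}

package main;

my $db = My::XEMBLStub->new(-format => 'embl');
print defined $db ? "got an object\n" : "no object returned\n";    # prints "no object returned"
```

Because the constructor returns undef, callers written as `my $db = ...->new(...) or die ...` stop immediately. A dead backend then cannot surface later as confusing empty results.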
chris From hlapp at gmx.net Thu Feb 15 20:54:29 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 15 Feb 2007 20:54:29 -0500 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: <00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu> References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net> <00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu> Message-ID: Well, if dbFetch doesn't have a SOAP-based interface, how would you want to do this? -hilmar On Feb 15, 2007, at 8:48 PM, Chris Fields wrote: > On Feb 15, 2007, at 7:18 PM, Hilmar Lapp wrote: > >> On Feb 15, 2007, at 4:51 PM, Chris Fields wrote: >> >>> The XEMBL web services are no longer available >> >> What happens if someone invokes the module? Should it maybe return >> nothing and warn()? I don't think it's a good idea if the module >> just silently does not function because its backend is no more. >> >> -hilmar > > Yes, I thought the same. I have added a warn() noting the > deprecation to the XEMBL constructor and removed XEMBL tests from > CVS. The modules are still there for the time being. > > I actually worry more about the internals; it would be a shame to > toss them altogether. Would it be worth it to shift this towards a > SOAP-based interface to DBFetch? Or, more precisely, how much > trouble would it be to do so?
> > chris -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Thu Feb 15 20:59:46 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 19:59:46 -0600 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net> <00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu> Message-ID: <8C7E18C6-B38D-4E15-BE9C-84256B09C312@uiuc.edu> On Feb 15, 2007, at 7:54 PM, Hilmar Lapp wrote: > Well, if dbFetch doesn't have a SOAP-based interface, how would you > want to do this? > > -hilmar DBfetch has a SOAP-based interface: http://www.ebi.ac.uk/Tools/webservices/services/dbfetch Just not sure how easy it would be to switch XEMBL code over to using it. We already have Bio::DB::DBFetch so it may be redundant, but I don't recall any other SOAP-based tools in BioPerl beyond some stuff in bioperl-run (and I'm not sure how up-to-date the DBFetch module is). chris From jimhu at tamu.edu Fri Feb 16 00:20:09 2007 From: jimhu at tamu.edu (Jim Hu) Date: Thu, 15 Feb 2007 23:20:09 -0600 Subject: [Bioperl-l] Pathway tools output parser In-Reply-To: References: Message-ID: <1632E2BF-4402-47DE-B750-9763E02711D2@tamu.edu> Hi Chris, I need to check the list more often! I never got an answer here, but Eric Just pointed out a Perl API at TAIR that's linked from the BioCyc site. I've used the Lisp parser functions from that to move the data to a Perl array of arrays, and I'm working on creating object classes for BioCyc objects, starting with genes and products. I need to look at the appropriate ways to link this up to the existing codebase for interconverting to Chado and other BioPerl data types. Jim ===================================== Jim Hu Associate Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ.
College Station, TX 77843-2128 979-862-4054 On Feb 7, 2007, at 12:07 AM, Chris Mungall wrote: > > Hi Jim > > Did you ever get an answer to this? I'm interested in storing > pathway data > in Chado & I remember enough lisp to get it into something perl-manageable > like XML > > On Thu, 25 Jan 2007, Jim Hu wrote: > >> Is there a module to parse the lisp object files from Peter Karp's >> Pathway Tools? I need a parser to convert the gene and protein >> objects in EcoCyc releases into something that can be imported into >> Chado. >> ===================================== >> Jim Hu >> Associate Professor >> Dept. of Biochemistry and Biophysics >> 2128 TAMU >> Texas A&M Univ. >> College Station, TX 77843-2128 >> 979-862-4054 > From lstein at cshl.edu Fri Feb 16 08:35:19 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 16 Feb 2007 08:35:19 -0500 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts In-Reply-To: <45D1E2A5.6060104@sendu.me.uk> References: <45D1E2A5.6060104@sendu.me.uk> Message-ID: <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com> Hi, Older versions of Storable can't deal with features that contain subroutine refs. You should get the current version from CPAN. Note that there is a slight security problem here if you don't trust the objects stored in the database. If they contain code refs, the code will be evaluated during deserialization. Lincoln On 2/13/07, Sendu Bala wrote: > > I have some raw sequences in a Bio::DB::SeqFeature::Store mysql database > and wanted to associate some basic information with them, like exon > positions. I thought of creating Bio::SeqFeature::Gene::Transcript > objects and storing them so I could later use features() to see what > other features overlapped exons.
I ran into a fatal error that can be > replicated with the following simplified one-liner: > > perl -MBio::DB::SeqFeature::Store -MBio::SeqFeature::Gene::Transcript -e > '$db = Bio::DB::SeqFeature::Store->new(-adaptor => "DBI::mysql", -dsn => > "dbi:mysql:test"); $trans = > Bio::SeqFeature::Gene::Transcript->new(-start => 1, -end => 2, -seq_id > => "test"); $db->store($trans); @trans = $db->features(-seqid => $id, > -type => "transcript"); print "@trans\n";' > > code sub { > package Bio::SeqFeature::Generic; > use strict 'refs'; > my $self = shift @_; > foreach my $f (@{[] unless $$self{'_gsf_sub_array'};}) { > $f = undef; > } > $$self{'_gsf_seq'} = undef; > foreach my $t (keys %{{} unless $$self{'_gsf_tag_hash'};}) { > $$self{'_gsf_tag_hash'}{$t} = undef; > delete $$self{'_gsf_tag_hash'}{$t}; > } > } did not evaluate to a subroutine reference, at > /.../Bio/DB/SeqFeature/Store.pm line 2280 > > > Is this a bug? Or am I taking the wrong approach? > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Fri Feb 16 08:47:29 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 16 Feb 2007 08:47:29 -0500 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts In-Reply-To: <45D5B42A.1080303@sendu.me.uk> References: <45D1E2A5.6060104@sendu.me.uk> <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com> <45D5B42A.1080303@sendu.me.uk> Message-ID: <6dce9a0b0702160547s5873cd2bg2c5cf09779138249@mail.gmail.com> Hi Sendu, I'll do a little digging and let you know.
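Lincoln's point about Storable and code refs can be shown with core Perl alone. Storable documents two package variables for this: `$Storable::Deparse`, which serializes CODE refs through B::Deparse, and `$Storable::Eval`, which evals them back into subs on retrieval. Both must be set for a structure holding a code ref to survive a round trip. This is a minimal sketch that assumes a Storable recent enough to support those variables. The hash is a hypothetical stand-in for a feature object, not Bio::SeqFeature internals. Setting `$Storable::Eval` is the security trade-off Lincoln mentions, because any code stored in the data is compiled with `eval` when it is retrieved.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# By default Storable croaks on CODE refs; these switches enable them.
{
    no warnings 'once';
    $Storable::Deparse = 1;    # freeze: turn code refs into source text via B::Deparse
    $Storable::Eval    = 1;    # thaw: eval that source back into a sub (trusted data only!)
}

# Hypothetical stand-in for a feature object that carries a code ref.
my $feature = {
    name    => 'exon1',
    cleanup => sub { return "cleaned $_[0]" },
};

my $frozen = freeze($feature);
my $copy   = thaw($frozen);

print $copy->{cleanup}->('exon1'), "\n";    # prints "cleaned exon1"
```

If you cannot trust the database contents, leave `$Storable::Eval` false. Retrieval then refuses stored code instead of running it.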
Lincoln On 2/16/07, Sendu Bala wrote: > > Lincoln Stein wrote: > > Hi, > > > > Older versions of Storable can't deal with features that contain > > subroutine refs. You should get the current version from CPAN. > > Do you have any idea which version of Storable first supported this? I > can specify that version in Bioperl's Build.PL. > > (else I'll just specify the latest version) > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Fri Feb 16 08:52:30 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 16 Feb 2007 08:52:30 -0500 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts In-Reply-To: <45D5B42A.1080303@sendu.me.uk> References: <45D1E2A5.6060104@sendu.me.uk> <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com> <45D5B42A.1080303@sendu.me.uk> Message-ID: <6dce9a0b0702160552l11417675kec7b5b4537cbfa70@mail.gmail.com> It looks like 2.05 or higher is the Storable version to use. It requires B::Deparse, which is (I think) standard on perl 5.6 or higher. Lincoln On 2/16/07, Sendu Bala wrote: > > Lincoln Stein wrote: > > Hi, > > > > Older versions of Storable can't deal with features that contain > > subroutine refs. You should get the current version from CPAN. > > Do you have any idea which version of Storable first supported this? I > can specify that version in Bioperl's Build.PL. > > (else I'll just specify the latest version) > -- Lincoln D.
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Fri Feb 16 08:55:06 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 16 Feb 2007 08:55:06 -0500 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> Message-ID: <6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com> I like the idea of converting these over to use DBFetch's SOAP services. On the other hand, it isn't likely that I'm going to have time to do this anytime soon. Probably the best thing to do is to issue a warning and return undef if someone tries to use the XEMBL module. I'll make that change. Lincoln On 2/15/07, Chris Fields wrote: > > I have gone ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService > both for deprecation in the wiki and in CVS (though I haven't set any > timeline): > > http://www.bioperl.org/wiki/Deprecated_modules > > The XEMBL web services are no longer available, and it looks like > everything is running through DBFetch now. The XEMBL tests are > skipped if no server is detected, so they shouldn't cause any > problems with Bioperl installations. > > Lincoln, was there anything to salvage from these? I noticed they > used SOAP::Lite, so maybe we could convert these over to a SOAP-based > interface to DBFetch web services? > > chris > -- Lincoln D.
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Fri Feb 16 08:55:47 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 16 Feb 2007 08:55:47 -0500 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: <6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com> References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> <6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com> Message-ID: <6dce9a0b0702160555s5b4d7d70n936c603bc79dd40d@mail.gmail.com> Oh, looks like someone has inserted the warnings already. Good. Lincoln On 2/16/07, Lincoln Stein wrote: > > I like the idea of converting these over to use DBFetch's SOAP services. > On the other hand, it isn't likely that I'm going to have time to do this > anytime soon. > > Probably the best thing to do is to issue a warning and return undef if > someone tries to use the XEMBL module. I'll make that change. > > Lincoln > > On 2/15/07, Chris Fields wrote: > > > > I have gone ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService > > both for deprecation in the wiki and in CVS (though I haven't set any > > timeline): > > > > http://www.bioperl.org/wiki/Deprecated_modules > > > > The XEMBL web services are no longer available, and it looks like > > everything is running through DBFetch now. The XEMBL tests are > > skipped if no server is detected, so they shouldn't cause any > > problems with Bioperl installations. > > > > Lincoln, was there anything to salvage from these? I noticed they > > used SOAP::Lite, so maybe we could convert these over to a SOAP-based > > interface to DBFetch web services? > > > > chris > > > > > > -- > Lincoln D.
Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From bix at sendu.me.uk Fri Feb 16 08:56:50 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 16 Feb 2007 13:56:50 +0000 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts In-Reply-To: <6dce9a0b0702160552l11417675kec7b5b4537cbfa70@mail.gmail.com> References: <45D1E2A5.6060104@sendu.me.uk> <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com> <45D5B42A.1080303@sendu.me.uk> <6dce9a0b0702160552l11417675kec7b5b4537cbfa70@mail.gmail.com> Message-ID: <45D5B822.6080908@sendu.me.uk> Lincoln Stein wrote: > It looks like 2.05 or higher is the Storable version to use. It requires > B::Deparse, which is (I think) standard on perl 5.6 or higher. Thanks, now recommended in Build.PL From cjfields at uiuc.edu Fri Feb 16 09:05:08 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 16 Feb 2007 08:05:08 -0600 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: <6dce9a0b0702160555s5b4d7d70n936c603bc79dd40d@mail.gmail.com> References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> <6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com> <6dce9a0b0702160555s5b4d7d70n936c603bc79dd40d@mail.gmail.com> Message-ID: I added the warning yesterday. We can add something to the project priority list on modifying XEMBL to use DBFetch instead; I like the SOAP-based interface. I am thinking of a similar interface for NCBI eutils but I haven't had time to work on it. 
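Lincoln's suggested fallback for the retired XEMBL service (issue a warning, then return undef) can be sketched in a few lines. This is a minimal illustration only: the package name My::Deprecated::XEMBL and the warning text are hypothetical, and this is not the code that was actually committed to Bio::DB::XEMBL.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a retired web-service client; the package name
# and message are illustrative, not the real Bio::DB::XEMBL code.
package My::Deprecated::XEMBL;

sub new {
    my ($class, @args) = @_;
    warn "$class is deprecated: the XEMBL service is no longer available; "
       . "consider a DBFetch-based client instead.\n";
    return;    # undef in scalar context, so callers can test for failure
}

package main;

my $db = My::Deprecated::XEMBL->new(-format => 'embl');
print defined $db ? "got an object\n" : "no object returned\n";
```

Returning undef rather than dying keeps old scripts from crashing outright while still making the failure visible to callers that check the constructor's result.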
chris On Feb 16, 2007, at 7:55 AM, Lincoln Stein wrote: > Oh, looks like someone has inserted the warnings already. Good. > > Lincoln > > On 2/16/07, Lincoln Stein wrote: I like the idea > of converting these over to use DBFetch's SOAP services. On the > other hand, it isn't likely that I'm going to have time to do this > anytime soon. > > Probably the best thing to do is to issue a warning and return > undef if someone tries to use the XEMBL module. I'll make that > change. > > Lincoln > > > On 2/15/07, Chris Fields <cjfields at uiuc.edu> wrote: I have gone > ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService > both for deprecation in the wiki and in CVS (though I haven't set any > timeline): > > http://www.bioperl.org/wiki/Deprecated_modules > > The XEMBL web services are no longer available, and it looks like > everything is running through DBFetch now. The XEMBL tests are > skipped if no server is detected, so they shouldn't cause any > problems with Bioperl installations. > > Lincoln, was there anything to salvage from these? I noticed they > used SOAP::Lite, so maybe we could convert these over to a SOAP-based > interface to DBFetch web services? > > chris > > > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > > > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Fri Feb 16 08:39:54 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 16 Feb 2007 13:39:54 +0000 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts In-Reply-To: <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com> References: <45D1E2A5.6060104@sendu.me.uk> <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com> Message-ID: <45D5B42A.1080303@sendu.me.uk> Lincoln Stein wrote: > Hi, > > Older versions of Storable can't deal with features that contain > subroutine refs. You should get the current version from CPAN. Do you have any idea which version of Storable first supported this? I can specify that version in Bioperl's Build.PL. (else I'll just specify the latest version) From eu at otelo-online.de Sat Feb 17 07:55:08 2007 From: eu at otelo-online.de (eu at otelo-online.de) Date: Sat, 17 Feb 2007 13:55:08 +0100 (CET) Subject: [Bioperl-l] Bioperl Module OddCodes(help) Message-ID: <29037001.1171716908969.JavaMail.ngmail@webmail18> Hello all, I want to translate a sequence in FASTA format into just acidic, basic and polar classes, depending on the pH. The OddCodes module can only translate to acidic, basic, polar and hydrophobic, and I think only at a default pH. Can somebody help me? I don't know whether it is possible. I need to assign each amino acid a positive, negative or neutral charge. Thanks From The_Polymorph at rocketmail.com Sun Feb 18 14:08:34 2007 From: The_Polymorph at rocketmail.com (Caitlin) Date: Sun, 18 Feb 2007 11:08:34 -0800 (PST) Subject: [Bioperl-l] Missing ppm for Bioperl 1.5.2_100(?) Message-ID: <148421.50501.qm@web50801.mail.yahoo.com> Hi.
In an attempt to upgrade my Bioperl install from 1.5.2 RC5 to 1.5.2_100, I noticed the ppm was not found on the activestate repositories. Thanks, ~Caitlin From bix at sendu.me.uk Sun Feb 18 15:36:03 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 18 Feb 2007 20:36:03 +0000 Subject: [Bioperl-l] Missing ppm for Bioperl 1.5.2_100(?) In-Reply-To: <148421.50501.qm@web50801.mail.yahoo.com> References: <148421.50501.qm@web50801.mail.yahoo.com> Message-ID: <45D8B8B3.4000408@sendu.me.uk> Caitlin wrote: > Hi. > > In an attempt to upgrade my Bioperl install from 1.5.2 RC5 to > 1.5.2_100, I noticed the ppm was not found on the activestate > repositories. Follow the install instructions: http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows It's not in the normal activestate repository, but on bioperl.org. From t.nugent at cs.ucl.ac.uk Mon Feb 19 12:29:48 2007 From: t.nugent at cs.ucl.ac.uk (Tim Nugent) Date: Mon, 19 Feb 2007 17:29:48 +0000 Subject: [Bioperl-l] Module to draw transmembrane protein topology Message-ID: <45D9DE8C.2010301@cs.ucl.ac.uk> Hi everyone, I've written a perl module to display transmembrane protein topology using GD. There are various options, including labels, helix/loop dimensions, colour schemes etc but it only requires a string or array containing the protein topology (e.g. transmembrane helix start/stop points). It produces output like this: http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png using the code at the bottom. Here is the module: http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm I've never submitted anything to Bioperl before - is this sort of thing likely to be of use to others? I imagine it would sit alongside some of the Bio::Graphics stuff.
Best wishes, Tim #!/usr/bin/perl use strict; use warnings; use lib '/scratch0/NOT_BACKED_UP/tnugent/perl5lib'; ## path to module use DrawTransmembrane; my @topology = (20,45,59,70,86,109,145,168,194,220); my %labels = ('5' => '5 - Sulphation Site', '21' => '1st Helix', '47' => '40 - Mutation', '60' => 'Voltage Sensor', '72' => '72 - Mutation 2', '73' => '73 - Mutation 3', '138' => '138 - Glycosylation Site', '170' => '170 - Phosphorylation Site', '200' => 'Last Helix'); my $im = DrawTransmembrane->draw_transmembrane(-title => 'This is a cartoon displaying transmembrane helices.', -topology => \@topology, -n_terminal => 'out', -helix_width => 48, -helix_height => 125, -short_loop_limit => 10, -long_loop_limit => 35, -loop_width => 25, -colour_scheme => 'yellow', -labels => \%labels, -text_offset => -10); ## print the .png file my $output = 'test.png'; open(my $out_fh, '>', $output) or die "Cannot write $output: $!"; binmode $out_fh; print {$out_fh} $im->png; close $out_fh or die "Cannot close $output: $!"; system('display', $output); -- Tim Nugent (MRes) Research Student Bioinformatics Unit Department of Computer Science University College London Gower Street London WC1E 6BT Tel: 020-7679-0410 t.nugent at ucl.ac.uk From bix at sendu.me.uk Mon Feb 19 12:42:23 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 19 Feb 2007 17:42:23 +0000 Subject: [Bioperl-l] t/FeatureHolder.x Message-ID: <45D9E17F.4030302@sendu.me.uk> Is this supposed to work? It doesn't get run in the test suite normally because of its name.
With a live checkout I get: ./Build test --test_files t/FeatureHolder.x --verbose t/FeatureHolder....1..6 ok 1 ok 2 Set group tag to: locus_tag GROUPS: GROUP [?]:source [snip] resolved pair Bio::SeqFeature::Generic=HASH(0x1375dc0) Bio::SeqFeature::Generic=HASH(0x1362830) UNFLATTENING GROUP: GROUP [?]:gene UNFLATTENING GROUP: GROUP [?]:repeat_region UNFLATTENING GROUP: GROUP [?]:gene UNFLATTENING GROUP: GROUP [?]:repeat_region UNFLATTENING GROUP: GROUP [BG:DS07721.3]:gene mRNA CDS UNFLATTENING GROUP: GROUP [BG:DS07721.6]:gene mRNA CDS ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: DUPLICATE ID: AAF53399.1 STACK: Error::throw STACK: Bio::Root::Root::throw /home/sendu/src/bioperl/core/blib/lib/Bio/Root/Root.pm:359 STACK: Bio::SeqFeature::Tools::IDHandler::create_hierarchy_from_ParentIDs /home/sendu/src/bioperl/core/blib/lib/Bio/SeqFeature/Tools/IDHandler.pm:175 STACK: Bio::FeatureHolderI::create_hierarchy_from_ParentIDs /home/sendu/src/bioperl/core/blib/lib/Bio/FeatureHolderI.pm:245 STACK: t/FeatureHolder.x:68 ----------------------------------------------------------- dubious Test returned status 255 (wstat 65280, 0xff00) DIED. FAILED tests 3-6 Failed 4/6 tests, 33.33% okay Failed Test Stat Wstat Total Fail List of Failed ------------------------------------------------------------------------------- t/FeatureHolder.x 255 65280 6 8 3-6 Failed 1/1 test scripts. 4/6 subtests failed. Files=1, Tests=6, 1 wallclock secs ( 0.55 cusr + 0.04 csys = 0.59 CPU) Failed 1/1 test programs. 4/6 subtests failed. It also fails quite differently with 1.5.2. From cjfields at uiuc.edu Mon Feb 19 15:04:20 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 19 Feb 2007 14:04:20 -0600 Subject: [Bioperl-l] t/FeatureHolder.x In-Reply-To: <45D9E17F.4030302@sendu.me.uk> References: <45D9E17F.4030302@sendu.me.uk> Message-ID: <534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu> Looks like that's some of Chris Mungall's stuff for GFF3. Don't know if he's stalking the mail list. 
Wonder if this has anything to do with the feature/annotation changes around rel 1.5. (the other) chris On Feb 19, 2007, at 11:42 AM, Sendu Bala wrote: > Is this supposed to work? It doesn't get run in the test suite > normally > because of its name. > > With a live checkout I get: > ./Build test --test_files t/FeatureHolder.x --verbose > t/FeatureHolder....1..6 ... From cjfields at uiuc.edu Mon Feb 19 16:24:04 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 19 Feb 2007 15:24:04 -0600 Subject: [Bioperl-l] Module to draw transmembrane protein topology In-Reply-To: <45D9DE8C.2010301@cs.ucl.ac.uk> References: <45D9DE8C.2010301@cs.ucl.ac.uk> Message-ID: <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> I think this is pretty nice! We can add the code and test script to bugzilla and (if someone has time) try to see where it might fit in, though Bio::Graphics sounds like a good spot. Anyone else have ideas on where this could go? chris On Feb 19, 2007, at 11:29 AM, Tim Nugent wrote: > Hi everyone, > > I've written a perl module to display transmembrane protein topology > using GD. There are various options, including labels, helix/loop > dimensions, colour schemes etc but it only requires a string or array > containing the protein topology (e.g. transmembrane helix start/stop > points). It produces output like this: > > http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png > http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png > > using the code at the bottom. > > Here is the module: > http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm > > I've never submitted anything to Bioperl before - is this sort of > thing > likely to be of use to others? I imagine it would sit alongside > some of > the Bio::Graphics stuff.
> > Best wishes, > > Tim > > #!/usr/bin/perl > > use strict; > use warnings; > use lib '/scratch0/NOT_BACKED_UP/tnugent/perl5lib'; ## path to module > use DrawTransmembrane; > > my @topology = (20,45,59,70,86,109,145,168,194,220); > > my %labels = ('5' => '5 - Sulphation Site', > '21' => '1st Helix', > '47' => '40 - Mutation', > '60' => 'Voltage Sensor', > '72' => '72 - Mutation 2', > '73' => '73 - Mutation 3', > '138' => '138 - Glycosylation Site', > '170' => '170 - Phosphorylation Site', > '200' => 'Last Helix'); > > my $im = DrawTransmembrane->draw_transmembrane(-title => 'This is a > cartoon displaying transmembrane helices.', > -topology => > \@topology, > -n_terminal => 'out', > -helix_width => 48, > -helix_height => 125, > -short_loop_limit > => 10, > -long_loop_limit => > 35, > -loop_width => 25, > -colour_scheme => > 'yellow', > -labels => \%labels, > -text_offset => -10); > > ## print the .png file > my $output = 'test.png'; > open(OUTPUT, ">$output"); > binmode OUTPUT; > print OUTPUT $im->png; > close OUTPUT; > > my $system = `display $output`; > > -- > Tim Nugent (MRes) > Research Student > Bioinformatics Unit > Department of Computer Science > University College London > Gower Street > London WC1E 6BT > Tel: 020-7679-0410 > t.nugent at ucl.ac.uk > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjm at fruitfly.org Mon Feb 19 17:23:56 2007 From: cjm at fruitfly.org (Chris Mungall) Date: Mon, 19 Feb 2007 14:23:56 -0800 Subject: [Bioperl-l] t/FeatureHolder.x In-Reply-To: <534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu> References: <45D9E17F.4030302@sendu.me.uk> <534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu> Message-ID: On Feb 19, 2007, at 12:04 PM, Chris Fields wrote: > Looks like that's some of Chris Mungall's stuff for GFF3. Don't know > if he's stalking the mail list. occasionally.. > Wonder if this has anything to do with the feature/annotation changes > around rel 1.5. possibly even before then. there was a reason for the .x suffix... I think it was intended to denote requirements; tests that don't pass yet but should in the future. anyway, this file can go > (the other) chris > > On Feb 19, 2007, at 11:42 AM, Sendu Bala wrote: > >> Is this supposed to work? It doesn't get run in the test suite >> normally >> because of its name. >> >> With a live checkout I get: >> ./Build test --test_files t/FeatureHolder.x --verbose >> t/FeatureHolder....1..6 > ... > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From torsten.seemann at infotech.monash.edu.au Mon Feb 19 18:20:48 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 20 Feb 2007 10:20:48 +1100 Subject: [Bioperl-l] Bioperl Module OddCodes(help) In-Reply-To: <29037001.1171716908969.JavaMail.ngmail@webmail18> References: <29037001.1171716908969.JavaMail.ngmail@webmail18> Message-ID: > I want to translate a sequence in FASTA format into just acidic, basic and polar classes, depending on the pH. > The OddCodes module can only translate to acidic, basic, polar and hydrophobic, and I think only at a default pH. > Can somebody help me? I don't know whether it is possible.
> I need to assign each amino acid a positive, negative or neutral charge. The latest released Bioperl 1.5.x has a charge() function which does what you want: http://doc.bioperl.org/releases/bioperl-1.5.2/Bio/Tools/OddCodes.html It returns A, N, C for the charges. --Torsten From bix at sendu.me.uk Tue Feb 20 06:18:14 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 20 Feb 2007 11:18:14 +0000 Subject: [Bioperl-l] Bio::Graphics::FeatureBase seq_id question Message-ID: <45DAD8F6.1030409@sendu.me.uk> Bio::Graphics::FeatureBase::seq_id is currently implemented as a read-only alias to ref(): sub seq_id { shift->ref() } What is the reasoning behind this? Can it be made to handle setting of the value as well?: sub seq_id { shift->ref(@_) } Cheers, Sendu. From cjfields at uiuc.edu Tue Feb 20 08:39:11 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Feb 2007 07:39:11 -0600 Subject: [Bioperl-l] t/FeatureHolder.x In-Reply-To: References: <45D9E17F.4030302@sendu.me.uk> <534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu> Message-ID: <67E26F10-67D5-405E-A00E-826EF51C476F@uiuc.edu> On Feb 19, 2007, at 4:23 PM, Chris Mungall wrote: > On Feb 19, 2007, at 12:04 PM, Chris Fields wrote: > >> Looks like that's some of Chris Mungall's stuff for GFF3. Don't know >> if he's stalking the mail list. > > occasionally.. > >> Wonder if this has anything to do with the feature/annotation changes >> around rel 1.5. > > possibly even before then. > > there was a reason for the .x suffix... I think it was intended to > denote requirements; tests that don't pass yet but should in the > future > > anyway, this file can go Chris, I removed it from CVS. Thanks! (the other) chris besides chris D. P.S. I may have some Data::Stag questions for you at some point. I'm guessing you're still at fruitfly.org?
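Torsten's pointer to charge() covers a fixed classification. For the original question about pH dependence, one common approach is to estimate each side chain's fractional charge with the Henderson-Hasselbalch equation and then bin it; the minimal sketch below does that without BioPerl. The pKa values are approximate textbook figures and are assumptions (real values shift with the protein environment), terminal amino and carboxyl groups are ignored, and the A/C/N letters and the 0.5 cut-off are chosen here only to mirror the A, N, C alphabet Torsten mentions. None of this is OddCodes' own implementation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Approximate side-chain pKa values (assumed textbook figures; real values
# depend on the protein environment). Termini are ignored in this sketch.
my %ACIDIC_PKA = (D => 3.9, E => 4.1, C => 8.3, Y => 10.1);
my %BASIC_PKA  = (H => 6.0, K => 10.5, R => 12.5);

# Fractional side-chain charge of one residue at a given pH,
# from the Henderson-Hasselbalch equation.
sub side_chain_charge {
    my ($aa, $ph) = @_;
    $aa = uc $aa;
    return -1 / (1 + 10**($ACIDIC_PKA{$aa} - $ph)) if exists $ACIDIC_PKA{$aa};
    return  1 / (1 + 10**($ph - $BASIC_PKA{$aa}))  if exists $BASIC_PKA{$aa};
    return 0;    # residues without an ionisable side chain
}

# Recode a protein sequence as A (mostly negative), C (mostly positive)
# or N (mostly uncharged), using a 0.5 cut-off on the fractional charge.
sub charge_code {
    my ($seq, $ph) = @_;
    my $out = '';
    for my $aa (split //, $seq) {
        my $q = side_chain_charge($aa, $ph);
        $out .= $q <= -0.5 ? 'A' : $q >= 0.5 ? 'C' : 'N';
    }
    return $out;
}

print charge_code('DEKRHG', 7.0), "\n";    # AACCNN
print charge_code('DEKRHG', 5.0), "\n";    # AACCCN
```

The pH effect shows up mainly in histidine: at pH 7 its estimated charge is about +0.09 and it is reported as N, while at pH 5 it is about +0.91 and counted as positive.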
From cjfields at uiuc.edu Tue Feb 20 08:29:20 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Feb 2007 07:29:20 -0600 Subject: [Bioperl-l] Fwd: help on remote blast References: <20070220073200.M42567@bic.boseinst.ernet.in> Message-ID: <6CC54E14-0581-45AF-8F12-E500A2FFDE86@uiuc.edu> Sanjib, You shouldn't email the developers directly. Questions like this should go to the bioperl mail list in case I (or others) can't answer them immediately. chris Begin forwarded message: > From: "Sanjib Kumar Gupta" > Date: February 20, 2007 1:32:00 AM CST > To: cjfields at uiuc.edu > Subject: help on remote blast > > Dear Dr. Chris > I am a very new user of bioperl and have been using the script for > retrieving some blast sequences. But suddenly it has stopped > retrieving: > #perl n9.pl > te.pep > waiting........ > for a long time > > I am attaching the file. Can you please tell me what I should do so > that it > runs again? > > > -- > Sanjib Kumar Gupta > Bioinformatics Centre > Bose Institute > Kolkata 700054, INDIA > Phone : +91-33-2355 6626, 2816, 2355 4766 > Fax : +91-33-2355 3886 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: n9.pl Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070220/02f96eab/attachment.pl -------------- next part -------------- Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From t.nugent at cs.ucl.ac.uk Tue Feb 20 09:31:20 2007 From: t.nugent at cs.ucl.ac.uk (Tim Nugent) Date: Tue, 20 Feb 2007 14:31:20 +0000 Subject: [Bioperl-l] Module to draw transmembrane protein topology In-Reply-To: <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> References: <45D9DE8C.2010301@cs.ucl.ac.uk> <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> Message-ID: <45DB0638.1030001@cs.ucl.ac.uk> Thanks Chris, glad it's appreciated. Is there anything else I can do?
If anyone has any requests/suggestions please let me know too. Best wishes, Tim Chris Fields wrote: > I think this is pretty nice! We can add the code and test script to > bugzilla and (if someone has time) try to see where it might fit in, > though Bio::Graphics sounds like a good spot. > > Anyone else have ideas on where this could go? > > chris > > On Feb 19, 2007, at 11:29 AM, Tim Nugent wrote: > >> Hi everyone, >> >> I've written a perl module to display transmembrane protein topology >> using GD. There are various options, including labels, helix/loop >> dimensions, colour schemes etc but it only requires a string or array >> containing the protein topology (e.g. transmembrane helix start/stop >> points). It produces output like this: >> >> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png >> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png >> >> using the code at the bottom. >> >> Here is the module: >> http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm >> >> I've never submitted anything to Bioperl before - is this sort of >> thing >> likely to be of use to others? I imagine it would sit alongside >> some of >> the Bio::Graphics stuff.
>> >> Best wishes, >> >> Tim >> >> #!/usr/bin/perl >> >> use strict; >> use warnings; >> use lib '/scratch0/NOT_BACKED_UP/tnugent/perl5lib'; ## path to module >> use DrawTransmembrane; >> >> my @topology = (20,45,59,70,86,109,145,168,194,220); >> >> my %labels = ('5' => '5 - Sulphation Site', >> '21' => '1st Helix', >> '47' => '40 - Mutation', >> '60' => 'Voltage Sensor', >> '72' => '72 - Mutation 2', >> '73' => '73 - Mutation 3', >> '138' => '138 - Glycosylation Site', >> '170' => '170 - Phosphorylation Site', >> '200' => 'Last Helix'); >> >> my $im = DrawTransmembrane->draw_transmembrane(-title => 'This is a >> cartoon displaying transmembrane helices.', >> -topology => >> \@topology, >> -n_terminal => 'out', >> -helix_width => 48, >> -helix_height => 125, >> -short_loop_limit >> => 10, >> -long_loop_limit => >> 35, >> -loop_width => 25, >> -colour_scheme => >> 'yellow', >> -labels => \%labels, >> -text_offset => -10); >> >> ## print the .png file >> my $output = 'test.png'; >> open(OUTPUT, ">$output"); >> binmode OUTPUT; >> print OUTPUT $im->png; >> close OUTPUT; >> >> my $system = `display $output`; >> >> -- >> Tim Nugent (MRes) >> Research Student >> Bioinformatics Unit >> Department of Computer Science >> University College London >> Gower Street >> London WC1E 6BT >> Tel: 020-7679-0410 >> t.nugent at ucl.ac.uk >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Tim Nugent (MRes) Research Student Bioinformatics Unit Department of Computer Science University College London Gower Street London WC1E 6BT Tel: 020-7679-0410 t.nugent at ucl.ac.uk From marian.thieme at lycos.de Tue Feb 20 08:34:24 2007 From: marian.thieme at lycos.de (marian thieme) Date: Tue, 20 Feb 2007 13:34:24 +0000 Subject: [Bioperl-l] Alignment Message-ID: <188661178021328@lycos-europe.com> Hi all, perhaps somebody can comment on the following: I have a series of sequences which should be aligned against a reference sequence. In this special case we don't need to calculate anything; we only need to represent the sequences and, for instance, get some columns of interest. The problem is that some sequences have gaps, and we need to represent gaps in the reference sequence as well as in some individual sequences. Question: Can I use LocatableSeq to describe sequences with gaps and add them to the alignment? If so, how should I understand the example in the doc: use Bio::LocatableSeq; my $seq = new Bio::LocatableSeq(-seq => "CAGT-GGT",-id => "seq1", -start => 1,-end => 7); Does the "-" sign represent a gap? If this sequence starts at position 1, why does it end at position 7 when, counting the gap, there are 8 positions? Can the SimpleAlign object handle the gap? Thanks for your attention, Marian
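On the -start/-end question: in Bio::LocatableSeq these coordinates count residues of the underlying sequence, not alignment columns, so the gap in CAGT-GGT is skipped and seven residues starting at 1 end at 7. The minimal standalone sketch below shows that bookkeeping. ungapped_length and column_for_residue are hypothetical helpers written for illustration, not BioPerl functions, and they assume '-' and '.' are the gap characters; within BioPerl itself, the column_from_residue_number method is the one to look at.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers for illustration only (not BioPerl API).
# Gap characters are assumed to be '-' and '.'.
my $GAP = qr/[-.]/;

# Number of real residues in an aligned string.
sub ungapped_length {
    my ($aligned) = @_;
    (my $residues = $aligned) =~ s/$GAP//g;
    return length $residues;
}

# 1-based alignment column holding residue number $resnum, where the first
# residue of the aligned string is numbered $start. Returns undef if absent.
sub column_for_residue {
    my ($aligned, $start, $resnum) = @_;
    my $pos   = $start - 1;
    my @chars = split //, $aligned;
    for my $col (1 .. @chars) {
        next if $chars[$col - 1] =~ $GAP;    # gaps occupy a column, not a residue number
        $pos++;
        return $col if $pos == $resnum;
    }
    return;
}

my $aligned = 'CAGT-GGT';
my $start   = 1;
my $end     = $start + ungapped_length($aligned) - 1;
print "start=$start end=$end columns=", length($aligned), "\n";    # start=1 end=7 columns=8
print "residue 5 is in column ", column_for_residue($aligned, $start, 5), "\n";    # column 6
```

Keeping residue numbers and column numbers separate like this is what lets a gapped row and its reference coordinates coexist in one alignment.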
From cjfields at uiuc.edu Tue Feb 20 09:40:38 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Feb 2007 08:40:38 -0600 Subject: [Bioperl-l] Module to draw transmembrane protein topology In-Reply-To: <45DB0638.1030001@cs.ucl.ac.uk> References: <45D9DE8C.2010301@cs.ucl.ac.uk> <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> <45DB0638.1030001@cs.ucl.ac.uk> Message-ID: You can add the module and test code (the script) to bugzilla: http://www.bioperl.org/wiki/Bugs http://bugzilla.open-bio.org/ Basically file a new bug report but note that it is an enhancement request when filling it out. Attach the code and test script to the report after it is generated (note that it may be easier to add all of the files together as a zipped archive). I think you could also add the graphical output as a binary file if they are huge files. chris On Feb 20, 2007, at 8:31 AM, Tim Nugent wrote: > Thanks Chris, glad it's appreciated. > > Is there anything else I can do? If anyone has any requests/ > suggestions please let me know too. > > Best wishes, > > Tim > > Chris Fields wrote: >> I think this is pretty nice! We can add the code and test script >> to bugzilla and (if someone has time) try to see where it might >> fit in, though Bio::Graphics sounds like a good spot. >> Anyone else have ideas on where this could go? >> chris >> On Feb 19, 2007, at 11:29 AM, Tim Nugent wrote: >>> Hi everyone, >>> >>> I've written a perl module to display transmembrane protein topology >>> using GD. There are various options, including labels, helix/loop >>> dimensions, colour schemes etc but it only requires a string or >>> array >>> containing the protein topology (e.g. transmembrane helix start/stop >>> points). It produces output like this: >>> >>> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png >>> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png >>> >>> using the code at the bottom.
>>> >>> Here is the module: >>> http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm >>> >>> I've never submitted anything to Bioperl before - is this sort >>> of thing >>> likely to be of use to others? I imagine it would sit alongside >>> some of >>> the Bio::Graphics stuff. >>> >>> Best wishes, >>> >>> Tim >>> >>> #!/usr/bin/perl >>> >>> use strict; >>> use warnings; >>> use lib '/scratch0/NOT_BACKED_UP/tnugent/perl5lib'; ## path to >>> module >>> use DrawTransmembrane; >>> >>> my @topology = (20,45,59,70,86,109,145,168,194,220); >>> >>> my %labels = ('5' => '5 - Sulphation Site', >>> '21' => '1st Helix', >>> '47' => '40 - Mutation', >>> '60' => 'Voltage Sensor', >>> '72' => '72 - Mutation 2', >>> '73' => '73 - Mutation 3', >>> '138' => '138 - Glycosylation Site', >>> '170' => '170 - Phosphorylation Site', >>> '200' => 'Last Helix'); >>> >>> my $im = DrawTransmembrane->draw_transmembrane(-title => 'This is a >>> cartoon displaying transmembrane helices.', >>> -topology => >>> \@topology, >>> -n_terminal => >>> 'out', >>> -helix_width => 48, >>> -helix_height => >>> 125, >>> -short_loop_limit => 10, >>> -long_loop_limit >>> => 35, >>> -loop_width => 25, >>> -colour_scheme >>> => 'yellow', >>> -labels => \%labels, >>> -text_offset => >>> -10); >>> >>> ## print the .png file >>> my $output = 'test.png'; >>> open(OUTPUT, ">$output"); >>> binmode OUTPUT; >>> print OUTPUT $im->png; >>> close OUTPUT; >>> >>> my $system = `display $output`; >>> >>> -- >>> Tim Nugent (MRes) >>> Research Student >>> Bioinformatics Unit >>> Department of Computer Science >>> University College London >>> Gower Street >>> London WC1E 6BT >>> Tel: 020-7679-0410 >>> t.nugent at ucl.ac.uk >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr.
Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign > > -- > Tim Nugent (MRes) > Research Student > Bioinformatics Unit > Department of Computer Science > University College London > Gower Street > London WC1E 6BT > Tel: 020-7679-0410 > t.nugent at ucl.ac.uk Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From avilella at gmail.com Tue Feb 20 10:30:17 2007 From: avilella at gmail.com (Albert Vilella) Date: Tue, 20 Feb 2007 15:30:17 +0000 Subject: [Bioperl-l] Alignment In-Reply-To: <188661178021328@lycos-europe.com> References: <188661178021328@lycos-europe.com> Message-ID: <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com> I think the SimpleAlign object contains a set of sequences, each of which is a LocatableSeq object. These LocatableSeq objects will have gaps, represented by '-' or whatever other symbol is specified (I think there are methods for it), and then one can use methods like column_from_residue_number to map the coordinates between the primary sequence and the aligned sequence. The perldoc for LocatableSeq has some examples on how to use these methods. [Hopefully I haven't written any lie in this message], Cheers, Albert. On 2/20/07, marian thieme wrote: > Hi all, > > perhaps somebody can give some comments on the following matter: > > I have a series of sequences which should be aligned against a reference sequence. > In this special case we don't need to calculate anything, we only need to represent the sequences and get for instance some columns of interest. > The problem now is that some sequences have gaps and we need to represent gaps in the reference sequence as well as in some individual sequences.
> > Question: Can I use LocatableSeq to describe sequences with gaps and to add the sequence to the alignment? > If yes, how should I understand the example in the doc: > use Bio::LocatableSeq; > my $seq = new Bio::LocatableSeq(-seq => "CAGT-GGT",-id => "seq1", -start => 1,-end => 7); > > Does the "-" sign represent a gap? If this sequence starts at position 1, > why does it end at position 7, when there are 8 positions if the gap is counted? > Can the SimpleAlign object handle the gap? > > > Thanks for your attention, > Marian From cjfields at uiuc.edu Tue Feb 20 10:30:15 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Feb 2007 09:30:15 -0600 Subject: [Bioperl-l] Module to draw transmembrane protein topology In-Reply-To: <45DB0638.1030001@cs.ucl.ac.uk> References: <45D9DE8C.2010301@cs.ucl.ac.uk> <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> <45DB0638.1030001@cs.ucl.ac.uk> Message-ID: <4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu> Sorry, I sent that last one off prematurely. I could see this being used as a very useful utility if a Bioperl object had SeqFeatures which described transmembrane regions, or if output from something like TMHMM were parsed and used for input. Don't know if it's included, but if not you probably should allow labeling of the intracellular/extracellular space to designate periplasmic space, mitochondrial matrix, thylakoid, etc. I think the Bio::Graphics namespace is definitely the place to go. If I ever get around to writing up the RNA structural stuff I may put something there myself. chris On Feb 20, 2007, at 8:31 AM, Tim Nugent wrote: > Thanks Chris, glad it's appreciated. > > Is there anything else I can do?
If anyone has any requests/ > suggestions > please let me know too. > > Best wishes, > > Tim From cjfields at uiuc.edu Tue Feb 20 10:49:56 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Feb 2007 09:49:56 -0600 Subject: [Bioperl-l] Alignment In-Reply-To: <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com> References: <188661178021328@lycos-europe.com> <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com> Message-ID: <97E36074-1CF4-4348-85AB-DF23F1048727@uiuc.edu> On Feb 20, 2007, at 9:30 AM, Albert Vilella wrote: > I think the SimpleAlign object contains a set of sequences, each of > which is a LocatableSeq object. > > These LocatableSeq objects will have gaps, represented by '-' or > whatever other symbol is specified (I think there are methods for it), > and then one can use methods like column_from_residue_number to map > the coordinates between the primary sequence and the aligned sequence. > The perldoc for LocatableSeq has some examples on how to use these > methods. > > [Hopefully I haven't written any lie in this message], > > Cheers, > > Albert. No lies. The comparison methods are in SimpleAlign; if you look in SimpleAlign.t you'll see several demos on how to go about adding LocatableSeqs to a SimpleAlign object and then using SimpleAlign methods on them. chris PS (to marian): I'm a bit behind this week, so the bracket_strings stuff is lagging behind; I'm writing up some stuff on a deadline.
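Marian's question about `-end => 7` for `"CAGT-GGT"` comes down to start/end counting residues rather than alignment columns: the string has 8 columns but only 7 non-gap residues, so a sequence starting at 1 ends at 7. The sketch below reproduces that arithmetic, plus the residue-to-column mapping that Albert describes for `column_from_residue_number`. It is plain Perl, not BioPerl's implementation; `residue_end` and `column_of_residue` are hypothetical helpers, and treating `-` and `.` as the gap symbols is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helpers, not BioPerl code: they only mimic the coordinate
# arithmetic of a gapped sequence. '-' and '.' are assumed to be gap symbols.
my $GAP = qr/[-.]/;

# Coordinate of the last residue: start + (number of non-gap characters) - 1.
sub residue_end {
    my ($aligned, $start) = @_;
    my $residues = grep { !/$GAP/ } split //, $aligned;
    return $start + $residues - 1;
}

# 1-based alignment column holding residue number $resnum,
# or undef if that residue is not in the sequence.
sub column_of_residue {
    my ($aligned, $start, $resnum) = @_;
    my @chars   = split //, $aligned;
    my $current = $start - 1;
    for my $col (1 .. @chars) {
        next if $chars[$col - 1] =~ $GAP;    # gaps occupy a column, not a residue
        $current++;
        return $col if $current == $resnum;
    }
    return;
}

my $aligned = 'CAGT-GGT';
print "end = ", residue_end($aligned, 1), "\n";                           # prints "end = 7"
print "residue 5 is in column ", column_of_residue($aligned, 1, 5), "\n"; # prints "... column 6"
```

In BioPerl itself, the LocatableSeq and SimpleAlign perldoc (and SimpleAlign.t, as Chris notes) remain the authoritative reference for the real methods.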
From t.nugent at cs.ucl.ac.uk Tue Feb 20 10:50:10 2007 From: t.nugent at cs.ucl.ac.uk (Tim Nugent) Date: Tue, 20 Feb 2007 15:50:10 +0000 Subject: [Bioperl-l] Module to draw transmembrane protein topology In-Reply-To: <4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu> References: <45D9DE8C.2010301@cs.ucl.ac.uk> <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> <45DB0638.1030001@cs.ucl.ac.uk> <4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu> Message-ID: <45DB18B2.8070004@cs.ucl.ac.uk> Labeling of inside/outside and membrane is already possible via -inside_label, -outside_label and -membrane_label tags, defaults are intracellular, extracellular and plasma membrane. Was definitely going to add an input/parser for MEMSAT, developed here at UCL, and probably a few other popular TM predictors too, e.g. PHOBIUS, TMHMM etc. Can already accept topology in the string format used by OPM (http://opm.phar.umich.edu/). Tim Chris Fields wrote: > Sorry, I sent that last one off prematurely. > > I could see this being used as a very useful utility if a Bioperl object > had SeqFeatures which described transmembrane regions, or if output from > something like TMHMM were parsed and used for input. Don't know if it's > included, but if not you probably should allow labeling of the > intracellular/extracellular space to designate periplasmic space, > mitochondrial matrix, thylakoid, etc. > > I think Bio::Graphics namespace is definitely the place to go. If I > ever get around to writing up the RNA structural stuff I may put > something there myself. > > chris > > On Feb 20, 2007, at 8:31 AM, Tim Nugent wrote: > >> Thanks Chris, glad it's appreciated. >> >> Is there anything else I can do? If anyone has any requests/suggestions >> please let me know too. 
>> >> Best wishes, >> >> Tim > > -- Tim Nugent (MRes) Research Student Bioinformatics Unit Department of Computer Science University College London Gower Street London WC1E 6BT Tel: 020-7679-0410 t.nugent at ucl.ac.uk From cjfields at uiuc.edu Tue Feb 20 11:09:00 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Feb 2007 10:09:00 -0600 Subject: [Bioperl-l] Module to draw transmembrane protein topology In-Reply-To: <45DB18B2.8070004@cs.ucl.ac.uk> References: <45D9DE8C.2010301@cs.ucl.ac.uk> <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> <45DB0638.1030001@cs.ucl.ac.uk> <4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu> <45DB18B2.8070004@cs.ucl.ac.uk> Message-ID: On Feb 20, 2007, at 9:50 AM, Tim Nugent wrote: > Labeling of inside/outside and membrane is already possible via > -inside_label, -outside_label and -membrane_label tags, defaults are > intracellular, extracellular and plasma membrane. > > Was definitely going to add an input/parser for MEMSAT, developed > here at UCL, and probably a few other popular TM predictors too, > e.g. PHOBIUS, TMHMM etc. Can already accept topology in the string > format used by OPM (http://opm.phar.umich.edu/). > > Tim I'll definitely have to take a closer look at it when I have time. My guess is that the best fit for the data would be SeqFeatures, either in a collection or attached to a Bio::Seq. As for the parsers, you can look at the Bio::Tools::Tmhmm module, which scans Tmhmm output and converts everything to SeqFeatures. chris From lstein at cshl.edu Tue Feb 20 12:25:24 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 20 Feb 2007 12:25:24 -0500 Subject: [Bioperl-l] Bio::Graphics::FeatureBase seq_id question In-Reply-To: <45DAD8F6.1030409@sendu.me.uk> References: <45DAD8F6.1030409@sendu.me.uk> Message-ID: <6dce9a0b0702200925g74d2db53j3252cca8a41765b@mail.gmail.com> Just an oversight. I'll fix it.
Lincoln On 2/20/07, Sendu Bala wrote: > > Bio::Graphics::FeatureBase::seq_id is currently implemented as a > read-only alias to ref(): > sub seq_id { shift->ref() } > > > What is the reasoning behind this? Can it be made to handle setting of > the value as well?: > sub seq_id { shift->ref(@_) } > > > Cheers, > Sendu. > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From khan at cshl.edu Tue Feb 20 15:42:12 2007 From: khan at cshl.edu (Khan, Sohail) Date: Tue, 20 Feb 2007 15:42:12 -0500 Subject: [Bioperl-l] parsing a list of ids to a fasta file. Message-ID: Dear list, I am new to Bio-Perl. I have the following question: I have a list of IDs that I would like to look up in a large FASTA file to retrieve the sequence for each ID. I would appreciate any suggestions. Thanks. Khan From michael.watson at bbsrc.ac.uk Tue Feb 20 16:33:19 2007 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Tue, 20 Feb 2007 21:33:19 -0000 Subject: [Bioperl-l] parsing a list of ids to a fasta file. References: Message-ID: <8975119BCD0AC5419D61A9CF1A923E95020680FD@iahce2ksrv1.iah.bbsrc.ac.uk> I suggest you use Bio::Index::Fasta to create an index for the FASTA file and then write a simple script to retrieve sequences using that index. Or just use the pre-written bp_index.PLS and bp_fetch.PLS scripts. http://www.bioperl.org/wiki/Module:Bio::Index::Fasta ________________________________ From: bioperl-l-bounces at lists.open-bio.org on behalf of Khan, Sohail Sent: Tue 20/02/2007 8:42 PM To: Bioperl-l at lists.open-bio.org Subject: [Bioperl-l] parsing a list of ids to a fasta file. Dear list, I am new to Bio-Perl. I have the following question: I have a list of IDs that I would like to look up in a large FASTA file to retrieve the sequence for each ID. I would appreciate any suggestions. Thanks.
Khan From neetisomaiya at gmail.com Wed Feb 21 03:19:14 2007 From: neetisomaiya at gmail.com (neeti somaiya) Date: Wed, 21 Feb 2007 13:49:14 +0530 Subject: [Bioperl-l] need help in Bio-SCF Message-ID: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com> Hi All, I downloaded the module Bio-SCF-1.01 from CPAN. When I tried to install it, I got the following error. Can someone please guide me? [root at ps2288 Bio-SCF-1.01]# perl Makefile.PL Checking if your kit is complete... Looks good Note (probably harmless): No library found for -lread Writing Makefile for Bio::SCF [root at ps2288 Bio-SCF-1.01]# make cp SCF.pm blib/lib/Bio/SCF.pm cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm /usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp -typemap /usr/lib/perl5/5.8.5/ExtUtils/typemap SCF.xs > SCF.xsc && mv SCF.xsc SCF.c Please specify prototyping behavior for SCF.xs (see perlxs manual) gcc -c -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386 -mtune=pentium4 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" -fPIC "-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE" -DLITTLE_ENDIAN SCF.c SCF.xs:12:24: io_lib/scf.h: No such file or directory SCF.xs:13:26: io_lib/mFILE.h: No such file or directory SCF.xs: In function `XS_Bio__SCF_get_scf_pointer': SCF.xs:27: error: `Scf' undeclared (first use in this function) SCF.xs:27: error: (Each undeclared identifier is reported only once SCF.xs:27: error: for each function it appears in.)
SCF.xs:27: error: `scf_data' undeclared (first use in this function) SCF.xs: In function `XS_Bio__SCF_get_scf_fpointer': SCF.xs:66: error: `Scf' undeclared (first use in this function) SCF.xs:66: error: `scf_data' undeclared (first use in this function) SCF.xs:68: error: `mFILE' undeclared (first use in this function) SCF.xs:68: error: `mf' undeclared (first use in this function) SCF.xs: In function `XS_Bio__SCF_scf_free': SCF.xs:89: error: `Scf' undeclared (first use in this function) SCF.xs:89: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_get_comments': SCF.xs:95: error: `Scf' undeclared (first use in this function) SCF.xs:95: error: `scf_data' undeclared (first use in this function) SCF.xs:95: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_set_comments': SCF.xs:108: error: `Scf' undeclared (first use in this function) SCF.xs:108: error: `scf_data' undeclared (first use in this function) SCF.xs:108: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_scf_write': SCF.xs:121: error: `Scf' undeclared (first use in this function) SCF.xs:121: error: `scf_data' undeclared (first use in this function) SCF.xs:121: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_scf_fwrite': SCF.xs:135: error: `mFILE' undeclared (first use in this function) SCF.xs:135: error: `mf' undeclared (first use in this function) SCF.xs:137: error: `Scf' undeclared (first use in this function) SCF.xs:137: error: `scf_data' undeclared (first use in this function) SCF.xs:137: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_get_from_header': SCF.xs:159: error: `Scf' undeclared (first use in this function) SCF.xs:159: error: `scf_data' undeclared (first use in this function) SCF.xs:159: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_get_at': SCF.xs:186: error: `Scf' undeclared (first use in this function) SCF.xs:186: error: `scf_data' undeclared (first use in this 
function) SCF.xs:186: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_set_base_at': SCF.xs:242: error: `Scf' undeclared (first use in this function) SCF.xs:242: error: `scf_data' undeclared (first use in this function) SCF.xs:242: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_set_at': SCF.xs:255: error: `Scf' undeclared (first use in this function) SCF.xs:255: error: `scf_data' undeclared (first use in this function) SCF.xs:255: error: syntax error before ')' token make: *** [SCF.o] Error 1 -- -Neeti Even my blood says, B positive From sdavis2 at mail.nih.gov Wed Feb 21 06:17:50 2007 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed, 21 Feb 2007 06:17:50 -0500 Subject: [Bioperl-l] need help in Bio-SCF In-Reply-To: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com> References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com> Message-ID: <200702210617.50616.sdavis2@mail.nih.gov> On Wednesday 21 February 2007 03:19, neeti somaiya wrote: > Hi All, > > I downloaded module > Bio-SCF-1.01from CPAN. > And I am trying to install it when I got the following error. Can someone > please guide me. You will probably need to read the INSTALL document. You need to install a couple of libraries first. Looks like you don't have the staden io-lib installed. > [root at ps2288 Bio-SCF-1.01]# perl Makefile.PL > Checking if your kit is complete... 
> Looks good > Note (probably harmless): No library found for -lread > Writing Makefile for Bio::SCF > > [root at ps2288 Bio-SCF-1.01]# make > cp SCF.pm blib/lib/Bio/SCF.pm > cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm > /usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp -typemap > /usr/lib/perl5/5.8.5/ExtUtils/typemap SCF.xs > SCF.xsc && mv SCF.xsc SCF.c > Please specify prototyping behavior for SCF.xs (see perlxs manual) > gcc -c -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING > -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE > -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386 > -mtune=pentium4 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" -fPIC > "-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE" -DLITTLE_ENDIAN > SCF.c SCF.xs:12:24: io_lib/scf.h: No such file or directory > SCF.xs:13:26: io_lib/mFILE.h: No such file or directory > SCF.xs: In function `XS_Bio__SCF_get_scf_pointer': > SCF.xs:27: error: `Scf' undeclared (first use in this function) > SCF.xs:27: error: (Each undeclared identifier is reported only once > SCF.xs:27: error: for each function it appears in.) 
> SCF.xs:27: error: `scf_data' undeclared (first use in this function) > SCF.xs: In function `XS_Bio__SCF_get_scf_fpointer': > SCF.xs:66: error: `Scf' undeclared (first use in this function) > SCF.xs:66: error: `scf_data' undeclared (first use in this function) > SCF.xs:68: error: `mFILE' undeclared (first use in this function) > SCF.xs:68: error: `mf' undeclared (first use in this function) > SCF.xs: In function `XS_Bio__SCF_scf_free': > SCF.xs:89: error: `Scf' undeclared (first use in this function) > SCF.xs:89: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_get_comments': > SCF.xs:95: error: `Scf' undeclared (first use in this function) > SCF.xs:95: error: `scf_data' undeclared (first use in this function) > SCF.xs:95: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_set_comments': > SCF.xs:108: error: `Scf' undeclared (first use in this function) > SCF.xs:108: error: `scf_data' undeclared (first use in this function) > SCF.xs:108: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_scf_write': > SCF.xs:121: error: `Scf' undeclared (first use in this function) > SCF.xs:121: error: `scf_data' undeclared (first use in this function) > SCF.xs:121: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_scf_fwrite': > SCF.xs:135: error: `mFILE' undeclared (first use in this function) > SCF.xs:135: error: `mf' undeclared (first use in this function) > SCF.xs:137: error: `Scf' undeclared (first use in this function) > SCF.xs:137: error: `scf_data' undeclared (first use in this function) > SCF.xs:137: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_get_from_header': > SCF.xs:159: error: `Scf' undeclared (first use in this function) > SCF.xs:159: error: `scf_data' undeclared (first use in this function) > SCF.xs:159: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_get_at': > SCF.xs:186: error: `Scf' undeclared (first use in this function) > 
SCF.xs:186: error: `scf_data' undeclared (first use in this function) > SCF.xs:186: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_set_base_at': > SCF.xs:242: error: `Scf' undeclared (first use in this function) > SCF.xs:242: error: `scf_data' undeclared (first use in this function) > SCF.xs:242: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_set_at': > SCF.xs:255: error: `Scf' undeclared (first use in this function) > SCF.xs:255: error: `scf_data' undeclared (first use in this function) > SCF.xs:255: error: syntax error before ')' token > make: *** [SCF.o] Error 1 From cjfields at uiuc.edu Wed Feb 21 07:08:57 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 21 Feb 2007 06:08:57 -0600 Subject: [Bioperl-l] need help in Bio-SCF In-Reply-To: <200702210617.50616.sdavis2@mail.nih.gov> References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com> <200702210617.50616.sdavis2@mail.nih.gov> Message-ID: <40C288FE-C74C-4B3F-A835-1A5C563B2B8E@uiuc.edu> On Feb 21, 2007, at 5:17 AM, Sean Davis wrote: > On Wednesday 21 February 2007 03:19, neeti somaiya wrote: >> Hi All, >> >> I downloaded module >> Bio-SCF-1.01from CPAN. >> And I am trying to install it when I got the following error. Can >> someone >> please guide me. > > You will probably need to read the INSTALL document. You need to > install a > couple of libraries first. Looks like you don't have the staden io- > lib > installed. Just to note, this module isn't part of BioPerl (I don't even think it has a Bioperl interface). You'll probably need to contact Lincoln for details on using this module. One thing you may run into is errors with the version of io_lib installed (a problem I've encountered with bioperl-ext), probably from API changes. If you run into problems with newer versions of io_lib you should try downgrading to io_lib 1.8.11 or 1.8.12. 
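Sean's diagnosis matches the first compiler errors in the log (`io_lib/scf.h: No such file or directory` and `io_lib/mFILE.h: No such file or directory`), and every later `Scf`/`mFILE` error follows from those two missing headers. Before rebuilding, it may help to confirm that the io_lib headers are actually visible. The sketch below is a hypothetical helper, not part of Bio::SCF. The default include directories are assumptions (the gcc line above only adds `-I/usr/local/include`), so pass the include directory of your own io_lib install as arguments. It checks only that the headers are present, not which io_lib version they belong to, so Chris's note about version incompatibilities still applies.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical pre-build check, not part of Bio::SCF: look for the two
# io_lib headers that the failing compile could not find. The default
# include directories are assumptions; pass your own as arguments.
my @HEADERS = ('scf.h', 'mFILE.h');

sub find_io_lib_headers {
    my @include_dirs = @_ ? @_ : ('/usr/include', '/usr/local/include');
    my %found;
    HEADER: for my $header (@HEADERS) {
        for my $dir (@include_dirs) {
            my $path = File::Spec->catfile($dir, 'io_lib', $header);
            if (-e $path) {
                $found{$header} = $path;    # first match wins
                next HEADER;
            }
        }
    }
    return \%found;
}

my $found = find_io_lib_headers(@ARGV);
for my $header (@HEADERS) {
    # avoid the // operator so this also runs on Perl 5.8, as in the log above
    my $where = defined $found->{$header} ? $found->{$header} : 'NOT FOUND';
    print "io_lib/$header: $where\n";
}
```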
From neetisomaiya at gmail.com Wed Feb 21 07:25:26 2007 From: neetisomaiya at gmail.com (neeti somaiya) Date: Wed, 21 Feb 2007 17:55:26 +0530 Subject: [Bioperl-l] need help in Bio-SCF In-Reply-To: <200702210617.50616.sdavis2@mail.nih.gov> References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com> <200702210617.50616.sdavis2@mail.nih.gov> Message-ID: <764978cf0702210425j544330bbr12c86a89960dbb66@mail.gmail.com> Thanks. It resolved my problem. On 2/21/07, Sean Davis wrote: > > On Wednesday 21 February 2007 03:19, neeti somaiya wrote: > > Hi All, > > > > I downloaded module > > Bio-SCF-1.01from CPAN. > > And I am trying to install it when I got the following error. Can > someone > > please guide me. > > You will probably need to read the INSTALL document. You need to install > a > couple of libraries first. Looks like you don't have the staden io-lib > installed. > > > > [root at ps2288 Bio-SCF-1.01]# perl Makefile.PL > > Checking if your kit is complete... > > Looks good > > Note (probably harmless): No library found for -lread > > Writing Makefile for Bio::SCF > > > > [root at ps2288 Bio-SCF-1.01]# make > > cp SCF.pm blib/lib/Bio/SCF.pm > > cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm > > /usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp -typemap > > /usr/lib/perl5/5.8.5/ExtUtils/typemap SCF.xs > SCF.xsc && mv SCF.xsc > SCF.c > > Please specify prototyping behavior for SCF.xs (see perlxs manual) > > gcc -c -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING > > -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE > > -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386 > > -mtune=pentium4 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" -fPIC > > "-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE" -DLITTLE_ENDIAN > > SCF.c SCF.xs:12:24: io_lib/scf.h: No such file or directory > > SCF.xs:13:26: io_lib/mFILE.h: No such file or directory > > SCF.xs: In function `XS_Bio__SCF_get_scf_pointer': > > SCF.xs:27: error: `Scf' 
undeclared (first use in this function) > > SCF.xs:27: error: (Each undeclared identifier is reported only once > > SCF.xs:27: error: for each function it appears in.) > > SCF.xs:27: error: `scf_data' undeclared (first use in this function) > > SCF.xs: In function `XS_Bio__SCF_get_scf_fpointer': > > SCF.xs:66: error: `Scf' undeclared (first use in this function) > > SCF.xs:66: error: `scf_data' undeclared (first use in this function) > > SCF.xs:68: error: `mFILE' undeclared (first use in this function) > > SCF.xs:68: error: `mf' undeclared (first use in this function) > > SCF.xs: In function `XS_Bio__SCF_scf_free': > > SCF.xs:89: error: `Scf' undeclared (first use in this function) > > SCF.xs:89: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_get_comments': > > SCF.xs:95: error: `Scf' undeclared (first use in this function) > > SCF.xs:95: error: `scf_data' undeclared (first use in this function) > > SCF.xs:95: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_set_comments': > > SCF.xs:108: error: `Scf' undeclared (first use in this function) > > SCF.xs:108: error: `scf_data' undeclared (first use in this function) > > SCF.xs:108: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_scf_write': > > SCF.xs:121: error: `Scf' undeclared (first use in this function) > > SCF.xs:121: error: `scf_data' undeclared (first use in this function) > > SCF.xs:121: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_scf_fwrite': > > SCF.xs:135: error: `mFILE' undeclared (first use in this function) > > SCF.xs:135: error: `mf' undeclared (first use in this function) > > SCF.xs:137: error: `Scf' undeclared (first use in this function) > > SCF.xs:137: error: `scf_data' undeclared (first use in this function) > > SCF.xs:137: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_get_from_header': > > SCF.xs:159: error: `Scf' undeclared (first use in this function) > > 
SCF.xs:159: error: `scf_data' undeclared (first use in this function) > > SCF.xs:159: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_get_at': > > SCF.xs:186: error: `Scf' undeclared (first use in this function) > > SCF.xs:186: error: `scf_data' undeclared (first use in this function) > > SCF.xs:186: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_set_base_at': > > SCF.xs:242: error: `Scf' undeclared (first use in this function) > > SCF.xs:242: error: `scf_data' undeclared (first use in this function) > > SCF.xs:242: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_set_at': > > SCF.xs:255: error: `Scf' undeclared (first use in this function) > > SCF.xs:255: error: `scf_data' undeclared (first use in this function) > > SCF.xs:255: error: syntax error before ')' token > > make: *** [SCF.o] Error 1 > -- -Neeti Even my blood says, B positive
From jay at jays.net Tue Feb 20 19:27:01 2007 From: jay at jays.net (Jay Hannah) Date: Tue, 20 Feb 2007 18:27:01 -0600 Subject: [Bioperl-l] Alignment In-Reply-To: <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com> References: <188661178021328@lycos-europe.com> <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com> Message-ID: > On 2/20/07, marian thieme wrote: >> I have a series of sequences which should be aligned against a >> reference sequence. >> In this special case we dont need to calculate anything, we only need >> to represent the sequences and get for instance some columns of >> interest. >> The problem now is, that some sequences have gaps and we need to >> represent gaps in the rewference sequence as well as in some >> individual sequences. On Feb 20, 2007, at 9:30 AM, Albert Vilella wrote: > I think the SimpleAlign object contains a set of sequences, each of > which is a LocatableSeq object. Fascinating. In my BLAST-centric universe I went and rolled my own solution for SeqLab where I hold onto the Bio::Seq from the reference sequences and then hold onto the Bio::Search::HSP::GenericHSP objects for all my BLAST hits.
From that dataset I can write whatever reports I want and/or perform any subsequent actions. I wonder if I should have done that differently... What typically creates .pfam files? j seqlab.net http://www.bioperl.org/wiki/User:Jhannah From cjfields at uiuc.edu Wed Feb 21 08:36:02 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 21 Feb 2007 07:36:02 -0600 Subject: [Bioperl-l] Alignment In-Reply-To: References: <188661178021328@lycos-europe.com> <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com> Message-ID: <2233F0EE-94FE-42F0-B8E5-1BE14A25C0D4@uiuc.edu> On Feb 20, 2007, at 6:27 PM, Jay Hannah wrote: ... > > On Feb 20, 2007, at 9:30 AM, Albert Vilella wrote: >> I think the SimpleAlign object contains a set of sequences, each of >> which is a LocatableSeq object. > > Fascinating. In my BLAST-centric universe I went and rolled my own > solution for SeqLab where I hold onto the Bio::Seq from the reference > sequences and then hold onto the Bio::Search::HSP::GenericHSP objects > for all my BLAST hits. From that dataset I can write whatever > reports I > want and/or perform any subsequent actions. I wonder if I should have > done that differently... > > What typically creates .pfam files? > > j > seqlab.net > http://www.bioperl.org/wiki/User:Jhannah Pfam alignments come in two formats (pfam and stockholm) that can both be parsed into SimpleAlign objects via Bio::AlignIO:

    my $alnin = Bio::AlignIO->new(-format => 'stockholm',
                                  -file   => 'dho.sto');
    while (my $aln = $alnin->next_aln) {
        # do stuff to $aln SimpleAlign
    }

Personally I stick with Stockholm as it's a richer format (with annotations and so on), but the parser was rewritten recently (by moi!) so may have some bugs still. I'm a bit confused as to what you do with BLAST files.
You can generate a SimpleAlign right from the HSP for most SearchIO parsers: http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods chris From sanjib at bic.boseinst.ernet.in Wed Feb 21 01:12:06 2007 From: sanjib at bic.boseinst.ernet.in (Sanjib Kumar Gupta) Date: Wed, 21 Feb 2007 11:42:06 +0530 Subject: [Bioperl-l] help on remote blast In-Reply-To: <20070220073200.M42567@bic.boseinst.ernet.in> References: <20070220073200.M42567@bic.boseinst.ernet.in> Message-ID: <20070221061206.M37845@bic.boseinst.ernet.in> Hi, I have been running this script for some time and it was working fine. I am using this Linux machine with a live IP (no proxy). But suddenly it has stopped working with these errors: waiting...waiting... -------------------- WARNING --------------------- MSG: An Error Occurred

500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad hostname 'www.ncbi.nlm.nih.gov') --------------------------------------------------- xx.pep -------------------- WARNING --------------------- MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5 Content-Length: 497 Content-Type: application/x-www-form-urlencoded DATABASE=nr&QUERY=%3ENM_005277.3%3B_1414961_1+837% 0AMEENMEEGQTQKGCFECCIKCLGGIPYASLIATILLYAGVALFCGCGHEALSGTVNILQTYFEMARTAGDTLDVF TMIDIFKYVIYGIAAAFFVYGILLMVEGFFTTGAIKDLYGDFKITTCGRCVSAWFIMLTYLFMLAWLGVTAFTSLPV YMYFNLWTICRNTTLVEGANLCLDLRQFGIVTIGEEKKICTVSENFLRMCESTELNMTFHLFIVALAGAGAAVIAMV HYLMVLSANWAYVKDACRMQKYEDIKSKEEQELHDIHSTRSKERLNAYT*&COMPOSITION_BASED_STATISTI CS=off&EXPECT=1e- 10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&MATRIX_NAME=BLOSUM62&ENTREZ_ QUERY=Xenopus+laevis[Organism]&PROGRAM=blastp An Error Occurred

500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad hostname 'www.ncbi.nlm.nih.gov') --------------------------------------------------- waiting...waiting... -------------------- WARNING --------------------- MSG: An Error Occurred

500 Internal Server Error --------------------------------------------------- Although I am able to see the NCBI page from a browser, I am unable to ping or traceroute to the server. Please help me. -- Sanjib Kumar Gupta Bioinformatics Centre Bose Institute Kolkata 700054, INDIA Phone : +91-33-2355 6626, 2816, 2355 4766 Fax : +91-33-2355 3886 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: n9.pl Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070221/5a3382d6/attachment.pl From granjeau at tagc.univ-mrs.fr Wed Feb 21 08:50:39 2007 From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137) Date: Wed, 21 Feb 2007 14:50:39 +0100 Subject: [Bioperl-l] Adding empty member list in Bio::Cluster::SequenceFamily Message-ID: <45DC4E2F.4060804@tagc.univ-mrs.fr> Hello! This was not obvious to me, but I found a workaround (checking for an empty list before adding); here is what I noticed. Adding an empty list () as members is not the same as adding a reference to an empty list [], of course, but one could think they are the same. Calling get_members, in the second case I got a list of 0 members, but in the first case I got 1 member, which is not an object at all. I am warned now, but maybe the documentation should emphasize passing the members as a reference. Best regards, --Samuel

    use Bio::Cluster::SequenceFamily;

    my $f = Bio::Cluster::SequenceFamily->new( -id => 'aa' );
    $f->add_members( () );
    print scalar $f->get_members();   # 1

    my $g = Bio::Cluster::SequenceFamily->new( -id => 'aa' );
    $g->add_members( [] );
    print scalar $g->get_members();   # 0

From stephen.marshall at novartis.com Wed Feb 21 12:01:00 2007 From: stephen.marshall at novartis.com (stephen.marshall at novartis.com) Date: Wed, 21 Feb 2007 12:01:00 -0500 Subject: [Bioperl-l] Parsing kegg files Message-ID: Hello I'm trying to parse a KEGG file and I can't seem to get at the pathway information... Here's a snippet of my code.
I only see dblink and description as annotations:

    use Bio::SeqIO;

    my $stream = Bio::SeqIO->new(-file => $filename, -format => 'KEGG');

    while ( my $seq = $stream->next_seq() ) {
        # do something with $seq
        my $id = $seq->display_id();
        print "$id:";
        my $ann = $seq->annotation();
        foreach my $key ( $ann->get_all_annotation_keys() ) {
            my @values = $ann->get_Annotations($key);
            foreach my $value ( @values ) {
                print "Annotation: ", $key, " value: ", $value->as_text, "\n";
            }
        }
    }

_________________________ CONFIDENTIALITY NOTICE The information contained in this e-mail message is intended only for the exclusive use of the individual or entity named above and may contain information that is privileged, confidential or exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivery of the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by e-mail and delete the material from any computer. Thank you. From prateek.vit at gmail.com Wed Feb 21 12:40:25 2007 From: prateek.vit at gmail.com (prateek singh yadav) Date: Wed, 21 Feb 2007 23:10:25 +0530 Subject: [Bioperl-l] Problem in BioPerl Installation Message-ID: Hello all, I was trying to install BioPerl on my Red Hat Enterprise Linux machine using CPAN, but CPAN shows this problem: [root at HX342SBC054 Desktop]# cpan Terminal does not support AddHistory.
cpan shell -- CPAN exploration and modules installation (v1.7601) ReadLine support available (try 'install Bundle::CPAN') cpan> get bioperl CPAN: Storable loaded ok Going to read /root/.cpan/Metadata Warning: Found only 25 objects in /root/.cpan/Metadata Going to read /root/.cpan/sources/authors/01mailrc.txt.gz Going to read /root/.cpan/sources/modules/02packages.details.txt.gz Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not contain a Line-Count header. Please check the validity of the index file by comparing it to more than one CPAN mirror. I'll continue but problems seem likely to happen. Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not contain a Last-Updated header. Please check the validity of the index file by comparing it to more than one CPAN mirror. I'll continue but problems seem likely to happen. Going to read /root/.cpan/sources/modules/03modlist.data.gz Can't locate object method "data" via package "CPAN::Modulelist" (perhaps you forgot to load "CPAN::Modulelist"?) at (eval 13) line 1. at /usr/lib/perl5/5.8.5/CPAN.pm line 3406 CPAN::Index::rd_modlist('CPAN::Index', '/root/.cpan/sources/modules/03modlist.data.gz') called at /usr/lib/perl5/5.8.5/CPAN.pm line 3129 CPAN::Index::reload('CPAN::Index') called at /usr/lib/perl5/5.8.5/CPAN.pm line 675 CPAN::exists('CPAN=HASH(0x8548f20)', 'CPAN::Module', 'bioperl') called at /usr/lib/perl5/5.8.5/CPAN.pm line 1842 CPAN::Shell::expandany('CPAN::Shell', 'bioperl') called at /usr/lib/perl5/5.8.5/CPAN.pm line 2078 CPAN::Shell::rematein('CPAN::Shell', 'get', 'bioperl') called at /usr/lib/perl5/5.8.5/CPAN.pm line 2157 CPAN::Shell::get('CPAN::Shell', 'bioperl') called at /usr/lib/perl5/5.8.5/CPAN.pm line 201 eval {...} called at /usr/lib/perl5/5.8.5/CPAN.pm line 201 CPAN::shell() called at /usr/bin/cpan line 193 cpan> Can anyone tell me how to configure CPAN again, or how to install BioPerl on Linux with all its dependencies?
I think the problem is in my CPAN configuration. Regards, Prateek -- Prateek Singh 3rd year Bioinformatics(BTech) Vellore Institute Of Technology Vellore-632014 prateek.vit at gmail.com From bosborne11 at verizon.net Wed Feb 21 12:29:40 2007 From: bosborne11 at verizon.net (Brian Osborne) Date: Wed, 21 Feb 2007 12:29:40 -0500 Subject: [Bioperl-l] Parsing kegg files In-Reply-To: Message-ID: Stephen, I don't know what your eventual goals are but you might want to take a look at bioperl-network. However, there are problems with this package. One, it only parses DIP tab-delimited and PSI-MI and it does this last one only partially (you will get the graph though). Two, it seems to have only a single developer interested in it, that's me, and few users. In my Bioperl experience projects like this tend to fade away. http://www.bioperl.org/wiki/Network_package Brian O. On 2/21/07 12:01 PM, "stephen.marshall at novartis.com" wrote: > Hello > I"m trying to parse a Kegg file and I can't seem to get at the pathway > information... Here's a snippet of my code. I only see dblink and > description as annotation > > use Bio::SeqIO; > > my $stream = Bio::SeqIO->new(-file => $filename, -format => 'KEGG'); > > while ( my $seq = $stream->next_seq() ) { > # do something with $seq > my $id = $seq->display_id(); > print "$id:"; > my $ann = $seq->annotation(); > foreach my $key ( $ann->get_all_annotation_keys() ) { > my @values = $ann->get_Annotations($key); > foreach my $value ( @values ) { > print "Annotation: ",$key," value: > ",$value->as_text,"\n"; > } > } > > }
From arareko at campus.iztacala.unam.mx Wed Feb 21 13:18:37 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Wed, 21 Feb 2007 12:18:37 -0600 Subject: [Bioperl-l] Problem in BioPerl Installation In-Reply-To: References: Message-ID: <45DC8CFD.1060108@campus.iztacala.unam.mx> You can always rebuild your CPAN configuration by deleting the existing .cpan/ directory in root's $HOME dir (quick & dirty trick), then invoke CPAN again from root's shell to rebuild the config:

    # perl -MCPAN -e shell

Hope this helps. Regards, Mauricio. prateek singh yadav wrote: > Hello all, > > I was trying to install Bioperl on my redhat linux (EL) using CPAN. but CPAN > shows this problem. > > > [root at HX342SBC054 Desktop]# cpan > Terminal does not support AddHistory. > > cpan shell -- CPAN exploration and modules installation (v1.7601) > ReadLine support available (try 'install Bundle::CPAN') > > cpan> get bioperl > CPAN: Storable loaded ok > Going to read /root/.cpan/Metadata > Warning: Found only 25 objects in /root/.cpan/Metadata > Going to read /root/.cpan/sources/authors/01mailrc.txt.gz > Going to read /root/.cpan/sources/modules/02packages.details.txt.gz > Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not > contain a Line-Count header. > Please check the validity of the index file by comparing it to more > than one CPAN mirror.
I'll continue but problems seem likely to > happen. > Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not > contain a Last-Updated header. > Please check the validity of the index file by comparing it to more > than one CPAN mirror. I'll continue but problems seem likely to > happen. > Going to read /root/.cpan/sources/modules/03modlist.data.gz > Can't locate object method "data" via package "CPAN::Modulelist" (perhaps > you forgot to load "CPAN::Modulelist"?) at (eval 13) line 1. > at /usr/lib/perl5/5.8.5/CPAN.pm line 3406 > CPAN::Index::rd_modlist('CPAN::Index', > '/root/.cpan/sources/modules/03modlist.data.gz') called at > /usr/lib/perl5/5.8.5/CPAN.pm line 3129 > CPAN::Index::reload('CPAN::Index') called at > /usr/lib/perl5/5.8.5/CPAN.pm line 675 > CPAN::exists('CPAN=HASH(0x8548f20)', 'CPAN::Module', 'bioperl') > called at /usr/lib/perl5/5.8.5/CPAN.pm line 1842 > CPAN::Shell::expandany('CPAN::Shell', 'bioperl') called at > /usr/lib/perl5/5.8.5/CPAN.pm line 2078 > CPAN::Shell::rematein('CPAN::Shell', 'get', 'bioperl') called at > /usr/lib/perl5/5.8.5/CPAN.pm line 2157 > CPAN::Shell::get('CPAN::Shell', 'bioperl') called at > /usr/lib/perl5/5.8.5/CPAN.pm line 201 > eval {...} called at /usr/lib/perl5/5.8.5/CPAN.pm line 201 > CPAN::shell() called at /usr/bin/cpan line 193 > > cpan> > > Can anyone give me direction how to configure cpan again or how to install > BioPerl on linux with its complete dependencies. Because I think I have a > problem in CPAN configuration. 
> > Regards, > Prateek > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From hlapp at gmx.net Wed Feb 21 13:33:17 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 21 Feb 2007 13:33:17 -0500 Subject: [Bioperl-l] Adding empty member list in Bio::Cluster::SequenceFamily In-Reply-To: <45DC4E2F.4060804@tagc.univ-mrs.fr> References: <45DC4E2F.4060804@tagc.univ-mrs.fr> Message-ID: <5B31EEBD-FFE5-4A0F-BB05-DF7297103BBD@gmx.net> Fixed in CVS HEAD. -hilmar On Feb 21, 2007, at 8:50 AM, Samuel GRANJEAUD - IR/IFR137 wrote: > Hello! > > Not clear to me, but I find a work around by checking for empty list > before adding, here is what I noticed. Adding as members an empty list > () is not the same as adding a reference to an empty list [], of > course, > but could be thought to be the same. Calling get_members, for the > second > case, I got a list of 0 member, but in the first case I got of 1 > member, > which is not an object at all. I am warned now, but may be the > documentation should emphasize on using by the reference call. > > Best regards, > --Samuel > > > use Bio::Cluster::SequenceFamily; > > $f = new Bio::Cluster::SequenceFamily( -id => 'aa' ); > $f->add_members( () ); > print scalar $f->get_members(); > # 1 > $g = new Bio::Cluster::SequenceFamily( -id => 'aa' ); > $g->add_members( [] ); > print scalar $g->get_members(); > # 0 -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Wed Feb 21 14:12:57 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 21 Feb 2007 13:12:57 -0600 Subject: [Bioperl-l] GenBank accession bug?
Message-ID: <11B83C2C-7BFE-48B5-A20C-7EAF282F39BD@uiuc.edu> Dmitry, I'm forwarding this to the mail list. In the future please post/respond to the regular mail list so other BioPerl developers/users can comment. You'll get feedback much faster here (and maybe even some support!). The issue at hand is whether we can support GenBank accession/display_id/version with your naming scheme. My feeling is that support for nonalphanumerics was removed to be compliant with the GenBank standard for accessions, though I may be wrong. Maybe someone who was around during bioperl 1.2 can elaborate more? From http://bugzilla.open-bio.org/show_bug.cgi?id=2214 -------------------------------------------------- .... Thanks for the verbose explanation. It seems that I will need to apply my local patches to the BioPerl module(s). With BioPerl-1.2 there was no problem with '-' in sequence names. The problem is that in the project we participate in (the Vizier project) the following sequence naming convention was adopted:
VZ##-(or)-<$$>
VZ Stands for Vizier
## Your 2-digit Partner ID within the VIZIER consortium
Virus name according to the ICTV nomenclature; , If sequence has not been assigned a GenBank LOCUS ID, available strain designation, short as possible, should be used
<$$> Unique 2-digit number at your discretion to label the sequence variant
-------------------------------------------------- chris From gabriel.cardona at uib.es Thu Feb 22 04:33:14 2007 From: gabriel.cardona at uib.es (gcardona) Date: Thu, 22 Feb 2007 01:33:14 -0800 (PST) Subject: [Bioperl-l] bioperl-1.5.2_100 ... not found Message-ID: <9096740.post@talk.nabble.com> Hello, I am trying to install Bioperl on a Windows system, following the installation notes in http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows When the Perl Package Manager tries to download bioperl-1.5.2_100, it cannot find the package and answers: Downloading bioperl-1.5.2_100 ...
not found I've looked at the contents of http://bioperl.org/DIST: in package.xml the version for bioperl is bioperl-1.5.2_100, but in that folder the available version is bioperl-1.5.2_102. Is this a bug, or should I download and install it manually? Thank you in advance, Gabriel Cardona -- View this message in context: http://www.nabble.com/bioperl-1.5.2_100-...-not-found-tf3271747.html#a9096740 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From bix at sendu.me.uk Thu Feb 22 07:35:14 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 22 Feb 2007 12:35:14 +0000 Subject: [Bioperl-l] bioperl-1.5.2_100 ... not found In-Reply-To: <9096740.post@talk.nabble.com> References: <9096740.post@talk.nabble.com> Message-ID: <45DD8E02.1070404@sendu.me.uk> gcardona wrote: > Hello, > > I am trying to install Bioperl on a Windows system, following the > installation notes in > http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows > When the Perl Package Manager tries to download bioperl-1.5.2_100, it cannot > find the package and answers: > Downloading bioperl-1.5.2_100 ... not found > > I've looked the contents of > http://bioperl.org/DIST > and in package.xml the version for bioperl is bioperl-1.5.2_100, but in that > folder the available version is bioperl-1.5.2_102 > Is this a bug? or should I download and install manually? Sorry, my mistake. I accidentally moved the ppm to a different folder. It should work now though. I may make a 1.5.2_102 ppm at some point, but there are no relevant differences between _102 and _100 as far as Windows users are concerned. From enrique_rulz at yahoo.com Thu Feb 22 15:41:37 2007 From: enrique_rulz at yahoo.com (Kurt Gobain) Date: Thu, 22 Feb 2007 12:41:37 -0800 (PST) Subject: [Bioperl-l] Sequence matching problem! Message-ID: <9107936.post@talk.nabble.com> Hi everyone,
I'm having a lot of trouble with simple pattern matching between a sequence and a pattern. The program should be able to do two things: 1) normal matching, e.g. for GATCAAT, if TC is entered the output should be 2; 2) matching using a special character: in the same example, if C*T is entered it should give 3 as output, and the sequence displayed should be CAAT. I get the first part easily, but the second part gives me some problems: the output I get is 1 instead of 3. The code is really simple:

    #!/usr/bin/perl
    $alphabet = "GATCAAT";
    $pattern= "C*T ";

    $alphabet =~ /($pattern)/i;

    print "The entire '$pattern' match began at $-[0] and ended at $+[0]\n";

==================== OUTPUT! The entire C*T match began at 1 and ended at 2 ====================

But the output should be 3! Also, is there any chance I can get the sequence too? I mean, instead of 'C*T' I need 'CAAT'. It's not compulsory to use a regex, but I find it quite simple. Is there any other method? Thanks in advance! Kurt! -- View this message in context: http://www.nabble.com/Sequence-matching-problem%21-tf3275153.html#a9107936 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cjfields at uiuc.edu Thu Feb 22 16:01:03 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 22 Feb 2007 15:01:03 -0600 Subject: [Bioperl-l] GenBank accession bug? In-Reply-To: <51879.10.0.7.57.1172176272.squirrel@gscmail.wustl.edu> References: <11B83C2C-7BFE-48B5-A20C-7EAF282F39BD@uiuc.edu> <51879.10.0.7.57.1172176272.squirrel@gscmail.wustl.edu> Message-ID: <028E16D7-036A-44DA-BECD-F910BEA58E53@uiuc.edu> On Feb 22, 2007, at 2:31 PM, dmessina at watson.wustl.edu wrote: >> The issue at hand is whether we can support GenBank accessions/ >> display_id/version with your naming scheme. > > Chris, I'm a little unsure of what you're saying here (which might > mean > that you're already saying what I'm about to...say).
Do you mean it > might > be tricky to support both the Genbank standard and Dmitry's > simultaneously? > > I would argue any arbitrary ID should be supported as long as that > ID is a > contiguous non-space word (\S+). > > Actually the existing accession regex looks like it already > supports IDs > with '-': > > /^ACCESSION\s+(\S.*\S)/ > > It's only the version regex which doesn't (\w doesn't include '-'): > > /^\w+\.(\d+)/ > > > Anyone else have thoughts or comments on this? Off the top of my > head, I > can't think of any issues that might arise from doing so (apart from > having to modify all of the SeqIO modules to support it). > > Dave You're right; the argument comes down simply to whether we would support \S+ or just \w+. I'm neutral on this myself, but I wonder how allowing \S+ would affect other modules (for instance, indexing for a flat db), where one might just use \w+ for accessions, expecting them to be GenBank- or EMBL-like alphanumerics. The fact that \S+ was supported in the past (as indicated in the bug report) and then wasn't post 1.2 makes me think there was a reason for someone going in and modifying it, but that was before my time on the group. I'll have a look at the CVS history when I have time to see what I can dig up. chris From mkiwala at watson.wustl.edu Thu Feb 22 15:36:33 2007 From: mkiwala at watson.wustl.edu (Michael Kiwala) Date: Thu, 22 Feb 2007 14:36:33 -0600 Subject: [Bioperl-l] Bio::AnnotatableI vs Bio::FeatureHolderI Message-ID: <45DDFED1.1090503@watson.wustl.edu> Are Bio::AnnotatableI and Bio::FeatureHolderI competing interfaces? I get the impression they are designed to do similar things. If so is one deprecated and the other preferred? If their responsibilities are orthogonal to each other, what sorts of tasks are suited to each? Thanks, Michael From dmessina at wustl.edu Thu Feb 22 15:53:01 2007 From: dmessina at wustl.edu (Dave Messina) Date: Thu, 22 Feb 2007 14:53:01 -0600 (CST) Subject: [Bioperl-l] GenBank accession bug? 
Message-ID: <51923.10.0.7.57.1172177581.squirrel@gscmail.wustl.edu> > The issue at hand is whether we can support GenBank accessions/ > display_id/version with your naming scheme. Chris, I'm a little unsure of what you're saying here (which might mean that you're already saying what I'm about to...say). Do you mean it might be tricky to support both the GenBank standard and Dmitry's simultaneously? I would argue any arbitrary ID should be supported as long as that ID is a contiguous non-space word (\S+). Actually the existing accession regex looks like it already supports IDs with '-':

    /^ACCESSION\s+(\S.*\S)/

It's only the version regex which doesn't (\w doesn't include '-'):

    /^\w+\.(\d+)/

Anyone else have thoughts or comments on this? Off the top of my head, I can't think of any issues that might arise from doing so (apart from having to modify all of the SeqIO modules to support it). Dave From heikki at sanbi.ac.za Fri Feb 23 03:25:39 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Fri, 23 Feb 2007 10:25:39 +0200 Subject: [Bioperl-l] Sequence matching problem! In-Reply-To: <9107936.post@talk.nabble.com> References: <9107936.post@talk.nabble.com> Message-ID: <200702231025.39416.heikki@sanbi.ac.za> Kurt,

There are a few things in your code to note:

- the regexp /C*T/ matches any T preceded by zero or more Cs, which is not what you meant
- $- and $+ are among the "expensive" Perl variables best avoided unless you have to use them. Using them once in your code slows execution down considerably. There is always another way.
- Keep in mind what you want to use the match positions for: human-readable locations usually start counting at 1, but Perl code uses 0 as the first location. The code below assumes you want to print the locations out.

Study my example code below.
Yours,
     -Heikki

###################################################################
#!/usr/bin/perl
$seq = "GATCAAT";
#$pattern= 'C*T';
$pattern= 'C.*T';

while ($seq =~ m/($pattern)/gi) {

    $match = $1;
    $end = pos($seq);
    $start = $end - length($match) +1;

    print "$match : $start - $end\n";
}

###################################################################


On Thursday 22 February 2007 22:41:37 Kurt Gobain wrote:
> Hi everyone,
> I'm facing a great deal of trouble with simple pattern matching between
> a sequence and a pattern. The program should be designed so that it is
> able to do two things: 1) normal matching: e.g. for GATCAAT, if TC is
> entered, the output should be 2; 2) matching using a special character:
> in the same example, if C*T is entered it should give the output 3, and
> the sequence to be displayed is CAAT. I'm easily getting the first part,
> but the second part is giving some problems: the output I'm getting is 1
> instead of 3. The code is really simple!
>
> #!/usr/bin/perl
> $alphabet = "GATCAAT";
> $pattern= "C*T ";
>
> $alphabet =~ /($pattern)/i;
>
> print "The entire '$pattern' match began at $-[0] and ended at $+[0]\n";
>
> ====================
> OUTPUT!
> The entire C*T match began at 1 and ended at 2
> ====================
>
> but the output should be 3?
> And is there any chance I can get the sequence too? I mean instead of
> 'C*T' I need 'CAAT'.
>
> Well, it's not compulsory to use a regex, but I find it quite simple.
> Can there be any other method?
>
> Thanks in advance!
> Kurt!
--
Heikki Lehvaslaiho                 heikki at_sanbi _ac _za
Associate Professor                skype: heikki_lehvaslaiho
SANBI, South African National Bioinformatics Institute
University of Western Cape, South Africa
Phone: +27 21 959 2096   FAX: +27 21 959 2512

From avilella at gmail.com Fri Feb 23 04:59:49 2007
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 23 Feb 2007 09:59:49 +0000
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <200702231025.39416.heikki@sanbi.ac.za>
References: <9107936.post@talk.nabble.com> <200702231025.39416.heikki@sanbi.ac.za>
Message-ID: <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>

now that we are at this pattern matching thread, I was wondering if
any perl guru could enlighten me on the issue of matching exact
sequence patterns on a gapped target sequence. E.g.:

my $seq = "CGATCAACGAATCGTACGTACTC";
my $gapped_seq =
"GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";

and one would like to get as a result:

"CG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTC"

which is the match of $seq but in $gapped_seq.

Cheers,

Albert.

From js5 at sanger.ac.uk Fri Feb 23 06:34:37 2007
From: js5 at sanger.ac.uk (James Smith)
Date: Fri, 23 Feb 2007 11:34:37 +0000 (GMT)
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
References: <9107936.post@talk.nabble.com> <200702231025.39416.heikki@sanbi.ac.za> <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
Message-ID:

On Fri, 23 Feb 2007, Albert Vilella wrote:

> now that we are at this pattern matching thread, I was wondering if
> any perl guru could enlighten me on the issue of matching exact
> sequence patterns on a gapped target sequence. E.g.:
>
> my $seq = "CGATCAACGAATCGTACGTACTC";
> my $gapped_seq =
> "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
>
> and one would like to get as a result:
>
> "CG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTC"
>
> which is the match of $seq but in $gapped_seq.

Try...
my $seq = "CGATCAACGAATCGTACGTACTC";
my $gapped_seq =
  "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";

my $regexp = '('.join('-*?',split//,$seq).')';

if( $gapped_seq =~ /$regexp/ ) {
  print "Match is $1\n";
} else {
  print "No match\n";
}

(not sure on the efficiency if $seq is long tho')

James

From khoueiry at ibdm.univ-mrs.fr Fri Feb 23 08:09:33 2007
From: khoueiry at ibdm.univ-mrs.fr (pierre)
Date: Fri, 23 Feb 2007 14:09:33 +0100
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
References: <9107936.post@talk.nabble.com> <200702231025.39416.heikki@sanbi.ac.za> <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
Message-ID: <1172236173.4309.6.camel@ciona-pierre>

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070223/0e08ebe6/attachment.ksh

From neetisomaiya at gmail.com Fri Feb 23 07:27:28 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 23 Feb 2007 17:57:28 +0530
Subject: [Bioperl-l] need help urgently - needle output parsing
Message-ID: <764978cf0702230427x5b5acf73y6538527ade3fd453@mail.gmail.com>

Hi,

I am using the needle alignment tool (standalone, on a Linux machine), and
then I am using BioPerl to parse the output.

All data (sequence files and alignment outputs) are attached to this mail.
I have 2 small sequences:
  693.seq and revcomp693.seq
I have 2 big sequences:
  80768-4291-5639.84809_84810_84809_1.scf.seq and
  80768-4291-5639.84809_84810_84810_1.scf.seq
All of these are in FASTA format.

Now I am doing the following:

1) Aligning 80768-4291-5639.84809_84810_84809_1.scf.seq and 693.seq
   - output file is 80768-4291-5639.84809_84810_84809_1.scf.out
   - parsing the output gives me the alignment start in 'traceseq' as 97
2) Aligning 80768-4291-5639.84809_84810_84809_1.scf.seq and revcomp693.seq
   - output file is 80768-4291-5639.84809_84810_84809_1.scf.comp.out
   - parsing the output gives me the alignment start in 'traceseq' as 91

All this is correct.

Now I am doing the following:

1) Aligning 80768-4291-5639.84809_84810_84810_1.scf.seq and 693.seq
   - output file is 80768-4291-5639.84809_84810_84810_1.scf.out
   - parsing the output gives me the alignment start in 'traceseq' as 341
     (this is correct)
2) Aligning 80768-4291-5639.84809_84810_84810_1.scf.seq and revcomp693.seq
   - output file is 80768-4291-5639.84809_84810_84810_1.scf.comp.out
   - parsing the output gives me the alignment start in 'traceseq' as 341
     (this is incorrect; the correct position is 330)

Part of my code is as follows:
---------------------------------------------
# running needle
`$needle_path./needle $trace.seq $snp_position_on_con.seq -gapopen 10.0 -gapextend 0.5 $output`;

# parsing needle output
my $str = Bio::AlignIO->new(-format => 'emboss', -file => $output);
my $aln = $str->next_aln();
my $pos = $aln->column_from_residue_number('original', 1);
$logger->info("Alignment pos is $pos");

####################################

# running needle
`$needle_path./needle $trace.seq revcomp$snp_position_on_con.seq -gapopen 10.0 -gapextend 0.5 $comp_output`;

# parsing needle output
my $comp_str = Bio::AlignIO->new(-format => 'emboss', -file => $comp_output);
my $comp_aln = $comp_str->next_aln();
my $comp_pos = $comp_aln->column_from_residue_number('revcomp', 1);
$logger->info("Alignment pos is $comp_pos");

Can someone please tell me what is going wrong here?

--
-Neeti
Even my blood says, B positive
-------------- next part --------------
A non-text attachment was scrubbed...
Name: data.zip
Type: application/zip
Size: 4456 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070223/21658b7d/attachment-0001.zip

From bix at sendu.me.uk Fri Feb 23 08:55:24 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 23 Feb 2007 13:55:24 +0000
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To:
References: <9107936.post@talk.nabble.com> <200702231025.39416.heikki@sanbi.ac.za> <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
Message-ID: <45DEF24C.1010303@sendu.me.uk>

James Smith wrote:
> On Fri, 23 Feb 2007, Albert Vilella wrote:
>
>> now that we are at this pattern matching thread, I was wondering if
>> any perl guru could enlighten me on the issue of matching exact
>> sequence patterns on a gapped target sequence. E.g.:
>>
>> my $seq = "CGATCAACGAATCGTACGTACTC";
>> my $gapped_seq =
>> "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
>>
>> and one would like to get as a result:
>>
>> "CG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTC"
>>
>> which is the match of $seq but in $gapped_seq.
>
> Try...
>
> my $seq = "CGATCAACGAATCGTACGTACTC";
> my $gapped_seq =
> "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
>
> my $regexp = '('.join('-*?',split//,$seq).')';
>
> if( $gapped_seq =~ /$regexp/ ) {
>   print "Match is $1\n";
> } else {
>   print "No match\n";
> }

That's great stuff.

If you were matching thousands of different $seq against the same very
large $gapped_seq, and only needed the first match of $seq in $gapped_seq,
the alternative to the above approach (remove the gaps from $gapped_seq
and do index() matching) will be faster.
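A compact sketch of that gap-removal idea, for illustration only (it is not the code posted in this thread). Instead of a cumulative gap-length array, it stores the 0-based offset of each residue in the gapped sequence. A hit found with index() in the gap-free copy then maps directly back to a gapped slice. The helper names build_gap_index and find_in_gapped are hypothetical. The sketch assumes gaps are '-' only and reports only the first exact hit. The index is built once and reused for every query. With the example inputs from this thread, the slice ends at ...CGTACTC, because that is where $seq itself ends.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: strip '-' gaps once and remember, for every residue
# of the gap-free sequence, its 0-based offset in the gapped sequence.
sub build_gap_index {
    my ($gapped) = @_;
    my $gapless = '';
    my @offset;                          # gapless index -> gapped index
    for my $i (0 .. length($gapped) - 1) {
        my $char = substr($gapped, $i, 1);
        next if $char eq '-';
        $gapless .= $char;
        push @offset, $i;
    }
    return ($gapless, \@offset);
}

# Hypothetical helper: first exact hit of $query, returned as the matching
# (gap-containing) slice of $gapped, or undef if there is no hit.
sub find_in_gapped {
    my ($query, $gapped, $gapless, $offset) = @_;
    return undef if $query eq '';
    my $start = index($gapless, $query);
    return undef if $start < 0;
    my $end = $start + length($query) - 1;
    my ($from, $to) = ($offset->[$start], $offset->[$end]);
    return substr($gapped, $from, $to - $from + 1);
}

my $gapped_seq = 'GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG';
my ($gapless_seq, $offset) = build_gap_index($gapped_seq);   # build once

for my $seq ('CGATCAACGAATCGTACGTACTC', 'TTTTTT') {           # reuse per query
    my $hit = find_in_gapped($seq, $gapped_seq, $gapless_seq, $offset);
    print defined $hit ? "Match is $hit\n" : "No match for $seq\n";
}
```

Storing the gapped offset of each residue costs one array slot per residue. In exchange, both ends of a hit translate with a single lookup.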
Here's one (overly long-winded) way of implementing it, that I found to
take ~2s vs ~22s for the above regex approach when doing the job on 999999
copies of $seq:

#!/usr/bin/perl -w
use strict;
use warnings;

my $gapped_seq = "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";

# note the total gap-length at position in gapless 0-based coords
my @gap_lengths;
my $gap_length = 0;
while ($gapped_seq =~ /(-+)/g) {
    my $match = $1;
    my $prev_length = $gap_length;
    $gap_length += length($match);
    my $end = pos($gapped_seq) - $gap_length - 1;
    push(@gap_lengths, $prev_length) for (1..$end-$#gap_lengths);
}
push(@gap_lengths, $gap_length) for (1..(length($gapped_seq) - @gap_lengths - $gap_length));

# remove the gaps
my $gapless_seq = $gapped_seq;
$gapless_seq =~ s/-//g;

# now for each of thousands of seqs...
my $seq = 'CGATCAACGAATCGTACGTACTC';
my @seqs;
for (1..999999) {
    push(@seqs, $seq);
}

foreach my $seq (@seqs) {
    my $start = index($gapless_seq, $seq);
    if ($start == -1) {
        print "No match found for seq '$seq'\n";
        next;
    }
    my $end = $start + length($seq) - 1;

    # calculate the coords in $gapped_seq
    $start = $start + $gap_lengths[$start];
    $end = $end + $gap_lengths[$end];

    my $result = substr($gapped_seq, $start, ($end - $start + 1));
    #print $result, "\n";
}

exit;

From MEC at stowers-institute.org Fri Feb 23 10:54:57 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 23 Feb 2007 09:54:57 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with multiple values
In-Reply-To: <6dce9a0b0701301446w7fc31d6eufe27442fecd0f20e@mail.gmail.com>
Message-ID:

Lincoln, and other Bio::DB::SeqFeature wanderers:

I find that generating GFF from a Bio::DB::SeqFeature using gff3_string
does not respect the following:

"Multiple attributes of the same type are indicated by separating the
values with the comma "," character" (c.f.
http://www.sequenceontology.org/gff3.shtml)

This one-liner demonstrates the problem:

perl -MBio::DB::SeqFeature -e 'print Bio::DB::SeqFeature->new(-seq_id => "J", -start => 1, -end => 2, -primary_tag => "PH", -source => "A", -name => "mec", -attributes => {foo => [qw(bar blat)]})->gff3_string'
J A PH 1 2 . . . foo=bar;foo=blat;Name=mec

Do you agree this is a problem?

The fix is in the post-sig patch to
/Bio/DB/SeqFeature/NormalizedFeature.pm, in which I also took the
stylistic privilege of promoting any ID, Parent, or Name attribute to
the front of column 9, so output is now:

J A PH 1 2 . . . Name=mec;foo=bar,blat

Do you agree this is better?

I am poised to commit it, as well as the functionally same patch to the
equivalent function in Bio/Graphics/FeatureBase.pm

All clear?

-- Malcolm Cook

*** NormalizedFeature.pm	2 Feb 2007 21:05:42 -0000	1.25
--- NormalizedFeature.pm	23 Feb 2007 15:37:01 -0000
***************
*** 481,494 ****
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
!
!     push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
    }
    my $id   = $self->primary_id;
    my $name = $self->display_name;
!   push @result,"ID=".$self->escape($id)                     if defined $id;
!   push @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   push @result,"Name=".$self->escape($name)                 if defined $name;
    return join ';',@result;
  }

--- 481,498 ----
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
!
!     push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
!     # NO! Multiple attributes of the same type are indicated by
!     # separating the values with the comma "," character - per
!     # http://www.sequenceontology.org/gff3.shtml. Do it this way:
!     #push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values);
    }
    my $id   = $self->primary_id;
    my $name = $self->display_name;
!   unshift @result,"ID=".$self->escape($id)                     if defined $id;
!   unshift @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   unshift @result,"Name=".$self->escape($name)                 if defined $name;
    return join ';',@result;
  }

From MEC at stowers-institute.org Fri Feb 23 12:08:11 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 23 Feb 2007 11:08:11 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes withmultiple values
In-Reply-To:
Message-ID:

Oy, I hit send too soon.

The patch I sent had my new attribute encoder commented out.

It should've been:

*** NormalizedFeature.pm	2 Feb 2007 21:05:42 -0000	1.25
--- NormalizedFeature.pm	23 Feb 2007 17:06:37 -0000
***************
*** 481,494 ****
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
!
!     push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
    }
    my $id   = $self->primary_id;
    my $name = $self->display_name;
!   push @result,"ID=".$self->escape($id)                     if defined $id;
!   push @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   push @result,"Name=".$self->escape($name)                 if defined $name;
    return join ';',@result;
  }

--- 481,497 ----
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
!     # push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
!     # NO! Multiple attributes of the same type are indicated by
!     # separating the values with the comma "," character - per
!     # http://www.sequenceontology.org/gff3.shtml. Do it this way:
!     push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values);
    }
    my $id   = $self->primary_id;
    my $name = $self->display_name;
!   unshift @result,"ID=".$self->escape($id)                     if defined $id;
!   unshift @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   unshift @result,"Name=".$self->escape($name)                 if defined $name;
    return join ';',@result;
  }

Malcolm

From lstein at cshl.edu Fri Feb 23 12:16:01 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 23 Feb 2007 12:16:01 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with multiple values
In-Reply-To:
References: <6dce9a0b0701301446w7fc31d6eufe27442fecd0f20e@mail.gmail.com>
Message-ID: <6dce9a0b0702230916o684e490cu9d01d3be3e2ae061@mail.gmail.com>

Hi Malcolm,

You're quite right, and I appreciate your work in tracking down and fixing
it. Before you commit the patch, can you confirm that the loader is working
correctly so that comma-separated values are read back into the data
structure as multiple attributes?

Lincoln

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From aaron.j.mackey at gsk.com Fri Feb 23 09:36:18 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Fri, 23 Feb 2007 09:36:18 -0500
Subject: [Bioperl-l] Bio::AnnotatableI vs Bio::FeatureHolderI
In-Reply-To: <45DDFED1.1090503@watson.wustl.edu>
Message-ID:

The fundamental difference (in my mind) between a feature and an
annotation is that a feature has a location/range, and thus the
information represented in the feature is applicable only to that
location/range. An annotation, on the other hand, is "global", or at
least non-localizable (note: a feature with a "fuzzy" location of
"somewhere along this sequence, but I'm not sure where" is still not
global - if you did/could know the location, you'd describe it as a
feature, so it shouldn't be represented with an annotation).

-Aaron

From MEC at stowers-institute.org Fri Feb 23 13:46:00 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 23 Feb 2007 12:46:00 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with multiple values
In-Reply-To: <6dce9a0b0702230916o684e490cu9d01d3be3e2ae061@mail.gmail.com>
Message-ID:

Lincoln,

OK. I'll do that...
...let's see, a quick squiz at Bio/DB/SeqFeature/Store/ .... ...ok - parse_attributes _looks_ right to me ...so, let's try it #load a feature into a new database: bp_seqfeature_load.PLS -dsn 'dbi:mysql:database=test;host=mysql-dev' -create -user test -pass test <(echo -e "J\tA\tPH\t1\t2\t.\t.\t.\tfoo=bar,blat;Name=mec\n") #It loaded ok. Now, let's print it out in GFF3: perl -MBio::DB::SeqFeature::Store -e 'foreach (Bio::DB::SeqFeature::Store->new(-dsn => "dbi:mysql:database=test;host=mysql-dev;user=test;password=test")->featu res(-type => "PH:A")) {print $_->gff3_string . "\n"}' J A PH 1 2 . . . Name=mec;ID=1;foo=bar,blat #output looks good to me Note, I tried loading attributes foo=bar;foo=blat and it came back foo=bar,blat. So, you can load either way. I'll commit later today. --Malcolm ________________________________ From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein Sent: Friday, February 23, 2007 11:16 AM To: Cook, Malcolm Cc: bioperl list; lstein at cshl.org Subject: Re: Bio::DB::SeqFeature to GFF mishandles attributes with multiple values Hi Malcom, You're quite right, and I appreciate your work in tracking down and fixing it. Before you commit the patch, can you confirm that the loader is working correctly so that comma-separated values are read back into the data structure as multiple attributes? Lincoln On 2/23/07, Cook, Malcolm wrote: Lincoln, and other Bio::DB::SeqFeature wanderers: I find that generating GFF from a Bio::DB::SeqFeature using gff3_string does not respect the following: "Multiple attributes of the same type are indicated by separating the values with the comma "," character" (c.f. http://www.sequenceontology.org/gff3.shtml) This one-liner demonstrates the problem: perl -MBio::DB::SeqFeature -e 'print Bio::DB::SeqFeature->new(-seq_id => "J", -start => 1, -end => 2, -primary_tag => 'PH', -source => 'A', -name => 'mec', -attributes => {foo => [qw(bar blat)]})->gff3_string' J A PH 1 2 . . . 
foo=bar;foo=blat;Name=mec Do you agree this is a problem? The fix is in the post-sig patch to /Bio/DB/SeqFeature/NormalizedFeature.pm, in which I also took the stylistic privilege of promoting any ID, Parent, or Name attribute to the front of column 9, so output is now: J A PH 1 2 . . . Name=mec;foo=bar,blat Do you agree this is better? I am poised to commit it, as well as the functionally same patch to the equivilent function in Bio/Graphics/FeatureBase.pm All clear? -- Malcolm Cook *** NormalizedFeature.pm 2 Feb 2007 21:05:42 -0000 1.25 --- NormalizedFeature.pm 23 Feb 2007 15:37:01 -0000 *************** *** 481,494 **** next if $t eq 'load_id'; next if $t eq 'parent_id'; foreach (@values) { s/\s+$// } # get rid of trailing whitespace ! ! push @result,join '=',$self->escape($t),$self->escape($_) foreach @values; } my $id = $self->primary_id; my $name = $self->display_name; ! push @result,"ID=".$self->escape($id) if defined $id; ! push @result,"Parent=".$self->escape($parent->primary_id) if defined $parent; ! push @result,"Name=".$self->escape($name) if defined $name; return join ';', at result; } --- 481,498 ---- next if $t eq 'load_id'; next if $t eq 'parent_id'; foreach (@values) { s/\s+$// } # get rid of trailing whitespace ! ! push @result,join '=',$self->escape($t),$self->escape($_) foreach @values; ! # NO! Multiple attributes of the same type are indicated by ! # separating the values with the comma "," character - per ! # http://www.sequenceontology.org/gff3.shtml. Do it this way: ! #push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values); } my $id = $self->primary_id; my $name = $self->display_name; ! unshift @result,"ID=".$self->escape($id) if defined $id; ! unshift @result,"Parent=".$self->escape($parent->primary_id) if defined $parent; ! unshift @result,"Name=".$self->escape($name) if defined $name; return join ';', at result; } -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Fri Feb 23 13:49:44 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 23 Feb 2007 12:49:44 -0600 Subject: [Bioperl-l] Bio::AnnotatableI vs Bio::FeatureHolderI In-Reply-To: References: Message-ID: To add to that, there's a HOWTO describing the differences: http://www.bioperl.org/wiki/HOWTO:Feature-Annotation I agree w/ Aaron; if it has a location it's a feature, otherwise it's an annotation. chris On Feb 23, 2007, at 8:36 AM, aaron.j.mackey at gsk.com wrote: > The fundamental difference (in my mind) between a feature and an > annotation, is that a feature has a location/range, and thus the > information represented in the feature is applicable only to that > location/range. An annotation, on the other hand, is "global", or at > least non-localizable (note: a feature with a "fuzzy" location of > "somewhere along this sequence, but I'm not sure where" is still not > global - if you did/could know the location, you'd describe it as a > feature, so it shouldn't be represented with an annotation). > > -Aaron > > bioperl-l-bounces at lists.open-bio.org wrote on 02/22/2007 03:36:33 PM: > >> Are Bio::AnnotatableI and Bio::FeatureHolderI competing interfaces? >> >> I get the impression they are designed to do similar things. If >> so is >> one deprecated and the other preferred? >> >> If their responsibilities are orthogonal to each other, what sorts of >> tasks are suited to each? 
>> >> Thanks, >> Michael >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From lstein at cshl.edu Fri Feb 23 16:20:26 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 23 Feb 2007 16:20:26 -0500 Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with multiple values In-Reply-To: References: <6dce9a0b0702230916o684e490cu9d01d3be3e2ae061@mail.gmail.com> Message-ID: <6dce9a0b0702231320j1f24d4b4oe33bce6d2da96db7@mail.gmail.com> Excellent! Lincoln On 2/23/07, Cook, Malcolm wrote: > > Lincoln, > > OK. I'll do that... > > ...let's see, a quick squiz at Bio/DB/SeqFeature/Store/ .... > > ...ok - parse_attributes _looks_ right to me > > ...so, let's try it > > #load a feature into a new database: > > bp_seqfeature_load.PLS -dsn 'dbi:mysql:database=test;host=mysql-dev' > -create -user test -pass test <(echo -e "J\tA\tPH\t1\t2\t.\t.\t.\tfoo=bar, > blat;Name=mec\n") > > #It loaded ok. Now, let's print it out in GFF3: > > perl -MBio::DB::SeqFeature::Store -e 'foreach > (Bio::DB::SeqFeature::Store->new(-dsn => > "dbi:mysql:database=test;host=mysql-dev;user=test;password=test")->features(-type > => "PH:A")) {print $_->gff3_string . "\n"}' > J A PH 1 2 . . . Name=mec;ID=1;foo=bar,blat > > #output looks good to me > > Note, I tried loading attributes foo=bar;foo=blat and it came back > foo=bar,blat. So, you can load either way. > > I'll commit later today. 
> > --Malcolm > > > ------------------------------ > *From:* lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] *On > Behalf Of *Lincoln Stein > *Sent:* Friday, February 23, 2007 11:16 AM > *To:* Cook, Malcolm > *Cc:* bioperl list; lstein at cshl.org > *Subject:* Re: Bio::DB::SeqFeature to GFF mishandles attributes with > multiple values > > Hi Malcom, > > You're quite right, and I appreciate your work in tracking down and fixing > it. Before you commit the patch, can you confirm that the loader is working > correctly so that comma-separated values are read back into the data > structure as multiple attributes? > > Lincoln > > On 2/23/07, Cook, Malcolm wrote: > > > > Lincoln, and other Bio::DB::SeqFeature wanderers: > > > > I find that generating GFF from a Bio::DB::SeqFeature using gff3_string > > does not respect the following: > > > > "Multiple attributes of the same type are indicated by separating the > > values with the comma "," character" (c.f. > > http://www.sequenceontology.org/gff3.shtml) > > > > This one-liner demonstrates the problem: > > > > perl -MBio::DB::SeqFeature -e 'print Bio::DB::SeqFeature->new(-seq_id => > > "J", -start => 1, -end => 2, -primary_tag => 'PH', -source => 'A', > > -name => 'mec', -attributes => {foo => [qw(bar blat)]})->gff3_string' > > J A PH 1 2 . . . > > foo=bar;foo=blat;Name=mec > > > > Do you agree this is a problem? > > > > The fix is in the post-sig patch to > > /Bio/DB/SeqFeature/NormalizedFeature.pm, in which I also took the > > stylistic privilege of promoting any ID, Parent, or Name attribute to > > the front of column 9, so output is now: > > > > J A PH 1 2 . . . > > Name=mec;foo=bar,blat > > > > Do you agree this is better? > > > > I am poised to commit it, as well as the functionally same patch to the > > equivilent function in Bio/Graphics/FeatureBase.pm > > > > All clear? 
> > > > -- Malcolm Cook > > > > > > > > *** NormalizedFeature.pm 2 Feb 2007 21:05:42 -0000 1.25 > > --- NormalizedFeature.pm 23 Feb 2007 15:37:01 -0000 > > *************** > > *** 481,494 **** > > next if $t eq 'load_id'; > > next if $t eq 'parent_id'; > > foreach (@values) { s/\s+$// } # get rid of trailing whitespace > > ! > > ! push @result,join '=',$self->escape($t),$self->escape($_) foreach > > @values; > > } > > my $id = $self->primary_id; > > my $name = $self->display_name; > > ! push @result,"ID=".$self->escape($id) if defined > > > > $id; > > ! push @result,"Parent=".$self->escape($parent->primary_id) if defined > > $parent; > > ! push @result,"Name=".$self->escape($name) if > > defined $name; > > return join ';',@result; > > } > > > > --- 481,498 ---- > > next if $t eq 'load_id'; > > next if $t eq 'parent_id'; > > foreach (@values) { s/\s+$// } # get rid of trailing whitespace > > ! > > ! push @result,join '=',$self->escape($t),$self->escape($_) foreach > > > > @values; > > ! # NO! Multiple attributes of the same type are indicated by > > ! # separating the values with the comma "," character - per > > ! # http://www.sequenceontology.org/gff3.shtml. Do it this way: > > ! #push @result,join '=',$self->escape($t),join(',', map > > {$self->escape($_)} @values); > > } > > my $id = $self->primary_id; > > my $name = $self->display_name; > > ! unshift @result,"ID=".$self->escape($id) if > > defined $id; > > ! unshift @result,"Parent=".$self->escape($parent->primary_id) if > > defined $parent; > > ! unshift @result,"Name=".$self->escape($name) if > > defined $name; > > return join ';',@result; > > } > > > > > > > > > > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > > -- Lincoln D.
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From enrique_rulz at yahoo.com Sat Feb 24 16:23:59 2007 From: enrique_rulz at yahoo.com (Kurt Gobain) Date: Sat, 24 Feb 2007 13:23:59 -0800 (PST) Subject: [Bioperl-l] Sequence matching problem! In-Reply-To: <200702231025.39416.heikki@sanbi.ac.za> References: <9107936.post@talk.nabble.com> <200702231025.39416.heikki@sanbi.ac.za> Message-ID: <9137941.post@talk.nabble.com> Heikki Lehvaslaiho wrote: > > Kurt, > > There are few things in your code to note: > > - regexp /C*T/ matches any T preceded by zero or more Cs, > not what you meant > - $- and $+ are among the "expensive" perl functions worth > not using unless you have to. Using them once in your > code slows execution down considerable. There is always > an other way. > - Keep in mind what you want to use the match positions for: > Human readable locations usually start counting with 1 but > perl code uses 0 as the first location. The code below assumes > you want to print the locations out. > > Study my example code below. > > Yours, > -Heikki > > ################################################################### > #!/usr/bin/perl > $seq = "GATCAAT"; > #$pattern= 'C*T'; > $pattern= 'C.*T'; > > while ($seq =~ m/($pattern)/gi) { > > $match = $1; > $end = pos($seq); > $start = $end - length($match) +1; > > print "$match : $start - $end\n"; > } > > ################################################################### > > Thanx for the instant reply!...Sorry cudn reply earlier.. Code works perfectly fine...but...sum time its not givin reqd o/p..For eg. If I type sequence as "GATCAAGTCAGGAT" & pattern to be matched as T.*A..then o/p which I am getting frm above prog is TCAAGTCAGGA instead of TCA... 
& 1 more thing Is there n e chance by which I can replace T*A to T.*A cos the code which I need to write says T*A shod be only the input not T.*A..So Can we use replacment reg ex...sumthing like $pattern =~ s/.*/*/...or sumthing else... But its kinda givin sum error again...Dam! Regex is really hairy!!...:P N e ways thanx a lot again for the code...Hope to listen frm you soon! Kurt! -- View this message in context: http://www.nabble.com/Sequence-matching-problem%21-tf3275153.html#a9137941 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From biology0046 at hotmail.com Sat Feb 24 23:14:51 2007 From: biology0046 at hotmail.com (=?gb2312?B?va0gzsTi/Q==?=) Date: Sun, 25 Feb 2007 04:14:51 +0000 Subject: [Bioperl-l] how to change align output format Message-ID: Dear all: I have problems in changing the output format of clustal alignment. I use the Bio::Tools::Run::Alignment::Clustalw module to carry out an mulitple sequences alignment, then i use the Bio::AlignIO module to write out the alignment. Scripts like this: my $aln_out=Bio::AlignIO->new(-file=>">./clustal/${outfilename}.aln",-format=>'clustalw'); The output : dana_GLEANR_16071 MSKMKMLPVQLSLNQLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL dere_GLEANR_9270 ..............S............................................. FBgn0000097 ..............S............................................. dsec_GLEANR_671 ..............S............................................. dsim_GLEANR_6613 ..............S............................................. dyak_GLEANR_1669 ..............S............................................. . dana_GLEANR_16071 VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM dere_GLEANR_9270 ............................................................ FBgn0000097 ............................................................ dsec_GLEANR_671 ............................................................ dsim_GLEANR_6613 ............................................................ 
dyak_GLEANR_1669 ............................................................ But I want to change the output format to the one below, which does not convert identical residues into the "." character. dere_GLEANR_9270 MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL dyak_GLEANR_1669 MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL dsec_GLEANR_671 MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL dsim_GLEANR_6613 MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL FBgn0000097 MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL dana_GLEANR_16071 MSKMKMLPVQLSLNQLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL **************.********************************************* dere_GLEANR_9270 VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM dyak_GLEANR_1669 VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM dsec_GLEANR_671 VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM dsim_GLEANR_6613 VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM FBgn0000097 VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM dana_GLEANR_16071 VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM ************************************************************ Are there any parameters in the package that can be changed so that I can get the latter output format? Thank you sincerely! Jiang From bix at sendu.me.uk Sun Feb 25 05:53:48 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 25 Feb 2007 10:53:48 +0000 Subject: [Bioperl-l] [Fwd: Bio::Graph::SimpleGraph] Message-ID: <45E16ABC.3060405@sendu.me.uk> Tels, I've forwarded this to the author of the module, Nat Goodman, and to the Bioperl mailing list (http://www.bioperl.org/wiki/Mailing_lists#Main_BioPerl_list).
But actually we have Bio::Graph::* listed as tentatively deprecated: http://www.bioperl.org/wiki/Deprecated_modules#Bio::Graph_modules so any further work on it doesn't seem worthwhile. -------- Original Message -------- Subject: Bio::Graph::SimpleGraph Date: Sat, 24 Feb 2007 12:07:31 +0100 From: Tels Moin, I just stumbled over Bio::Graph::SimpleGraph and read this comment: "This is a simple, hopefully fast undirected graph package. The only reason this exists is that the standard CPAN Graph pacakge, Graph::Base, is seriously broken." Really sad to see people always reinventing the wheel :/ Anyway, I wonder if you would like to make your module support Graph::Easy (http://search.cpan.org/~tels/Graph-Easy/)? I would be willing to submit patches and do testing/documentation for that. All the best, Tels From bix at sendu.me.uk Sun Feb 25 05:45:21 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 25 Feb 2007 10:45:21 +0000 Subject: [Bioperl-l] Sequence matching problem! In-Reply-To: <9137941.post@talk.nabble.com> References: <9107936.post@talk.nabble.com> <200702231025.39416.heikki@sanbi.ac.za> <9137941.post@talk.nabble.com> Message-ID: <45E168C1.80306@sendu.me.uk> Kurt Gobain wrote: > Code works perfectly fine...but...sum time its not givin reqd o/p..For eg. > If I type sequence as "GATCAAGTCAGGAT" & pattern to be matched as T.*A..then > o/p which I am getting frm above prog is TCAAGTCAGGA instead of TCA... > & 1 more thing Is there n e chance by which I can replace T*A to T.*A cos > the code which I need to write says T*A shod be only the input not T.*A..So > Can we use replacment reg ex...sumthing like > $pattern =~ s/.*/*/...or sumthing else... > But its kinda givin sum error again...Dam! Regex is really hairy!!...:P These aren't Bioperl questions.
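Both of Kurt's follow-ups come down to greediness: `.*` matches as much as it can, so T.*A runs on to the last reachable A, whereas the non-greedy `.*?` stops at the first A after the T. A minimal sketch adapting Heikki's loop, assuming each `*` in the user's input is meant as "any run of characters, shortest possible" (`find_matches` is a hypothetical helper name):

```perl
use strict;
use warnings;

# Turn a user pattern such as "T*A" into the regex T.*?A (each '*'
# becomes a non-greedy wildcard; other characters are quotemeta'd),
# then report each match with 1-based start/end coordinates.
sub find_matches {
    my ($seq, $user_pattern) = @_;
    my $regex = join '.*?', map { quotemeta } split /\*/, $user_pattern, -1;

    my @hits;
    while ($seq =~ /($regex)/gi) {
        my $match = $1;
        my $end   = pos($seq);                  # pos() is just past the match, i.e. the 1-based end
        my $start = $end - length($match) + 1;  # 1-based start
        push @hits, [$match, $start, $end];
    }
    return @hits;
}

printf "%s : %d - %d\n", @$_ for find_matches('GATCAAGTCAGGAT', 'T*A');
# TCA : 3 - 5, then TCA : 8 - 10
```

Because `m//g` resumes searching just after the previous match, overlapping hits are not reported.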
For regular expression help see: http://perldoc.perl.org/perlretut.html Basically, you want a non-greedy match, so T.*?A You can convert T*A by doing s/\*/.*?/ Here are some more regexs for you: s/sum/some/g s/frm/from/g s/n e/any/g etc... From biology0046 at hotmail.com Sun Feb 25 08:28:34 2007 From: biology0046 at hotmail.com (=?gb2312?B?va0gzsTi/Q==?=) Date: Sun, 25 Feb 2007 13:28:34 +0000 Subject: [Bioperl-l] AlignIO problems Message-ID: hi, all, I use the AlignIO module to convert the alignment file. my original file is : CLUSTAL W(1.81) multiple sequence alignment dana_GLEANR_11249 MEAIAKHDFSATADDELSFRKTQTLKILNMEDDSNWYRAELDGKEGLIPSNYIEMKNHDW dere_GLEANR_7213 ...V...................I.................................... dgri_GLEANR_6962 .......................I.................................... FBgn0004638 .......................I.................................... dmoj_GLEANR_6118 ...........N...........I.................................... dper_GLEANR_18885 ...V...................I.................................... dpse_GLEANR_14384 ...V...................I.................................... dsec_GLEANR_3096 .................N.....I.................................... dsim_GLEANR_9744 -----------------------------............................... dvir_GLEANR_4811 .......................I.................................... dwil_GLEANR_10869 .......................I.................................... dyak_GLEANR_13576 .......................I.................................... dana_GLEANR_11249 YYGRITRADAEKLLSNKHEGAFLIRISESSPGDFSLSVKCPDGVQHFKVLRDAQSKFFLW dere_GLEANR_7213 ............................................................ dgri_GLEANR_6962 ............................................................ FBgn0004638 ............................................................ dmoj_GLEANR_6118 .................L.......................................... dper_GLEANR_18885 ............................................................ 
dpse_GLEANR_14384 ............................................................ dsec_GLEANR_3096 ............................................................ dsim_GLEANR_9744 ............................................................ dvir_GLEANR_4811 ............................................................ dwil_GLEANR_10869 ............................................................ dyak_GLEANR_13576 ............................................................ dana_GLEANR_11249 VVKFNSLNELVEYHRTASVSRSQDVKLRDMIPEEMLVQALYDFVPQESGELDFRRGDVIT dere_GLEANR_7213 ............................................................ dgri_GLEANR_6962 ............................................................ FBgn0004638 ............................................................ dmoj_GLEANR_6118 ..............................V.D........................... dper_GLEANR_18885 .......................E.................................... dpse_GLEANR_14384 .......................E.................................... dsec_GLEANR_3096 ............................................................ dsim_GLEANR_9744 ............................................................ dvir_GLEANR_4811 ............................................................ dwil_GLEANR_10869 ............................................................ dyak_GLEANR_13576 ............................................................ dana_GLEANR_11249 VTDRSDENWWNGEIGNRKGIFPATYVTPYHS dere_GLEANR_7213 ............................... dgri_GLEANR_6962 ............................... FBgn0004638 ............................... dmoj_GLEANR_6118 ............Q.................. dper_GLEANR_18885 ............................... dpse_GLEANR_14384 ............................... dsec_GLEANR_3096 ............................... dsim_GLEANR_9744 ............................... dvir_GLEANR_4811 ............................... dwil_GLEANR_10869 ............................... dyak_GLEANR_13576 ............................... 
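In an identity-masked ClustalW file like the one above, each `.` in a row stands for the residue in the same column of the first (reference) row, while `-` is a real gap; read literally, a row such as dsim_GLEANR_9744 ends up with no residues at all. A minimal sketch of undoing the masking on plain strings, assuming the first row of every block is unmasked and all rows in a block have equal length (`unmask_rows` is a hypothetical helper, not a BioPerl method):

```perl
use strict;
use warnings;

# Undo identity masking within one alignment block: a '.' in a row is
# replaced by the residue in the same column of the reference row.
# Gaps ('-') and differing residues are left untouched.
sub unmask_rows {
    my ($ref, @rows) = @_;
    my @ref_chars = split //, $ref;
    my @out;
    for my $row (@rows) {
        my @chars = split //, $row;
        for my $i (0 .. $#chars) {
            $chars[$i] = $ref_chars[$i] if $chars[$i] eq '.';
        }
        push @out, join '', @chars;
    }
    return @out;
}

my @restored = unmask_rows('MEAIAKHDFS', '...V......', '-----.....');
print "$_\n" for @restored;    # MEAVAKHDFS, then -----KHDFS
```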
I want to change those "." characters back to the residue letters, so I wrote code like this: use Bio::AlignIO; my $in=Bio::AlignIO->new(-file =>"FBgn0000097.aln", -format => 'clustalw'); my $out=Bio::AlignIO->new(-file=>">../clustalw/0097.aln", -format =>'clustalw'); while (my $aln=$in->next_aln() ){ $aln->unmatch(); $aln->set_displayname_flat(); $out->write_aln($aln); } but when I run the code, I get error messages like: -------------------- WARNING --------------------- MSG: Got a sequence with no letters in it cannot guess alphabet [] --------------------------------------------------- ------------- EXCEPTION ------------- MSG: No sequence with name [dsim_GLEANR_9744/1-182] STACK Bio::SimpleAlign::displayname /home/src/bioperl/bioperl-live/Bio/SimpleAlign.pm:2307 STACK Bio::SimpleAlign::set_displayname_flat /home/src/bioperl/bioperl-live/Bio/SimpleAlign.pm:2374 STACK toplevel aligntest.pl:11 -------------------------------------- I don't know where the problem is. Jiang From cjfields at uiuc.edu Sun Feb 25 14:58:23 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 25 Feb 2007 13:58:23 -0600 Subject: [Bioperl-l] AlignIO problems In-Reply-To: References: Message-ID: <19EA5F46-D1A4-45B5-B2DB-55194F79215C@uiuc.edu> Bio::AlignIO::clustalw doesn't work with masked sequences; it parses the output quite literally as is, so any [.-] are treated as gaps. If the seqs are 100% identical then you will have a seq with 100% gaps and no sequence, thus giving you the warnings you see. The best way to accomplish what you want is to not mask the sequence alignment to begin with when running clustalw/muscle/whatever. Exactly how are you generating these? When I use clustalw no identity masking occurs by default. chris On Feb 25, 2007, at 7:28 AM, ? ?? wrote: > hi, all, > I use the AlignIO module to convert the alignment file.
> my original file is : > CLUSTAL W(1.81) multiple sequence alignment > > > dana_GLEANR_11249 > MEAIAKHDFSATADDELSFRKTQTLKILNMEDDSNWYRAELDGKEGLIPSNYIEMKNHDW > dere_GLEANR_7213 ...V...................I....................... > ............. > dgri_GLEANR_6962 .......................I....................... > ............. > FBgn0004638 .......................I....................... > ............. > dmoj_GLEANR_6118 ...........N...........I....................... > ............. > dper_GLEANR_18885 ...V...................I....................... > ............. > dpse_GLEANR_14384 ...V...................I....................... > ............. > dsec_GLEANR_3096 .................N.....I....................... > ............. > dsim_GLEANR_9744 > -----------------------------............................... > dvir_GLEANR_4811 .......................I....................... > ............. > dwil_GLEANR_10869 .......................I....................... > ............. > dyak_GLEANR_13576 .......................I....................... > ............. > > > > dana_GLEANR_11249 > YYGRITRADAEKLLSNKHEGAFLIRISESSPGDFSLSVKCPDGVQHFKVLRDAQSKFFLW > dere_GLEANR_7213 ............................................... > ............. > dgri_GLEANR_6962 ............................................... > ............. > FBgn0004638 ............................................... > ............. > dmoj_GLEANR_6118 .................L............................. > ............. > dper_GLEANR_18885 ............................................... > ............. > dpse_GLEANR_14384 ............................................... > ............. > dsec_GLEANR_3096 ............................................... > ............. > dsim_GLEANR_9744 ............................................... > ............. > dvir_GLEANR_4811 ............................................... > ............. > dwil_GLEANR_10869 ............................................... > ............. 
> dyak_GLEANR_13576 ............................................... > ............. > > > > dana_GLEANR_11249 > VVKFNSLNELVEYHRTASVSRSQDVKLRDMIPEEMLVQALYDFVPQESGELDFRRGDVIT > dere_GLEANR_7213 ............................................... > ............. > dgri_GLEANR_6962 ............................................... > ............. > FBgn0004638 ............................................... > ............. > dmoj_GLEANR_6118 ..............................V.D.............. > ............. > dper_GLEANR_18885 .......................E....................... > ............. > dpse_GLEANR_14384 .......................E....................... > ............. > dsec_GLEANR_3096 ............................................... > ............. > dsim_GLEANR_9744 ............................................... > ............. > dvir_GLEANR_4811 ............................................... > ............. > dwil_GLEANR_10869 ............................................... > ............. > dyak_GLEANR_13576 ............................................... > ............. > > > > dana_GLEANR_11249 VTDRSDENWWNGEIGNRKGIFPATYVTPYHS > dere_GLEANR_7213 ............................... > dgri_GLEANR_6962 ............................... > FBgn0004638 ............................... > dmoj_GLEANR_6118 ............Q.................. > dper_GLEANR_18885 ............................... > dpse_GLEANR_14384 ............................... > dsec_GLEANR_3096 ............................... > dsim_GLEANR_9744 ............................... > dvir_GLEANR_4811 ............................... > dwil_GLEANR_10869 ............................... > dyak_GLEANR_13576 ............................... > > > I want to change those "." 
characters back to alphabetic > expression, then i write the code like this: > use Bio::AlignIO; > my $in=Bio::AlignIO->new(-file =>"FBgn0000097.aln", > -format => 'clustalw'); > my $out=Bio::AlignIO->new(-file=>">../clustalw/0097.aln", > -format =>'clustalw'); > while (my $aln=$in->next_aln() ){ > $aln->unmatch(); > $aln->set_displayname_flat(); > $out->write_aln($aln); > } > > but when i run the code, there are error message like: > > -------------------- WARNING --------------------- > MSG: Got a sequence with no letters in it cannot guess alphabet [] > --------------------------------------------------- > > ------------- EXCEPTION ------------- > MSG: No sequence with name [dsim_GLEANR_9744/1-182] > STACK Bio::SimpleAlign::displayname /home/src/bioperl/bioperl-live/ > Bio/SimpleAlign.pm:2307 > STACK Bio::SimpleAlign::set_displayname_flat /home/src/bioperl/ > bioperl-live/Bio/SimpleAlign.pm:2374 > STACK toplevel aligntest.pl:11 > > -------------------------------------- > > I don't know where is the problem. > > Jiang Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cristiangary at gmail.com Sun Feb 25 16:04:57 2007 From: cristiangary at gmail.com (Cristian Gary) Date: Sun, 25 Feb 2007 18:04:57 -0300 Subject: [Bioperl-l] problem with blast report to ncbi webpage Message-ID: <95ef8cd0702251304o45bea6a0tcedc59156cb0cfe4@mail.gmail.com> I have a problem getting BLAST reports from the NCBI server: while waiting on the RIDs, no results ever come back. Is the problem the NCBI server or the BioPerl version? PS: the same code worked very well 3 weeks ago.
-- "Knowledge belongs to humanity" "Gnu/linux -------- free your mind...... www.kubuntu.org From granjeau at tagc.univ-mrs.fr Mon Feb 26 04:17:15 2007 From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137) Date: Mon, 26 Feb 2007 10:17:15 +0100 Subject: [Bioperl-l] Reading a XML sequence (UniParc) into a BioSeq object Message-ID: <45E2A59B.6080300@tagc.univ-mrs.fr> Hello! I would like to fill a BioSeq object with the output from a dbfetch request at EBI on the UniParc database (which returns only XML; I am interested in the references). If somebody could tell me which BioPerl object to use, or a way to convert it to Swiss-Prot format, or has a piece of code for this (is http://doc.bioperl.org/bioperl-live/Bio/SeqIO/interpro.html a good starting point?), I would appreciate it a lot. Best regards, --Samuel MSTRSVSSSSYRRMFGGPGTASRPSSSRSYVTTSTRTYSLGSALRPSTSRSLYASSPGGV YATRSSAVRLRSSVPGVRLLQDSVDFSLADAINTEFKNTRTNEKVELQELNDRFANYIDK VRFLEQQNKILLAELEQLKGQGKSRLGDLYEEEMRELRRQVDQLTNDKARVEVERDNLAE DIMRLREKLQEEMLQREEAENTLQSFRQDVDNASLARLDLERKVESLQEEIAFLKKLHEE EIQELQAQIQEQHVQIDVDVSKPDLTAALRDVRQQYESVAAKNLQEAEEWYKSKFADLSE AANRNNDALRQAKQESTEYRRQVQSLTCEVDALKGTNESLERQMREMEENFAVEAANYQD TIGRLQDEIQNMKEEMARHLREYQDLLNVKMALDIEIATYRKLLEGEESRISLPLPNFSS LNLRGKHFISL From bix at sendu.me.uk Mon Feb 26 06:46:39 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 26 Feb 2007 11:46:39 +0000 Subject: [Bioperl-l] [Fwd: Bio::Graph::SimpleGraph] In-Reply-To: <45E16ABC.3060405@sendu.me.uk> References: <45E16ABC.3060405@sendu.me.uk> Message-ID: <45E2C89F.1020402@sendu.me.uk> Nat replied, but I messed up the To: fields so his reply didn't make it to the list. Here's what he said: Nathan (Nat) Goodman wrote: Hi Tels I agree it's sad to reinvent the wheel, but I don't think that's what happened here. Your module seems to be focused on rendering graphs while my module is concerned with computations on graphs.
In any case, as Sendu notes, SimpleGraph is in the process of being deprecated. I fully support this move. It was intended to be a stopgap until the main Perl Graph module was fixed. Since that has now happened, it's time for SimpleGraph to retire. For the benefit of anyone using Graph: last I checked (six months or more ago), it had serious performance problems on large graphs (probably not too much of a surprise), and also was inexplicably slow on graphs with edge attributes. I see that the latter bug is marked "resolved" in CPAN, but there's no indication of when or how. We've moved to Boost for graphs as large as the human protein interaction network. Best, Nat From sanjib at bic.boseinst.ernet.in Mon Feb 26 00:23:36 2007 From: sanjib at bic.boseinst.ernet.in (Sanjib Kumar Gupta) Date: Mon, 26 Feb 2007 10:53:36 +0530 Subject: [Bioperl-l] Remote blast In-Reply-To: <20070221064743.M54123@bic.boseinst.ernet.in> References: <20070221064743.M54123@bic.boseinst.ernet.in> Message-ID: <20070226052336.M74918@bic.boseinst.ernet.in> Hi I have been running this script for some time and it was running fine. I am using this Linux machine with a live IP (no proxy). But suddenly it has stopped working with these errors: waiting...waiting... -------------------- WARNING --------------------- MSG: An Error Occurred

500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad hostname 'www.ncbi.nlm.nih.gov') --------------------------------------------------- xx.pep -------------------- WARNING --------------------- MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5 Content-Length: 497 Content-Type: application/x-www-form-urlencoded DATABASE=nr&QUERY=%3ENM_005277.3%3B_1414961_1+837%0AMEENMEEGQTQKGCFECCIKCLGGIPYASLIATILLYAGVALFCGCGHEALSGTVNILQTYFEMARTAGDTLDVFTMIDIFKYVIYGIAAAFFVYGILLMVEGFFTTGAIKDLYGDFKITTCGRCVSAWFIMLTYLFMLAWLGVTAFTSLPVYMYFNLWTICRNTTLVEGANLCLDLRQFGIVTIGEEKKICTVSENFLRMCESTELNMTFHLFIVALAGAGAAVIAMVHYLMVLSANWAYVKDACRMQKYEDIKSKEEQELHDIHSTRSKERLNAYT*&COMPOSITION_BASED_STATISTICS=off&EXPECT=1e-10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&MATRIX_NAME=BLOSUM62&ENTREZ_QUERY=Xenopus+laevis[Organism]&PROGRAM=blastp An Error Occurred

500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad hostname 'www.ncbi.nlm.nih.gov') --------------------------------------------------- waiting...waiting... -------------------- WARNING --------------------- MSG: An Error Occurred

500 Internal Server Error --------------------------------------------------- Though I am able to see the NCBI page from a browser, I am unable to ping or traceroute the server. Please help me. -- Sanjib Kumar Gupta Bioinformatics Centre Bose Institute Kolkata 700054, INDIA Phone : +91-33-2355 6626, 2816, 2355 4766 Fax : +91-33-2355 3886 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed...
Name: n9.pl Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070226/86a0137c/attachment.pl From cjfields at uiuc.edu Mon Feb 26 09:59:21 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 26 Feb 2007 08:59:21 -0600 Subject: [Bioperl-l] Remote blast In-Reply-To: <20070226052336.M74918@bic.boseinst.ernet.in> References: <20070221064743.M54123@bic.boseinst.ernet.in> <20070226052336.M74918@bic.boseinst.ernet.in> Message-ID: I tested this out and got BLAST to work for my test case (single fasta seq, since you didn't send any seqs for testing). It keeps querying for the RID in what appears to be an infinite loop (i.e. it doesn't get rid of the RID properly); you can see this if you add '-verbose => 1' to your parameters. I don't have time to delve into it but from a quick glance it may be due to your looping structure and how you are saving your rids. As for your particular error, could it be something as simple as the server was overloaded or down? It does happen from time to time... Beyond that I can't make heads or tails of your script. Was it cobbled together from a bunch of others? If you are doing that you can probably expect some bugs to occur. chris On Feb 25, 2007, at 11:23 PM, Sanjib Kumar Gupta wrote: > Hi > I have been running this script for some time and it was running > fine. I am > using this linux machine with live IP(no proxy). But suudenly it > has stopped > working with this errors > > waiting...waiting... > -------------------- WARNING --------------------- > MSG: > An Error Occurred > >

> 500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad > hostname 'www.ncbi.nlm.nih.gov') > > > > --------------------------------------------------- > xx.pep > > -------------------- WARNING --------------------- > MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi > User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5 > Content-Length: 497 > Content-Type: application/x-www-form-urlencoded > > DATABASE=nr&QUERY=%3ENM_005277.3%3B_1414961_1+837% > 0AMEENMEEGQTQKGCFECCIKCLGGIPYASLIATILLYAGVALFCGCGHEALSGTVNILQTYFEMARTA > GDTLDVF > TMIDIFKYVIYGIAAAFFVYGILLMVEGFFTTGAIKDLYGDFKITTCGRCVSAWFIMLTYLFMLAWLGVT > AFTSLPV > YMYFNLWTICRNTTLVEGANLCLDLRQFGIVTIGEEKKICTVSENFLRMCESTELNMTFHLFIVALAGAG > AAVIAMV > HYLMVLSANWAYVKDACRMQKYEDIKSKEEQELHDIHSTRSKERLNAYT*&COMPOSITION_BASED_S > TATISTI > CS=off&EXPECT=1e- > 10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&MATRIX_NAME=BLOSUM62& > ENTREZ_ > QUERY=Xenopus+laevis[Organism]&PROGRAM=blastp > > > An Error Occurred > >

> 500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad > hostname 'www.ncbi.nlm.nih.gov') > > > > --------------------------------------------------- > waiting...waiting... > -------------------- WARNING --------------------- > MSG: > An Error Occurred > >

> 500 Internal Server Error > > > > --------------------------------------------------- > > Though I am able to see the ncbi page from browser but am unable to > ping ot > trace route to the server. > > Please help me. From cjfields at uiuc.edu Mon Feb 26 10:05:50 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 26 Feb 2007 09:05:50 -0600 Subject: [Bioperl-l] AlignIO problems In-Reply-To: References: Message-ID: <082E0708-6B1C-45CE-B387-429B8B6A8D7A@uiuc.edu> Make sure to keep this on the list, others may have some input. You should be able to test the various sequence objects you're retrieving from Bio::DB::Fasta via Bio::SeqIO to see if they are what you're expecting, then track down the problematic sequences. My guess is the odd seqs are due to the way you are using Bio::DB::Fasta for each of the files. I'm wondering if you are having problems with indices overwriting one another and are thus getting back blank seq objects. You should probably consider just indexing all of your files together; according to the POD you can use a single Bio::DB::Fasta to index all of the files in one go (indicate the path and use '-glob') and retrieve what you need that way. Either that or separating them into separate directories so the indices are also separate. chris On Feb 25, 2007, at 9:50 PM, ? ?? wrote: > Thank you for your help! 
> May be you are right, I use the following code to create my seq > object arrays: > my $outfilename=$dmel; > my $ana_pep_db=Bio::DB::Fasta->new("dana.translation.fasta"); > my $ana_cdna_db=Bio::DB::Fasta->new("dana.cds.fasta"); > my $ere_pep_db=Bio::DB::Fasta->new("dere.translation.fasta"); > my $ere_cdna_db=Bio::DB::Fasta->new("dere.cds.fasta"); > my $mel_pep_db=Bio::DB::Fasta->new("dmel.translation.fasta"); > my $mel_cdna_db=Bio::DB::Fasta->new("dmel.cds.fasta"); > my $sec_pep_db=Bio::DB::Fasta->new("dsec.translation.fasta"); > my $sec_cdna_db=Bio::DB::Fasta->new("dsec.cds.fasta"); > my $sim_pep_db=Bio::DB::Fasta->new("dsim.translation.fasta"); > my $sim_cdna_db=Bio::DB::Fasta->new("dsim.cds.fasta"); > my $yak_pep_db=Bio::DB::Fasta->new("dyak.translation.fasta"); > my $yak_cdna_db=Bio::DB::Fasta->new("dyak.cds.fasta"); > my $ana_pep_obj=$ana_pep_db->get_Seq_by_id($dana); > my $ana_nuc_obj=$ana_cdna_db->get_Seq_by_id($dana); > my $ere_pep_obj=$ere_pep_db->get_Seq_by_id($dere); > my $ere_nuc_obj=$ere_cdna_db->get_Seq_by_id($dere); > my $mel_pep_obj=$mel_pep_db->get_Seq_by_id($dmel); > my $mel_nuc_obj=$mel_cdna_db->get_Seq_by_id($dmel); > my $sec_pep_obj=$sec_pep_db->get_Seq_by_id($dsec); > my $sec_nuc_obj=$sec_cdna_db->get_Seq_by_id($dsec); > my $sim_pep_obj=$sim_pep_db->get_Seq_by_id($dsim); > my $sim_nuc_obj=$sim_cdna_db->get_Seq_by_id($dsim); > my $yak_pep_obj=$yak_pep_db->get_Seq_by_id($ddyak); > my $yak_nuc_obj=$yak_cdna_db->get_Seq_by_id($ddyak); > push @prots, $ana_pep_obj; > push @cdna, $ana_nuc_obj; > push @prots, $ere_pep_obj; > push @cdna, $ere_nuc_obj; > push @prots, $mel_pep_obj; > push @cdna, $mel_nuc_obj; > push @prots, $sec_pep_obj; > push @cdna, $sec_nuc_obj; > push @prots, $sim_pep_obj; > push @cdna, $sim_nuc_obj; > push @prots, $yak_pep_obj; > push @cdna, $yak_nuc_obj; > > then I use the @prots as input for my $aln=$aln_factory->align > (\@prots); > This method will create align files with sequences masked. 
>
> But if I use fasta files (not objects) containing protein
> sequences as input:
>
> $inputfile='FBgn0000097.pep';
> @params=('outorder'=>'INPUT');
> $factory=Bio::Tools::Run::Alignment::Clustalw->new(@params);
> $aln=$factory->align($inputfile);
> #$aln->gap_char('-');
> $aln->map_chars('\.','-');
> $aln_out=Bio::AlignIO->new(-file=>">0097.aln",-format=>'clustalw');
> $aln_out->write_aln($aln);
>
> this method creates files without masking.
> I think sequence objects created by "get_Seq_by_id" directly from
> sequence databases are not appropriate.
>
> Thank you for your suggestion again!
>
> Jiang.
>
>> From: Chris Fields
>> To: ?????
>> Subject: Re: [Bioperl-l] AlignIO problems
>> Date: Sun, 25 Feb 2007 21:26:34 -0600
>>
>> I ran the same using a local fasta formatted file on my system,
>> which works (no masking).
>>
>> Of note, the gaps were all marked as '.'. Your gaps were both
>> '.' and '-', which may mean that something is wrong with the seq
>> objects themselves. Maybe SeqIO is misreading them?
>>
>> chris
>>
>> On Feb 25, 2007, at 7:34 PM, ????? wrote:
>>
>>> I use the Bio::Tools::Run::Alignment::Clustalw module to carry
>>> out a multiple alignment.
>>> My code is:
>>> my @clustal_param=('outorder'=>'INPUT');
>>> my $aln_factory=Bio::Tools::Run::Alignment::Clustalw->new(@clustal_param);
>>> my $aln=$aln_factory->align(\@prots); ### @prots is an array of protein sequence objects
>>> my $aln_out=Bio::AlignIO->new(-file=>">./dmel_group/clustal/${outfilename}.aln",-format=>'clustalw');
>>>
>>> $aln_out->write_aln($aln);
>>> This code produces an alignment in which identical residues are masked.
>>> But if I use ClustalW directly, the output is normal.
>>> Thank you for your help~
>>>
>>> Jiang
>>
Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From michael.watson at bbsrc.ac.uk Mon Feb 26 11:00:31 2007 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Mon, 26 Feb 2007 16:00:31 -0000 Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna In-Reply-To: <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com> References: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk> <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com> Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2EBD3@iahce2ksrv1.iah.bbsrc.ac.uk> Hi Lincoln/List That's great, the axis now appears, but there are no labels. This in itself isn't a problem, as long as we can assume that the tick marks are at 0, 50% and 100%. If that's true, we can go with what we have; otherwise I'm going to have to figure out a way to label the y-axis. Thanks Mick ________________________________ From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein Sent: 15 February 2007 18:53 To: michael watson (IAH-C) Cc: BioPerl-List Subject: Re: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna Hi Michael, When you set up the panel, do this: Bio::Graphics::Panel->new(-blah -blah, -pad_left => 20, -pad_right => 20); This will leave enough room on the left and right for you to see the Y axis. Otherwise it runs off the edge of the image (ok, this is a mis-design, but it was the only way to solve a chicken-and-egg problem about who gets to say how wide the panel is). Lincoln On 2/15/07, michael watson (IAH-C) wrote: Hi OK, I have some great images out of this glyph, but I can't see the axis, nor is it labelled (i.e. does it go from 0-100%?), so it isn't great for publication. The docs say: "NOTE: -gc_window=>'auto' gives nice results and is recommended for drawing GC content. The GC content axes draw slightly outside the panel, so you may wish to add some extra padding on the right and left.
" Any idea how to do this? Basically, I want a nice GC graph with the axis quite clearly labelled, and a nice "%GC" title next to it :) Thanks Mick The information contained in this message may be confidential or legally privileged and is intended solely for the addressee. If you have received this message in error please delete it & notify the originator immediately. Unauthorised use, disclosure, copying or alteration of this message is forbidden & may be unlawful. The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Institute. This email and associated attachments has been checked locally for viruses but we can accept no responsibility once it has left our systems. Communications on Institute computers are monitored to secure the effective operation of the systems and for other lawful purposes. _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Mon Feb 26 12:18:38 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 26 Feb 2007 11:18:38 -0600 Subject: [Bioperl-l] AlignIO problems In-Reply-To: References: Message-ID: <7DF958E6-E233-427F-8901-3FE571CD99BD@uiuc.edu> On Feb 26, 2007, at 9:59 AM, ? ?? wrote: > Thank you! > I have checked the sequences retrieved through lots of Bio:DB > objects work simultaneously. > There are not problems you mentioned, the sequences are not > overwritten. Again, keep this on the list. I have my hands full this month so I will be checking the list only very sporadically; someone else may be able to help you. 
The only explanation for the clustalw output you get is that you are not retrieving the correct sequence in some fundamental way, which to me indicates the bug originates either in the way the sequences are retrieved (i.e. somehow via Bio::DB::Fasta, hence my thought about conflicting indices) or in the way they are converted via Bio::SeqIO, which is used in Bio::Tools::Run::Alignment::Clustalw. When I have used Bio::DB::Fasta in the past I have never had a problem when indexing multiple files and retrieving sequences, so short of running tests with your data I can't help you much beyond the above conjecture. chris From jason at bioperl.org Mon Feb 26 13:45:34 2007 From: jason at bioperl.org (Jason Stajich) Date: Mon, 26 Feb 2007 10:45:34 -0800 Subject: [Bioperl-l] Question to Bio::Tools::Run RemoteBlast In-Reply-To: <20070226095515.68810@gmx.net> References: <20070226095515.68810@gmx.net> Message-ID: <2D2DF6D9-6DAE-4BB7-B31B-8C19CCCA7301@bioperl.org> Alex - I am glad to see your interest in the module, but I don't currently have any time to maintain it, so queries should be sent to the BioPerl mailing list. In general we prefer you don't contact developers directly, but use the mailing list so that others can learn from questions. Please note there are several tutorials and plenty of documentation on the website; you will get a better response from people if you can show you have at least tried to use the existing example code to construct your program. -jason On Feb 26, 2007, at 1:55 AM, Alexander Auner wrote: > Dear Jason Stajich, > I hope you can help me. > > I am inspired by your module and would like to work with it. > I am a student at TFH Wildau. > I have problems understanding the module. > > Could you send me an example? > > The example should process a text file (FASTA) with NCBI BLAST (Web).
> > Parameter: > Choose database -> Others -> nr > Limit by entrez query -> Campylobacter -> or select from: -> > Bacteria [ORGN] > Expect -> 10 > Other advanced -> -q-1 > > Output format: > plain text without Graphical Overview > Number of: -> Descriptions -> 10000 > Alignment view -> query-anchored with identities > > All other parameters remain undef. > > Thank you for your help. > > Yours faithfully, Alexander Auner From jason at bioperl.org Mon Feb 26 14:13:00 2007 From: jason at bioperl.org (Jason Stajich) Date: Mon, 26 Feb 2007 11:13:00 -0800 Subject: [Bioperl-l] BioPerl leadership additions Message-ID: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org> Dear BioPerl Users and Developers, I want to announce an addition to the leadership of BioPerl. Christopher Fields and Sendu Bala are now members of the BioPerl Core developer group to recognize their ongoing leadership in the project. Chris and Sendu were instrumental in the 1.5.2 Developer release and have made a significant commitment and contribution to the quality of the code and the documentation of the project. We have invited them to be part of the core to recognize their work and to feel comfortable to ask them to do more. ;-) The Core group was established to ensure that someone was responsible for making code releases, vetting new developers for CVS write accounts, and generally dealing with things that might otherwise slip through the cracks. We are very excited to have more people contributing to and maintaining the toolkit. We look forward to their help along with all the other developers, as we work towards a 1.6 release this year. As always, while there is a need for some individuals to lead the project, we encourage contributions from all levels of expertise to improve the code, documentation, and tutorials of the project.
We plan to discuss the progress of the toolkit at this year's Bioinformatics Open Source Conference held in Vienna, Austria in conjunction with the SIG meetings at ISMB. We are trying to use BOSC 2007 as a chance for the developers of Open Bioinformatics Foundation sponsored and related projects to coordinate future development and release cycles. Jason Stajich on behalf of the Core developers From khan at cshl.edu Mon Feb 26 15:29:19 2007 From: khan at cshl.edu (Khan, Sohail) Date: Mon, 26 Feb 2007 15:29:19 -0500 Subject: [Bioperl-l] parsing a list of ids to a fasta file. Message-ID: Thanks Michael. I have the scripts installed. I can pass an id to an indexed fasta file and retrieve the sequence. However, I was wondering whether I can pass a list of ids from a file and get the sequences for all of them? Thanks. -Sohail -----Original Message----- From: michael watson (IAH-C) [mailto:michael.watson at bbsrc.ac.uk] Sent: Tuesday, February 20, 2007 4:33 PM To: Khan, Sohail; Bioperl-l at lists.open-bio.org Subject: RE: [Bioperl-l] parsing a list of ids to a fasta file. I suggest you use Bio::Index::Fasta to create an index for the fasta file and then a simple script to retrieve sequences using that index. Or just use the pre-written bp_index.PLS and bp_fetch.PLS scripts. http://www.bioperl.org/wiki/Module:Bio::Index::Fasta ________________________________ From: bioperl-l-bounces at lists.open-bio.org on behalf of Khan, Sohail Sent: Tue 20/02/2007 8:42 PM To: Bioperl-l at lists.open-bio.org Subject: [Bioperl-l] parsing a list of ids to a fasta file. Dear list, I am new to BioPerl. I have the following question: I have a list of ids which I would like to match against a large fasta file to retrieve the sequences for those ids. I appreciate any suggestions. Thanks.
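Michael's Bio::Index::Fasta / bp_fetch route is the BioPerl way to do this. As an alternative for a one-off job, a single streaming pass over the FASTA file with the wanted IDs held in a hash also works and needs no index. The sketch below is plain Perl rather than a BioPerl API; the subroutine names and the script name are made up, and it assumes each record's ID is the first whitespace-delimited token after '>' (so an NCBI header such as "gi|123|ref|..." must appear in the ID file exactly as written) and that the ID file holds one ID per line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read one ID per line from $fh; blank lines and
# surrounding whitespace are ignored. Returns a hash reference used as a set.
sub read_ids {
    my ($fh) = @_;
    my %ids;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;
        $ids{$line} = 1 if length $line;
    }
    return \%ids;
}

# Hypothetical helper: copy every FASTA record whose ID is in %$wanted
# from $in to $out. Assumes the ID is the first whitespace-delimited token
# after '>'. Returns the number of records written.
sub filter_fasta {
    my ($in, $out, $wanted) = @_;
    my ($keep, $count) = (0, 0);
    while (my $line = <$in>) {
        if ($line =~ /^>(\S+)/) {
            $keep = exists $wanted->{$1} ? 1 : 0;
            $count += $keep;
        }
        print {$out} $line if $keep;    # header and sequence lines alike
    }
    return $count;
}

# Command-line use (script name is illustrative):
#   perl fetch_by_id.pl ids.txt seqs.fasta > subset.fasta
if (@ARGV == 2) {
    my ($id_file, $fasta_file) = @ARGV;
    open my $id_fh, '<', $id_file    or die "Cannot open $id_file: $!\n";
    open my $fa_fh, '<', $fasta_file or die "Cannot open $fasta_file: $!\n";
    my $n = filter_fasta($fa_fh, \*STDOUT, read_ids($id_fh));
    warn "$n record(s) written\n";
}
```

The whole file is scanned once per run, so for repeated lookups against the same large file the index-based approach avoids rescanning it each time.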
Khan From arareko at campus.iztacala.unam.mx Mon Feb 26 16:44:49 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Mon, 26 Feb 2007 15:44:49 -0600 Subject: [Bioperl-l] BioPerl leadership additions In-Reply-To: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org> References: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org> Message-ID: <45E354D1.4070600@campus.iztacala.unam.mx> Congrats Chris & Sendu! Very well-deserved. Keep up the great work. Cheers! Mauricio. Jason Stajich wrote: > Dear BioPerl Users and Developers, > > I want to announce an addition to the leadership of BioPerl. > Christopher Fields and Sendu Bala are now members of the BioPerl > Core developer group to recognize their ongoing leadership in the > project. Chris and Sendu were instrumental in the 1.5.2 Developer > release and have made a significant commitment and contribution to > the quality of the code and the documentation of the project. We > have invited them to be part of the core to recognize their work and > to feel comfortable to ask them to do more. ;-) > > The Core group was established to ensure that someone was responsible > for making code releases, vetting new developers for CVS write > accounts, and generally dealing with things that might otherwise slip > through the cracks. We are very excited to have more people > contributing to and maintaining the toolkit. We look forward to > their help along with all the other developers, as we work towards a > 1.6 release this year. > > As always, while there is a need for some individuals to lead the > project, we encourage contributions from all levels of expertise to > improve the code, documentation, and tutorials of the project.
> > We plan to discuss the progress of the toolkit at this year's > Bioinformatics Open Source Conference held in Vienna, Austria in > conjunction with the SIG meetings at ISMB. We are trying to use > BOSC 2007 as a chance for the developers of Open Bioinformatics > Foundation sponsored and related projects to coordinate future > development and release cycles. > > Jason Stajich on behalf of the Core developers > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From lubapardo at gmail.com Tue Feb 27 08:26:30 2007 From: lubapardo at gmail.com (Luba Pardo) Date: Tue, 27 Feb 2007 14:26:30 +0100 Subject: [Bioperl-l] parsing blast results Message-ID: <58ff33550702270526q43ab584ci7baf6a09e2584e38@mail.gmail.com> Hi, I am using the module Bio::SearchIO to parse some blast results. I would like to store the ids of the results into an array but I am not sure if this is possible to do it with an existing subroutine. Does anyone have an idea whether there is a method included within the module Bio::SearchIO to do so? Thanks in advance, L.Pardo From cjfields at uiuc.edu Tue Feb 27 09:11:37 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 27 Feb 2007 08:11:37 -0600 Subject: [Bioperl-l] parsing blast results In-Reply-To: <58ff33550702270526q43ab584ci7baf6a09e2584e38@mail.gmail.com> References: <58ff33550702270526q43ab584ci7baf6a09e2584e38@mail.gmail.com> Message-ID: On Feb 27, 2007, at 7:26 AM, Luba Pardo wrote: > Hi, > I am using the module Bio::SearchIO to parse some blast results. I > would > like to store the ids of the results into an array but I am not > sure if this > is possible to do it with an existing subroutine. Does anyone have > an idea
Does anyone have > an idea > whether there is a method included within the module Bio::SearchIO > to do so? > Thanks in advance, > L.Pardo Bio::SearchIO doesn't currently have a method to retrieve all the accessions in a BLAST result. The best way to do this is to iterate through the objects: my @accs; while (my $result = $searchio->next_result) { while (my $hit = $result->next_hit) { push @accs, $hit->accession; # do whatever else here... } } print join ',', @accs; I don't think all accessions in the description are parsed out at the moment, just the first one (or the one in the hit table). If you want all of them or if you want the NCBI GI you'll need to parse them out of the description heading ($hit->description). chris From sac at bioperl.org Tue Feb 27 12:59:22 2007 From: sac at bioperl.org (Steve Chervitz) Date: Tue, 27 Feb 2007 09:59:22 -0800 Subject: [Bioperl-l] [Bioperl-announce-l] BioPerl leadership additions In-Reply-To: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org> References: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org> Message-ID: <8f200b4c0702270959n7521f693l915bfabe7ccb7ef7@mail.gmail.com> Welcome to the club, Chris & Sendu. Always good to have an infusion of new blood and capable, motivated hands. Steve On 2/26/07, Jason Stajich wrote: > > Dear BioPerl Users and Developers, > > I want to announce a addition in the leadership of BioPerl. > Christopher Fields and and Sendu Bala are now members of the BioPerl > Core developer group to recognize their ongoing leadership in the > project. Chris and Sendu were instrumental in the 1.5.2 Developer > release and have made a significant commitment and contribution to > the quality of the code and the documentation of the project. We > have invited them to be part of the core to recognize their work and > to feel comfortable to ask them to do more. 
> ;-) > > The Core group was established to ensure that someone was responsible > for making code releases, vetting new developers for CVS write > accounts, and generally dealing with things that might otherwise slip > through the cracks. We are very excited to have more people > contributing to and maintaining the toolkit. We look forward to > their help along with all the other developers, as we work towards a > 1.6 release this year. > > As always, while there is a need for some individuals to lead the > project, we encourage contributions from all levels of expertise to > improve the code, documentation, and tutorials of the project. > > We plan to discuss the progress of the toolkit at this year's > Bioinformatics Open Source Conference held in Vienna, Austria in > conjunction with the SIG meetings at ISMB. We are trying to use > BOSC 2007 as a chance for the developers of Open Bioinformatics > Foundation sponsored and related projects to coordinate future > development and release cycles. > > Jason Stajich on behalf of the Core developers > > _______________________________________________ > Bioperl-announce-l mailing list > Bioperl-announce-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l > From cjfields at uiuc.edu Tue Feb 27 15:57:40 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 27 Feb 2007 14:57:40 -0600 Subject: [Bioperl-l] Bio::SeqIO::FTHelper Message-ID: Could anyone tell me what FTHelper is used for? From what I gather it rolls up seqfeature data into a lightweight object but then creates a SeqFeature::Generic anyway (at least for GenBank/EMBL/Swiss), which seems to be a waste of memory and time. Is there something I'm missing (besides my sanity of course)?
chris From Jay at jays.net Wed Feb 28 04:39:55 2007 From: Jay at jays.net (Jay Hannah) Date: Wed, 28 Feb 2007 03:39:55 -0600 Subject: [Bioperl-l] "Command-Line Bioinformatics" Message-ID: Reading this article: http://www.linuxjournal.com/article/6977 Sequencing the SARS Virus - Linux Journal, Nov 2003 This guy needs Perl and/or BioPerl. :)

> The sequence file is in FASTA format consisting of a header line
> and the sequence, split into fixed-width lines. The following
> counts the number of Gs and Cs in the sequence and presents the
> total as a fraction of the total number of bases:
>
> > grep -v "^>" AY274119.fa | fold -w 1 |
> tr "ATGC" "..xx" | sort | uniq -c |
> sed 's/[^0-9]//g' | tr -s "\012" " " |
> sed 's/\([0-9]*\) \([0-9]*\)/scale = 3; \2 \/ (\1+\2)/' |
> bc -i
> scale = 3; 12127 / (17624+12127)
> .407
>
> Out of the 29,751 bases in our sequence, 12,127 are either G or C,
> giving a GC content of 41%.

BioPerl version:

use Bio::SeqIO;
my $io = Bio::SeqIO->new( -file => 'AY274119.fa', -format => 'Fasta' );
my $seq = $io->next_seq->seq;
print ( ($seq =~ tr/GC/GC/) / length ($seq) );

Command-line Perl:

perl -e '$/ = undef; $_ = <>; s/>.*//; s/\n//g; print tr/GC/GC/ / length($_)' AY274119.fa

I'm sure you can Perl Golf my stabs at it. :) j seqlab.net http://www.bioperl.org/wiki/User:Jhannah From n.saunders at uq.edu.au Wed Feb 28 05:25:08 2007 From: n.saunders at uq.edu.au (Neil Saunders) Date: Wed, 28 Feb 2007 20:25:08 +1000 Subject: [Bioperl-l] Bio::Factory::EMBOSS, CGI and taint Message-ID: <45E55884.9010908@uq.edu.au> Dear Bioperlers, I'm trying to understand an error that occurs when Bio::Factory::EMBOSS is used in a CGI script. Using BioPerl 1.5.2 on Ubuntu Dapper, Apache 2.0.55, Perl 5.8.7.
If I load this test CGI script (cgi.pl) in a browser:

BEGIN CODE
----------
#!/usr/bin/perl -Tw
use strict;
use CGI;
use Bio::Factory::EMBOSS;
my $cgi = new CGI;
my $f = new Bio::Factory::EMBOSS;
print $cgi->header, $cgi->start_html, $cgi->end_html;
--------
END CODE

I get a 500 server error and the Apache error log reads:

[error] [client 192.168.0.3] Premature end of script headers: cgi.pl

I can fix this in 2 ways:
(1) Move the "my $f = new Bio::Factory::EMBOSS" line to the end of the script, which isn't a very useful fix.
(2) Remove the -T switch from the shebang line.

There seem to be a few old posts on the list regarding "taint-safe" modules. It seems that the new Bio::Factory::EMBOSS object is interfering with the headers in some way, but I'm no CGI.pm guru and wondered if anyone could shed light on this. thanks, Neil -- School of Molecular and Microbial Sciences University of Queensland Brisbane 4072 Australia http://nsaunders.wordpress.com From n.saunders at uq.edu.au Wed Feb 28 05:30:31 2007 From: n.saunders at uq.edu.au (Neil Saunders) Date: Wed, 28 Feb 2007 20:30:31 +1000 Subject: [Bioperl-l] more on Bio::Factory::EMBOSS, CGI and taint Message-ID: <45E559C7.1090308@uq.edu.au> Further to my previous email, adding:

BEGIN {
    $|=1;
    print "Content-type: text/html\n\n";
    use CGI::Carp('fatalsToBrowser');
}

to my CGI script generates:

Insecure $ENV{PATH} while running with -T switch at /usr/local/share/perl/5.8.7/Bio/Factory/EMBOSS.pm line 251.

Neil -- School of Molecular and Microbial Sciences University of Queensland Brisbane 4072 Australia http://nsaunders.wordpress.com From n.saunders at uq.edu.au Wed Feb 28 05:50:58 2007 From: n.saunders at uq.edu.au (Neil Saunders) Date: Wed, 28 Feb 2007 20:50:58 +1000 Subject: [Bioperl-l] CGI taint solved Message-ID: <45E55E92.10608@uq.edu.au> Apologies for running a one-man thread, but I realised that I've now answered my own question regarding errors with CGI, Bio::Factory::EMBOSS and taint.
Given that the EMBOSS binaries are in /usr/local/bin, adding:

$ENV{'PATH'} = '/usr/local/bin';

near the top of the script does the trick. Neil -- School of Molecular and Microbial Sciences University of Queensland Brisbane 4072 Australia http://nsaunders.wordpress.com From cjfields at uiuc.edu Wed Feb 28 08:39:24 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 28 Feb 2007 07:39:24 -0600 Subject: [Bioperl-l] CGI taint solved In-Reply-To: <45E55E92.10608@uq.edu.au> References: <45E55E92.10608@uq.edu.au> Message-ID: That could possibly clobber any other program calls from within the same script (unless they reside in /usr/local/bin) since you're explicitly assigning PATH, not appending:

$ENV{"PATH"} = '/usr/local/bin';

gets me (printing $ENV{"PATH"}):

/usr/local/bin

whereas this:

$ENV{"PATH"} = '/usr/local/bin:' . $ENV{"PATH"};

gets me:

/usr/local/bin:/Users/cjfields/bin:/Users/cjfields/dart/bin:/usr/local/mysql/bin:/usr/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin

There's probably a File::* module that does this safely per OS flavor. chris On Feb 28, 2007, at 4:50 AM, Neil Saunders wrote: > Apologies for running a one-man thread, but I realised that I've > now answered my > own question regarding errors with CGI, Bio::Factory::EMBOSS and > taint. > > Given that the EMBOSS binaries are in /usr/local/bin, adding: > > $ENV{'PATH'} = '/usr/local/bin' > > near the top of the script does the trick. > > > Neil > -- > School of Molecular and Microbial Sciences > University of Queensland > Brisbane 4072 Australia > > http://nsaunders.wordpress.com Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From stefan.kirov at bms.com Wed Feb 28 10:35:31 2007 From: stefan.kirov at bms.com (Stefan Kirov) Date: Wed, 28 Feb 2007 10:35:31 -0500 Subject: [Bioperl-l] CGI taint solved In-Reply-To: References: <45E55E92.10608@uq.edu.au> Message-ID: <45E5A143.3080303@bms.com> Neil, I believe this is your situation: http://wn.cyberwerks.com/2000/0411.html My advice: any commands executed from within a CGI script should have their path hard-coded whenever possible. If those commands require a different PATH, try writing a wrapper shell script that sets the environment (the modified environment only lasts until the shell script terminates). It also depends on the type of environment you have: if it is not secure, you may wish to think hard about how to eliminate all security loopholes with CGI. I am definitely not an expert on this. Stefan Chris Fields wrote: > That could possibly clobber any other program calls from within the > same script (unless they reside in /usr/local/bin) since you're > explicitly assigning PATH, not appending: > > $ENV{"PATH"} = '/usr/local/bin'; > > gets me (printing $ENV{"PATH"}): > > /usr/local/bin > > whereas this: > > $ENV{"PATH"} = '/usr/local/bin:' . $ENV{"PATH"}; > > gets me: > > /usr/local/bin:/Users/cjfields/bin:/Users/cjfields/dart/bin:/usr/local/mysql/bin:/usr/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin > > There's probably a File::* module that does this safely per OS flavor. > > chris > > On Feb 28, 2007, at 4:50 AM, Neil Saunders wrote: > > >> Apologies for running a one-man thread, but I realised that I've >> now answered my >> own question regarding errors with CGI, Bio::Factory::EMBOSS and >> taint. >> >> Given that the EMBOSS binaries are in /usr/local/bin, adding: >> >> $ENV{'PATH'} = '/usr/local/bin' >> >> near the top of the script does the trick.
>> >> >> Neil >> -- >> School of Molecular and Microbial Sciences >> University of Queensland >> Brisbane 4072 Australia >> >> http://nsaunders.wordpress.com >> > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > From lubapardo at gmail.com Wed Feb 28 12:21:07 2007 From: lubapardo at gmail.com (Luba Pardo) Date: Wed, 28 Feb 2007 18:21:07 +0100 Subject: [Bioperl-l] retrieven ids Message-ID: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com> Hi everyone, I wonder if someone could give some advice on the following: I want to retrieve the DNA coding sequence of a RefSeq protein id. I do not want to translate the protein back to DNA, but rather get the ID of the DNA coding sequence and then retrieve the DNA sequence from GenBank. Is there any module that allows me to get all possible ids for a sequence given a protein gi? Thank you very much in advance, L. Pardo From johnston at biochem.ucl.ac.uk Wed Feb 28 12:05:49 2007 From: johnston at biochem.ucl.ac.uk (Caroline Johnston) Date: Wed, 28 Feb 2007 17:05:49 +0000 (GMT) Subject: [Bioperl-l] _rearrange Message-ID: hi, Is there a discussion of the rationale behind the _rearrange method somewhere? I'm probably just being gormless, but I think I'm missing the point a bit. Is it okay for a method just to expect named params like ->foo(arg1=>'stuff', arg2=>'things'); ?
Cxx From ckuanglim at yahoo.com Wed Feb 28 10:51:50 2007 From: ckuanglim at yahoo.com (Chan Kuang Lim) Date: Wed, 28 Feb 2007 07:51:50 -0800 (PST) Subject: [Bioperl-l] Problem of Installing Bioperl Message-ID: <459942.77644.qm@web60518.mail.yahoo.com> I have a problem installing BioPerl on Windows using the command-line installation. In the cmd window, after running ppm-shell, then "search bioperl" and "install 2", many downloads completed, but the next line is: Unpacking bioperl-1.5.2_100...ppm install failed: Can't extract files from C:.............../Bioperl-1.5.2_100.tgz Hope you can answer my question. Thank you. Regards, Chan Kuang Lim Malaysia From cjfields at uiuc.edu Wed Feb 28 13:30:45 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 28 Feb 2007 12:30:45 -0600 Subject: [Bioperl-l] _rearrange In-Reply-To: References: Message-ID: <25C736A2-2DCA-413B-8F92-D799F583515B@uiuc.edu> From what I gather it's a convenient utility method that is used for consistent and enforced parameter checking/setting for any method, including the constructor. There are a few modules that don't use _rearrange (Bio::WebAgent::new() comes to mind). It's not required that you use it, but the naming conventions for parameters outlined in _rearrange (in Bio::Root::RootI POD) are generally enforced for consistency across classes. As a note, Sendu has committed a related method (_set_from_args) to CVS which works rather well, but I don't think it is in the last release. chris On Feb 28, 2007, at 11:05 AM, Caroline Johnston wrote: > hi, > > Is there a discussion of the rationale behind the _rearrange method > somewhere? I'm probably just being gormless, but I think I'm > missing the > point a bit. > > Is it okay for a method just to expect named params like > ->foo(arg1=>'stuff', arg2=>'things'); ?
> > Cxx > From dmessina at wustl.edu Wed Feb 28 14:31:29 2007 From: dmessina at wustl.edu (Dave Messina) Date: Wed, 28 Feb 2007 13:31:29 -0600 (CST) Subject: [Bioperl-l] retrieven ids In-Reply-To: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com> References: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com> Message-ID: <57419.10.0.7.57.1172691089.squirrel@gscmail.wustl.edu> Whenever I'm unsure of how to do something, I first look to see if one of the HOWTOs on bioperl.org covers it. In this case, the Features HOWTO has example code which I think will do what you want. GenBank records typically have the coding sequence of a protein as a feature, so I would do something like:

- use the RefSeq protein IDs to query Entrez and get back the GenBank records.
- read the Features HOWTO to refresh my memory on the syntax for grabbing features. That HOWTO is at: http://www.bioperl.org/wiki/HOWTO:Feature-Annotation
- whip up a little script to loop through the GenBank records one at a time with SeqIO and pull out the cDNA sequence features.

Dave From bix at sendu.me.uk Wed Feb 28 14:38:46 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 28 Feb 2007 19:38:46 +0000 Subject: [Bioperl-l] _rearrange In-Reply-To: References: Message-ID: <45E5DA46.3020503@sendu.me.uk> Caroline Johnston wrote: > hi, > > Is there a discussion of the rationale behind the _rearrange method > somewhere? I'm probably just being gormless, but I think I'm missing the > point a bit. > > Is it okay for a method just to expect named params like > ->foo(arg1=>'stuff', arg2=>'things'); ? The Bioperl style for named args is -arg1, and wrong case is allowed as well. So, make use of _rearrange; it won't do you any harm.
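To make the convention Chris and Sendu describe concrete, here is a minimal, hypothetical sketch of what a _rearrange-style helper does with its arguments: dashed names are matched case-insensitively against an ordered list of wanted parameters, and missing ones come back as undef. This sketch also passes a call through unchanged when its first argument has no leading dash (a positional call). The names rearrange and new_widget are made up for illustration and are not BioPerl code; inside a BioPerl module you would call the real method documented in Bio::Root::RootI instead, e.g. my ($file, $format) = $self->_rearrange([qw(FILE FORMAT)], @args);

```perl
use strict;
use warnings;

# Hypothetical, simplified illustration of the named-argument convention;
# NOT the real Bio::Root::RootI::_rearrange implementation.
# $order lists the parameter names the caller wants back, in order.
sub rearrange {
    my ($order, @args) = @_;

    # No leading '-' on the first argument: treat the call as positional
    # and hand the arguments back unchanged.
    return @args unless @args && defined $args[0] && $args[0] =~ /^-/;

    # Named call: strip the leading '-' and upper-case each key, so
    # -file, -File and -FILE all refer to the same parameter.
    my %named;
    while (@args) {
        my ($key, $value) = splice @args, 0, 2;
        $key =~ s/^-//;
        $named{ uc $key } = $value;
    }

    # Return values in the requested order; missing ones are undef.
    return map { $named{ uc $_ } } @$order;
}

# Example constructor using it (class and field names are made up).
sub new_widget {
    my ($class, @args) = @_;
    my ($name, $length) = rearrange([qw(NAME LENGTH)], @args);
    return bless { name => $name, length => $length }, $class;
}
```

With this in place, new_widget('My::Widget', -Name => 'w1', -length => 5) and new_widget('My::Widget', 'w1', 5) build the same object, which is the flexibility that makes the dashed, case-insensitive style cheap to support.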
From johnsonm at gmail.com Wed Feb 28 14:59:09 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Wed, 28 Feb 2007 13:59:09 -0600 Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer Message-ID: I happen to need something like Bio::Tools::Run::Genemark, so I'm coding one up. When I started on the tests for it, I realized I have a problem. I can distribute a fasta file downloaded from GenBank to use as input, but I can't distribute the model file needed to actually run Genemark ( Genemark.hmm for prokaryotes, gmhmmp, in my case). It took *forever* to get a license, and I'm not thrilled with the prospect of talking them out of a redistributable model file. I'd love to distribute the test, but I don't see how I'm going to be able to. Suggestions? Also, I've settled on IPC::Run instead of system(). The docs indicate the bits of it I'm using should be OK on Windows, except maybe for Win9X. I don't want to clutter up the console, I don't like embedding stdout/stderr redirection in command strings, and I don't want to have to worry about signal handling (What if the child catches a ctrl-c halfway through parsing? What if the parent does?). Anybody object to that? One final thing. I'm lazy, I don't want to deal with parsing arguments to the constructor, so I'm just calling _rearrange() to deal with it. The Bio::Tools:: parsers all take dash options, but it looks like a bunch of the stuff in Bio::Tools::Run:: takes dashless args. Objections? From dmessina at wustl.edu Wed Feb 28 15:14:56 2007 From: dmessina at wustl.edu (Dave Messina) Date: Wed, 28 Feb 2007 14:14:56 -0600 (CST) Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer In-Reply-To: References: Message-ID: <57538.10.0.7.57.1172693696.squirrel@gscmail.wustl.edu> > I'm not thrilled with the prospect of talking them out of a redistributable > model file. I suppose it's not possible to fake your own, or at least the parts of it you're testing for? 
If not, I'd put the tests in a skip block while waiting to hear from the Genemark folks. > The Bio::Tools:: parsers all take dash options, but it looks like a bunch of > the stuff in Bio::Tools::Run:: takes dashless args. Objections? Sendu will chime in I'm sure, but I think he was planning to switch everything in Bio::Tools::Run over to dashed args anyway... Dave From bix at sendu.me.uk Wed Feb 28 15:52:23 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 28 Feb 2007 20:52:23 +0000 Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer In-Reply-To: References: Message-ID: <45E5EB87.9020106@sendu.me.uk> Mark Johnson wrote: > One final thing. I'm lazy, I don't want to deal with parsing arguments > to the constructor, so I'm just calling _rearrange() to deal with it. The > Bio::Tools:: parsers all take dash options, but it looks like a bunch of the > stuff in Bio::Tools::Run:: takes dashless args. Objections? You can make use of _set_from_args(). See Bio::Tools::Run::Phylo::Gumby for an example. From bix at sendu.me.uk Wed Feb 28 16:29:32 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 28 Feb 2007 21:29:32 +0000 Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails Message-ID: <45E5F43C.9080902@sendu.me.uk> I have GD 2.35 and GD::SVG 2.33 installed. I have a working script in which a Bio::Graphics::Panel object is made and output with: print $panel->png; This is fine. Changing it to: print $panel->svg; Gives the error: Can't locate object method "svg" via package "GD:Image" at /.../Bio/Graphics/Panel.pm line 971, line 192. Am I supposed to do something else to get this to work? Cheers, Sendu. 
From crabtree at tigr.ORG Wed Feb 28 16:40:52 2007 From: crabtree at tigr.ORG (Jonathan Crabtree) Date: Wed, 28 Feb 2007 16:40:52 -0500 Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails In-Reply-To: <45E5F43C.9080902@sendu.me.uk> References: <45E5F43C.9080902@sendu.me.uk> Message-ID: <45E5F6E4.80003@tigr.org> Sendu- I believe you must set 'image_class' to 'GD::SVG' when you create the Panel (and note that older versions of Bio::Graphics::Panel don't know anything about this parameter.) Here's the relevant part of the Panel perldoc: -image_class To create output in scalable vector graphics (SVG), optionally pass the image class parameter 'GD::SVG'. Defaults to using vanilla GD. See the corresponding image_class() method below for details. Jonathan Sendu Bala wrote: > I have GD 2.35 and GD::SVG 2.33 installed. > > I have a working script in which a Bio::Graphics::Panel object is made > and output with: > > print $panel->png; > > This is fine. Changing it to: > > print $panel->svg; > > Gives the error: > > Can't locate object method "svg" via package "GD:Image" at > /.../Bio/Graphics/Panel.pm line 971, line 192. > > > Am I supposed to do something else to get this to work? > > > Cheers, > Sendu. From bix at sendu.me.uk Wed Feb 28 17:01:17 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 28 Feb 2007 22:01:17 +0000 Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails In-Reply-To: <45E5F6E4.80003@tigr.org> References: <45E5F43C.9080902@sendu.me.uk> <45E5F6E4.80003@tigr.org> Message-ID: <45E5FBAD.3030404@sendu.me.uk> Jonathan Crabtree wrote: > > Sendu- > > I believe you must set 'image_class' to 'GD::SVG' when you create the > Panel (and note that older versions of Bio::Graphics::Panel don't know > anything about this parameter.) Here's the relevant part of the Panel > perldoc: ...
Oh! I had no idea there was any perldoc for these modules, hiding down there at the bottom. Does anyone want to intersperse the docs?... From cjfields at uiuc.edu Wed Feb 28 17:10:54 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 28 Feb 2007 16:10:54 -0600 Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer In-Reply-To: References: Message-ID: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu> On Feb 28, 2007, at 1:59 PM, Mark Johnson wrote: > I happen to need something like Bio::Tools::Run::Genemark, so > I'm coding > one up. When I started on the tests for it, I realized I have a > problem. I > can distribute a fasta file downloaded from GenBank to use as > input, but I > can't distribute the model file needed to actually run Genemark ( > Genemark.hmm for prokaryotes, gmhmmp, in my case). > It took *forever* to get a license, and I'm not thrilled with the > prospect of talking them out of a redistributable model file. I'd > love to > distribute the test, but I don't see how I'm going to be able to. > Suggestions? For bioperl-run tests you have to have the program installed for tests to work (otherwise they are passed over). Therefore one would assume if you had the GeneMark program you would have the models as well. You could set up your module to require an env. variable be set (like the HMMER module, for instance) which contains the executables and/or the models, so that if it isn't set the tests are skipped. > Also, I've settled on IPC::Run instead of system(). The docs > indicate > the bits of it I'm using should be OK on Windows, except maybe for > Win9X. > I don't want to clutter up the console, I don't like embedding > stdout/stderr > redirection in command strings, and I don't want to have to worry > about > signal handling (What if the child catches a ctrl-c halfway through > parsing? What if the parent does?). Anybody object to that? I wouldn't worry too much about Win9x. Is IPC::Run in perl core? 
Otherwise we'll need to add it to the optional dependencies for bioperl-run. > One final thing. I'm lazy, I don't want to deal with parsing > arguments > to the constructor, so I'm just calling _rearrange() to deal with > it. The > Bio::Tools:: parsers all take dash options, but it looks like a > bunch of the > stuff in Bio::Tools::Run:: takes dashless args. Objections? Sendu's suggestion (_set_from_args() ) is the best. As mentioned in another thread _rearrange() works as well. chris From johnsonm at gmail.com Wed Feb 28 17:29:36 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Wed, 28 Feb 2007 16:29:36 -0600 Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer In-Reply-To: <57538.10.0.7.57.1172693696.squirrel@gscmail.wustl.edu> References: <57538.10.0.7.57.1172693696.squirrel@gscmail.wustl.edu> Message-ID: On 2/28/07, Dave Messina wrote: > > > I'm not thrilled with the prospect of talking them out of a > redistributable model file. > > I suppose it's not possible to fake your own, or at least the parts of it > you're testing for? We got a gzipped tarball with some model files and a precompiled executable (gmhmmp). As far as building a model file goes, I don't even have two sticks to rub together. I suppose it's possible that it's not actually some weird proprietary format, I'll go dig for some docs...but I don't hold out a lot of hope. From sukhinder.sandhu at osumc.edu Wed Feb 28 16:49:31 2007 From: sukhinder.sandhu at osumc.edu (Sukhinder Sandhu) Date: Wed, 28 Feb 2007 16:49:31 -0500 Subject: [Bioperl-l] Problem installing bioperl: plz reply soon. thx Message-ID: Hi I am having trouble installing Bundle::BioPerl through CPAN. I don't know if this has something to do with my having root privileges. Can you please suggest how I may proceed to get over this. I shall really appreciate any help. I am pasting part of the error it keeps giving after trying to install every module.
###################### CPAN.pm: Going to build G/GA/GAAS/HTML-Parser-3.56.tar.gz make: *** No rule to make target `/System/Library/Perl/5.8.6/darwin-thread-multi-2level/CORE/config.h', needed by `Makefile'. Stop. /usr/bin/make -- NOT OK Running make test Can't test without successful make Running make install make had returned bad status, install seems impossible ############################### Thanks sukhinder From sukhinder.sandhu at osumc.edu Tue Feb 27 23:41:43 2007 From: sukhinder.sandhu at osumc.edu (Sukhinder Sandhu) Date: Tue, 27 Feb 2007 23:41:43 -0500 Subject: [Bioperl-l] Problem installing bioperl-1.5.2_102 Message-ID: Hi I am trying to install bioperl on my Mac OS X machine and am having problems. I tried to follow the instructions both at www.tc.umn.edu..... and in the README and INSTALL files in the bioperl folder that I downloaded. The error I get is the following: (end part of the output is copied) #################### t/versions........ok t/xs..............skipped all skipped: C_support not enabled Failed Test Stat Wstat Total Fail Failed List of Failed ---------------------------------------------------------------------------- --- t/compat.t 5 1280 60 5 8.33% 25-28 31 4 tests and 31 subtests skipped. Failed 1/22 test scripts, 95.45% okay. 5/683 subtests failed, 99.27% okay. make: *** [test] Error 2 /usr/bin/make test -- NOT OK Running make install make test had returned bad status, won't install without force Couldn't install Module::Build, giving up. BEGIN failed--compilation aborted at ModuleBuildBioperl.pm line 51. Compilation failed in require at Build.PL line 14. BEGIN failed--compilation aborted at Build.PL line 14. ########################################################################### I am not able to figure out what's going wrong. And when I try to run CPAN, I get the following error. I have no idea how to fix these. Any help is greatly appreciated.
############################################################################ [Sukhinders-Computer:~/Desktop/bioperl-1.5.2_102] sand60% perl -MCPAN -e shell Terminal does not support AddHistory. There seems to be running another CPAN process (pid 7207). Contacting... Lockfile /Users/sand60/.cpan/.lock not writeable by you. Cannot proceed. On UNIX try: rm /Users/sand60/.cpan/.lock and then rerun us. at -e line 1 ################################################### And doing what it says, removing some lock file doesn't help. I am wondering if all this has something to do with having root privileges on the system and if so, is there an alternative? Thanks sukhinder From stefan.kirov at bms.com Wed Feb 28 16:44:05 2007 From: stefan.kirov at bms.com (Stefan Kirov) Date: Wed, 28 Feb 2007 16:44:05 -0500 Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails In-Reply-To: <45E5F43C.9080902@sendu.me.uk> References: <45E5F43C.9080902@sendu.me.uk> Message-ID: <45E5F7A5.3090805@bms.com> I think you should create the object with -image_class='svg'. Can you post the code with which you create the object? Stefan Sendu Bala wrote: > I have GD 2.35 and GD::SVG 2.33 installed. > > I have a working script in which a Bio::Graphics::Panel object is made > and output with: > > print $panel->png; > > This is fine. Changing it to: > > print $panel->svg; > > Gives the error: > > Can't locate object method "svg" via package "GD:Image" at > /.../Bio/Graphics/Panel.pm line 971, line 192. > > > Am I supposed to do something else to get this to work? > > > Cheers, > Sendu.
From johnsonm at gmail.com Wed Feb 28 17:54:02 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Wed, 28 Feb 2007 16:54:02 -0600 Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer In-Reply-To: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu> References: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu> Message-ID: On 2/28/07, Chris Fields wrote: > For bioperl-run tests you have to have the program installed for > tests to work (otherwise they are passed over). Therefore one would > assume if you had the GeneMark program you would have the models as > well. > > You could set up your module to require an env. variable be set (like > the HMMER module, for instance) which contains the executables and/or > the models, so that if it isn't set the tests are skipped. Sounds like a plan. > I wouldn't worry too much about Win9x. Is IPC::Run in perl core? > Otherwise we'll need to add it to the optional dependencies for > bioperl-run. I'd test it, but I don't have access to any Win9x boxes anymore. IPC::Run is not a core module, but I think it's worth the dependency. I considered IPC::Open3, but it can't be made reliable on Win32, something about not being able to select() on filehandles, only sockets. I also looked at IPC::Run3, but under the hood, it's just got STDOUT/STDERR redirection layered on top of system(). I don't like using system() due to issues with signals (Such as the user hitting ctrl-c and taking out the child). I feel better knowing the wrapped executable is in another process disconnected from the console. > Sendu's suggestion (_set_from_args() ) is the best. As mentioned in > another thread _rearrange() works as well. I'm using _rearrange() now. I'll look at _set_from_args(). Is either one preferred to the other?
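Chris's suggestion, quoted in Mark's reply above, of keying the Genemark tests off an environment variable could look roughly like the sketch below. The variable name GENEMARKDIR, the helper genemark_model_path and the model file name are all assumptions made for illustration; none of them come from Genemark or bioperl-run.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical: GENEMARKDIR names the local directory holding the licensed
# Genemark executable and model files. Return the full path of the requested
# model file, or undef when the variable is unset or the file is missing, so
# a test script can skip rather than fail.
sub genemark_model_path {
    my ($model_name) = @_;
    my $dir = $ENV{GENEMARKDIR};
    return undef unless defined $dir && length $dir;
    my $path = File::Spec->catfile($dir, $model_name);
    return -e $path ? $path : undef;
}

my $model = genemark_model_path('example.mod');    # placeholder file name
print defined $model
    ? "Genemark model found at $model\n"
    : "No Genemark model available; its tests would be skipped\n";
```

In a .t file the result would feed Test::More's skip mechanism, for example `plan skip_all => 'GENEMARKDIR not set' unless $model;`, so that users without a license see the tests skipped rather than failed.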
From bix at sendu.me.uk Wed Feb 28 19:13:29 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 01 Mar 2007 00:13:29 +0000 Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer In-Reply-To: References: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu> Message-ID: <45E61AA9.9030906@sendu.me.uk> Mark Johnson wrote: > I'm using _rearrange() now. I'll look at _set_from_args(). Is either one > preferred to the other? _set_from_args() is implemented using _rearrange() iirc. In any case, they do different things but _set_from_args() just makes creating wrapper modules a lot simpler. Another example: compare revisions 1.15 and 1.16 of Bio::Tools::Run::Alignment::Lagan where I reimplemented it to use _set_from_args() and _setparams(). http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-run/Bio/Tools/Run/Alignment/Lagan.pm.diff?r1=text&tr1=1.15&r2=text&tr2=1.16&diff_format=h So, it's new, but I'd recommend new modules, especially wrappers, make use of it. From bix at sendu.me.uk Wed Feb 28 19:19:29 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 01 Mar 2007 00:19:29 +0000 Subject: [Bioperl-l] Problem of Installing Bioperl In-Reply-To: <459942.77644.qm@web60518.mail.yahoo.com> References: <459942.77644.qm@web60518.mail.yahoo.com> Message-ID: <45E61C11.90806@sendu.me.uk> Chan Kuang Lim wrote: > I have problem of installing bioperl in windows using command-line installation. > In the cmd windows, after > ppm-shell > search bioperl > install 2 > > many downloading had done, but the next line is: > Unpacking bioperl-1.5.2_100...ppm install failed: Can't extract files from C:.............../Bioperl-1.5.2_100.tgz Does that file exist on your system? Is it larger than 0kb? Can you open it yourself?
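The wrapper-friendly behaviour Sendu describes for _set_from_args(), which Chris summarizes in the next reply as accepting dashed or dashless parameters and creating simple get/setters on the fly, can be sketched in core Perl as below. MyWrapper and _install_accessor are hypothetical names, and this is not Bioperl's implementation; the real method and its options are documented in the Bio::Root modules.

```perl
package MyWrapper;    # hypothetical wrapper class, for illustration only
use strict;
use warnings;

my %generated;        # accessor names this sketch has already installed

# Every named argument (leading dash optional, any case) becomes an
# attribute with a simple get/set accessor generated on first use.
sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    for my $key (sort keys %args) {
        (my $name = lc $key) =~ s/^-//;
        die "bad parameter name '$key'\n" unless $name =~ /^[a-z]\w*$/;
        unless ($generated{$name}) {
            # refuse to shadow a real method such as new()
            die "parameter '$key' clashes with an existing method\n"
                if $class->can($name);
            _install_accessor($class, $name);
            $generated{$name} = 1;
        }
        $self->$name($args{$key});
    }
    return $self;
}

# Install a closure-based get/set method named $name into $class.
sub _install_accessor {
    my ($class, $name) = @_;
    no strict 'refs';
    *{"${class}::$name"} = sub {
        my $obj = shift;
        $obj->{$name} = shift if @_;
        return $obj->{$name};
    };
    return;
}

package main;

my $run = MyWrapper->new(-Program => 'gmhmmp', -verbose => 1);
print $run->program, "\n";    # prints gmhmmp
$run->verbose(0);
print $run->verbose, "\n";    # prints 0
```

The clash check is one design choice among several: it stops a parameter such as -new from overwriting a real method, which a blind accessor generator would otherwise do.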
From cjfields at uiuc.edu Wed Feb 28 20:19:31 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 28 Feb 2007 19:19:31 -0600 Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer In-Reply-To: <45E61AA9.9030906@sendu.me.uk> References: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu> <45E61AA9.9030906@sendu.me.uk> Message-ID: <93734147-BDDE-4D73-B8F1-FB4A9D073F9B@uiuc.edu> On Feb 28, 2007, at 6:13 PM, Sendu Bala wrote: > Mark Johnson wrote: >> I'm using _rearrange() now. I'll look at _set_from_args(). Is >> either one >> preferred to the other? > > _set_from_args() is implemented using _rearrange() iirc. In any case, > they do different things but _set_from_args() just makes creating > wrapper modules a lot simpler. Another example: compare revisions 1.15 > and 1.16 of Bio::Tools::Run::Alignment::Lagan where I reimplemented it > to use _set_from_args() and _setparams(). > > http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-run/Bio/Tools/Run/Alignment/Lagan.pm.diff?r1=text&tr1=1.15&r2=text&tr2=1.16&diff_format=h > > So, it's new, but I'd recommend new modules, especially wrappers, make > use of it. Agreed; I think it allows for parameter variations (dashed, dashless, etc.) and can create on-the-fly simple get/setters, so is particularly suited for wrappers. _rearrange() will always have use in situations where using named parameters helps (long arg lists) but you don't want get/setters, just values. From dmessina at wustl.edu Wed Feb 28 20:40:39 2007 From: dmessina at wustl.edu (Dave Messina) Date: Wed, 28 Feb 2007 19:40:39 -0600 (CST) Subject: [Bioperl-l] Problem installing bioperl-1.5.2_102 In-Reply-To: References: Message-ID: <58485.75.33.119.169.1172713239.squirrel@gscmail.wustl.edu> > t/compat.t 5 1280 60 5 8.33% 25-28 31 This is the test that failed. I think you snipped the part above where the actual errors causing the failure were printed. > There seems to be running another CPAN process (pid 7207). Contacting...
> Lockfile /Users/sand60/.cpan/.lock not writeable by you. Cannot proceed. > On UNIX try: > rm /Users/sand60/.cpan/.lock > and then rerun us. > at -e line 1 > ################################################### > And doing what it says, removing some lock file doesn't help. Are you sure the lock file is really being removed? If so, what was the error you got when running it after doing that? Also, this line is important: > /usr/bin/make test -- NOT OK It looks like you're trying to install on OS X. By default, OS X has perl but not make. So /usr/bin/make probably doesn't exist on your system, along with lots of other UNIX tools you'll want. To verify this, type: which /usr/bin/make on the command line. If you get: /usr/bin/make: Command not found. you'll need to install the OS X developer tools, called Xcode. You'll need to register first, but you can get the latest version at: http://developer.apple.com/tools/download/ After you do that, reread the BioPerl install docs and try to install again. Since you don't have root on your machine, be sure to read the part of the install instructions that describes what to do. Dave From hlapp at gmx.net Wed Feb 28 23:16:38 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 28 Feb 2007 23:16:38 -0500 Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer In-Reply-To: References: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu> Message-ID: On Feb 28, 2007, at 5:54 PM, Mark Johnson wrote: > I don't like using system() due to issues with > signals (Such as the user hitting ctrl-c and taking out the > child). I feel > better knowing the wrapped executable is in another process > disconnected > from the console. I'm not sure how the user would be able to take out the child hitting ctrl-c if you run it through system() (except if the parent terminates first - but maybe then terminating a run-away child is in good order).
I haven't read the IPC::run POD in full detail but you will want to make sure that if the parent gets killed the child does get killed too, or otherwise you'll have a run-away process that novices will have trouble with understanding or terminating. Other than that though IPC::run seems like a useful module, so incurring this as a dependency should be fine. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cuiw at ncbi.nlm.nih.gov Thu Feb 1 09:47:38 2007 From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C]) Date: Thu, 1 Feb 2007 09:47:38 -0500 Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ? In-Reply-To: <45C1059D.1070100@tbi.univie.ac.at> References: <45C1059D.1070100@tbi.univie.ac.at> Message-ID: <18C407FD4FFB424292D769FBD68C1987020BB960@NIHCESMLBX8.nih.gov> This is a simple test from gene ID 3632373 (protein is 46100068) to contig coordinates: perl -MLWP::Simple -e 'map {print $_, "\n" if /<(Gene-source_src.*?>)(.*)?<$1/} (split "\n", get(q{http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&i d=3632373&retmode=xml}))' You need to translate protein id to gene id though. If the genome is available at Map Viewer, try (the contig name is NW_101115 from last step) http://www.ncbi.nlm.nih.gov/mapview/maps.cgi?taxid=5270&gnl=NW_101115&MA PS=genes&cmd=txt Wenwu Cui, PhD -----Original Message----- From: Rainer Machne [mailto:raim at tbi.univie.ac.at] Sent: Wednesday, January 31, 2007 4:10 PM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ? Dear Bioperl list, hoping not be on the wrong email list, i would have a short question: Is there a standard way or are there nice (Bioperl) tools to come from a gene id (gi) other ids (see below) to the genomic coordinates of the respective gene? 
We have Fasta files retrieved from NCBI protein Blast in fungal genomes: >gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago maydis 521] or >gi|50292953|ref|XP_448909.1| unnamed protein product [Candida glabrata] (we only have gi, ref and gb in my set). I retrieved all my fasta files from whole fungal genomes with available protein sequences at http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=fungi As I only searched whole finished genomes (not shotgun), I thought it would then be easy to get the genomic coordinates and retrieve upstream sequences, but we have failed so far to find a consistent way to do this automatically. Many of the gi entries refer to mRNAs or partial mRNAs and the way to the coordinates seems to differ for each case. Any suggestions would be appreciated. with kind regards, Rainer Machne University of Vienna Department for Theoretical Chemistry Theoretical Biochemistry Group _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From raim at tbi.univie.ac.at Thu Feb 1 07:54:21 2007 From: raim at tbi.univie.ac.at (Rainer Machne) Date: Thu, 01 Feb 2007 13:54:21 +0100 Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ? In-Reply-To: References: <45C1059D.1070100@tbi.univie.ac.at> Message-ID: <45C1E2FD.3070709@tbi.univie.ac.at> Barry and Jason, thanks for your quick and very helpful replies. I guess we should have done (or repeat) our blast search at http://fungal.genome.duke.edu/ to get better mapping from proteins to genomes ? As I retrieved all my proteins via whole genome blasts we should find (most of) them in the genbank files ... a good opportunity for me to learn some Bioperl and the other packages you mentioned in case we want to do more complex analysis later :-) Thank you very much! 
Rainer Barry Moore wrote: > Rainer, > > We use a perl library called CGL written by Mark Yandell and colleagues > (which in turn uses Chris Mungal's BioChaos and Unflattener.pm referred > to by Jason) for this type of task. The basic pipeline is convert > GenBank files to Chaos XML, then use CGL with those XML files to get a > nice object oriented access to exons, transcripts, proteins, > coordinates and more for of those genes. I am currently using this > with good success on most GenBank genomes (unfortunately I haven't been > working with the fungal genomes, but it should work fine). The Ensembl > API provides similar functionality for Ensembl genomes - but not very > many fungi there. > > http://www.yandell-lab.org/cgl/ > http://www.ensembl.org/info/software/core/core_tutorial.html > > Feel free to contact Mark or myself directly if you are interested in > using CGL. > > Barry > > On Jan 31, 2007, at 2:09 PM, Rainer Machne wrote: > >> Dear Bioperl list, >> >> hoping not be on the wrong email list, i would have a short question: >> >> Is there a standard way or are there nice (Bioperl) tools to come from a >> gene id (gi) other ids (see below) to the genomic coordinates of the >> respective gene? >> >> We have Fasta files retrieved from NCBI protein Blast in fungal genomes: >> >>> gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago >> >> maydis 521] >> or >> >>> gi|50292953|ref|XP_448909.1| unnamed protein product [Candida glabrata] >> >> >> (we only have gi, ref and gb in my set). >> >> I retrieved all my fasta files from whole fungal genomes with available >> protein sequences at >> http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=fungi >> >> As I only searched whole finished genomes (not shotgun), I thought it >> would then be easy to get the genomic coordinates and retrieve upstream >> sequences, but we have failed so far to find a consistent way to do this >> automatically. 
Many of the gi entries refer to mRNAs or partial mRNAs >> and the way to the coordinates seems to differ for each case. >> >> Any suggestions would be appreciated. >> >> with kind regards, >> Rainer Machne >> >> University of Vienna >> Department for Theoretical Chemistry >> Theoretical Biochemistry Group >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at uiuc.edu Thu Feb 1 12:55:27 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 1 Feb 2007 11:55:27 -0600 Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ? In-Reply-To: <45C1E2FD.3070709@tbi.univie.ac.at> References: <45C1059D.1070100@tbi.univie.ac.at> <45C1E2FD.3070709@tbi.univie.ac.at> Message-ID: On Feb 1, 2007, at 6:54 AM, Rainer Machne wrote: > Barry and Jason, > > thanks for your quick and very helpful replies. > > I guess we should have done (or repeat) our blast search at > http://fungal.genome.duke.edu/ > to get better mapping from proteins to genomes ? > > As I retrieved all my proteins via whole genome blasts we should find > (most of) them in the genbank files ... a good opportunity for me to > learn some Bioperl and the other packages you mentioned in case we > want > to do more complex analysis later :-) > > Thank you very much! > > Rainer If the data is available in GenBank you could run the BLAST searches at NCBI and limit the search with an Entrez query: http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#entrez_query Most (all?) genome files are tagged as complete I'm not sure but there might be a way of doing this via Bio::Tools::Run::RemoteBlast. Jason, any ideas? chris From cjfields at uiuc.edu Thu Feb 1 13:09:16 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 1 Feb 2007 12:09:16 -0600 Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ? 
In-Reply-To: References: <45C1059D.1070100@tbi.univie.ac.at> <45C1E2FD.3070709@tbi.univie.ac.at> Message-ID: <748CC48E-D224-4234-A5C4-E33968F17418@uiuc.edu> > If the data is available in GenBank you could run the BLAST searches > at NCBI and limit the search with an Entrez query: > > http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#entrez_query > > Most (all?) genome files are tagged as complete sorry, didn't finish that... "Most (all?) genome files are tagged as complete, wgs, in progress, etc. and can be limited by taxonomy using Fungi[ORGN] or similar." chris From jason at bioperl.org Thu Feb 1 13:36:02 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 1 Feb 2007 10:36:02 -0800 Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ? In-Reply-To: References: <45C1059D.1070100@tbi.univie.ac.at> <45C1E2FD.3070709@tbi.univie.ac.at> Message-ID: On Feb 1, 2007, at 9:55 AM, Chris Fields wrote: > > On Feb 1, 2007, at 6:54 AM, Rainer Machne wrote: > >> Barry and Jason, >> >> thanks for your quick and very helpful replies. >> >> I guess we should have done (or repeat) our blast search at >> http://fungal.genome.duke.edu/ >> to get better mapping from proteins to genomes ? >> Well I'm not quite sure of your exact goals. To find upstream regions of known genes, or look at upstream regions of orthologous genes? You can first figure out orthologs based on protein similarities, then go in an extract upstream regions for the orthologous genes (I provide a link to a big all-vs-all FASTA result at the bottom of the page if you want those results, as well as some pairiwise orthology assignments, although you may want more or less stringent parameters). All the GFF and AA data is freely available for download on the site for each genome we've annotated or for annotation we've re-formatted so you can do things locally and/or modify it to your liking. 
>> As I retrieved all my proteins via whole genome blasts we should find >> (most of) them in the genbank files ... a good opportunity for me to >> learn some Bioperl and the other packages you mentioned in case we >> want >> to do more complex analysis later :-) >> >> Thank you very much! >> >> Rainer > > If the data is available in GenBank you could run the BLAST > searches at NCBI and limit the search with an Entrez query: > > http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#entrez_query > > Most (all?) genome files are tagged as complete > > I'm not sure but there might be a way of doing this via > Bio::Tools::Run::RemoteBlast. Jason, any ideas? > > chris -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From reenayadav at gmail.com Thu Feb 1 13:38:03 2007 From: reenayadav at gmail.com (Reena Yadav) Date: Fri, 2 Feb 2007 00:08:03 +0530 Subject: [Bioperl-l] pdb parser Message-ID: <76f897dd0702011038v7afe0207gb05465478e026205@mail.gmail.com> hi need to extract pdb atomic coordinates (1ake), and do certain calculations. i am going stepwise: steps that involved are: (1) reading the atomic coordinates (2) read the result in a file. need to understand how to whole xyz line in another file. could someone help. R. From jason at bioperl.org Thu Feb 1 08:06:42 2007 From: jason at bioperl.org (sandhya khatal) Date: Thu, 1 Feb 2007 13:06:42 +0000 Subject: [Bioperl-l] Regarding Bioperl program Message-ID: <75899ED1-72C6-4272-8CAC-028CF133A0B4@gmail.com> Respected Sir, I want to do a program which gives dendrogram like UPGMA a clustering method, but i want this dendrogram by using single linkage or centroid method.Can u help me for this.U have given the code for tree but i want dendrogram as output by using above any method. Thanks for anticipating. Regards, Sandhya Khatal. 
From jason at bioperl.org Thu Feb 1 19:55:26 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 1 Feb 2007 16:55:26 -0800 Subject: [Bioperl-l] Fwd: Regarding Bioperl program References: <394d31ee0702010506j4bbd79dck41d5ac2162eaafdd@mail.gmail.com> Message-ID: <40020502-3421-407D-85EB-24F420AB699C@bioperl.org> re-forwarding Sandhya's email to the list so the email address is visible. The approach that is coded in bioperl is for distance based data such as evolutionary distance of DNA or protein sequences - I assume you are talking about clustering expression data? You may want to focus on the available literature and toolkits that focus on expression data - something BioPerl doesn't deliberately focus on right now. -jason Begin forwarded message: > From: "sandhya khatal" > Date: February 1, 2007 5:06:42 AM PST > To: jason at bioperl.org > Subject: Regarding Bioperl program > > Respected Sir, > I want to do a program which gives dendrogram > like > UPGMA a clustering method, but i want this dendrogram by using single > linkage or centroid method.Can u help me for this.U have given the > code for > tree but i want dendrogram as output by using above any method. > > Thanks for anticipating. > > Regards, > Sandhya Khatal. -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From lzhtom at hotmail.com Thu Feb 1 22:20:10 2007 From: lzhtom at hotmail.com (zhihua li) Date: Fri, 02 Feb 2007 03:20:10 +0000 Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file? Message-ID: _________________________________________________________________ ???????? MSN Explorer: http://explorer.msn.com/lccn/ From lzhtom at hotmail.com Thu Feb 1 22:27:39 2007 From: lzhtom at hotmail.com (zhihua li) Date: Fri, 02 Feb 2007 03:27:39 +0000 Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file? 
Message-ID: Sorry guys, the former empty mail was sent out by mistake. I'm using Bio::Index::Fasta to index a file containing lots of sequences in fasta format. All is fine except one thing. According to the bioperl tutorial and the documents, the following code will make an indexed file:

    use Bio::Index::Fasta;
    my $inx = Bio::Index::Fasta->new(-filename   => "test.fasta.idx",
                                     -write_flag => 1);
    $inx->make_index("test.fasta");

And in another script I can access the indexed file by saying

    use Bio::Index::Fasta;
    $ENV{BIOPERL_INDEX} = ".";          # find index in current directory
    my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx");
    my $seq = $inx->fetch("ent1001");   # fetch the sequence named ent1001

However, after running the first script, I cannot find a new file test.fasta.idx in my current directory. And not surprisingly, when I ran the second script, perl told me it couldn't find "test.fasta.idx". What's going on here? Thanks a lot! From jason at bioperl.org Fri Feb 2 01:24:44 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 1 Feb 2007 22:24:44 -0800 Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file? In-Reply-To: References: Message-ID: I don't think BIOPERL_INDEX does anything in the module, so that documentation is not quite right. The env variable is used in the scripts/index/bp_index and bp_fetch scripts, so maybe a cut+paste job went bad somewhere. You need to specify the full path you want with -filename; you can just prepend BIOPERL_INDEX to the filename like this: -filename => $ENV{BIOPERL_INDEX}."/$index" -jason On Feb 1, 2007, at 7:27 PM, zhihua li wrote: > Sorry guys, the former empty mail was sent out by mistake. > > I'm using Bio::index::Fasta to index a file containing lots of > sequences in fasta format. All is fine except one thing.
> > According to the bioperl tutorial and the documents, the following > code will make a indexed file: > > my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx", > -write_flag => 1); > $inx->make_index("test.fasta"); > > And in another script I can access the indexed file by sayinig > > $ENV{BIOPERL_INDEX} = "."; # find index in current directory > my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx"); > my $seq=$inx->fetch("ent1001"); #fetch the sequence named > ent1001 > > However, after running the first script, I cannot find a new file > test.fasta.idx in my current directory. And not surprisingly, when > I ran the second script, perl told me it couldn't find > "test.fasta.idx". > > What's going on here? > > Thanks a lot! > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From marian.thieme at lycos.de Fri Feb 2 05:06:09 2007 From: marian.thieme at lycos.de (marian thieme) Date: Fri, 2 Feb 2007 10:06:09 +0000 Subject: [Bioperl-l] seqDiff Message-ID: <101051013116870@lycos-europe.com> An HTML attachment was scrubbed... URL: From marian.thieme at lycos.de Fri Feb 2 06:37:05 2007 From: marian.thieme at lycos.de (marian thieme) Date: Fri, 2 Feb 2007 11:37:05 +0000 Subject: [Bioperl-l] susp. header Message-ID: <188661178024725@lycos-europe.com> An HTML attachment was scrubbed...
URL: From lubapardo at gmail.com Fri Feb 2 09:31:06 2007 From: lubapardo at gmail.com (Luba Pardo) Date: Fri, 2 Feb 2007 15:31:06 +0100 Subject: [Bioperl-l] Problem using Bio::DB::Query::GenBank; Message-ID: <58ff33550702020631l4e7bc59dmabcf8c72fa67a6d5@mail.gmail.com> Hello, (I am using bioperl-1.5.2_100, linux machine) I am trying to get the ids of a list of genes using the module Bio::DB::Query::GenBank. I have the following code:

    use Bio::DB::Query::GenBank;
    use strict;
    use warnings;

    open (READER_1,"list.txt") || die "\n I can't open the file READER_1 !!\n";
    my @a1=<READER_1>;
    close (READER_1);
    for (my $i=0; $i<=$#a1;$i=$i+1 ) {
        my @a1_s=split/\s+/,$a1[$i];
        my $query_string = ' Homo Sapiens[Organism] AND $a1[$i] ';
        my $query = Bio::DB::Query::GenBank->new(-db=>'Protein',
                                                 -query=>$query_string );
        my $count = $query->count;
        my @ids = $query->ids;
        print " gene: $a1[$i] first id is $ids[0] o no? \n";
    }

I want to tell the program to get all the genes contained in the file list.txt and to retrieve the ids from GenBank. However, the program gives me the following error:

    ------------EXCEPTION: Bio::Root::Exception -------------
    MSG: Id list has been truncated even after maxids requested
    STACK: Error::throw
    STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359
    STACK: Bio::DB::Query::WebQuery::_fetch_ids /usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:236
    STACK: Bio::DB::Query::WebQuery::ids /usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:200
    STACK: query.pl:27
    ------------------

Is that a problem if I try to use the $a1[$i] to replace the name of the gene? I thank you beforehand for the attention you may pay to this message. Regards, Luba Pardo From hlapp at gmx.net Fri Feb 2 10:44:02 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 2 Feb 2007 10:44:02 -0500 Subject: [Bioperl-l] susp. header In-Reply-To: <188661178024725@lycos-europe.com> References: <188661178024725@lycos-europe.com> Message-ID: You are sending HTML emails.
You should configure your mailer to ideally just send plain text. If you really must have fancy formatted emails (i.e., HTML-formatted emails), then configure it such that the mailer will send a plain text and a HTML version. (Many spam filters will flag email the body of which consists of only an HTML attachment.) -hilmar On Feb 2, 2007, at 6:37 AM, marian thieme wrote: > why each message I sent to this list is considered to have a susp. > header ? > > Marian > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cain.cshl at gmail.com Fri Feb 2 11:03:16 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Fri, 02 Feb 2007 11:03:16 -0500 Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml In-Reply-To: <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu> References: <1170359746.2706.622.camel@localhost.localdomain> <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu> Message-ID: <1170432196.2706.661.camel@localhost.localdomain> Hi Hilmar, That is a good idea; when I started down this road, it felt like there would only be a few things that I might want to allow to be different, but I think you are right that having one standard implementation that can be subclassed for legacy systems is a good thing.
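The layout discussed in this thread, a clean standards-following writer plus a thin site-specific subclass that overrides only what differs, can be sketched in plain Perl. The package names, the feature_type_cv() and describe_feature() methods, and the cv strings are all hypothetical placeholders, not the actual Bio::SeqIO::chadoxml or flybase_chadoxml API. The next_seq() stub reflects the point raised in the thread that a write-only SeqIO module should still provide the method and make it throw; inside BioPerl proper that would normally be $self->throw(...) rather than a bare die.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a standards-compliant chadoxml writer.
# Anything a site may need to change lives in a small overridable method.
package Sketch::ChadoXMLWriter;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Name of the controlled vocabulary used for feature types
# (placeholder value chosen for illustration).
sub feature_type_cv { return 'sequence' }

sub describe_feature {
    my ($self, $type) = @_;
    # Calls through $self, so a subclass override is picked up here
    return sprintf '<feature type="%s" cv="%s"/>', $type, $self->feature_type_cv;
}

# Write-only adapter: keep a next_seq() stub that fails loudly
# rather than leaving the method absent.
sub next_seq {
    my ($self) = @_;
    die ref($self) . "::next_seq is not implemented: write-only format\n";
}

# Hypothetical site-specific subclass: overrides only what differs.
package Sketch::ChadoXMLWriter::FlyBase;
our @ISA = ('Sketch::ChadoXMLWriter');

sub feature_type_cv { return 'SO' }    # legacy placeholder value

package main;
```

With this split, another organization with its own legacy conventions would add its own small subclass instead of adding flags to the shared base module.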
Scott On Fri, 2007-02-02 at 10:09 -0500, Hilmar Lapp wrote: > On Feb 1, 2007, at 2:55 PM, Scott Cain wrote: > > > The second main change was to introduce a -flybase_compat argument > > when > > initializing the Bio::SeqIO writer, so that 'old style' cv and cvterms > > (that are compatable with flybase) will be used, but now the default > > will be to use current standards: > > Just my $0.02 ... obviously, Flybase may be the only organization > that uses an 'old style' or any other way not compliant with 'current > standards' (presumably SO), but if it's not the only one then this > approach won't scale. > > Also, an argument -flybase_compat suggests to the unsuspecting that > this is an endorsed flavor of the standard and fine to use for > everyone else too. > > If Flybase is idiosyncratic in this way, why not make chadoxml.pm > compliant with the standard as we all want it, keep it free from > litter caused by usage of old versions of SO, and create a second > module fb-chadoxml.pm that inherits from the first and merely > overrides a few things so that it works for Flybase. This way, other > organizations with similar needs can follow the path and create their > own xyz-chadoxml.pm, rather than having to muck around in the > chadoxml.pm that comes with the distribution. > > I'm not sure I fully grasp the underlying issue, so I may not make > much sense here. Apologies if that's the case ... > > -hilmar -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From bosborne11 at verizon.net Fri Feb 2 10:27:44 2007 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 02 Feb 2007 10:27:44 -0500 Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml In-Reply-To: <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu> Message-ID: Hilmar, I second your motion, good idea. Let's keep the standard module nice and clean. Brian O. On 2/2/07 10:09 AM, "Hilmar Lapp" wrote: > and create a second > module fb-chadoxml.pm that inherits from the first and merely > overrides a few things so that it works for Flybase From Kevin.M.Brown at asu.edu Fri Feb 2 10:52:20 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Fri, 2 Feb 2007 08:52:20 -0700 Subject: [Bioperl-l] Problem using Bio::DB::Query::GenBank; References: <58ff33550702020631l4e7bc59dmabcf8c72fa67a6d5@mail.gmail.com> Message-ID: <1A4207F8295607498283FE9E93B775B402AABA1C@EX02.asurite.ad.asu.edu> It looks like you have some problems with the code you posted.

    use Bio::DB::Query::GenBank;
    use strict;
    use warnings;

    open (READER_1,"list.txt") || die "\n I can't open the file READER_1 !!\n";
    my @a1=<READER_1>;
    close (READER_1);
    chomp @a1;    # drop the trailing newline from each gene name
    for (my $i=0; $i < @a1;$i++ ) {
        # is this necessary? You don't seem to use it anywhere later in your code.
        my @a1_s=split/\s+/,$a1[$i];
        # you enclosed the variable in '' which means perl won't evaluate it
        # changed the query so that perl can evaluate the variable
        my $query_string = ' Homo Sapiens[Organism] AND '.$a1[$i] .' ';
        my $query = Bio::DB::Query::GenBank->new(-db=>'Protein',
                                                 -query=>$query_string );
        my $count = $query->count;
        my @ids = $query->ids;
        print " gene: $a1[$i] first id is $ids[0] o no? \n";
    }

-----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Luba Pardo Sent: Friday, February 02, 2007 7:31 AM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Problem using Bio::DB::Query::GenBank; Hello, (I am using bioperl-1.5.2_100, linux machine) I am trying to get the ids of a list of genes using the module Bio::DB::Query:GenBank. I have the following code: use Bio::DB::Query::GenBank; use strict; use warnings; open (READER_1,"list.txt") || die "\n I can't open the file READER_1 !!\n"; my @a1=<READER_1>; close (READER_1); for (my $i=0; $i<=$#a1;$i=$i+1 ) { my @a1_s=split/\s+/,$a1[$i]; my $query_string = ' Homo Sapiens[Organism] AND $a1[$i] '; my $query = Bio::DB::Query::GenBank->new(-db=>'Protein', -query=>$query_string ); my $count = $query->count; my @ids = $query->ids; print " gene: $a1[$i] first id is $ids[0] o no? \n"; I want to tell the program to get all the genes contained in the file list.txt and to retrieve the ids from GenBank. However the program gives me the following error: ------------EXCEPTION: Bio::Root::Exception ------------- MSG: Id list has been truncated even after maxids requested STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359 STACK: Bio::DB::Query::WebQuery::_fetch_ids /usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:236 STACK: Bio::DB::Query::WebQuery::ids /usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:200 STACK: query.pl:27 ------------------ Is that a problem if I try to use the $a1[$i] to replace the name of the gene?
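Building on Kevin's corrected code, a minimal sketch of the same loop with the query string built in one small, testable subroutine. Assumptions beyond what the thread states: list.txt holds one gene name per line, each line is chomped and trimmed before use (a line read with <$fh> still ends in a newline), and the names build_query and protein_ids_for are illustrative, not part of BioPerl. Bio::DB::Query::GenBank is loaded with require only when a lookup actually runs.

```perl
use strict;
use warnings;

# Build the Entrez query for one gene name taken from list.txt.
# Lines read with <$fh> keep their trailing "\n", so chomp and trim first;
# double quotes make Perl interpolate $gene (single quotes would not).
sub build_query {
    my ($gene) = @_;
    chomp $gene;
    $gene =~ s/^\s+|\s+$//g;
    return "Homo sapiens[Organism] AND $gene";
}

# Run one query; BioPerl is only loaded when a lookup actually happens.
sub protein_ids_for {
    my ($gene) = @_;
    require Bio::DB::Query::GenBank;
    my $query = Bio::DB::Query::GenBank->new(
        -db    => 'protein',
        -query => build_query($gene),
    );
    return ($query->count, $query->ids);
}

# Usage: perl query.pl list.txt   (one gene name per line)
if (my $list = shift @ARGV) {
    open my $fh, '<', $list or die "Cannot open $list: $!";
    while (my $gene = <$fh>) {
        chomp $gene;
        next unless $gene =~ /\S/;    # skip blank lines
        my ($count, @ids) = protein_ids_for($gene);
        printf "gene: %s  hits: %d  first id: %s\n",
            $gene, $count, (@ids ? $ids[0] : 'none');
    }
    close $fh;
}
```

Printing the hit count for each gene first makes it easy to spot a query that is much broader than intended before fetching its ids.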
I thank before hand for the attention you may pay to this message Regards, Luba Pardo _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri Feb 2 11:37:49 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 2 Feb 2007 10:37:49 -0600 Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml In-Reply-To: <1170432196.2706.661.camel@localhost.localdomain> References: <1170359746.2706.622.camel@localhost.localdomain> <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu> <1170432196.2706.661.camel@localhost.localdomain> Message-ID: <64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu> I was going to suggest maybe allowing one to switch out XML handlers/ writers based on the style (ala XML::SAX), but I see that chadoxml currently uses XML::Writer and there is no next_seq() implemented. Oh well... chris On Feb 2, 2007, at 10:03 AM, Scott Cain wrote: > Hi Hilmar, > > That is a good idea; when I started down this road, it felt like there > would only be a few things that I might want to allow to be different, > but I think you are right that having one standard implementation that > can be subclassed for legacy systems is a good thing. > > Scott > > > On Fri, 2007-02-02 at 10:09 -0500, Hilmar Lapp wrote: >> On Feb 1, 2007, at 2:55 PM, Scott Cain wrote: >> >>> The second main change was to introduce a -flybase_compat argument >>> when >>> initializing the Bio::SeqIO writer, so that 'old style' cv and >>> cvterms >>> (that are compatable with flybase) will be used, but now the default >>> will be to use current standards: >> >> Just my $0.02 ... obviously, Flybase may be the only organization >> that uses an 'old style' or any other way not compliant with 'current >> standards' (presumably SO), but if it's not the only one then this >> approach won't scale. 
>> >> Also, an argument -flybase_compat suggests to the unsuspecting that >> this is an endorsed flavor of the standard and fine to use for >> everyone else too. >> >> If Flybase is idiosyncratic in this way, why not make chadoxml.pm >> compliant with the standard as we all want it, keep it free from >> litter caused by usage of old versions of SO, and create a second >> module fb-chadoxml.pm that inherits from the first and merely >> overrides a few things so that it works for Flybase. This way, other >> organizations with similar needs can follow the path and create their >> own xyz-chadoxml.pm, rather than having to muck around in the >> chadoxml.pm that comes with the distribution. >> >> I'm not sure I fully grasp the underlying issue, so I may not make >> much sense here. Apologies if that's the case ... >> >> -hilmar > -- > ---------------------------------------------------------------------- > -- > Scott Cain, Ph. D. > cain.cshl at gmail.com > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From hlapp at gmx.net Fri Feb 2 11:45:30 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 2 Feb 2007 11:45:30 -0500 Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml In-Reply-To: <64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu> References: <1170359746.2706.622.camel@localhost.localdomain> <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu> <1170432196.2706.661.camel@localhost.localdomain> <64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu> Message-ID: <3A3FF1B0-129A-410D-9DE2-D1CB28015C9A@gmx.net> There must be at least a stub for next_seq(). It may throw a not- implemented exception, but it should not just be absent. 
-hilmar On Feb 2, 2007, at 11:37 AM, Chris Fields wrote: > I was going to suggest maybe allowing one to switch out XML > handlers/writers based on the style (ala XML::SAX), but I see that > chadoxml currently uses XML::Writer and there is no next_seq() > implemented. Oh well... > > chris > > On Feb 2, 2007, at 10:03 AM, Scott Cain wrote: > >> Hi Hilmar, >> >> That is a good idea; when I started down this road, it felt like >> there >> would only be a few things that I might want to allow to be >> different, >> but I think you are right that having one standard implementation >> that >> can be subclassed for legacy systems is a good thing. >> >> Scott >> >> >> On Fri, 2007-02-02 at 10:09 -0500, Hilmar Lapp wrote: >>> On Feb 1, 2007, at 2:55 PM, Scott Cain wrote: >>> >>>> The second main change was to introduce a -flybase_compat argument >>>> when >>>> initializing the Bio::SeqIO writer, so that 'old style' cv and >>>> cvterms >>>> (that are compatable with flybase) will be used, but now the >>>> default >>>> will be to use current standards: >>> >>> Just my $0.02 ... obviously, Flybase may be the only organization >>> that uses an 'old style' or any other way not compliant with >>> 'current >>> standards' (presumably SO), but if it's not the only one then this >>> approach won't scale. >>> >>> Also, an argument -flybase_compat suggests to the unsuspecting that >>> this is an endorsed flavor of the standard and fine to use for >>> everyone else too. >>> >>> If Flybase is idiosyncratic in this way, why not make chadoxml.pm >>> compliant with the standard as we all want it, keep it free from >>> litter caused by usage of old versions of SO, and create a second >>> module fb-chadoxml.pm that inherits from the first and merely >>> overrides a few things so that it works for Flybase. 
This way, other >>> organizations with similar needs can follow the path and create >>> their >>> own xyz-chadoxml.pm, rather than having to muck around in the >>> chadoxml.pm that comes with the distribution. >>> >>> I'm not sure I fully grasp the underlying issue, so I may not make >>> much sense here. Apologies if that's the case ... >>> >>> -hilmar >> -- >> --------------------------------------------------------------------- >> --- >> Scott Cain, Ph. D. >> cain.cshl at gmail.com >> GMOD Coordinator (http://www.gmod.org/) >> 216-392-3087 >> Cold Spring Harbor Laboratory >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cain.cshl at gmail.com Fri Feb 2 12:02:32 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Fri, 02 Feb 2007 12:02:32 -0500 Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml In-Reply-To: <3A3FF1B0-129A-410D-9DE2-D1CB28015C9A@gmx.net> References: <1170359746.2706.622.camel@localhost.localdomain> <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu> <1170432196.2706.661.camel@localhost.localdomain> <64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu> <3A3FF1B0-129A-410D-9DE2-D1CB28015C9A@gmx.net> Message-ID: <1170435752.2706.676.camel@localhost.localdomain> Ah, I'll go ahead and add one, though it will just throw an exception because this is a write-only adapter. Scott On Fri, 2007-02-02 at 11:45 -0500, Hilmar Lapp wrote: > There must be at least a stub for next_seq(). It may throw a not- > implemented exception, but it should not just be absent. 
> > -hilmar > > On Feb 2, 2007, at 11:37 AM, Chris Fields wrote: > > > I was going to suggest maybe allowing one to switch out XML > > handlers/writers based on the style (ala XML::SAX), but I see that > > chadoxml currently uses XML::Writer and there is no next_seq() > > implemented. Oh well... > > > > chris > > > > On Feb 2, 2007, at 10:03 AM, Scott Cain wrote: > > > >> Hi Hilmar, > >> > >> That is a good idea; when I started down this road, it felt like > >> there > >> would only be a few things that I might want to allow to be > >> different, > >> but I think you are right that having one standard implementation > >> that > >> can be subclassed for legacy systems is a good thing. > >> > >> Scott > >> > >> > >> On Fri, 2007-02-02 at 10:09 -0500, Hilmar Lapp wrote: > >>> On Feb 1, 2007, at 2:55 PM, Scott Cain wrote: > >>> > >>>> The second main change was to introduce a -flybase_compat argument > >>>> when > >>>> initializing the Bio::SeqIO writer, so that 'old style' cv and > >>>> cvterms > >>>> (that are compatable with flybase) will be used, but now the > >>>> default > >>>> will be to use current standards: > >>> > >>> Just my $0.02 ... obviously, Flybase may be the only organization > >>> that uses an 'old style' or any other way not compliant with > >>> 'current > >>> standards' (presumably SO), but if it's not the only one then this > >>> approach won't scale. > >>> > >>> Also, an argument -flybase_compat suggests to the unsuspecting that > >>> this is an endorsed flavor of the standard and fine to use for > >>> everyone else too. > >>> > >>> If Flybase is idiosyncratic in this way, why not make chadoxml.pm > >>> compliant with the standard as we all want it, keep it free from > >>> litter caused by usage of old versions of SO, and create a second > >>> module fb-chadoxml.pm that inherits from the first and merely > >>> overrides a few things so that it works for Flybase. 
This way, other > >>> organizations with similar needs can follow the path and create > >>> their > >>> own xyz-chadoxml.pm, rather than having to muck around in the > >>> chadoxml.pm that comes with the distribution. > >>> > >>> I'm not sure I fully grasp the underlying issue, so I may not make > >>> much sense here. Apologies if that's the case ... > >>> > >>> -hilmar > >> -- > >> --------------------------------------------------------------------- > >> --- > >> Scott Cain, Ph. D. > >> cain.cshl at gmail.com > >> GMOD Coordinator (http://www.gmod.org/) > >> 216-392-3087 > >> Cold Spring Harbor Laboratory > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > Christopher Fields > > Postdoctoral Researcher > > Lab of Dr. Robert Switzer > > Dept of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From peili at morgan.harvard.edu Fri Feb 2 10:56:56 2007 From: peili at morgan.harvard.edu (Peili Zhang) Date: Fri, 02 Feb 2007 10:56:56 -0500 Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml In-Reply-To: References: Message-ID: <1170431816.6583.47.camel@jacks> i 'third' Hilmar's opinion. flybase's fingerprint is shown in the module because i wrote it for fb's data loading task. no need to worry about flybase compatibility in making the module generic. in fact, at flybase, i tweak the module frequently to make it work for different scenarios. 
cheers, peili On Fri, 2007-02-02 at 10:27, Brian Osborne wrote: > Hilmar, > > I second your motion, good idea. Let's keep the standard module nice and > clean. > > Brian O. > > > On 2/2/07 10:09 AM, "Hilmar Lapp" wrote: > > > and create a second > > module fb-chadoxml.pm that inherits from the first and merely > > overrides a few things so that it works for Flybase > > > > _______________________________________________ > Gmod-schema mailing list > Gmod-schema at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-schema > From cain.cshl at gmail.com Fri Feb 2 13:05:47 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Fri, 02 Feb 2007 13:05:47 -0500 Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml In-Reply-To: <1170431816.6583.47.camel@jacks> References: <1170431816.6583.47.camel@jacks> Message-ID: <1170439549.2706.683.camel@localhost.localdomain> Hi Peili, A little bit ago I checked in Bio::SeqIO::flybase_chadoxml that is fairly simple. My suggestion is that when you make tweaks for different scenarios, that you turn the things you are tweaking into methods in BSIO::chadoxml and then override them in flybase_chadoxml (and commit at least the chadoxml module) to make it more flexible when other people have similar scenarios. Scott On Fri, 2007-02-02 at 10:56 -0500, Peili Zhang wrote: > i 'third' Hilmar's opinion. flybase's fingerprint is shown in the module > because i wrote it for fb's data loading task. no need to worry about > flybase compatibility in making the module generic.
in fact, at flybase, > i tweak the module frequently to make it work for different scenarios. > > cheers, > peili > > On Fri, 2007-02-02 at 10:27, Brian Osborne wrote: > > Hilmar, > > > > I second your motion, good idea. Let's keep the standard module nice and > > clean. > > > > Brian O. > > > > > > On 2/2/07 10:09 AM, "Hilmar Lapp" wrote: > > > > > and create a second > > > module fb-chadoxml.pm that inherits from the first and merely > > > overrides a few things so that it works for Flybase > > > > > > > > _______________________________________________ > > Gmod-schema mailing list > > Gmod-schema at lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/gmod-schema > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From cjfields at uiuc.edu Fri Feb 2 15:33:46 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 2 Feb 2007 14:33:46 -0600 Subject: [Bioperl-l] seqDiff In-Reply-To: <101051013116870@lycos-europe.com> References: <101051013116870@lycos-europe.com> Message-ID: Judging by the code you'll have to recreate the SeqDiff while iterating through various alleles; there is no method to remove particular variants or purge them (at least I couldn't find one). I also noticed SeqDiff doesn't support deletions/insertions either; using a null allele (no seq) or leaving out either the mutant or original allele leads to errors. I'll look into the latter, and I may try to add a method to at least purge variants and reset dna_mut(). chris On Feb 2, 2007, at 4:06 AM, marian thieme wrote: > HI, > > is there a way to put out all mutated sequences of a seqdiff object ? > Suppose I add some variants via: > > $dnamut->add_Allele($a2); > $dnamut->add_Allele($a3); > $seqDiff->add_Variant($dnamut); > > and afterwards want to access the alternative sequences via > $seqDiff->dna_mut() > > which allele is choosen when using dna_mut(), respective can I > control to access the first or the second alternate sequence ? > If yes, how can I do this ? > > Regards, > Marian > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From MEC at stowers-institute.org Fri Feb 2 16:47:08 2007 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Fri, 2 Feb 2007 15:47:08 -0600 Subject: [Bioperl-l] Bio::DB::SeqFeature treamtent of tags and annotations Message-ID: Lincoln, I don't think that adding this directive is a good idea after all either. But I see that you remap the ID= to a load_id attribute, which is preserved in the Bio::DB::SeqFeature::Store database, and then it gets squelched during GFF production by NormalizedFeature::format_attributes. However, if ID is prone to clashes, then simply renaming the attribute to load_id certainly does not preclude clashes from happening, and only courts disaster. Don't you think? I'm a little blurry on the GFF3Loader, but it looks like you're using load_id to facilitate loading parent/child features out of order. Is that right? If so, I suggest you delete all load_id attributes immediately after performing a load. They have no further use. Or, instead of a `round-trip-ids` directive, you might consider giving the GFF3Loader an IDAttribute option which would allow the loader to preserve the ID values, but to use a named In my case, munging flybase gff, I would then use it like this:

    bp_seqfeature_load.PLS --fast --IDAttribute flybaseID

which would preserve the ID values in the database but under the FlybaseID attribute for features so loaded. --------------------------------------------- On a related topic: I just committed this patch to Bio::DB::SeqFeature::NormalizedFeature _create_subfeatures: ensure that subfeatures get the `source` of their parent. While doing this I wondered: what is the -class that subfeatures are getting from their parent...??? I left it in place. Please advise! Fix my thinking.... ---------------------------------------------- Further, I observe that Bio::Graphics::FeatureBase::new handles the -segments option by calling add_segment.
So, when I create a new Bio::DB::SeqFeature with -segments => [[100,200],[300,400]], the -segments option gets handled by Bio::Graphics::FeatureBase::new, which, as mentioned, calls add_segment. The surprising thing to me, when trying to trace through the class modules and understand what is going on, is that what gets run at this point is not Bio::Graphics::FeatureBase::add_segment, but rather Bio::DB::SeqFeature::add_segment, whose semantics differ in at least one regard, namely, that it does not set the start and stop of the parent feature from the min and max of the segments. I have committed a patch to Bio::Graphics::FeatureBase with a comment to this effect, and have also patched its add_segment method to copy the parent's source into the segment. I hope my commits and suggestions further the cause. Let me know if not! -- Malcolm ________________________________ From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein Sent: Tuesday, January 30, 2007 4:46 PM To: Cook, Malcolm Cc: bioperl list; lstein at cshl.org Subject: Re: Bio::DB::SeqFeature treamtent of tags and annotations I've fixed the first issue in CVS. Sorry for the inconsistency. add_tag_value(), delete_tag_value() and get_Annotations() now all work as expected. The problem with the ID column is that it is supposed to be LOCAL to the GFF3 file and is not intended to be stored in the database. In contrast, Name can survive round-tripping. Perhaps the thing to do is to add a flag to the GFF3 file that turns on ID round-tripping, e.g.

    ##round-trip-ids: 1

If you like this idea, I can implement it. Lincoln On 1/29/07, Cook, Malcolm < MEC at stowers-institute.org > wrote: Lincoln, Thanks for your suggestions on how to approach my problems augmenting Flybase annotation. I am trying to follow them and am finding the following oddities. The first issue relates to the intermix of 'annotations' and 'tag values'.
I find that Bio::DB::SeqFeature implements some of the 'tag' methods and some of the 'Annotation' methods. Here is a Perl one-liner showing that values stored using add_tag_value are not retrieved with get_tag_values, but rather with get_Annotations:

  > perl -MBio::DB::SeqFeature -e 'my $f=Bio::DB::SeqFeature->new; $f->add_tag_value("x",666); print "get_tag_values:\t" . $f->get_tag_values("x") . "\nget_Annotations:\t" . $f->get_Annotations("x");'

whose output is:

  get_tag_values:
  get_Annotations:        666

Tracing this shows me that it results from the fact that Bio::DB::SeqFeature uses Bio::Graphics::FeatureBase (via Bio::DB::SeqFeature::NormalizedFeature), which does not support -tags in ->new but rather -attributes, viz:

  -attributes   a hashref of tag value attributes, in which the key is the
                tag and the value is an array reference of values

And though Bio::Graphics::FeatureBase purports to implement Bio::SeqFeatureI, it only partially implements the 'tag' methods (now deprecated and relegated to Bio::AnnotatableI). In particular, the Bio::SeqFeatureI methods marked '*' below are not implemented in Bio::Graphics::FeatureBase:

    has_tag
  * add_tag_value
    get_tag_values
    get_all_tags
  * remove_tag
    get_tagset_values
    get_Annotations

As a result, add_tag_value and remove_tag are inherited from different modules whose understanding of tags is not the same! This one-liner:

  > perl -MClass::ISA -MClass::Inspector -MBio::DB::SeqFeature -e 'my @c = Class::ISA::self_and_super_path("Bio::DB::SeqFeature"); foreach my $fn (qw(add_tag_value get_tag_values)) {print "\n$fn:\t", join "\t", (grep {Class::Inspector->function_exists($_, $fn)} @c)}'

confirms that they are defined in different packages, namely:

  add_tag_value:  Bio::AnnotatableI
  get_tag_values: Bio::Graphics::FeatureBase  Bio::AnnotatableI

Proposed solution... hmmmm..... I dunno....
Maybe the following patch to Bio::Graphics::FeatureBase->add_tag_value:

  sub add_tag_value {
    my ($self, $tag, @vals) = @_;
    push @{$self->{attributes}{$tag}}, @vals;
  }

It fixes my use case for now, but I'm still concerned and confused about this variety of methods. Suggestions?

-------------------------------------------------------------------------

Also, I think that any "ID" in column 9 of a GFF3 flat file should be preserved through a round-trip through a Bio::DB::SeqFeature store, but this is not yet possible, since any ID attribute in GFF3 column 9 is being lost by GFF3Loader. This caused me to locally patch the GFF3Loader::handle_feature method to add the following:

  # mec at stowers-institute.org, wondering why not all attributes are
  # carried forward, adds ID tag in particular service of
  # round-tripping ID, which, though present in database as load_id
  # attribute, was getting lost as itself
  $unreserved->{ID} = $reserved->{ID} if exists $reserved->{ID};

Poised to patch.... what d'you think?

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri

________________________________

From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
Sent: Tuesday, December 19, 2006 3:58 PM
To: Cook, Malcolm
Cc: bioperl list; lstein at cshl.org
Subject: Re: bp_seqfeature_load / Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting Flybase annotation

Hi Malcolm,

Your second guess was right. The use case of augmenting an existing gene with additional splice forms isn't provided for.
You can get the functionality by making direct calls to Bio::DB::SeqFeature::Store methods:

  my @genes = $db->get_features_by_name('FBgn0017545');
  @genes == 1 or die "Didn't get exactly one gene";
  my $parent = $genes[0];

  my $chr    = $parent->seq_id;
  my $start  = $parent->start;
  my $end    = $parent->end;
  my $strand = $parent->strand;

  # new transcript, placed on the same sequence as the parent gene
  my $new_splice_form = $db->new_feature(-primary_tag => 'mRNA',
                                         -source      => 'added',
                                         -seq_id      => $chr,
                                         -strand      => $strand,
                                         -start       => $start+10,
                                         -end         => $end,
                                        );
  $parent->add_SeqFeature($new_splice_form);

  # attach two exons to the new splice form
  for my $pos ([$start+10,$start+100],[$start+200,$end]) {
    my ($e_start,$e_end) = @$pos;
    my $exon = Bio::DB::SeqFeature->new(-primary_tag => 'exon',
                                        -store       => $db,
                                        -seq_id      => $chr,
                                        -strand      => $strand,
                                        -start       => $e_start,
                                        -end         => $e_end);
    $new_splice_form->add_SeqFeature($exon);
  }

I found a bug in updating the seqfeature database when I wrote this script, so you'll have to get the latest bioperl-live.

I think you can use this to write a splice form updating script. In order to support the idea of adding new splice forms to an existing gene using the GFF3 format, I will have to either modify the loader or write a separate script (probably better to do the latter). It shouldn't be hard if you'd like to give it a try.
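As a possible first step toward the separate script Lincoln suggests, a hypothetical pre-flight check (not part of bp_seqfeature_load.PLS or GFF3Loader) could list the Parent IDs that a GFF3 file references but does not itself define, i.e. features pointing at genes loaded in an earlier run, which is the situation in Malcolm's message quoted below. This is a minimal pure-Perl sketch assuming well-formed, tab-separated 9-column GFF3; it does not unescape URL-encoded attribute values, and the sample records are made up.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the Parent IDs referenced
# in a GFF3 stream that are not defined by an ID= attribute in that stream.
# Assumes tab-separated 9-column lines; attribute values are not unescaped.
sub external_parents {
    my ($fh) = @_;
    my (%defined, %referenced);
    while (my $line = <$fh>) {
        next if $line =~ /^#/ || $line !~ /\S/;   # skip directives/blank lines
        chomp $line;
        my @cols = split /\t/, $line;
        next unless @cols == 9;
        for my $attr (split /;/, $cols[8]) {
            my ($tag, $value) = split /=/, $attr, 2;
            next unless defined $value;
            if ($tag eq 'ID') {
                $defined{$value} = 1;
            }
            elsif ($tag eq 'Parent') {
                # Parent may hold a comma-separated list of IDs
                $referenced{$_} = 1 for split /,/, $value;
            }
        }
    }
    my @external = grep { !$defined{$_} } keys %referenced;
    return sort @external;
}

# Made-up example: an mRNA whose parent gene was loaded in an earlier run
my $gff = join '', map { "$_\n" }
    join("\t", qw(ctg1 added mRNA 100 900 . + .), 'ID=tx1;Parent=FBgn0017545'),
    join("\t", qw(ctg1 added exon 100 200 . + .), 'Parent=tx1');
open my $fh, '<', \$gff or die "cannot open in-memory GFF: $!";
print join(',', external_parents($fh)), "\n";    # prints FBgn0017545
```

Features whose Parent appears in this list are the ones that would need the direct Store calls shown above rather than a plain incremental load.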
Lincoln

On 12/19/06, Cook, Malcolm wrote:

Lincoln and fellow Bio::DB::SeqFeature travelers,

I find that using bp_seqfeature_load.PLS to load subfeatures of genes already loaded using bp_seqfeature_load.PLS fails with

------------- EXCEPTION -------------
MSG: FBgn0017545 doesn't have a primary id
STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682
STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663
STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372
STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345
STACK Bio::DB::SeqFeature::Store::GFF3Loader::load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242
STACK toplevel /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_load.PLS:76

where FBgn0017545 is the ID of a previously loaded gene.

I am unsure how to remedy my situation and welcome any advice on a correct or improved approach to my problem.

Here's some detail if it helps. I am developing a pipeline to design microarray probes capable of distinguishing among splice variants in Drosophila (using the latest Flybase dmel_r5.1 annotation). So I:

1) load a filtered selection of Flybase annotation using bp_seqfeature_load (for testing purposes, I am using a single gene's worth of annotation, FBgn0017545.gff, attached). This is done as follows:

  > bp_seqfeature_load.PLS --create FBgn0017545.gff

2) analyze all the genes in the database, and create GFF3 output, each feature of which has a 'Parent' that is a previously loaded gene (i.e. FBgn0017545). (These features represent the unique introns, splice sites, and exonic design targets. Output of this analysis, FBgn0017545_matd.gff, is also attached.)

3) load these analysis results into the same database, as follows:

  > bp_seqfeature_load.PLS FBgn0017545_matd.gff

It is at this point that I get the above error.

However, I don't get any error and the data loads fine if I load the two files together, as follows:

  > bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff FBgn0017545_matd.gff)

So, I suspect that either I am misunderstanding when/how to use bp_seqfeature_load.PLS, or else this use case has not yet arisen and must be provided for somehow.

I am running against bioperl-live.

Thanks for your thoughts and assistance,

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From neha_bafs at yahoo.co.in Mon Feb 5 12:59:03 2007
From: neha_bafs at yahoo.co.in (Neha Nahar)
Date: Mon, 5 Feb 2007 17:59:03 +0000 (GMT)
Subject: [Bioperl-l] Convert newick to nexus format
Message-ID: <20070205175903.30379.qmail@web8701.mail.in.yahoo.com>

Hello everyone,

I am trying to convert a Newick tree to Nexus format.
Using this script (adapted from an email from George dated Wed Sep 22 11:52:47 EDT 2004):

/*------------------------------------------------------------*/

$ cat nexus.pl
#!/usr/bin/perl -w

use Bio::TreeIO;

($NEWICKFILE, $NEXUSFILE) = @ARGV;
print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n";
my $treeio = new Bio::TreeIO(-format => 'newick', -file => "$NEWICKFILE");
my $treeout = new Bio::TreeIO(-format => 'nexus', -file => ">$NEXUSFILE");
while(my $tree = $treeio->next_tree) {
    $treeout->write_tree($treeout);
}

exit 0;

/*------------------------------------------------------------*/

Running the script from the command line gives the following error:

$ ./nexus.pl mrp-input.txt nexus.out
newickfile=mrp-input.txt, nexusfile=nexus.out

------------- EXCEPTION -------------
MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
STACK toplevel ./nexus.pl:23

--------------------------------------

I am using the bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm

Questions:

1. Please let me know if I am using the correct version. If not, please point me to the latest one.

2. Provided that the version I am using is the right one, please let me know what is wrong with the script.

Thank you.
Regards,
Neha.

-Neha Nahar
" Work for cause and not for applause, live to express and not to impress !"

From jason at bioperl.org Mon Feb 5 13:10:42 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Feb 2007 10:10:42 -0800
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <20070205175903.30379.qmail@web8701.mail.in.yahoo.com>
References: <20070205175903.30379.qmail@web8701.mail.in.yahoo.com>
Message-ID: <46219DCD-8C6E-4DBE-82F2-D4B58207AD54@bioperl.org>

you want to write the TREE out not the TREE WRITER.
  $treeout->write_tree($tree)

not

  $treeout->write_tree($treeout);

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/

From nehadnahar at yahoo.co.in Mon Feb 5 13:05:26 2007
From: nehadnahar at yahoo.co.in (Neha Nahar)
Date: Mon, 5 Feb 2007 18:05:26 +0000 (GMT)
Subject: [Bioperl-l] Convert newick to nexus format
Message-ID: <288335.22352.qm@web8412.mail.in.yahoo.com>

Hello everyone,

I am trying to convert a Newick tree to Nexus format.
Using this script (adapted from an email from George dated Wed Sep 22 11:52:47 EDT 2004):

/*------------------------------------------------------------*/

$ cat nexus.pl
#!/usr/bin/perl -w

use Bio::TreeIO;

($NEWICKFILE, $NEXUSFILE) = @ARGV;
print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n";
my $treeio = new Bio::TreeIO(-format => 'newick', -file => "$NEWICKFILE");
my $treeout = new Bio::TreeIO(-format => 'nexus', -file => ">$NEXUSFILE");
while(my $tree = $treeio->next_tree) {
    $treeout->write_tree($treeout);
}

exit 0;

/*------------------------------------------------------------*/

Running the script from the command line gives the following error:

$ ./nexus.pl mrp-input.txt nexus.out
newickfile=mrp-input.txt, nexusfile=nexus.out

------------- EXCEPTION -------------
MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
STACK toplevel ./nexus.pl:23

--------------------------------------

I am using the bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm

Questions:

1. Please let me know if I am using the correct version. If not, please point me to the latest one.

2. Provided that the version I am using is the right one, please let me know what is wrong with the script.

Thank you.
Regards,
Neha.

-Neha Nahar
" Work for cause and not for applause, live to express and not to impress !"
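For comparison with the Bio::TreeIO approach discussed in this thread, here is a minimal hand-rolled sketch of the target format. It is not Bio::TreeIO::nexus and does not attempt to reproduce that module's output; it only wraps each Newick tree in a bare NEXUS TREES block, which may help when checking roughly what the output file should contain. It assumes every tree ends with ';' and that no quoted label contains ';', and the tree names (tree1, tree2, ...) are made up.

```perl
use strict;
use warnings;

# Hand-rolled sketch, NOT Bio::TreeIO::nexus: wraps Newick strings in a bare
# NEXUS TREES block. Assumes every tree ends with ';' and that no quoted
# label contains a ';'. The tree names tree1, tree2, ... are made up.
sub newick_to_nexus {
    my ($newick_text) = @_;
    my @trees = grep { /\S/ } split /;/, $newick_text;
    my $out = "#NEXUS\n\nBEGIN TREES;\n";
    my $i = 0;
    for my $tree (@trees) {
        $tree =~ s/^\s+|\s+$//g;    # trim surrounding whitespace/newlines
        $i++;
        $out .= "\tTREE tree$i = $tree;\n";
    }
    $out .= "END;\n";
    return $out;
}

print newick_to_nexus("((A,B),C);\n(A,(B,C));\n");
```

Splitting on ';' is the simplifying assumption here; a real Newick parser (such as the one behind Bio::TreeIO::newick) has to cope with quoted labels and comments.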
From hlapp at duke.edu Fri Feb 2 10:09:57 2007
From: hlapp at duke.edu (Hilmar Lapp)
Date: Fri, 2 Feb 2007 10:09:57 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <1170359746.2706.622.camel@localhost.localdomain>
References: <1170359746.2706.622.camel@localhost.localdomain>
Message-ID: <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>

On Feb 1, 2007, at 2:55 PM, Scott Cain wrote:

> The second main change was to introduce a -flybase_compat argument when
> initializing the Bio::SeqIO writer, so that 'old style' cv and cvterms
> (that are compatible with flybase) will be used, but now the default
> will be to use current standards:

Just my $0.02 ... obviously, Flybase may be the only organization that uses an 'old style' or any other way not compliant with 'current standards' (presumably SO), but if it's not the only one then this approach won't scale. Also, an argument -flybase_compat suggests to the unsuspecting that this is an endorsed flavor of the standard and fine to use for everyone else too.

If Flybase is idiosyncratic in this way, why not make chadoxml.pm compliant with the standard as we all want it, keep it free from litter caused by usage of old versions of SO, and create a second module fb-chadoxml.pm that inherits from the first and merely overrides a few things so that it works for Flybase? This way, other organizations with similar needs can follow the path and create their own xyz-chadoxml.pm, rather than having to muck around in the chadoxml.pm that comes with the distribution.

I'm not sure I fully grasp the underlying issue, so I may not make much sense here. Apologies if that's the case ...
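Hilmar's suggestion amounts to keeping the standards-compliant writer as the base class and putting the Flybase differences in a small subclass that overrides only what differs. A minimal sketch of that layout follows, using hypothetical package and method names that are not the real Bio::SeqIO::chadoxml API. Since hyphens are not valid in Perl package names, My::ChadoXML::FlyBase stands in for fb-chadoxml.pm. The cv names returned ('sequence' for the current default, 'SO' for the old style) are assumptions for illustration, not taken from the thread.

```perl
use strict;
use warnings;

# Hypothetical packages illustrating the subclassing layout suggested above.
# They are NOT the real Bio::SeqIO::chadoxml API; the method names and the
# cv names returned ('sequence' vs 'SO') are assumptions for illustration.
package My::ChadoXML;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Standards-compliant default: name of the cv holding feature types.
sub type_cv_name { return 'sequence' }

# Shared code path: every writer formats type terms the same way,
# differing only in what type_cv_name returns.
sub type_term {
    my ($self, $type) = @_;
    return $self->type_cv_name . ':' . $type;
}

package My::ChadoXML::FlyBase;
our @ISA = ('My::ChadoXML');

# Flybase-specific override; everything else is inherited unchanged.
sub type_cv_name { return 'SO' }

package main;

print My::ChadoXML->new->type_term('gene'), "\n";           # sequence:gene
print My::ChadoXML::FlyBase->new->type_term('gene'), "\n";  # SO:gene
```

The design point is that the base module never tests a -flybase_compat flag; another organization would add its own subclass rather than another branch in the shared code.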
-hilmar
--
===========================================================
: Hilmar Lapp  -:- Durham, NC -:- hlapp at duke dot edu :
===========================================================

From jason at bioperl.org Mon Feb 5 14:43:09 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Feb 2007 11:43:09 -0800
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <209988.63723.qm@web8715.mail.in.yahoo.com>
References: <209988.63723.qm@web8715.mail.in.yahoo.com>
Message-ID: <9E477447-67F5-46CA-BCC1-47BB4170EC76@bioperl.org>

please cc the mailing list when asking a question or followup.

Sorry, I don't know what you are doing wrong - you didn't resend your code, so I don't know if you still have a typo.

This code works fine for me:

  use Bio::TreeIO;
  use strict;

  my ($filein,$fileout) = @ARGV;
  my ($format,$oformat) = qw(newick nexus);
  my $in  = Bio::TreeIO->new(-file => $filein, -format => $format);
  my $out = Bio::TreeIO->new(-format => $oformat, -file => ">$fileout");

  while( my $t = $in->next_tree ) {
    $out->write_tree($t);
  }

On Feb 5, 2007, at 11:24 AM, Neha Nahar wrote:

> Thank you very much for the reply.
>
> I fixed the code as per your suggestion, but now am getting a
> different error:
>
> $ ./nexus.pl mrp-input.txt nexus.out
> newickfile=mrp-input.txt, nexusfile=nexus.out
>
> ------------- EXCEPTION -------------
> MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
> STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
> STACK toplevel ./nexus.pl:23
>
> --------------------------------------
>
> Please help me out with this script.
>
> Thank you.
> Regards,
> Neha

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/

From nehadnahar at yahoo.co.in Mon Feb 5 14:58:08 2007
From: nehadnahar at yahoo.co.in (Neha Nahar)
Date: Mon, 5 Feb 2007 19:58:08 +0000 (GMT)
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <99196.23114.qm@web8711.mail.in.yahoo.com>
Message-ID: <36024.1212.qm@web8405.mail.in.yahoo.com>

Hi,
Thank you for the code.
I tried it but I still get the same exception.

------------- EXCEPTION -------------
MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
STACK toplevel ./nexus1.pl:18

Please find attached the Perl file (nexus.pl).

I am using the bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm

Please let me know if I am using the correct version. If not, please point me to the latest one.

Thank you.
Regards,
nnahar

-------------- next part --------------
A non-text attachment was scrubbed...
Name: nexus.pl
Type: application/x-perl
Size: 811 bytes
Desc: 1389215665-nexus.pl
URL:

From jason at bioperl.org Mon Feb 5 17:15:52 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Feb 2007 14:15:52 -0800
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <36024.1212.qm@web8405.mail.in.yahoo.com>
References: <36024.1212.qm@web8405.mail.in.yahoo.com>
Message-ID: <84703383-600F-42F4-A860-DD0D1C43EE83@bioperl.org>

Something is wrong with your install, I am guessing - can you run the tests? Go to the bioperl directory:

  $ perl t/TreeIO.t

Can you describe how you installed bioperl?

From lzhtom at hotmail.com Mon Feb 5 22:31:56 2007
From: lzhtom at hotmail.com (zhihua li)
Date: Tue, 06 Feb 2007 03:31:56 +0000
Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
In-Reply-To:
Message-ID:

Thanks a lot!

After checking out the script bp_index, I changed the syntax to:

  my $inx = Bio::Index::Fasta->new("test.fasta.idx", 'WRITE');
  $inx->make_index("test.fasta");

Now I have an index file test.fasta.idx in my current directory, and I can use it in my later script by saying

  my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx");

So now everything is OK. But I don't understand why I have to use that syntax, and why the syntax provided in the documentation didn't work.

>From: Jason Stajich
>To: zhihua li
>CC: bioperl-l at lists.open-bio.org, arokfl at yahoo.com
>Subject: Re: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
>Date: Thu, 1 Feb 2007 22:24:44 -0800 > >I don't think BIOPERL_INDEX does anything in the module so that >documentation is not quite right. the env variable is used in the >scripts/index/bp_index and bp_fetch scripts so maybe a cut+paste job >went bad somewhere. > >you need to specify the full path you want with -filename - you can >just prepen the BIOPERL_INDEX to the filename like. >-filename => $ENV{BIOPERL_INDEX}."/$index" > >-jason >On Feb 1, 2007, at 7:27 PM, zhihua li wrote: > > > Sorry guys, the former empty mail was sent out by mistake. > > > > I'm using Bio::index::Fasta to index a file containing lots of > > sequences in fasta format. All is fine except one thing. > > > > According to the bioperl tutorial and the documents, the following > > code will make a indexed file: > > > > my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx", > > -write_flag => 1); > > $inx->make_index("test.fasta"); > > > > And in another script I can access the indexed file by sayinig > > > > $ENV{BIOPERL_INDEX} = "."; # find index in current directory > > my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx"); > > my $seq=$inx->fetch("ent1001"); #fetch the sequence named > > ent1001 > > > > However, after running the first script, I cannot find a new file > > test.fasta.idx in my current directory. And not surprisingly, when > > I ran the second script, perl told me it couldn't find > > "test.fasta.idx". > > > > What's going on here? > > > > Thanks a lot! > > > > _________________________________________________________________ > > ???????????????????????????????????????? 
> > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > >-- >Jason Stajich >Miller Research Fellow >University of California, Berkeley >lab: 510.642.8441 >http://pmb.berkeley.edu/~taylor/people/js.html >http://fungalgenomes.org/ > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l

From johnston at biochem.ucl.ac.uk Tue Feb 6 06:52:08 2007 From: johnston at biochem.ucl.ac.uk (Caroline Johnston) Date: Tue, 6 Feb 2007 11:52:08 +0000 (GMT) Subject: [Bioperl-l] RNA folding Message-ID: Hello, I've just joined the list - I'm a Bioinformatics PhD student at Essex University doing transcriptomics-related things. Mainly microarray analysis and more recently looking at RNA structure prediction. I was thinking about having a go at writing a bioperl-run wrapper around some of the structure prediction stuff, but according to the wiki this is being done already (at least for the Vienna tools). I spoke to Albert Vilella at the EBI the other day and he said Chris Fields was the man to speak to. So could he (or anyone) let me know what the current state of RNA structure prediction tools in bioperl is?
Cheers, Cass xx

From marian.thieme at lycos.de Tue Feb 6 08:52:10 2007 From: marian.thieme at lycos.de (Marian Thieme) Date: Tue, 06 Feb 2007 14:52:10 +0100 Subject: [Bioperl-l] dbSNP Message-ID: <45C8880A.7030702@lycos.de> Hello all, I looked in the documentation for a method/class/function/script that can generate an SNP assay suitable for submission to dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/ http://www.ncbi.nlm.nih.gov/projects/SNP/how_to_submit.html). I didn't find such code, but I noticed that there is at least an XML parser to read dbSNP reports. Does anybody know if there is also an output class to generate dbSNP reports? I could imagine that at least the SNP assay section is worth implementing. NCBI gives this example:

TYPE:SNPASSAY
HANDLE:WI
BATCH: 1.98
MOLTYPE:Genomic
METHOD:RESEQ
SYN NAMES:WI-SNP,DnaId,MapDna
COMMENT: Here is where some public comment that applies to the entire batch of SNPS could be put.
PRIVATE: Here is where a note to NCBI regarding processing that would not be seen by the outside, could be put. Note that these are is not exactly real SNPs, as the data were modified.
||
SNP:WI|WIAF-1234567
SYNONYM:EST4291092,EST8291092,EST7291092
ACCESSION:H30533
LENGTH:101
5'_ASSAY:GGCAGGGAAGGAAAATCCTAGGGNCAGCATTGGGGAGGGGGGGACTCTG
OBSERVED:C/T
3'_ASSAY:TAAATTTATTGGGCAACAGGCTGCAGGTGAGGGGGCTGACAGGAGGAGGGA
||
SNP:WI|WIAF-1722
SYNONYM:STS-T17494,STS-T17494,STS-T17494
ACCESSION:T17494
LENGTH:269
5'_FLANK:CTTTCCCTCATCCCCTCTTCCACCACACCATCCCGGAACAAGTGCTCCAGGATT
5'_ASSAY:CCCTGCCCACTGGCCATTTTGGAGTGTGTCC
OBSERVED:A/T
3'_ASSAY:GTGGGTAGCAATGTGGAAACCACCAGGGCCTTTGTGGAGAAAA
3'_FLANK:TGGAGGGGGTTGAGGGAGTCCCAGGAGGGGCTTATTTGAGGGCCTTTGCCACTT
GCTCATAGGCGAGCTCGATCTCCTCATCATCTGGACAGGTGGAAGCGAATTCTT
CCCGGGCGTAGGCATTGCTCAAGTACCGAT
||

Regards, Marian

P.S. This does not contradict my first request about the bracket notation. We need both formats.
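A writer for this format could be quite small. What follows is only a sketch, assuming plain Perl with no BioPerl classes. The subs format_snpassay_header and format_snpassay_record, and the way their arguments are laid out, are hypothetical. The field names and the || record separator come from the NCBI example above. Only a subset of fields is handled: SYN NAMES and PRIVATE are left out, and long flanks are not wrapped.

```perl
use strict;
use warnings;

# Hypothetical SNPASSAY writer (not a BioPerl class). Field names follow the
# NCBI example; fields are emitted in this fixed order and missing ones skipped.
my @HEADER_FIELDS = ('TYPE', 'HANDLE', 'BATCH', 'MOLTYPE', 'METHOD', 'COMMENT');
my @SNP_FIELDS    = ('SYNONYM', 'ACCESSION', 'LENGTH', "5'_FLANK",
                     "5'_ASSAY", 'OBSERVED', "3'_ASSAY", "3'_FLANK");

# Batch header block, terminated by the '||' separator.
sub format_snpassay_header {
    my (%h) = @_;
    my @lines = map { "$_:$h{$_}" } grep { defined $h{$_} } @HEADER_FIELDS;
    return join("\n", @lines, '||') . "\n";
}

# One SNP record: an "SNP:HANDLE|ID" line, its fields, then '||'.
sub format_snpassay_record {
    my ($handle, $id, $snp) = @_;
    my @lines = ("SNP:$handle|$id");
    for my $field (@SNP_FIELDS) {
        push @lines, "$field:$snp->{$field}" if defined $snp->{$field};
    }
    return join("\n", @lines, '||') . "\n";
}

print format_snpassay_header(TYPE => 'SNPASSAY', HANDLE => 'WI',
                             BATCH => '1.98', MOLTYPE => 'Genomic',
                             METHOD => 'RESEQ');
print format_snpassay_record('WI', 'WIAF-1234567', {
    ACCESSION  => 'H30533',
    LENGTH     => 101,
    "5'_ASSAY" => 'GGCAGGGAAGGAAAATCCTAGGGNCAGCATTGGGGAGGGGGGGACTCTG',
    OBSERVED   => 'C/T',
    "3'_ASSAY" => 'TAAATTTATTGGGCAACAGGCTGCAGGTGAGGGGGCTGACAGGAGGAGGGA',
});
```

Fixing the field order in two arrays keeps the output deterministic. A fuller writer would probably also validate the OBSERVED alleles and check LENGTH against the sequences.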
From cjfields at uiuc.edu Tue Feb 6 11:45:36 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 6 Feb 2007 10:45:36 -0600 Subject: [Bioperl-l] RNA folding In-Reply-To: References: Message-ID: On Feb 6, 2007, at 5:52 AM, Caroline Johnston wrote: > Hello, > > I've just joined the list - I'm a Bioinformatics PhD student at Essex > University doing transcriptomics-related things. Mainly microarray > analysis and more recently looking at RNA structure prediction. > > I was thinking about having a go at writing a bioperl-run wrapper > around > some of the structure prediction stuff, but according to the wiki > this is > being done already (at least for the Vienna tools). I spoke to Albert > Vilella at the EBI the other day and he said Chris Fields was the > man to > speak to. So could he (or anyone) let me know what the current > state of > RNA structure prediction tools in bioperl is? > > Cheers, > Cass xx > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Actually, the only RNA tool wrappers I have made are ones for ERPIN, RNAMotif, and Infernal (the only one in bioperl-run CVS at this time is RNAMotif). I am planning on writing up wrappers for Vienna, UNAFold, and a few others but haven't really started in. Here's where I'm at right now... I am writing up a new set of AnnotationI classes which positionally describe data (Meta) which I hope will help deal with this stuff. These would be similar in nature to Heikki's Bio::Seq::Meta classes: http://bioperl.org/pipermail/bioperl-l/2006-December/024414.html I would use a regular Bio::SeqI and store the structural data and anything else (such as energy calculations, etc) as Annotation objects in an AnnotationCollection, and then write up a series of SeqIO modules to get data into/out of the designated structure formats, like UNAfold ct, RNAML, and so on. 
Each sequence would then be capable of holding more than one structural Annotation (i.e. it could represent different folding pathways, alternative RNA folds, and so on). At this point I represent the data as an array of hashes where $array[0] is nt 1 and the hash keys indicate the type of interaction, the base interacted with, etc. The text representation would be simple Eddy WUSS (Rfam-like) format by default, which is capable of representing some complex data (pseudoknots, for instance), is compact, and is documented (via the Infernal manual). Tags will probably switch to more ontologically relevant terms (probably from RNAML or RNA Ontology), but in general it is something like this:

  [
    {'interaction' => 'WC', 'base' => 24},
    {'interaction' => 'WC', 'base' => 23},
    {'interaction' => 'SS'},
    ...
  ]

In this implementation every seq position would have some kind of interaction designation, though that's open for debate, as it could just be simple text or undef for single-stranded regions. This also scales with the complexity of the data: if one wanted to add tertiary/quaternary interactions, location, base modifications, remote sequence interactions, etc., extra key/value pairs could be used. Conversely, if one only wanted secondary structure (for drawing RNA structures, for example), then only that data would be parsed. If you (or anyone listening) have any suggestions I would greatly appreciate them. chris

From johnsonm at gmail.com Tue Feb 6 18:53:49 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Tue, 6 Feb 2007 17:53:49 -0600 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> Message-ID: Okay, I need to get something going for a project I'm working on. Options: 1) Stick it all in one module: This can get a bit ugly, as Glimmer, as opposed to GlimmerM and GlimmerHMM, does not explicitly identify itself in the prediction report.
You can pick up on some unique things in the output file, but you don't know what you've got until you're actually parsing it. Unless you require a format argument up front, then you can split the parsing code up into different functions. 2) Two modules, one for GlimmerM/GlimmerHMM and one for Glimmer2/Glimmer3. With or without an abstract dispatch front end. I suppose at this point, after getting my hands dirty, I'd prefer 1), with an explicit -format => Glimmer2/3/M/HMM arg required in the constructor. Though I'm not opposed to 2) if that is what it takes to get it into Bioperl. If we can achieve some sort of consensus without too much bloodshed, I'll shoot y'all some patches and we can consider this issue checked off the list. On 9/20/06, Mark Johnson wrote: > > I think it's going to be at least two modules, one for the > prokaryotic stuff and one for the eukaryotic. And really, the > prokaryotic stuff is different enough to warrant two modules. So three > different parsers. Could do it in one, but it would be ugly and > nasty. However, this does not preclude three parsers and one abstract > interface, which is your excellent suggestion. > Oh, and excuse me, but I have a bit of a rant here, after dealing > with parsers and pipelines for the last few months. Parsers should > not load the whole input file into RAM to parse it. And Pipelines > using the parsers (Ensembl / biopipe) should not stuff the whole > result set from the parser into a single array. When you're trying to > annotate assemblies, it sucks to have to split up contigs/supercontigs > because the whole result set won't fit into RAM on a 12 gig blade. > Sheesh. Though this doesn't matter for bacterial genomes, as they're > tiny (by comparison to vertebrates). There, sorry, been saving up > that frustration for a while. No offense meant, hope I didn't tick > anybody off. 
8) > Torsten: You sound like you know what you're doing with respect > to Bioperl more than I do, and I know I don't have CVS access, so I'll > defer to you. I'd be happy to help out, though. > > > On 9/20/06, Hilmar Lapp wrote: > > > > On Sep 19, 2006, at 9:13 PM, Torsten Seemann wrote: > > > > > I'm not sure whether to > > > > > > 1. parse them all under the same module, perhaps with a > > > -format=>'glimmerXXX' parameter > > > > > > 2. create a single new module Glimmer2 and Glimmer3 > > > > > > 3. create two new modules, one for Glimmer2 and one for Glimmer3, > > > given > > > they are different outputs both in syntax and number of output files > > > > > > Any advice from Bioperl 'old timers' appreciated ;-) > > > > > > > If at all possible I'd favor 1), with e.g. Bio::Tools::GFF being an > > example for how this can work. > > > > If this would amount to basically 4 modules stringed together into > > one file (because the parsing code can't share much if anything > > between the flavors), it'd still be advantageous to have a single > > frontend module that would then dispatch. > > > > -hilmar > > > > -- > > =========================================================== > > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > > =========================================================== > > > > > > > > > > > > > From jason at bioperl.org Tue Feb 6 19:33:11 2007 From: jason at bioperl.org (Jason Stajich) Date: Tue, 6 Feb 2007 16:33:11 -0800 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> Message-ID: <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> I definitely vote for 1) - worst case you have 4 separate methods if there is no good way to condense the parsing for each format and require the user to specify the format. I have no problem with requiring user to specify what program she used - if we can be fancy and guess the format later (i.e. 
guess format in SeqIO) -then that's icing. -jason On Feb 6, 2007, at 3:53 PM, Mark Johnson wrote: > Okay, I need to get something going for a project I'm working on. > Options: > > 1) Stick it all in one module: This can get a bit ugly, as > Glimmer, as > opposed to GlimmerM and GlimmerHMM, does not explicitly identify > itself in > the prediction report. You can pick up on some unique things in > the output > file, but you don't know what you've got until you're actually > parsing it. > Unless you require a format argument up front, then you can split the > parsing code up into different functions. > 2) Two modules, one for GlimmerM/GlimmerHMM and one for Glimmer2/ > Glimmer3. > With or without an abstract dispatch front end. > > I suppose at this point, after getting my hands dirty, I'd prefer > 1), with > an explicit -format => Glimmer2/3/M/HMM arg required in the > constructor. > Though I'm not opposed to 2) if that is what it takes to get it into > Bioperl. > > If we can achieve some sort of consensus without too much > bloodshed, I'll > shoot y'all some patches and we can consider this issue checked off > the > list. > > On 9/20/06, Mark Johnson wrote: >> >> I think it's going to be at least two modules, one for the >> prokaryotic stuff and one for the eukaryotic. And really, the >> prokaryotic stuff is different enough to warrant two modules. So >> three >> different parsers. Could do it in one, but it would be ugly and >> nasty. However, this does not preclude three parsers and one >> abstract >> interface, which is your excellent suggestion. >> Oh, and excuse me, but I have a bit of a rant here, after dealing >> with parsers and pipelines for the last few months. Parsers should >> not load the whole input file into RAM to parse it. And Pipelines >> using the parsers (Ensembl / biopipe) should not stuff the whole >> result set from the parser into a single array. 
When you're >> trying to >> annotate assemblies, it sucks to have to split up contigs/ >> supercontigs >> because the whole result set won't fit into RAM on a 12 gig blade. >> Sheesh. Though this doesn't matter for bacterial genomes, as they're >> tiny (by comparison to vertebrates). There, sorry, been saving up >> that frustration for a while. No offense meant, hope I didn't tick >> anybody off. 8) >> Torsten: You sound like you know what you're doing with respect >> to Bioperl more than I do, and I know I don't have CVS access, so >> I'll >> defer to you. I'd be happy to help out, though. >> >> >> On 9/20/06, Hilmar Lapp wrote: >>> >>> On Sep 19, 2006, at 9:13 PM, Torsten Seemann wrote: >>> >>>> I'm not sure whether to >>>> >>>> 1. parse them all under the same module, perhaps with a >>>> -format=>'glimmerXXX' parameter >>>> >>>> 2. create a single new module Glimmer2 and Glimmer3 >>>> >>>> 3. create two new modules, one for Glimmer2 and one for Glimmer3, >>>> given >>>> they are different outputs both in syntax and number of output >>>> files >>>> >>>> Any advice from Bioperl 'old timers' appreciated ;-) >>>> >>> >>> If at all possible I'd favor 1), with e.g. Bio::Tools::GFF being an >>> example for how this can work. >>> >>> If this would amount to basically 4 modules stringed together into >>> one file (because the parsing code can't share much if anything >>> between the flavors), it'd still be advantageous to have a single >>> frontend module that would then dispatch. 
>>> >>> -hilmar >>> >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>> =========================================================== >>> >>> >>> >>> >>> >>> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/

From torsten.seemann at infotech.monash.edu.au Tue Feb 6 21:36:54 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Wed, 7 Feb 2007 13:36:54 +1100 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> Message-ID: > I definitely vote for 1) - worst case you have 4 separate methods if > there is no good way to condense the parsing for each format and > require the user to specify the format. And make the default -format what it currently parses, i.e. GlimmerM/GlimmerHMM. > I have no problem with requiring user to specify what program she > used - if we can be fancy and guess the format later (i.e. guess > format in SeqIO) -then that's icing. Agreed. >> Okay, I need to get something going for a project I'm working on. I would normally try to help, but I am so swamped with work-work at the moment. Just a reminder that last year I added examples of the different Glimmer outputs to the CVS repository:

./t/data/Glimmer3.predict
./t/data/Glimmer3.detail
./t/data/GlimmerHMM.out
./t/data/Glimmer2.out
./t/data/GlimmerM.out
./t/data/glimmer.out (this was the original one)

Thanks for taking this on!
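Option 1 could be wired up with a dispatch table. It would take an explicit -format argument and fall back to GlimmerM/GlimmerHMM when none is given, as suggested here. The sketch below is stand-alone Perl and rests on two assumptions. First, new_glimmer_parser, %PARSERS and the _parse_* subs are hypothetical names, not part of Bio::Tools::Glimmer; the stubs only report which branch was chosen. Second, GlimmerM and GlimmerHMM are treated as close enough to share one parser.

```perl
use strict;
use warnings;

# Hypothetical dispatch table for "one module, explicit -format".
# Keys are lower-cased format names; values are parsing subs (stubs here).
my %PARSERS = (
    glimmer2   => \&_parse_glimmer2,
    glimmer3   => \&_parse_glimmer3,
    glimmerm   => \&_parse_glimmerm,   # assumption: GlimmerM and GlimmerHMM
    glimmerhmm => \&_parse_glimmerm,   # are close enough to share one parser
);

sub new_glimmer_parser {
    my (%args) = @_;
    # Default to what the existing parser handles (GlimmerM/GlimmerHMM).
    my $name   = defined $args{-format} ? $args{-format} : 'GlimmerM';
    my $parser = $PARSERS{ lc $name }
        or die "Unknown Glimmer format '$name'\n";
    return { format => lc $name, parse => $parser };
}

# Stubs standing in for the real per-format parsing code.
sub _parse_glimmer2 { return 'Glimmer2' }
sub _parse_glimmer3 { return 'Glimmer3' }
sub _parse_glimmerm { return 'GlimmerM/GlimmerHMM' }

my $default = new_glimmer_parser();
my $g3      = new_glimmer_parser(-format => 'Glimmer3');
printf "%s -> %s\n", $default->{format}, $default->{parse}->();  # glimmerm -> GlimmerM/GlimmerHMM
printf "%s -> %s\n", $g3->{format},      $g3->{parse}->();       # glimmer3 -> Glimmer3
```

Format sniffing (the "icing" mentioned above) could later replace the fixed default when -format is omitted, without changing the table.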
--Torsten

From mitch_skinner at berkeley.edu Tue Feb 6 23:37:35 2007 From: mitch_skinner at berkeley.edu (Mitch Skinner) Date: Tue, 06 Feb 2007 20:37:35 -0800 Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels Message-ID: <45C9578F.2060802@berkeley.edu> Hello, I'm working on an AJAX version of GBrowse (http://genome.biowiki.org), where we're pre-rendering entire chromosomes by breaking them up into tiles. One of the problems we have is that it takes a long time to render all those tiles. One of the things that's slowing the process down (and using lots of RAM) is rendering the gridlines, and it would make things a lot easier (and faster) for us if we could assume that the gridlines were the same for each tile. Since we're only rendering at a particular set of zoom levels (that we have control over), I think this is a reasonable assumption. Given the right set of zoom levels, the assumption works almost all the time, except for one specific case. It has to do with the way draw_grid and map_pt in Bio::Graphics::Panel work for the very first gridline. Here's how draw_grid (in CVS HEAD) computes the first gridline:

  my $first_tick = $minor * int($self->start/$minor);

$first_tick, $minor and $self->start are in base-pair space, which is 1-based. However, if ($self->start < $minor) then $first_tick is 0. This might not be a problem, except that $first_tick is translated into pixel coordinates in map_pt, which expects 1-based bp coordinates. Here are the relevant lines in map_pt:

  my $val = $flip
      ? int (0.5 + $pr - ($length - ($_- 1)) * $scale)
      : int (0.5 + ($_-$offset-1) * $scale);

This style of rounding only works for positive numbers; rounding 0.6 by doing int(0.5 + 0.6) gives you 1 as expected, but rounding -0.6 by doing int(0.5 + -0.6) gives you 0. So if the first three gridlines are at 0, 10, and 20 bp, then (assuming $scale is 1, $offset is 0, $flip evaluates false, and pad left is 0) they're drawn at pixels 0, 9, and 19.
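The rounding problem can be reproduced without Bio::Graphics using a simplified model of draw_grid and map_pt. The model makes the same assumptions as the example above: $scale 1, $offset 0, no flip, no left padding. The helper names are hypothetical. The two first-tick formulas and the sign-aware rounding are the ones quoted in this thread, and the sign-aware rounding is the map_pt change proposed just below. The point is that Perl's int() truncates toward zero, so int(0.5 + $x) rounds wrongly for negative $x.

```perl
use strict;
use warnings;

# Simplified model, NOT the real Bio::Graphics::Panel code:
# scale 1, offset 0, no flip, no left padding.
my ($scale, $offset) = (1, 0);

# Rounding as written in map_pt: int() truncates toward zero, so adding 0.5
# first only rounds correctly for non-negative values (int(0.5 + -1) == 0).
sub round_add_half  { my ($x) = @_; return int(0.5 + $x) }

# Sign-aware rounding: add +0.5 or -0.5 depending on the sign, then truncate.
sub round_symmetric { my ($x) = @_; return int($x + 0.5 * ($x <=> 0)) }

# bp -> pixel with a chosen rounding function.
sub to_pixel { my ($bp, $round) = @_; return $round->(($bp - $offset - 1) * $scale) }

# First gridline: current draw_grid formula vs. the proposed 1-based one.
sub first_tick_current  { my ($start, $minor) = @_; return $minor * int($start / $minor) }
sub first_tick_proposed { my ($start, $minor) = @_; return 1 + $minor * int(($start - 1) / $minor) }

my ($start, $minor) = (1, 10);
my $t0 = first_tick_current($start, $minor);    # 0
my $t1 = first_tick_proposed($start, $minor);   # 1
my @current  = map { to_pixel($t0 + $_ * $minor, \&round_add_half) }  0 .. 2;
my @proposed = map { to_pixel($t1 + $_ * $minor, \&round_symmetric) } 0 .. 2;
print "current:  @current\n";    # 0 9 19
print "proposed: @proposed\n";   # 0 10 20
```

Tick 0 maps to -1 before rounding, and int(-0.5) truncates to 0, which is why the first interval comes out one pixel short. Symmetric rounding also keeps zero and negative coordinates (e.g. genetic maps) rounding consistently.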
I think that there should be gridlines at pixels 0, 10, and 20. The fact that currently the first interval is 9 pixels and the second is 10 pixels is breaking my hopeful assumption about the gridlines. AFAICT my problems are solved if we make two changes. First, change the above line from draw_grid to this:

  my $first_tick = 1 + $minor * int(($start - 1)/$minor);

Second, change the lines from map_pt to this:

  my $val = $flip
      ? ($pr - ($length - ($_- 1)) * $scale)
      : (($_-$offset-1) * $scale);
  $val = int($val + .5 * ($val <=> 0));

Does this make sense? If people agree that these changes are right then I can also produce a proper patch if y'all would prefer that. Regards, Mitch

From lstein at cshl.edu Wed Feb 7 07:17:22 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Wed, 7 Feb 2007 07:17:22 -0500 Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels In-Reply-To: <45C9578F.2060802@berkeley.edu> References: <45C9578F.2060802@berkeley.edu> Message-ID: <6dce9a0b0702070417k692fedcchfac5b1f0e01f72bd@mail.gmail.com> Hi Mitch, Zero is not a forbidden coordinate, since gbrowse also works on genetic maps which have negative and floating point coordinates. You've simply picked up a boundary case where the rounding isn't working properly. I will fix this now. Lincoln On 2/6/07, Mitch Skinner wrote: > > Hello, > > I'm working on an AJAX version of GBrowse (http://genome.biowiki.org), > where we're pre-rendering entire chromosomes by breaking them up into > tiles. One of the problems we have is that it takes a long time to > render all those tiles. One of the things that's slowing the process > down (and using lots of RAM) is rendering the gridlines, and it would > make things a lot easier (and faster) for us if we could assume that the > gridlines were the same for each tile. Since we're only rendering at a > particular set of zoom levels (that we have control over), I think this > is a reasonable assumption.
> > Given the right set of zoom levels, the assumption works almost all the > time, except for one specific case. It has to do with the way draw_grid > and map_pt in Bio::Graphics::Panel work for the very first gridline. > > Here's how draw_grid (in CVS HEAD) computes the first gridline: > > my $first_tick = $minor * int($self->start/$minor); > > $first_tick, $minor and $self->start are in base-pair space, which is > 1-based. However, if ($self->start < $minor) then $first_tick is 0. > This might not be a problem, except that $first_tick is translated into > pixel coordinates in map_pt, which expects 1-based bp coordinates. Here > are the relevant lines in map_pt: > > my $val = $flip > ? int (0.5 + $pr - ($length - ($_- 1)) * $scale) > : int (0.5 + ($_-$offset-1) * $scale); > > This style of rounding only works for positive numbers; rounding 0.6 by > doing int(0.5 + 0.6) gives you 1 as expected, but rounding -0.6 by doing > int(0.5 + -0.6) gives you 0. So if the first three gridlines are at 0, > 10, and 20 bp, then (assuming $scale is 1, $offset is 0, $flip evaluates > false, and pad left is 0) they're drawn at pixels 0, 9, and 19. > > I think that there should be gridlines at pixels 0, 10, and 20. The > fact that currently the first interval is 9 pixels and the second is 10 > pixels is breaking my hopeful assumption about the gridlines. > > AFAICT my problems are solved if we make two changes: > change the above line from draw_grid to this: > my $first_tick = 1 + $minor * int(($start - 1)/$minor); > and change the lines from map_pt to this: > > my $val = $flip > ? ($pr - ($length - ($_- 1)) * $scale) > : (($_-$offset-1) * $scale); > $val = int($val + .5 * ($val <=> 0)); > > Does this make sense? If people agree that these changes are right then > I can also produce a proper patch if y'all would prefer that. 
> > Regards, > Mitch > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Wed Feb 7 07:18:40 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Wed, 7 Feb 2007 07:18:40 -0500 Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels In-Reply-To: <45C9578F.2060802@berkeley.edu> References: <45C9578F.2060802@berkeley.edu> Message-ID: <6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com> However, I'm also very interested in why grid-drawing takes so long. When I've profiled drawing, neither grid drawing nor map_pt() consume any significant amount of time. Lincoln On 2/6/07, Mitch Skinner wrote: > > Hello, > > I'm working on an AJAX version of GBrowse (http://genome.biowiki.org), > where we're pre-rendering entire chromosomes by breaking them up into > tiles. One of the problems we have is that it takes a long time to > render all those tiles. One of the things that's slowing the process > down (and using lots of RAM) is rendering the gridlines, and it would > make things a lot easier (and faster) for us if we could assume that the > gridlines were the same for each tile. Since we're only rendering at a > particular set of zoom levels (that we have control over), I think this > is a reasonable assumption. > > Given the right set of zoom levels, the assumption works almost all the > time, except for one specific case. It has to do with the way draw_grid > and map_pt in Bio::Graphics::Panel work for the very first gridline. 
> > Here's how draw_grid (in CVS HEAD) computes the first gridline: > > my $first_tick = $minor * int($self->start/$minor); > > $first_tick, $minor and $self->start are in base-pair space, which is > 1-based. However, if ($self->start < $minor) then $first_tick is 0. > This might not be a problem, except that $first_tick is translated into > pixel coordinates in map_pt, which expects 1-based bp coordinates. Here > are the relevant lines in map_pt: > > my $val = $flip > ? int (0.5 + $pr - ($length - ($_- 1)) * $scale) > : int (0.5 + ($_-$offset-1) * $scale); > > This style of rounding only works for positive numbers; rounding 0.6 by > doing int(0.5 + 0.6) gives you 1 as expected, but rounding -0.6 by doing > int(0.5 + -0.6) gives you 0. So if the first three gridlines are at 0, > 10, and 20 bp, then (assuming $scale is 1, $offset is 0, $flip evaluates > false, and pad left is 0) they're drawn at pixels 0, 9, and 19. > > I think that there should be gridlines at pixels 0, 10, and 20. The > fact that currently the first interval is 9 pixels and the second is 10 > pixels is breaking my hopeful assumption about the gridlines. > > AFAICT my problems are solved if we make two changes: > change the above line from draw_grid to this: > my $first_tick = 1 + $minor * int(($start - 1)/$minor); > and change the lines from map_pt to this: > > my $val = $flip > ? ($pr - ($length - ($_- 1)) * $scale) > : (($_-$offset-1) * $scale); > $val = int($val + .5 * ($val <=> 0)); > > Does this make sense? If people agree that these changes are right then > I can also produce a proper patch if y'all would prefer that. > > Regards, > Mitch > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu

From johnsonm at gmail.com Wed Feb 7 11:50:05 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Wed, 7 Feb 2007 10:50:05 -0600 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> Message-ID: Well, each format has some unique features. If the user declines to specify the format, I can figure it out, but it will probably involve scanning the input file twice. I'll take a look. I can do all the parsing in one function; in fact I have, just to see how nasty it would end up being. I just can't stomach having the code that tightly coupled and hard to read. In the end it'll probably be three functions. GlimmerM/HMM are pretty close. Maybe two: Glimmer2 and Glimmer3 aren't *that* different, either. On 2/6/07, Jason Stajich wrote: > > I definitely vote for 1) - worst case you have 4 separate methods if there > is no good way to condense the parsing for each format and require the user > to specify the format. > > I have no problem with requiring user to specify what program she used - > if we can be fancy and guess the format later (i.e. guess format in SeqIO) > -then that's icing. > > -jason > >

From adsj at novozymes.com Wed Feb 7 12:11:32 2007 From: adsj at novozymes.com (Adam Sjøgren) Date: Wed, 07 Feb 2007 18:11:32 +0100 Subject: [Bioperl-l] Bioperl 1.4 to 1.5.2, adding qualifiers to Bio::Seq-objects Message-ID: <8764adoptn.fsf@topper.koldfront.dk> Hi.
I am transitioning from Bioperl 1.4 to 1.5.2, and the qualifiers I add to features in Bio::Seq objects have stopped appearing when I output them as EMBL or GenBank files. Below is a test script that exercises the problem. I guess I should be doing something else when adding qualifiers now with 1.5.2 (as reading an EMBL file with Bio::SeqIO and outputting it again of course works perfectly), but I can't deduce what from perldoc Bio::SeqFeature::Generic - it still lists the add_tag_value method, and calling it neither croaks nor warns. I have found some comments on this in the release notes of 1.5.0 on the Bioperl wiki, but I must admit I wasn't able to extract what methods I should be calling instead. If someone could point me to the relevant documentation or tell me what method to use instead, I would be happy as a clam. Best regards, Adam

===
use Test::More tests=>2;

use strict;
use warnings;

use Bio::Seq;
use Bio::SeqFeature::Generic;
use IO::String;
use Bio::SeqIO;

my $seq=Bio::Seq->new(
    -seq=>'actgactgactg',
);
$seq->display_id('D27');
$seq->accession_number('DB:D27');

my $seq_feature=Bio::SeqFeature::Generic->new(
    -strand=>1,
    -primary=>'source',
);
$seq_feature->set_attributes(-start=>2, -end=>8);
$seq_feature->add_tag_value(note=>'TEST');
$seq_feature->add_tag_value(db_xref=>'DB:D27');
$seq->add_SeqFeature($seq_feature);

my $raw='';
my $fh=IO::String->new($raw);
my $out=Bio::SeqIO->new(-format=>'EMBL', -fh=>$fh);
$out->write_seq($seq);

ok($raw=~m!/note!, 'Qualifier note found');
ok($raw=~m!/db_xref!, 'Qualifier db_xref found');
===

-- Adam Sjøgren adsj at novozymes.com

From cjfields at uiuc.edu Wed Feb 7 12:50:13 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 7 Feb 2007 11:50:13 -0600 Subject: [Bioperl-l] Bioperl 1.4 to 1.5.2, adding qualifiers to Bio::Seq-objects In-Reply-To: <8764adoptn.fsf@topper.koldfront.dk> References: <8764adoptn.fsf@topper.koldfront.dk> Message-ID: On Feb 7, 2007, at 11:11 AM, Adam Sjøgren wrote: > Hi.
> > > I am transitioning from Bioperl 1.4 to 1.5.2, and the qualifiers I add > to features in Bio::Seq objects have stopped appearing when I output > them as EMBL or GenBank-files. > > Below is a test-script that exercises the problem. > > I guess I should be doing something else when adding qualifiers, now > with 1.5.2 (as reading an EMBL-file with Bio::SeqIO and outputting it > again of course works perfectly), but I can't deduce what from perldoc > Bio::SeqFeature::Generic - it still lists the add_tag_value method, > and calling it doesn't croak nor warn. > > I have found some comments on this in the release notes of 1.5.0 on > the Bioperl wiki, but I must admit I wasn't able to extract what > methods I should be calling instead. > > If someone could point me to the relevant documentation or tell me > what method to use instead, I would be happy as a clam. > > > Best regards, > > Adam ... This works for me using bioperl-live (Mac OS X):

ok 1 - Qualifier note found
ok 2 - Qualifier db_xref found

If I print the string I get:

ID   DB:D27; SV 1; linear; unassigned DNA; STD; UNC; 12 BP.
XX
AC   DB:D27;
XX
XX
FH   Key             Location/Qualifiers
FH
FT   source          2..8
FT                   /db_xref="DB:D27"
FT                   /note="TEST"
XX
SQ   Sequence 12 BP; 3 A; 3 C; 3 G; 3 T; 0 other;
     actgactgac tg                                                            12
//

GenBank also works:

LOCUS       D27                       12 bp    dna     linear   UNK
ACCESSION   DB:D27
FEATURES             Location/Qualifiers
     source          2..8
                     /db_xref="DB:D27"
                     /note="TEST"
BASE COUNT        3 a      3 c      3 g      3 t
ORIGIN
        1 actgactgac tg
//

If you haven't uninstalled 1.4, make sure you aren't running 1.4 or mixing the two versions (you can check by using 'perldoc -l Bio::Root::Root').
chris

From cjfields at uiuc.edu Wed Feb 7 13:04:33 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 7 Feb 2007 12:04:33 -0600 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> Message-ID: <91A3A651-C0D5-495F-941F-05B8AA0DDA60@uiuc.edu> On Feb 7, 2007, at 10:50 AM, Mark Johnson wrote: > Well, each format has some unique features. If the user > declines to > specify the format, I can figure it out, but it will probably involve > scanning the input file twice. I'll take a look. > I can do all the parsing in one function, in fact I have, just > to see > how nasty it would end up being. I just can't stomach having the > code that > tightly coupled and hard to read. In the end it'll probably be three > functions. GlimmerM/HMM are pretty close. Maybe two, Glimmer2 and > Glimmer3 aren't *that* different, either. I don't see a problem with passing off the parse to a defined class method, either right off or mid-parse. I'm doing something like this with a revamped GenBank parser:

  # declare local to module
  my %GLIMMER_METHODS = (
      'GlimmerHMM' => '_parsehmm',
      'Glimmer'    => '_parsenormal',
      # ... others if needed
      '_DEFAULT_'  => '_parseabnormal',
  );

  ...

Then either preparse part of the file using _readline() to determine the format, or use -format and bypass preparsing:

  sub next_thingy {
      ...
      if (!$format) {
          while (my $line = $self->_readline()) {
              if ($line =~ m{(something)}) {
                  $format = $1;
                  $self->_pushback($line);
                  last;
              }
          }
      }
      my $method = (defined $format && exists $GLIMMER_METHODS{$format})
          ? $GLIMMER_METHODS{$format}
          : $GLIMMER_METHODS{'_DEFAULT_'};    # fallback to this one
      return $self->$method();    # hand off parsing flow to the proper parser
      ...
  }

  # all parser variants would have this structure:
  sub _parsehmm {
      my $self = shift;
      # ... init stuff here
      while (my $line = $self->_readline()) {
          # ... do stuff until END of next prediction/report
      }
return data if any } chris > On 2/6/07, Jason Stajich wrote: >> >> I definitely vote for 1) - worst case you have 4 separate methods >> if there >> is no good way to condense the parsing for each format and require >> the user >> to specify the format. >> >> I have no problem with requiring user to specify what program she >> used - >> if we can be fancy and guess the format later (i.e. guess format >> in SeqIO) >> -then that's icing. >> >> -jason >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From johnston at biochem.ucl.ac.uk Wed Feb 7 13:56:52 2007 From: johnston at biochem.ucl.ac.uk (Caroline Johnston) Date: Wed, 7 Feb 2007 18:56:52 +0000 (GMT) Subject: [Bioperl-l] RNA folding In-Reply-To: References: Message-ID: Thanks Chris. Storing the interaction data as a hash according to an ontology and using an extended bracket notation as the string representation seems to make sense, but I'm still unsure how this is supposed to be attached to the Seq objects. You reckon it should be an AnnotationI? I'm not sure I understand the distinction between annotations and features. From the docs I got the impression that Features were like annotation on bits of sequences and had a reference to the sequence to which they belong, whereas annotations don't. If that's the case though, why would RNA structure be an annotation rather than a feature? If not, what is the distinction between them? Are the positional Annotation subclasses you're developing intended to replace features? Have I got the wrong end of the stick entirely? 
Cheers, Cass

On Tue, 6 Feb 2007, Chris Fields wrote:
> Actually, the only RNA tool wrappers I have made are ones for ERPIN,
> RNAMotif, and Infernal (the only one in bioperl-run CVS at this time
> is RNAMotif). I am planning on writing up wrappers for Vienna,
> UNAFold, and a few others but haven't really started in. Here's
> where I'm at right now...
>
> I am writing up a new set of AnnotationI classes which positionally
> describe data (Meta) which I hope will help deal with this stuff.
> These would be similar in nature to Heikki's Bio::Seq::Meta classes:
>
> http://bioperl.org/pipermail/bioperl-l/2006-December/024414.html
>
> I would use a regular Bio::SeqI and store the structural data and
> anything else (such as energy calculations, etc) as Annotation
> objects in an AnnotationCollection, and then write up a series of
> SeqIO modules to get data into/out of the designated structure
> formats, like UNAFold ct, RNAML, and so on. Each sequence would then
> be capable of holding more than one structural Annotation (i.e. could
> represent different folding pathways, alternative RNA folds, and so on).
>
> At this point I represent the data as an array of hashes where $array[0]
> is nt 1 and the hash keys indicate the type of interaction, base
> interacted with, etc. The text representation would be in the simple
> Eddy WUSS (Rfam-like) format by default, which is capable of
> representing some complex data (pseudoknots, for instance), is
> compact, and is documented (via the Infernal manual). Tags will
> probably switch to more ontologically relevant terms (probably from
> RNAML or RNA Ontology), but in general it is something like this:
>
> [
>   {'interaction' => 'WC',
>    'base'        => 24},
>   {'interaction' => 'WC',
>    'base'        => 23},
>   {'interaction' => 'SS'},
>   ...
> ]
>
> In this implementation every seq position would have some kind of
> interaction designation, though that's open for debate as it could
> just be simple text or undef for single-stranded regions.
>
> This is also scalable based on complexity of the data: if one wanted
> to add tert/quaternary interactions, location, base modifications,
> remote sequence interactions, etc., extra key/value pairs could be
> used. Conversely, if one only wanted sec structure (for drawing RNA
> structures, for example), then only that data would be parsed.
>
> If you (or anyone listening) have any suggestions I would greatly
> appreciate them.
>
> chris
>
>

From cjfields at uiuc.edu Wed Feb 7 17:15:44 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 7 Feb 2007 16:15:44 -0600 Subject: [Bioperl-l] RNA folding In-Reply-To: References: Message-ID: <7360B66F-6AF3-4CB1-8343-0A19E42AD7F8@uiuc.edu>

On Feb 7, 2007, at 12:56 PM, Caroline Johnston wrote:

> Thanks Chris.
>
> Storing the interaction data as a hash according to an ontology and using
> an extended bracket notation as the string representation seems to make
> sense, but I'm still unsure how this is supposed to be
> attached to the Seq objects. You reckon it should be an AnnotationI?

As long as it describes everything in the object and there is a reasonable way of textually representing the data, I think you can attach anything as annotation. A recent example is the addition of trees as annotation. Also, Annotation can be used to describe alignments (such as the structure consensus string in Rfam alignments), or added to SeqFeatures. The class just needs to implement AnnotatableI.

> I'm not sure I understand the distinction between annotations and
> features. From the docs I got the impression that Features were like
> annotation on bits of sequences and had a reference to the sequence to
> which they belong, whereas annotations don't. If that's the case though,
> why would RNA structure be an annotation rather than a feature? If not,
> what is the distinction between them? Are the positional Annotation
> subclasses you're developing intended to replace features?
> Have I got the wrong end of the stick entirely?
>
> Cheers,
> Cass

The key distinction between seqfeatures and annotations is that annotations are normally associated with the entire sequence record, while seqfeatures normally describe a part of the sequence (and thus have a location on the sequence). There are a few exceptions, but in general that's the case. The HOWTO gives a bit more background:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

Using annotations or seqfeatures in a case like this may be completely dependent on one's point of view. For instance, one implementation I had considered was adding an interface to Bio::Seq which would allow Seq objects to also have Bio::Structure objects, since my view is that any sequence could (optionally) have a structure associated with it. However, I reasoned that a sequence could actually have multiple structures (RNA, ssDNA, and protein can have several alternative folds or different folding pathways, for instance). Instead of splitting up each structure into individual seqfeatures (each of which would have to be tagged with the relevant structure and score info), I could have one class encompass all of that data in a reasonable way. Hence I used Annotation.

BTW, this isn't meant to replace features in any way. It would be primarily used to describe (1) a sequence as a whole, such as a tRNA sequence, (2) a seqfeature, such as a tRNA, rRNA, riboswitch, etc. in a genome sequence, or (3) a conserved structure in an alignment, such as Rfam Stockholm output.

I'll add that the option of splitting the data into seqfeatures isn't ruled out. It would be a matter of using a helper method, maybe in SeqUtils or directly in Annotation::Meta or whatever I end up calling it. I plan on adding something along those lines at some point.
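To make the array-of-hashes layout described in this thread concrete, here is a minimal sketch that assumes plain bracket notation as input. brackets_to_pairs() is a hypothetical helper, not an existing BioPerl method. It records 'paired' rather than 'WC' because brackets alone do not say which kind of base pair is formed, and it does not interpret WUSS pseudoknot letters.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): turn a bracket-notation
# secondary-structure string into an array of hashes in which element 0
# describes nucleotide 1 and paired positions store their 1-based partner.
# Only (), <>, [] and {} pairs are handled; WUSS pseudoknot letters
# (Aa, Bb, ...) and all other characters are treated as single-stranded.
sub brackets_to_pairs {
    my ($string) = @_;
    my %close_for = ('(' => ')', '<' => '>', '[' => ']', '{' => '}');
    my %open_for  = reverse %close_for;
    my %open_positions;    # one stack of 0-based positions per opening bracket
    my @structure;
    my @chars = split //, $string;
    for my $i (0 .. $#chars) {
        my $c = $chars[$i];
        if (exists $close_for{$c}) {
            push @{ $open_positions{$c} }, $i;
        }
        elsif (exists $open_for{$c}) {
            my $j = pop @{ $open_positions{ $open_for{$c} } };
            die "Unmatched '$c' at position ", $i + 1, "\n" unless defined $j;
            # 'paired' rather than 'WC': brackets do not name the pair type
            $structure[$i] = { interaction => 'paired', base => $j + 1 };
            $structure[$j] = { interaction => 'paired', base => $i + 1 };
        }
        else {
            $structure[$i] = { interaction => 'SS' };
        }
    }
    for my $open (keys %open_positions) {
        die "Unmatched '$open'\n" if @{ $open_positions{$open} };
    }
    return \@structure;
}

my $structure = brackets_to_pairs('((..))');
for my $pos (0 .. $#$structure) {
    my $h = $structure->[$pos];
    printf "%d %s%s\n", $pos + 1, $h->{interaction},
        exists $h->{base} ? " with $h->{base}" : '';
}
```

For '((..))' this prints "1 paired with 6", "2 paired with 5", "3 SS", "4 SS", "5 paired with 2" and "6 paired with 1". Keeping one stack per bracket type is what lets mixed bracket types such as '<(>)' describe crossing pairs.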
chris

From mitch_skinner at berkeley.edu Wed Feb 7 18:26:53 2007 From: mitch_skinner at berkeley.edu (Mitch Skinner) Date: Wed, 07 Feb 2007 15:26:53 -0800 Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels In-Reply-To: <6dce9a0b0702070417k692fedcchfac5b1f0e01f72bd@mail.gmail.com> References: <45C9578F.2060802@berkeley.edu> <6dce9a0b0702070417k692fedcchfac5b1f0e01f72bd@mail.gmail.com> Message-ID: <45CA603D.1070901@berkeley.edu>

Lincoln Stein wrote:
> Zero is not a forbidden coordinate, since gbrowse also works on
> genetic maps which have negative and floating point coordinates.
> You've simply picked up a boundary case where the rounding isn't
> working properly. I will fix this now.

Thanks for the fix. What do you think of the following case? This is something I actually ran into. Suppose you have the original draw_grid:

  my $first_tick = $minor * int($self->start/$minor);

and my version of map_pt:

  my $val = $flip ? ($pr - ($length - ($_ - 1)) * $scale)
                  : (($_ - $offset - 1) * $scale);
  $val = int($val + .5 * ($val <=> 0));

with scale=0.5, offset=0, pad_left=0, flip=0, and minor=10. Our tiles are currently 1000px wide. So the first gridline (0bp) will be at -1px and the gridline at 2000bp will be at 1000px. So the first tile will not have a gridline at its 0th pixel, but the second tile will have one there.

Last night I was thinking that this was an artifact of having gridlines start at 0bp, but now I'm thinking it is just because rounding half-pixels leaves an extra space when crossing zero. That is not unreasonable; it just invalidates the assumption I was hoping to make that the gridlines are the same for each tile. Maybe it's just unreasonable to think that floating point calculations will give pixel-exact results. Or I may just be barking up the wrong tree entirely. Perhaps it's time to reconsider at a higher level (see my next message).
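The arithmetic in this example can be checked with a short standalone Perl script. It copies only the non-flipped branch of the map_pt() rounding quoted above; tile_gridlines() and its parameters are hypothetical and not part of Bio::Graphics::Panel.

```perl
use strict;
use warnings;

# Non-flipped branch of the map_pt() rounding quoted above:
# bp -> pixel, rounding halves away from zero.
sub map_pt {
    my ($bp, $scale, $offset) = @_;
    my $val = ($bp - $offset - 1) * $scale;
    return int($val + .5 * ($val <=> 0));
}

# Hypothetical helper: gridline pixels for one tile, relative to the
# tile's left edge, with a gridline every $minor bp starting at 0bp.
sub tile_gridlines {
    my (%opt) = @_;
    my ($tile, $width, $minor, $scale, $offset) =
        @opt{qw(tile width minor scale offset)};
    my $last_bp = ($tile + 1) * $width / $scale + $minor;    # just past the tile
    my @pixels;
    for (my $bp = 0; $bp <= $last_bp; $bp += $minor) {
        my $px = map_pt($bp, $scale, $offset) - $tile * $width;
        push @pixels, $px if $px >= 0 && $px < $width;
    }
    return @pixels;
}

my %args = (width => 1000, minor => 10, scale => 0.5, offset => 0);
for my $tile (0, 1) {
    my @px = tile_gridlines(%args, tile => $tile);
    printf "tile %d: %d gridlines, first at %s\n",
        $tile, scalar @px, join(', ', @px[0 .. 2]);
}
```

It prints "tile 0: 199 gridlines, first at 5, 10, 15" and "tile 1: 200 gridlines, first at 0, 5, 10". Every gridline in both tiles falls on a multiple of 5 px; the only difference is that the 0bp line rounds from -0.5 to -1 instead of 0, so the 0..10bp interval is 6 px wide and the first tile loses its line at pixel 0.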
Mitch From mitch_skinner at berkeley.edu Wed Feb 7 18:28:11 2007 From: mitch_skinner at berkeley.edu (Mitch Skinner) Date: Wed, 07 Feb 2007 15:28:11 -0800 Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels In-Reply-To: <6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com> References: <45C9578F.2060802@berkeley.edu> <6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com> Message-ID: <45CA608B.80907@berkeley.edu> Lincoln Stein wrote: > However, I'm also very interested in why grid-drawing takes so long. > When I've profiled drawing, neither grid drawing nor map_pt() consume > any significant amount of time. Well, the approach that we've been taking is to hand Bio::Graphics::Panel a fake GD object that stores all of the graphical primitives (line, rectangle, filledRectangle, etc. + their parameters) and then draws them later in chunks (for each tile, we draw all the primitives that overlap its pixel boundaries). We're doing this because trying to create a real GD object that's hundreds of millions of pixels wide takes too much RAM. But storing all the gridlines (for a whole chromosome, at a high zoom level) also takes a lot of RAM, and getting the gridlines for the current tile and translating their coordinates into the coordinate space of the tile also takes a fair amount of CPU. The gridline hack I've been experimenting with (that prompted these emails) was motivated by the hope that the gridlines were regular enough that we wouldn't have to store them explicitly, but just draw the same gridlines over and over again. It runs almost twice as fast as the version that explicitly stores the gridlines. So the main slowdown is not in draw_grid or map_pt, but in our code that's storing/retrieving and translating the gridlines. Which we are also looking into speeding up. But the memory usage is harder to reduce; I've experimented with trying to compress the gridline data but it seems easier to just have the panel draw the grid directly. 
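The recording approach described in this paragraph can be sketched in a few lines of Perl. RecordingCanvas is a hypothetical class, not part of Bio::Graphics or GD: it implements only line, rectangle and filledRectangle, using GD::Image's (x1, y1, x2, y2, color) argument order, whereas a real Panel calls many more GD methods. Its per-tile lookup is a linear scan over every stored primitive, which is exactly the storage and lookup cost discussed above; an index keyed on x would be the natural next step.

```perl
use strict;
use warnings;

package RecordingCanvas;    # hypothetical stand-in for a GD::Image

sub new { my ($class) = @_; return bless { ops => [] }, $class }

# Record a primitive instead of drawing it; x1..x2 is its horizontal extent.
sub _record {
    my ($self, $op, $x1, $y1, $x2, $y2, @rest) = @_;
    ($x1, $x2) = ($x2, $x1) if $x2 < $x1;
    push @{ $self->{ops} }, [ $op, $x1, $y1, $x2, $y2, @rest ];
    return;
}

sub line            { my $self = shift; $self->_record('line', @_) }
sub rectangle       { my $self = shift; $self->_record('rectangle', @_) }
sub filledRectangle { my $self = shift; $self->_record('filledRectangle', @_) }

# Primitives overlapping pixel columns [$left, $left + $width), with x
# coordinates shifted into the tile's own coordinate space.
sub ops_for_tile {
    my ($self, $left, $width) = @_;
    my $right = $left + $width - 1;
    my @out;
    for my $op (@{ $self->{ops} }) {
        my ($name, $x1, $y1, $x2, $y2, @rest) = @$op;
        next if $x2 < $left || $x1 > $right;
        push @out, [ $name, $x1 - $left, $y1, $x2 - $left, $y2, @rest ];
    }
    return @out;
}

package main;

my $canvas = RecordingCanvas->new;
$canvas->line(5, 0, 5, 100, 'grey');
$canvas->filledRectangle(990, 10, 1020, 20, 'blue');    # straddles two tiles
for my $tile (0, 1) {
    my @ops = $canvas->ops_for_tile($tile * 1000, 1000);
    print "tile $tile: ",
        join('; ', map { "$_->[0] x=$_->[1]..$_->[3]" } @ops), "\n";
}
```

The rectangle spanning x=990..1020 is returned for both tiles, as 990..1020 and -10..20 respectively, so each tile can draw its part and rely on the image bounds to clip the overhang.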
The more I read the Panel code, the more I think it would be nice to make more use of it. One of the reasons that we're trying to fool it right now is that there seem to be a number of behaviors in it (and/or in the glyphs?) that take the current image boundaries into account (drawing an arrow where a feature runs off the edge of the image, etc.). But in our browser each tile is supposed to mesh seamlessly with its neighbor, so if there's an easy way to turn off those edge-aware behaviors that would be pretty interesting. Ian has also suggested that it might be better to store less information than the full set of graphics primitives. For example, we could just store the Panel's glyph boxes and use their (pixel bound)->feature information to decide which features need to be drawn for each tile. I'm going to be spending some time reading the Bio::Graphics code in more depth. I'd also welcome suggestions from you or anyone on the list. Thanks, Mitch From sdbrown at annular.org Wed Feb 7 18:41:13 2007 From: sdbrown at annular.org (Steven Brown) Date: Wed, 7 Feb 2007 15:41:13 -0800 Subject: [Bioperl-l] Bio::Restriction::Analysis cut site problem in 1.5.2 Message-ID: <3828F5B9-BFF8-47F4-8228-3BEA036513A0@annular.org> The module seems to have trouble handling the cut-site specifiers that surround the sequence that the enzyme is specific for. The error: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Bad end parameter (22). 
End must be less than the total length of sequence (total=6)
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.8.6/Bio/Root/Root.pm:328
STACK: Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:371
STACK: Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:884
STACK: Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:785
STACK: Bio::Restriction::Analysis::cut /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:369
STACK: Bio::Restriction::Analysis::cutters /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:678
---snip (my script line)---
-----------------------------------------------------------

The offending enzyme:

---snip---
<1>AcuI
<2>Eco57I,Bsp6II,BspD6II,BspKT5I,Eco112I,Eco125I,FsfI
<3>CTGAAG(16/14)
---snip---

If I get rid of the (16/14) the error disappears and the right sequence site is matched. It seems like maybe a decision was made not to analyze enzymes with remote cut positions, or the code wouldn't throw the error...? Any information on this would be helpful.

Thanks,
Steve

From adsj at novozymes.com Thu Feb 8 03:55:50 2007 From: adsj at novozymes.com (Adam Sjøgren) Date: Thu, 08 Feb 2007 09:55:50 +0100 Subject: [Bioperl-l] Bioperl 1.4 to 1.5.2, adding qualifiers to Bio::Seq-objects References: <8764adoptn.fsf@topper.koldfront.dk> Message-ID: <87fy9hqb8p.fsf@topper.koldfront.dk>

On Wed, 7 Feb 2007 11:50:13 -0600, Chris wrote:

> This works for me using bioperl-live (Mac OS X):
> ok 1 - Qualifier note found
> ok 2 - Qualifier db_xref found

*slaps forehead* Thanks for the test - your diagnosis was spot on:

> If you haven't uninstalled 1.4, make sure you aren't running 1.4 or
> mixing the two versions (you can check by using 'perldoc -l
> Bio::Root::Root').
I had a modified version of Bio::Seq and Bio::SeqFeature::Generic in my @INC (added, and promptly forgotten, while writing the patch mentioned here: ). Removing those and patching 1.5.2 fixed my self-inflicted problem.

Thanks again!

Adam

-- 
Adam Sjøgren
adsj at novozymes.com

From heikki at sanbi.ac.za Thu Feb 8 04:39:47 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Thu, 8 Feb 2007 11:39:47 +0200 Subject: [Bioperl-l] Bio::Restriction::Analysis cut site problem in 1.5.2 In-Reply-To: <3828F5B9-BFF8-47F4-8228-3BEA036513A0@annular.org> References: <3828F5B9-BFF8-47F4-8228-3BEA036513A0@annular.org> Message-ID: <200702081139.48125.heikki@sanbi.ac.za>

The error comes from Bio::PrimarySeq::subseq when it tries to cut beyond the end of an existing sequence. Maybe your sequence has a restriction site that is near the end of your sequence? This is a special case which has not been taken into account in the Bio::Restriction::Analysis::_cuts method. The question is: should the site be detected if its cut site is not within the studied sequence?

Please submit a bugzilla bug, so this gets solved. I probably do not have time to tweak the code myself.

-Heikki

On Thursday 08 February 2007 01:41:13 Steven Brown wrote:
> The module seems to have trouble handling the cut-site specifiers
> that surround the sequence that the enzyme is specific for. The error:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Bad end parameter (22).
> End must be less than the total length of sequence (total=6)
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.8.6/Bio/Root/Root.pm:328
> STACK: Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:371
> STACK: Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:884
> STACK: Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:785
> STACK: Bio::Restriction::Analysis::cut /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:369
> STACK: Bio::Restriction::Analysis::cutters /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:678
> ---snip (my script line)---
> -----------------------------------------------------------
>
> The offending enzyme:
>
> ---snip---
> <1>AcuI
> <2>Eco57I,Bsp6II,BspD6II,BspKT5I,Eco112I,Eco125I,FsfI
> <3>CTGAAG(16/14)
> ---snip---
>
> If I get rid of the (16/14) the error disappears and the right
> sequence site is matched. It seems like maybe a decision was made
> not analyze enzymes with remote cut positions, or the code wouldn't
> throw the error...? Any information on this would be helpful.
>
> Thanks,
> Steve
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
Heikki Lehvaslaiho heikki at_sanbi _ac _za
Associate Professor skype: heikki_lehvaslaiho
SANBI, South African National Bioinformatics Institute
University of Western Cape, South Africa
Phone: +27 21 959 2096 FAX: +27 21 959 2512

From cjfields at uiuc.edu Thu Feb 8 09:20:26 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 8 Feb 2007 08:20:26 -0600 Subject: [Bioperl-l] BLASTXML changes (good this time!) Message-ID:

All,

BLAST XML parsing should now work for any CPAN-based XML::SAX parser!

XML::SAX::PurePerl (comes with XML::SAX, the slowest)
XML::SAX::Expat
XML::SAX::ExpatXS (the fastest)
XML::LibXML::SAX
XML::LibXML::SAX::Parser

Grant McLean has updated XML::SAX on CPAN to fix an XML::SAX::PurePerl bug, so using that parser will necessitate an XML::SAX upgrade. I had also found a bug in the SAX handler that chopped off a large chunk of the hit descriptions; it is now fixed in CVS.

If Sendu is out there, I think we can safely remove any dependencies beyond XML::SAX 0.15 for the next release. Should I go ahead and modify Build.PL?

chris

From lstein at cshl.edu Thu Feb 8 10:51:49 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 8 Feb 2007 10:51:49 -0500 Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels In-Reply-To: <45CA608B.80907@berkeley.edu> References: <45C9578F.2060802@berkeley.edu> <6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com> <45CA608B.80907@berkeley.edu> Message-ID: <6dce9a0b0702080751m210e4d44k3e5c38bfdd3ee9ea@mail.gmail.com>

Hi,

I like the approach you're taking (creating a fake GD object that stores the graphics primitives).
Perhaps the best thing to do is to subclass Panel itself so that it doesn't draw the gridlines (or turn gridlines off completely). Then you can draw gridlines after the fact in each tile as needed. Lincoln On 2/7/07, Mitch Skinner wrote: > > Lincoln Stein wrote: > > However, I'm also very interested in why grid-drawing takes so long. > > When I've profiled drawing, neither grid drawing nor map_pt() consume > > any significant amount of time. > Well, the approach that we've been taking is to hand > Bio::Graphics::Panel a fake GD object that stores all of the graphical > primitives (line, rectangle, filledRectangle, etc. + their parameters) > and then draws them later in chunks (for each tile, we draw all the > primitives that overlap its pixel boundaries). We're doing this because > trying to create a real GD object that's hundreds of millions of pixels > wide takes too much RAM. But storing all the gridlines (for a whole > chromosome, at a high zoom level) also takes a lot of RAM, and getting > the gridlines for the current tile and translating their coordinates > into the coordinate space of the tile also takes a fair amount of CPU. > The gridline hack I've been experimenting with (that prompted these > emails) was motivated by the hope that the gridlines were regular enough > that we wouldn't have to store them explicitly, but just draw the same > gridlines over and over again. It runs almost twice as fast as the > version that explicitly stores the gridlines. > > So the main slowdown is not in draw_grid or map_pt, but in our code > that's storing/retrieving and translating the gridlines. Which we are > also looking into speeding up. But the memory usage is harder to > reduce; I've experimented with trying to compress the gridline data but > it seems easier to just have the panel draw the grid directly. > > The more I read the Panel code, the more I think it would be nice to > make more use of it. 
One of the reasons that we're trying to fool it > right now is that there seem to be a number of behaviors in it (and/or > in the glyphs?) that take the current image boundaries into account > (drawing an arrow where a feature runs off the edge of the image, > etc.). But in our browser each tile is supposed to mesh seamlessly with > its neighbor, so if there's an easy way to turn off those edge-aware > behaviors that would be pretty interesting. > > Ian has also suggested that it might be better to store less information > than the full set of graphics primitives. For example, we could just > store the Panel's glyph boxes and use their (pixel bound)->feature > information to decide which features need to be drawn for each tile. > > I'm going to be spending some time reading the Bio::Graphics code in > more depth. I'd also welcome suggestions from you or anyone on the list. > > Thanks, > Mitch > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From Kevin.M.Brown at asu.edu Thu Feb 8 10:28:30 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 8 Feb 2007 08:28:30 -0700 Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels References: <45C9578F.2060802@berkeley.edu><6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com> <45CA608B.80907@berkeley.edu> Message-ID: <1A4207F8295607498283FE9E93B775B402AAC1D0@EX02.asurite.ad.asu.edu> > The more I read the Panel code, the more I think it would be > nice to make more use of it. One of the reasons that we're > trying to fool it right now is that there seem to be a number > of behaviors in it (and/or in the glyphs?) that take the > current image boundaries into account (drawing an arrow where > a feature runs off the edge of the image, etc.). 
> But in our browser each tile is supposed to mesh seamlessly with its
> neighbor, so if there's an easy way to turn off those
> edge-aware behaviors that would be pretty interesting.

I think the glyphs try to deal with edges because if they didn't, then they would flow out into whatever right or left padding had been placed around the image when the panel was created. Something I've noticed is that when I create tiles for the chromosomes I'm working on, the panels don't line up because the bump position in one panel is not accounted for when the next panel is drawn.

From sheris at eps.berkeley.edu Thu Feb 8 12:42:27 2007 From: sheris at eps.berkeley.edu (Sheri Simmons) Date: Thu, 08 Feb 2007 09:42:27 -0800 Subject: [Bioperl-l] bioperl newbie needs help with extracting cds info Message-ID: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu>

Hi,

I'm a newbie to BioPerl so apologies if this is a very basic question. I am trying to parse GenBank files with the goal of creating concatenated gene lists in nucleic acid or amino acid format. It is working fine, except for one thing: I need to create gene labels incorporating information on whether the gene is on the complementary strand or not ("complement" in the CDS tag). How can I get Bioperl to tell me whether the CDS tag value includes the word "complement"?

Thanks
Sheri

From george.heller at yahoo.com Thu Feb 8 13:54:41 2007 From: george.heller at yahoo.com (George Heller) Date: Thu, 8 Feb 2007 10:54:41 -0800 (PST) Subject: [Bioperl-l] Perl script to extract from ncbi Message-ID: <178139.85769.qm@web56506.mail.re3.yahoo.com>

Hi all,

I have a question regarding extracting data from NCBI. I have a database to store the sequence data, but the files I have loaded into it don't have a proper description line specified. Based on the accession number, I need to find out the genus and species name (organism name) from NCBI.

I have about 1500 records for which I need to extract the names from NCBI.
Any ideas on how I can go about writing a Perl script for extracting this information from NCBI?

Thanks!
George.

From Kevin.M.Brown at asu.edu Thu Feb 8 14:11:50 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 8 Feb 2007 12:11:50 -0700 Subject: [Bioperl-l] bioperl newbie needs help with extracting cds info References: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu> Message-ID: <1A4207F8295607498283FE9E93B775B402AAC29A@EX02.asurite.ad.asu.edu>

When you extract the features, just look at the strand method on each returned feature to find out (1 = forward strand, -1 = complement).

  my @features = $seq->all_SeqFeatures;
  # keep only the CDS features and print their strand
  for my $f (@features) {
      my $tag = $f->primary_tag;
      if ($tag eq 'CDS') {
          print $f->strand . "\n";
      }
  }

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Sheri Simmons
> Sent: Thursday, February 08, 2007 10:42 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] bioperl newbie needs help with
> extracting cds info
>
> Hi,
> I'm a newbie to BioPerl so apologies if this is a very basic
> question. I am trying to parse GenBank files with the goal of
> creating concatenated gene lists in nucleic acid or amino
> acid format. It is working fine, except for one thing: I need
> to create gene labels incorporating information on whether
> the gene is on the complementary strand or not ("complement"
> in the CDS tag). How can I get Bioperl to tell me whether the
> CDS tag value includes the word "complement"?
>
> Thanks
> Sheri
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From barry.moore at genetics.utah.edu Thu Feb 8 14:35:03 2007 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Thu, 8 Feb 2007 12:35:03 -0700 Subject: [Bioperl-l] bioperl newbie needs help with extracting cds info In-Reply-To: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu> References: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu> Message-ID:

Sheri-

The Bio::SeqFeature::Generic object has a 'strand' method, so you can just call strand on the CDS (or any other) feature like this:

  my @features = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures();
  for my $feature (@features) {
      my $strand = $feature->strand;    # 1 = forward, -1 = complement
  }

Barry

On Feb 8, 2007, at 10:42 AM, Sheri Simmons wrote:

> Hi,
> I'm a newbie to BioPerl so apologies if this is a very basic
> question. I am trying to parse GenBank files with the goal of
> creating concatenated gene lists in nucleic acid or amino acid
> format. It is working fine, except for one thing: I need to create
> gene labels incorporating information on whether the gene is on the
> complementary strand or not ("complement" in the CDS tag). How can I
> get Bioperl to tell me whether the CDS tag value includes the word
> "complement"?
>
> Thanks
> Sheri
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From torsten.seemann at infotech.monash.edu.au Thu Feb 8 23:18:33 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 9 Feb 2007 15:18:33 +1100 Subject: [Bioperl-l] BLASTXML changes (good this time!) In-Reply-To: References: Message-ID:

Chris,

> BLAST XML parsing should now work for any CPAN-based XML::SAX parser!
> XML::SAX::PurePerl (comes with XML::SAX, the slowest)
> XML::SAX::Expat
> XML::SAX::ExpatXS (the fastest)
> XML::LibXML::SAX
> XML::LibXML::SAX::Parser

That's excellent news - thanks for all the work you have put in on this one. I'm impressed.

This is a good opportunity to encourage people who use Bio::SearchIO for BLAST parsing to switch to 'blastxml' format over 'blast'. Although the latter is more human readable, it perennially requires parser source changes to cope with the variations and new formatting introduced with each new NCBI BLAST release. Best to use "-m 7" XML format, and convert as appropriate using one of the Bio::SearchIO::Writer:: classes.

--Torsten

From cjfields at uiuc.edu Fri Feb 9 08:58:24 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 9 Feb 2007 07:58:24 -0600 Subject: [Bioperl-l] BLASTXML changes (good this time!) In-Reply-To: References: Message-ID: <4FC966A7-7A03-41D9-ABF7-6ACD888720FB@uiuc.edu>

On Feb 8, 2007, at 10:18 PM, Torsten Seemann wrote:

> Chris,
>
>> BLAST XML parsing should now work for any CPAN-based XML::SAX parser!
>> XML::SAX::PurePerl (comes with XML::SAX, the slowest)
>> XML::SAX::Expat
>> XML::SAX::ExpatXS (the fastest)
>> XML::LibXML::SAX
>> XML::LibXML::SAX::Parser
>
> That's excellent news - thanks for all the work you have put in on
> this one. I'm impressed.

Jason did most of the hard work; I just tinkered with it until it worked (and pestered a few Perl XML guys along the way). Thanks Grant and Bj?rn!

> This is a good opportunity to encourage people who use Bio::SearchIO
> for BLAST parsing to switch to 'blastxml' format over 'blast'.
> Although the latter is more human readable, it perennially requires
> parser source changes to cope with the variations and new formatting
> introduced with each new NCBI BLAST release. Best to use "-m 7" XML
> format, and convert as appropriate using one of the
> Bio::SearchIO::Writer:: classes.
>
> --Torsten

I'll try getting some benchmarks for the different parsers up today on the wiki if I have time.

Strangely enough, NCBI changed a few things about BLAST XML a few releases back without mentioning it to anyone (it showed up as a silent bug in BLAST XML parsing, which I fixed recently). If you sent in multiple queries in older versions of BLAST you would get all of the BLAST XML reports concatenated together, which required preparsing the reports to carve up the XML prior to parsing. Now they treat it like PSI-BLAST, where multiple queries = multiple iterations, so you get one long XML BLAST report where each iteration is a Result. The current parser should handle both, as it just caches the other results and returns them one at a time prior to new parses, but I wouldn't recommend parsing a huge BLAST XML file with hundreds of queries, as you'll quickly run out of memory! If they get Perl SAX2 up to date with Expat they'll eventually add parse_chunk() and pause_parse() for each parser. Until then...

chris

Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign

From cuiw at ncbi.nlm.nih.gov Fri Feb 9 09:20:10 2007 From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C]) Date: Fri, 9 Feb 2007 09:20:10 -0500 Subject: [Bioperl-l] Perl script to extract from ncbi In-Reply-To: <178139.85769.qm@web56506.mail.re3.yahoo.com> References: <178139.85769.qm@web56506.mail.re3.yahoo.com> Message-ID: <18C407FD4FFB424292D769FBD68C1987020BBC58@NIHCESMLBX8.nih.gov>

Here is an example of fetching two GenBank records (id=124504630,110665734) in XML format. Organism names like 'Rattus norvegicus' can be parsed from the XML.
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=124504630,110665734&retmode=xml&rettype=gb Or you can get TaxIds and translate them into real names: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=124504630,110665734&retmode=xml Wenwu Cui, PhD -----Original Message----- From: George Heller [mailto:george.heller at yahoo.com] Sent: Thursday, February 08, 2007 1:55 PM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Perl script to extract from ncbi Hi all, I have a question regarding extracting data from Ncbi. I have a database to store the sequence data, but the files I have loaded into it, don't have a proper description line specified. Based on the accession number, I need to find out what is the genus and species name (organism name) from ncbi. I have about 1500 records for which I need to extract the names from ncbi. Any ideas of how I can go about writing a perl script for extracting this information from ncbi? Thanks! George. From bosborne11 at verizon.net Fri Feb 9 12:51:19 2007 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 09 Feb 2007 12:51:19 -0500 Subject: [Bioperl-l] Perl script to extract from ncbi In-Reply-To: <178139.85769.qm@web56506.mail.re3.yahoo.com> Message-ID: George, http://www.bioperl.org/wiki/HOWTO:Beginners#Retrieving_a_sequence_from_a_database Brian O. On 2/8/07 1:54 PM, "George Heller" wrote: > Hi all, > > I have a question regarding extracting data from Ncbi. I have a database to > store the sequence data, but the files I have loaded into it, don't have a > proper description line specified.
Based on the accession number, I need to > find out what is the genus and species name (organism name) from ncbi. > > I have about 1500 records for which I need to extract the names from ncbi. > > Any ideas of how I can go about writing a perl script for extracting this > information from ncbi? > > Thanks! > George. From johnston at biochem.ucl.ac.uk Fri Feb 9 14:23:41 2007 From: johnston at biochem.ucl.ac.uk (Caroline Johnston) Date: Fri, 9 Feb 2007 19:23:41 +0000 (GMT) Subject: [Bioperl-l] WrapperBase Message-ID: Hi, Could WrapperBase::executable warn you if it doesn't find the exe in program_path? At the moment it just silently goes ahead and uses one in the system path if it exists. Cass. I've never used diff, so not sure if this is right, but:

305,308c305,314
<     if( $prog_path && -e $prog_path && -x $prog_path ) {
<         $self->{'_pathtoexe'} = $prog_path;
<     } else {
<         my $exe;
---
>     if($prog_path){
>         if(-e $prog_path && -x $prog_path){
>             $self->{'_pathtoexe'} = $prog_path;
>         }
>         else{
>             $self->warn("executable not found in $prog_path, trying system path...") if $warn;
>         }
>     }
>     unless ($self->{'_pathtoexe'}){
>         my $exe;
335a342

From bix at sendu.me.uk Fri Feb 9 17:38:59 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 09 Feb 2007 22:38:59 +0000 Subject: [Bioperl-l] WrapperBase In-Reply-To: References: Message-ID: <45CCF803.9030004@sendu.me.uk> Caroline Johnston wrote: > Hi, > > Could WrapperBase::executable warn you if it doesn't find the exe in > program_path? At the moment it just silently goes ahead and uses one in > the system path if it exists. No, I think not.
That would be very annoying when using wrappers for programs that you just have in your system path. What specific problem are you encountering with the current behaviour? From bix at sendu.me.uk Fri Feb 9 17:40:33 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 09 Feb 2007 22:40:33 +0000 Subject: [Bioperl-l] BLASTXML changes (good this time!) In-Reply-To: References: Message-ID: <45CCF861.8030000@sendu.me.uk> Chris Fields wrote: > If Sendu is out there, I think we can safely remove any dependencies > beyond XML::SAX 0.15 for the next release. Should I go ahead and > modify Build.PL? Sure, good to hear. From cjfields at uiuc.edu Fri Feb 9 22:42:24 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 9 Feb 2007 21:42:24 -0600 Subject: [Bioperl-l] BLASTXML changes (good this time!) In-Reply-To: <45CCF861.8030000@sendu.me.uk> References: <45CCF861.8030000@sendu.me.uk> Message-ID: On Feb 9, 2007, at 4:40 PM, Sendu Bala wrote: > Chris Fields wrote: >> If Sendu is out there, I think we can safely remove any dependencies >> beyond XML::SAX 0.15 for the next release. Should I go ahead and >> modify Build.PL? > > Sure, good to hear. I added a version dependency for XML::SAX (v. 0.15) for the PurePerl fix. That likely obviates the need for a Bundle for XML::Simple. Not too pressing; we can determine that before the next release. chris From johnston at biochem.ucl.ac.uk Sat Feb 10 11:27:53 2007 From: johnston at biochem.ucl.ac.uk (Caroline Johnston) Date: Sat, 10 Feb 2007 16:27:53 +0000 (GMT) Subject: [Bioperl-l] WrapperBase In-Reply-To: <45CCF803.9030004@sendu.me.uk> References: <45CCF803.9030004@sendu.me.uk> Message-ID: > No, I think not. That would be very annoying when using wrappers for > programs that you just have in your system path. > Hmm, maybe I misunderstood what the program_path was for?
The executable method goes straight to the system path unless program_path is set, so I assumed you would only set program_path if you specifically wanted it to look somewhere else. You wouldn't get a warning if you didn't specify a program_path and just left it to look in the system path. > What specific problem are you encountering with the current behaviour? One version of an executable in /usr/local, another version - which I wanted to use in my home directory. The program_path method gets a path from an environment variable, which was set to ~/. I didn't realise I had the wrong permissions on the executable though, and it was silently failing to use my version and using the one in /usr/local instead. Cass From george.heller at yahoo.com Sat Feb 10 15:35:18 2007 From: george.heller at yahoo.com (George Heller) Date: Sat, 10 Feb 2007 12:35:18 -0800 (PST) Subject: [Bioperl-l] Error while parsing Message-ID: <162150.76282.qm@web56511.mail.re3.yahoo.com> Hi all, I am in the process of parsing a few files, actually blast results, but happen to get the following error: ------------- EXCEPTION ------------- MSG: Can't get HSPs: data not collected. STACK Bio::Search::Hit::GenericHit::hsp /usr/lib/perl5/site_perl/5.8.5/Bio/Search/Hit/GenericHit.pm:649 STACK toplevel parser.pl:31 -------------------------------------- I am not sure if this is a bug, or is there something I am doing wrong. Any pointers are appreciated. Thanks! George.
From cjfields at uiuc.edu Sat Feb 10 17:56:19 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 10 Feb 2007 16:56:19 -0600 Subject: [Bioperl-l] Error while parsing In-Reply-To: <162150.76282.qm@web56511.mail.re3.yahoo.com> References: <162150.76282.qm@web56511.mail.re3.yahoo.com> Message-ID: On Feb 10, 2007, at 2:35 PM, George Heller wrote: > Hi all, > > I am in the process of parsing a few files, actually blast > results, but happen to get the following error: > > ------------- EXCEPTION ------------- > MSG: Can't get HSPs: data not collected. > STACK Bio::Search::Hit::GenericHit::hsp /usr/lib/perl5/site_perl/ > 5.8.5/Bio/Search/Hit/GenericHit.pm:649 > STACK toplevel parser.pl:31 > -------------------------------------- > > I am not sure if this is a bug, or is there something I am doing > wrong. Any pointers are appreciated. > > Thanks! > George. We'll need more to go on than that. If the bioperl version is v1.5.2, please file a bug via the bioperl bugzilla: http://bugzilla.open-bio.org/ Don't forget to attach a test file which triggers the bug using the 'Create a new attachment' link after the report has been filed. chris From sac at bioperl.org Sat Feb 10 22:56:10 2007 From: sac at bioperl.org (Steve Chervitz) Date: Sat, 10 Feb 2007 19:56:10 -0800 Subject: [Bioperl-l] Error while parsing In-Reply-To: <162150.76282.qm@web56511.mail.re3.yahoo.com> References: <162150.76282.qm@web56511.mail.re3.yahoo.com> Message-ID: <8f200b4c0702101956h53fea96dm241126c680d64ab4@mail.gmail.com> Your report may be lacking HSP alignments for the hit you are attempting to process. Note that by default, blast will report twice as many one-line descriptions as it will alignments: -v Number of database sequences to show one-line descriptions for (V) [Integer] default = 500 -b Number of database sequence to show alignments for (B) [Integer] default = 250 Verify that this isn't the case for your error. If not, go ahead and file a bug report. 
Attach the report (zipped if big) as well as the relevant portion of your processing script. Steve On 2/10/07, George Heller wrote: > > Hi all, > > I am in the process of parsing a few files, actually blast results, but > happen to get the following error: > > ------------- EXCEPTION ------------- > MSG: Can't get HSPs: data not collected. > STACK Bio::Search::Hit::GenericHit::hsp > /usr/lib/perl5/site_perl/5.8.5/Bio/Search/Hit/GenericHit.pm:649 > STACK toplevel parser.pl:31 > -------------------------------------- > > I am not sure if this is a bug, or is there something I am doing wrong. > Any pointers are appreciated. > > Thanks! > George. From jay at jays.net Sun Feb 11 09:24:55 2007 From: jay at jays.net (Jay Hannah) Date: Sun, 11 Feb 2007 08:24:55 -0600 Subject: [Bioperl-l] wiki: Database error when attempting to change preferences (1054: Unknown column 'user_newpass_time') Message-ID: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net> Just a heads-up -- I wanted to check the "E-mail me when a page I'm watching is changed" box in my preferences http://www.bioperl.org/wiki/Special:Preferences But I can't. Even if I change nothing and hit the Save button I get this: ---------- Database error A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was: (SQL query hidden) from within function "User::saveSettings". MySQL returned error "1054: Unknown column 'user_newpass_time' in 'field list' (localhost)". ---------- (Yes, it literally says "(SQL query hidden)". That wasn't me for the purposes of this email.
-grin-) Thanks, j seqlab.net http://www.bioperl.org/wiki/User:Jhannah Username: Jhannah User ID: 51 From jay at jays.net Sun Feb 11 10:16:13 2007 From: jay at jays.net (Jay Hannah) Date: Sun, 11 Feb 2007 09:16:13 -0600 Subject: [Bioperl-l] wiki: Database error when attempting to change preferences (1054: Unknown column 'user_newpass_time') In-Reply-To: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net> References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net> Message-ID: <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net> Hmm.... The error appears to not be limited to changing preferences. I tried to update a couple different pages and got errors like this: ------ Database error A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was: (SQL query hidden) from within function "Article::updateRedirectOn". MySQL returned error "1146: Table 'perlwikidb.redirect' doesn't exist (localhost)". ------ So all changes to the wiki aren't working right now? j seqlab.net http://www.bioperl.org/wiki/User:Jhannah From jason at bioperl.org Sun Feb 11 15:18:15 2007 From: jason at bioperl.org (Jason Stajich) Date: Sun, 11 Feb 2007 12:18:15 -0800 Subject: [Bioperl-l] wiki: Database error when attempting to change preferences (1054: Unknown column 'user_newpass_time') In-Reply-To: <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net> References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net> <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net> Message-ID: <3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org> Should be fine now - I did an upgrade to mediawiki 1.9 this weekend and i think the upgrade script didn't finish. In the future system support requests should go to support - AT - open-bio.org so we can track them. -jason On Feb 11, 2007, at 7:16 AM, Jay Hannah wrote: > Hmm.... The error appears to not be limited to changing preferences. 
> I tried to update a couple different pages and got errors like this: > > ------ > Database error > A database query syntax error has occurred. This may indicate a bug > in the software. The last attempted database query was: > > (SQL query hidden) > > from within function "Article::updateRedirectOn". MySQL returned > error "1146: Table 'perlwikidb.redirect' doesn't exist (localhost)". > ------ > > So all changes to the wiki aren't working right now? > > j > seqlab.net > http://www.bioperl.org/wiki/User:Jhannah -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From cjfields at uiuc.edu Sun Feb 11 15:51:53 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 11 Feb 2007 14:51:53 -0600 Subject: [Bioperl-l] wiki: Database error when attempting to change preferences (1054: Unknown column 'user_newpass_time') In-Reply-To: <3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org> References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net> <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net> <3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org> Message-ID: Is there a good place on the main wiki page to prominently display this? I wanted to place something at the top of the main page but I didn't know if we wanted to post the support email address on the page itself. chris On Feb 11, 2007, at 2:18 PM, Jason Stajich wrote: > Should be fine now - I did an upgrade to mediawiki 1.9 this weekend > and i think the upgrade script didn't finish. > > In the future system support requests should go to support - AT - > open-bio.org so we can track them. > > -jason > On Feb 11, 2007, at 7:16 AM, Jay Hannah wrote: > >> Hmm.... The error appears to not be limited to changing preferences.
>> I tried to update a couple different pages and got errors like this: >> >> ------ >> Database error >> A database query syntax error has occurred. This may indicate a bug >> in the software. The last attempted database query was: >> >> (SQL query hidden) >> >> from within function "Article::updateRedirectOn". MySQL returned >> error "1146: Table 'perlwikidb.redirect' doesn't exist (localhost)". >> ------ >> >> So all changes to the wiki aren't working right now? >> >> j >> seqlab.net >> http://www.bioperl.org/wiki/User:Jhannah > > -- > Jason Stajich > Miller Research Fellow > University of California, Berkeley > lab: 510.642.8441 > http://pmb.berkeley.edu/~taylor/people/js.html > http://fungalgenomes.org/ Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From jay at jays.net Sun Feb 11 15:56:53 2007 From: jay at jays.net (Jay Hannah) Date: Sun, 11 Feb 2007 14:56:53 -0600 Subject: [Bioperl-l] wiki: Database error when attempting to change preferences (1054: Unknown column 'user_newpass_time') In-Reply-To: References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net> <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net> <3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org> Message-ID: On Feb 11, 2007, at 2:51 PM, Chris Fields wrote: > Is there a good place on the main wiki page to prominently display > this? I wanted to place something at the top of the main page but > I didn't know if we wanted to post the support email address on the > page itself.
I added it here: http://www.bioperl.org/wiki/About_site Which is linked from all pages via the left-hand bar: community | About this site j seqlab.net http://www.bioperl.org/wiki/User:Jhannah From agd27 at cornell.edu Sun Feb 11 12:47:03 2007 From: agd27 at cornell.edu (Adam Diehl) Date: Sun, 11 Feb 2007 12:47:03 -0500 Subject: [Bioperl-l] Getting GFF output in UCSC-specific format Message-ID: <45CF5697.60703@cornell.edu> Good morning folks, I've got sort of a newbie question regarding how to get gff's out of Bio::Tools::GFF objects that are formatted according to the UCSC browser conventions, described here: http://genome.ucsc.edu/goldenPath/help/customTrack.html#GFF (Ignore the custom track headers and what-not. I just need the fields to be set up according to the descriptions in 1 - 9). The write_feature($feature) method isn't doing it for me, as I get lines like the following (newlines excepted):

chr1 EMBL/GenBank/SwissProt gene 1712 2848 . + . db_xref=GeneID:4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002
chr1 EMBL/GenBank/SwissProt CDS 1712 2848 . + . EC_number=2.7.7.7;codon_start=1;db_xref=GI:94989511,GeneID:4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002;product=DNA+polymerase+III%2C+beta+chain;protein_id=YP_597611.1;transl_table=11;translation=MIQFSINRTLFIHALNATKRAISTKNAIPILSSIKIEVTSTGVTLTGSNGQISIENTIPVSNENAGLLITSPGAILLEASFFINIISSLPDISINVKEIEQHQVVLTSGKSEITLKGKDVDQYPRLQEVSTENPLILKTKLLKSIIAETAFAASLQESRPILTGVHIVLSNHKDFKAVATDSHRMSQRLITLENTSADFDVVIPSKSLREFSAVFTDDIETVEVFFSPSQILFRSEHISFYTRLLEGNYPDTDRLLMTEFETEVVFNTQSLRHAMERAFLISNATQNGTVKLEITQNHISAHVNSPEVGKVNEDLDIVSQSGSDLTISFNPTYLIESLKAIKSETVKIHFLSPVRPFTLTPGDEEESFIQLITPVRTN

As you can see, field 8, which should be frame according to UCSC conventions is blank, and field 9, group according to UCSC, has frame, along with ID, etc. All this extra stuff causes the UCSC browser to choke.
First off, it can't identify which features are the same (it does this by matching the group field), and second, it can't interpret the CDS's into translated proteins because it lacks frame data. Basically what I need to do is, for CDS features, extract frame (or codon_start, as it were), from the last field, parse out the integer value and store that in field 8 (as frame), then parse out locus_tag from the last field, clear out everything else and store the locus_tag only in that field (preferably without the qualifier locus_tag=). For feature type gene, I just want to do the last step, so that the gene and CDS features for the same feature have matching group fields that are as simple as possible. Let me know if this is not clear. The way I've been trying to do this is by stringifying each gff object, splitting into an array, @tmp1, splitting @tmp1[8] into @tmp2 with the following code: my @tmp2 = split /\;\, $tmp1[8]; and finally, trying to parse out the bits I need with regular expressions and store back to @tmp1[n]. -- This does not work, because perl wants to interpret every / + etc. as a metacharacter! I am assuming there's a simple way to get at each value in the last field of the gff object using methods supplied by Bio::Tools::GFF, but the API docs seem a bit lacking in this area. Could anyone steer me towards what I need to know to do this? Please let me know if I can clarify any details! Cheers, Adam Diehl From jason at bioperl.org Sun Feb 11 18:29:16 2007 From: jason at bioperl.org (Jason Stajich) Date: Sun, 11 Feb 2007 15:29:16 -0800 Subject: [Bioperl-l] Getting GFF output in UCSC-specific format In-Reply-To: <45CF5697.60703@cornell.edu> References: <45CF5697.60703@cornell.edu> Message-ID: I assume you are getting your features from a Bio::SeqIO parse of a Genbank file? You get back Bio::SeqFeature::Generic objects, so you want to look at the docs for that module to see what the API is.
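A core-Perl sketch of the rewrite Adam describes, working on the text of each tab-separated GFF line rather than on the feature objects; parse_attributes and to_ucsc_gff are hypothetical helper names. Two points it relies on: `;` is not a regex metacharacter, so `split /;/, ...` needs no backslashes, and a GenBank codon_start of 1, 2 or 3 corresponds to GFF frame 0, 1 or 2 (the number of bases to skip before the first complete codon). Treating `+` as an encoded space is an assumption based on the write_feature() output shown above.

```perl
use strict;
use warnings;

# Split a "key=value;key=value" attribute column into a hash ref.
# Assumption: '+' encodes a space and %XX an escaped byte, as in the
# write_feature() output quoted in the question.
sub parse_attributes {
    my ($col9) = @_;
    my %attr;
    for my $pair (split /;/, $col9) {          # ';' needs no escaping
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        $value = '' unless defined $value;
        $value =~ tr/+/ /;                      # '+' back to a space
        $value =~ s/%([0-9A-Fa-f]{2})/chr(hex $1)/ge;   # undo %XX escapes
        $attr{$key} = $value;
    }
    return \%attr;
}

# Rewrite one tab-separated GFF line in the simplified UCSC style:
# column 8 = frame from codon_start (1,2,3 -> 0,1,2) on CDS lines,
# column 9 = bare locus_tag, so gene and CDS lines share a group value.
sub to_ucsc_gff {
    my ($line) = @_;
    chomp $line;
    my @f = split /\t/, $line, 9;
    return undef unless @f == 9;
    my $attr = parse_attributes($f[8]);
    if ($f[2] eq 'CDS' && defined $attr->{codon_start}) {
        $f[7] = $attr->{codon_start} - 1;
    }
    $f[8] = defined $attr->{locus_tag} ? $attr->{locus_tag} : '.';
    return join "\t", @f;
}
```

For the CDS line in the question this yields frame 0 in column 8 and MGAS10270_Spy0002 in column 9, and the gene line gets the same column 9, which is what lets the browser group them.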
You will need to set frame via $feature->frame($frame). You are going to have to determine the frame yourself if that isn't part of the feature; we don't calculate it for you. For the 9th column, this is available through the tag methods has_tag, add_tag_value, get_tag_values, get_all_tags, and remove_tag, so you can remove all the tags you don't want through remove_tag (or remove them all, if you want):

  my $locus;
  for my $tag ( $feature->get_all_tags ) {
      if( $tag eq 'locus_tag' ) {
          # save the locus_tag when we see it
          ($locus) = $feature->get_tag_values($tag);
      }
      $feature->remove_tag($tag);
  }

You will also want to set the GFF format when you call Bio::Tools::GFF - I think the UCSC site is only supporting GFF1. I don't know exactly how you set the tag then, when they aren't paired with key=>value; you'll need to set the tag to 'group', so

  $feature->add_tag_value('group', $locus);

If this is all unsatisfactory you can easily write your own GFF writer for your flavor of the data with something like:

  print join("\t", $feature->seq_id, $feature->source_tag, $feature->primary_tag,
             $feature->start, $feature->end,
             defined $feature->score ? $feature->score : '.',
             $feature->strand > 0 ? '+' : $feature->strand < 0 ? '-' : '.',
             defined $feature->frame ? $feature->frame : '.',
             $locus), "\n";

-jason On Feb 11, 2007, at 9:47 AM, Adam Diehl wrote: > Good morning folks, > > I've got sort of a newbie question regarding how to get gff's out of > Bio::Tools::GFF objects that are formatted according to the UCSC > browser > conventions, described here: > > http://genome.ucsc.edu/goldenPath/help/customTrack.html#GFF > (Ignore the custom track headers and what-not. I just need the > fields to > be set up according to the descriptions in 1 - 9). > > The write_feature($feature) method isn't doing it for me, as I get > lines > like the following (newlines excepted): > > chr1 EMBL/GenBank/SwissProt gene 1712 2848 . + > . db_xref=GeneID:4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002 > chr1 EMBL/GenBank/SwissProt CDS 1712 2848 . + > . 
> EC_number=2.7.7.7;codon_start=1;db_xref=GI:94989511,GeneID: > 4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002;product=DNA+polymerase > +III%2C+beta+chain;protein_ > id=YP_597611.1;transl_table=11;translation=MIQFSINRTLFIHALNATKRAISTKNA > IPILSSIKIEVTSTGVTLTGSNGQISIENTIPVSNENAGLLITSPGAILLEASFFINIISSLPDISINVK > EIEQHQVVLTSGKSEITLKGKDVDQYPRLQEVSTENPLILKTKLLKSIIAETAFAASLQESRPILTGVHI > VLSNHKDFKAVATDSHRMSQRLIT > LENTSADFDVVIPSKSLREFSAVFTDDIETVEVFFSPSQILFRSEHISFYTRLLEGNYPDTDRLLMTEFE > TEVVFNTQSLRHAMERAFLISNATQNGTVKLEITQNHISAHVNSPEVGKVNEDLDIVSQSGSDLTISFNP > TYLIESLKAIKSETVKIHFLSPVRPFTLTPGDEEESFIQLITPVRTN > > As you can see, field 8, which should be frame according to UCSC > conventions is blank, and field 9, group according to UCSC, has frame, > along with ID, etc. All this extra stuff causes the UCSC browser to > choke. First off, it can't identify which features are the same (it > does > this by matching the group field), and second, it can't interpret the > CDS's into translated proteins because it lacks frame data. > > Basically what I need to do is, for CDS features, extract frame (or > codon_start, as it were), from the last field, parse out the integer > value and store that in field 8 (as frame), then parse out locus_tag > from the last field, clear out everything else and store the locus_tag > only in that field (preferably without the qualifier locus_tag=). For > feature type gene, I just want to do the last step, so that the > gene and > CDS features for the same feature have matching group fields that > are as > simple as possible. Let me know if this is not clear. > > The way I've been trying to do this is by stringifying each gff > object, > splitting into an array, @tmp1, splitting @tmp1[8] into @tmp2 with the > following code: my @tmp2 = split /\;\, $tmp1[8]; and finally, > trying to > parse out the bits I need with regular expressions and store back to > @tmp1[n]. -- This does not work, because perl wants to interpret > every > / + etc. 
as a metacharacter! > > I am assuming there's a simple way to get at each value in the last > field of the gff object using methods supplied by Bio::Tools::GFF, but > the API docs seem a bit lacking in this area. Could anyone steer me > towards what I need to know to do this? Please let me know if I can > clarify any details! > > Cheers, > Adam Diehl -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From bix at sendu.me.uk Sun Feb 11 18:39:15 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 11 Feb 2007 23:39:15 +0000 Subject: [Bioperl-l] WrapperBase In-Reply-To: References: <45CCF803.9030004@sendu.me.uk> Message-ID: <45CFA923.8010201@sendu.me.uk> Caroline Johnston wrote: >> No, I think not. That would be very annoying when using wrappers for >> programs that you just have in your system path. > > Hmm, maybe I misunderstood what the program_path was for? The executable > method goes straight to the system path unless program_path is set, so I > assumed you would only set program_path if you specifically wanted it to > look somewhere else. You wouldn't get a warning if you didn't specify a > program_path and just left it to look in the system path. Yes, sorry. Having now actually looked at your patch it seems fine. I'll commit it unless someone beats me to it. From flope004 at hotmail.com Sun Feb 11 21:40:08 2007 From: flope004 at hotmail.com (Wolverine Fran) Date: Mon, 12 Feb 2007 03:40:08 +0100 Subject: [Bioperl-l] TreeIO, how it works? Message-ID: Hi, I have a problem. I don't understand how TreeIO reads the trees: my input: ((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2)); An unrooted tree with 4 tips and 2 internal nodes.
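For illustration only, a small core-Perl Newick reader (not BioPerl's TreeIO; parse_newick, count_leaves, find_by_name and path_length are hypothetical helpers written for this sketch) shows how that input string breaks down. Every pair of parentheses is a node, including the outermost pair, so ((dog,cat),(human,mouse)) has 4 tips plus 3 internal nodes (the two clades and the root above them), 7 in all. The path length between two nodes is the sum of branch lengths from each of them up to their last common ancestor.

```perl
use strict;
use warnings;

# Recursive-descent reader for one Newick node. Nodes are appended to @$nodes
# as { name, length (branch to parent), parent (index, undef for the root) }.
sub _parse_node {
    my ($s, $pos, $nodes, $parent) = @_;    # $s and $pos are scalar refs
    my $id = @$nodes;
    push @$nodes, { name => '', length => 0, parent => $parent };
    if (substr($$s, $$pos, 1) eq '(') {
        $$pos++;                             # consume '('
        while (1) {
            _parse_node($s, $pos, $nodes, $id);
            my $c = substr($$s, $$pos++, 1);
            last if $c eq ')';
            die "expected ',' or ')' at offset $$pos\n" unless $c eq ',';
        }
    }
    # Optional label, then optional ":branch_length"
    if (substr($$s, $$pos) =~ /^([^,():;]*)(?::([-+0-9.eE]+))?/) {
        $nodes->[$id]{name}   = $1;
        $nodes->[$id]{length} = $2 if defined $2;
        $$pos += $+[0];                      # advance past what matched
    }
    return $id;
}

sub parse_newick {
    my ($newick) = @_;
    (my $s = $newick) =~ s/\s+//g;
    $s =~ s/;\z//;
    my $pos = 0;
    my @nodes;
    _parse_node(\$s, \$pos, \@nodes, undef);
    return \@nodes;
}

# Tips are the nodes that are nobody's parent.
sub count_leaves {
    my ($nodes) = @_;
    my %is_parent = map { ($_->{parent}, 1) } grep { defined $_->{parent} } @$nodes;
    return scalar(grep { !$is_parent{$_} } 0 .. $#$nodes);
}

sub find_by_name {
    my ($nodes, $name) = @_;
    for my $i (0 .. $#$nodes) {
        return $i if $nodes->[$i]{name} eq $name;
    }
    return;
}

# Sum of branch lengths on the path between nodes $i and $j: climb from $i
# recording distances, then climb from $j until reaching an ancestor of $i.
sub path_length {
    my ($nodes, $i, $j) = @_;
    my %from_i;
    my $d = 0;
    for (my $n = $i; defined $n; $n = $nodes->[$n]{parent}) {
        $from_i{$n} = $d;
        $d += $nodes->[$n]{length};
    }
    $d = 0;
    for (my $n = $j; defined $n; $n = $nodes->[$n]{parent}) {
        return $d + $from_i{$n} if exists $from_i{$n};
        $d += $nodes->[$n]{length};
    }
    die "nodes are not in the same tree\n";
}
```

On the example input this gives 7 nodes (4 tips, 3 internal), a dog-cat path of 0.04 + 0.08 = 0.12, and a dog-human path of 0.04 + 0.12 + 0.15 = 0.31 (the human/mouse clade has no branch length of its own, so it contributes 0).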
when I asked for: print "Total number of nodes ",$tree->number_nodes; I get 6 but when I ask for: foreach my $node (@nodes) { print $node->internal_id,","; } I get 6,0,1,2,3,4,5. Total 7. The root is number 6 and 2 and 5 are my internal nodes. If I set the root to be number 5 this node 6 is still present. Why? what is the node 6? when I try the following: $node5 = $tree->find_node(-internal_id => '5'); $node6 = $tree->find_node(-internal_id => '6'); $node2 = $tree->find_node(-internal_id => '2'); $distance1 = $tree->distance(-nodes =>[$node5,$node2]); $distance2 = $tree->distance(-nodes =>[$node5,$node6]); $distance3 = $tree->distance(-nodes =>[$node2,$node6]); or any other distance I get 2 warnings: -------------------- WARNING --------------------- MSG: Must provide a valid array reference for -nodes --------------------------------------------------- -------------------- WARNING --------------------- MSG: Could not find distance! --------------------------------------------------- What am I doing incorrectly? I am practicing with AlignIO and TreeIO to calculate the maximum likelihood for a given tree. So, other information about that would be of great help. I am practicing with this to see how Bioperl can help me with more complex problems. Thank you very much for your help! From jason at bioperl.org Sun Feb 11 22:05:18 2007 From: jason at bioperl.org (Jason Stajich) Date: Sun, 11 Feb 2007 19:05:18 -0800 Subject: [Bioperl-l] TreeIO, how (does) it work? In-Reply-To: References: Message-ID: <60D2DD3B-2ED1-4A92-A9FA-2875FEAA28CC@bioperl.org> On Feb 11, 2007, at 6:40 PM, Wolverine Fran wrote: > Hi, > > I have a problem.
I don't understand how TreeIO reads the trees: > my input: ((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2)); > > An unrooted tree with 4 tips and 2 internal nodes. > when I asked for: > print "Total number of nodes ",$tree->number_nodes; > > I get 6 but when I ask for: > foreach my $node (@nodes) { > print $node->internal_id,","; > } > I get 6,0,1,2,3,4,5. Total 7. > > The root is number 6 and 2 and 5 are my internal nodes. > If I set the root to be number 5 this node 6 is still present. > Why? what is the node 6? Node 6 is to hold the root or a fake root with a trifurcation for unrooted trees. Did you actually call the reroot method to set the root to node 5? > > when I try the following: > $node5 = $tree->find_node(-internal_id => '5'); > $node6 = $tree->find_node(-internal_id => '6'); > $node2 = $tree->find_node(-internal_id => '2'); > $distance1 = $tree->distance(-nodes =>[$node5,$node2]); > $distance2 = $tree->distance(-nodes =>[$node5,$node6]); > $distance3 = $tree->distance(-nodes =>[$node2,$node6]); > or any other distance I get 2 warnings: > -------------------- WARNING --------------------- > MSG: Must provide a valid array reference for -nodes > --------------------------------------------------- > > -------------------- WARNING --------------------- > MSG: Could not find distance! > --------------------------------------------------- > What am I doing incorrectly? > The distance method is just summing branch lengths on the path between two nodes. Is that what are you trying to do? The error message you report doesn't make sense as "Must provide a valid array reference for -nodes" is only printed when you call is_monophyletic or is_paraphyletic as far as I can tell. what version of bioperl are you using? > I am practicing with AlignIO and TreeIO to calculate the maximum > likelihood > for a given tree. So,other information about that would be of great > help. I am practicing with > this to see how Bioperl can help me with more complex problems. 
> You are trying to calculate the likelihood of a tree or are you trying to generate a ML tree from an alignment? > Thank you very much for your help! -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From er at xs4all.nl Mon Feb 12 08:03:06 2007 From: er at xs4all.nl (Erik) Date: Mon, 12 Feb 2007 14:03:06 +0100 (CET) Subject: [Bioperl-l] bioperl wiki changes rss / atom In-Reply-To: References: <162150.76282.qm@web56511.mail.re3.yahoo.com> Message-ID: <20181.156.83.0.124.1171285386.squirrel@webmail.xs4all.nl> Hi, The bioperl wiki changes rss / atom feed has two leading empty lines which invalidate the xml: XML Parsing Error: xml declaration not at start of external entity Location: http://www.bioperl.org/w/index.php?title=Special:Recentchanges&feed=rss Line Number 3, Column 1: ^ Could those be removed? (I didn't see a way to do it myself). Might be a useful feed :) thanks, Erik From cjfields at uiuc.edu Mon Feb 12 09:52:44 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 12 Feb 2007 08:52:44 -0600 Subject: [Bioperl-l] bioperl wiki changes rss / atom In-Reply-To: <20181.156.83.0.124.1171285386.squirrel@webmail.xs4all.nl> References: <162150.76282.qm@web56511.mail.re3.yahoo.com> <20181.156.83.0.124.1171285386.squirrel@webmail.xs4all.nl> Message-ID: I have forwarded this to support at open-bio.org, which should take care of it.
chris On Feb 12, 2007, at 7:03 AM, Erik wrote: > Hi, > > > The bioperl wiki changes rss / atom feed has two leading empty > lines which > invalidate the xml: > > XML Parsing Error: xml declaration not at start of external entity > Location: > http://www.bioperl.org/w/index.php?title=Special:Recentchanges&feed=rss > Line Number 3, Column 1: > ^ > > Could those be removed? (I didn't see a way to do it myself). Might > be a > useful feed :) > > > thanks, > > Erik Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From sm8 at sanger.ac.uk Mon Feb 12 12:12:00 2007 From: sm8 at sanger.ac.uk (Stephen Montgomery) Date: Mon, 12 Feb 2007 17:12:00 -0000 Subject: [Bioperl-l] subtract for Bio::RangeI.pm Message-ID: Hi - here is a subtract function for the Bio::RangeI class (to be added if interested). All the best!
Stephen Montgomery

//ADD TO BIO::RANGEI

=head2 subtract

 Title   : subtract
 Usage   : my $subtracted = $r1->subtract($r2)
 Function: Subtract range r2 from range r1
 Args    : arg #1 = a range to subtract from this one (mandatory)
           arg #2 = strand option ('strong', 'weak', 'ignore') (optional)
 Returns : undef if they do not overlap or r2 contains this RangeI,
           or an arrayref of Range objects (this is an array since some
           instances where the subtract range is enclosed within this range
           will result in the creation of two new disjoint ranges)

=cut

sub subtract {
    my ($self, $range, $so) = @_;
    $self->throw("missing arg: you need to pass in another feature")
        unless $range;
    return unless $self->_testStrand($range, $so);

    if ($self eq "Bio::RangeI") {
        $self = "Bio::Range";
        $self->warn("calling static methods of an interface is deprecated; use $self instead");
    }
    $range->throw("Input a Bio::RangeI object") unless $range->isa('Bio::RangeI');

    if (!$self->overlaps($range)) {
        return undef;
    }

    ##Subtracts everything
    if ($range->contains($self)) {
        return undef;
    }

    my ($start, $end, $strand) = $self->intersection($range, $so);

    ##Subtract intersection from $self range
    my @outranges = ();
    if ($self->start < $start) {
        push(@outranges, $self->new('-start'  => $self->start,
                                    '-end'    => $start - 1,
                                    '-strand' => $self->strand,
                                   ));
    }
    if ($self->end > $end) {
        push(@outranges, $self->new('-start'  => $end + 1,
                                    '-end'    => $self->end,
                                    '-strand' => $self->strand,
                                   ));
    }
    return \@outranges;
}

//UNIT TEST

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqFeature::Generic;
use Test;
plan tests => 13;

my $feature1 = Bio::SeqFeature::Generic->new(-start => 1, -end => 1000, -strand => 1);
my $feature2 = Bio::SeqFeature::Generic->new(-start => 100, -end => 900, -strand => -1);

# feature2 lies inside feature1, so two disjoint pieces remain: 1-99 and 901-1000
my $subtracted = $feature1->subtract($feature2);
ok(defined($subtracted));
ok(scalar(@$subtracted) == 2);
foreach my $range (@$subtracted) {
    ok($range->start == 1 || $range->start == 901);
    ok($range->end == 99 || $range->end == 1000);
}

# feature1 contains feature2, so nothing is left
$subtracted = $feature2->subtract($feature1);
ok(!defined($subtracted));

# opposite strands fail both the 'weak' and the 'strong' strand test
$subtracted = $feature1->subtract($feature2, 'weak');
ok(!defined($subtracted));
$subtracted = $feature1->subtract($feature2, 'strong');
ok(!defined($subtracted));

my $feature3 = Bio::SeqFeature::Generic->new(-start => 500, -end => 1500, -strand => 1);
$subtracted = $feature1->subtract($feature3);
ok(defined($subtracted));
ok(scalar(@$subtracted) == 1);
my $subtracted_i = $subtracted->[0];
ok($subtracted_i->start == 1);
ok($subtracted_i->end == 499);

From flope004 at hotmail.com Mon Feb 12 13:07:12 2007 From: flope004 at hotmail.com (Wolverine Fran) Date: Mon, 12 Feb 2007 19:07:12 +0100 Subject: [Bioperl-l] TreeIO, how (does) it work? In-Reply-To: <60D2DD3B-2ED1-4A92-A9FA-2875FEAA28CC@bioperl.org> Message-ID: Thanks for your reply! I am using Bioperl 1.4. >Node 6 is to hold the root or a fake root with a trifurcation for >unrooted trees. Did you actually call the reroot method to set the >root to node 5?
Yes, I tried the following with the same result: $tree->reroot($tree->find_node(-internal_id => '5')); or $tree->set_root_node($tree->find_node(-internal_id => '5')); Even if I use a rooted tree: (((dog:0.04,cat:0.08):0.12,human:0.15):0.1,mouse:0.1); I still get node #6. So, is it always present? Am I not representing a rooted tree properly in Newick format? >The distance method is just summing branch lengths on the path >between two nodes. Is that what are you trying to do? > >The error message you report doesn't make sense as >"Must provide a valid array reference for -nodes" >is only printed when you call is_monophyletic or is_paraphyletic as >far as I can tell. I do not know yet what I was doing incorrectly, but now it works. Yes, I was using the distance method to find out where node 6 was located. For the unrooted tree, node 6 was node 5 (an internal node), and for the rooted tree node 6 was 0.1 from the mouse leaf and the internal node (root). The error message "Must provide a valid array reference for -nodes" is shown if I specify a node which is not present in the tree. >You are trying to calculate the likelihood of a tree or are you >trying to generate a ML tree from an alignment? I am trying to calculate the likelihood of a tree, as practice. There are probably other BioPerl modules besides AlignIO and TreeIO that could help me with this, which I do not know about. Again, thank you for your time! From dmessina at wustl.edu Mon Feb 12 12:49:49 2007 From: dmessina at wustl.edu (David Messina) Date: Mon, 12 Feb 2007 11:49:49 -0600 Subject: [Bioperl-l] subtract for Bio::RangeI.pm In-Reply-To: References: Message-ID: <1574ACCF-92D5-4DEC-AD04-14EB7767F22A@wustl.edu> Stephen, Great, thanks for this.
Could you submit it to Bugzilla as an enhancement? http://bugzilla.open-bio.org/ Thanks, Dave From jason at bioperl.org Mon Feb 12 13:38:11 2007 From: jason at bioperl.org (Jason Stajich) Date: Mon, 12 Feb 2007 10:38:11 -0800 Subject: [Bioperl-l] TreeIO, how (does) it work? In-Reply-To: References: Message-ID: I would definitely suggest getting ahold of bioperl 1.5.2 as I seem to remember there are several fixes in the tree module code for re-rooting a tree. -jason On Feb 12, 2007, at 10:07 AM, Wolverine Fran wrote: > thanks for your reply! > > I am using Bioperl 1.4. > >> Node 6 is to hold the root or a fake root with a trifurcation for >> unrooted trees. Did you actually call the reroot method to set the >> root to node 5? > > Yes, I tried the following with the same result: > $tree->reroot($tree->find_node(-internal_id => '5')); > or > $tree->set_root_node($tree->find_node(-internal_id => '5')); > > Even if I use a rooted tree: (((dog:0.04,cat:0.08):0.12,human:0.15):0.1,mouse:0.1); > I get the node #6. So, is it always present? Am I not representing > properly a rooted tree in newick format? > >> The distance method is just summing branch lengths on the path >> between two nodes. Is that what are you trying to do? >> >> The error message you report doesn't make sense as >> "Must provide a valid array reference for -nodes" >> is only printed when you call is_monophyletic or is_paraphyletic as >> far as I can tell. > > I do not know yet what I was doing incorrectly but now It works. > Yes, I was using the distance method to know where the node 6 was > located. For the unrooted tree, node 6 was node 5 (an internal > node) and for the rooted tree node 6 was 0.1 from the mouse leaf > and the internal node (root). > The error message: "Must provide a valid array reference for -nodes" is shown if I indicate a node which is not present in the tree.
> >> You are trying to calculate the likelihood of a tree or are you >> trying to generate a ML tree from an alignment? > > I am trying to calculate the likelihood of a tree, as a practice. > Probably there are other bioperl modules, besides AlignIO and > TreeIO, which can help me in the process and I do not know them. > > Again, thank you for your time! -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From johnsonm at gmail.com Mon Feb 12 18:13:09 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Mon, 12 Feb 2007 17:13:09 -0600 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> Message-ID: On 2/7/07, Mark Johnson wrote: > > Well, each format has some unique features. If the user declines to > specify the format, I can figure it out, but it will probably involve > scanning the input file twice. I'll take a look. > I can do all the parsing in one function, in fact I have, just to see > how nasty it would end up being. I just can't stomach having the code that > tightly coupled and hard to read. In the end it'll probably be three > functions. GlimmerM/HMM are pretty close. Maybe two, Glimmer2 and > Glimmer3 aren't *that* different, either. I've got a 4-in-1 parser roughed in per Chris Fields' suggestion, with two actual parsing routines (prokaryotic and eukaryotic). You can specify -format as an argument to the constructor (Glimmer, GlimmerM, GlimmerHMM), or it will look through the input until it can figure out what it is looking at.
I've got one main issue to solve; the rest is just stuff like updating the POD. Torsten Seemann very helpfully added example output for all 4 formats to t/data. Looking at GlimmerHMM.out, the first line is 'GlimmerHMM'. However, I think there is a bug in the existing _parse_predictions. Shouldn't this:

} elsif( /^(Glimmer\S*)$/ ) { # GlimmerHMM has no version
    $source = $1;
    next;
}

be this instead:

} elsif( /^(GlimmerHMM\S*)$/ ) { # GlimmerHMM has no version
    $source = $1;
    next;
}

I lifted that bit of code to do format detection... we don't have GlimmerHMM installed locally, so I'm assuming Torsten's output is correct and the above is a bug. Guess I'll go check bugzilla... From torsten.seemann at infotech.monash.edu.au Mon Feb 12 21:07:40 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 13 Feb 2007 13:07:40 +1100 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> Message-ID: Mark, > I've got one main issue to solve, the rest is just stuff like updating > the POD. Torsten Seemann very helpfully added example output for all 4 > formats to t/data. Looking at GlimmerHMM.out, the first line is > 'GlimmerHMM'. However, I think there is a bug in the existing > _parse_predictions: > Shouldn't this: > } elsif( /^(Glimmer\S*)$/ ) { # GlimmerHMM has no version > be this instead: > } elsif( /^(GlimmerHMM\S*)$/ ) { # GlimmerHMM has no version I think it should be what it says, or perhaps now /^(Glimmer(M|HMM))/. Here's why: I came onto the scene at Glimmer.pm rev 1.4. At that stage it only parsed GlimmerM. I noted that GlimmerHMM had the same output format as GlimmerM, except for the first line. So in rev 1.5 I modified the regexp to match both, i.e. \S*. This would also hopefully match any other Glimmer-clone formats that arose. I also fixed the pdocs to say this, and added tests to t/Genpred.t.
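Torsten's point about \S* can be checked with a few lines of plain Perl. This sketch assumes only that the first line of t/data/GlimmerHMM.out is 'GlimmerHMM', as Mark reports. It compares three patterns: the existing one using \S*, the same pattern with \s* (the \S/\s mix-up Mark mentions), and Torsten's stricter /^(Glimmer(M|HMM))/.

```perl
use strict;
use warnings;

# First line of GlimmerHMM output, as reported in the thread.
my $header = 'GlimmerHMM';

# Existing pattern: \S* (non-whitespace) lets 'Glimmer' be followed by 'M', 'HMM', ...
my ($loose) = $header =~ /^(Glimmer\S*)$/;

# Same pattern with \s* (whitespace): cannot match, since 'HMM' is not whitespace.
my ($typo) = $header =~ /^(Glimmer\s*)$/;

# Torsten's stricter alternative, which accepts only the two known names.
my ($strict) = $header =~ /^(Glimmer(M|HMM))/;

for my $pair ([ loose => $loose ], [ typo => $typo ], [ strict => $strict ]) {
    my ($label, $value) = @$pair;
    printf "%-6s %s\n", $label, defined $value ? $value : 'no match';
}
```

In list context a failed match returns an empty list, so $typo stays undefined instead of holding a stale capture.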
% cvs diff -r 1.4 -r 1.5 Bio/Tools/Glimmer.pm
% cvs diff -r 1.15 -r 1.16 t/Genpred.t
I then planned to extend support to Glimmer2 and Glimmer3. I added the 4 test files (t/Glimmer*.out) but never wrote the code. This is where you come in, Mark :-) > I lifted that bit of code to do format detection...we don't have GlimmerHMM > installed locally, so I'm assuming Torsten's output is correct and the above > is a bug. Guess I'll go check bugzilla... I'm pretty sure my 4 test files are correct - I spent a lot of time ensuring they were consistent etc., as I was getting very confused by the different "glimmer" versions! Hope this all helps, --Torsten From avilella at gmail.com Tue Feb 13 08:20:15 2007 From: avilella at gmail.com (Albert Vilella) Date: Tue, 13 Feb 2007 13:20:15 +0000 Subject: [Bioperl-l] number of gaps for the other sequences in an alignment Message-ID: <358f4d650702130520n269419cfkb9cb6dac8feaaa5c@mail.gmail.com> Hi, It would be great if we could have a method that, given one sequence in an alignment, counts the number of gaps present in the rest of the sequences of the alignment. That is, for each nucleotide/amino acid position of the sequence of interest, look at that column in the alignment and count the gaps, then sum these counts over all the non-gapped columns of the sequence of interest. Has anyone tried this before? My idea is to end up with a coefficient of indel contribution for each of the sequences in the alignment, with this coefficient being high when one sequence forces a lot of gaps to be inserted in the final alignment in order to accommodate it. I would say that the best place for this is in SimpleAlign, either built from methods already available there or as something new added to it. Looking forward to your comments, Cheers, Albert.
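Here is a minimal sketch of the count Albert describes. It uses plain strings rather than Bio::SimpleAlign; in BioPerl, the strings could come from $seq->seq for each sequence returned by $aln->each_seq. It assumes '-' and '.' are the only gap characters and that all rows have equal length. The function name gaps_forced_by is hypothetical, and the normalisation used for the coefficient (gaps per residue per other sequence) is only one possible choice, not something proposed in the thread.

```perl
use strict;
use warnings;

# For the sequence at index $idx, count the gaps carried by all *other*
# sequences in the columns where sequence $idx has a residue.
# Assumes '-' and '.' are the only gap characters and equal-length rows.
sub gaps_forced_by {
    my ($seqs, $idx) = @_;
    my $ref   = $seqs->[$idx];
    my $total = 0;
    for my $col (0 .. length($ref) - 1) {
        next if substr($ref, $col, 1) =~ /[-.]/;   # gap in the sequence of interest: skip column
        for my $j (0 .. $#$seqs) {
            next if $j == $idx;
            $total++ if substr($seqs->[$j], $col, 1) =~ /[-.]/;
        }
    }
    return $total;
}

# Toy alignment (hypothetical data).
my @aln = ('ACGTAC', 'AC--AC', 'A-GTA-');
for my $i (0 .. $#aln) {
    my $gaps     = gaps_forced_by(\@aln, $i);
    my $residues = ($aln[$i] =~ tr/-.//c);        # non-gap positions in row $i
    # One hypothetical normalisation: gaps per residue per other sequence.
    my $coef = $residues && $#aln ? $gaps / ($residues * $#aln) : 0;
    printf "%s  gaps=%d  coef=%.3f\n", $aln[$i], $gaps, $coef;
}
```

A row that introduces residues where the others have nothing (like the first row here) gets the highest score, which matches the intended behaviour of flagging sequences that force gaps into the rest of the alignment.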
From bix at sendu.me.uk Tue Feb 13 11:09:09 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 13 Feb 2007 16:09:09 +0000 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts Message-ID: <45D1E2A5.6060104@sendu.me.uk> I have some raw sequences in a Bio::DB::SeqFeature::Store MySQL database and wanted to associate some basic information with them, like exon positions. I thought of creating Bio::SeqFeature::Gene::Transcript objects and storing them, so I could later use features() to see what other features overlapped exons. I ran into a fatal error that can be replicated with the following simplified one-liner: perl -MBio::DB::SeqFeature::Store -MBio::SeqFeature::Gene::Transcript -e '$db = Bio::DB::SeqFeature::Store->new(-adaptor => "DBI::mysql", -dsn => "dbi:mysql:test"); $trans = Bio::SeqFeature::Gene::Transcript->new(-start => 1, -end => 2, -seq_id => "test"); $db->store($trans); @trans = $db->features(-seqid => "test", -type => "transcript"); print "@trans\n";' code sub { package Bio::SeqFeature::Generic; use strict 'refs'; my $self = shift @_; foreach my $f (@{[] unless $$self{'_gsf_sub_array'};}) { $f = undef; } $$self{'_gsf_seq'} = undef; foreach my $t (keys %{{} unless $$self{'_gsf_tag_hash'};}) { $$self{'_gsf_tag_hash'}{$t} = undef; delete $$self{'_gsf_tag_hash'}{$t}; } } did not evaluate to a subroutine reference, at /.../Bio/DB/SeqFeature/Store.pm line 2280 Is this a bug? Or am I taking the wrong approach? From johnsonm at gmail.com Tue Feb 13 15:10:23 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Tue, 13 Feb 2007 14:10:23 -0600 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> Message-ID: You're quite correct. I wasn't paying enough attention. That does work just fine.
I fat-fingered something somewhere else, broke my version of the module for GlimmerHMM, hallucinated and confused \S and \s. 8) All I have left now is to fix up the POD documentation and such, and then I can send the module along so somebody can make whatever tweaks are needed and check it in. Shall I open a ticket in Bugzilla for this and attach diffs, or just send them along to somebody to take care of directly? Oh, one thing I have not mentioned: I also added a -seqname argument. Glimmer2 does not provide any kind of sequence identifier in its output, and only processes the first sequence in a FASTA file. It would be tedious to code around this by fixing up the predictions after they are produced, so I added the option to provide this missing information up front; hopefully downstream code then won't need a special case for fixing up Glimmer2 predictions. On 2/12/07, Torsten Seemann wrote: > I think it should be what it says, or perhaps now /^(Glimmer(M|HMM))/. > Here's why: > > I came onto the scene at Glimmer.pm rev 1.4. At that stage it only > parse GlimmerM. I noted that GlimmerHMM was the same output format as > GlimmerM, except for the first line. So in rev 1.5 I modified the > regexp to match both ie. \S* . This would also hopefully match any > other Glimmer-clone formats that arose. I also fixed the pdocs to say > this, and added tests to t/Genpred.t. > % cvs diff -r 1.4 -r 1.5 Bio/Tools/Glimmer.pm > % cvs diff -r 1.15 -r 1.16 t/Genpred.t > > I then planned to extend support to Glimmer2 and Glimmer3. I added the > 4 test files (t/Glimmer*.out) but never wrote the code. This is where > you have come in Mark :-) > > > I lifted that bit of code to do format detection...we don't have > GlimmerHMM > > installed locally, so I'm assuming Torsten's output is correct and the > above > > is a bug. Guess I'll go check bugzilla...
> > I'm pretty sure my 4 test files are correct - I spent a lot of time > ensuring they were consistent etc, as I was getting very confused with > the different "glimmer" versions! > > Hope this all helps, > > --Torsten > From cjfields at uiuc.edu Tue Feb 13 15:47:19 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 13 Feb 2007 14:47:19 -0600 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> Message-ID: You'll also want to update whatever relevant tests there are for Glimmer; looks like they are in GenPred.t. chris On Feb 13, 2007, at 2:10 PM, Mark Johnson wrote: > You're quite correct. I wasn't paying enough attention. That > does work > just fine. I fat-fingered something somewhere else, broke my > version of the > module for GlimmerHMM, hallucinated and confused \S and \s. 8) > All I have left now is to fixup the POD documentation and such > and then > I can send the module along and somebody can make whatever tweaks > and check > it in. Shall I open a ticket in Bugzilla for this and attach > diffs, or just > send them along to somebody to take care of directly? > Oh, one thing I have not mentioned. I also added a -seqname > argument. > Glimmer2 does not provide any kind of sequence identifier in the > output, and > only processes the first sequence in a fasta file. It would be > tedious to > have to code around this by fixing up the predictions after they are > produced, so I added the option to provide this missing info up front, > hopefully allowing downstream code to not have to care as much and > have a > special case for fixing up Glimmer2 predictions. > > On 2/12/07, Torsten Seemann > wrote: > >> I think it should be what it says, or perhaps now /^(Glimmer(M| >> HMM))/. >> Here's why: >> >> I came onto the scene at Glimmer.pm rev 1.4. At that stage it only >> parse GlimmerM. 
I noted that GlimmerHMM was the same output format as >> GlimmerM, except for the first line. So in rev 1.5 I modified the >> regexp to match both ie. \S* . This would also hopefully match any >> other Glimmer-clone formats that arose. I also fixed the pdocs to say >> this, and added tests to t/Genpred.t. >> % cvs diff -r 1.4 -r 1.5 Bio/Tools/Glimmer.pm >> % cvs diff -r 1.15 -r 1.16 t/Genpred.t >> >> I then planned to extend support to Glimmer2 and Glimmer3. I added >> the >> 4 test files (t/Glimmer*.out) but never wrote the code. This is where >> you have come in Mark :-) >> >>> I lifted that bit of code to do format detection...we don't have >> GlimmerHMM >>> installed locally, so I'm assuming Torsten's output is correct >>> and the >> above >>> is a bug. Guess I'll go check bugzilla... >> >> I'm pretty sure my 4 test files are correct - I spent a lot of time >> ensuring they were consistent etc, as I was getting very confused >> with >> the different "glimmer" versions! >> >> Hope this all helps, >> >> --Torsten >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From thokeller at gmail.com Tue Feb 13 17:00:06 2007 From: thokeller at gmail.com (Thomas Keller) Date: Tue, 13 Feb 2007 14:00:06 -0800 Subject: [Bioperl-l] update/install problem Message-ID: Could someone suggest a workaround or fix for this error? $ sudo fink update bioperl-pm586 Information about 5850 packages read in 2 seconds. The package 'bioperl-pm586' will be built and installed. The package 'xml-sax-pm586' will be installed. The package 'xml-sax-writer-pm586' will be built and installed. The package 'xml-filter-buffertext-pm586' will be built and installed. 
The following package will be installed or updated: bioperl-pm586 The following 3 additional packages will be installed: xml-filter-buffertext-pm586 xml-sax-pm586 xml-sax-writer-pm586 Do you want to continue? [Y/n] Y /sw/bin/dpkg-lockwait -i /sw/fink/dists/unstable/main/binary-darwin-powerpc/libs/perlmods/xml-sax-pm586_0.13-2_darwin-powerpc.deb (Reading database ... 48029 files and directories currently installed.) Preparing to replace xml-sax-pm586 0.13-2 (using .../xml-sax-pm586_0.13-2_darwin-powerpc.deb) ... Unpacking replacement xml-sax-pm586 ... Setting up xml-sax-pm586 (0.13-2) ... update-perl586-sax-parsers: adding Perl SAX parser module info file of XML::SAX::PurePerl... Can't locate object method "save_parsers_debian" via package "XML::SAX" at /sw/sbin/update-perl586-sax-parsers line 96. /sw/bin/dpkg: error processing xml-sax-pm586 (--install): subprocess post-installation script returned error exit status 22 Errors were encountered while processing: xml-sax-pm586 ### execution of /sw/bin/dpkg-lockwait failed, exit code 1 Failed: can't install package xml-sax-pm586-0.13-2 -- Tom Keller "Ecrasez l'Infame!" -- Voltaire From sac at bioperl.org Tue Feb 13 18:00:46 2007 From: sac at bioperl.org (Steve Chervitz) Date: Tue, 13 Feb 2007 15:00:46 -0800 Subject: [Bioperl-l] Bio::Root::Utilities.pm Message-ID: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com> I noticed that Bio::Root::Utilities was purged from bioperl-live for the 1.5.2 release, but I'd like us to consider adding it back. I agree that the other purged Root modules were ancient relics of the past, but Bio::Root::Utilities.pm still has signs of life (at least I still find occasion to use it, or refer to code in it). I know that it's not currently used by any other modules in Bioperl, but there are likely some legacy scripts out there that rely on it.
Probably most of those scripts are ones I've written, but there have been substantive commits by others in the not-too-distant past (Dec 2005), so at least some folks besides myself are using it and may hesitate to upgrade their bioperl installation if it's absent. I'm all for avoiding bloat in the codebase and am eager to see Bioperl be more lean and mean, but I'd like to keep this module around. I'll agree to add some tests for it as well as clean some things up (e.g., use Bio::Root::IO to get a temp file name). Cheers, Steve -- Steve Chervitz sac at bioperl.org From cjfields at uiuc.edu Tue Feb 13 20:29:03 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 13 Feb 2007 19:29:03 -0600 Subject: [Bioperl-l] update/install problem In-Reply-To: References: Message-ID: On Feb 13, 2007, at 4:00 PM, Thomas Keller wrote: > Could someone suggest a workaround or fix for this error? > > $ sudo fink update bioperl-pm586 > Information about 5850 packages read in 2 seconds. > The package 'bioperl-pm586' will be built and installed. > The package 'xml-sax-pm586' will be installed. > The package 'xml-sax-writer-pm586' will be built and installed. > The package 'xml-filter-buffertext-pm586' will be built and installed. > The following package will be installed or updated: > bioperl-pm586 > The following 3 additional packages will be installed: > xml-filter-buffertext-pm586 xml-sax-pm586 xml-sax-writer-pm586 > Do you want to continue? [Y/n] Y > /sw/bin/dpkg-lockwait -i > /sw/fink/dists/unstable/main/binary-darwin-powerpc/libs/perlmods/xml-sax-pm586_0.13-2_darwin-powerpc.deb > (Reading database ... 48029 files and directories currently > installed.) > Preparing to replace xml-sax-pm586 0.13-2 (using > .../xml-sax-pm586_0.13-2_darwin-powerpc.deb) ... > Unpacking replacement xml-sax-pm586 ... > Setting up xml-sax-pm586 (0.13-2) ... > update-perl586-sax-parsers: adding Perl SAX parser module info file of > XML::SAX::PurePerl...
> Can't locate object method "save_parsers_debian" via package > "XML::SAX" at > /sw/sbin/update-perl586-sax-parsers line 96. > /sw/bin/dpkg: error processing xml-sax-pm586 (--install): > subprocess post-installation script returned error exit status 22 > Errors were encountered while processing: > xml-sax-pm586 > ### execution of /sw/bin/dpkg-lockwait failed, exit code 1 > Failed: can't install package xml-sax-pm586-0.13-2 The fink installation seems to be failing on XML::SAX, not bioperl. You could try installing XML::SAX (now at v. 0.15) via CPAN using 'sudo cpan'; I updated just recently w/o problems. As an aside, you could similarly install bioperl directly from CPAN (which I also haven't had any problems with). The CPAN installation also lets you choose to install optional modules. chris From cjfields at uiuc.edu Tue Feb 13 22:41:31 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 13 Feb 2007 21:41:31 -0600 Subject: [Bioperl-l] Bio::Root::Utilities.pm In-Reply-To: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com> References: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com> Message-ID: <1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu> On Feb 13, 2007, at 5:00 PM, Steve Chervitz wrote: > I noticed that Bio::Root::Utilities was purged from bioperl-live > for the > 1.5.2 release, but I'd like us to consider adding it back. I agree > that the > other purged Root modules were ancient relics of the past, but > Bio::Root::Utilities.pm still has signs of life (at least I still find > occasion to use > it, or refer to code in it). > > I know that it's not currently used by any other modules in > Bioperl, but > there are likely some legacy scripts out there that rely on it. > Probably > most of those scripts are ones I've written, but there have been > substantive > commits by others in the not-too-distant past (Dec 2005), so at > least some > folks besides myself are using it and may hesitate to upgrade their > bioperl > installation if it's absent.
> > I'm all for avoiding bloat in the codebase and am eager to see > Bioperl be > more lean and mean, but I'd like to keep this module around. I'll > agree to > add some tests for it as well as clean some things up (e.g., use > Bio::Root::IO to get temp file name). > > Cheers, > Steve > -- > Steve Chervitz > sac at bioperl.org I don't have a problem with adding it back, esp. if tests are added. Everything in Bio::Root* not tied to a module was yanked out when no one spoke up about cleaning up Bio::Root* modules: http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12792/focus=12839 Maybe others disagree? chris From bix at sendu.me.uk Wed Feb 14 03:00:35 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 14 Feb 2007 08:00:35 +0000 Subject: [Bioperl-l] update/install problem In-Reply-To: References: Message-ID: <45D2C1A3.9060300@sendu.me.uk> Chris Fields wrote: > As an aside, you could similarly install bioperl directly from CPAN > (which I also haven't had any problems with). Indeed. If you follow the unix instructions at http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix you should have a problem-free complete install under Mac OS X. From bix at sendu.me.uk Wed Feb 14 09:08:22 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 14 Feb 2007 14:08:22 +0000 Subject: [Bioperl-l] BLASTXML changes (good this time!) In-Reply-To: References: <45CCF861.8030000@sendu.me.uk> Message-ID: <45D317D6.5070903@sendu.me.uk> Chris Fields wrote: > > On Feb 9, 2007, at 4:40 PM, Sendu Bala wrote: > >> Chris Fields wrote: >>> If Sendu is out there, I think we can safely remove any dependencies >>> beyond XML::SAX 0.15 for the next release. Should I go ahead and >>> modify Build.PL? >> >> Sure, good to hear. > > I added a version dependency for XML::SAX (v. 0.15) for the PurePerl > fix. That likely obviates the need for a Bundle for XML::Simple. Not > too pressing; we can determine that before the next release. The bundle is now obsolete.
Does anything in Bioperl, or any of its dependencies, now make use of the expat library? If not, I can remove mention of it from the install documentation. From bix at sendu.me.uk Wed Feb 14 09:02:39 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 14 Feb 2007 14:02:39 +0000 Subject: [Bioperl-l] DB.t failures Message-ID: <45D3167F.2000608@sendu.me.uk> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer getting sequences back from NCBI in the order we requested them in batch mode. Is this a change at NCBI? Is there some way we can make sure to return the sequences in the expected order? Or shouldn't the order be expected (should the test script be altered)? From cjfields at uiuc.edu Wed Feb 14 09:37:07 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 14 Feb 2007 08:37:07 -0600 Subject: [Bioperl-l] DB.t failures In-Reply-To: <45D3167F.2000608@sendu.me.uk> References: <45D3167F.2000608@sendu.me.uk> Message-ID: <49A5C7D3-8D63-452C-B0EA-6F7144F85E35@uiuc.edu> Confirmed on this end. It's possible that the default sort order from eutils is different now though I haven't seen anything on the eutils mail list. There may be a way to set the sort order via the base URL; I'll check into it later today; I'm still digging myself out from the midwest blizzard. chris On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: > DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer > getting sequences back from NCBI in the order we requested them in > batch > mode. > > Is this a change at NCBI? Is there some way we can make sure to return > the sequences in the expected order? Or shouldn't the order be > expected > (should the test script be altered)? > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Wed Feb 14 09:42:05 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 14 Feb 2007 08:42:05 -0600 Subject: [Bioperl-l] BLASTXML changes (good this time!) In-Reply-To: <45D317D6.5070903@sendu.me.uk> References: <45CCF861.8030000@sendu.me.uk> <45D317D6.5070903@sendu.me.uk> Message-ID: On Feb 14, 2007, at 8:08 AM, Sendu Bala wrote: > Chris Fields wrote: >> >> On Feb 9, 2007, at 4:40 PM, Sendu Bala wrote: >> >>> Chris Fields wrote: >>>> If Sendu is out there, I think we can safely remove any >>>> dependencies >>>> beyond XML::SAX 0.15 for the next release. Should I go ahead and >>>> modify Build.PL? >>> >>> Sure, good to hear. >> >> I added a version dependency for XML::SAX (v. 0.15) for the PurePerl >> fix. That likely obviates the need for a Bundle for XML::Simple. >> Not >> too pressing; we can determine that before the next release. > > The bundle is now obsolete. Does anything in Bioperl, or any of its > dependencies, now make use of the expat library? If not, I can remove > mention of it from the install documentation. I'll try getting something up about XML::SAX on the wiki today. XML::Parser, though, still requires expat AFAIK: http://www.bioperl.org/wiki/BioPerl_Dependencies chris From kellert at ohsu.edu Tue Feb 13 17:43:24 2007 From: kellert at ohsu.edu (Thomas J Keller) Date: Tue, 13 Feb 2007 14:43:24 -0800 Subject: [Bioperl-l] HowTo:SearchIO Message-ID: Greetings, I've been away from programming and informatics for many months. Hoping to get back into it, I thought it would be good to review the tutorials. I tried the code in the tutorial on the sample blast report in the tutorial and it worked fine. So I ran a blastx search and saved the results and tried to parse them: It gave the "... parsing" message, but no other results get reported. Any suggestions? Thanks, Tom Tom Keller, Ph.D. 
kellert at ohsu.edu 503-494-2442 6339b Basic Science Bldg http://www.ohsu.edu/research/core From mrouard at gmail.com Wed Feb 14 06:23:47 2007 From: mrouard at gmail.com (Mathieu Rouard) Date: Wed, 14 Feb 2007 12:23:47 +0100 Subject: [Bioperl-l] get the sequence of a column in a multiple alignment Message-ID: Dear all, I am starting to use the bioperl API to parse multiple alignments and I am wondering what is the most effective way to splice all the columns from an alignment (all the AA at the position 1, position 2 etc.). I quickly implemented this simple code but it becomes quite slow when the length of sequences increases. my $stream = Bio::AlignIO->new(-file => $inputfilename, '-format' => 'stockholm'); my $aln = $stream->next_aln(); my $length = $aln->length(); my @column; for (my $i=1;$i<=$length;$i++) { my $aa; foreach my $seq ($aln->each_seq()) { my $obj = $seq->trunc($i,$i); $aa .=$obj->seq; } # need to track the column number and the sequence of the column push @column, $aa; } Would you have any other suggestion? thanks Mathieu From avilella at gmail.com Wed Feb 14 10:29:02 2007 From: avilella at gmail.com (Albert Vilella) Date: Wed, 14 Feb 2007 15:29:02 +0000 Subject: [Bioperl-l] get the sequence of a column in a multiple alignment In-Reply-To: References: Message-ID: <358f4d650702140729u4dae2847qc8eeeb45b20faca4@mail.gmail.com> there is a slice method: $mini_aln = $aln->slice(20,30); # get a block of columns Title : slice Usage : $aln2 = $aln->slice(20,30) Function : Creates a slice from the alignment inclusive of start and end columns, and the first column in the alignment is denoted 1. Sequences with no residues in the slice are excluded from the new alignment and a warning is printed. Slice beyond the length of the sequence does not do padding. Returns : A Bio::SimpleAlign object Args : Positive integer for start column, positive integer for end column, optional boolean which if true will keep gap-only columns in the newly created slice.
Example: $aln2 = $aln->slice(20,30,1) but I don't know how well it behaves for lots of sequences :) On 2/14/07, Mathieu Rouard wrote: > Dear all, > > I am starting to use the bioperl API to parse multiple alignments and I am > wondering what is the most effective way to splice all the columns from an > alignment (all the AA at the position 1, position 2 etc.). I quickly > implemented this simple code but it becomes quite slow when the length of > sequences increases. > > my $stream = Bio::AlignIO->new(-file => $inputfilename, > '-format' => 'stockholm'); > > my $aln = $stream->next_aln(); > > my $length = $aln->length(); > my @column; > > for (my $i=1;$i<=$length;$i++) { > my $aa; > foreach my $seq ($aln->each_seq()) { > my $obj = $seq->trunc($i,$i); > $aa .=$obj->seq; > } > # need to track the column number and the sequence of the column > push @column, $aa; > } > > Would you have any other suggestion? > > thanks > Mathieu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Wed Feb 14 11:59:49 2007 From: jason at bioperl.org (Jason Stajich) Date: Wed, 14 Feb 2007 08:59:49 -0800 Subject: [Bioperl-l] HowTo:SearchIO In-Reply-To: References: Message-ID: As always, reporting the version of BLAST and Bioperl you have installed will help someone diagnose if this is a fixed problem or not. If you trawl through the list archives you'll see that chris and others have been playing cat and mouse with the text output from NCBI BLAST, which appears to be an ever-evolving beast. So the best advice right now is to get the latest bioperl from CVS to ensure you have all the patches that might be needed to parse this version. If it still fails then the standard response will be to submit the report as an attachment to a new bug report on the bugzilla.
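Returning to Mathieu's column question above: the loop there is probably slow because trunc() builds a new sequence object for every residue of every column. A cheaper route is to pull each aligned string out once and cut the columns with substr(). Below is a minimal pure-Perl sketch that assumes all aligned strings have the same length, gaps included. alignment_columns is a hypothetical helper, not a BioPerl method.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given the aligned sequence strings (all the same
# length, gaps included), return one string per column, in column order.
# With BioPerl the strings could be gathered once, outside any loop:
#   my @rows = map { $_->seq } $aln->each_seq;
sub alignment_columns {
    my @rows = @_;
    return () unless @rows;
    my $len = length $rows[0];
    my @columns;
    for my $i (0 .. $len - 1) {
        # substr() on a plain string avoids creating a new sequence object
        # per residue, which trunc() does in the loop from the question.
        push @columns, join '', map { substr($_, $i, 1) } @rows;
    }
    return @columns;    # $columns[0] is alignment column 1
}

my @cols = alignment_columns('MK-V', 'MKLV', 'MR-V');
print "column $_: $cols[$_ - 1]\n" for 1 .. @cols;
```

This prints `column 1: MMM` through `column 4: VVV`. The residues within each column string follow the order of the rows passed in, so the column number is simply the array index plus one.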
thanks, -jason On Feb 13, 2007, at 2:43 PM, Thomas J Keller wrote: > Greetings, > I've been away from programming and informatics for many months. > Hoping to get back into it, I thought it would be good to review the > tutorials. > I tried the code in the tutorial on the sample blast report in the > tutorial and it worked fine. So I ran a blastx search and saved the > results and tried to parse them: It gave the "... parsing" message, > but no other results get reported. > > Any suggestions? > > Thanks, > Tom > > Tom Keller, Ph.D. > kellert at ohsu.edu > 503-494-2442 > 6339b Basic Science Bldg > http://www.ohsu.edu/research/core > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From dmessina at wustl.edu Wed Feb 14 11:58:45 2007 From: dmessina at wustl.edu (David Messina) Date: Wed, 14 Feb 2007 10:58:45 -0600 Subject: [Bioperl-l] HowTo:SearchIO In-Reply-To: References: Message-ID: <6E3CAB6B-9F9E-46FD-9021-50D7FE011860@wustl.edu> Hi Tom, Could you tell us what version of BioPerl you are using, and what specific example is failing for you? And could you post your code? That would make it easier to diagnose the problem. Thanks, Dave -- Dave Messina Senior Programmer/Analyst, Assembly Group WashU Genome Sequencing Center dmessina a t wustl.edu 314-286-1415 From cjfields at uiuc.edu Wed Feb 14 12:28:24 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 14 Feb 2007 11:28:24 -0600 Subject: [Bioperl-l] HowTo:SearchIO In-Reply-To: References: Message-ID: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu> I would also strongly encourage switching to using XML-based parsing, which is much more stable now. 
Here's the link to the NCBI response re: BLAST report parsing: http://bioperl.org/wiki/NCBI_Blast_email chris (taking a break from shoveling snow...) On Feb 14, 2007, at 10:59 AM, Jason Stajich wrote: > As always, reporting the version of BLAST and Bioperl you have > installed will help someone diagnose if this is a fixed problem or > not. If you trawl through the list archives you'll chris and others > have been playing cat and mouse with the text version output from > NCBI BLAST which appears to be an ever evolving beast. > > So the best advice right now is to get the latest bioperl from CVS > to insure you have all the patches that might parse this version. If > it still fails then the standard response will be to submit the > report as an attachment to a new bug report on the bugzilla. > > thanks, > -jason > > > On Feb 13, 2007, at 2:43 PM, Thomas J Keller wrote: > >> Greetings, >> I've been away from programming and informatics for many months. >> Hoping to get back into it, I thought it would be good to review the >> tutorials. >> I tried the code in the tutorial on the sample blast report in the >> tutorial and it worked fine. So I ran a blastx search and saved the >> results and tried to parse them: It gave the "... parsing" message, >> but no other results get reported. >> >> Any suggestions? >> >> Thanks, >> Tom >> >> Tom Keller, Ph.D. 
>> kellert at ohsu.edu >> 503-494-2442 >> 6339b Basic Science Bldg >> http://www.ohsu.edu/research/core >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Miller Research Fellow > University of California, Berkeley > lab: 510.642.8441 > http://pmb.berkeley.edu/~taylor/people/js.html > http://fungalgenomes.org/ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From sac at bioperl.org Wed Feb 14 13:20:17 2007 From: sac at bioperl.org (Steve Chervitz) Date: Wed, 14 Feb 2007 10:20:17 -0800 Subject: [Bioperl-l] Bio::Root::Utilities.pm In-Reply-To: <1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu> References: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com> <1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu> Message-ID: <8f200b4c0702141020x5815018bvd69873222582d9d5@mail.gmail.com> On 2/13/07, Chris Fields wrote: > > > On Feb 13, 2007, at 5:00 PM, Steve Chervitz wrote: > > > I noticed that Bio::Root::Utilities was purged from bioperl-live > > for the > > 1.5.2 release, but I'd like us to consider adding it back. I agree > > that the > > other purged Root modules were ancient relics of the past, but > > Bio::Root:: > > Utilities.pm still has signs of life (at least I still find > > occasion to use > > it, or refer to code in it). > > > > I know that it's not currently used by any other modules in > > Bioperl, but > > there are likely some legacy scripts out there that rely on it. 
> > Probably > > most of those scripts are ones I've written, but there have been > > substantive > > commits by others in the not-to-distant past (Dec 2005), so at > > least some > > folks besides myself are using it and may hesitate to upgrade their > > bioperl > > installation if it's absent. > > > > I'm all for avoiding bloat in the codebase and am eager to see > > Bioperl be > > more lean and mean, but I'd like to keep this module around. I'll > > agree to > > add some tests for it as well as clean some things up (e.g., use > > Bio::Root::IO to get temp file name). > > > > Cheers, > > Steve > > -- > > Steve Chervitz > > sac at bioperl.org > > I don't have a problem with adding it back, esp. if tests are added. > Everything in Bio::Root* not tied to a module was yanked out when no > one spoke up about cleaning up Bio::Root* modules: > > http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12792/ > focus=12839 > > Maybe others disagree? > > chris > Sorry I missed out on that thread. I had some trouble with my bioperl-l email delivery getting disabled due to excessive bounces, and it took me a while to catch it. Bio::Root::Utilities is quite a grab bag of miscellaneous general functions that are occasionally useful for perl scripting (e.g., determining end-of-line characters, sending email, etc.). The code could definitely use a review, and maybe an example script to advertise it. I can look into this, and suggestions are welcome. 
Steve From dmessina at wustl.edu Wed Feb 14 13:55:18 2007 From: dmessina at wustl.edu (David Messina) Date: Wed, 14 Feb 2007 12:55:18 -0600 Subject: [Bioperl-l] HowTo:SearchIO In-Reply-To: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu> References: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu> Message-ID: <49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu> On Feb 14, 2007, at 11:28 AM, Chris Fields wrote: > I would also strongly encourage switching to using XML-based parsing, Unless anyone objects, I would be happy to update the HOWTO to suggest people make the switch and give an example of XML parsing. The Bio::SearchIO synopsis is already an XML example. However, there's no warning about text-based parsing nor a suggestion to use XML that I can see -- perhaps should be added? Dave From cjfields at uiuc.edu Wed Feb 14 15:12:21 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 14 Feb 2007 14:12:21 -0600 Subject: [Bioperl-l] HowTo:SearchIO In-Reply-To: <49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu> References: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu> <49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu> Message-ID: On Feb 14, 2007, at 12:55 PM, David Messina wrote: > > On Feb 14, 2007, at 11:28 AM, Chris Fields wrote: > >> I would also strongly encourage switching to using XML-based parsing, > > Unless anyone objects, I would be happy to update the HOWTO to > suggest people make the switch and give an example of XML parsing. > > The Bio::SearchIO synopsis is already an XML example. However, > there's no warning about text-based parsing nor a suggestion to use > XML that I can see -- perhaps should be added? > > Dave We should probably add something specifically for BLAST, yes. Other text parsers should be fine. Personally, I use XML or tabular output parsing simply b/c they are faster and do what I need. 
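Since tabular output comes up here as the lighter-weight alternative, here is a minimal pure-Perl sketch of reading NCBI tabular BLAST output (blastall -m 8, or -m 9, which adds `#` comment lines). It assumes the standard 12-column layout, which you should check against your own BLAST version. parse_blast_table is a hypothetical helper, and the sample report is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed column order for NCBI tabular output (blastall -m 8 / -m 9):
# the standard 12 fields. Verify against your own BLAST version.
my @FIELDS = qw(query subject identity length mismatches gapopens
                qstart qend sstart send evalue bitscore);

# Hypothetical helper: read tabular hits from a filehandle into hashrefs.
sub parse_blast_table {
    my ($fh) = @_;
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;    # -m 9 comment lines, blank lines
        my @vals = split /\t/, $line;
        next unless @vals == @FIELDS;      # skip lines that do not fit
        my %hit;
        @hit{@FIELDS} = @vals;             # hash slice: field name => value
        push @hits, \%hit;
    }
    return \@hits;
}

# Made-up two-hit report, read from an in-memory filehandle.
my $report = join '', map { "$_\n" }
    '# example comment line',
    join("\t", qw(q1 s1 98.50 200 3 0 1 600 1 200 1e-100 350.0)),
    join("\t", qw(q1 s2 45.00 120 60 2 10 370 5 124 2e-10 60.1));
open my $fh, '<', \$report or die "cannot open in-memory report: $!";
for my $hit (@{ parse_blast_table($fh) }) {
    printf "%s -> %s  E=%s\n", @{$hit}{qw(query subject evalue)};
}
close $fh;
```

Within BioPerl itself, Bio::SearchIO accepts `-format => 'blastxml'` for XML (-m 7) reports and, in recent releases, `-format => 'blasttable'` for tabular ones. That is likely the better route when you want the usual result/hit/HSP objects rather than plain hashes.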
I think we'll need to retain the capability for text-based BLAST parsing, but it will become extremely bloated long-term if we plan on continuing support for parsing all versions and flavors of BLAST, particularly if NCBI continues to change the output. chris From dmessina at wustl.edu Wed Feb 14 15:46:31 2007 From: dmessina at wustl.edu (David Messina) Date: Wed, 14 Feb 2007 14:46:31 -0600 Subject: [Bioperl-l] HowTo:SearchIO In-Reply-To: References: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu> <49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu> Message-ID: <136DA052-B9FD-4547-B262-EC6E38B47392@wustl.edu> On Feb 14, 2007, at 2:12 PM, Chris Fields wrote: > We should probably add something specifically for BLAST, yes. > Other text parsers should be fine. Good point -- I'll make it clear it's only pertinent to BLAST. > I think we'll need to retain the capability for text-based BLAST > parsing, Agreed. Through the 1.6 release at least, I would think. > particularly if NCBI continues to change the output. Well, clearly the solution is not to use the NCBI flavor of BLAST. :) Dave (look at my email address) From jay at jays.net Thu Feb 15 08:08:56 2007 From: jay at jays.net (Jay Hannah) Date: Thu, 15 Feb 2007 07:08:56 -0600 Subject: [Bioperl-l] DB.t failures In-Reply-To: <45D3167F.2000608@sendu.me.uk> References: <45D3167F.2000608@sendu.me.uk> Message-ID: On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: > DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer > getting sequences back from NCBI in the order we requested them in > batch > mode. Is this the same result you get? DIED. FAILED tests 59-60, 63-64, 67-68, 71-72 Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 okay, 85.84%) Failed Test Stat Wstat Total Fail Failed List of Failed ------------------------------------------------------------------------ ------- DB.t 8 2048 113 8 7.08% 59-60 63-64 67-68 71-72 8 subtests skipped. 
Thanks, j seqlab.net http://www.bioperl.org/wiki/User:Jhannah From bix at sendu.me.uk Thu Feb 15 08:37:32 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 15 Feb 2007 13:37:32 +0000 Subject: [Bioperl-l] DB.t failures In-Reply-To: References: <45D3167F.2000608@sendu.me.uk> Message-ID: <45D4621C.6040309@sendu.me.uk> Jay Hannah wrote: > On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: >> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer >> getting sequences back from NCBI in the order we requested them in >> batch >> mode. > > Is this the same result you get? > > > DIED. FAILED tests 59-60, 63-64, 67-68, 71-72 > Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 > okay, 85.84%) > Failed Test Stat Wstat Total Fail Failed List of Failed > ------------------------------------------------------------------------ > ------- > DB.t 8 2048 113 8 7.08% 59-60 63-64 67-68 71-72 > 8 subtests skipped. Yes, those fails are all caused by results in the wrong order (I believe). From cjfields at uiuc.edu Thu Feb 15 09:22:09 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 08:22:09 -0600 Subject: [Bioperl-l] DB.t failures In-Reply-To: <45D4621C.6040309@sendu.me.uk> References: <45D3167F.2000608@sendu.me.uk> <45D4621C.6040309@sendu.me.uk> Message-ID: On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote: > Jay Hannah wrote: >> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: >>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer >>> getting sequences back from NCBI in the order we requested them in >>> batch >>> mode. >> >> Is this the same result you get? >> >> >> DIED. FAILED tests 59-60, 63-64, 67-68, 71-72 >> Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 >> okay, 85.84%) >> Failed Test Stat Wstat Total Fail Failed List of Failed >> --------------------------------------------------------------------- >> --- >> ------- >> DB.t 8 2048 113 8 7.08% 59-60 63-64 67-68 71-72 >> 8 subtests skipped. 
> > Yes, those fails are all caused by results in the wrong order (I > believe). I'm fixing those now so it doesn't depend on order and will commit in the next few minutes. chris From bix at sendu.me.uk Thu Feb 15 09:37:00 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 15 Feb 2007 14:37:00 +0000 Subject: [Bioperl-l] DB.t failures In-Reply-To: <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu> References: <45D3167F.2000608@sendu.me.uk> <45D4621C.6040309@sendu.me.uk> <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu> Message-ID: <45D4700C.8020305@sendu.me.uk> Chris Fields wrote: > > On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote: > >> Jay Hannah wrote: >>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: >>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer >>>> getting sequences back from NCBI in the order we requested them in >>>> batch mode. > > Okay, I committed a fix for that. I hope there are many users who > depend on the returned sequence order for anything! s/are/aren't/ ? I suspect there might be, and it's certainly a reasonable assumption to make. Did you not see an easy way of maintaining the order? From cjfields at uiuc.edu Thu Feb 15 09:28:46 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 08:28:46 -0600 Subject: [Bioperl-l] DB.t failures In-Reply-To: <45D4621C.6040309@sendu.me.uk> References: <45D3167F.2000608@sendu.me.uk> <45D4621C.6040309@sendu.me.uk> Message-ID: <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu> On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote: > Jay Hannah wrote: >> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: >>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer >>> getting sequences back from NCBI in the order we requested them in >>> batch >>> mode. >> >> Is this the same result you get? >> >> >> DIED.
FAILED tests 59-60, 63-64, 67-68, 71-72 >> Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 >> okay, 85.84%) >> Failed Test Stat Wstat Total Fail Failed List of Failed >> --------------------------------------------------------------------- >> --- >> ------- >> DB.t 8 2048 113 8 7.08% 59-60 63-64 67-68 71-72 >> 8 subtests skipped. > > Yes, those fails are all caused by results in the wrong order (I > believe). Okay, I committed a fix for that. I hope there are many users who depend on the returned sequence order for anything! chris From michael.watson at bbsrc.ac.uk Thu Feb 15 09:44:27 2007 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu, 15 Feb 2007 14:44:27 -0000 Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk> Hi OK I have some great images out of this glyph, but I can't see the axis, and nor is it labelled (ie does it go from 0 - 100%?) so isn't great for publication. The docs say: "NOTE: -gc_window=>'auto' gives nice results and is recommended for drawing GC content. The GC content axes draw slightly outside the panel, so you may wish to add some extra padding on the right and left. " Any idea how to do this? Basically, I want a nice GC graph with the axis quite clearly labelled, and a nice "%GC" title next to it :) Thanks Mick The information contained in this message may be confidential or legally privileged and is intended solely for the addressee. If you have received this message in error please delete it & notify the originator immediately. Unauthorised use, disclosure, copying or alteration of this message is forbidden & may be unlawful. The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Institute. This email and associated attachments has been checked locally for viruses but we can accept no responsibility once it has left our systems. 
Communications on Institute computers are monitored to secure the effective operation of the systems and for other lawful purposes. From nehadnahar at yahoo.co.in Thu Feb 15 10:28:42 2007 From: nehadnahar at yahoo.co.in (Neha Nahar) Date: Thu, 15 Feb 2007 15:28:42 +0000 (GMT) Subject: [Bioperl-l] Convert newick to nexus format In-Reply-To: <84703383-600F-42F4-A860-DD0D1C43EE83@bioperl.org> Message-ID: <777943.33252.qm@web8404.mail.in.yahoo.com> Thank you Jason. I ran the tests and they failed, so I re-installed the bioperl module and now it works fine. Regards, Neha. Jason Stajich wrote: Something is wrong with your install I am guessing - can you run the tests? Go to bioperl directory: $ perl t/TreeIO.t can you describe how you installed bioperl? On Feb 5, 2007, at 11:58 AM, Neha Nahar wrote: > > Hi, > Thank you for the code. > I tried it but I still get the same exception. > > ------------- EXCEPTION ------------- > MSG: Cannot call method write_tree on Bio::TreeIO object must use a > subclass > STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/ > 5.8.8/Bio/TreeIO/nexus.pm:170 > STACK toplevel ./nexus1.pl:18 > > > Please find attached the perl file(nexus.pl). > > > Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/ > ~sendu/bioperl/Bio/TreeIO.pm > > Please let me know if I am using the correct version.If not, please > point me to the latest one. > > Thank you. > Regards, > nnahar > > > > > > Jason Stajich wrote:please cc the mailing list > when asking a question or followup. > > Sorry I don't know what you are doing wrong - you didn't resend > your code so I don't know if you still have a typo. 
> > This code works fine for me > > use Bio::TreeIO; > use strict; > my ($filein,$fileout) = @ARGV; > my ($format,$oformat) = qw(newick nexus); > my $in = Bio::TreeIO->new(-file => $filein, -format => $format);my > $out= Bio::TreeIO->new(-format => $oformat, -file => ">$fileout"); > > > while( my $t = $in->next_tree ) { > $out->write_tree($t); > } > > > > On Feb 5, 2007, at 11:24 AM, Neha Nahar wrote: > > Thank you very much for the reply. > > > I fixed the code as per your suggestion,but now am getting a > different error: > > > $ ./nexus.pl mrp-input.txt nexus.out > newickfile=mrp-input.txt, nexusfile=nexus.out > > > ------------- EXCEPTION ------------- > MSG: Cannot call method write_tree on Bio::TreeIO object must use a > subclass > STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/ > 5.8.8/Bio/TreeIO/nexus.pm:170 > STACK toplevel ./nexus.pl:23 > > > -------------------------------------- > > > Please help me out with this script. > > > Thank you. > Regards, > Neha > > > > > > > > > Jason Stajich wrote: you want to write the TREE > out not the TREE WRITER. > > > > > $treeout->write_tree($tree) > > > not > $treeout->write_tree($treeout); > > > On Feb 5, 2007, at 9:59 AM, Neha Nahar wrote: > > > Hello everyone, > > > > > I am trying to convert newick tree to nexus format. 
> Using the script (refered from and email from George dated Wed Sep > 22 11:52:47 EDT 2004) : > > > > > /*------------------------------------------------------------*/ > > > > > $ cat nexus.pl > #!/usr/bin/perl -w > > > > > use Bio::TreeIO; > > > > > ($NEWICKFILE, $NEXUSFILE) = @ARGV; > print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n"; > my $treeio = new Bio::TreeIO(-format => 'newick', -file => > "$NEWICKFILE"); > my $treeout = new Bio::TreeIO(-format => 'nexus', -file => "> > $NEXUSFILE"); > while(my $tree = $treeio->next_tree) { > $treeout->write_tree($treeout); > } > > > > > exit 0; > > > > > > > > > /*------------------------------------------------------------*/ > > > > > Running the script through command line: > Gives the following error: > > > > > $ ./nexus.pl mrp-input.txt nexus.out > newickfile=mrp-input.txt, nexusfile=nexus.out > > > > > ------------- EXCEPTION ------------- > MSG: Cannot call method write_tree on Bio::TreeIO object must use a > subclass > STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/ > 5.8.8/Bio/TreeIO/nexus.pm:170 > STACK toplevel ./nexus.pl:23 > > > > > -------------------------------------- > > > > > > > > > Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/ > ~sendu/bioperl/Bio/TreeIO.pm > > > > > Questions:- > > > > > 1. Please let me know if I am using the correct version. > If not, please point me to the latest one. > > > > > 2. Provided that the version I am using is the right one, please > let me know what is wrong with the script. > > > > > Thank you. > Regards, > Neha. > > > > > > > > > > > > > > > > > > > > > -Neha Nahar > " Work for cause and not for applause, live to express and not to > impress !" > > > > > --------------------------------- > Here?s a new way to find what you're looking for - Yahoo! 
Answers > > > > > -- > Jason Stajich > Miller Research Fellow > University of California, Berkeley > lab: 510.642.8441 > > > http://pmb.berkeley.edu/~taylor/people/js.html > http://fungalgenomes.org/ > > > > > > > > > > > > > -Neha Nahar > " Work for cause and not for applause, live to express and not to > impress !" > > > -- > Jason Stajich > Miller Research Fellow > University of California, Berkeley > lab: 510.642.8441 > > http://pmb.berkeley.edu/~taylor/people/js.html > http://fungalgenomes.org/ > > > > > > > > > -Neha Nahar > " Work for cause and not for applause, live to express and not > to impress !" > > > -Neha Nahar > " Work for cause and not for applause, live to express and not to > impress !" > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -Neha Nahar " Work for cause and not for applause, live to express and not to impress !"
From cjfields at uiuc.edu Thu Feb 15 10:44:23 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 09:44:23 -0600 Subject: [Bioperl-l] DB.t failures In-Reply-To: <45D4700C.8020305@sendu.me.uk> References: <45D3167F.2000608@sendu.me.uk> <45D4621C.6040309@sendu.me.uk> <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu> <45D4700C.8020305@sendu.me.uk> Message-ID: <809A1A9D-0344-412D-98D7-EA852EFED63D@uiuc.edu> On Feb 15, 2007, at 8:37 AM, Sendu Bala wrote: > Chris Fields wrote: >> >> On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote: >> >>> Jay Hannah wrote: >>>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: >>>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no >>>>> longer >>>>> getting sequences back from NCBI in the order we requested them in >>>>> batch mode. >> >> Okay, I committed a fix for that. I hope there are many users who >> depend on the returned sequence order for anything! > > s/are/aren't/ ? Yes, my oops. > I suspect there might be, and its certainly a reasonable assumption to > make. Did you not see an easy way of maintaining the order? I haven't looked (been busy the last few days), but I think there is a way via efetch. We could add in something to the default base URL if there is something or (probably better) add a sort_order() method to designate a particular sort order, defaulting to the old order if not set.
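As a stopgap until something like the proposed sort_order() exists, a caller can restore the requested order itself by keying the returned records on their IDs. Below is a minimal pure-Perl sketch. restore_request_order and the plain `{ id => ... }` records are hypothetical stand-ins for sequence objects. With BioPerl one might key on `$seq->accession_number`, but check which identifier your query actually returns.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): return the records in the
# order their IDs were requested, silently skipping IDs that did not come back.
sub restore_request_order {
    my ($requested, $records) = @_;
    # Index the returned records by ID once, so the reordering is linear.
    my %by_id = map { ($_->{id} => $_) } @$records;
    return [ map { $by_id{$_} } grep { exists $by_id{$_} } @$requested ];
}

# Example: three records come back shuffled relative to the request.
my @requested = qw(X1 X2 X3);
my @returned  = map { +{ id => $_ } } qw(X3 X1 X2);
my $ordered   = restore_request_order(\@requested, \@returned);
print join(',', map { $_->{id} } @$ordered), "\n";    # prints X1,X2,X3
```

For tests that should not depend on order at all, comparing sorted ID lists is simpler, e.g. `is_deeply([sort @got], [sort @expected])` with Test::More.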
chris From lstein at cshl.edu Thu Feb 15 13:53:13 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 15 Feb 2007 13:53:13 -0500 Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com> Hi Michael, When you set up the panel, do this: Bio::Graphics::Panel->new(-blah -blah, -pad_left => 20, -pad_right => 20); This will leave enough room on the left and right for you to see the Y axis. Otherwise it runs off the edge of the image (ok, this is a mis-design, but it was the only way to solve a chicken-and-egg problem about who gets to say how wide the panel is) Lincoln On 2/15/07, michael watson (IAH-C) wrote: > > Hi > > OK I have some great images out of this glyph, but I can't see the axis, > and nor is it labelled (ie does it go from 0 - 100%?) so isn't great for > publication. The docs say: > > "NOTE: -gc_window=>'auto' gives nice results and is recommended for > drawing GC content. The GC content axes draw slightly outside the > panel, so you may wish to add some extra padding on the right and > left. " > > Any idea how to do this? > > Basically, I want a nice GC graph with the axis quite clearly labelled, > and a nice "%GC" title next to it :) > > Thanks > > Mick
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From johnsonm at gmail.com Thu Feb 15 14:24:08 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Thu, 15 Feb 2007 13:24:08 -0600 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: References: <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> Message-ID: Done. Bug opened in Bugzilla, diffs attached including new/updated tests: http://bugzilla.open-bio.org/show_bug.cgi?id=2206 Can somebody grab that, take a look, tweak to taste, test and commit? Tests pass on my end presently. On 2/13/07, Chris Fields wrote: > > You'll also want to update whatever relevant tests there are for > Glimmer; looks like they are in GenPred.t. > > chris > From cjfields at uiuc.edu Thu Feb 15 14:37:22 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 13:37:22 -0600 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: References: <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> Message-ID: <4C15214E-AE4B-4D85-A710-60536B08BE86@uiuc.edu> On Feb 15, 2007, at 1:24 PM, Mark Johnson wrote: > Done. Bug opened in Bugzilla, diffs attached including new/updated > tests: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2206 > > Can somebody grab that, take a look, tweak to taste, test and > commit?
Tests > pass on my end presently. > > On 2/13/07, Chris Fields wrote: >> >> You'll also want to update whatever relevant tests there are for >> Glimmer; looks like they are in GenPred.t. >> >> chris Done; everything passed on this end as well, no tweaking necessary. If there are problems we'll definitely hear about it down the road (Glimmer is a popular tool), but I think you'll be fine. Thanks Mark! chris From cjfields at uiuc.edu Thu Feb 15 14:46:07 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 13:46:07 -0600 Subject: [Bioperl-l] DB.t failures In-Reply-To: <809A1A9D-0344-412D-98D7-EA852EFED63D@uiuc.edu> References: <45D3167F.2000608@sendu.me.uk> <45D4621C.6040309@sendu.me.uk> <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu> <45D4700C.8020305@sendu.me.uk> <809A1A9D-0344-412D-98D7-EA852EFED63D@uiuc.edu> Message-ID: On Feb 15, 2007, at 9:44 AM, Chris Fields wrote: > > On Feb 15, 2007, at 8:37 AM, Sendu Bala wrote: > >> Chris Fields wrote: >>> >>> On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote: >>> >>>> Jay Hannah wrote: >>>>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: >>>>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no >>>>>> longer >>>>>> getting sequences back from NCBI in the order we requested >>>>>> them in >>>>>> batch mode. >>> >>> Okay, I committed a fix for that. I hope there are many users who >>> depend on the returned sequence order for anything! >> >> s/are/aren't/ ? > > Yes, my oops. > >> I suspect there might be, and its certainly a reasonable >> assumption to >> make. Did you not see an easy way of maintaining the order? > > I haven't looked (been busy the last few days), but I think there is > a way via efetch. > > We could add in something to the default base URL if there is > something or (probably better) add a sort_order() method to designate > a particular sort order, defaulting to the old order if not set. 
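If the server will not preserve the requested order, one client-side workaround (an alternative to the sort_order() idea, not something Bioperl currently provides) is to re-sort the records after fetching them. The helper below is a hypothetical sketch: it assumes each record is a hashref with an `id` key. With real Bio::Seq objects you would key on something like `$seq->accession_number` or `$seq->display_id`, depending on which identifier was requested.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return @$records in the order their IDs appear in
# @$requested. Records whose IDs were not requested are dropped, and
# requested IDs with no matching record are skipped.
sub reorder_by_request {
    my ($requested, $records) = @_;
    my %by_id = map { $_->{id} => $_ } @$records;
    return map { exists $by_id{$_} ? $by_id{$_} : () } @$requested;
}

# Example: the server answered in a different order than we asked.
my @requested = qw(ID1 ID2 ID3);
my @returned  = ({ id => 'ID3' }, { id => 'ID1' }, { id => 'ID2' });

my @ordered = reorder_by_request(\@requested, \@returned);
print join(',', map { $_->{id} } @ordered), "\n";    # prints ID1,ID2,ID3
```

Building a lookup hash keeps the re-sort linear in the number of records, so it costs little even for large batch fetches.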
> > chris Delving into it further, the problem only occurs when using get_seq_stream() directly in batch mode, which is likely only used by developers for testing. The sort issue only pops up when eposting IDs using that mode; retrieved seqs are returned in a different order than through a direct efetch query (the default with get_Stream* or get_Seq* methods). No use of the 'sort' parameter works to get around that problem, not a complete surprise since it is supposed to only work for PubMed, but since the method is rarely used I'll just leave the bullet-proofed tests alone. chris From letondal at pasteur.fr Thu Feb 15 15:23:55 2007 From: letondal at pasteur.fr (Catherine Letondal) Date: Thu, 15 Feb 2007 21:23:55 +0100 Subject: [Bioperl-l] Problem parsing clustalw format ni AlignIO Message-ID: Hi bioperlers, I have a script called protal2dna (http://bioweb.pasteur.fr/seqanal/interfaces/protal2dna.html, see attachment #1) that realigns DNA sequences given their sequences plus the corresponding protein alignment (the sequences have to be in the same order or named equivalently). Users have reported a parsing problem from the AlignIO class with some ClustalW files (see attachment #2 for an example): % protal2dna alig-protal2dna.dat dna-protal2dna.data no alignment available in 'clustalw' format from file 'alig-protal2dna.dat' % I have tried with bioperl 1.4. I have looked in the archive and in the BUGS, but found nothing. Is there any bug fix for this? I also provide the DNA sequences file if you want to test. Thanks a lot in advance, -- Catherine Letondal -- Institut Pasteur www.pasteur.fr/~letondal -------------- next part -------------- A non-text attachment was scrubbed... Name: protal2dna Type: application/octet-stream Size: 11093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: alig-protal2dna.dat Type: application/octet-stream Size: 12022 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: dna-protal2dna.data Type: application/octet-stream Size: 7739 bytes Desc: not available URL: From Kevin.M.Brown at asu.edu Thu Feb 15 16:38:25 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 15 Feb 2007 14:38:25 -0700 Subject: [Bioperl-l] Problem parsing clustalw format ni AlignIO In-Reply-To: References: Message-ID: <1A4207F8295607498283FE9E93B775B402BA7764@EX02.asurite.ad.asu.edu> Did you try Bioperl 1.5.2 to see if updates to it might fix the issue? IIRC 1.4 is nearly 2 years old now. 1.5.2 was released within the last few months. > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Catherine Letondal > Sent: Thursday, February 15, 2007 1:24 PM > To: bioperl-l > Cc: Catherine Letondal; Katja Schuerer > Subject: [Bioperl-l] Problem parsing clustalw format ni AlignIO > > Hi bioperlers, > > I have a script called protal2dna > (http://bioweb.pasteur.fr/seqanal/interfaces/protal2dna.html, > see attachment #1) that realign DNA sequences giving their > sequences + the corresponding protein alignment (sequences > have to be in the same order or named equivalently). We have > a parsing problem reported from the AlignIO class when users > enter some clustalw file (see attachment #2 for an example): > > % protal2dna alig-protal2dna.dat dna-protal2dna.data no > alignment available in 'clustalw' format from file > 'alig-protal2dna.dat' > % > > I have tried with bioperl 1.4. I have looked in the archive > and in the BUGS, but found nothing? > Is there any bug fix for this? I also provide the DNA > sequences file if you want to test.
> > Thanks a lot in advance, > > -- > Catherine Letondal -- Institut Pasteur > www.pasteur.fr/~letondal > > From cjfields at uiuc.edu Thu Feb 15 16:50:54 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 15:50:54 -0600 Subject: [Bioperl-l] Bio::Root::Utilities.pm In-Reply-To: <8f200b4c0702141020x5815018bvd69873222582d9d5@mail.gmail.com> References: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com> <1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu> <8f200b4c0702141020x5815018bvd69873222582d9d5@mail.gmail.com> Message-ID: On Feb 14, 2007, at 12:20 PM, Steve Chervitz wrote: ... >> >> I don't have a problem with adding it back, esp. if tests are added. >> Everything in Bio::Root* not tied to a module was yanked out when no >> one spoke up about cleaning up Bio::Root* modules: >> >> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12792/ >> focus=12839 >> >> Maybe others disagree? >> >> chris >> > > Sorry I missed out on that thread. I had some trouble with my > bioperl-l > email delivery getting disabled due to excessive bounces, and it > took me a > while to catch it. > > Bio::Root::Utilities is quite a grab bag of miscellaneous general > functions > that are occasionally useful for perl scripting (e.g., determining > end-of-line characters, sending email, etc.). The code could > definitely use > a review, and maybe an example script to advertise it. I can look > into this, > and suggestions are welcome. > > Steve Steve, I have added Root::Utilities back to CVS but I didn't know if I should add back the other related Root modules (didn't know what your future plans were for them). Could the Bio::Root::Global and Bio::Root::Object stuff be consolidated into Bio::Root::Utilities or would that be too problematic? None of the other Bio* modules currently use them. Personally, I use Date::Manip for anything that requires date/time manipulation (updating seq records based on dates, for instance). 
Some of the other utilities could come in handy, though. Don't know if that helps... chris From cjfields at uiuc.edu Thu Feb 15 16:51:58 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 15:51:58 -0600 Subject: [Bioperl-l] XEMBL deprecation Message-ID: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> I have gone ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService both for deprecation in the wiki and in CVS (though I haven't set any timeline): http://www.bioperl.org/wiki/Deprecated_modules The XEMBL web services are no longer available, and it looks like everything is running through DBFetch now. The XEMBL tests are skipped if no server is detected, so they shouldn't cause any problems with Bioperl installations. Lincoln, was there anything to salvage from these? I noticed they used SOAP::Lite, so maybe we could convert these over to a SOAP-based interface to DBFetch web services? chris From johnsonm at gmail.com Thu Feb 15 17:29:37 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Thu, 15 Feb 2007 16:29:37 -0600 Subject: [Bioperl-l] Bio::Tools::Run::Glimmer? Message-ID: Now that I've got Bio::Tools::Glimmer parsing Glimmer2 and Glimmer3 output, I suppose I might as well go and write Bio::Tools::Run::Glimmer. I suspect another 4-in-1 module may be possible. Now that I think about it, I'll need one for GeneMark, too. Comments? Suggestions on a good module to use as a template? From hlapp at gmx.net Thu Feb 15 20:18:56 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 15 Feb 2007 20:18:56 -0500 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> Message-ID: <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net> On Feb 15, 2007, at 4:51 PM, Chris Fields wrote: > The XEMBL web services are no longer available What happens if someone invokes the module? Should it maybe return nothing and warn()? 
I don't think it's a good idea if the module just silently does not function because its backend is no more. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Thu Feb 15 20:48:12 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 19:48:12 -0600 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net> References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net> Message-ID: <00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu> On Feb 15, 2007, at 7:18 PM, Hilmar Lapp wrote: > On Feb 15, 2007, at 4:51 PM, Chris Fields wrote: > >> The XEMBL web services are no longer available > > What happens if someone invokes the module? Should it maybe return > nothing and warn()? I don't think it's a good idea if the module > just silently does not function because its backend is no more. > > -hilmar Yes, I thought the same. I have added a warn() noting the deprecation to the XEMBL constructor and removed XEMBL tests from CVS. The modules are still there for the time being. I actually worry more about the internals; it would be a shame to toss them altogether. Would it be worth it to shift this towards a SOAP-based interface to DBFetch? Or, more precisely, how much trouble would it be to do so? chris From hlapp at gmx.net Thu Feb 15 20:54:29 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 15 Feb 2007 20:54:29 -0500 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: <00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu> References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net> <00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu> Message-ID: Well, if dbFetch doesn't have a SOAP-based interface, how would you want to do this?
-hilmar On Feb 15, 2007, at 8:48 PM, Chris Fields wrote: > On Feb 15, 2007, at 7:18 PM, Hilmar Lapp wrote: > >> On Feb 15, 2007, at 4:51 PM, Chris Fields wrote: >> >>> The XEMBL web services are no longer available >> >> What happens if someone invokes the module? Should it maybe return >> nothing and warn()? I don't think it's a good idea if the module >> just silently does not function because its backend is no more. >> >> -hilmar > > Yes, I thought the same. I have added a warn() noting the > deprecation to the XEMBL constructor and removed XEMBL tests from > CVS. The modules are still there for the time being. > > I actually worry more about the internals; it would be a shame to > toss them altogether. Would it be worth it to shift this towards a > SOAP-based interface to DBFetch? Or, more precisely, how much > trouble would it be to do so? > > chris -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Thu Feb 15 20:59:46 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 19:59:46 -0600 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net> <00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu> Message-ID: <8C7E18C6-B38D-4E15-BE9C-84256B09C312@uiuc.edu> On Feb 15, 2007, at 7:54 PM, Hilmar Lapp wrote: > Well, if dbFetch doesn't have a SOAP-based interface, how would you > want to do this? > > -hilmar DBfetch has a SOAP-based interface: http://www.ebi.ac.uk/Tools/webservices/services/dbfetch Just not sure how easy it would be to switch XEMBL code over to using it. We already have Bio::DB::DBFetch so it may be redundant, but I don't recall any other SOAP-based tools in BioPerl beyond some stuff in bioperl-run (and I'm not sure how up-to-date the DBFetch module is).
chris From jimhu at tamu.edu Fri Feb 16 00:20:09 2007 From: jimhu at tamu.edu (Jim Hu) Date: Thu, 15 Feb 2007 23:20:09 -0600 Subject: [Bioperl-l] Pathway tools output parser In-Reply-To: References: Message-ID: <1632E2BF-4402-47DE-B750-9763E02711D2@tamu.edu> Hi Chris, I need to check the list more often! I never got an answer here, but Eric Just pointed out a perl api at TAIR that's linked from the BioCyc site. I've used the lisp parser functions from that to move the data to a perl array of arrays, and I'm working on creating object classes for BioCyc objects, starting with genes and products. I need to look at the appropriate ways to link this up to the existing codebase for interconverting to Chado and other BioPerl data types. Jim ===================================== Jim Hu Associate Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 On Feb 7, 2007, at 12:07 AM, Chris Mungall wrote: > > Hi Jim > > Did you ever get an answer to this? I'm interested in storing > pathway data > in Chado & I remember enough lisp to get it into something perl- > manageable > like XML > > On Thu, 25 Jan 2007, Jim Hu wrote: > >> Is there a module to parse the lisp object files from Peter Karp's >> Pathway Tools? I need a parser to convert the gene and protein >> objects in EcoCyc releases into something that can be imported into >> Chado. >> ===================================== >> Jim Hu >> Associate Professor >> Dept. of Biochemistry and Biophysics >> 2128 TAMU >> Texas A&M Univ. 
>> College Station, TX 77843-2128 >> 979-862-4054 > From lstein at cshl.edu Fri Feb 16 08:35:19 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 16 Feb 2007 08:35:19 -0500 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts In-Reply-To: <45D1E2A5.6060104@sendu.me.uk> References: <45D1E2A5.6060104@sendu.me.uk> Message-ID: <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com> Hi, Older versions of Storable can't deal with features that contain subroutine refs. You should get the current version from CPAN. Note that there is a slight security problem here if you don't trust the objects stored in the database. If they contain code refs, the code will be evaluated during deserialization. Lincoln On 2/13/07, Sendu Bala wrote: > > I have some raw sequences in a Bio::DB::SeqFeature::Store mysql database > and wanted to associate some basic information with them, like exon > positions. I thought of creating Bio::SeqFeature::Gene::Transcript > objects and storing them so I could later use features() to see what > other features overlapped exons.
I ran into a fatal error that can be > replicated with the following simplified one-liner: > > perl -MBio::DB::SeqFeature::Store -MBio::SeqFeature::Gene::Transcript -e > '$db = Bio::DB::SeqFeature::Store->new(-adaptor => "DBI::mysql", -dsn => > "dbi:mysql:test"); $trans = > Bio::SeqFeature::Gene::Transcript->new(-start => 1, -end => 2, -seq_id > => "test"); $db->store($trans); @trans = $db->features(-seqid => $id, > -type => "transcript"); print "@trans\n";' > > code sub { > package Bio::SeqFeature::Generic; > use strict 'refs'; > my $self = shift @_; > foreach my $f (@{[] unless $$self{'_gsf_sub_array'};}) { > $f = undef; > } > $$self{'_gsf_seq'} = undef; > foreach my $t (keys %{{} unless $$self{'_gsf_tag_hash'};}) { > $$self{'_gsf_tag_hash'}{$t} = undef; > delete $$self{'_gsf_tag_hash'}{$t}; > } > } did not evaluate to a subroutine reference, at > /.../Bio/DB/SeqFeature/Store.pm line 2280 > > > Is this a bug? Or am I taking the wrong approach? > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Fri Feb 16 08:47:29 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 16 Feb 2007 08:47:29 -0500 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts In-Reply-To: <45D5B42A.1080303@sendu.me.uk> References: <45D1E2A5.6060104@sendu.me.uk> <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com> <45D5B42A.1080303@sendu.me.uk> Message-ID: <6dce9a0b0702160547s5873cd2bg2c5cf09779138249@mail.gmail.com> Hi Sendu, I'll do a little digging and let you know. 
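For reference, Storable's own documentation describes CODE-reference support (available from 2.05, the version Lincoln points to in this thread) as opt-in. `$Storable::Deparse` must be true when freezing, because the sub is turned back into source with B::Deparse, and `$Storable::Eval` must be true when thawing, because that source is then compiled with `eval`. That `eval` is the security caveat Lincoln raises. A minimal sketch, with a plain hash standing in for a feature object (the hash and its `cleanup` key are made up for illustration; this is not how Bio::DB::SeqFeature::Store configures Storable internally):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable 2.05 qw(freeze thaw);   # CODE-ref support needs Storable >= 2.05

# A stand-in for a feature object that carries a subroutine reference.
my $feature = {
    seq_id  => 'test',
    start   => 1,
    end     => 2,
    cleanup => sub { return 'cleaned' },
};

# Freezing a CODE ref only works with Deparse enabled (it uses B::Deparse).
my $frozen = do {
    local $Storable::Deparse = 1;
    freeze($feature);
};

# Thawing evals the stored source code, so only enable this for trusted data.
my $copy = do {
    local $Storable::Eval = 1;
    thaw($frozen);
};

print $copy->{cleanup}->(), "\n";    # prints "cleaned"
```

Using `local` inside `do` blocks limits each setting to a single call, so the rest of the program keeps Storable's safer defaults.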
Lincoln On 2/16/07, Sendu Bala wrote: > > Lincoln Stein wrote: > > Hi, > > > > Older versions of Storable can't deal with features that contain > > subroutine refs. You should get the current version from CPAN. > > Do you have any idea which version of Storable first supported this? I > can specify that version in Bioperl's Build.PL. > > (else I'll just specify the latest version) > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Fri Feb 16 08:52:30 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 16 Feb 2007 08:52:30 -0500 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts In-Reply-To: <45D5B42A.1080303@sendu.me.uk> References: <45D1E2A5.6060104@sendu.me.uk> <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com> <45D5B42A.1080303@sendu.me.uk> Message-ID: <6dce9a0b0702160552l11417675kec7b5b4537cbfa70@mail.gmail.com> It looks like 2.05 or higher is the Storable version to use. It requires B::Deparse, which is (I think) standard on perl 5.6 or higher. Lincoln On 2/16/07, Sendu Bala wrote: > > Lincoln Stein wrote: > > Hi, > > > > Older versions of Storable can't deal with features that contain > > subroutine refs. You should get the current version from CPAN. > > Do you have any idea which version of Storable first supported this? I > can specify that version in Bioperl's Build.PL. > > (else I'll just specify the latest version) > -- Lincoln D.
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Fri Feb 16 08:55:06 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 16 Feb 2007 08:55:06 -0500 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> Message-ID: <6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com> I like the idea of converting these over to use DBFetch's SOAP services. On the other hand, it isn't likely that I'm going to have time to do this anytime soon. Probably the best thing to do is to issue a warning and return undef if someone tries to use the XEMBL module. I'll make that change. Lincoln On 2/15/07, Chris Fields wrote: > > I have gone ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService > both for deprecation in the wiki and in CVS (though I haven't set any > timeline): > > http://www.bioperl.org/wiki/Deprecated_modules > > The XEMBL web services are no longer available, and it looks like > everything is running through DBFetch now. The XEMBL tests are > skipped if no server is detected, so they shouldn't cause any > problems with Bioperl installations. > > Lincoln, was there anything to salvage from these? I noticed they > used SOAP::Lite, so maybe we could convert these over to a SOAP-based > interface to DBFetch web services? > > chris > -- Lincoln D.
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Fri Feb 16 08:55:47 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 16 Feb 2007 08:55:47 -0500 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: <6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com> References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> <6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com> Message-ID: <6dce9a0b0702160555s5b4d7d70n936c603bc79dd40d@mail.gmail.com> Oh, looks like someone has inserted the warnings already. Good. Lincoln On 2/16/07, Lincoln Stein wrote: > > I like the idea of converting these over to use DBFetch's SOAP services. > On the other hand, it isn't likely that I'm going to have time to do this > anytime soon. > > Probably the best thing to do is to issue a warning and return undef if > someone tries to use the XEMBL module. I'll make that change. > > Lincoln > > On 2/15/07, Chris Fields wrote: > > > > I have gone ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService > > both for deprecation in the wiki and in CVS (though I haven't set any > > timeline): > > > > http://www.bioperl.org/wiki/Deprecated_modules > > > > The XEMBL web services are no longer available, and it looks like > > everything is running through DBFetch now. The XEMBL tests are > > skipped if no server is detected, so they shouldn't cause any > > problems with Bioperl installations. > > > > Lincoln, was there anything to salvage from these? I noticed they > > used SOAP::Lite, so maybe we could convert these over to a SOAP-based > > interface to DBFetch web services? > > > > chris > > > > > > -- > Lincoln D.
Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From bix at sendu.me.uk Fri Feb 16 08:56:50 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 16 Feb 2007 13:56:50 +0000 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts In-Reply-To: <6dce9a0b0702160552l11417675kec7b5b4537cbfa70@mail.gmail.com> References: <45D1E2A5.6060104@sendu.me.uk> <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com> <45D5B42A.1080303@sendu.me.uk> <6dce9a0b0702160552l11417675kec7b5b4537cbfa70@mail.gmail.com> Message-ID: <45D5B822.6080908@sendu.me.uk> Lincoln Stein wrote: > It looks like 2.05 or higher is the Storable version to use. It requires > B::Deparse, which is (I think) standard on perl 5.6 or higher. Thanks, now recommended in Build.PL From cjfields at uiuc.edu Fri Feb 16 09:05:08 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 16 Feb 2007 08:05:08 -0600 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: <6dce9a0b0702160555s5b4d7d70n936c603bc79dd40d@mail.gmail.com> References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> <6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com> <6dce9a0b0702160555s5b4d7d70n936c603bc79dd40d@mail.gmail.com> Message-ID: I added the warning yesterday. We can add something to the project priority list on modifying XEMBL to use DBFetch instead; I like the SOAP-based interface. I am thinking of a similar interface for NCBI eutils but I haven't had time to work on it. 
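The warn-and-return-undef behaviour that Hilmar and Lincoln describe for a module whose backend has disappeared can be sketched like this. `My::DeprecatedClient` is a hypothetical package, not the actual Bio::DB::XEMBL code, and the warning text is made up. A real Bioperl module might route the message through its own warning method rather than core `warn`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::DeprecatedClient;    # hypothetical stand-in for a retired web-service client

# The constructor warns and returns undef instead of silently handing back
# an object whose backend no longer exists.
sub new {
    my ($class, %args) = @_;
    warn "$class is deprecated: its web service is no longer available; "
       . "consider a DBFetch-based module instead\n";
    return;    # undef in scalar context
}

package main;

my $db = My::DeprecatedClient->new(-format => 'embl');
print defined $db ? "got an object\n" : "no object returned\n";    # prints "no object returned"
```

Returning undef means any caller that checks the result fails immediately with a clear warning. A caller that does not check will still fail, but only on its first method call.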
chris On Feb 16, 2007, at 7:55 AM, Lincoln Stein wrote: > Oh, looks like someone has inserted the warnings already. Good. > > Lincoln > > On 2/16/07, Lincoln Stein wrote: I like the idea > of converting these over to use DBFetch's SOAP services. On the > other hand, it isn't likely that I'm going to have time to do this > anytime soon. > > Probably the best thing to do is to issue a warning and return > undef if someone tries to use the XEMBL module. I'll make that > change. > > Lincoln > > > On 2/15/07, Chris Fields < cjfields at uiuc.edu> wrote: I have gone > ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService > both for deprecation in the wiki and in CVS (though I haven't set any > timeline): > > http://www.bioperl.org/wiki/Deprecated_modules > > The XEMBL web services are no longer available, and it looks like > everything is running through DBFetch now. The XEMBL tests are > skipped if no server is detected, so they shouldn't cause any > problems with Bioperl installations. > > Lincoln, was there anything to salvage from these? I noticed they > used SOAP::Lite, so maybe we could convert these over to a SOAP-based > interface to DBFetch web services? > > chris Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Fri Feb 16 08:39:54 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 16 Feb 2007 13:39:54 +0000 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts In-Reply-To: <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com> References: <45D1E2A5.6060104@sendu.me.uk> <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com> Message-ID: <45D5B42A.1080303@sendu.me.uk> Lincoln Stein wrote: > Hi, > > Older versions of Storable can't deal with features that contain > subroutine refs. You should get the current version from CPAN. Do you have any idea which version of Storable first supported this? I can specify that version in Bioperl's Build.PL. (else I'll just specify the latest version) From eu at otelo-online.de Sat Feb 17 07:55:08 2007 From: eu at otelo-online.de (eu at otelo-online.de) Date: Sat, 17 Feb 2007 13:55:08 +0100 (CET) Subject: [Bioperl-l] Bioperl Module OddCodes(help) Message-ID: <29037001.1171716908969.JavaMail.ngmail@webmail18> Hello all, I want to translate a sequence in FASTA format into only acidic, basic and polar classes, depending on the pH. The OddCodes module can only translate to acidic, basic, polar and hydrophobic, and I think only at a default pH. Can somebody help me? I don't know whether this is possible, because I need to know for each amino acid whether it carries a positive charge, a negative charge or no charge. Thanks From The_Polymorph at rocketmail.com Sun Feb 18 14:08:34 2007 From: The_Polymorph at rocketmail.com (Caitlin) Date: Sun, 18 Feb 2007 11:08:34 -0800 (PST) Subject: [Bioperl-l] Missing ppm for Bioperl 1.5.2_100(?) Message-ID: <148421.50501.qm@web50801.mail.yahoo.com> Hi.
In an attempt to upgrade my Bioperl install from 1.5.2 RC5 to 1.5.2_100, I noticed the ppm was not found on the activestate repositories. Thanks, ~Caitlin From bix at sendu.me.uk Sun Feb 18 15:36:03 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 18 Feb 2007 20:36:03 +0000 Subject: [Bioperl-l] Missing ppm for Bioperl 1.5.2_100(?) In-Reply-To: <148421.50501.qm@web50801.mail.yahoo.com> References: <148421.50501.qm@web50801.mail.yahoo.com> Message-ID: <45D8B8B3.4000408@sendu.me.uk> Caitlin wrote: > Hi. > > In an attempt to upgrade my Bioperl install from 1.5.2 RC5 to > 1.5.2_100, I noticed the ppm was not found on the activestate > repositories. Follow the install instructions: http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows It's not in the normal activestate repository, but on bioperl.org. From t.nugent at cs.ucl.ac.uk Mon Feb 19 12:29:48 2007 From: t.nugent at cs.ucl.ac.uk (Tim Nugent) Date: Mon, 19 Feb 2007 17:29:48 +0000 Subject: [Bioperl-l] Module to draw transmembrane protein toplogoy Message-ID: <45D9DE8C.2010301@cs.ucl.ac.uk> Hi everyone, I've written a perl module to display transmembrane protein topology using GD. There are various options, including labels, helix/loop dimensions, colour schemes etc but it only requires a string or array containing the protein topology (e.g. transmembrane helix start/stop points). It produces output like this: http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png using the code at the bottom. Here is the module: http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm I've never submitted anything to Bioperl before - is this sort of thing likely to be of use to others? I imagine it would sit alongside some of the Bio::Graphics stuff.
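Tim's `-topology` argument is a flat list. Reading it as consecutive helix start/end pairs (20-45, 59-70, ...) is an assumption based on his description of "transmembrane helix start/stop points". The hypothetical helper below expands such a list into loop and helix segments. It also assumes that loops alternate sides of the membrane, starting from the `-n_terminal` side. This illustrates the data structure only and is not code from DrawTransmembrane.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn (helix1_start, helix1_end, helix2_start, ...)
# into [side_or_helix, from, to] segments. Loops are assumed to alternate
# between 'out' and 'in', starting on the N-terminal side.
sub topology_segments {
    my ($topology, $n_terminal, $length) = @_;
    die "topology must be a list of start/end pairs\n" if @$topology % 2;

    my @segments;
    my $side = $n_terminal;    # 'in' or 'out'
    my $pos  = 1;              # first residue not yet assigned
    for (my $i = 0; $i < @$topology; $i += 2) {
        my ($start, $end) = @$topology[$i, $i + 1];
        push @segments, [$side, $pos, $start - 1] if $start > $pos;   # loop before helix
        push @segments, ['helix', $start, $end];
        $side = $side eq 'out' ? 'in' : 'out';    # each helix crosses the membrane once
        $pos  = $end + 1;
    }
    # C-terminal tail, if the protein length is known
    push @segments, [$side, $pos, $length] if defined $length && $length >= $pos;
    return @segments;
}

printf "%-5s %3d-%3d\n", @$_ for topology_segments([20, 45, 59, 70], 'out', 100);
```

For the first two helices of the example, with an assumed protein length of 100, this yields out 1-19, helix 20-45, in 46-58, helix 59-70 and out 71-100.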
Best wishes, Tim

#!/usr/bin/perl
use strict;
use warnings;
use lib '/scratch0/NOT_BACKED_UP/tnugent/perl5lib'; ## path to module
use DrawTransmembrane;

my @topology = (20,45,59,70,86,109,145,168,194,220);
my %labels = ('5'   => '5 - Sulphation Site',
              '21'  => '1st Helix',
              '47'  => '40 - Mutation',
              '60'  => 'Voltage Sensor',
              '72'  => '72 - Mutation 2',
              '73'  => '73 - Mutation 3',
              '138' => '138 - Glycosylation Site',
              '170' => '170 - Phosphorylation Site',
              '200' => 'Last Helix');

my $im = DrawTransmembrane->draw_transmembrane(
    -title            => 'This is a cartoon displaying transmembrane helices.',
    -topology         => \@topology,
    -n_terminal       => 'out',
    -helix_width      => 48,
    -helix_height     => 125,
    -short_loop_limit => 10,
    -long_loop_limit  => 35,
    -loop_width       => 25,
    -colour_scheme    => 'yellow',
    -labels           => \%labels,
    -text_offset      => -10);

## print the .png file
my $output = 'test.png';
open(my $png_fh, '>', $output) or die "Cannot open $output for writing: $!";
binmode $png_fh;
print $png_fh $im->png;
close $png_fh or die "Cannot close $output: $!";

## view the result with ImageMagick's display
system('display', $output);

-- Tim Nugent (MRes) Research Student Bioinformatics Unit Department of Computer Science University College London Gower Street London WC1E 6BT Tel: 020-7679-0410 t.nugent at ucl.ac.uk From bix at sendu.me.uk Mon Feb 19 12:42:23 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 19 Feb 2007 17:42:23 +0000 Subject: [Bioperl-l] t/FeatureHolder.x Message-ID: <45D9E17F.4030302@sendu.me.uk> Is this supposed to work? It doesn't get run in the test suite normally because of its name.
With a live checkout I get:

./Build test --test_files t/FeatureHolder.x --verbose
t/FeatureHolder....1..6
ok 1
ok 2
Set group tag to: locus_tag
GROUPS:
GROUP [?]:source
[snip]
resolved pair Bio::SeqFeature::Generic=HASH(0x1375dc0) Bio::SeqFeature::Generic=HASH(0x1362830)
UNFLATTENING GROUP: GROUP [?]:gene
UNFLATTENING GROUP: GROUP [?]:repeat_region
UNFLATTENING GROUP: GROUP [?]:gene
UNFLATTENING GROUP: GROUP [?]:repeat_region
UNFLATTENING GROUP: GROUP [BG:DS07721.3]:gene mRNA CDS
UNFLATTENING GROUP: GROUP [BG:DS07721.6]:gene mRNA CDS

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: DUPLICATE ID: AAF53399.1
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/sendu/src/bioperl/core/blib/lib/Bio/Root/Root.pm:359
STACK: Bio::SeqFeature::Tools::IDHandler::create_hierarchy_from_ParentIDs /home/sendu/src/bioperl/core/blib/lib/Bio/SeqFeature/Tools/IDHandler.pm:175
STACK: Bio::FeatureHolderI::create_hierarchy_from_ParentIDs /home/sendu/src/bioperl/core/blib/lib/Bio/FeatureHolderI.pm:245
STACK: t/FeatureHolder.x:68
-----------------------------------------------------------
dubious
Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 3-6
Failed 4/6 tests, 33.33% okay
Failed Test Stat Wstat Total Fail List of Failed
-------------------------------------------------------------------------------
t/FeatureHolder.x 255 65280 6 8 3-6
Failed 1/1 test scripts. 4/6 subtests failed.
Files=1, Tests=6, 1 wallclock secs ( 0.55 cusr + 0.04 csys = 0.59 CPU)
Failed 1/1 test programs. 4/6 subtests failed.

It also fails quite differently with 1.5.2. From cjfields at uiuc.edu Mon Feb 19 15:04:20 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 19 Feb 2007 14:04:20 -0600 Subject: [Bioperl-l] t/FeatureHolder.x In-Reply-To: <45D9E17F.4030302@sendu.me.uk> References: <45D9E17F.4030302@sendu.me.uk> Message-ID: <534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu> Looks like that's some of Chris Mungall's stuff for GFF3. Don't know if he's stalking the mail list.
Wonder if this has anything to do with the feature/annotation changes around rel 1.5. (the other) chris On Feb 19, 2007, at 11:42 AM, Sendu Bala wrote: > Is this supposed to work? It doesn't get run in the test suite > normally > because of its name. > > With a live checkout I get: > ./Build test --test_files t/FeatureHolder.x --verbose > t/FeatureHolder....1..6 ... From cjfields at uiuc.edu Mon Feb 19 16:24:04 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 19 Feb 2007 15:24:04 -0600 Subject: [Bioperl-l] Module to draw transmembrane protein toplogoy In-Reply-To: <45D9DE8C.2010301@cs.ucl.ac.uk> References: <45D9DE8C.2010301@cs.ucl.ac.uk> Message-ID: <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> I think this is pretty nice! We can add the code and test script to bugzilla and (if someone has time) try to see where it might fit in, though Bio::Graphics sounds like a good spot. Anyone else have ideas on where this could go? chris On Feb 19, 2007, at 11:29 AM, Tim Nugent wrote: > Hi everyone, > > I've written a perl module to display transmembrane protein topology > using GD. There are various options, including labels, helix/loop > dimensions, colour schemes etc but it only requires a string or array > containing the protein topology (e.g. transmembrane helix start/stop > points). It produces output like this: > > http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png > http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png > > using the code at the bottom. > > Here is a the module: > http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm > > I've never submitted anything to Bioperl before - is this sort of > thing > likely to be of use to others? I imagine it would sit alongside > some of > the Bio::Graphics stuff.
> > Best wishes, > > Tim > > #!/usr/bin/perl > > use strict; > use warnings; > use lib '/scratch0/NOT_BACKED_UP/tnugent/perl5lib'; ## path to module > use DrawTransmembrane; > > my @topology = (20,45,59,70,86,109,145,168,194,220); > > my %labels = ('5' => '5 - Sulphation Site', > '21' => '1st Helix', > '47' => '40 - Mutation', > '60' => 'Voltage Sensor', > '72' => '72 - Mutation 2', > '73' => '73 - Mutation 3', > '138' => '138 - Glycosylation Site', > '170' => '170 - Phosphorylation Site', > '200' => 'Last Helix'); > > my $im = DrawTransmembrane->draw_transmembrane(-title => 'This is a > cartoon displaying transmembrane helices.', > -topology => > \@topology, > -n_terminal => 'out', > -helix_width => 48, > -helix_height => 125, > -short_loop_limit > => 10, > -long_loop_limit => > 35, > -loop_width => 25, > -colour_scheme => > 'yellow', > -labels => \%labels, > -text_offset => -10); > > ## print the .png file > my $output = 'test.png'; > open(OUTPUT, ">$output"); > binmode OUTPUT; > print OUTPUT $im->png; > close OUTPUT; > > my $system = `display $output`; > > -- > Tim Nugent (MRes) > Research Student > Bioinformatics Unit > Department of Computer Science > University College London > Gower Street > London WC1E 6BT > Tel: 020-7679-0410 > t.nugent at ucl.ac.uk > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjm at fruitfly.org Mon Feb 19 17:23:56 2007 From: cjm at fruitfly.org (Chris Mungall) Date: Mon, 19 Feb 2007 14:23:56 -0800 Subject: [Bioperl-l] t/FeatureHolder.x In-Reply-To: <534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu> References: <45D9E17F.4030302@sendu.me.uk> <534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu> Message-ID: On Feb 19, 2007, at 12:04 PM, Chris Fields wrote: > Looks like that's some of Chris Mungall's stuff for GFF3. Don't know > if he's stalking the mail list. occasionally.. > Wonder if this has anything to do the feature/annotation changes > around rel 1.5. possibly even before then. there was a reason for the .x prefix... I think it was intended to denote requirements; tests that don't pass yet but should in the future anyway, this file can go > (the other) chris > > On Feb 19, 2007, at 11:42 AM, Sendu Bala wrote: > >> Is this supposed to work? It doesn't get run in the test suite >> normally >> because of its name. >> >> With a live checkout I get: >> ./Build test --test_files t/FeatureHolder.x --verbose >> t/FeatureHolder....1..6 > ... > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From torsten.seemann at infotech.monash.edu.au Mon Feb 19 18:20:48 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 20 Feb 2007 10:20:48 +1100 Subject: [Bioperl-l] Bioperl Module OddCodes(help) In-Reply-To: <29037001.1171716908969.JavaMail.ngmail@webmail18> References: <29037001.1171716908969.JavaMail.ngmail@webmail18> Message-ID: > i want translate a Sequence in Fasta Format only to acidic,basic and polar dependent on the pH. > OddCodes Module can ony to acidic,basic, polar and hydrophobic. And i think on default pH. > Can somebody help me? I dont know whether it is possible? 
> Because i need for each amino acid a positive, negative charge and unchargedly. The latest released Bioperl 1.5.x has a charge() function which does what you want: http://doc.bioperl.org/releases/bioperl-1.5.2/Bio/Tools/OddCodes.html It returns A, N, C for the charges. --Torsten From bix at sendu.me.uk Tue Feb 20 06:18:14 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 20 Feb 2007 11:18:14 +0000 Subject: [Bioperl-l] Bio::Graphics::FeatureBase seq_id question Message-ID: <45DAD8F6.1030409@sendu.me.uk> Bio::Graphics::FeatureBase::seq_id is currently implemented as a read-only alias to ref(): sub seq_id { shift->ref() } What is the reasoning behind this? Can it be made to handle setting of the value as well?: sub seq_id { shift->ref(@_) } Cheers, Sendu. From cjfields at uiuc.edu Tue Feb 20 08:39:11 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Feb 2007 07:39:11 -0600 Subject: [Bioperl-l] t/FeatureHolder.x In-Reply-To: References: <45D9E17F.4030302@sendu.me.uk> <534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu> Message-ID: <67E26F10-67D5-405E-A00E-826EF51C476F@uiuc.edu> On Feb 19, 2007, at 4:23 PM, Chris Mungall wrote: > On Feb 19, 2007, at 12:04 PM, Chris Fields wrote: > >> Looks like that's some of Chris Mungall's stuff for GFF3. Don't know >> if he's stalking the mail list. > > occasionally.. > >> Wonder if this has anything to do the feature/annotation changes >> around rel 1.5. > > possibly even before then. > > there was a reason for the .x prefix... I think it was intended to > denote requirements; tests that don't pass yet but should in the > future > > anyway, this file can go Chris, I removed it from CVS. Thanks! (the other) chris besides chris D. P.S. I may have some Data::Stag questions for you at some point. I'm guessing you're still at fruitfly.org? 
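This sketch shows the difference between the two seq_id() definitions in Sendu's question. MyFeature is a hypothetical stand-in class, not Bio::Graphics::FeatureBase, and its internals are an assumption. It models only a ref() accessor that works as both getter and setter, plus the read-only alias and the proposed pass-through alias from the message.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# MyFeature is a hypothetical stand-in, NOT Bio::Graphics::FeatureBase;
# it only models the ref()/seq_id() relationship from the thread.
package MyFeature;

sub new {
    my ($class, $ref) = @_;
    return bless { _ref => $ref }, $class;
}

# Combined getter/setter: stores a new value if one is passed,
# then returns the current value.
sub ref {
    my $self = shift;
    $self->{_ref} = shift if @_;
    return $self->{_ref};
}

# Current behaviour: read-only alias, any argument is dropped.
sub seq_id_readonly { shift->ref() }

# Proposed behaviour: forward the arguments so seq_id() can also set.
sub seq_id { shift->ref(@_) }

package main;

my $f = MyFeature->new('chr1');
$f->seq_id_readonly('chr2');   # argument ignored
print $f->seq_id, "\n";        # chr1
$f->seq_id('chr3');            # forwarded to ref()
print $f->seq_id, "\n";        # chr3
```

Forwarding @_ leaves the zero-argument (getter) call unchanged, so existing callers of seq_id() should behave as before.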
From cjfields at uiuc.edu Tue Feb 20 08:29:20 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Feb 2007 07:29:20 -0600 Subject: [Bioperl-l] Fwd: help on remote blast References: <20070220073200.M42567@bic.boseinst.ernet.in> Message-ID: <6CC54E14-0581-45AF-8F12-E500A2FFDE86@uiuc.edu> Sanjib, You shouldn't email the developers directly. Questions like this should go to the bioperl mail list in case I (or others) can't answer them immediately. chris Begin forwarded message: > From: "Sanjib Kumar Gupta" > Date: February 20, 2007 1:32:00 AM CST > To: cjfields at uiuc.edu > Subject: help on remote blast > > Dear Dr. Chris > I am a very new user of bioperl and have been using the script for > retrieving some blast sequences, but suddenly it has stopped > retrieving > #perl n9.pl > te.pep > waiting........ > for a long time > > I am attaching the file. Can you please tell me what I should do so > that it > runs again. > > > -- > Sanjib Kumar Gupta > Bioinformatics Centre > Bose Institute > Kolkata 700054, INDIA > Phone : +91-33-2355 6626, 2816, 2355 4766 > Fax : +91-33-2355 3886 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: n9.pl URL: -------------- next part -------------- Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From t.nugent at cs.ucl.ac.uk Tue Feb 20 09:31:20 2007 From: t.nugent at cs.ucl.ac.uk (Tim Nugent) Date: Tue, 20 Feb 2007 14:31:20 +0000 Subject: [Bioperl-l] Module to draw transmembrane protein topology In-Reply-To: <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> References: <45D9DE8C.2010301@cs.ucl.ac.uk> <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> Message-ID: <45DB0638.1030001@cs.ucl.ac.uk> Thanks Chris, glad it's appreciated. Is there anything else I can do? If anyone has any requests/suggestions please let me know too. Best wishes, Tim Chris Fields wrote: > I think this is pretty nice!
We can add the code and test script to > bugzilla and (if someone has time) try to see where it might fit in, > though Bio::Graphics sounds like a good spot. > > Anyone else have ideas on where this could go? > > chris > > On Feb 19, 2007, at 11:29 AM, Tim Nugent wrote: > >> Hi everyone, >> >> I've written a perl module to display transmembrane protein topology >> using GD. There are various options, including labels, helix/loop >> dimensions, colour schemes etc but it only requires a string or array >> containing the protein topology (e.g. transmembrane helix start/stop >> points). It produces output like this: >> >> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png >> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png >> >> using the code at the bottom. >> >> Here is a the module: >> http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm >> >> I've never submitted anything to Bioperl before - is this sort of >> thing >> likely to be of use to others? I imagine it would sit alongside >> some of >> the Bio::Graphics stuff. 
>> >> Best wishes, >> >> Tim >> >> #!/usr/bin/perl >> >> use strict; >> use warnings; >> use lib '/scratch0/NOT_BACKED_UP/tnugent/perl5lib'; ## path to module >> use DrawTransmembrane; >> >> my @topology = (20,45,59,70,86,109,145,168,194,220); >> >> my %labels = ('5' => '5 - Sulphation Site', >> '21' => '1st Helix', >> '47' => '40 - Mutation', >> '60' => 'Voltage Sensor', >> '72' => '72 - Mutation 2', >> '73' => '73 - Mutation 3', >> '138' => '138 - Glycosylation Site', >> '170' => '170 - Phosphorylation Site', >> '200' => 'Last Helix'); >> >> my $im = DrawTransmembrane->draw_transmembrane(-title => 'This is a >> cartoon displaying transmembrane helices.', >> -topology => >> \@topology, >> -n_terminal => 'out', >> -helix_width => 48, >> -helix_height => 125, >> -short_loop_limit >> => 10, >> -long_loop_limit => >> 35, >> -loop_width => 25, >> -colour_scheme => >> 'yellow', >> -labels => \%labels, >> -text_offset => -10); >> >> ## print the .png file >> my $output = 'test.png'; >> open(OUTPUT, ">$output"); >> binmode OUTPUT; >> print OUTPUT $im->png; >> close OUTPUT; >> >> my $system = `display $output`; >> >> -- >> Tim Nugent (MRes) >> Research Student >> Bioinformatics Unit >> Department of Computer Science >> University College London >> Gower Street >> London WC1E 6BT >> Tel: 020-7679-0410 >> t.nugent at ucl.ac.uk >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Tim Nugent (MRes) Research Student Bioinformatics Unit Department of Computer Science University College London Gower Street London WC1E 6BT Tel: 020-7679-0410 t.nugent at ucl.ac.uk From marian.thieme at lycos.de Tue Feb 20 08:34:24 2007 From: marian.thieme at lycos.de (marian thieme) Date: Tue, 20 Feb 2007 13:34:24 +0000 Subject: [Bioperl-l] Alignment Message-ID: <188661178021328@lycos-europe.com> Hi all, perhaps somebody can comment on the following matter: I have a series of sequences which should be aligned against a reference sequence. In this special case we don't need to calculate anything; we only need to represent the sequences and get, for instance, some columns of interest. The problem now is that some sequences have gaps, and we need to represent gaps in the reference sequence as well as in some individual sequences. Question: Can I use LocatableSeq to describe sequences with gaps and to add the sequence to the alignment? If yes, how should I understand the example in the doc: use Bio::LocatableSeq; my $seq = new Bio::LocatableSeq(-seq => "CAGT-GGT",-id => "seq1", -start => 1,-end => 7); Does the "-" sign represent a gap? If this sequence starts at position 1, why does it end at position 7, when counting the gap gives 8 positions? Can the SimpleAlign object handle the gap? Thanks for your attention, Marian From cjfields at uiuc.edu Tue Feb 20 09:40:38 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Feb 2007 08:40:38 -0600 Subject: [Bioperl-l] Module to draw transmembrane protein topology In-Reply-To: <45DB0638.1030001@cs.ucl.ac.uk> References: <45D9DE8C.2010301@cs.ucl.ac.uk> <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> <45DB0638.1030001@cs.ucl.ac.uk> Message-ID: You can add the module and test code (the script) to bugzilla: http://www.bioperl.org/wiki/Bugs http://bugzilla.open-bio.org/ Basically, file a new bug report, but note that it is an enhancement request when filling it out. Attach the code and test script to the report after it is generated (note that it may be easier to add all of the files together as a zipped archive). I think you could also add the graphical output as a binary file if they are huge files. chris On Feb 20, 2007, at 8:31 AM, Tim Nugent wrote: > Thanks Chris, glad it's appreciated. > > Is there anything else I can do? If anyone has any requests/ > suggestions please let me know too. > > Best wishes, > > Tim > > Chris Fields wrote: >> I think this is pretty nice! We can add the code and test script >> to bugzilla and (if someone has time) try to see where it might >> fit in, though Bio::Graphics sounds like a good spot. >> Anyone else have ideas on where this could go? >> chris >> On Feb 19, 2007, at 11:29 AM, Tim Nugent wrote: >>> Hi everyone, >>> >>> I've written a perl module to display transmembrane protein topology >>> using GD. There are various options, including labels, helix/loop >>> dimensions, colour schemes etc but it only requires a string or >>> array >>> containing the protein topology (e.g. transmembrane helix start/stop >>> points). It produces output like this: >>> >>> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png >>> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png >>> >>> using the code at the bottom.
>>> >>> Here is a the module: >>> http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm >>> >>> I've never submitted anything to Bioperl before - is this sort >>> of thing >>> likely to be of use to others? I imagine it would sit alongside >>> some of >>> the Bio::Graphics stuff. >>> >>> Best wishes, >>> >>> Tim >>> >>> #!/usr/bin/perl >>> >>> use strict; >>> use warnings; >>> use lib '/scratch0/NOT_BACKED_UP/tnugent/perl5lib'; ## path to >>> module >>> use DrawTransmembrane; >>> >>> my @topology = (20,45,59,70,86,109,145,168,194,220); >>> >>> my %labels = ('5' => '5 - Sulphation Site', >>> '21' => '1st Helix', >>> '47' => '40 - Mutation', >>> '60' => 'Voltage Sensor', >>> '72' => '72 - Mutation 2', >>> '73' => '73 - Mutation 3', >>> '138' => '138 - Glycosylation Site', >>> '170' => '170 - Phosphorylation Site', >>> '200' => 'Last Helix'); >>> >>> my $im = DrawTransmembrane->draw_transmembrane(-title => 'This is a >>> cartoon displaying transmembrane helices.', >>> -topology => >>> \@topology, >>> -n_terminal => >>> 'out', >>> -helix_width => 48, >>> -helix_height => >>> 125, >>> - >>> short_loop_limit => 10, >>> -long_loop_limit >>> => 35, >>> -loop_width => 25, >>> -colour_scheme >>> => 'yellow', >>> -labels => \%labels, >>> -text_offset => >>> -10); >>> >>> ## print the .png file >>> my $output = 'test.png'; >>> open(OUTPUT, ">$output"); >>> binmode OUTPUT; >>> print OUTPUT $im->png; >>> close OUTPUT; >>> >>> my $system = `display $output`; >>> >>> -- >>> Tim Nugent (MRes) >>> Research Student >>> Bioinformatics Unit >>> Department of Computer Science >>> University College London >>> Gower Street >>> London WC1E 6BT >>> Tel: 020-7679-0410 >>> t.nugent at ucl.ac.uk >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. 
Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Tim Nugent (MRes) > Research Student > Bioinformatics Unit > Department of Computer Science > University College London > Gower Street > London WC1E 6BT > Tel: 020-7679-0410 > t.nugent at ucl.ac.uk Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From avilella at gmail.com Tue Feb 20 10:30:17 2007 From: avilella at gmail.com (Albert Vilella) Date: Tue, 20 Feb 2007 15:30:17 +0000 Subject: [Bioperl-l] Alignment In-Reply-To: <188661178021328@lycos-europe.com> References: <188661178021328@lycos-europe.com> Message-ID: <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com> I think the SimpleAlign object contains a set of sequences, each of which is a LocatableSeq object. These LocatableSeq objects will have gaps, represented by '-' or whatever other symbol is specified (I think there are methods for it), and then one can use methods like column_from_residue_number to map the coordinates between the primary sequence and the aligned sequence. The perldoc for LocatableSeq has some examples on how to use these methods. [Hopefully I haven't written any lie in this message], Cheers, Albert. On 2/20/07, marian thieme wrote: > Hi all, > > perhaps somebody can give some comments in the following matter: > > I have a series of sequences which should be aligned against a reference sequence. > In this special case we dont need to calculate anything, we only need to represent the sequences and get for instance some columns of interest. > The problem now is, that some sequences have gaps and we need to represent gaps in the rewference sequence as well as in some individual sequences. 
> > Question: Can I use LocatableSeq to describe sequences with gaps and to add the sequence to the alignment ? > If yes how I have to understand the example in the doc: > use Bio::LocatableSeq; > my $seq = new Bio::LocatableSeq(-seq => "CAGT-GGT",-id => "seq1", -start => 1,-end => 7); > > Does the "-" sign represents a gap ? When this sequence starts at position 1 > why it ends at position 7, because when considering the gap, there are 8 positions. > Does the SimpleAlign object can treat the gap ? > > > Thanks for your attention, > Marian > > Benachrichtigung bei E-Mail Empfang! - http://mail.lycos.de/app/lycosinside/setupLI.exe > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Tue Feb 20 10:30:15 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Feb 2007 09:30:15 -0600 Subject: [Bioperl-l] Module to draw transmembrane protein topology In-Reply-To: <45DB0638.1030001@cs.ucl.ac.uk> References: <45D9DE8C.2010301@cs.ucl.ac.uk> <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> <45DB0638.1030001@cs.ucl.ac.uk> Message-ID: <4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu> Sorry, I sent that last one off prematurely. I could see this being used as a very useful utility if a Bioperl object had SeqFeatures which described transmembrane regions, or if output from something like TMHMM were parsed and used for input. Don't know if it's included, but if not you probably should allow labeling of the intracellular/extracellular space to designate periplasmic space, mitochondrial matrix, thylakoid, etc. I think Bio::Graphics namespace is definitely the place to go. If I ever get around to writing up the RNA structural stuff I may put something there myself. chris On Feb 20, 2007, at 8:31 AM, Tim Nugent wrote: > Thanks Chris, glad it's appreciated. > > Is there anything else I can do? 
If anyone has any requests/ > suggestions > please let me know too. > > Best wishes, > > Tim From cjfields at uiuc.edu Tue Feb 20 10:49:56 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Feb 2007 09:49:56 -0600 Subject: [Bioperl-l] Alignment In-Reply-To: <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com> References: <188661178021328@lycos-europe.com> <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com> Message-ID: <97E36074-1CF4-4348-85AB-DF23F1048727@uiuc.edu> On Feb 20, 2007, at 9:30 AM, Albert Vilella wrote: > I think the SimpleAlign object contains a set of sequences, each of > which is a LocatableSeq object. > > These LocatableSeq objects will have gaps, represented by '-' or > whatever other symbol is specified (I think there are methods for it), > and then one can use methods like column_from_residue_number to map > the coordinates between the primary sequence and the aligned sequence. > The perldoc for LocatableSeq has some examples on how to use these > methods. > > [Hopefully I haven't written any lie in this message], > > Cheers, > > Albert. No lies. The comparison methods are in SimpleAlign; if you look in SimpleAlign.t you'll see several demos of how to go about adding LocatableSeqs to a SimpleAlign object and then using SimpleAlign methods on them. chris PS (to marian): I'm a bit behind this week, so the bracket_strings stuff is lagging behind; I'm writing up some stuff on a deadline.
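This sketch addresses Marian's coordinate question. In a LocatableSeq, start and end count residues of the ungapped sequence. The '-' in CAGT-GGT therefore occupies an alignment column but no residue, so the 8 columns hold only 7 residues (1 to 7). The pure-Perl code reproduces that arithmetic and a residue-to-column lookup in the spirit of column_from_residue_number. residue_end() and column_for_residue() are hypothetical helpers, not BioPerl methods, and the gap characters '-' and '.' are an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed gap characters for this sketch (not read from any BioPerl setting).
my $GAP = qr/[-.]/;

# Hypothetical helper: end coordinate = start + number of non-gap residues - 1,
# because gaps take up alignment columns but are not residues.
sub residue_end {
    my ($aligned, $start) = @_;
    (my $ungapped = $aligned) =~ s/$GAP//g;
    return $start + length($ungapped) - 1;
}

# Hypothetical analogue of column_from_residue_number(): walk the aligned
# string and return the 1-based column holding residue number $resnum,
# or undef if that residue is not in the string.
sub column_for_residue {
    my ($aligned, $start, $resnum) = @_;
    my $current = $start - 1;
    for my $col (1 .. length $aligned) {
        next if substr($aligned, $col - 1, 1) =~ $GAP;
        $current++;
        return $col if $current == $resnum;
    }
    return;
}

my $aln = 'CAGT-GGT';
print residue_end($aln, 1), "\n";             # 7
print column_for_residue($aln, 1, 5), "\n";   # 6 (column 5 is the gap)
```

In BioPerl itself, a SimpleAlign object built from LocatableSeqs should do this bookkeeping for you, as Albert and Chris describe. The sketch only shows why end is 7 rather than 8.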
From t.nugent at cs.ucl.ac.uk Tue Feb 20 10:50:10 2007 From: t.nugent at cs.ucl.ac.uk (Tim Nugent) Date: Tue, 20 Feb 2007 15:50:10 +0000 Subject: [Bioperl-l] Module to draw transmembrane protein topology In-Reply-To: <4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu> References: <45D9DE8C.2010301@cs.ucl.ac.uk> <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> <45DB0638.1030001@cs.ucl.ac.uk> <4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu> Message-ID: <45DB18B2.8070004@cs.ucl.ac.uk> Labeling of inside/outside and membrane is already possible via -inside_label, -outside_label and -membrane_label tags, defaults are intracellular, extracellular and plasma membrane. Was definitely going to add an input/parser for MEMSAT, developed here at UCL, and probably a few other popular TM predictors too, e.g. PHOBIUS, TMHMM etc. Can already accept topology in the string format used by OPM (http://opm.phar.umich.edu/). Tim Chris Fields wrote: > Sorry, I sent that last one off prematurely. > > I could see this being used as a very useful utility if a Bioperl object > had SeqFeatures which described transmembrane regions, or if output from > something like TMHMM were parsed and used for input. Don't know if it's > included, but if not you probably should allow labeling of the > intracellular/extracellular space to designate periplasmic space, > mitochondrial matrix, thylakoid, etc. > > I think Bio::Graphics namespace is definitely the place to go. If I > ever get around to writing up the RNA structural stuff I may put > something there myself. > > chris > > On Feb 20, 2007, at 8:31 AM, Tim Nugent wrote: > >> Thanks Chris, glad it's appreciated. >> >> Is there anything else I can do? If anyone has any requests/suggestions >> please let me know too. 
>> >> Best wishes, >> >> Tim > > -- Tim Nugent (MRes) Research Student Bioinformatics Unit Department of Computer Science University College London Gower Street London WC1E 6BT Tel: 020-7679-0410 t.nugent at ucl.ac.uk From cjfields at uiuc.edu Tue Feb 20 11:09:00 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Feb 2007 10:09:00 -0600 Subject: [Bioperl-l] Module to draw transmembrane protein topology In-Reply-To: <45DB18B2.8070004@cs.ucl.ac.uk> References: <45D9DE8C.2010301@cs.ucl.ac.uk> <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> <45DB0638.1030001@cs.ucl.ac.uk> <4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu> <45DB18B2.8070004@cs.ucl.ac.uk> Message-ID: On Feb 20, 2007, at 9:50 AM, Tim Nugent wrote: > Labeling of inside/outside and membrane is already possible via - > inside_label, -outside_label and -membrane_label tags, defaults are > intracellular, extracellular and plasma membrane. > > Was definitely going to add an input/parser for MEMSAT, developed > here at UCL, and probably a few other popular TM predictors too, > e.g. PHOBIUS, TMHMM etc. Can already accept topology in the string > format used by OPM (http://opm.phar.umich.edu/). > > Tim I'll definitely have to take a closer look at it when I have time. My guess is the best fit for the data would be SeqFeatures, either in a collection or attached to a Bio::Seq. As for the parsers, you can look at the Bio::Tools::Tmhmm module, which scans Tmhmm output and converts everything to SeqFeatures. chris From lstein at cshl.edu Tue Feb 20 12:25:24 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 20 Feb 2007 12:25:24 -0500 Subject: [Bioperl-l] Bio::Graphics::FeatureBase seq_id question In-Reply-To: <6dce9a0b0702200925g74d2db53j3252cca8a41765b@mail.gmail.com> References: <45DAD8F6.1030409@sendu.me.uk> Message-ID: <6dce9a0b0702200925g74d2db53j3252cca8a41765b@mail.gmail.com> Just an oversight. I'll fix it.
Lincoln On 2/20/07, Sendu Bala wrote: > > Bio::Graphics::FeatureBase::seq_id is currently implemented as a > read-only alias to ref(): > sub seq_id { shift->ref() } > > > What is the reasoning behind this? Can it be made to handle setting of > the value as well?: > sub seq_id { shift->ref(@_) } > > > Cheers, > Sendu. > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From khan at cshl.edu Tue Feb 20 15:42:12 2007 From: khan at cshl.edu (Khan, Sohail) Date: Tue, 20 Feb 2007 15:42:12 -0500 Subject: [Bioperl-l] parsing a list of ids to a fasta file. Message-ID: Dear list, I am new to Bio-Perl. I have the following question: I have a list of ids, which I would like to parse against a large fasta file to retrieve the Seq for parsed ids. I appreciate any suggestions. Thanks. Khan From michael.watson at bbsrc.ac.uk Tue Feb 20 16:33:19 2007 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Tue, 20 Feb 2007 21:33:19 -0000 Subject: [Bioperl-l] parsing a list of ids to a fasta file. References: Message-ID: <8975119BCD0AC5419D61A9CF1A923E95020680FD@iahce2ksrv1.iah.bbsrc.ac.uk> Suggest you use Bio::Index::Fasta to create an index for the fasta file and then a simple script to retrieve sequences using that index. Or just use the pre-written bp_index.PLS and bp_fetch.PLS scripts. http://www.bioperl.org/wiki/Module:Bio::Index::Fasta ________________________________ From: bioperl-l-bounces at lists.open-bio.org on behalf of Khan, Sohail Sent: Tue 20/02/2007 8:42 PM To: Bioperl-l at lists.open-bio.org Subject: [Bioperl-l] parsing a list of ids to a fasta file. Dear list, I am new to Bio-Perl. I have the following question: I have a list of ids, which I would like to parse against a large fasta file to retrieve the Seq for parsed ids. I appreciate any suggestions. Thanks. 
Khan _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From neetisomaiya at gmail.com Wed Feb 21 03:19:14 2007 From: neetisomaiya at gmail.com (neeti somaiya) Date: Wed, 21 Feb 2007 13:49:14 +0530 Subject: [Bioperl-l] need help in Bio-SCF Message-ID: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com> Hi All, I downloaded module Bio-SCF-1.01from CPAN. And I am trying to install it when I got the following error. Can someone please guide me. [root at ps2288 Bio-SCF-1.01]# perl Makefile.PL Checking if your kit is complete... Looks good Note (probably harmless): No library found for -lread Writing Makefile for Bio::SCF [root at ps2288 Bio-SCF-1.01]# make cp SCF.pm blib/lib/Bio/SCF.pm cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm /usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp -typemap /usr/lib/perl5/5.8.5/ExtUtils/typemap SCF.xs > SCF.xsc && mv SCF.xsc SCF.c Please specify prototyping behavior for SCF.xs (see perlxs manual) gcc -c -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386 -mtune=pentium4 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" -fPIC "-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE" -DLITTLE_ENDIAN SCF.c SCF.xs:12:24: io_lib/scf.h: No such file or directory SCF.xs:13:26: io_lib/mFILE.h: No such file or directory SCF.xs: In function `XS_Bio__SCF_get_scf_pointer': SCF.xs:27: error: `Scf' undeclared (first use in this function) SCF.xs:27: error: (Each undeclared identifier is reported only once SCF.xs:27: error: for each function it appears in.) 
SCF.xs:27: error: `scf_data' undeclared (first use in this function) SCF.xs: In function `XS_Bio__SCF_get_scf_fpointer': SCF.xs:66: error: `Scf' undeclared (first use in this function) SCF.xs:66: error: `scf_data' undeclared (first use in this function) SCF.xs:68: error: `mFILE' undeclared (first use in this function) SCF.xs:68: error: `mf' undeclared (first use in this function) SCF.xs: In function `XS_Bio__SCF_scf_free': SCF.xs:89: error: `Scf' undeclared (first use in this function) SCF.xs:89: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_get_comments': SCF.xs:95: error: `Scf' undeclared (first use in this function) SCF.xs:95: error: `scf_data' undeclared (first use in this function) SCF.xs:95: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_set_comments': SCF.xs:108: error: `Scf' undeclared (first use in this function) SCF.xs:108: error: `scf_data' undeclared (first use in this function) SCF.xs:108: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_scf_write': SCF.xs:121: error: `Scf' undeclared (first use in this function) SCF.xs:121: error: `scf_data' undeclared (first use in this function) SCF.xs:121: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_scf_fwrite': SCF.xs:135: error: `mFILE' undeclared (first use in this function) SCF.xs:135: error: `mf' undeclared (first use in this function) SCF.xs:137: error: `Scf' undeclared (first use in this function) SCF.xs:137: error: `scf_data' undeclared (first use in this function) SCF.xs:137: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_get_from_header': SCF.xs:159: error: `Scf' undeclared (first use in this function) SCF.xs:159: error: `scf_data' undeclared (first use in this function) SCF.xs:159: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_get_at': SCF.xs:186: error: `Scf' undeclared (first use in this function) SCF.xs:186: error: `scf_data' undeclared (first use in this 
function) SCF.xs:186: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_set_base_at': SCF.xs:242: error: `Scf' undeclared (first use in this function) SCF.xs:242: error: `scf_data' undeclared (first use in this function) SCF.xs:242: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_set_at': SCF.xs:255: error: `Scf' undeclared (first use in this function) SCF.xs:255: error: `scf_data' undeclared (first use in this function) SCF.xs:255: error: syntax error before ')' token make: *** [SCF.o] Error 1 -- -Neeti Even my blood says, B positive
From sdavis2 at mail.nih.gov Wed Feb 21 06:17:50 2007 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed, 21 Feb 2007 06:17:50 -0500 Subject: [Bioperl-l] need help in Bio-SCF In-Reply-To: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com> References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com> Message-ID: <200702210617.50616.sdavis2@mail.nih.gov> On Wednesday 21 February 2007 03:19, neeti somaiya wrote: > Hi All, > > I downloaded module > Bio-SCF-1.01from CPAN. > And I am trying to install it when I got the following error. Can someone > please guide me. You will probably need to read the INSTALL document. You need to install a couple of libraries first. Looks like you don't have the staden io-lib installed. > [root at ps2288 Bio-SCF-1.01]# perl Makefile.PL > Checking if your kit is complete...
> Looks good > Note (probably harmless): No library found for -lread > Writing Makefile for Bio::SCF > > [root at ps2288 Bio-SCF-1.01]# make > cp SCF.pm blib/lib/Bio/SCF.pm > cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm > /usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp -typemap > /usr/lib/perl5/5.8.5/ExtUtils/typemap SCF.xs > SCF.xsc && mv SCF.xsc SCF.c > Please specify prototyping behavior for SCF.xs (see perlxs manual) > gcc -c -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING > -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE > -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386 > -mtune=pentium4 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" -fPIC > "-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE" -DLITTLE_ENDIAN > SCF.c SCF.xs:12:24: io_lib/scf.h: No such file or directory > SCF.xs:13:26: io_lib/mFILE.h: No such file or directory > SCF.xs: In function `XS_Bio__SCF_get_scf_pointer': > SCF.xs:27: error: `Scf' undeclared (first use in this function) > SCF.xs:27: error: (Each undeclared identifier is reported only once > SCF.xs:27: error: for each function it appears in.) 
> SCF.xs:27: error: `scf_data' undeclared (first use in this function) > SCF.xs: In function `XS_Bio__SCF_get_scf_fpointer': > SCF.xs:66: error: `Scf' undeclared (first use in this function) > SCF.xs:66: error: `scf_data' undeclared (first use in this function) > SCF.xs:68: error: `mFILE' undeclared (first use in this function) > SCF.xs:68: error: `mf' undeclared (first use in this function) > SCF.xs: In function `XS_Bio__SCF_scf_free': > SCF.xs:89: error: `Scf' undeclared (first use in this function) > SCF.xs:89: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_get_comments': > SCF.xs:95: error: `Scf' undeclared (first use in this function) > SCF.xs:95: error: `scf_data' undeclared (first use in this function) > SCF.xs:95: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_set_comments': > SCF.xs:108: error: `Scf' undeclared (first use in this function) > SCF.xs:108: error: `scf_data' undeclared (first use in this function) > SCF.xs:108: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_scf_write': > SCF.xs:121: error: `Scf' undeclared (first use in this function) > SCF.xs:121: error: `scf_data' undeclared (first use in this function) > SCF.xs:121: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_scf_fwrite': > SCF.xs:135: error: `mFILE' undeclared (first use in this function) > SCF.xs:135: error: `mf' undeclared (first use in this function) > SCF.xs:137: error: `Scf' undeclared (first use in this function) > SCF.xs:137: error: `scf_data' undeclared (first use in this function) > SCF.xs:137: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_get_from_header': > SCF.xs:159: error: `Scf' undeclared (first use in this function) > SCF.xs:159: error: `scf_data' undeclared (first use in this function) > SCF.xs:159: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_get_at': > SCF.xs:186: error: `Scf' undeclared (first use in this function) > 
SCF.xs:186: error: `scf_data' undeclared (first use in this function) > SCF.xs:186: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_set_base_at': > SCF.xs:242: error: `Scf' undeclared (first use in this function) > SCF.xs:242: error: `scf_data' undeclared (first use in this function) > SCF.xs:242: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_set_at': > SCF.xs:255: error: `Scf' undeclared (first use in this function) > SCF.xs:255: error: `scf_data' undeclared (first use in this function) > SCF.xs:255: error: syntax error before ')' token > make: *** [SCF.o] Error 1
From cjfields at uiuc.edu Wed Feb 21 07:08:57 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 21 Feb 2007 06:08:57 -0600 Subject: [Bioperl-l] need help in Bio-SCF In-Reply-To: <200702210617.50616.sdavis2@mail.nih.gov> References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com> <200702210617.50616.sdavis2@mail.nih.gov> Message-ID: <40C288FE-C74C-4B3F-A835-1A5C563B2B8E@uiuc.edu> On Feb 21, 2007, at 5:17 AM, Sean Davis wrote: > On Wednesday 21 February 2007 03:19, neeti somaiya wrote: >> Hi All, >> >> I downloaded module >> Bio-SCF-1.01from CPAN. >> And I am trying to install it when I got the following error. Can >> someone >> please guide me. > > You will probably need to read the INSTALL document. You need to > install a > couple of libraries first. Looks like you don't have the staden io- > lib > installed. Just to note, this module isn't part of BioPerl (I don't even think it has a Bioperl interface). You'll probably need to contact Lincoln for details on using this module. One thing you may run into is errors with the version of io_lib installed (a problem I've encountered with bioperl-ext), probably from API changes. If you run into problems with newer versions of io_lib you should try downgrading to io_lib 1.8.11 or 1.8.12.
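Sean's diagnosis (the Staden io_lib headers are missing) can be checked before re-running make. The script below is a minimal sketch, not part of Bio::SCF: the two header names are the ones SCF.xs failed to include in the log above, `find_missing_headers` is a hypothetical helper, and the search directories are an assumption (/usr/local/include is the -I path gcc used above, /usr/include is the usual compiler default). Adjust the list if io_lib was installed under another prefix. The "No library found for -lread" note from Makefile.PL likely points to the same missing dependency, but this sketch only checks headers.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Headers SCF.xs tried to include, copied from the gcc errors above.
my @required = ('io_lib/scf.h', 'io_lib/mFILE.h');

# Hypothetical helper: return every header in @$headers that is not a
# regular file under any of the directories in @$dirs.
sub find_missing_headers {
    my ($dirs, $headers) = @_;
    my @missing;
    for my $header (@$headers) {
        my $found = 0;
        for my $dir (@$dirs) {
            # Build e.g. /usr/include/io_lib/scf.h portably
            my $path = File::Spec->catfile($dir, split m{/}, $header);
            if (-f $path) { $found = 1; last; }
        }
        push @missing, $header unless $found;
    }
    return @missing;
}

# Assumed include directories; not taken from the Bio::SCF INSTALL notes.
my @dirs    = ('/usr/local/include', '/usr/include');
my @missing = find_missing_headers(\@dirs, \@required);

if (@missing) {
    print "Not found: @missing -- install Staden io_lib (or add its include dir) first\n";
}
else {
    print "io_lib headers found\n";
}
```

If the headers are reported missing, installing io_lib (per Chris's note, possibly an older 1.8.x release) and then repeating `perl Makefile.PL && make` is the next step to try.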
From neetisomaiya at gmail.com Wed Feb 21 07:25:26 2007 From: neetisomaiya at gmail.com (neeti somaiya) Date: Wed, 21 Feb 2007 17:55:26 +0530 Subject: [Bioperl-l] need help in Bio-SCF In-Reply-To: <200702210617.50616.sdavis2@mail.nih.gov> References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com> <200702210617.50616.sdavis2@mail.nih.gov> Message-ID: <764978cf0702210425j544330bbr12c86a89960dbb66@mail.gmail.com> Thanks. It resolved my problem. On 2/21/07, Sean Davis wrote: > > On Wednesday 21 February 2007 03:19, neeti somaiya wrote: > > Hi All, > > > > I downloaded module > > Bio-SCF-1.01from CPAN. > > And I am trying to install it when I got the following error. Can > someone > > please guide me. > > You will probably need to read the INSTALL document. You need to install > a > couple of libraries first. Looks like you don't have the staden io-lib > installed. > > > > [root at ps2288 Bio-SCF-1.01]# perl Makefile.PL > > Checking if your kit is complete... > > Looks good > > Note (probably harmless): No library found for -lread > > Writing Makefile for Bio::SCF > > > > [root at ps2288 Bio-SCF-1.01]# make > > cp SCF.pm blib/lib/Bio/SCF.pm > > cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm > > /usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp -typemap > > /usr/lib/perl5/5.8.5/ExtUtils/typemap SCF.xs > SCF.xsc && mv SCF.xsc > SCF.c > > Please specify prototyping behavior for SCF.xs (see perlxs manual) > > gcc -c -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING > > -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE > > -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386 > > -mtune=pentium4 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" -fPIC > > "-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE" -DLITTLE_ENDIAN > > SCF.c SCF.xs:12:24: io_lib/scf.h: No such file or directory > > SCF.xs:13:26: io_lib/mFILE.h: No such file or directory > > SCF.xs: In function `XS_Bio__SCF_get_scf_pointer': > > SCF.xs:27: error: `Scf' 
undeclared (first use in this function) > > SCF.xs:27: error: (Each undeclared identifier is reported only once > > SCF.xs:27: error: for each function it appears in.) > > SCF.xs:27: error: `scf_data' undeclared (first use in this function) > > SCF.xs: In function `XS_Bio__SCF_get_scf_fpointer': > > SCF.xs:66: error: `Scf' undeclared (first use in this function) > > SCF.xs:66: error: `scf_data' undeclared (first use in this function) > > SCF.xs:68: error: `mFILE' undeclared (first use in this function) > > SCF.xs:68: error: `mf' undeclared (first use in this function) > > SCF.xs: In function `XS_Bio__SCF_scf_free': > > SCF.xs:89: error: `Scf' undeclared (first use in this function) > > SCF.xs:89: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_get_comments': > > SCF.xs:95: error: `Scf' undeclared (first use in this function) > > SCF.xs:95: error: `scf_data' undeclared (first use in this function) > > SCF.xs:95: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_set_comments': > > SCF.xs:108: error: `Scf' undeclared (first use in this function) > > SCF.xs:108: error: `scf_data' undeclared (first use in this function) > > SCF.xs:108: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_scf_write': > > SCF.xs:121: error: `Scf' undeclared (first use in this function) > > SCF.xs:121: error: `scf_data' undeclared (first use in this function) > > SCF.xs:121: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_scf_fwrite': > > SCF.xs:135: error: `mFILE' undeclared (first use in this function) > > SCF.xs:135: error: `mf' undeclared (first use in this function) > > SCF.xs:137: error: `Scf' undeclared (first use in this function) > > SCF.xs:137: error: `scf_data' undeclared (first use in this function) > > SCF.xs:137: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_get_from_header': > > SCF.xs:159: error: `Scf' undeclared (first use in this function) > > 
SCF.xs:159: error: `scf_data' undeclared (first use in this function) > > SCF.xs:159: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_get_at': > > SCF.xs:186: error: `Scf' undeclared (first use in this function) > > SCF.xs:186: error: `scf_data' undeclared (first use in this function) > > SCF.xs:186: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_set_base_at': > > SCF.xs:242: error: `Scf' undeclared (first use in this function) > > SCF.xs:242: error: `scf_data' undeclared (first use in this function) > > SCF.xs:242: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_set_at': > > SCF.xs:255: error: `Scf' undeclared (first use in this function) > > SCF.xs:255: error: `scf_data' undeclared (first use in this function) > > SCF.xs:255: error: syntax error before ')' token > > make: *** [SCF.o] Error 1 > -- -Neeti Even my blood says, B positive
From jay at jays.net Tue Feb 20 19:27:01 2007 From: jay at jays.net (Jay Hannah) Date: Tue, 20 Feb 2007 18:27:01 -0600 Subject: [Bioperl-l] Alignment In-Reply-To: <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com> References: <188661178021328@lycos-europe.com> <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com> Message-ID: > On 2/20/07, marian thieme wrote: >> I have a series of sequences which should be aligned against a >> reference sequence. >> In this special case we dont need to calculate anything, we only need >> to represent the sequences and get for instance some columns of >> interest. >> The problem now is, that some sequences have gaps and we need to >> represent gaps in the rewference sequence as well as in some >> individual sequences. On Feb 20, 2007, at 9:30 AM, Albert Vilella wrote: > I think the SimpleAlign object contains a set of sequences, each of > which is a LocatableSeq object. Fascinating. In my BLAST-centric universe I went and rolled my own solution for SeqLab where I hold onto the Bio::Seq from the reference sequences and then hold onto the Bio::Search::HSP::GenericHSP objects for all my BLAST hits.
From that dataset I can write whatever reports I want and/or perform any subsequent actions. I wonder if I should have done that differently... What typically creates .pfam files? j seqlab.net http://www.bioperl.org/wiki/User:Jhannah From cjfields at uiuc.edu Wed Feb 21 08:36:02 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 21 Feb 2007 07:36:02 -0600 Subject: [Bioperl-l] Alignment In-Reply-To: References: <188661178021328@lycos-europe.com> <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com> Message-ID: <2233F0EE-94FE-42F0-B8E5-1BE14A25C0D4@uiuc.edu> On Feb 20, 2007, at 6:27 PM, Jay Hannah wrote: ... > > On Feb 20, 2007, at 9:30 AM, Albert Vilella wrote: >> I think the SimpleAlign object contains a set of sequences, each of >> which is a LocatableSeq object. > > Fascinating. In my BLAST-centric universe I went and rolled my own > solution for SeqLab where I hold onto the Bio::Seq from the reference > sequences and then hold onto the Bio::Search::HSP::GenericHSP objects > for all my BLAST hits. From that dataset I can write whatever > reports I > want and/or perform any subsequent actions. I wonder if I should have > done that differently... > > What typically creates .pfam files? > > j > seqlab.net > http://www.bioperl.org/wiki/User:Jhannah
Pfam alignments come in two formats (pfam and stockholm) that can both be parsed into SimpleAlign objects via Bio::AlignIO:

use Bio::AlignIO;

my $alnin = Bio::AlignIO->new(-format => 'stockholm',
                              -file   => 'dho.sto');
while (my $aln = $alnin->next_aln) {
    # do stuff to $aln (a Bio::SimpleAlign object)
}

Personally I stick with Stockholm as it's a richer format (with annotations and so on), but the parser was rewritten recently (by moi!) so may have some bugs still. I'm a bit confused as to what you do with BLAST files.
You can generate a SimpleAlign right from the HSP for most SearchIO parsers: http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods chris From sanjib at bic.boseinst.ernet.in Wed Feb 21 01:12:06 2007 From: sanjib at bic.boseinst.ernet.in (Sanjib Kumar Gupta) Date: Wed, 21 Feb 2007 11:42:06 +0530 Subject: [Bioperl-l] help on remote blast In-Reply-To: <20070220073200.M42567@bic.boseinst.ernet.in> References: <20070220073200.M42567@bic.boseinst.ernet.in> Message-ID: <20070221061206.M37845@bic.boseinst.ernet.in> Hi, I have been running this script for some time and it was working fine. I am using this Linux machine with a live IP (no proxy), but suddenly it stopped working with these errors:
waiting...waiting...
-------------------- WARNING ---------------------
MSG: An Error Occurred
500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad hostname 'www.ncbi.nlm.nih.gov')
---------------------------------------------------
xx.pep
-------------------- WARNING ---------------------
MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5
Content-Length: 497
Content-Type: application/x-www-form-urlencoded

DATABASE=nr&QUERY=%3ENM_005277.3%3B_1414961_1+837%0AMEENMEEGQTQKGCFECCIKCLGGIPYASLIATILLYAGVALFCGCGHEALSGTVNILQTYFEMARTAGDTLDVFTMIDIFKYVIYGIAAAFFVYGILLMVEGFFTTGAIKDLYGDFKITTCGRCVSAWFIMLTYLFMLAWLGVTAFTSLPVYMYFNLWTICRNTTLVEGANLCLDLRQFGIVTIGEEKKICTVSENFLRMCESTELNMTFHLFIVALAGAGAAVIAMVHYLMVLSANWAYVKDACRMQKYEDIKSKEEQELHDIHSTRSKERLNAYT*&COMPOSITION_BASED_STATISTICS=off&EXPECT=1e-10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&MATRIX_NAME=BLOSUM62&ENTREZ_QUERY=Xenopus+laevis[Organism]&PROGRAM=blastp
An Error Occurred
500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad hostname 'www.ncbi.nlm.nih.gov')
---------------------------------------------------
waiting...waiting...
-------------------- WARNING ---------------------
MSG: An Error Occurred
500 Internal Server Error
---------------------------------------------------
Although I am able to see the NCBI page in a browser, I am unable to ping or traceroute the server. Please help me. -- Sanjib Kumar Gupta Bioinformatics Centre Bose Institute Kolkata 700054, INDIA Phone : +91-33-2355 6626, 2816, 2355 4766 Fax : +91-33-2355 3886 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: n9.pl URL: From granjeau at tagc.univ-mrs.fr Wed Feb 21 08:50:39 2007 From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137) Date: Wed, 21 Feb 2007 14:50:39 +0100 Subject: [Bioperl-l] Adding empty member list in Bio::Cluster::SequenceFamily Message-ID: <45DC4E2F.4060804@tagc.univ-mrs.fr> Hello! This was not obvious to me, but I found a workaround by checking for an empty list before adding. Here is what I noticed. Adding an empty list () as members is not the same as adding a reference to an empty list [], of course, but the two could easily be mistaken for each other. Calling get_members in the second case, I got a list of 0 members, but in the first case I got 1 member, which is not an object at all. I know now, but maybe the documentation should emphasize passing members by reference. Best regards, --Samuel

use Bio::Cluster::SequenceFamily;

my $f = Bio::Cluster::SequenceFamily->new( -id => 'aa' );
$f->add_members( () );
print scalar $f->get_members(), "\n";   # 1

my $g = Bio::Cluster::SequenceFamily->new( -id => 'aa' );
$g->add_members( [] );
print scalar $g->get_members(), "\n";   # 0

From stephen.marshall at novartis.com Wed Feb 21 12:01:00 2007 From: stephen.marshall at novartis.com (stephen.marshall at novartis.com) Date: Wed, 21 Feb 2007 12:01:00 -0500 Subject: [Bioperl-l] Parsing kegg files Message-ID: Hello, I'm trying to parse a KEGG file and I can't seem to get at the pathway information. Here's a snippet of my code; I only see dblink and description as annotations.
I only see dblink and description as annotation use Bio::SeqIO; my $stream = Bio::SeqIO->new(-file => $filename, -format => 'KEGG'); while ( my $seq = $stream->next_seq() ) { # do something with $seq my $id = $seq->display_id(); print "$id:"; my $ann = $seq->annotation(); foreach my $key ( $ann->get_all_annotation_keys() ) { my @values = $ann->get_Annotations($key); foreach my $value ( @values ) { print "Annotation: ",$key," value: ",$value->as_text,"\n"; } } } _________________________ CONFIDENTIALITY NOTICE The information contained in this e-mail message is intended only for the exclusive use of the individual or entity named above and may contain information that is privileged, confidential or exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivery of the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by e-mail and delete the material from any computer. Thank you. From prateek.vit at gmail.com Wed Feb 21 12:40:25 2007 From: prateek.vit at gmail.com (prateek singh yadav) Date: Wed, 21 Feb 2007 23:10:25 +0530 Subject: [Bioperl-l] Problem in BioPerl Installation Message-ID: Hello all, I was trying to install Bioperl on my redhat linux (EL) using CPAN. but CPAN shows this problem. [root at HX342SBC054 Desktop]# cpan Terminal does not support AddHistory. 

cpan shell -- CPAN exploration and modules installation (v1.7601)
ReadLine support available (try 'install Bundle::CPAN')

cpan> get bioperl
CPAN: Storable loaded ok
Going to read /root/.cpan/Metadata
Warning: Found only 25 objects in /root/.cpan/Metadata
Going to read /root/.cpan/sources/authors/01mailrc.txt.gz
Going to read /root/.cpan/sources/modules/02packages.details.txt.gz
Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not contain a Line-Count header.
Please check the validity of the index file by comparing it to more than one CPAN mirror. I'll continue but problems seem likely to happen.
Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not contain a Last-Updated header.
Please check the validity of the index file by comparing it to more than one CPAN mirror. I'll continue but problems seem likely to happen.
Going to read /root/.cpan/sources/modules/03modlist.data.gz
Can't locate object method "data" via package "CPAN::Modulelist" (perhaps you forgot to load "CPAN::Modulelist"?) at (eval 13) line 1.
 at /usr/lib/perl5/5.8.5/CPAN.pm line 3406
    CPAN::Index::rd_modlist('CPAN::Index', '/root/.cpan/sources/modules/03modlist.data.gz') called at /usr/lib/perl5/5.8.5/CPAN.pm line 3129
    CPAN::Index::reload('CPAN::Index') called at /usr/lib/perl5/5.8.5/CPAN.pm line 675
    CPAN::exists('CPAN=HASH(0x8548f20)', 'CPAN::Module', 'bioperl') called at /usr/lib/perl5/5.8.5/CPAN.pm line 1842
    CPAN::Shell::expandany('CPAN::Shell', 'bioperl') called at /usr/lib/perl5/5.8.5/CPAN.pm line 2078
    CPAN::Shell::rematein('CPAN::Shell', 'get', 'bioperl') called at /usr/lib/perl5/5.8.5/CPAN.pm line 2157
    CPAN::Shell::get('CPAN::Shell', 'bioperl') called at /usr/lib/perl5/5.8.5/CPAN.pm line 201
    eval {...} called at /usr/lib/perl5/5.8.5/CPAN.pm line 201
    CPAN::shell() called at /usr/bin/cpan line 193

cpan>

Can anyone tell me how to configure CPAN again, or how to install BioPerl on Linux with all its dependencies?
I think I have a problem with my CPAN configuration.

Regards,
Prateek
--
Prateek Singh
3rd year Bioinformatics (BTech)
Vellore Institute Of Technology
Vellore-632014
prateek.vit at gmail.com

From bosborne11 at verizon.net Wed Feb 21 12:29:40 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 21 Feb 2007 12:29:40 -0500
Subject: [Bioperl-l] Parsing kegg files
In-Reply-To:
Message-ID:

Stephen,

I don't know what your eventual goals are, but you might want to take a look at bioperl-network. However, there are problems with this package. First, it only parses DIP tab-delimited and PSI-MI, and it parses the latter only partially (you will get the graph, though). Second, it seems to have only a single developer interested in it, that's me, and few users. In my BioPerl experience, projects like this tend to fade away.

http://www.bioperl.org/wiki/Network_package

Brian O.


On 2/21/07 12:01 PM, "stephen.marshall at novartis.com" wrote:

> Hello
> I'm trying to parse a KEGG file and I can't seem to get at the pathway
> information... Here's a snippet of my code. I only see dblink and
> description as annotation.
>
> use Bio::SeqIO;
>
> my $stream = Bio::SeqIO->new(-file => $filename, -format => 'KEGG');
>
> while ( my $seq = $stream->next_seq() ) {
>     # do something with $seq
>     my $id = $seq->display_id();
>     print "$id:";
>     my $ann = $seq->annotation();
>     foreach my $key ( $ann->get_all_annotation_keys() ) {
>         my @values = $ann->get_Annotations($key);
>         foreach my $value ( @values ) {
>             print "Annotation: ",$key," value: ",$value->as_text,"\n";
>         }
>     }
>
> }

From arareko at campus.iztacala.unam.mx Wed Feb 21 13:18:37 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 21 Feb 2007 12:18:37 -0600
Subject: [Bioperl-l] Problem in BioPerl Installation
In-Reply-To:
References:
Message-ID: <45DC8CFD.1060108@campus.iztacala.unam.mx>

You can always rebuild your CPAN configuration by deleting the existing .cpan/ directory in root's $HOME directory (quick & dirty trick), then invoking CPAN again from root's shell to rebuild the config:

# perl -MCPAN -e shell

Hope this helps.

Regards,
Mauricio.

prateek singh yadav wrote:
> Hello all,
>
> I was trying to install BioPerl on my Red Hat Linux (EL) using CPAN, but CPAN
> shows this problem.
>
>
> [root at HX342SBC054 Desktop]# cpan
> Terminal does not support AddHistory.
>
> cpan shell -- CPAN exploration and modules installation (v1.7601)
> ReadLine support available (try 'install Bundle::CPAN')
>
> cpan> get bioperl
> CPAN: Storable loaded ok
> Going to read /root/.cpan/Metadata
> Warning: Found only 25 objects in /root/.cpan/Metadata
> Going to read /root/.cpan/sources/authors/01mailrc.txt.gz
> Going to read /root/.cpan/sources/modules/02packages.details.txt.gz
> Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not contain a Line-Count header.
> Please check the validity of the index file by comparing it to more than one CPAN mirror.
> I'll continue but problems seem likely to happen.
> Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not contain a Last-Updated header.
> Please check the validity of the index file by comparing it to more than one CPAN mirror. I'll continue but problems seem likely to happen.
> Going to read /root/.cpan/sources/modules/03modlist.data.gz
> Can't locate object method "data" via package "CPAN::Modulelist" (perhaps you forgot to load "CPAN::Modulelist"?) at (eval 13) line 1.
>  at /usr/lib/perl5/5.8.5/CPAN.pm line 3406
>     CPAN::Index::rd_modlist('CPAN::Index', '/root/.cpan/sources/modules/03modlist.data.gz') called at /usr/lib/perl5/5.8.5/CPAN.pm line 3129
>     CPAN::Index::reload('CPAN::Index') called at /usr/lib/perl5/5.8.5/CPAN.pm line 675
>     CPAN::exists('CPAN=HASH(0x8548f20)', 'CPAN::Module', 'bioperl') called at /usr/lib/perl5/5.8.5/CPAN.pm line 1842
>     CPAN::Shell::expandany('CPAN::Shell', 'bioperl') called at /usr/lib/perl5/5.8.5/CPAN.pm line 2078
>     CPAN::Shell::rematein('CPAN::Shell', 'get', 'bioperl') called at /usr/lib/perl5/5.8.5/CPAN.pm line 2157
>     CPAN::Shell::get('CPAN::Shell', 'bioperl') called at /usr/lib/perl5/5.8.5/CPAN.pm line 201
>     eval {...} called at /usr/lib/perl5/5.8.5/CPAN.pm line 201
>     CPAN::shell() called at /usr/bin/cpan line 193
>
> cpan>
>
> Can anyone tell me how to configure CPAN again, or how to install BioPerl on
> Linux with all its dependencies? I think I have a problem with my CPAN configuration.
>
> Regards,
> Prateek
>

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From hlapp at gmx.net Wed Feb 21 13:33:17 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 21 Feb 2007 13:33:17 -0500
Subject: [Bioperl-l] Adding empty member list in Bio::Cluster::SequenceFamily
In-Reply-To: <45DC4E2F.4060804@tagc.univ-mrs.fr>
References: <45DC4E2F.4060804@tagc.univ-mrs.fr>
Message-ID: <5B31EEBD-FFE5-4A0F-BB05-DF7297103BBD@gmx.net>

Fixed in CVS HEAD.

    -hilmar

On Feb 21, 2007, at 8:50 AM, Samuel GRANJEAUD - IR/IFR137 wrote:

> Hello!
>
> It was not clear to me, but I found a workaround by checking for an empty
> list before adding. Here is what I noticed. Adding an empty list () as
> members is not the same as adding a reference to an empty list [], of
> course, but one could think they are the same. When calling get_members,
> in the second case I got a list of 0 members, but in the first case I got
> 1 member, which is not an object at all. I am warned now, but maybe the
> documentation should emphasize using the reference form of the call.
>
> Best regards,
> --Samuel
>
>
> use Bio::Cluster::SequenceFamily;
>
> $f = new Bio::Cluster::SequenceFamily( -id => 'aa' );
> $f->add_members( () );
> print scalar $f->get_members();
> # 1
> $g = new Bio::Cluster::SequenceFamily( -id => 'aa' );
> $g->add_members( [] );
> print scalar $g->get_members();
> # 0

--
===========================================================
: Hilmar Lapp  -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at uiuc.edu Wed Feb 21 14:12:57 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Feb 2007 13:12:57 -0600
Subject: [Bioperl-l] GenBank accession bug?
Message-ID: <11B83C2C-7BFE-48B5-A20C-7EAF282F39BD@uiuc.edu>

Dmitry,

I'm forwarding this to the mail list. In the future please post/respond to the regular mail list so other BioPerl developers/users can comment. You'll get feedback much faster here (and maybe even some support!).

The issue at hand is whether we can support GenBank accessions/display_id/version with your naming scheme. My feeling is that support for non-alphanumerics was removed to be compliant with the GenBank standard for accessions, though I may be wrong. Maybe someone who was around during BioPerl 1.2 can elaborate?

From http://bugzilla.open-bio.org/show_bug.cgi?id=2214
--------------------------------------------------
....
Thanks for the verbose explanation. It seems that I would need to apply my local patches to the BioPerl module(s). With BioPerl-1.2 there was no problem with '-' in sequence names. The problem is that in the project we participate in (the Vizier project), the following sequence name convention was adopted:

VZ##-(or)-<$$>

VZ    Stands for Vizier
##    Your 2-digit Partner ID within the VIZIER consortium
      Virus name according to the ICTV nomenclature;
      , If sequence has not been assigned a GenBank LOCUS ID, available strain designation, short as possible, should be used
<$$>  Unique 2-digit number at your discretion to label sequence variant
--------------------------------------------------

chris

From gabriel.cardona at uib.es Thu Feb 22 04:33:14 2007
From: gabriel.cardona at uib.es (gcardona)
Date: Thu, 22 Feb 2007 01:33:14 -0800 (PST)
Subject: [Bioperl-l] bioperl-1.5.2_100 ... not found
Message-ID: <9096740.post@talk.nabble.com>


Hello,

I am trying to install BioPerl on a Windows system, following the installation notes in
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
When the Perl Package Manager tries to download bioperl-1.5.2_100, it cannot find the package and answers:
Downloading bioperl-1.5.2_100 ... not found

I've looked at the contents of
http://bioperl.org/DIST
and in package.xml the version for bioperl is bioperl-1.5.2_100, but in that folder the available version is bioperl-1.5.2_102.
Is this a bug, or should I download and install manually?

Thank you in advance,
Gabriel Cardona
--
View this message in context: http://www.nabble.com/bioperl-1.5.2_100-...-not-found-tf3271747.html#a9096740
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From bix at sendu.me.uk Thu Feb 22 07:35:14 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 22 Feb 2007 12:35:14 +0000
Subject: [Bioperl-l] bioperl-1.5.2_100 ... not found
In-Reply-To: <9096740.post@talk.nabble.com>
References: <9096740.post@talk.nabble.com>
Message-ID: <45DD8E02.1070404@sendu.me.uk>

gcardona wrote:
> Hello,
>
> I am trying to install BioPerl on a Windows system, following the
> installation notes in
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
> When the Perl Package Manager tries to download bioperl-1.5.2_100, it cannot
> find the package and answers:
> Downloading bioperl-1.5.2_100 ... not found
>
> I've looked at the contents of
> http://bioperl.org/DIST
> and in package.xml the version for bioperl is bioperl-1.5.2_100, but in that
> folder the available version is bioperl-1.5.2_102.
> Is this a bug, or should I download and install manually?

Sorry, my mistake. I accidentally moved the ppm to a different folder. It should work now, though.

I may make a 1.5.2_102 ppm at some point, but there are no relevant differences between _102 and _100 as far as Windows users are concerned.

From enrique_rulz at yahoo.com Thu Feb 22 15:41:37 2007
From: enrique_rulz at yahoo.com (Kurt Gobain)
Date: Thu, 22 Feb 2007 12:41:37 -0800 (PST)
Subject: [Bioperl-l] Sequence matching problem!
Message-ID: <9107936.post@talk.nabble.com>


Hi everyone,
I'm having a great deal of trouble with simple pattern matching between a sequence and a pattern. The program should be designed so that it can do two things:

1) Normal matching. For example, for GATCAAT, if TC is entered, the output should be 2.
2) Matching using a special character. In the same example, if C*T is entered, the output should be 3 and the sequence displayed should be CAAT.

I'm easily getting the first part, but the second part gives me some trouble: the output I'm getting is 1 instead of 3. The code is really simple!

#!/usr/bin/perl
$alphabet = "GATCAAT";
$pattern= "C*T ";

$alphabet =~ /($pattern)/i;

print "The entire '$pattern' match began at $-[0] and ended at $+[0]\n";

====================
OUTPUT!
The entire C*T match began at 1 and ended at 2
====================

but the output should be 3. Also, is there any chance I can get the sequence too? I mean, instead of 'C*T' I need 'CAAT'.

Well, it's not compulsory to use a regex, but I find it quite simple. Can there be any other method?

Thanks in advance!
Kurt!

From cjfields at uiuc.edu Thu Feb 22 16:01:03 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 22 Feb 2007 15:01:03 -0600
Subject: [Bioperl-l] GenBank accession bug?
In-Reply-To: <51879.10.0.7.57.1172176272.squirrel@gscmail.wustl.edu>
References: <11B83C2C-7BFE-48B5-A20C-7EAF282F39BD@uiuc.edu> <51879.10.0.7.57.1172176272.squirrel@gscmail.wustl.edu>
Message-ID: <028E16D7-036A-44DA-BECD-F910BEA58E53@uiuc.edu>

On Feb 22, 2007, at 2:31 PM, dmessina at watson.wustl.edu wrote:

>> The issue at hand is whether we can support GenBank accessions/
>> display_id/version with your naming scheme.
>
> Chris, I'm a little unsure of what you're saying here (which might mean
> that you're already saying what I'm about to...say).
> Do you mean it might be tricky to support both the GenBank standard and
> Dmitry's simultaneously?
>
> I would argue any arbitrary ID should be supported as long as that ID is a
> contiguous non-space word (\S+).
>
> Actually the existing accession regex looks like it already supports IDs
> with '-':
>
> /^ACCESSION\s+(\S.*\S)/
>
> It's only the version regex which doesn't (\w doesn't include '-'):
>
> /^\w+\.(\d+)/
>
>
> Anyone else have thoughts or comments on this? Off the top of my head, I
> can't think of any issues that might arise from doing so (apart from
> having to modify all of the SeqIO modules to support it).
>
> Dave

You're right; the argument comes down simply to whether we would support \S+ or just \w+. I'm neutral on this myself, but I wonder how allowing \S+ would affect other modules (for instance, indexing for a flat db), where one might just use \w+ for accessions, expecting them to be GenBank- or EMBL-like alphanumerics.

The fact that \S+ was supported in the past (as indicated in the bug report) and then wasn't after 1.2 makes me think someone had a reason for going in and modifying it, but that was before my time on the group. I'll have a look at the CVS history when I have time to see what I can dig up.

chris

From mkiwala at watson.wustl.edu Thu Feb 22 15:36:33 2007
From: mkiwala at watson.wustl.edu (Michael Kiwala)
Date: Thu, 22 Feb 2007 14:36:33 -0600
Subject: [Bioperl-l] Bio::AnnotatableI vs Bio::FeatureHolderI
Message-ID: <45DDFED1.1090503@watson.wustl.edu>

Are Bio::AnnotatableI and Bio::FeatureHolderI competing interfaces? I get the impression they are designed to do similar things. If so, is one deprecated and the other preferred? If their responsibilities are orthogonal to each other, what sorts of tasks are suited to each?

Thanks,
Michael

From dmessina at wustl.edu Thu Feb 22 15:53:01 2007
From: dmessina at wustl.edu (Dave Messina)
Date: Thu, 22 Feb 2007 14:53:01 -0600 (CST)
Subject: [Bioperl-l] GenBank accession bug?
Message-ID: <51923.10.0.7.57.1172177581.squirrel@gscmail.wustl.edu>

> The issue at hand is whether we can support GenBank accessions/
> display_id/version with your naming scheme.

Chris, I'm a little unsure of what you're saying here (which might mean that you're already saying what I'm about to...say). Do you mean it might be tricky to support both the GenBank standard and Dmitry's simultaneously?

I would argue any arbitrary ID should be supported as long as that ID is a contiguous non-space word (\S+).

Actually the existing accession regex looks like it already supports IDs with '-':

/^ACCESSION\s+(\S.*\S)/

It's only the version regex which doesn't (\w doesn't include '-'):

/^\w+\.(\d+)/


Anyone else have thoughts or comments on this? Off the top of my head, I can't think of any issues that might arise from doing so (apart from having to modify all of the SeqIO modules to support it).

Dave

From heikki at sanbi.ac.za Fri Feb 23 03:25:39 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 23 Feb 2007 10:25:39 +0200
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <9107936.post@talk.nabble.com>
References: <9107936.post@talk.nabble.com>
Message-ID: <200702231025.39416.heikki@sanbi.ac.za>

Kurt,

There are a few things in your code to note:

- regexp /C*T/ matches any T preceded by zero or more Cs,
  not what you meant
- $- and $+ are among the "expensive" perl functions worth
  not using unless you have to. Using them once in your
  code slows execution down considerably. There is always
  another way.
- Keep in mind what you want to use the match positions for:
  Human readable locations usually start counting with 1 but
  perl code uses 0 as the first location. The code below assumes
  you want to print the locations out.

Study my example code below.

Yours,
    -Heikki

###################################################################
#!/usr/bin/perl
$seq = "GATCAAT";
#$pattern= 'C*T';
$pattern= 'C.*T';

while ($seq =~ m/($pattern)/gi) {

    $match = $1;
    $end = pos($seq);
    $start = $end - length($match) +1;

    print "$match : $start - $end\n";
}

###################################################################


On Thursday 22 February 2007 22:41:37 Kurt Gobain wrote:
> Hi everyone,
> I'm having a great deal of trouble with simple pattern matching between a
> sequence and a pattern. The program should be designed so that it can do two things:
> 1) Normal matching. For example, for GATCAAT, if TC is entered, the output should be 2.
> 2) Matching using a special character. In the same example, if C*T is entered,
> the output should be 3 and the sequence displayed should be CAAT.
> I'm easily getting the first part, but the second part gives me some trouble:
> the output I'm getting is 1 instead of 3. The code is really simple!
>
> #!/usr/bin/perl
> $alphabet = "GATCAAT";
> $pattern= "C*T ";
>
> $alphabet =~ /($pattern)/i;
>
> print "The entire '$pattern' match began at $-[0] and ended at $+[0]\n";
>
> ====================
> OUTPUT!
> The entire C*T match began at 1 and ended at 2
> ====================
>
> but the output should be 3. Also, is there any chance I can get the sequence
> too? I mean, instead of 'C*T' I need 'CAAT'.
>
> Well, it's not compulsory to use a regex, but I find it quite simple. Can
> there be any other method?
>
> Thanks in advance!
> Kurt!
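Kurt's C*T uses * like a shell wildcard ("anything in between"), whereas in a regex C*T means "zero or more Cs followed by a T", so it matches a lone T rather than CAAT. A minimal sketch of the wildcard reading, assuming * is the only special character a user may type and every other character should match literally. The helper names wildcard_to_regex and wildcard_match are illustrative, not BioPerl functions. Like Heikki's loop, it uses pos() to recover where the match ends, but it reports the 0-based start that Kurt asked for.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn a user pattern in which '*' means "any run of characters" into a
# case-insensitive regex; every other character is matched literally.
sub wildcard_to_regex {
    my ($pattern) = @_;
    my $re = join '.*', map { quotemeta($_) } split(/\*/, $pattern, -1);
    return qr/$re/i;
}

# Return (0-based start, matched substring) of the first match,
# or an empty list if the pattern does not occur in the sequence.
sub wildcard_match {
    my ($seq, $pattern) = @_;
    my $re = wildcard_to_regex($pattern);
    return unless $seq =~ /($re)/g;    # /g in scalar context sets pos($seq)
    my $match = $1;
    return (pos($seq) - length($match), $match);
}

for my $pattern ('TC', 'C*T') {
    my ($start, $match) = wildcard_match('GATCAAT', $pattern);
    if (defined $start) {
        print "$pattern: starts at $start (0-based), matched '$match'\n";
    }
    else {
        print "$pattern: no match\n";
    }
}
```

For GATCAAT this reports a start of 2 for TC, and a start of 3 with the substring CAAT for C*T. Add 1 to the start to get the 1-based coordinates printed by Heikki's version. Note that '.*' is greedy, so with several Ts after the C it runs to the last one; use '.*?' instead if the shortest match is wanted.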
--
Heikki Lehvaslaiho    heikki at_sanbi _ac _za
Associate Professor   skype: heikki_lehvaslaiho
SANBI, South African National Bioinformatics Institute
University of Western Cape, South Africa
Phone: +27 21 959 2096   FAX: +27 21 959 2512

From avilella at gmail.com Fri Feb 23 04:59:49 2007
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 23 Feb 2007 09:59:49 +0000
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <200702231025.39416.heikki@sanbi.ac.za>
References: <9107936.post@talk.nabble.com> <200702231025.39416.heikki@sanbi.ac.za>
Message-ID: <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>

Now that we are on this pattern-matching thread, I was wondering if any Perl guru could enlighten me on the issue of matching exact sequence patterns on a gapped target sequence. E.g.:

my $seq = "CGATCAACGAATCGTACGTACTC";
my $gapped_seq =
"GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";

and one would like to get as a result:

"CG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTC"

which is the match of $seq but in $gapped_seq.

Cheers,

    Albert.

On 2/23/07, Heikki Lehvaslaiho wrote:
> Kurt,
>
> There are a few things in your code to note:
>
> - regexp /C*T/ matches any T preceded by zero or more Cs,
>   not what you meant
> - $- and $+ are among the "expensive" perl functions worth
>   not using unless you have to. Using them once in your
>   code slows execution down considerably. There is always
>   another way.
> - Keep in mind what you want to use the match positions for:
>   Human readable locations usually start counting with 1 but
>   perl code uses 0 as the first location. The code below assumes
>   you want to print the locations out.
>
> Study my example code below.
>
> Yours,
>     -Heikki
>
> ###################################################################
> #!/usr/bin/perl
> $seq = "GATCAAT";
> #$pattern= 'C*T';
> $pattern= 'C.*T';
>
> while ($seq =~ m/($pattern)/gi) {
>
>     $match = $1;
>     $end = pos($seq);
>     $start = $end - length($match) +1;
>
>     print "$match : $start - $end\n";
> }
>
> ###################################################################
>
>
> On Thursday 22 February 2007 22:41:37 Kurt Gobain wrote:
> > Hi everyone,
> > I'm having a great deal of trouble with simple pattern matching between a
> > sequence and a pattern. The program should be designed so that it can do two things:
> > 1) Normal matching. For example, for GATCAAT, if TC is entered, the output should be 2.
> > 2) Matching using a special character. In the same example, if C*T is entered,
> > the output should be 3 and the sequence displayed should be CAAT.
> > I'm easily getting the first part, but the second part gives me some trouble:
> > the output I'm getting is 1 instead of 3. The code is really simple!
> >
> > #!/usr/bin/perl
> > $alphabet = "GATCAAT";
> > $pattern= "C*T ";
> >
> > $alphabet =~ /($pattern)/i;
> >
> > print "The entire '$pattern' match began at $-[0] and ended at $+[0]\n";
> >
> > ====================
> > OUTPUT!
> > The entire C*T match began at 1 and ended at 2
> > ====================
> >
> > but the output should be 3. Also, is there any chance I can get the sequence
> > too? I mean, instead of 'C*T' I need 'CAAT'.
> >
> > Well, it's not compulsory to use a regex, but I find it quite simple. Can
> > there be any other method?
> >
> > Thanks in advance!
> > Kurt!

From js5 at sanger.ac.uk Fri Feb 23 06:34:37 2007
From: js5 at sanger.ac.uk (James Smith)
Date: Fri, 23 Feb 2007 11:34:37 +0000 (GMT)
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
References: <9107936.post@talk.nabble.com> <200702231025.39416.heikki@sanbi.ac.za> <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
Message-ID:

On Fri, 23 Feb 2007, Albert Vilella wrote:

> Now that we are on this pattern-matching thread, I was wondering if
> any Perl guru could enlighten me on the issue of matching exact
> sequence patterns on a gapped target sequence. E.g.:
>
> my $seq = "CGATCAACGAATCGTACGTACTC";
> my $gapped_seq =
> "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
>
> and one would like to get as a result:
>
> "CG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTC"
>
> which is the match of $seq but in $gapped_seq.

Try...

my $seq = "CGATCAACGAATCGTACGTACTC";
my $gapped_seq =
"GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";

my $regexp = '('.join('-*?',split//,$seq).')';

if( $gapped_seq =~ /$regexp/ ) {
  print "Match is $1\n";
} else {
  print "No match\n";
}

(not sure about the efficiency if $seq is long, though)

James

From khoueiry at ibdm.univ-mrs.fr Fri Feb 23 08:09:33 2007
From: khoueiry at ibdm.univ-mrs.fr (pierre)
Date: Fri, 23 Feb 2007 14:09:33 +0100
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
References: <9107936.post@talk.nabble.com> <200702231025.39416.heikki@sanbi.ac.za> <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
Message-ID: <1172236173.4309.6.camel@ciona-pierre>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From neetisomaiya at gmail.com Fri Feb 23 07:27:28 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 23 Feb 2007 17:57:28 +0530
Subject: [Bioperl-l] need help urgently - needle output parsing
Message-ID: <764978cf0702230427x5b5acf73y6538527ade3fd453@mail.gmail.com>

Hi,

I am using the needle alignment tool (standalone, on a Linux machine), and then I am using BioPerl to parse the output.
All data (sequence files and alignment outputs) are attached to this mail.

I have 2 small sequences:
693.seq and revcomp693.seq

I have 2 big sequences:
80768-4291-5639.84809_84810_84809_1.scf.seq and 80768-4291-5639.84809_84810_84810_1.scf.seq

All of these are in FASTA format.

Now I am doing the following:

1) Aligning 80768-4291-5639.84809_84810_84809_1.scf.seq and 693.seq - output file is 80768-4291-5639.84809_84810_84809_1.scf.out
   Parsing the output gives me the alignment start in 'traceseq' as 97.

2) Aligning 80768-4291-5639.84809_84810_84809_1.scf.seq and revcomp693.seq - output file is 80768-4291-5639.84809_84810_84809_1.scf.comp.out
   Parsing the output gives me the alignment start in 'traceseq' as 91.

All this is correct.

Now I am doing the following:

1) Aligning 80768-4291-5639.84809_84810_84810_1.scf.seq and 693.seq - output file is 80768-4291-5639.84809_84810_84810_1.scf.out
   Parsing the output gives me the alignment start in 'traceseq' as 341 (this is correct).

2) Aligning 80768-4291-5639.84809_84810_84810_1.scf.seq and revcomp693.seq - output file is 80768-4291-5639.84809_84810_84810_1.scf.comp.out
   Parsing the output gives me the alignment start in 'traceseq' as 341 (this is incorrect; the correct position is 330).

Part of my code is as follows:
---------------------------------------------
# running needle
`$needle_path./needle $trace.seq $snp_position_on_con.seq -gapopen 10.0 -gapextend 0.5 $output`;

# parsing needle output
my $str = Bio::AlignIO->new(-format => 'emboss',-file => $output);
my $aln = $str->next_aln();
my $pos = $aln->column_from_residue_number('original',1);
$logger->info("Alignment pos is $pos");

####################################

# running needle
`$needle_path./needle $trace.seq revcomp$snp_position_on_con.seq -gapopen 10.0 -gapextend 0.5 $comp_output`;

# parsing needle output
my $comp_str = Bio::AlignIO->new(-format => 'emboss',-file => $comp_output);
my $comp_aln = $comp_str->next_aln();
my $comp_pos = $comp_aln->column_from_residue_number('revcomp',1);
$logger->info("Alignment pos is $comp_pos");

Can someone please tell me what is going wrong here?

--
-Neeti
Even my blood says, B positive
-------------- next part --------------
A non-text attachment was scrubbed...
Name: data.zip
Type: application/zip
Size: 4456 bytes
Desc: not available
URL:

From bix at sendu.me.uk Fri Feb 23 08:55:24 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 23 Feb 2007 13:55:24 +0000
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To:
References: <9107936.post@talk.nabble.com> <200702231025.39416.heikki@sanbi.ac.za> <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
Message-ID: <45DEF24C.1010303@sendu.me.uk>

James Smith wrote:
> On Fri, 23 Feb 2007, Albert Vilella wrote:
>
>> Now that we are on this pattern-matching thread, I was wondering if
>> any Perl guru could enlighten me on the issue of matching exact
>> sequence patterns on a gapped target sequence. E.g.:
>>
>> my $seq = "CGATCAACGAATCGTACGTACTC";
>> my $gapped_seq =
>> "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
>>
>> and one would like to get as a result:
>>
>> "CG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTC"
>>
>> which is the match of $seq but in $gapped_seq.
>
> Try...
>
> my $seq = "CGATCAACGAATCGTACGTACTC";
> my $gapped_seq =
> "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
>
> my $regexp = '('.join('-*?',split//,$seq).')';
>
> if( $gapped_seq =~ /$regexp/ ) {
>   print "Match is $1\n";
> } else {
>   print "No match\n";
> }

That's great stuff.

If you were matching thousands of different $seq against the same very large $gapped_seq, and only needed the first match of $seq in $gapped_seq, the alternative to the above approach (remove the gaps from $gapped_seq and do index() matching) will be faster.
Here's one (overly long-winded) way of implementing it, which I found to take ~2s vs ~22s for the above regex approach when doing the job on 999999 copies of $seq:

#!/usr/bin/perl -w

use strict;
use warnings;

my $gapped_seq = "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";

# note the total gap-length at position in gapless 0-based coords
my @gap_lengths;
my $gap_length = 0;
while ($gapped_seq =~ /(-+)/g) {
    my $match = $1;
    my $prev_length = $gap_length;
    $gap_length += length($match);
    my $end = pos($gapped_seq) - $gap_length - 1;
    push(@gap_lengths, $prev_length) for (1..$end-$#gap_lengths);
}
push(@gap_lengths, $gap_length) for (1..(length($gapped_seq) - @gap_lengths - $gap_length));

# remove the gaps
my $gapless_seq = $gapped_seq;
$gapless_seq =~ s/-//g;

# now for each of thousands of seqs...
my $seq = 'CGATCAACGAATCGTACGTACTC';
my @seqs;
for (1..999999) {
    push(@seqs, $seq);
}
foreach my $seq (@seqs) {
    my $start = index($gapless_seq, $seq);
    if ($start == -1) {
        print "No match found for seq '$seq'\n";
        next;
    }
    my $end = $start + length($seq) - 1;

    # calculate the coords in $gapped_seq
    $start = $start + $gap_lengths[$start];
    $end = $end + $gap_lengths[$end];

    my $result = substr($gapped_seq, $start, ($end - $start + 1));
    #print $result, "\n";
}

exit;

From MEC at stowers-institute.org Fri Feb 23 10:54:57 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 23 Feb 2007 09:54:57 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with multiple values
In-Reply-To: <6dce9a0b0701301446w7fc31d6eufe27442fecd0f20e@mail.gmail.com>
Message-ID:

Lincoln, and other Bio::DB::SeqFeature wanderers:

I find that generating GFF from a Bio::DB::SeqFeature using gff3_string does not respect the following:

"Multiple attributes of the same type are indicated by separating the values with the comma "," character"

(cf.
http://www.sequenceontology.org/gff3.shtml)

This one-liner demonstrates the problem:

perl -MBio::DB::SeqFeature -e 'print Bio::DB::SeqFeature->new(-seq_id => "J", -start => 1, -end => 2, -primary_tag => "PH", -source => "A", -name => "mec", -attributes => {foo => [qw(bar blat)]})->gff3_string'
J	A	PH	1	2	.	.	.	foo=bar;foo=blat;Name=mec

Do you agree this is a problem?

The fix is in the post-sig patch to /Bio/DB/SeqFeature/NormalizedFeature.pm, in which I also took the stylistic privilege of promoting any ID, Parent, or Name attribute to the front of column 9, so output is now:

J	A	PH	1	2	.	.	.	Name=mec;foo=bar,blat

Do you agree this is better?

I am poised to commit it, as well as the functionally same patch to the equivalent function in Bio/Graphics/FeatureBase.pm

All clear?

-- Malcolm Cook

*** NormalizedFeature.pm  2 Feb 2007 21:05:42 -0000  1.25
--- NormalizedFeature.pm  23 Feb 2007 15:37:01 -0000
***************
*** 481,494 ****
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
! 
!     push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
    }
    my $id = $self->primary_id;
    my $name = $self->display_name;
!   push @result,"ID=".$self->escape($id) if defined $id;
!   push @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   push @result,"Name=".$self->escape($name) if defined $name;
    return join ';',@result;
  }
  
--- 481,498 ----
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
! 
!     push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
!     # NO! Multiple attributes of the same type are indicated by
!     # separating the values with the comma "," character - per
!     # http://www.sequenceontology.org/gff3.shtml. Do it this way:
!     #push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values);
    }
    my $id = $self->primary_id;
    my $name = $self->display_name;
!   unshift @result,"ID=".$self->escape($id) if defined $id;
!   unshift @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   unshift @result,"Name=".$self->escape($name) if defined $name;
    return join ';',@result;
  }
  

From MEC at stowers-institute.org Fri Feb 23 12:08:11 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 23 Feb 2007 11:08:11 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with multiple values
In-Reply-To: 
Message-ID: 

Oy, I hit send too soon. The patch I sent had my new attribute encoder commented out. It should've been:

*** NormalizedFeature.pm  2 Feb 2007 21:05:42 -0000  1.25
--- NormalizedFeature.pm  23 Feb 2007 17:06:37 -0000
***************
*** 481,494 ****
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
! 
!     push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
    }
    my $id = $self->primary_id;
    my $name = $self->display_name;
!   push @result,"ID=".$self->escape($id) if defined $id;
!   push @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   push @result,"Name=".$self->escape($name) if defined $name;
    return join ';',@result;
  }
  
--- 481,497 ----
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
!     # push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
!     # NO! Multiple attributes of the same type are indicated by
!     # separating the values with the comma "," character - per
!     # http://www.sequenceontology.org/gff3.shtml. Do it this way:
!     push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values);
    }
    my $id = $self->primary_id;
    my $name = $self->display_name;
!   unshift @result,"ID=".$self->escape($id) if defined $id;
!   unshift @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   unshift @result,"Name=".$self->escape($name) if defined $name;
    return join ';',@result;
  }
  

Malcolm

From lstein at cshl.edu Fri Feb 23 12:16:01 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 23 Feb 2007 12:16:01 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with multiple values
In-Reply-To: 
References: <6dce9a0b0701301446w7fc31d6eufe27442fecd0f20e@mail.gmail.com>
Message-ID: <6dce9a0b0702230916o684e490cu9d01d3be3e2ae061@mail.gmail.com>

Hi Malcolm,

You're quite right, and I appreciate your work in tracking down and fixing it. Before you commit the patch, can you confirm that the loader is working correctly so that comma-separated values are read back into the data structure as multiple attributes?

Lincoln

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From aaron.j.mackey at gsk.com Fri Feb 23 09:36:18 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Fri, 23 Feb 2007 09:36:18 -0500
Subject: [Bioperl-l] Bio::AnnotatableI vs Bio::FeatureHolderI
In-Reply-To: <45DDFED1.1090503@watson.wustl.edu>
Message-ID: 

The fundamental difference (in my mind) between a feature and an annotation is that a feature has a location/range, and thus the information represented in the feature is applicable only to that location/range. An annotation, on the other hand, is "global", or at least non-localizable (note: a feature with a "fuzzy" location of "somewhere along this sequence, but I'm not sure where" is still not global - if you did/could know the location, you'd describe it as a feature, so it shouldn't be represented with an annotation).

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 02/22/2007 03:36:33 PM:

> Are Bio::AnnotatableI and Bio::FeatureHolderI competing interfaces?
>
> I get the impression they are designed to do similar things. If so is
> one deprecated and the other preferred?
>
> If their responsibilities are orthogonal to each other, what sorts of
> tasks are suited to each?
>
> Thanks,
> Michael
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From MEC at stowers-institute.org Fri Feb 23 13:46:00 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 23 Feb 2007 12:46:00 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with multiple values
In-Reply-To: <6dce9a0b0702230916o684e490cu9d01d3be3e2ae061@mail.gmail.com>
Message-ID: 

Lincoln,

OK. I'll do that...
...let's see, a quick squiz at Bio/DB/SeqFeature/Store/ ....

...ok - parse_attributes _looks_ right to me

...so, let's try it

#load a feature into a new database:

bp_seqfeature_load.PLS -dsn 'dbi:mysql:database=test;host=mysql-dev' -create -user test -pass test <(echo -e "J\tA\tPH\t1\t2\t.\t.\t.\tfoo=bar,blat;Name=mec\n")

#It loaded ok. Now, let's print it out in GFF3:

perl -MBio::DB::SeqFeature::Store -e 'foreach (Bio::DB::SeqFeature::Store->new(-dsn => "dbi:mysql:database=test;host=mysql-dev;user=test;password=test")->features(-type => "PH:A")) {print $_->gff3_string . "\n"}'
J	A	PH	1	2	.	.	.	Name=mec;ID=1;foo=bar,blat

#output looks good to me

Note, I tried loading attributes foo=bar;foo=blat and it came back foo=bar,blat. So, you can load either way.

I'll commit later today.

--Malcolm

From cjfields at uiuc.edu Fri Feb 23 13:49:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 23 Feb 2007 12:49:44 -0600
Subject: [Bioperl-l] Bio::AnnotatableI vs Bio::FeatureHolderI
In-Reply-To: 
References: 
Message-ID: 

To add to that, there's a HOWTO describing the differences:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

I agree w/ Aaron; if it has a location it's a feature, otherwise it's an annotation.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From lstein at cshl.edu Fri Feb 23 16:20:26 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 23 Feb 2007 16:20:26 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with multiple values
In-Reply-To: 
References: <6dce9a0b0702230916o684e490cu9d01d3be3e2ae061@mail.gmail.com>
Message-ID: <6dce9a0b0702231320j1f24d4b4oe33bce6d2da96db7@mail.gmail.com>

Excellent!

Lincoln

-- 
Lincoln D. Stein

From enrique_rulz at yahoo.com Sat Feb 24 16:23:59 2007
From: enrique_rulz at yahoo.com (Kurt Gobain)
Date: Sat, 24 Feb 2007 13:23:59 -0800 (PST)
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <200702231025.39416.heikki@sanbi.ac.za>
References: <9107936.post@talk.nabble.com> <200702231025.39416.heikki@sanbi.ac.za>
Message-ID: <9137941.post@talk.nabble.com>

Heikki Lehvaslaiho wrote:
> 
> Kurt,
> 
> There are few things in your code to note:
> 
> - regexp /C*T/ matches any T preceded by zero or more Cs,
>   not what you meant
> - $- and $+ are among the "expensive" perl functions worth
>   not using unless you have to. Using them once in your
>   code slows execution down considerable. There is always
>   an other way.
> - Keep in mind what you want to use the match positions for:
>   Human readable locations usually start counting with 1 but
>   perl code uses 0 as the first location. The code below assumes
>   you want to print the locations out.
> 
> Study my example code below.
> 
> Yours,
> -Heikki
> 
> ###################################################################
> #!/usr/bin/perl
> $seq = "GATCAAT";
> #$pattern= 'C*T';
> $pattern= 'C.*T';
> 
> while ($seq =~ m/($pattern)/gi) {
> 
>     $match = $1;
>     $end = pos($seq);
>     $start = $end - length($match) +1;
> 
>     print "$match : $start - $end\n";
> }
> 
> ###################################################################
> 

Thanx for the instant reply!...Sorry cudn reply earlier..
Code works perfectly fine...but...sum time its not givin reqd o/p..For eg. If I type sequence as "GATCAAGTCAGGAT" & pattern to be matched as T.*A..then o/p which I am getting frm above prog is TCAAGTCAGGA instead of TCA...
& 1 more thing Is there n e chance by which I can replace T*A to T.*A cos the code which I need to write says T*A shod be only the input not T.*A..So Can we use replacment reg ex...sumthing like $pattern =~ s/.*/*/...or sumthing else... But its kinda givin sum error again...Dam! Regex is really hairy!!...:P

N e ways thanx a lot again for the code...Hope to listen frm you soon!

Kurt!

From biology0046 at hotmail.com Sat Feb 24 23:14:51 2007
From: biology0046 at hotmail.com (=?gb2312?B?va0gzsTi/Q==?=)
Date: Sun, 25 Feb 2007 04:14:51 +0000
Subject: [Bioperl-l] how to change align output format
Message-ID: 

Dear all:

I have problems changing the output format of a clustal alignment. I use the Bio::Tools::Run::Alignment::Clustalw module to carry out a multiple sequence alignment, then I use the Bio::AlignIO module to write out the alignment. The script looks like this:

my $aln_out=Bio::AlignIO->new(-file=>">./clustal/${outfilename}.aln",-format=>'clustalw');

The output:

dana_GLEANR_16071 MSKMKMLPVQLSLNQLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dere_GLEANR_9270  ..............S.............................................
FBgn0000097       ..............S.............................................
dsec_GLEANR_671   ..............S.............................................
dsim_GLEANR_6613  ..............S.............................................
dyak_GLEANR_1669  ..............S.............................................
                                .

dana_GLEANR_16071 VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dere_GLEANR_9270  ............................................................
FBgn0000097       ............................................................
dsec_GLEANR_671   ............................................................
dsim_GLEANR_6613  ............................................................
dyak_GLEANR_1669  ............................................................

But I want to change the output format to the one below, which does not change the identical residues into "." characters.

dere_GLEANR_9270  MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dyak_GLEANR_1669  MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dsec_GLEANR_671   MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dsim_GLEANR_6613  MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
FBgn0000097       MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dana_GLEANR_16071 MSKMKMLPVQLSLNQLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
                  **************.*********************************************

dere_GLEANR_9270  VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dyak_GLEANR_1669  VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dsec_GLEANR_671   VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dsim_GLEANR_6613  VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
FBgn0000097       VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dana_GLEANR_16071 VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
                  ************************************************************

Are there any parameters in the package that can be changed so that I can get the latter output format?

Thank you
Sincerely!
Jiang

From bix at sendu.me.uk Sun Feb 25 05:53:48 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 25 Feb 2007 10:53:48 +0000
Subject: [Bioperl-l] [Fwd: Bio::Graph::SimpleGraph]
Message-ID: <45E16ABC.3060405@sendu.me.uk>

Tels,

I've forwarded this to the author of the module, Nat Goodman, and to the Bioperl mailing list (http://www.bioperl.org/wiki/Mailing_lists#Main_BioPerl_list).

But actually we have Bio::Graph::* as tentatively deprecated:
http://www.bioperl.org/wiki/Deprecated_modules#Bio::Graph_modules
so any further work on it doesn't seem worthwhile.

-------- Original Message --------
Subject: Bio::Graph::SimpleGraph
Date: Sat, 24 Feb 2007 12:07:31 +0100
From: Tels

Moin,

I just stumbled over Bio::Graph::SimpleGraph and read this comment:

"This is a simple, hopefully fast undirected graph package. The only reason this exists is that the standard CPAN Graph pacakge, Graph::Base, is seriously broken."

Really sad to see people always reinventing the wheel :/

Anyway, I wonder if you would like to make your module support Graph::Easy (http://search.cpan.org/~tels/Graph-Easy/)? I would be willing to submit patches and do testing/documentation for that.

All the best,

Tels

From bix at sendu.me.uk Sun Feb 25 05:45:21 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 25 Feb 2007 10:45:21 +0000
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <9137941.post@talk.nabble.com>
References: <9107936.post@talk.nabble.com> <200702231025.39416.heikki@sanbi.ac.za> <9137941.post@talk.nabble.com>
Message-ID: <45E168C1.80306@sendu.me.uk>

Kurt Gobain wrote:
> Code works perfectly fine...but...sum time its not givin reqd o/p..For eg.
> If I type sequence as "GATCAAGTCAGGAT" & pattern to be matched as T.*A..then
> o/p which I am getting frm above prog is TCAAGTCAGGA instead of TCA...
> & 1 more thing Is there n e chance by which I can replace T*A to T.*A cos
> the code which I need to write says T*A shod be only the input not T.*A..So
> Can we use replacment reg ex...sumthing like
> $pattern =~ s/.*/*/...or sumthing else...
> But its kinda givin sum error again...Dam! Regex is really hairy!!...:P

These aren't Bioperl questions.
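Here is a minimal sketch that covers both of Kurt's points. It assumes '*' is the only wildcard in the user's pattern. Each '*' becomes the non-greedy .*?, so T*A stops at the first A after a T rather than the last one. Matches are reported with 1-based coordinates, as in Heikki's loop. The names wildcard_to_regex and find_matches are hypothetical helpers and are not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert a simple wildcard pattern such as 'T*A' into a regex in which
# every '*' means "any run of residues, as short as possible" (.*?).
# All other characters are matched literally (quotemeta).
sub wildcard_to_regex {
    my ($pattern) = @_;
    my $re = join '.*?', map { quotemeta } split /\*/, $pattern, -1;
    return qr/$re/i;    # case-insensitive, like Heikki's m//gi
}

# Scan left to right and return non-overlapping matches as
# [matched_text, start, end] with 1-based inclusive coordinates.
sub find_matches {
    my ($seq, $pattern) = @_;
    my $re = wildcard_to_regex($pattern);
    my @hits;
    while ($seq =~ /($re)/g) {
        my $match = $1;
        my $end   = pos($seq);                  # offset just past the match
        my $start = $end - length($match) + 1;  # 1-based start
        push @hits, [$match, $start, $end];
    }
    return @hits;
}

for my $hit (find_matches('GATCAAGTCAGGAT', 'T*A')) {
    my ($match, $start, $end) = @$hit;
    print "$match : $start - $end\n";
}
```

quotemeta keeps any other regex metacharacters in the input literal. The split limit of -1 keeps a trailing '*' from being silently dropped.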
For regular expression help see:
http://perldoc.perl.org/perlretut.html

Basically, you want a non-greedy match, so T.*?A

You can convert T*A by doing s/\*/.*?/

Here are some more regexs for you:
s/sum/some/g
s/frm/from/g
s/n e/any/g
etc...

From biology0046 at hotmail.com Sun Feb 25 08:28:34 2007
From: biology0046 at hotmail.com (=?gb2312?B?va0gzsTi/Q==?=)
Date: Sun, 25 Feb 2007 13:28:34 +0000
Subject: [Bioperl-l] AlignIO problems
Message-ID: 

hi, all,
I use the AlignIO module to convert the alignment file.
my original file is :

CLUSTAL W(1.81) multiple sequence alignment


dana_GLEANR_11249 MEAIAKHDFSATADDELSFRKTQTLKILNMEDDSNWYRAELDGKEGLIPSNYIEMKNHDW
dere_GLEANR_7213  ...V...................I....................................
dgri_GLEANR_6962  .......................I....................................
FBgn0004638       .......................I....................................
dmoj_GLEANR_6118  ...........N...........I....................................
dper_GLEANR_18885 ...V...................I....................................
dpse_GLEANR_14384 ...V...................I....................................
dsec_GLEANR_3096  .................N.....I....................................
dsim_GLEANR_9744  -----------------------------...............................
dvir_GLEANR_4811  .......................I....................................
dwil_GLEANR_10869 .......................I....................................
dyak_GLEANR_13576 .......................I....................................


dana_GLEANR_11249 YYGRITRADAEKLLSNKHEGAFLIRISESSPGDFSLSVKCPDGVQHFKVLRDAQSKFFLW
dere_GLEANR_7213  ............................................................
dgri_GLEANR_6962  ............................................................
FBgn0004638       ............................................................
dmoj_GLEANR_6118  .................L..........................................
dper_GLEANR_18885 ............................................................
dpse_GLEANR_14384 ............................................................
dsec_GLEANR_3096  ............................................................
dsim_GLEANR_9744  ............................................................
dvir_GLEANR_4811  ............................................................
dwil_GLEANR_10869 ............................................................
dyak_GLEANR_13576 ............................................................


dana_GLEANR_11249 VVKFNSLNELVEYHRTASVSRSQDVKLRDMIPEEMLVQALYDFVPQESGELDFRRGDVIT
dere_GLEANR_7213  ............................................................
dgri_GLEANR_6962  ............................................................
FBgn0004638       ............................................................
dmoj_GLEANR_6118  ..............................V.D...........................
dper_GLEANR_18885 .......................E....................................
dpse_GLEANR_14384 .......................E....................................
dsec_GLEANR_3096  ............................................................
dsim_GLEANR_9744  ............................................................
dvir_GLEANR_4811  ............................................................
dwil_GLEANR_10869 ............................................................
dyak_GLEANR_13576 ............................................................


dana_GLEANR_11249 VTDRSDENWWNGEIGNRKGIFPATYVTPYHS
dere_GLEANR_7213  ...............................
dgri_GLEANR_6962  ...............................
FBgn0004638       ...............................
dmoj_GLEANR_6118  ............Q..................
dper_GLEANR_18885 ...............................
dpse_GLEANR_14384 ...............................
dsec_GLEANR_3096  ...............................
dsim_GLEANR_9744  ...............................
dvir_GLEANR_4811  ...............................
dwil_GLEANR_10869 ...............................
dyak_GLEANR_13576 ...............................
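One possible workaround, sketched here rather than offered as a BioPerl feature, is to expand the identity dots in plain Perl before the text reaches Bio::AlignIO. This matters because the clustalw parser reads '.' as a gap. The sketch assumes the first row is the unmasked reference and that '.' always means "same residue as the reference in this column". The names parse_clustal and unmask_identities are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a ClustalW-style block alignment into (names in order, name => row).
# Assumptions: the header starts with "CLUSTAL", sequence lines look like
# "name  residues", and blank or consensus lines start with whitespace.
sub parse_clustal {
    my ($text) = @_;
    my (@order, %row);
    for my $line (split /\n/, $text) {
        next if $line =~ /^CLUSTAL/ or $line =~ /^\s/ or $line eq '';
        my ($name, $residues) = $line =~ /^(\S+)\s+(\S+)/ or next;
        push @order, $name unless exists $row{$name};
        $row{$name} .= $residues;
    }
    return (\@order, \%row);
}

# Replace each '.' ("same as the reference") with the reference residue in
# that column; '-' gaps are left alone. The first row is taken as reference.
sub unmask_identities {
    my ($order, $row) = @_;
    my $ref = $row->{ $order->[0] };
    my %out = ($order->[0] => $ref);
    for my $name (@{$order}[1 .. $#$order]) {
        my @col = split //, $row->{$name};
        for my $i (0 .. $#col) {
            $col[$i] = substr($ref, $i, 1) if $col[$i] eq '.';
        }
        $out{$name} = join '', @col;
    }
    return \%out;
}

# Excerpt: first 32 columns of three rows from the alignment in this thread.
my $excerpt = join "\n",
    'CLUSTAL W(1.81) multiple sequence alignment',
    '',
    'dana_GLEANR_11249  MEAIAKHDFSATADDELSFRKTQTLKILNMED',
    'dere_GLEANR_7213   ' . '...V' . ('.' x 19) . 'I' . ('.' x 8),
    'dsim_GLEANR_9744   ' . ('-' x 29) . '...',
    '';

my ($order, $rows) = parse_clustal($excerpt);
my $full = unmask_identities($order, $rows);
printf "%-18s %s\n", $_, $full->{$_} for @$order;
```

The expanded rows can then be written out as an ordinary CLUSTAL or FASTA file and read with Bio::AlignIO as usual.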
I want to change those "." characters back to the actual residues, so I wrote this code:

use Bio::AlignIO;
my $in  = Bio::AlignIO->new(-file   => "FBgn0000097.aln",
                            -format => 'clustalw');
my $out = Bio::AlignIO->new(-file   => ">../clustalw/0097.aln",
                            -format => 'clustalw');
while (my $aln = $in->next_aln()) {
    $aln->unmatch();
    $aln->set_displayname_flat();
    $out->write_aln($aln);
}

But when I run the code, I get these error messages:

-------------------- WARNING ---------------------
MSG: Got a sequence with no letters in it cannot guess alphabet []
---------------------------------------------------

------------- EXCEPTION -------------
MSG: No sequence with name [dsim_GLEANR_9744/1-182]
STACK Bio::SimpleAlign::displayname /home/src/bioperl/bioperl-live/Bio/SimpleAlign.pm:2307
STACK Bio::SimpleAlign::set_displayname_flat /home/src/bioperl/bioperl-live/Bio/SimpleAlign.pm:2374
STACK toplevel aligntest.pl:11
--------------------------------------

I don't know where the problem is.

Jiang

From cjfields at uiuc.edu Sun Feb 25 14:58:23 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 25 Feb 2007 13:58:23 -0600 Subject: [Bioperl-l] AlignIO problems In-Reply-To: References: Message-ID: <19EA5F46-D1A4-45B5-B2DB-55194F79215C@uiuc.edu>

Bio::AlignIO::clustalw doesn't work with masked sequences; it parses the output literally, so any [.-] characters are treated as gaps. If a sequence is 100% identical to the first one, it ends up as nothing but gaps with no residues, which gives you the warnings you see.

The best way to accomplish what you want is not to mask the alignment in the first place when running clustalw/muscle/whatever. Exactly how are you generating these? When I use clustalw, no identity masking occurs by default.

chris

Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign

From cristiangary at gmail.com Sun Feb 25 16:04:57 2007 From: cristiangary at gmail.com (Cristian Gary) Date: Sun, 25 Feb 2007 18:04:57 -0300 Subject: [Bioperl-l] problem with blast report to ncbi webpage Message-ID: <95ef8cd0702251304o45bea6a0tcedc59156cb0cfe4@mail.gmail.com>

I have a problem getting BLAST reports from the NCBI server: while waiting on the RIDs, no results are ever shown. Is the problem the NCBI server or the BioPerl version?

P.S. The same code worked very well 3 weeks ago.
-- "Knowledge belongs to humanity" "GNU/Linux -------- free your mind...... www.kubuntu.org

From granjeau at tagc.univ-mrs.fr Mon Feb 26 04:17:15 2007 From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137) Date: Mon, 26 Feb 2007 10:17:15 +0100 Subject: [Bioperl-l] Reading a XML sequence (UniParc) into a BioSeq object Message-ID: <45E2A59B.6080300@tagc.univ-mrs.fr>

Hello!

I would like to fill a BioSeq object with the output of a dbfetch request at EBI on the UniParc database (which returns only XML; I am interested in the references). Could somebody tell me which BioPerl object to use, or a way to convert it to Swiss format, or share a piece of code? (Is http://doc.bioperl.org/bioperl-live/Bio/SeqIO/interpro.html a good starting point?) I would appreciate it a lot.

Best regards,
--Samuel

MSTRSVSSSSYRRMFGGPGTASRPSSSRSYVTTSTRTYSLGSALRPSTSRSLYASSPGGV
YATRSSAVRLRSSVPGVRLLQDSVDFSLADAINTEFKNTRTNEKVELQELNDRFANYIDK
VRFLEQQNKILLAELEQLKGQGKSRLGDLYEEEMRELRRQVDQLTNDKARVEVERDNLAE
DIMRLREKLQEEMLQREEAENTLQSFRQDVDNASLARLDLERKVESLQEEIAFLKKLHEE
EIQELQAQIQEQHVQIDVDVSKPDLTAALRDVRQQYESVAAKNLQEAEEWYKSKFADLSE
AANRNNDALRQAKQESTEYRRQVQSLTCEVDALKGTNESLERQMREMEENFAVEAANYQD
TIGRLQDEIQNMKEEMARHLREYQDLLNVKMALDIEIATYRKLLEGEESRISLPLPNFSS
LNLRGKHFISL

From bix at sendu.me.uk Mon Feb 26 06:46:39 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 26 Feb 2007 11:46:39 +0000 Subject: [Bioperl-l] [Fwd: Bio::Graph::SimpleGraph] In-Reply-To: <45E16ABC.3060405@sendu.me.uk> References: <45E16ABC.3060405@sendu.me.uk> Message-ID: <45E2C89F.1020402@sendu.me.uk>

Nat replied, but I messed up the To: fields, so his reply didn't make it to the list. Here's what he said:

Nathan (Nat) Goodman wrote: Hi Tels, I agree it's sad to reinvent the wheel, but I don't think that's what happened here. Your module seems to be focused on rendering graphs, while my module is concerned with computations on graphs.
In any case, as Sendu notes, SimpleGraph is in the process of being deprecated. I fully support this move. It was intended to be a stopgap until the main Perl Graph module was fixed. Since that has now happened, it's time for SimpleGraph to retire.

For the benefit of anyone using Graph: last I checked (six months or more ago), it had serious performance problems on large graphs (probably not too much of a surprise), and it was also inexplicably slow on graphs with edge attributes. I see that the latter bug is marked "resolved" in CPAN, but there's no indication of when or how. We've moved to Boost for graphs as large as the human protein interaction network.

Best,
Nat

From sanjib at bic.boseinst.ernet.in Mon Feb 26 00:23:36 2007 From: sanjib at bic.boseinst.ernet.in (Sanjib Kumar Gupta) Date: Mon, 26 Feb 2007 10:53:36 +0530 Subject: [Bioperl-l] Remote blast In-Reply-To: <20070221064743.M54123@bic.boseinst.ernet.in> References: <20070221064743.M54123@bic.boseinst.ernet.in> Message-ID: <20070226052336.M74918@bic.boseinst.ernet.in>

Hi,

I have been running this script for some time and it was working fine. I am using this Linux machine with a live IP (no proxy). But it has suddenly stopped working, with these errors:

waiting...waiting... -------------------- WARNING --------------------- MSG: An Error Occurred
500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad hostname 'www.ncbi.nlm.nih.gov') --------------------------------------------------- xx.pep -------------------- WARNING --------------------- MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5 Content-Length: 497 Content-Type: application/x-www-form-urlencoded DATABASE=nr&QUERY=%3ENM_005277.3%3B_1414961_1+837%0AMEENMEEGQTQKGCFECCIKCLGGIPYASLIATILLYAGVALFCGCGHEALSGTVNILQTYFEMARTAGDTLDVFTMIDIFKYVIYGIAAAFFVYGILLMVEGFFTTGAIKDLYGDFKITTCGRCVSAWFIMLTYLFMLAWLGVTAFTSLPVYMYFNLWTICRNTTLVEGANLCLDLRQFGIVTIGEEKKICTVSENFLRMCESTELNMTFHLFIVALAGAGAAVIAMVHYLMVLSANWAYVKDACRMQKYEDIKSKEEQELHDIHSTRSKERLNAYT*&COMPOSITION_BASED_STATISTICS=off&EXPECT=1e-10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&MATRIX_NAME=BLOSUM62&ENTREZ_QUERY=Xenopus+laevis[Organism]&PROGRAM=blastp An Error Occurred
500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad hostname 'www.ncbi.nlm.nih.gov') --------------------------------------------------- waiting...waiting... -------------------- WARNING --------------------- MSG: An Error Occurred
500 Internal Server Error ---------------------------------------------------

Although I am able to see the NCBI page from a browser, I am unable to ping or traceroute to the server.

Please help me.

-- Sanjib Kumar Gupta Bioinformatics Centre Bose Institute Kolkata 700054, INDIA Phone : +91-33-2355 6626, 2816, 2355 4766 Fax : +91-33-2355 3886 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed...
Name: n9.pl URL:

From cjfields at uiuc.edu Mon Feb 26 09:59:21 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 26 Feb 2007 08:59:21 -0600 Subject: [Bioperl-l] Remote blast In-Reply-To: <20070226052336.M74918@bic.boseinst.ernet.in> References: <20070221064743.M54123@bic.boseinst.ernet.in> <20070226052336.M74918@bic.boseinst.ernet.in> Message-ID:

I tested this out and got BLAST to work for my test case (a single FASTA sequence, since you didn't send any sequences for testing). It keeps querying for the RID in what appears to be an infinite loop (i.e. it doesn't get rid of the RID properly); you can see this if you add '-verbose => 1' to your parameters. I don't have time to delve into it, but from a quick glance it may be due to your looping structure and how you are saving your RIDs.

As for your particular error, could it be something as simple as the server being overloaded or down? That does happen from time to time...

Beyond that, I can't make heads or tails of your script. Was it cobbled together from a bunch of others? If so, you can probably expect some bugs to occur.

chris

From cjfields at uiuc.edu Mon Feb 26 10:05:50 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 26 Feb 2007 09:05:50 -0600 Subject: [Bioperl-l] AlignIO problems In-Reply-To: References: Message-ID: <082E0708-6B1C-45CE-B387-429B8B6A8D7A@uiuc.edu>

Make sure to keep this on the list; others may have some input.

You should be able to test the various sequence objects you're retrieving from Bio::DB::Fasta via Bio::SeqIO to see if they are what you're expecting, then track down the problematic sequences. My guess is that the odd sequences are due to the way you are using Bio::DB::Fasta for each of the files. I'm wondering if you are having problems with indices overwriting one another and are thus getting back blank sequence objects. You should probably consider just indexing all of your files together; according to the POD, you can use a single Bio::DB::Fasta to index all of the files in one go (indicate the path and use '-glob') and retrieve what you need that way. Either that, or separate the files into different directories so the indices are also separate.

chris

On Feb 25, 2007, at 9:50 PM, Jiang wrote:

> Thank you for your help!
> Maybe you are right. I use the following code to create my sequence
> object arrays:
> my $outfilename=$dmel;
> my $ana_pep_db=Bio::DB::Fasta->new("dana.translation.fasta");
> my $ana_cdna_db=Bio::DB::Fasta->new("dana.cds.fasta");
> my $ere_pep_db=Bio::DB::Fasta->new("dere.translation.fasta");
> my $ere_cdna_db=Bio::DB::Fasta->new("dere.cds.fasta");
> my $mel_pep_db=Bio::DB::Fasta->new("dmel.translation.fasta");
> my $mel_cdna_db=Bio::DB::Fasta->new("dmel.cds.fasta");
> my $sec_pep_db=Bio::DB::Fasta->new("dsec.translation.fasta");
> my $sec_cdna_db=Bio::DB::Fasta->new("dsec.cds.fasta");
> my $sim_pep_db=Bio::DB::Fasta->new("dsim.translation.fasta");
> my $sim_cdna_db=Bio::DB::Fasta->new("dsim.cds.fasta");
> my $yak_pep_db=Bio::DB::Fasta->new("dyak.translation.fasta");
> my $yak_cdna_db=Bio::DB::Fasta->new("dyak.cds.fasta");
> my $ana_pep_obj=$ana_pep_db->get_Seq_by_id($dana);
> my $ana_nuc_obj=$ana_cdna_db->get_Seq_by_id($dana);
> my $ere_pep_obj=$ere_pep_db->get_Seq_by_id($dere);
> my $ere_nuc_obj=$ere_cdna_db->get_Seq_by_id($dere);
> my $mel_pep_obj=$mel_pep_db->get_Seq_by_id($dmel);
> my $mel_nuc_obj=$mel_cdna_db->get_Seq_by_id($dmel);
> my $sec_pep_obj=$sec_pep_db->get_Seq_by_id($dsec);
> my $sec_nuc_obj=$sec_cdna_db->get_Seq_by_id($dsec);
> my $sim_pep_obj=$sim_pep_db->get_Seq_by_id($dsim);
> my $sim_nuc_obj=$sim_cdna_db->get_Seq_by_id($dsim);
> my $yak_pep_obj=$yak_pep_db->get_Seq_by_id($ddyak);
> my $yak_nuc_obj=$yak_cdna_db->get_Seq_by_id($ddyak);
> push @prots, $ana_pep_obj;
> push @cdna, $ana_nuc_obj;
> push @prots, $ere_pep_obj;
> push @cdna, $ere_nuc_obj;
> push @prots, $mel_pep_obj;
> push @cdna, $mel_nuc_obj;
> push @prots, $sec_pep_obj;
> push @cdna, $sec_nuc_obj;
> push @prots, $sim_pep_obj;
> push @cdna, $sim_nuc_obj;
> push @prots, $yak_pep_obj;
> push @cdna, $yak_nuc_obj;
>
> Then I use @prots as input: my $aln=$aln_factory->align(\@prots);
> This method creates alignment files with the sequences masked.
>
> But if I use FASTA files (not objects) containing the protein
> sequences as input:
> $inputfile='FBgn0000097.pep';
> @params=('outorder'=>'INPUT');
> $factory=Bio::Tools::Run::Alignment::Clustalw->new(@params);
> $aln=$factory->align($inputfile);
> #$aln->gap_char('-');
> $aln->map_chars('\.','-');
> $aln_out=Bio::AlignIO->new(-file=>">0097.aln",-format=>'clustalw');
> $aln_out->write_aln($aln);
>
> this method creates files without masking.
> I think sequence objects created by "get_Seq_by_id" directly from sequence
> databases are not appropriate.
>
> Thank you for your suggestion again!
>
> Jiang.
>
>> From: Chris Fields
>> To: Jiang
>> Subject: Re: [Bioperl-l] AlignIO problems
>> Date: Sun, 25 Feb 2007 21:26:34 -0600
>>
>> I ran the same thing using a local FASTA-formatted file on my system,
>> and it works (no masking).
>>
>> Of note, the gaps were all marked as '.'. Your gaps were both
>> '.' and '-', which may mean that something is wrong with the seq
>> objects themselves. Maybe SeqIO is misreading them?
>>
>> chris
>>
>> On Feb 25, 2007, at 7:34 PM, Jiang wrote:
>>
>>> I use the Bio::Tools::Run::Alignment::Clustalw module to carry
>>> out multiple alignment.
>>> My code is:
>>> my @clustal_param=('outorder'=>'INPUT');
>>> my $aln_factory=Bio::Tools::Run::Alignment::Clustalw->new(@clustal_param);
>>> my $aln=$aln_factory->align(\@prots); ### @prots is an array of protein sequence objects
>>> my $aln_out=Bio::AlignIO->new(-file=>">./dmel_group/clustal/${outfilename}.aln",-format=>'clustalw');
>>>
>>> $aln_out->write_aln($aln);
>>> This code produces an alignment in which identical residues are masked.
>>> But if I use ClustalW directly, the output is normal.
>>> Thank you for your help~
>>>
>>> Jiang
>>
>

Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign

From michael.watson at bbsrc.ac.uk Mon Feb 26 11:00:31 2007 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Mon, 26 Feb 2007 16:00:31 -0000 Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna In-Reply-To: <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com> References: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk> <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com> Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2EBD3@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi Lincoln/List,

That's great, the axis now appears, but there are no labels. This in itself isn't a problem, as long as we can assume that the tick marks are at 0, 50% and 100%. If that's true, we can go with what we have; otherwise I'm going to have to figure out a way to label the y-axis.

Thanks
Mick

________________________________ From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein Sent: 15 February 2007 18:53 To: michael watson (IAH-C) Cc: BioPerl-List Subject: Re: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna

Hi Michael,

When you set up the panel, do this:

Bio::Graphics::Panel->new(-blah -blah, -pad_left => 20, -pad_right => 20);

This will leave enough room on the left and right for you to see the Y axis. Otherwise it runs off the edge of the image (OK, this is a mis-design, but it was the only way to solve a chicken-and-egg problem about who gets to say how wide the panel is).

Lincoln

On 2/15/07, michael watson (IAH-C) wrote:

Hi,

OK, I have some great images out of this glyph, but I can't see the axis, nor is it labelled (i.e. does it go from 0 to 100%?), so it isn't great for publication. The docs say:

"NOTE: -gc_window=>'auto' gives nice results and is recommended for drawing GC content. The GC content axes draw slightly outside the panel, so you may wish to add some extra padding on the right and left."

Any idea how to do this? Basically, I want a nice GC graph with the axis quite clearly labelled, and a nice "%GC" title next to it :)

Thanks
Mick

-- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu

From cjfields at uiuc.edu Mon Feb 26 12:18:38 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 26 Feb 2007 11:18:38 -0600 Subject: [Bioperl-l] AlignIO problems In-Reply-To: References: Message-ID: <7DF958E6-E233-427F-8901-3FE571CD99BD@uiuc.edu>

On Feb 26, 2007, at 9:59 AM, Jiang wrote:

> Thank you!
> I have checked the sequences retrieved through many Bio::DB
> objects working simultaneously.
> None of the problems you mentioned occur; the sequences are not
> overwritten.

Again, keep this on the list. I have my hands full this month, so I will be checking the list only very sporadically; someone else may be able to help you.
The only explanation for the clustalw output you get is that you are not retrieving the correct sequence in some fundamental way, which to me indicates the bug originates either in the way the sequences are retrieved (i.e. somehow via Bio::DB::Fasta, hence my thought about conflicting indices) or in the way they are converted via Bio::SeqIO, which is used in Bio::Tools::Run::Alignment::Clustalw. When I have used Bio::DB::Fasta in the past, I have never had a problem indexing multiple files and retrieving sequences, so beyond running tests with your data I can't help you much past the above conjecture.

chris

From jason at bioperl.org Mon Feb 26 13:45:34 2007 From: jason at bioperl.org (Jason Stajich) Date: Mon, 26 Feb 2007 10:45:34 -0800 Subject: [Bioperl-l] Question to Bio::Tools::Run RemoteBlast In-Reply-To: <20070226095515.68810@gmx.net> References: <20070226095515.68810@gmx.net> Message-ID: <2D2DF6D9-6DAE-4BB7-B31B-8C19CCCA7301@bioperl.org>

Alex -

I am glad to see your interest in the module, but I don't currently have any time to maintain it, so queries should be sent to the BioPerl mailing list. In general we prefer that you don't contact developers directly but use the mailing list, so that others can learn from the questions.

Please note there are several tutorials and documentation pages on the website; you will get a better response from people if you can show you have at least tried to use the existing example code to construct your program.

-jason

On Feb 26, 2007, at 1:55 AM, Alexander Auner wrote:

> Dear Jason Stajich,
> I hope you can help me.
>
> I am inspired by your module and would like to work with it.
> I am a student at the TFH Wildau.
> I have problems understanding the module.
>
> Could you send me an example?
>
> The example should process a text file (FASTA) with NCBI BLAST (Web).
>
> Parameters:
> Choose database -> Others -> nr
> Limit by entrez query -> Campylobacter -> or select from: -> Bacteria [ORGN]
> Expect -> 10
> Other advanced -> -q-1
>
> Output format:
> plain text without Graphical Overview
> Number of: -> Descriptions -> 10000
> Alignment view -> query-anchored with identities
>
> All other parameters remain undef.
>
> Thank you for your help.
>
> Yours faithfully,
> Alexander Auner

From jason at bioperl.org Mon Feb 26 14:13:00 2007 From: jason at bioperl.org (Jason Stajich) Date: Mon, 26 Feb 2007 11:13:00 -0800 Subject: [Bioperl-l] BioPerl leadership additions Message-ID: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>

Dear BioPerl Users and Developers,

I want to announce an addition to the leadership of BioPerl. Christopher Fields and Sendu Bala are now members of the BioPerl Core developer group, in recognition of their ongoing leadership in the project. Chris and Sendu were instrumental in the 1.5.2 Developer release and have made a significant commitment and contribution to the quality of the code and the documentation of the project. We have invited them to be part of the core to recognize their work and so that we feel comfortable asking them to do more. ;-)

The Core group was established to ensure that someone was responsible for making code releases, vetting new developers for CVS write accounts, and generally dealing with things that might otherwise slip through the cracks. We are very excited to have more people contributing to and maintaining the toolkit. We look forward to their help, along with all the other developers, as we work towards a 1.6 release this year.

As always, while there is a need for some individuals to lead the project, we encourage contributions from all levels of expertise to improve the code, documentation, and tutorials of the project.
We plan to discuss the progress of the toolkit at this year's Bioinformatics Open Source Conference, held in Vienna, Austria, in conjunction with the SIG meetings at ISMB. We are trying to use BOSC 2007 as a chance for the developers of Open Bioinformatics Foundation sponsored and related projects to coordinate future development and release cycles.

Jason Stajich on behalf of the Core developers

From khan at cshl.edu Mon Feb 26 15:29:19 2007 From: khan at cshl.edu (Khan, Sohail) Date: Mon, 26 Feb 2007 15:29:19 -0500 Subject: [Bioperl-l] parsing a list of ids to a fasta file. Message-ID:

Thanks Michael. I have the scripts installed, and I can pass an ID to the indexed FASTA file and retrieve its sequence. However, can I pass a list of IDs from a file and get the sequences for all of them?

Thanks.
-Sohail

-----Original Message----- From: michael watson (IAH-C) [mailto:michael.watson at bbsrc.ac.uk] Sent: Tuesday, February 20, 2007 4:33 PM To: Khan, Sohail; Bioperl-l at lists.open-bio.org Subject: RE: [Bioperl-l] parsing a list of ids to a fasta file.

Suggest you use Bio::Index::Fasta to create an index for the FASTA file and then a simple script to retrieve sequences using that index. Or just use the pre-written bp_index.PLS and bp_fetch.PLS scripts. http://www.bioperl.org/wiki/Module:Bio::Index::Fasta

________________________________ From: bioperl-l-bounces at lists.open-bio.org on behalf of Khan, Sohail Sent: Tue 20/02/2007 8:42 PM To: Bioperl-l at lists.open-bio.org Subject: [Bioperl-l] parsing a list of ids to a fasta file.

Dear list,

I am new to BioPerl and have the following question: I have a list of IDs that I would like to look up in a large FASTA file to retrieve the sequences for those IDs. I appreciate any suggestions. Thanks.
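The list-of-IDs question above can also be handled without building an index first. What follows is a minimal plain-Perl sketch, offered as an alternative to the Bio::Index::Fasta / bp_fetch route Michael suggests rather than as part of it. It assumes the ID is the first whitespace-delimited word after '>' and simply scans the FASTA file once against a hash of wanted IDs. The function name fetch_by_ids, the script name, and the two file arguments are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch only (plain Perl, not a BioPerl API): return every FASTA record
# whose ID, taken as the first whitespace-delimited word after '>', is
# listed in @$ids. The FASTA file is read once, in order.
sub fetch_by_ids {
    my ($fasta_fh, $ids) = @_;
    my %wanted = map { $_ => 1 } @$ids;
    my @records;
    my $keep = 0;    # true while we are inside a wanted record
    while (my $line = <$fasta_fh>) {
        if ($line =~ /^>(\S+)/) {
            $keep = $wanted{$1} ? 1 : 0;
            push @records, '' if $keep;    # start a new record string
        }
        $records[-1] .= $line if $keep;
    }
    return @records;
}

# Hypothetical command-line use: perl fetch_ids.pl ids.txt big.fasta
# ids.txt holds one ID per line; only the first word of each line is used.
if (@ARGV == 2) {
    my ($id_file, $fasta_file) = @ARGV;
    open my $id_fh, '<', $id_file or die "Cannot open $id_file: $!";
    my @ids = map { /(\S+)/ ? $1 : () } <$id_fh>;
    close $id_fh;
    open my $fa_fh, '<', $fasta_file or die "Cannot open $fasta_file: $!";
    print for fetch_by_ids($fa_fh, \@ids);
    close $fa_fh;
}
```

Redirecting the output (e.g. perl fetch_ids.pl ids.txt big.fasta > subset.fasta) gives a FASTA file containing only the requested records. For repeated lookups against a very large file, an index such as Bio::Index::Fasta avoids rescanning the whole file each time.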
Khan _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From arareko at campus.iztacala.unam.mx Mon Feb 26 16:44:49 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Mon, 26 Feb 2007 15:44:49 -0600 Subject: [Bioperl-l] BioPerl leadership additions In-Reply-To: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org> References: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org> Message-ID: <45E354D1.4070600@campus.iztacala.unam.mx> Congrats Chris & Sendu! Very well-deserved. Keep up the great work. Cheers! Mauricio. Jason Stajich wrote: > Dear BioPerl Users and Developers, > > I want to announce a addition in the leadership of BioPerl. > Christopher Fields and and Sendu Bala are now members of the BioPerl > Core developer group to recognize their ongoing leadership in the > project. Chris and Sendu were instrumental in the 1.5.2 Developer > release and have made a significant commitment and contribution to > the quality of the code and the documentation of the project. We > have invited them to be part of the core to recognize their work and > to feel comfortable to ask them to do more. ;-) > > The Core group was established to insure that someone was responsible > for making code releases, vetting new developers for CVS write > accounts, and generally dealing with things that might otherwise slip > through the cracks. We are very excited to have more people > contributing to and maintaining the toolkit. We look forward to > their help along with all the other developers, as we work towards a > 1.6 release release this year. > > As always, while their is a need for some individuals to lead the > project, we encourage contributions from all levels of expertise to > improve the code, documentation, and tutorials of the project. 
>
> We plan to discuss the progress of the toolkit at this year's
> Bioinformatics Open Source Conference held in Vienna, Austria in
> conjunction with the SIG meetings at ISMB. We are trying to use
> BOSC 2007 as a chance for the developers of Open Bioinformatics
> Foundation sponsored and related projects to coordinate future
> development and release cycles.
>
> Jason Stajich on behalf of the Core developers

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From lubapardo at gmail.com Tue Feb 27 08:26:30 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Tue, 27 Feb 2007 14:26:30 +0100
Subject: [Bioperl-l] parsing blast results
Message-ID: <58ff33550702270526q43ab584ci7baf6a09e2584e38@mail.gmail.com>

Hi,
I am using the module Bio::SearchIO to parse some blast results. I would like to store the ids of the results into an array but I am not sure if this is possible to do it with an existing subroutine. Does anyone have an idea whether there is a method included within the module Bio::SearchIO to do so?
Thanks in advance,
L.Pardo

From cjfields at uiuc.edu Tue Feb 27 09:11:37 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Feb 2007 08:11:37 -0600
Subject: [Bioperl-l] parsing blast results
In-Reply-To: <58ff33550702270526q43ab584ci7baf6a09e2584e38@mail.gmail.com>
References: <58ff33550702270526q43ab584ci7baf6a09e2584e38@mail.gmail.com>
Message-ID:

On Feb 27, 2007, at 7:26 AM, Luba Pardo wrote:

> Hi,
> I am using the module Bio::SearchIO to parse some blast results. I would
> like to store the ids of the results into an array but I am not sure if this
> is possible to do it with an existing subroutine.
> Does anyone have an idea
> whether there is a method included within the module Bio::SearchIO
> to do so?
> Thanks in advance,
> L.Pardo

Bio::SearchIO doesn't currently have a method to retrieve all the accessions in a BLAST result. The best way to do this is to iterate through the objects:

# $searchio is a Bio::SearchIO object, e.g. Bio::SearchIO->new(-format => 'blast', -file => $file)
my @accs;
while (my $result = $searchio->next_result) {
    while (my $hit = $result->next_hit) {
        push @accs, $hit->accession;
        # do whatever else here...
    }
}
print join ',', @accs;

I don't think all accessions in the description are parsed out at the moment, just the first one (or the one in the hit table). If you want all of them or if you want the NCBI GI you'll need to parse them out of the description heading ($hit->description).

chris

From sac at bioperl.org Tue Feb 27 12:59:22 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Tue, 27 Feb 2007 09:59:22 -0800
Subject: [Bioperl-l] [Bioperl-announce-l] BioPerl leadership additions
In-Reply-To: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
References: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
Message-ID: <8f200b4c0702270959n7521f693l915bfabe7ccb7ef7@mail.gmail.com>

Welcome to the club, Chris & Sendu. Always good to have an infusion of new blood and capable, motivated hands.

Steve

On 2/26/07, Jason Stajich wrote:
>
> Dear BioPerl Users and Developers,
>
> I want to announce an addition in the leadership of BioPerl.
> Christopher Fields and Sendu Bala are now members of the BioPerl
> Core developer group to recognize their ongoing leadership in the
> project. Chris and Sendu were instrumental in the 1.5.2 Developer
> release and have made a significant commitment and contribution to
> the quality of the code and the documentation of the project. We
> have invited them to be part of the core to recognize their work and
> to feel comfortable to ask them to do more.
> ;-)
>
> The Core group was established to ensure that someone was responsible
> for making code releases, vetting new developers for CVS write
> accounts, and generally dealing with things that might otherwise slip
> through the cracks. We are very excited to have more people
> contributing to and maintaining the toolkit. We look forward to
> their help along with all the other developers, as we work towards a
> 1.6 release this year.
>
> As always, while there is a need for some individuals to lead the
> project, we encourage contributions from all levels of expertise to
> improve the code, documentation, and tutorials of the project.
>
> We plan to discuss the progress of the toolkit at this year's
> Bioinformatics Open Source Conference held in Vienna, Austria in
> conjunction with the SIG meetings at ISMB. We are trying to use
> BOSC 2007 as a chance for the developers of Open Bioinformatics
> Foundation sponsored and related projects to coordinate future
> development and release cycles.
>
> Jason Stajich on behalf of the Core developers
>
> _______________________________________________
> Bioperl-announce-l mailing list
> Bioperl-announce-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l
>

From cjfields at uiuc.edu Tue Feb 27 15:57:40 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Feb 2007 14:57:40 -0600
Subject: [Bioperl-l] Bio::SeqIO::FTHelper
Message-ID:

Could anyone tell me what FTHelper is used for? From what I gather it rolls up seqfeature data into a lightweight object but then creates a SeqFeature::Generic anyway (at least for GenBank/EMBL/Swiss), which seems to be a waste of memory and time. Is there something I'm missing (besides my sanity of course)?
chris

From Jay at jays.net Wed Feb 28 04:39:55 2007
From: Jay at jays.net (Jay Hannah)
Date: Wed, 28 Feb 2007 03:39:55 -0600
Subject: [Bioperl-l] "Command-Line Bioinformatics"
Message-ID:

Reading this article:

http://www.linuxjournal.com/article/6977
Sequencing the SARS Virus - Linux Journal, Nov 2003

This guy needs Perl and/or BioPerl. :)

> The sequence file is in FASTA format consisting of a header line
> and the sequence, split into fixed-width lines. The following
> counts the number of Gs and Cs in the sequence and presents the
> total as a fraction of the total number of bases:
>
> > grep -v "^>" AY274119.fa | fold -w 1 |
> tr "ATGC" "..xx" | sort | uniq -c |
> sed 's/[^0-9]//g' | tr -s "\012" " " |
> sed 's/\([0-9]*\) \([0-9]*\)/scale = 3; \2 \/ (\1+\2)/' |
> bc -i
> scale = 3; 12127 / (17624+12127)
> .407
>
> Out of the 29,751 bases in our sequence, 12,127 are either G or C,
> giving a GC content of 41%.

BioPerl version:

use Bio::SeqIO;
my $io = Bio::SeqIO->new( -file => 'AY274119.fa', -format => 'Fasta' );
my $seq = $io->next_seq->seq;
print ( ($seq =~ tr/GC/GC/) / length ($seq) );

Command-line Perl:

perl -e '$/ = undef; $_ = <>; s/>.*//; s/\n//g; print tr/GC/GC/ / length($_)' AY274119.fa

I'm sure you can Perl Golf my stabs at it. :)

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah

From n.saunders at uq.edu.au Wed Feb 28 05:25:08 2007
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Wed, 28 Feb 2007 20:25:08 +1000
Subject: [Bioperl-l] Bio::Factory::EMBOSS, CGI and taint
Message-ID: <45E55884.9010908@uq.edu.au>

Dear Bioperlers,

I'm trying to understand an error that occurs when Bio::Factory::EMBOSS is used in a CGI script. Using BioPerl 1.5.2 on Ubuntu Dapper, Apache 2.0.55, Perl 5.8.7.
If I load this test CGI script (cgi.pl) in a browser:

BEGIN CODE
----------
#!/usr/bin/perl -Tw

use strict;
use CGI;
use Bio::Factory::EMBOSS;

my $cgi = new CGI;
my $f = new Bio::Factory::EMBOSS;

print $cgi->header, $cgi->start_html, $cgi->end_html;
--------
END CODE

I get a 500 server error and the Apache error log reads:

[error] [client 192.168.0.3] Premature end of script headers: cgi.pl

I can fix this in 2 ways:

(1) Move the "my $f = new Bio::Factory::EMBOSS" line to the end of the script, which isn't a very useful fix.
(2) Remove the -T switch from the shebang line.

There seem to be a few old posts on the list regarding "taint-safe" modules. It seems that the new Bio::Factory::EMBOSS object is interfering with the headers in some way, but I'm no CGI.pm guru and wondered if anyone could shed light on this.

thanks,
Neil
--
School of Molecular and Microbial Sciences
University of Queensland
Brisbane 4072 Australia

http://nsaunders.wordpress.com

From n.saunders at uq.edu.au Wed Feb 28 05:30:31 2007
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Wed, 28 Feb 2007 20:30:31 +1000
Subject: [Bioperl-l] more on Bio::Factory::EMBOSS, CGI and taint
Message-ID: <45E559C7.1090308@uq.edu.au>

Further to my previous email, adding:

BEGIN {
    $|=1;
    print "Content-type: text/html\n\n";
    use CGI::Carp('fatalsToBrowser');
}

to my CGI script generates:

Insecure $ENV{PATH} while running with -T switch at /usr/local/share/perl/5.8.7/Bio/Factory/EMBOSS.pm line 251.

Neil
--
School of Molecular and Microbial Sciences
University of Queensland
Brisbane 4072 Australia

http://nsaunders.wordpress.com

From n.saunders at uq.edu.au Wed Feb 28 05:50:58 2007
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Wed, 28 Feb 2007 20:50:58 +1000
Subject: [Bioperl-l] CGI taint solved
Message-ID: <45E55E92.10608@uq.edu.au>

Apologies for running a one-man thread, but I realised that I've now answered my own question regarding errors with CGI, Bio::Factory::EMBOSS and taint.
Given that the EMBOSS binaries are in /usr/local/bin, adding:

$ENV{'PATH'} = '/usr/local/bin'

near the top of the script does the trick.

Neil
--
School of Molecular and Microbial Sciences
University of Queensland
Brisbane 4072 Australia

http://nsaunders.wordpress.com

From cjfields at uiuc.edu Wed Feb 28 08:39:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Feb 2007 07:39:24 -0600
Subject: [Bioperl-l] CGI taint solved
In-Reply-To: <45E55E92.10608@uq.edu.au>
References: <45E55E92.10608@uq.edu.au>
Message-ID:

That could possibly clobber any other program calls from within the same script (unless they reside in /usr/local/bin) since you're explicitly assigning PATH, not appending:

$ENV{"PATH"} = '/usr/local/bin';

gets me (printing $ENV{"PATH"}):

/usr/local/bin

whereas this:

$ENV{"PATH"} = '/usr/local/bin:' . $ENV{"PATH"};

gets me:

/usr/local/bin:/Users/cjfields/bin:/Users/cjfields/dart/bin:/usr/local/mysql/bin:/usr/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin

There's probably a File::* module that does this safely per OS flavor.

chris

On Feb 28, 2007, at 4:50 AM, Neil Saunders wrote:

> Apologies for running a one-man thread, but I realised that I've now answered my
> own question regarding errors with CGI, Bio::Factory::EMBOSS and taint.
>
> Given that the EMBOSS binaries are in /usr/local/bin, adding:
>
> $ENV{'PATH'} = '/usr/local/bin'
>
> near the top of the script does the trick.
>
>
> Neil
> --
> School of Molecular and Microbial Sciences
> University of Queensland
> Brisbane 4072 Australia
>
> http://nsaunders.wordpress.com

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From stefan.kirov at bms.com Wed Feb 28 10:35:31 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 28 Feb 2007 10:35:31 -0500
Subject: [Bioperl-l] CGI taint solved
In-Reply-To:
References: <45E55E92.10608@uq.edu.au>
Message-ID: <45E5A143.3080303@bms.com>

Neil,
I believe this is your situation:
http://wn.cyberwerks.com/2000/0411.html
My advice: any commands executed from within a cgi script should have their path hardcoded whenever possible. If those commands require a different path, try writing a wrapper shell script that sets the environment (which should be reset to the default once the shell script terminates). It also depends on the type of environment you have; if it is not secure you may wish to think hard about how to eliminate all security loopholes with CGI. I am definitely not an expert on this.
Stefan

Chris Fields wrote:
> That could possibly clobber any other program calls from within the
> same script (unless they reside in /usr/local/bin) since you're
> explicitly assigning PATH, not appending:
>
> $ENV{"PATH"} = '/usr/local/bin';
>
> gets me (printing $ENV{"PATH"}):
>
> /usr/local/bin
>
> whereas this:
>
> $ENV{"PATH"} = '/usr/local/bin:' . $ENV{"PATH"};
>
> gets me:
>
> /usr/local/bin:/Users/cjfields/bin:/Users/cjfields/dart/bin:/usr/local/mysql/bin:/usr/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin
>
> There's probably a File::* module that does this safely per OS flavor.
>
> chris
>
> On Feb 28, 2007, at 4:50 AM, Neil Saunders wrote:
>
>
>> Apologies for running a one-man thread, but I realised that I've now answered my
>> own question regarding errors with CGI, Bio::Factory::EMBOSS and taint.
>>
>> Given that the EMBOSS binaries are in /usr/local/bin, adding:
>>
>> $ENV{'PATH'} = '/usr/local/bin'
>>
>> near the top of the script does the trick.
>>
>>
>> Neil
>> --
>> School of Molecular and Microbial Sciences
>> University of Queensland
>> Brisbane 4072 Australia
>>
>> http://nsaunders.wordpress.com
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>

From lubapardo at gmail.com Wed Feb 28 12:21:07 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Wed, 28 Feb 2007 18:21:07 +0100
Subject: [Bioperl-l] retrieven ids
Message-ID: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com>

Hi everyone,
I wonder if someone could give advice on the following:
I want to retrieve the DNA coding sequence of a RefSeq protein id. I do not want to translate the protein back to DNA, but rather get the DNA coding sequence ID and then retrieve the DNA sequence from GenBank. Is there any module that allows me to get all possible ids for a sequence given a protein gi?

Thank you very much in advance,
L. Pardo

From johnston at biochem.ucl.ac.uk Wed Feb 28 12:05:49 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Wed, 28 Feb 2007 17:05:49 +0000 (GMT)
Subject: [Bioperl-l] _rearrange
Message-ID:

hi,

Is there a discussion of the rationale behind the _rearrange method somewhere? I'm probably just being gormless, but I think I'm missing the point a bit.

Is it okay for a method just to expect named params like ->foo(arg1=>'stuff', arg2=>'things'); ?
Cxx

From ckuanglim at yahoo.com Wed Feb 28 10:51:50 2007
From: ckuanglim at yahoo.com (Chan Kuang Lim)
Date: Wed, 28 Feb 2007 07:51:50 -0800 (PST)
Subject: [Bioperl-l] Problem of Installing Bioperl
Message-ID: <459942.77644.qm@web60518.mail.yahoo.com>

I have problem of installing bioperl in windows using command-line installation.
In the cmd windows, after

ppm-shell
search bioperl
install 2

many downloading had done, but the next line is:

Unpacking bioperl-1.5.2_100...ppm install failed: Can't extract files from C:.............../Bioperl-1.5.2_100.tgz

Hope you can answer my question. Thank you.

Regards,
Chan Kuang Lim
Malaysia

From cjfields at uiuc.edu Wed Feb 28 13:30:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Feb 2007 12:30:45 -0600
Subject: [Bioperl-l] _rearrange
In-Reply-To:
References:
Message-ID: <25C736A2-2DCA-413B-8F92-D799F583515B@uiuc.edu>

From what I gather it's a convenient utility method that is used for consistent and enforced parameter checking/setting for any method, including the constructor. There are a few modules that don't use _rearrange (Bio::WebAgent::new() comes to mind). It's not required that you use it but the naming conventions for parameters outlined in _rearrange (in Bio::Root::RootI POD) are generally enforced for consistency across classes.

As a note, Sendu has committed a related method (_set_from_args) to CVS which works rather well, but I don't think it is in the last release.

chris

On Feb 28, 2007, at 11:05 AM, Caroline Johnston wrote:

> hi,
>
> Is there a discussion of the rationale behind the _rearrange method
> somewhere? I'm probably just being gormless, but I think I'm missing the
> point a bit.
>
> Is it okay for a method just to expect named params like
> ->foo(arg1=>'stuff', arg2=>'things'); ?
>
> Cxx

From dmessina at wustl.edu Wed Feb 28 14:31:29 2007
From: dmessina at wustl.edu (Dave Messina)
Date: Wed, 28 Feb 2007 13:31:29 -0600 (CST)
Subject: [Bioperl-l] retrieven ids
In-Reply-To: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com>
References: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com>
Message-ID: <57419.10.0.7.57.1172691089.squirrel@gscmail.wustl.edu>

Whenever I'm unsure of how to do something, I first look to see if one of the HOWTOs on bioperl.org covers it. In this case, the Features HOWTO has example code which I think will do what you want.

Genbank records typically have the coding sequence of a protein as a feature, so I would do something like:

- use the RefSeq protein IDs to query Entrez and get back the Genbank records.
- read the Features HOWTO to refresh my memory on the syntax for grabbing features. That HOWTO is at: http://www.bioperl.org/wiki/HOWTO:Feature-Annotation
- whip up a little script to loop through the Genbank records one at a time with SeqIO and pull out the cDNA sequence features.

Dave

From bix at sendu.me.uk Wed Feb 28 14:38:46 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Feb 2007 19:38:46 +0000
Subject: [Bioperl-l] _rearrange
In-Reply-To:
References:
Message-ID: <45E5DA46.3020503@sendu.me.uk>

Caroline Johnston wrote:
> hi,
>
> Is there a discussion of the rationale behind the _rearrange method
> somewhere? I'm probably just being gormless, but I think I'm missing the
> point a bit.
>
> Is it okay for a method just to expect named params like
> ->foo(arg1=>'stuff', arg2=>'things'); ?

The Bioperl style for named args is -arg1, and wrong case is allowed as well. So, make use of _rearrange; it won't do you any harm.
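The following is a minimal core-Perl sketch of the convention Chris and Sendu describe: named arguments with a leading dash, matched case-insensitively and returned in a fixed order. The helper name rearrange_args is hypothetical, and this is not BioPerl's code. In BioPerl itself the call looks like my ($seq, $id) = $self->_rearrange([qw(SEQ ID)], @args);, and the Bio::Root::RootI POD is the authority on the details. The sketch also assumes a positional fallback: if the first argument has no leading dash, the arguments are returned unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: a simplified sketch of the idea behind BioPerl's
# _rearrange(), NOT the actual Bio::Root::RootI implementation.
# $order lists the wanted parameter names; the remaining arguments are
# either "-Name => value" pairs (any case) or plain positional values.
sub rearrange_args {
    my ($order, @args) = @_;

    # Assumption: if the first argument does not start with '-', treat the
    # call as positional and hand the arguments back unchanged.
    return @args unless @args && defined $args[0] && $args[0] =~ /^-/;

    my %param;
    for (my $i = 0; $i < @args; $i += 2) {
        # Normalise '-Seq', '-SEQ' and '-seq' to the same key, 'SEQ'.
        (my $key = uc $args[$i]) =~ s/^-//;
        $param{$key} = $args[$i + 1];
    }

    # Return values in the requested order; absent parameters come back undef.
    return @param{ map { uc($_) } @$order };
}

# A named (mixed-case) call and the equivalent positional call.
my ($seq, $id) = rearrange_args([qw(SEQ ID)], -Id => 'seq1', -SEQ => 'ACGT');
print "$id: $seq\n";    # prints "seq1: ACGT"

my ($seq2, $id2) = rearrange_args([qw(SEQ ID)], 'ACGT', 'seq1');
```

Under this sketch, a dashless call such as ->foo(arg1 => 'stuff') would be read as positional. That is one practical reason to follow the -arg1 style Sendu mentions.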
From johnsonm at gmail.com Wed Feb 28 14:59:09 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 28 Feb 2007 13:59:09 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
Message-ID:

I happen to need something like Bio::Tools::Run::Genemark, so I'm coding one up. When I started on the tests for it, I realized I have a problem. I can distribute a fasta file downloaded from GenBank to use as input, but I can't distribute the model file needed to actually run Genemark (Genemark.hmm for prokaryotes, gmhmmp, in my case).

It took *forever* to get a license, and I'm not thrilled with the prospect of talking them out of a redistributable model file. I'd love to distribute the test, but I don't see how I'm going to be able to. Suggestions?

Also, I've settled on IPC::Run instead of system(). The docs indicate the bits of it I'm using should be OK on Windows, except maybe for Win9X. I don't want to clutter up the console, I don't like embedding stdout/stderr redirection in command strings, and I don't want to have to worry about signal handling (What if the child catches a ctrl-c halfway through parsing? What if the parent does?). Anybody object to that?

One final thing. I'm lazy, I don't want to deal with parsing arguments to the constructor, so I'm just calling _rearrange() to deal with it. The Bio::Tools:: parsers all take dash options, but it looks like a bunch of the stuff in Bio::Tools::Run:: takes dashless args. Objections?

From dmessina at wustl.edu Wed Feb 28 15:14:56 2007
From: dmessina at wustl.edu (Dave Messina)
Date: Wed, 28 Feb 2007 14:14:56 -0600 (CST)
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
In-Reply-To:
References:
Message-ID: <57538.10.0.7.57.1172693696.squirrel@gscmail.wustl.edu>

> I'm not thrilled with the prospect of talking them out of a redistributable
> model file.

I suppose it's not possible to fake your own, or at least the parts of it you're testing for?
If not, I'd put the tests in a skip block while waiting to hear from the Genemark folks.

> The Bio::Tools:: parsers all take dash options, but it looks like a bunch of
> the stuff in Bio::Tools::Run:: takes dashless args. Objections?

Sendu will chime in I'm sure, but I think he was planning to switch everything in Bio::Tools::Run over to dashed args anyway...

Dave

From bix at sendu.me.uk Wed Feb 28 15:52:23 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Feb 2007 20:52:23 +0000
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
In-Reply-To:
References:
Message-ID: <45E5EB87.9020106@sendu.me.uk>

Mark Johnson wrote:
> One final thing. I'm lazy, I don't want to deal with parsing arguments
> to the constructor, so I'm just calling _rearrange() to deal with it. The
> Bio::Tools:: parsers all take dash options, but it looks like a bunch of the
> stuff in Bio::Tools::Run:: takes dashless args. Objections?

You can make use of _set_from_args(). See Bio::Tools::Run::Phylo::Gumby for an example.

From bix at sendu.me.uk Wed Feb 28 16:29:32 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Feb 2007 21:29:32 +0000
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
Message-ID: <45E5F43C.9080902@sendu.me.uk>

I have GD 2.35 and GD::SVG 2.33 installed.

I have a working script in which a Bio::Graphics::Panel object is made and output with:

print $panel->png;

This is fine. Changing it to:

print $panel->svg;

gives the error:

Can't locate object method "svg" via package "GD:Image" at /.../Bio/Graphics/Panel.pm line 971, line 192.

Am I supposed to do something else to get this to work?

Cheers,
Sendu.
From crabtree at tigr.ORG Wed Feb 28 16:40:52 2007
From: crabtree at tigr.ORG (Jonathan Crabtree)
Date: Wed, 28 Feb 2007 16:40:52 -0500
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <45E5F43C.9080902@sendu.me.uk>
References: <45E5F43C.9080902@sendu.me.uk>
Message-ID: <45E5F6E4.80003@tigr.org>

Sendu-

I believe you must set 'image_class' to 'GD::SVG' when you create the Panel (and note that older versions of Bio::Graphics::Panel don't know anything about this parameter.) Here's the relevant part of the Panel perldoc:

-image_class   To create output in scalable vector graphics (SVG),
               optionally pass the image class parameter 'GD::SVG'.
               Defaults to using vanilla GD. See the corresponding
               image_class() method below for details.

Jonathan

Sendu Bala wrote:
> I have GD 2.35 and GD::SVG 2.33 installed.
>
> I have a working script in which a Bio::Graphics::Panel object is made
> and output with:
>
> print $panel->png;
>
> This is fine. Changing it to:
>
> print $panel->svg;
>
> Gives the error:
>
> Can't locate object method "svg" via package "GD:Image" at
> /.../Bio/Graphics/Panel.pm line 971, line 192.
>
>
> Am I supposed to do something else to get this to work?
>
>
> Cheers,
> Sendu.

From bix at sendu.me.uk Wed Feb 28 17:01:17 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Feb 2007 22:01:17 +0000
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <45E5F6E4.80003@tigr.org>
References: <45E5F43C.9080902@sendu.me.uk> <45E5F6E4.80003@tigr.org>
Message-ID: <45E5FBAD.3030404@sendu.me.uk>

Jonathan Crabtree wrote:
>
> Sendu-
>
> I believe you must set 'image_class' to 'GD::SVG' when you create the
> Panel (and note that older versions of Bio::Graphics::Panel don't know
> anything about this parameter.) Here's the relevant part of the Panel
> perldoc:
...
Oh! I had no idea there was any perldoc for these modules, hiding down there at the bottom. Does anyone want to intersperse the docs?...

From cjfields at uiuc.edu Wed Feb 28 17:10:54 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Feb 2007 16:10:54 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
In-Reply-To:
References:
Message-ID: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>

On Feb 28, 2007, at 1:59 PM, Mark Johnson wrote:

> I happen to need something like Bio::Tools::Run::Genemark, so I'm coding
> one up. When I started on the tests for it, I realized I have a problem. I
> can distribute a fasta file downloaded from GenBank to use as input, but I
> can't distribute the model file needed to actually run Genemark
> (Genemark.hmm for prokaryotes, gmhmmp, in my case).
> It took *forever* to get a license, and I'm not thrilled with the
> prospect of talking them out of a redistributable model file. I'd love to
> distribute the test, but I don't see how I'm going to be able to.
> Suggestions?

For bioperl-run tests you have to have the program installed for tests to work (otherwise they are passed over). Therefore one would assume if you had the GeneMark program you would have the models as well.

You could set up your module to require an env. variable be set (like the HMMER module, for instance) which contains the executables and/or the models, so that if it isn't set the tests are skipped.

> Also, I've settled on IPC::Run instead of system(). The docs indicate
> the bits of it I'm using should be OK on Windows, except maybe for Win9X.
> I don't want to clutter up the console, I don't like embedding stdout/stderr
> redirection in command strings, and I don't want to have to worry about
> signal handling (What if the child catches a ctrl-c halfway through
> parsing? What if the parent does?). Anybody object to that?

I wouldn't worry too much about Win9x. Is IPC::Run in perl core?
Otherwise we'll need to add it to the optional dependencies for bioperl-run.

> One final thing. I'm lazy, I don't want to deal with parsing arguments
> to the constructor, so I'm just calling _rearrange() to deal with it. The
> Bio::Tools:: parsers all take dash options, but it looks like a bunch of the
> stuff in Bio::Tools::Run:: takes dashless args. Objections?

Sendu's suggestion (_set_from_args()) is the best. As mentioned in another thread _rearrange() works as well.

chris

From johnsonm at gmail.com Wed Feb 28 17:29:36 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 28 Feb 2007 16:29:36 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
In-Reply-To: <57538.10.0.7.57.1172693696.squirrel@gscmail.wustl.edu>
References: <57538.10.0.7.57.1172693696.squirrel@gscmail.wustl.edu>
Message-ID:

On 2/28/07, Dave Messina wrote:
>
> > I'm not thrilled with the prospect of talking them out of a
> > redistributable model file.
>
> I suppose it's not possible to fake your own, or at least the parts of it
> you're testing for?

We got a gzipped tarball with some model files and a precompiled executable (gmhmmp). As far as building a model file goes, I don't even have two sticks to rub together. I suppose it's possible that it's not actually some weird proprietary format, I'll go dig for some docs...but I don't hold out a lot of hope.

From sukhinder.sandhu at osumc.edu Wed Feb 28 16:49:31 2007
From: sukhinder.sandhu at osumc.edu (Sukhinder Sandhu)
Date: Wed, 28 Feb 2007 16:49:31 -0500
Subject: [Bioperl-l] Problem installing bioperl: plz reply soon. thx
Message-ID:

Hi

I am having trouble installing Bundle::BioPerl through CPAN. I don't know if this has something to do with my having root privileges. Can you please suggest how I may proceed to get over this? I shall really appreciate any help.

I am pasting part of the error it keeps giving after trying to install every module.
######################
CPAN.pm: Going to build G/GA/GAAS/HTML-Parser-3.56.tar.gz

make: *** No rule to make target `/System/Library/Perl/5.8.6/darwin-thread-multi-2level/CORE/config.h', needed by `Makefile'. Stop.
/usr/bin/make -- NOT OK
Running make test
Can't test without successful make
Running make install
make had returned bad status, install seems impossible
###############################

Thanks
sukhinder

From sukhinder.sandhu at osumc.edu Tue Feb 27 23:41:43 2007
From: sukhinder.sandhu at osumc.edu (Sukhinder Sandhu)
Date: Tue, 27 Feb 2007 23:41:43 -0500
Subject: [Bioperl-l] Problem installing bioperl-1.5.2_102
Message-ID:

Hi

I am trying to install bioperl on my Mac OS X and having problems. I tried following the instructions both at the www.tc.umn.edu..... and in the README and INSTALL files in the bioperl folder that I downloaded.

The error I get is the following (the end part of the output is copied):

####################
t/versions........ok
t/xs..............skipped
all skipped: C_support not enabled
Failed Test Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/compat.t     5  1280    60    5   8.33%  25-28 31
4 tests and 31 subtests skipped.
Failed 1/22 test scripts, 95.45% okay. 5/683 subtests failed, 99.27% okay.
make: *** [test] Error 2
/usr/bin/make test -- NOT OK
Running make install
make test had returned bad status, won't install without force
Couldn't install Module::Build, giving up.
BEGIN failed--compilation aborted at ModuleBuildBioperl.pm line 51.
Compilation failed in require at Build.PL line 14.
BEGIN failed--compilation aborted at Build.PL line 14.
###########################################################################

I am not able to figure out what's going wrong. And when I try to run the CPAN, I get the following error. I have no idea how to fix these. Any help is greatly appreciated.
############################################################################
[Sukhinders-Computer:~/Desktop/bioperl-1.5.2_102] sand60% perl -MCPAN -e shell
Terminal does not support AddHistory.

There seems to be running another CPAN process (pid 7207). Contacting...
Lockfile /Users/sand60/.cpan/.lock not writeable by you. Cannot proceed.
On UNIX try:
rm /Users/sand60/.cpan/.lock
and then rerun us.
at -e line 1
###################################################

And doing what it says, removing the lock file, doesn't help. I am wondering if all this has something to do with having root privileges on the system and, if so, is there an alternative?

Thanks
sukhinder

From stefan.kirov at bms.com Wed Feb 28 16:44:05 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 28 Feb 2007 16:44:05 -0500
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <45E5F43C.9080902@sendu.me.uk>
References: <45E5F43C.9080902@sendu.me.uk>
Message-ID: <45E5F7A5.3090805@bms.com>

I think you should create the object with -image_class='svg'. Can you post the code with which you create the object?
Stefan

Sendu Bala wrote:
> I have GD 2.35 and GD::SVG 2.33 installed.
>
> I have a working script in which a Bio::Graphics::Panel object is made
> and output with:
>
> print $panel->png;
>
> This is fine. Changing it to:
>
> print $panel->svg;
>
> Gives the error:
>
> Can't locate object method "svg" via package "GD:Image" at
> /.../Bio/Graphics/Panel.pm line 971, line 192.
>
>
> Am I supposed to do something else to get this to work?
>
>
> Cheers,
> Sendu.

From johnsonm at gmail.com Wed Feb 28 17:54:02 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 28 Feb 2007 16:54:02 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
In-Reply-To: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>
References: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>
Message-ID:

On 2/28/07, Chris Fields wrote:

> For bioperl-run tests you have to have the program installed for
> tests to work (otherwise they are passed over). Therefore one would
> assume if you had the GeneMark program you would have the models as
> well.
>
> You could set up your module to require an env. variable be set (like
> the HMMER module, for instance) which contains the executables and/or
> the models, so that if it isn't set the tests are skipped.

Sounds like a plan.

> I wouldn't worry too much about Win9x. Is IPC::Run in perl core?
> Otherwise we'll need to add it to the optional dependencies for
> bioperl-run.

I'd test it, but I don't have access to any Win9x boxes anymore. IPC::Run is not a core module, but I think it's worth the dependency. I considered IPC::Open3, but it can't be made reliable on Win32, something about not being able to select() on filehandles, only sockets. I also looked at IPC::Run3, but under the hood, it's just got STDOUT/STDERR redirection layered on top of system(). I don't like using system() due to issues with signals (such as the user hitting ctrl-c and taking out the child). I feel better knowing the wrapped executable is in another process disconnected from the console.

> Sendu's suggestion (_set_from_args() ) is the best. As mentioned in
> another thread _rearrange() works as well.

I'm using _rearrange() now. I'll look at _set_from_args(). Is either one preferred to the other?
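Chris suggests skipping the tests unless an environment variable points at the licensed GeneMark files, and Mark agrees to that above. A minimal Test::More sketch of the pattern follows. The variable name GENEMARKDIR and the file name model.mod are hypothetical placeholders, because the thread names neither. The conventions actually used in bioperl-run tests may also differ from this.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical names: the thread does not say what the environment
# variable or the model file should be called.
my $ENV_VAR    = 'GENEMARKDIR';
my $MODEL_FILE = 'model.mod';

# Return the directory named by the environment variable $var if it is
# set and exists, otherwise undef.
sub model_dir {
    my ($var) = @_;
    my $dir = $ENV{$var};
    return (defined $dir && -d $dir) ? $dir : undef;
}

SKIP: {
    my $dir = model_dir($ENV_VAR);
    # skip() records the test as skipped and jumps out of the SKIP block.
    skip "set \$$ENV_VAR to run the GeneMark tests", 1 unless defined $dir;
    ok(-e "$dir/$MODEL_FILE", "found $MODEL_FILE in $dir");
}

done_testing();
```

Without the variable, the test is reported as skipped rather than failed. That keeps the test suite passing for users who, like Mark's readers, may not hold a GeneMark licence.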
From bix at sendu.me.uk Wed Feb 28 19:13:29 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 01 Mar 2007 00:13:29 +0000 Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer In-Reply-To: References: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu> Message-ID: <45E61AA9.9030906@sendu.me.uk> Mark Johnson wrote: > I'm using _rearrange() now. I'll look at _set_from_args(). Is either one > preferred to the other? _set_from_args() is implemented using _rearrange() iirc. In any case, they do different things but _set_from_args() just makes creating wrapper modules a lot simpler. Another example: compare revisions 1.15 and 1.16 of Bio::Tools::Run::Alignment::Lagan where I reimplemented it to use _set_from_args() and _setparams(). http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-run/Bio/Tools/Run/Alignment/Lagan.pm.diff?r1=text&tr1=1.15&r2=text&tr2=1.16&diff_format=h So, its new, but I'd recommend new modules, especially wrappers, make use of it. From bix at sendu.me.uk Wed Feb 28 19:19:29 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 01 Mar 2007 00:19:29 +0000 Subject: [Bioperl-l] Problem of Installing Bioperl In-Reply-To: <459942.77644.qm@web60518.mail.yahoo.com> References: <459942.77644.qm@web60518.mail.yahoo.com> Message-ID: <45E61C11.90806@sendu.me.uk> Chan Kuang Lim wrote: > I have problem of installing bioperl in windows using command-line installation. > In the cmd windows, after > ppm-shell > search bioperl > install 2 > > many downloading had done, but the next line is: > Unpacking bioperl-1.5.2_100...ppm install failed: Can't extract files from C:.............../Bioperl-1.5.2_100.tgz Does that file exist on your system? Is it larger than 0kb? Can you open it yourself? 
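To make the _rearrange()/_set_from_args() distinction concrete, the sketch below imitates the behaviour described for _set_from_args() in this thread: named arguments are accepted with or without a leading dash and in any case, and each known parameter gets a simple get/set accessor generated on the fly. It is plain Perl with a hypothetical My::Wrapper class, not BioPerl's actual implementation; in a real wrapper those methods come from BioPerl's root classes.

```perl
use strict;
use warnings;

package My::Wrapper;    # hypothetical class, not part of BioPerl

# Parameters this wrapper accepts; each one gets a get/set accessor.
my @PARAMS = qw(program_dir quiet outfile);

# Install one simple accessor per parameter when this code runs.
for my $param (@PARAMS) {
    no strict 'refs';
    *{"My::Wrapper::$param"} = sub {
        my $self = shift;
        $self->{$param} = shift if @_;
        return $self->{$param};
    };
}

sub new {
    my ($class, %args) = @_;
    my $self  = bless {}, $class;
    my %known = map { $_ => 1 } @PARAMS;
    for my $key (keys %args) {
        # Accept -Quiet, -quiet or quiet: drop one leading dash, lower-case.
        (my $name = lc $key) =~ s/^-//;
        die "unknown parameter '$key'\n" unless $known{$name};
        $self->$name($args{$key});
    }
    return $self;
}

package main;

my $wrapper = My::Wrapper->new(-Quiet => 1, outfile => 'out.txt');
print $wrapper->quiet, ' ', $wrapper->outfile, "\n";    # prints "1 out.txt"
```

By contrast, _rearrange() on its own only returns the argument values in a fixed order; it does not create accessors.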
From cjfields at uiuc.edu Wed Feb 28 20:19:31 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 28 Feb 2007 19:19:31 -0600 Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer In-Reply-To: <45E61AA9.9030906@sendu.me.uk> References: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu> <45E61AA9.9030906@sendu.me.uk> Message-ID: <93734147-BDDE-4D73-B8F1-FB4A9D073F9B@uiuc.edu> On Feb 28, 2007, at 6:13 PM, Sendu Bala wrote: > Mark Johnson wrote: >> I'm using _rearrange() now. I'll look at _set_from_args(). Is >> either one >> preferred to the other? > > _set_from_args() is implemented using _rearrange() iirc. In any case, > they do different things but _set_from_args() just makes creating > wrapper modules a lot simpler. Another example: compare revisions 1.15 > and 1.16 of Bio::Tools::Run::Alignment::Lagan where I reimplemented it > to use _set_from_args() and _setparams(). > > http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-run/Bio/Tools/Run/ > Alignment/Lagan.pm.diff? > r1=text&tr1=1.15&r2=text&tr2=1.16&diff_format=h > > So, its new, but I'd recommend new modules, especially wrappers, make > use of it. Agreed; I think it allows for parameter variations (dashed, dashless, etc) and can create on-the-fly simple get/setters, so is particularly suited for wrappers. _rearrange() will always have use in situations where using named parameters helps (long arg lists) but you don't want get/setters, just values. From dmessina at wustl.edu Wed Feb 28 20:40:39 2007 From: dmessina at wustl.edu (Dave Messina) Date: Wed, 28 Feb 2007 19:40:39 -0600 (CST) Subject: [Bioperl-l] Problem installing bioperl-1.5.2_102 In-Reply-To: References: Message-ID: <58485.75.33.119.169.1172713239.squirrel@gscmail.wustl.edu> > t/compat.t 5 1280 60 5 8.33% 25-28 31 This is the test that failed. I think you snipped the part above where the actual errors causing the failure was printed. > There seems to be running another CPAN process (pid 7207). Contacting... 
> Lockfile /Users/sand60/.cpan/.lock not writeable by you. Cannot proceed. > On UNIX try: > rm /Users/sand60/.cpan/.lock > and then rerun us. > at -e line 1 > ################################################### > And doing what it says, removing some lock file doesn't help. Are you sure the lock file is really being removed? If so, what was the error you got when running it after doing that? Also, this line is important: > /usr/bin/make test -- NOT OK It looks like you're trying to install on OS X. By default, OS X has perl but not make. So /usr/bin/make probably doesn't exist on your system, along with lots of other UNIX tools you'll want. To verify this, type: which /usr/bin/make on the command line. If you get: /usr/bin/make: Command not found. you'll need to install the OS X developer tools, called Xcode. You'll need to register first, but you can get the latest version at: http://developer.apple.com/tools/download/ After you do that, reread the BioPerl install docs and try to install again. Since you don't have root on your machine, be sure to read the part of the install instructions that describe what to do. Dave From hlapp at gmx.net Wed Feb 28 23:16:38 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 28 Feb 2007 23:16:38 -0500 Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer In-Reply-To: References: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu> Message-ID: On Feb 28, 2007, at 5:54 PM, Mark Johnson wrote: > I don't like using system() due to issues with > signals (Such as the user hitting ctrl-c and taking out the > child). I feel > better knowing the wrapped executable is in another process > disconnected > from the console. I'm not sure how the user would be able to take out the child hitting ctrl-c if you run it through system() (except if the parent terminates first - but maybe then terminating a run-away child is in good order). 
I haven't read the IPC::Run POD in full detail but you will want to make sure that if the parent gets killed the child does get killed too, or otherwise you'll have a run-away process that novices will have trouble with understanding or terminating. Other than that though IPC::Run seems like a useful module, so incurring this as a dependency should be fine. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cuiw at ncbi.nlm.nih.gov Thu Feb 1 09:47:38 2007 From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C]) Date: Thu, 1 Feb 2007 09:47:38 -0500 Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ? In-Reply-To: <45C1059D.1070100@tbi.univie.ac.at> References: <45C1059D.1070100@tbi.univie.ac.at> Message-ID: <18C407FD4FFB424292D769FBD68C1987020BB960@NIHCESMLBX8.nih.gov> This is a simple test from gene ID 3632373 (protein is 46100068) to contig coordinates: perl -MLWP::Simple -e 'map {print $_, "\n" if /<(Gene-source_src.*?>)(.*)?<$1/} (split "\n", get(q{http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=3632373&retmode=xml}))' You need to translate protein id to gene id though. If the genome is available at Map Viewer, try (the contig name is NW_101115 from last step) http://www.ncbi.nlm.nih.gov/mapview/maps.cgi?taxid=5270&gnl=NW_101115&MAPS=genes&cmd=txt Wenwu Cui, PhD -----Original Message----- From: Rainer Machne [mailto:raim at tbi.univie.ac.at] Sent: Wednesday, January 31, 2007 4:10 PM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ? Dear Bioperl list, hoping not to be on the wrong email list, I would have a short question: Is there a standard way or are there nice (Bioperl) tools to come from a gene id (gi) or other ids (see below) to the genomic coordinates of the respective gene?
We have Fasta files retrieved from NCBI protein Blast in fungal genomes: >gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago maydis 521] or >gi|50292953|ref|XP_448909.1| unnamed protein product [Candida glabrata] (we only have gi, ref and gb in my set). I retrieved all my fasta files from whole fungal genomes with available protein sequences at http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=fungi As I only searched whole finished genomes (not shotgun), I thought it would then be easy to get the genomic coordinates and retrieve upstream sequences, but we have failed so far to find a consistent way to do this automatically. Many of the gi entries refer to mRNAs or partial mRNAs and the way to the coordinates seems to differ for each case. Any suggestions would be appreciated. with kind regards, Rainer Machne University of Vienna Department for Theoretical Chemistry Theoretical Biochemistry Group _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From raim at tbi.univie.ac.at Thu Feb 1 07:54:21 2007 From: raim at tbi.univie.ac.at (Rainer Machne) Date: Thu, 01 Feb 2007 13:54:21 +0100 Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ? In-Reply-To: References: <45C1059D.1070100@tbi.univie.ac.at> Message-ID: <45C1E2FD.3070709@tbi.univie.ac.at> Barry and Jason, thanks for your quick and very helpful replies. I guess we should have done (or repeat) our blast search at http://fungal.genome.duke.edu/ to get better mapping from proteins to genomes ? As I retrieved all my proteins via whole genome blasts we should find (most of) them in the genbank files ... a good opportunity for me to learn some Bioperl and the other packages you mentioned in case we want to do more complex analysis later :-) Thank you very much! 
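Since the starting point here is a set of NCBI FASTA deflines, a first step toward any Entrez lookup is pulling the gi number and accession out of each header. A minimal core-Perl sketch, assuming the gi|number|db|accession| layout shown in the question; the function name parse_ncbi_defline is hypothetical:

```perl
use strict;
use warnings;

# Parse an NCBI-style FASTA defline such as
#   >gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago maydis 521]
# into a hash of id tags (gi, gb, ref, ...) plus the free-text description.
sub parse_ncbi_defline {
    my ($line) = @_;
    chomp $line;
    $line =~ s/^>//;
    my ($ids, $desc) = split /\s+/, $line, 2;
    my %rec    = (description => $desc // '');
    my @fields = split /\|/, $ids;    # trailing empty field is dropped
    # The id string is a run of tag|value pairs: gi|...|gb|...|
    while (@fields >= 2) {
        my ($tag, $value) = splice @fields, 0, 2;
        $rec{$tag} = $value;
    }
    return \%rec;
}

for my $defline (
    '>gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago maydis 521]',
    '>gi|50292953|ref|XP_448909.1| unnamed protein product [Candida glabrata]',
) {
    my $rec = parse_ncbi_defline($defline);
    my ($db) = grep { $_ ne 'gi' && $_ ne 'description' } sort keys %$rec;
    print "gi=$rec->{gi} $db=$rec->{$db}\n";
}
# prints:
# gi=46100068 gb=EAK85301.1
# gi=50292953 ref=XP_448909.1
```

The gi numbers could then be fed to an Entrez lookup such as the efetch call Wenwu Cui posted; mapping protein ids to gene ids is a separate step not covered here.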
Rainer Barry Moore wrote: > Rainer, > > We use a perl library called CGL written by Mark Yandell and colleagues > (which in turn uses Chris Mungal's BioChaos and Unflattener.pm referred > to by Jason) for this type of task. The basic pipeline is convert > GenBank files to Chaos XML, then use CGL with those XML files to get a > nice object oriented access to exons, transcripts, proteins, > coordinates and more for of those genes. I am currently using this > with good success on most GenBank genomes (unfortunately I haven't been > working with the fungal genomes, but it should work fine). The Ensembl > API provides similar functionality for Ensembl genomes - but not very > many fungi there. > > http://www.yandell-lab.org/cgl/ > http://www.ensembl.org/info/software/core/core_tutorial.html > > Feel free to contact Mark or myself directly if you are interested in > using CGL. > > Barry > > On Jan 31, 2007, at 2:09 PM, Rainer Machne wrote: > >> Dear Bioperl list, >> >> hoping not be on the wrong email list, i would have a short question: >> >> Is there a standard way or are there nice (Bioperl) tools to come from a >> gene id (gi) other ids (see below) to the genomic coordinates of the >> respective gene? >> >> We have Fasta files retrieved from NCBI protein Blast in fungal genomes: >> >>> gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago >> >> maydis 521] >> or >> >>> gi|50292953|ref|XP_448909.1| unnamed protein product [Candida glabrata] >> >> >> (we only have gi, ref and gb in my set). >> >> I retrieved all my fasta files from whole fungal genomes with available >> protein sequences at >> http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=fungi >> >> As I only searched whole finished genomes (not shotgun), I thought it >> would then be easy to get the genomic coordinates and retrieve upstream >> sequences, but we have failed so far to find a consistent way to do this >> automatically. 
Many of the gi entries refer to mRNAs or partial mRNAs >> and the way to the coordinates seems to differ for each case. >> >> Any suggestions would be appreciated. >> >> with kind regards, >> Rainer Machne >> >> University of Vienna >> Department for Theoretical Chemistry >> Theoretical Biochemistry Group >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at uiuc.edu Thu Feb 1 12:55:27 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 1 Feb 2007 11:55:27 -0600 Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ? In-Reply-To: <45C1E2FD.3070709@tbi.univie.ac.at> References: <45C1059D.1070100@tbi.univie.ac.at> <45C1E2FD.3070709@tbi.univie.ac.at> Message-ID: On Feb 1, 2007, at 6:54 AM, Rainer Machne wrote: > Barry and Jason, > > thanks for your quick and very helpful replies. > > I guess we should have done (or repeat) our blast search at > http://fungal.genome.duke.edu/ > to get better mapping from proteins to genomes ? > > As I retrieved all my proteins via whole genome blasts we should find > (most of) them in the genbank files ... a good opportunity for me to > learn some Bioperl and the other packages you mentioned in case we > want > to do more complex analysis later :-) > > Thank you very much! > > Rainer If the data is available in GenBank you could run the BLAST searches at NCBI and limit the search with an Entrez query: http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#entrez_query Most (all?) genome files are tagged as complete I'm not sure but there might be a way of doing this via Bio::Tools::Run::RemoteBlast. Jason, any ideas? chris From cjfields at uiuc.edu Thu Feb 1 13:09:16 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 1 Feb 2007 12:09:16 -0600 Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ? 
In-Reply-To: References: <45C1059D.1070100@tbi.univie.ac.at> <45C1E2FD.3070709@tbi.univie.ac.at> Message-ID: <748CC48E-D224-4234-A5C4-E33968F17418@uiuc.edu> > If the data is available in GenBank you could run the BLAST searches > at NCBI and limit the search with an Entrez query: > > http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#entrez_query > > Most (all?) genome files are tagged as complete sorry, didn't finish that... "Most (all?) genome files are tagged as complete, wgs, in progress, etc. and can be limited by taxonomy using Fungi[ORGN] or similar." chris From jason at bioperl.org Thu Feb 1 13:36:02 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 1 Feb 2007 10:36:02 -0800 Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ? In-Reply-To: References: <45C1059D.1070100@tbi.univie.ac.at> <45C1E2FD.3070709@tbi.univie.ac.at> Message-ID: On Feb 1, 2007, at 9:55 AM, Chris Fields wrote: > > On Feb 1, 2007, at 6:54 AM, Rainer Machne wrote: > >> Barry and Jason, >> >> thanks for your quick and very helpful replies. >> >> I guess we should have done (or repeat) our blast search at >> http://fungal.genome.duke.edu/ >> to get better mapping from proteins to genomes ? >> Well I'm not quite sure of your exact goals. To find upstream regions of known genes, or look at upstream regions of orthologous genes? You can first figure out orthologs based on protein similarities, then go in an extract upstream regions for the orthologous genes (I provide a link to a big all-vs-all FASTA result at the bottom of the page if you want those results, as well as some pairiwise orthology assignments, although you may want more or less stringent parameters). All the GFF and AA data is freely available for download on the site for each genome we've annotated or for annotation we've re-formatted so you can do things locally and/or modify it to your liking. 
>> As I retrieved all my proteins via whole genome blasts we should find >> (most of) them in the genbank files ... a good opportunity for me to >> learn some Bioperl and the other packages you mentioned in case we >> want >> to do more complex analysis later :-) >> >> Thank you very much! >> >> Rainer > > If the data is available in GenBank you could run the BLAST > searches at NCBI and limit the search with an Entrez query: > > http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#entrez_query > > Most (all?) genome files are tagged as complete > > I'm not sure but there might be a way of doing this via > Bio::Tools::Run::RemoteBlast. Jason, any ideas? > > chris -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From reenayadav at gmail.com Thu Feb 1 13:38:03 2007 From: reenayadav at gmail.com (Reena Yadav) Date: Fri, 2 Feb 2007 00:08:03 +0530 Subject: [Bioperl-l] pdb parser Message-ID: <76f897dd0702011038v7afe0207gb05465478e026205@mail.gmail.com> hi need to extract pdb atomic coordinates (1ake), and do certain calculations. i am going stepwise: steps that involved are: (1) reading the atomic coordinates (2) read the result in a file. need to understand how to whole xyz line in another file. could someone help. R. From jason at bioperl.org Thu Feb 1 08:06:42 2007 From: jason at bioperl.org (sandhya khatal) Date: Thu, 1 Feb 2007 13:06:42 +0000 Subject: [Bioperl-l] Regarding Bioperl program Message-ID: <75899ED1-72C6-4272-8CAC-028CF133A0B4@gmail.com> Respected Sir, I want to do a program which gives dendrogram like UPGMA a clustering method, but i want this dendrogram by using single linkage or centroid method.Can u help me for this.U have given the code for tree but i want dendrogram as output by using above any method. Thanks for anticipating. Regards, Sandhya Khatal. 
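Jason's reply points out that BioPerl's tree-building code works on distance data. For the single-linkage case Sandhya asks about, the clustering itself is short enough to sketch in plain Perl. The version below is an illustration, not a BioPerl API: single_linkage_newick is a hypothetical name. It takes a symmetric distance matrix, repeatedly merges the two closest clusters (cluster distance = smallest pairwise distance between their members), and writes each merge height as an internal-node label in a Newick string.

```perl
use strict;
use warnings;

# Single-linkage agglomerative clustering on a symmetric distance matrix.
# Returns a Newick string; each internal node is labelled with the
# distance at which its two children were merged.
sub single_linkage_newick {
    my ($labels, $dist) = @_;    # $dist->[$i][$j]: distance between items i and j

    # Every cluster keeps its Newick text and the original item indices.
    my @clusters = map { +{ newick => $labels->[$_], members => [$_] } } 0 .. $#$labels;

    while (@clusters > 1) {
        my ($best_p, $best_q, $best_d);
        for my $p (0 .. $#clusters - 1) {
            for my $q ($p + 1 .. $#clusters) {
                # Single linkage: distance of the closest pair across the two clusters.
                my $d;
                for my $i (@{ $clusters[$p]{members} }) {
                    for my $j (@{ $clusters[$q]{members} }) {
                        $d = $dist->[$i][$j] if !defined $d || $dist->[$i][$j] < $d;
                    }
                }
                ($best_p, $best_q, $best_d) = ($p, $q, $d)
                    if !defined $best_d || $d < $best_d;
            }
        }
        my $merged = {
            newick  => "($clusters[$best_p]{newick},$clusters[$best_q]{newick})$best_d",
            members => [ @{ $clusters[$best_p]{members} }, @{ $clusters[$best_q]{members} } ],
        };
        splice @clusters, $best_q, 1;             # remove the later cluster first ...
        splice @clusters, $best_p, 1, $merged;    # ... then replace the earlier one
    }
    return "$clusters[0]{newick};";
}

my @labels = qw(A B C D);
my @dist = (
    [0, 1, 4, 6],
    [1, 0, 3, 5],
    [4, 3, 0, 2],
    [6, 5, 2, 0],
);
print single_linkage_newick(\@labels, \@dist), "\n";    # prints ((A,B)1,(C,D)2)3;
```

Taking the maximum instead of the minimum in the innermost loop gives complete linkage, and average linkage would use the mean of the pairwise distances. The approach is cubic in the number of items overall, so it suits modest data sets only.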
From jason at bioperl.org Thu Feb 1 19:55:26 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 1 Feb 2007 16:55:26 -0800 Subject: [Bioperl-l] Fwd: Regarding Bioperl program References: <394d31ee0702010506j4bbd79dck41d5ac2162eaafdd@mail.gmail.com> Message-ID: <40020502-3421-407D-85EB-24F420AB699C@bioperl.org> re-forwarding Sandhya's email to the list so the email address is visible. The approach that is coded in bioperl is for distance based data such as evolutionary distance of DNA or protein sequences - I assume you are talking about clustering expression data? You may want to focus on the available literature and toolkits that focus on expression data - something BioPerl doesn't deliberately focus on right now. -jason Begin forwarded message: > From: "sandhya khatal" > Date: February 1, 2007 5:06:42 AM PST > To: jason at bioperl.org > Subject: Regarding Bioperl program > > Respected Sir, > I want to do a program which gives dendrogram > like > UPGMA a clustering method, but i want this dendrogram by using single > linkage or centroid method.Can u help me for this.U have given the > code for > tree but i want dendrogram as output by using above any method. > > Thanks for anticipating. > > Regards, > Sandhya Khatal. -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From lzhtom at hotmail.com Thu Feb 1 22:20:10 2007 From: lzhtom at hotmail.com (zhihua li) Date: Fri, 02 Feb 2007 03:20:10 +0000 Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file? Message-ID: _________________________________________________________________ ???????? MSN Explorer: http://explorer.msn.com/lccn/ From lzhtom at hotmail.com Thu Feb 1 22:27:39 2007 From: lzhtom at hotmail.com (zhihua li) Date: Fri, 02 Feb 2007 03:27:39 +0000 Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file? 
Message-ID: Sorry guys, the former empty mail was sent out by mistake. I'm using Bio::Index::Fasta to index a file containing lots of sequences in fasta format. All is fine except one thing. According to the bioperl tutorial and the documents, the following code will make an indexed file: my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx", -write_flag => 1); $inx->make_index("test.fasta"); And in another script I can access the indexed file by saying $ENV{BIOPERL_INDEX} = "."; # find index in current directory my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx"); my $seq=$inx->fetch("ent1001"); #fetch the sequence named ent1001 However, after running the first script, I cannot find a new file test.fasta.idx in my current directory. And not surprisingly, when I ran the second script, perl told me it couldn't find "test.fasta.idx". What's going on here? Thanks a lot! From jason at bioperl.org Fri Feb 2 01:24:44 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 1 Feb 2007 22:24:44 -0800 Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file? In-Reply-To: References: Message-ID: I don't think BIOPERL_INDEX does anything in the module, so that documentation is not quite right. The env variable is used in the scripts/index/bp_index and bp_fetch scripts, so maybe a cut+paste job went bad somewhere. You need to specify the full path you want with -filename; you can just prepend the BIOPERL_INDEX to the filename, like: -filename => $ENV{BIOPERL_INDEX}."/$index" -jason On Feb 1, 2007, at 7:27 PM, zhihua li wrote: > Sorry guys, the former empty mail was sent out by mistake. > > I'm using Bio::index::Fasta to index a file containing lots of > sequences in fasta format. All is fine except one thing.
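Following Jason's suggestion, the path handed to -filename has to be built by the script itself, since the module does not consult BIOPERL_INDEX. A small sketch of such a helper, assuming the index should fall back to the current directory when the variable is unset; index_path is a hypothetical name, and the Bio::Index::Fasta calls appear only in a comment, mirroring the ones in the original post:

```perl
use strict;
use warnings;
use File::Spec;

# Build an absolute path for an index file. Following Jason's reply,
# Bio::Index::Fasta itself does not read BIOPERL_INDEX, so the script
# prepends it; when the variable is unset or empty, fall back to the
# current directory.
sub index_path {
    my ($name) = @_;
    my $dir = (defined $ENV{BIOPERL_INDEX} && length $ENV{BIOPERL_INDEX})
        ? $ENV{BIOPERL_INDEX}
        : File::Spec->curdir;
    return File::Spec->rel2abs(File::Spec->catfile($dir, $name));
}

print index_path('test.fasta.idx'), "\n";

# Intended use (BioPerl calls as in the original post, not run here):
#   my $inx = Bio::Index::Fasta->new(-filename   => index_path('test.fasta.idx'),
#                                    -write_flag => 1);
#   $inx->make_index('test.fasta');
```

Using the same helper in both the indexing and the fetching script keeps them pointed at the same file.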
> > According to the bioperl tutorial and the documents, the following > code will make a indexed file: > > my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx", > -write_flag => 1); > $inx->make_index("test.fasta"); > > And in another script I can access the indexed file by sayinig > > $ENV{BIOPERL_INDEX} = "."; # find index in current directory > my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx"); > my $seq=$inx->fetch("ent1001"); #fetch the sequence named > ent1001 > > However, after running the first script, I cannot find a new file > test.fasta.idx in my current directory. And not surprisingly, when > I ran the second script, perl told me it couldn't find > "test.fasta.idx". > > What's going on here? > > Thanks a lot! > > _________________________________________________________________ > ?????????????? MSN Messenger: http:// > messenger.msn.com/cn > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From marian.thieme at lycos.de Fri Feb 2 05:06:09 2007 From: marian.thieme at lycos.de (marian thieme) Date: Fri, 2 Feb 2007 10:06:09 +0000 Subject: [Bioperl-l] seqDiff Message-ID: <101051013116870@lycos-europe.com> An HTML attachment was scrubbed... URL: From marian.thieme at lycos.de Fri Feb 2 06:37:05 2007 From: marian.thieme at lycos.de (marian thieme) Date: Fri, 2 Feb 2007 11:37:05 +0000 Subject: [Bioperl-l] susp. header Message-ID: <188661178024725@lycos-europe.com> An HTML attachment was scrubbed... 
URL: From lubapardo at gmail.com Fri Feb 2 09:31:06 2007 From: lubapardo at gmail.com (Luba Pardo) Date: Fri, 2 Feb 2007 15:31:06 +0100 Subject: [Bioperl-l] Problem using Bio::DB::Query::GenBank; Message-ID: <58ff33550702020631l4e7bc59dmabcf8c72fa67a6d5@mail.gmail.com> Hello, (I am using bioperl-1.5.2_100, linux machine) I am trying to get the ids of a list of genes using the module Bio::DB::Query::GenBank. I have the following code: use Bio::DB::Query::GenBank; use strict; use warnings; open (READER_1,"list.txt") || die "\n I can't open the file READER_1 !!\n"; my @a1=<READER_1>; close (READER_1); for (my $i=0; $i<=$#a1;$i=$i+1 ) { my @a1_s=split/\s+/,$a1[$i]; my $query_string = ' Homo Sapiens[Organism] AND $a1[$i] '; my $query = Bio::DB::Query::GenBank->new(-db=>'Protein', -query=>$query_string ); my $count = $query->count; my @ids = $query->ids; print " gene: $a1[$i] first id is $ids[0] o no? \n"; I want to tell the program to get all the genes contained in the file list.txt and to retrieve the ids from GenBank. However the program gives me the following error: ------------EXCEPTION: Bio::Root::Exception ------------- MSG: Id list has been truncated even after maxids requested STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359 STACK: Bio::DB::Query::WebQuery::_fetch_ids /usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:236 STACK: Bio::DB::Query::WebQuery::ids /usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:200 STACK: query.pl:27 ------------------ Is that a problem if I try to use the $a1[$i] to replace the name of the gene? I thank you beforehand for the attention you may pay to this message Regards, Luba Pardo From hlapp at gmx.net Fri Feb 2 10:44:02 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 2 Feb 2007 10:44:02 -0500 Subject: [Bioperl-l] susp. header In-Reply-To: <188661178024725@lycos-europe.com> References: <188661178024725@lycos-europe.com> Message-ID: You are sending HTML emails.
You should configure your mailer to ideally just send plain text. If you really must have fancy formatted emails (i.e., HTML-formatted emails), then configure it such that the mailer will send a plain text and a HTML version. (Many spam filters will flag email the body of which consists of only an HTML attachment.) -hilmar On Feb 2, 2007, at 6:37 AM, marian thieme wrote: > why each message I sent to this list is considered to have a susp. > header ? > > Marian > > Schreiben Sie sich kostenlos ein und erhalten Sie eine Liste mit > 20 Singles aus Ihrer Umgebung.Meetic.de > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cain.cshl at gmail.com Fri Feb 2 11:03:16 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Fri, 02 Feb 2007 11:03:16 -0500 Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml In-Reply-To: <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu> References: <1170359746.2706.622.camel@localhost.localdomain> <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu> Message-ID: <1170432196.2706.661.camel@localhost.localdomain> Hi Hilmar, That is a good idea; when I started down this road, it felt like there would only be a few things that I might want to allow to be different, but I think you are right that having one standard implementation that can be subclassed for legacy systems is a good thing. 
Scott On Fri, 2007-02-02 at 10:09 -0500, Hilmar Lapp wrote: > On Feb 1, 2007, at 2:55 PM, Scott Cain wrote: > > > The second main change was to introduce a -flybase_compat argument > > when > > initializing the Bio::SeqIO writer, so that 'old style' cv and cvterms > > (that are compatable with flybase) will be used, but now the default > > will be to use current standards: > > Just my $0.02 ... obviously, Flybase may be the only organization > that uses an 'old style' or any other way not compliant with 'current > standards' (presumably SO), but if it's not the only one then this > approach won't scale. > > Also, an argument -flybase_compat suggests to the unsuspecting that > this is an endorsed flavor of the standard and fine to use for > everyone else too. > > If Flybase is idiosyncratic in this way, why not make chadoxml.pm > compliant with the standard as we all want it, keep it free from > litter caused by usage of old versions of SO, and create a second > module fb-chadoxml.pm that inherits from the first and merely > overrides a few things so that it works for Flybase. This way, other > organizations with similar needs can follow the path and create their > own xyz-chadoxml.pm, rather than having to muck around in the > chadoxml.pm that comes with the distribution. > > I'm not sure I fully grasp the underlying issue, so I may not make > much sense here. Apologies if that's the case ... > > -hilmar -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From bosborne11 at verizon.net Fri Feb 2 10:27:44 2007 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 02 Feb 2007 10:27:44 -0500 Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml In-Reply-To: <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu> Message-ID: Hilmar, I second your motion, good idea. Let's keep the standard module nice and clean. Brian O. On 2/2/07 10:09 AM, "Hilmar Lapp" wrote: > and create a second > module fb-chadoxml.pm that inherits from the first and merely > overrides a few things so that it works for Flybase From Kevin.M.Brown at asu.edu Fri Feb 2 10:52:20 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Fri, 2 Feb 2007 08:52:20 -0700 Subject: [Bioperl-l] Problem using Bio::DB::Query::GenBank; References: <58ff33550702020631l4e7bc59dmabcf8c72fa67a6d5@mail.gmail.com> Message-ID: <1A4207F8295607498283FE9E93B775B402AABA1C@EX02.asurite.ad.asu.edu> It looks like you have some problems with the code you posted. use Bio::DB::Query::GenBank; use strict; use warnings; open (READER_1,"list.txt") || die "\n I can't open the file READER_1 !!\n"; my @a1=<READER_1>; close (READER_1); for (my $i=0; $i < @a1;$i++ ) { # is this necessary as you don't seem to use it anywhere later in your code. my @a1_s=split/\s+/,$a1[$i]; # you enclosed the variable in '' which means perl won't evaluate it # changed the query so that perl can evaluate the variable my $query_string = ' Homo Sapiens[Organism] AND '.$a1[$i] .' '; my $query = Bio::DB::Query::GenBank->new(-db=>'Protein', -query=>$query_string ); my $count = $query->count; my @ids = $query->ids; print " gene: $a1[$i] first id is $ids[0] o no?
\n"; -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Luba Pardo Sent: Friday, February 02, 2007 7:31 AM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Problem using Bio::DB::Query::GenBank; Hello, (I am using bioperl-1.5.2_100, linux machine) I am trying to get the ids of a list of genes using the module Bio::DB::Query::GenBank. I have the following code: use Bio::DB::Query::GenBank; use strict; use warnings; open (READER_1,"list.txt") || die "\n I can't open the file READER_1 !!\n"; my @a1=<READER_1>; close (READER_1); for (my $i=0; $i<=$#a1;$i=$i+1 ) { my @a1_s=split/\s+/,$a1[$i]; my $query_string = ' Homo Sapiens[Organism] AND $a1[$i] '; my $query = Bio::DB::Query::GenBank->new(-db=>'Protein', -query=>$query_string ); my $count = $query->count; my @ids = $query->ids; print " gene: $a1[$i] first id is $ids[0] o no? \n"; I want to tell the program to get all the genes contained in the file list.txt and to retrieve the ids from GenBank. However the program gives me the following error: ------------EXCEPTION: Bio::Root::Exception ------------- MSG: Id list has been truncated even after maxids requested STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359 STACK: Bio::DB::Query::WebQuery::_fetch_ids /usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:236 STACK: Bio::DB::Query::WebQuery::ids /usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:200 STACK: query.pl:27 ------------------ Is that a problem if I try to use the $a1[$i] to replace the name of the gene?
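Two things in that loop affect the query that is actually sent: single quotes stop $a1[$i] from being interpolated (Kevin's fix), and lines read from list.txt still carry their trailing newline unless they are chomped. A minimal sketch of reading the gene list and building one query string per gene, assuming one gene name in the first column of each line; build_queries is a hypothetical helper and the in-memory filehandle stands in for list.txt:

```perl
use strict;
use warnings;

# Read one gene name per line (first column) and build an Entrez query
# for each. Double quotes make Perl interpolate $gene, and chomp removes
# the newline that would otherwise end up inside the query.
sub build_queries {
    my ($fh) = @_;
    my @queries;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;    # trim stray whitespace
        next unless length $line;   # skip blank lines
        my ($gene) = split /\s+/, $line;
        push @queries, "Homo sapiens[Organism] AND $gene";
    }
    return @queries;
}

# Demonstration on an in-memory file instead of list.txt:
open my $fh, '<', \"BRCA1\nTP53 extra column\n\n" or die $!;
print "$_\n" for build_queries($fh);
# prints:
# Homo sapiens[Organism] AND BRCA1
# Homo sapiens[Organism] AND TP53
```

Each string could then be passed as -query to Bio::DB::Query::GenBank->new(...), as in the original loop.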
I thank before hand for the attention you may pay to this message Regards, Luba Pardo _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri Feb 2 11:37:49 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 2 Feb 2007 10:37:49 -0600 Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml In-Reply-To: <1170432196.2706.661.camel@localhost.localdomain> References: <1170359746.2706.622.camel@localhost.localdomain> <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu> <1170432196.2706.661.camel@localhost.localdomain> Message-ID: <64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu> I was going to suggest maybe allowing one to switch out XML handlers/ writers based on the style (ala XML::SAX), but I see that chadoxml currently uses XML::Writer and there is no next_seq() implemented. Oh well... chris On Feb 2, 2007, at 10:03 AM, Scott Cain wrote: > Hi Hilmar, > > That is a good idea; when I started down this road, it felt like there > would only be a few things that I might want to allow to be different, > but I think you are right that having one standard implementation that > can be subclassed for legacy systems is a good thing. > > Scott > > > On Fri, 2007-02-02 at 10:09 -0500, Hilmar Lapp wrote: >> On Feb 1, 2007, at 2:55 PM, Scott Cain wrote: >> >>> The second main change was to introduce a -flybase_compat argument >>> when >>> initializing the Bio::SeqIO writer, so that 'old style' cv and >>> cvterms >>> (that are compatable with flybase) will be used, but now the default >>> will be to use current standards: >> >> Just my $0.02 ... obviously, Flybase may be the only organization >> that uses an 'old style' or any other way not compliant with 'current >> standards' (presumably SO), but if it's not the only one then this >> approach won't scale. 
>> >> Also, an argument -flybase_compat suggests to the unsuspecting that >> this is an endorsed flavor of the standard and fine to use for >> everyone else too. >> >> If Flybase is idiosyncratic in this way, why not make chadoxml.pm >> compliant with the standard as we all want it, keep it free from >> litter caused by usage of old versions of SO, and create a second >> module fb-chadoxml.pm that inherits from the first and merely >> overrides a few things so that it works for Flybase. This way, other >> organizations with similar needs can follow the path and create their >> own xyz-chadoxml.pm, rather than having to muck around in the >> chadoxml.pm that comes with the distribution. >> >> I'm not sure I fully grasp the underlying issue, so I may not make >> much sense here. Apologies if that's the case ... >> >> -hilmar > -- > ---------------------------------------------------------------------- > -- > Scott Cain, Ph. D. > cain.cshl at gmail.com > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From hlapp at gmx.net Fri Feb 2 11:45:30 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 2 Feb 2007 11:45:30 -0500 Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml In-Reply-To: <64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu> References: <1170359746.2706.622.camel@localhost.localdomain> <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu> <1170432196.2706.661.camel@localhost.localdomain> <64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu> Message-ID: <3A3FF1B0-129A-410D-9DE2-D1CB28015C9A@gmx.net> There must be at least a stub for next_seq(). It may throw a not- implemented exception, but it should not just be absent. 
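The stub Hilmar asks for can be illustrated with a minimal plain-Perl sketch. It shows a write-only writer class that still defines `next_seq`, but only so that it can die with a clear "not implemented" message. `My::ChadoXMLWriter` and its `written` buffer are hypothetical stand-ins, not the actual Bio::SeqIO::chadoxml code. A real BioPerl module would more likely throw through its Bio::Root base class than call `die` directly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical write-only sequence writer; not the real Bio::SeqIO::chadoxml.
package My::ChadoXMLWriter;

sub new {
    my ($class, %args) = @_;
    return bless { written => [] }, $class;
}

# Reading is not supported, but the method exists so callers get a clear
# error instead of Perl's generic "Can't locate object method".
sub next_seq {
    my ($self) = @_;
    die ref($self) . "::next_seq is not implemented: this format is write-only\n";
}

sub write_seq {
    my ($self, @seqs) = @_;
    push @{ $self->{written} }, @seqs;   # stand-in for real XML output
    return 1;
}

package main;

my $out = My::ChadoXMLWriter->new;
$out->write_seq('seq1');

# Calling the stub inside eval shows the error a caller would see.
my $ok  = eval { $out->next_seq; 1 };
my $err = $@;
print $ok ? "read succeeded\n" : "caught: $err";
```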
-hilmar On Feb 2, 2007, at 11:37 AM, Chris Fields wrote: > I was going to suggest maybe allowing one to switch out XML > handlers/writers based on the style (ala XML::SAX), but I see that > chadoxml currently uses XML::Writer and there is no next_seq() > implemented. Oh well... > > chris > > On Feb 2, 2007, at 10:03 AM, Scott Cain wrote: > >> Hi Hilmar, >> >> That is a good idea; when I started down this road, it felt like >> there >> would only be a few things that I might want to allow to be >> different, >> but I think you are right that having one standard implementation >> that >> can be subclassed for legacy systems is a good thing. >> >> Scott >> >> >> On Fri, 2007-02-02 at 10:09 -0500, Hilmar Lapp wrote: >>> On Feb 1, 2007, at 2:55 PM, Scott Cain wrote: >>> >>>> The second main change was to introduce a -flybase_compat argument >>>> when >>>> initializing the Bio::SeqIO writer, so that 'old style' cv and >>>> cvterms >>>> (that are compatable with flybase) will be used, but now the >>>> default >>>> will be to use current standards: >>> >>> Just my $0.02 ... obviously, Flybase may be the only organization >>> that uses an 'old style' or any other way not compliant with >>> 'current >>> standards' (presumably SO), but if it's not the only one then this >>> approach won't scale. >>> >>> Also, an argument -flybase_compat suggests to the unsuspecting that >>> this is an endorsed flavor of the standard and fine to use for >>> everyone else too. >>> >>> If Flybase is idiosyncratic in this way, why not make chadoxml.pm >>> compliant with the standard as we all want it, keep it free from >>> litter caused by usage of old versions of SO, and create a second >>> module fb-chadoxml.pm that inherits from the first and merely >>> overrides a few things so that it works for Flybase. 
This way, other >>> organizations with similar needs can follow the path and create >>> their >>> own xyz-chadoxml.pm, rather than having to muck around in the >>> chadoxml.pm that comes with the distribution. >>> >>> I'm not sure I fully grasp the underlying issue, so I may not make >>> much sense here. Apologies if that's the case ... >>> >>> -hilmar >> -- >> --------------------------------------------------------------------- >> --- >> Scott Cain, Ph. D. >> cain.cshl at gmail.com >> GMOD Coordinator (http://www.gmod.org/) >> 216-392-3087 >> Cold Spring Harbor Laboratory >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cain.cshl at gmail.com Fri Feb 2 12:02:32 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Fri, 02 Feb 2007 12:02:32 -0500 Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml In-Reply-To: <3A3FF1B0-129A-410D-9DE2-D1CB28015C9A@gmx.net> References: <1170359746.2706.622.camel@localhost.localdomain> <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu> <1170432196.2706.661.camel@localhost.localdomain> <64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu> <3A3FF1B0-129A-410D-9DE2-D1CB28015C9A@gmx.net> Message-ID: <1170435752.2706.676.camel@localhost.localdomain> Ah, I'll go ahead and add one, though it will just throw an exception because this is a write-only adapter. Scott On Fri, 2007-02-02 at 11:45 -0500, Hilmar Lapp wrote: > There must be at least a stub for next_seq(). It may throw a not- > implemented exception, but it should not just be absent. 
> > -hilmar > > On Feb 2, 2007, at 11:37 AM, Chris Fields wrote: > > > I was going to suggest maybe allowing one to switch out XML > > handlers/writers based on the style (ala XML::SAX), but I see that > > chadoxml currently uses XML::Writer and there is no next_seq() > > implemented. Oh well... > > > > chris > > > > On Feb 2, 2007, at 10:03 AM, Scott Cain wrote: > > > >> Hi Hilmar, > >> > >> That is a good idea; when I started down this road, it felt like > >> there > >> would only be a few things that I might want to allow to be > >> different, > >> but I think you are right that having one standard implementation > >> that > >> can be subclassed for legacy systems is a good thing. > >> > >> Scott > >> > >> > >> On Fri, 2007-02-02 at 10:09 -0500, Hilmar Lapp wrote: > >>> On Feb 1, 2007, at 2:55 PM, Scott Cain wrote: > >>> > >>>> The second main change was to introduce a -flybase_compat argument > >>>> when > >>>> initializing the Bio::SeqIO writer, so that 'old style' cv and > >>>> cvterms > >>>> (that are compatable with flybase) will be used, but now the > >>>> default > >>>> will be to use current standards: > >>> > >>> Just my $0.02 ... obviously, Flybase may be the only organization > >>> that uses an 'old style' or any other way not compliant with > >>> 'current > >>> standards' (presumably SO), but if it's not the only one then this > >>> approach won't scale. > >>> > >>> Also, an argument -flybase_compat suggests to the unsuspecting that > >>> this is an endorsed flavor of the standard and fine to use for > >>> everyone else too. > >>> > >>> If Flybase is idiosyncratic in this way, why not make chadoxml.pm > >>> compliant with the standard as we all want it, keep it free from > >>> litter caused by usage of old versions of SO, and create a second > >>> module fb-chadoxml.pm that inherits from the first and merely > >>> overrides a few things so that it works for Flybase. 
This way, other > >>> organizations with similar needs can follow the path and create > >>> their > >>> own xyz-chadoxml.pm, rather than having to muck around in the > >>> chadoxml.pm that comes with the distribution. > >>> > >>> I'm not sure I fully grasp the underlying issue, so I may not make > >>> much sense here. Apologies if that's the case ... > >>> > >>> -hilmar > >> -- > >> --------------------------------------------------------------------- > >> --- > >> Scott Cain, Ph. D. > >> cain.cshl at gmail.com > >> GMOD Coordinator (http://www.gmod.org/) > >> 216-392-3087 > >> Cold Spring Harbor Laboratory > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > Christopher Fields > > Postdoctoral Researcher > > Lab of Dr. Robert Switzer > > Dept of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From peili at morgan.harvard.edu Fri Feb 2 10:56:56 2007 From: peili at morgan.harvard.edu (Peili Zhang) Date: Fri, 02 Feb 2007 10:56:56 -0500 Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml In-Reply-To: References: Message-ID: <1170431816.6583.47.camel@jacks> i 'third' Hilmar's opinion. flybase's fingerprint is shown in the module because i wrote it for fb's data loading task. no need to worry about flybase compatibility in making the module generic.
cheers, peili On Fri, 2007-02-02 at 10:27, Brian Osborne wrote: > Hilmar, > > I second your motion, good idea. Let's keep the standard module nice and > clean. > > Brian O. > > > On 2/2/07 10:09 AM, "Hilmar Lapp" wrote: > > > and create a second > > module fb-chadoxml.pm that inherits from the first and merely > > overrides a few things so that it works for Flybase > > > _______________________________________________ > Gmod-schema mailing list > Gmod-schema at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-schema > From cain.cshl at gmail.com Fri Feb 2 13:05:47 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Fri, 02 Feb 2007 13:05:47 -0500 Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml In-Reply-To: <1170431816.6583.47.camel@jacks> References: <1170431816.6583.47.camel@jacks> Message-ID: <1170439549.2706.683.camel@localhost.localdomain> Hi Peili, A little bit ago I checked in Bio::SeqIO::flybase_chadoxml that is fairly simple. My suggestion is that when you make tweaks for different scenarios, that you turn the things you are tweaking into methods in BSIO::chadoxml and then override them in flybase_chadoxml (and commit at least the chadoxml module) to make it more flexible when other people have similar scenarios. Scott On Fri, 2007-02-02 at 10:56 -0500, Peili Zhang wrote: > i 'third' Hilmar's opinion. flybase's fingerprint is shown in the module > because i wrote it for fb's data loading task. no need to worry about > flybase compatibility in making the module generic.
in fact, at flybase, > i tweak the module frequently to make it work for different scenarios. > > cheers, > peili > > On Fri, 2007-02-02 at 10:27, Brian Osborne wrote: > > Hilmar, > > > > I second your motion, good idea. Let's keep the standard module nice and > > clean. > > > > Brian O. > > > > > > On 2/2/07 10:09 AM, "Hilmar Lapp" wrote: > > > > > and create a second > > > module fb-chadoxml.pm that inherits from the first and merely > > > overrides a few things so that it works for Flybase > > > > > > _______________________________________________ > > Gmod-schema mailing list > > Gmod-schema at lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/gmod-schema > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From cjfields at uiuc.edu Fri Feb 2 15:33:46 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 2 Feb 2007 14:33:46 -0600 Subject: [Bioperl-l] seqDiff In-Reply-To: <101051013116870@lycos-europe.com> References: <101051013116870@lycos-europe.com> Message-ID: Judging by the code you'll have to recreate the SeqDiff while iterating through various alleles; there is no method to remove particular variants or purge them (at least I couldn't find one). I also noticed SeqDiff doesn't support deletions/insertions either; using a null allele (no seq) or leaving out either the mutant or original allele leads to errors. I'll look into the latter, and I may try to add a method to at least purge variants and reset dna_mut(). chris On Feb 2, 2007, at 4:06 AM, marian thieme wrote: > HI, > > is there a way to put out all mutated sequences of a seqdiff object ? > Suppose I add some variants via: > > $dnamut->add_Allele($a2); > $dnamut->add_Allele($a3); > $seqDiff->add_Variant($dnamut); > > and afterwards want to access the alternative sequences via > $seqDiff->dna_mut() > > which allele is chosen when using dna_mut(), respective can I > control to access the first or the second alternate sequence ? > If yes, how can I do this ? > > Regards, > Marian > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From MEC at stowers-institute.org Fri Feb 2 16:47:08 2007 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Fri, 2 Feb 2007 15:47:08 -0600 Subject: [Bioperl-l] Bio::DB::SeqFeature treatment of tags and annotations Message-ID: Lincoln, I don't think that adding this directive is a good idea after all either. But I see that you remap the ID= to a load_id attribute which is preserved in the Bio::DB::SeqFeature::Store database. And then it gets squelched during GFF production by NormalizedFeature::format_attributes. However, if ID is prone to clashes, then simply renaming the attribute to load_id certainly does not preclude clashes from happening, and only courts disaster. Don't you think? I'm a little blurry on the GFF3Loader, but it looks like you're using load_id to facilitate loading parent/child features out of order. Is that right? If so, I suggest you delete all load_id features immediately after performing a load. It has no further use. Or, instead of a `round-trip-ids` directive, you might consider giving the GFF3Loader an IDAttribute option which would allow the use of the loader to preserve the ID values, but to use a named In my case, munging flybase gff, I would then use it like this: bp_seqfeature_load.PLS --fast --IDAttribute flybaseID which would preserve the ID values in the database but under the FlybaseID attribute for features so loaded. --------------------------------------------- On a related topic: I just committed this patch to Bio::DB::SeqFeature::NormalizedFeature _create_subfeatures: ensure that subfeatures get the `source` of their parent. While doing this I wonder: what is the -class that subfeatures are getting from their parent...??? I left it in place. Please advise! Fix my thinking.... ---------------------------------------------- Further, I observe that Bio::Graphics::FeatureBase::new handles the -segments option by calling add_segment.
So, when I create a new Bio::DB::SeqFeature with -segments [[100,200],[300,400]], the -segments option gets handled by Bio::Graphics::FeatureBase::new, which, as mentioned, calls add_segment. The surprising thing to me when trying to trace through the class modules and understand what is going on is that what gets run at this point is not Bio::Graphics::FeatureBase::add_segment, but rather Bio::DB::SeqFeature::add_segment, whose semantics differ in at least one regard, namely, that it does not set the start and stop of the parent feature from the min and max of the segments. I have committed a patch to Bio::Graphics::FeatureBase with a comment to this effect, and have also patched its add_segment method to copy the parent's source into the segment. I hope my commits and suggestions further the cause. Let me know if not! -- Malcolm ________________________________ From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein Sent: Tuesday, January 30, 2007 4:46 PM To: Cook, Malcolm Cc: bioperl list; lstein at cshl.org Subject: Re: Bio::DB::SeqFeature treatment of tags and annotations I've fixed the first issue in CVS. Sorry for the inconsistency. add_tag_value(), delete_tag_value() and get_Annotations() now all work as expected. The problem with the ID column is that it is supposed to be LOCAL to the GFF3 file and is not intended to be stored in the database. In contrast, Name can survive roundtripping. Perhaps the thing to do is to add a flag to the GFF3 file that turns on ID round-tripping, e.g. ##round-trip-ids: 1 If you like this idea, I can implement it. Lincoln On 1/29/07, Cook, Malcolm < MEC at stowers-institute.org > wrote: Lincoln, Thanks for your suggestions on approach to my problems augmenting Flybase annotation. I am trying to follow them and finding the following oddities. The first issue relates to the intermix of 'annotations' and 'tag values'.
I find that Bio::DB::SeqFeature implements some of the 'tag' methods and some of the 'Annotation' methods. Here is a perl one-liner that shows that values stored using add_tag_value are not retrieved with get_tag_values, but rather with get_Annotations:

> perl -MBio::DB::SeqFeature -e 'my $f=Bio::DB::SeqFeature->new; $f->add_tag_value("x",666); print "get_tag_values:\t" . $f->get_tag_values("x") . "\nget_Annotations:\t" . $f->get_Annotations("x");'

whose output is:

get_tag_values:
get_Annotations:    666

Tracing this shows me that this results from the fact that Bio::DB::SeqFeature uses Bio::Graphics::FeatureBase (via Bio::DB::SeqFeature::NormalizedFeature), which does not support -tags in ->new but rather -attributes, viz:

-attributes    a hashref of tag value attributes, in which the key is the tag and the value is an array reference of values

And though Bio::Graphics::FeatureBase purports to implement Bio::SeqFeatureI, it only partially implements the 'tag' methods (now deprecated and relegated to Bio::AnnotatableI). In particular, the '*' methods of Bio::SeqFeatureI are not implemented in Bio::Graphics::FeatureBase:

  has_tag
* add_tag_value
  get_tag_values
  get_all_tags
* remove_tag
  get_tagset_values
  get_Annotations

As a result, add_tag_value and remove_tag are inherited from different modules whose understanding of tags is not the same! This one-liner:

>perl -MClass::ISA -MClass::Inspector -MBio::DB::SeqFeature -e 'my @c = Class::ISA::self_and_super_path("Bio::DB::SeqFeature"); foreach my $fn (qw(add_tag_value get_tag_values)) {print "\n$fn:\t", join "\t", (grep {Class::Inspector->function_exists($_, $fn)} @c)}'

confirms that they are defined in different packages, namely:

add_tag_value:     Bio::AnnotatableI
get_tag_values:    Bio::Graphics::FeatureBase    Bio::AnnotatableI

Proposed solution... hmmmm ..... I dunno....
maybe the following patch to Bio::Graphics::FeatureBase->add_tag_value:

sub add_tag_value {
  my ($self,$tag,@vals) = @_;
  push @{$self->{attributes}{$tag}}, @vals;
}

It fixes my use case for now, but I'm still concerned and confused about this variety of methods. Suggestions?

-------------------------------------------------------------------------

Also, I think that any "ID" in column 9 of a GFF3 flat file should be preserved through a round-trip through a Bio::DB::SeqFeature store, but this is not yet possible since any ID attribute in GFF3 column 9 is being lost by GFF3Loader, causing me to locally patch the GFF3Loader::handle_feature method to add the following:

# mec at stowers-institute.org , wondering why not all attributes are
# carried forward, adds ID tag in particular service of
# round-tripping ID, which, though present in database as load_id
# attribute, was getting lost as itself
$unreserved->{ID}= $reserved->{ID} if exists $reserved->{ID};

Poised to patch.... what d'you think? Malcolm Cook Stowers Institute for Medical Research - Kansas City, Missouri ________________________________ From: lincoln.stein at gmail.com [mailto: lincoln.stein at gmail.com ] On Behalf Of Lincoln Stein Sent: Tuesday, December 19, 2006 3:58 PM To: Cook, Malcolm Cc: bioperl list; lstein at cshl.org Subject: Re: bp_seqfeature_load / Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting Flybase annotation Hi Malcom, Your second guess was right. The use case of augmenting an existing gene with additional splice forms isn't provided for.
You can get the functionality by making direct calls to Bio::DB::SeqFeature::Store methods:

my @genes = $db->get_features_by_name('FBgn0017545');
@genes == 1 or die "Didn't get exactly one gene";
my $parent = $genes[0];
my $chr    = $parent->seq_id;
my $start  = $parent->start;
my $end    = $parent->end;
my $strand = $parent->strand;
my $new_splice_form = $db->new_feature(-primary_tag => 'mRNA',
                                       -source      => 'added',
                                       -seq_id      => '4r',
                                       -strand      => $strand,
                                       -start       => $start+10,
                                       -end         => $end,
                                      );
$parent->add_SeqFeature($new_splice_form);
for my $pos ([$start+10,$start+100],[$start+200,$end]) {
  my ($e_start,$e_end) = @$pos;
  my $exon = Bio::DB::SeqFeature->new(-primary_tag => 'exon',
                                      -store       => $db,
                                      -seq_id      => '4r',
                                      -strand      => $strand,
                                      -start       => $e_start,
                                      -end         => $e_end);
  $new_splice_form->add_SeqFeature($exon);
}

I found a bug in updating the seqfeature database when I wrote this script, so you'll have to get the latest bioperl-live. I think you can use this to write a splice form updating script. In order to support the idea of adding new splice forms to an existing gene using the GFF3 format, I will have to either modify the loader, or write a separate script (probably better to do the latter). It shouldn't be hard if you'd like to give it a try.
Lincoln On 12/19/06, Cook, Malcolm > wrote: Lincoln and fellow Bio::DB::SeqFeature travelers, I find that using bp_seqfeature_load.PLS to load subfeatures of genes already loaded using bp_seqfeature_load.PLS fails with:

------------- EXCEPTION -------------
MSG: FBgn0017545 doesn't have a primary id
STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682
STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663
STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372
STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345
STACK Bio::DB::SeqFeature::Store::GFF3Loader::load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242
STACK toplevel /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_load.PLS:76

where FBgn0017545 is the ID of a gene previously loaded. I am unsure how to remedy my situation and welcome any advice on a correct or improved approach to my problem. Here's some detail if it helps. I am developing a pipeline to design microarray probes capable of distinguishing among splice variants in drosophila (using the latest Flybase dmel_r5.1 annotation). So I 1) load a filtered selection of Flybase annotation using bp_seqfeature_load. (for testing purposes, I am using a single gene's worth of annotation, FBgn0017545.gff, attached). This is done as follows: > bp_seqfeature_load.PLS --create FBgn0017545.gff 2) analyze all the genes in the database, and create GFF3 output each feature of which has a 'Parent' that is a previously loaded gene (i.e. FBgn0017545). (These features represent the unique introns, splice sites, and exonic design targets.
Output of this analysis, FBgn0017545_matd.gff, is also attached) 3) load these analysis results into the same database, as follows: > bp_seqfeature_load.PLS FBgn0017545_matd.gff It is at this point that I get the above error. However, I don't get any error and the data loads fine if I load the two files together, as follows: > bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff FBgn0017545_matd.gff) So, I suspect that either I am misunderstanding when/how to use bp_seqfeature_load.PLS or else this use case has not yet arisen and must be provided for somehow. I am running against bioperl-live Thanks for your thoughts and assistance, Malcolm Cook Database Applications Manager - Bioinformatics Stowers Institute for Medical Research - Kansas City, Missouri -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From neha_bafs at yahoo.co.in Mon Feb 5 12:59:03 2007 From: neha_bafs at yahoo.co.in (Neha Nahar) Date: Mon, 5 Feb 2007 17:59:03 +0000 (GMT) Subject: [Bioperl-l] Convert newick to nexus format Message-ID: <20070205175903.30379.qmail@web8701.mail.in.yahoo.com> Hello everyone, I am trying to convert newick tree to nexus format. 
Using the script (referred from an email from George dated Wed Sep 22 11:52:47 EDT 2004):

/*------------------------------------------------------------*/
$ cat nexus.pl
#!/usr/bin/perl -w

use Bio::TreeIO;

($NEWICKFILE, $NEXUSFILE) = @ARGV;
print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n";
my $treeio = new Bio::TreeIO(-format => 'newick', -file => "$NEWICKFILE");
my $treeout = new Bio::TreeIO(-format => 'nexus', -file => ">$NEXUSFILE");
while(my $tree = $treeio->next_tree) {
  $treeout->write_tree($treeout);
}

exit 0;
/*------------------------------------------------------------*/

Running the script from the command line gives the following error:

$ ./nexus.pl mrp-input.txt nexus.out
newickfile=mrp-input.txt, nexusfile=nexus.out

------------- EXCEPTION -------------
MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
STACK toplevel ./nexus.pl:23
--------------------------------------

Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm

Questions:
1. Please let me know if I am using the correct version. If not, please point me to the latest one.
2. Provided that the version I am using is the right one, please let me know what is wrong with the script.

Thank you. Regards, Neha. -Neha Nahar " Work for cause and not for applause, live to express and not to impress !" From jason at bioperl.org Mon Feb 5 13:10:42 2007 From: jason at bioperl.org (Jason Stajich) Date: Mon, 5 Feb 2007 10:10:42 -0800 Subject: [Bioperl-l] Convert newick to nexus format In-Reply-To: <20070205175903.30379.qmail@web8701.mail.in.yahoo.com> References: <20070205175903.30379.qmail@web8701.mail.in.yahoo.com> Message-ID: <46219DCD-8C6E-4DBE-82F2-D4B58207AD54@bioperl.org> you want to write the TREE out not the TREE WRITER.
$treeout->write_tree($tree)

not

$treeout->write_tree($treeout);

On Feb 5, 2007, at 9:59 AM, Neha Nahar wrote: > Hello everyone, > > I am trying to convert newick tree to nexus format. > Using the script (referred from an email from George dated Wed Sep > 22 11:52:47 EDT 2004) : > > /*------------------------------------------------------------*/ > > $ cat nexus.pl > #!/usr/bin/perl -w > > use Bio::TreeIO; > > ($NEWICKFILE, $NEXUSFILE) = @ARGV; > print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n"; > my $treeio = new Bio::TreeIO(-format => 'newick', -file => > "$NEWICKFILE"); > my $treeout = new Bio::TreeIO(-format => 'nexus', -file => "> > $NEXUSFILE"); > while(my $tree = $treeio->next_tree) { > $treeout->write_tree($treeout); > } > > exit 0; > > > /*------------------------------------------------------------*/ > > Running the script through command line: > Gives the following error: > > $ ./nexus.pl mrp-input.txt nexus.out > newickfile=mrp-input.txt, nexusfile=nexus.out > > ------------- EXCEPTION ------------- > MSG: Cannot call method write_tree on Bio::TreeIO object must use a > subclass > STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/ > 5.8.8/Bio/TreeIO/nexus.pm:170 > STACK toplevel ./nexus.pl:23 > > -------------------------------------- > > > Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/ > ~sendu/bioperl/Bio/TreeIO.pm > > Questions:- > > 1. Please let me know if I am using the correct version. > If not, please point me to the latest one. > > 2. Provided that the version I am using is the right one, please > let me know what is wrong with the script. > > Thank you. > Regards, > Neha. > > > > > > -Neha Nahar > " Work for cause and not for applause, live to express and not to > impress !" -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From nehadnahar at yahoo.co.in Mon Feb 5 13:05:26 2007 From: nehadnahar at yahoo.co.in (Neha Nahar) Date: Mon, 5 Feb 2007 18:05:26 +0000 (GMT) Subject: [Bioperl-l] Convert newick to nexus format Message-ID: <288335.22352.qm@web8412.mail.in.yahoo.com> Hello everyone, I am trying to convert newick tree to nexus format. Using the script (referred from an email from George dated Wed Sep 22 11:52:47 EDT 2004) : /*------------------------------------------------------------*/ $ cat nexus.pl #!/usr/bin/perl -w use Bio::TreeIO; ($NEWICKFILE, $NEXUSFILE) = @ARGV; print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n"; my $treeio = new Bio::TreeIO(-format => 'newick', -file => "$NEWICKFILE"); my $treeout = new Bio::TreeIO(-format => 'nexus', -file => ">$NEXUSFILE"); while(my $tree = $treeio->next_tree) { $treeout->write_tree($treeout); } exit 0; /*------------------------------------------------------------*/ Running the script through command line: Gives the following error: $ ./nexus.pl mrp-input.txt nexus.out newickfile=mrp-input.txt, nexusfile=nexus.out ------------- EXCEPTION ------------- MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170 STACK toplevel ./nexus.pl:23 -------------------------------------- Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm Questions:- 1. Please let me know if I am using the correct version. If not, please point me to the latest one. 2. Provided that the version I am using is the right one, please let me know what is wrong with the script. Thank you. Regards, Neha. -Neha Nahar " Work for cause and not for applause, live to express and not to impress !"
---------------------------------
Here's a new way to find what you're looking for - Yahoo! Answers

From hlapp at duke.edu Fri Feb 2 10:09:57 2007
From: hlapp at duke.edu (Hilmar Lapp)
Date: Fri, 2 Feb 2007 10:09:57 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <1170359746.2706.622.camel@localhost.localdomain>
References: <1170359746.2706.622.camel@localhost.localdomain>
Message-ID: <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>

On Feb 1, 2007, at 2:55 PM, Scott Cain wrote:

> The second main change was to introduce a -flybase_compat argument when
> initializing the Bio::SeqIO writer, so that 'old style' cv and cvterms
> (that are compatible with flybase) will be used, but now the default
> will be to use current standards:

Just my $0.02 ... obviously, Flybase may be the only organization that uses an 'old style' or any other way not compliant with 'current standards' (presumably SO), but if it's not the only one then this approach won't scale. Also, an argument -flybase_compat suggests to the unsuspecting that this is an endorsed flavor of the standard and fine for everyone else to use too.

If Flybase is idiosyncratic in this way, why not make chadoxml.pm compliant with the standard as we all want it, keep it free from litter caused by usage of old versions of SO, and create a second module fb-chadoxml.pm that inherits from the first and merely overrides a few things so that it works for Flybase? This way, other organizations with similar needs can follow the path and create their own xyz-chadoxml.pm, rather than having to muck around in the chadoxml.pm that comes with the distribution.

I'm not sure I fully grasp the underlying issue, so I may not make much sense here. Apologies if that's the case ...
-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
===========================================================

From jason at bioperl.org Mon Feb 5 14:43:09 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Feb 2007 11:43:09 -0800
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <209988.63723.qm@web8715.mail.in.yahoo.com>
References: <209988.63723.qm@web8715.mail.in.yahoo.com>
Message-ID: <9E477447-67F5-46CA-BCC1-47BB4170EC76@bioperl.org>

Please cc the mailing list when asking a question or following up.

Sorry, I don't know what you are doing wrong - you didn't resend your code, so I don't know if you still have a typo.

This code works fine for me:

use Bio::TreeIO;
use strict;
my ($filein,$fileout) = @ARGV;
my ($format,$oformat) = qw(newick nexus);
my $in = Bio::TreeIO->new(-file => $filein, -format => $format);
my $out= Bio::TreeIO->new(-format => $oformat, -file => ">$fileout");

while( my $t = $in->next_tree ) {
    $out->write_tree($t);
}

On Feb 5, 2007, at 11:24 AM, Neha Nahar wrote:

> Thank you very much for the reply.
>
> I fixed the code as per your suggestion, but now am getting a
> different error:
>
> $ ./nexus.pl mrp-input.txt nexus.out
> newickfile=mrp-input.txt, nexusfile=nexus.out
>
> ------------- EXCEPTION -------------
> MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
> STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
> STACK toplevel ./nexus.pl:23
>
> --------------------------------------
>
> Please help me out with this script.
>
> Thank you.
> Regards,
> Neha
>
> Jason Stajich wrote: you want to write the TREE
> out not the TREE WRITER.
>
> $treeout->write_tree($tree)
>
> not
> $treeout->write_tree($treeout);

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/

From nehadnahar at yahoo.co.in Mon Feb 5 14:58:08 2007
From: nehadnahar at yahoo.co.in (Neha Nahar)
Date: Mon, 5 Feb 2007 19:58:08 +0000 (GMT)
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <99196.23114.qm@web8711.mail.in.yahoo.com>
Message-ID: <36024.1212.qm@web8405.mail.in.yahoo.com>

Hi,
Thank you for the code.
I tried it but I still get the same exception.

------------- EXCEPTION -------------
MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
STACK toplevel ./nexus1.pl:18

Please find attached the perl file (nexus.pl).

Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm

Please let me know if I am using the correct version. If not, please point me to the latest one.

Thank you.
Regards,
nnahar

-Neha Nahar
" Work for cause and not for applause, live to express and not to impress !"
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nexus.pl
Type: application/x-perl
Size: 811 bytes
Desc: 1389215665-nexus.pl
URL:

From jason at bioperl.org Mon Feb 5 17:15:52 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Feb 2007 14:15:52 -0800
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <36024.1212.qm@web8405.mail.in.yahoo.com>
References: <36024.1212.qm@web8405.mail.in.yahoo.com>
Message-ID: <84703383-600F-42F4-A860-DD0D1C43EE83@bioperl.org>

Something is wrong with your install, I am guessing - can you run the tests? Go to the bioperl directory and run:

$ perl t/TreeIO.t

Can you describe how you installed bioperl?

From lzhtom at hotmail.com Mon Feb 5 22:31:56 2007
From: lzhtom at hotmail.com (zhihua li)
Date: Tue, 06 Feb 2007 03:31:56 +0000
Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
In-Reply-To:
Message-ID:

Thanks a lot! After checking out the script bp_index, I changed the syntax to:

my $inx = Bio::Index::Fasta->new("test.fasta.idx", 'WRITE');
$inx->make_index("test.fasta");

Now I have an index file test.fasta.idx in my current directory. And I can use it in my later script by saying

my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx");

So now everything is OK. But I don't understand why I have to use that syntax, and why the syntax provided in the documentation didn't work.

>From: Jason Stajich
>To: zhihua li
>CC: bioperl-l at lists.open-bio.org, arokfl at yahoo.com
>Subject: Re: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
>Date: Thu, 1 Feb 2007 22:24:44 -0800
>
>I don't think BIOPERL_INDEX does anything in the module, so that
>documentation is not quite right. The env variable is used in the
>scripts/index/bp_index and bp_fetch scripts, so maybe a cut+paste job
>went bad somewhere.
>
>You need to specify the full path you want with -filename - you can
>just prepend BIOPERL_INDEX to the filename, like:
>-filename => $ENV{BIOPERL_INDEX}."/$index"
>
>-jason
>On Feb 1, 2007, at 7:27 PM, zhihua li wrote:
>
> > Sorry guys, the former empty mail was sent out by mistake.
> >
> > I'm using Bio::Index::Fasta to index a file containing lots of
> > sequences in fasta format. All is fine except one thing.
> >
> > According to the bioperl tutorial and the documents, the following
> > code will make an indexed file:
> >
> > my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx",
> >                                  -write_flag => 1);
> > $inx->make_index("test.fasta");
> >
> > And in another script I can access the indexed file by saying
> >
> > $ENV{BIOPERL_INDEX} = "."; # find index in current directory
> > my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx");
> > my $seq=$inx->fetch("ent1001"); #fetch the sequence named ent1001
> >
> > However, after running the first script, I cannot find a new file
> > test.fasta.idx in my current directory. And not surprisingly, when
> > I ran the second script, perl told me it couldn't find
> > "test.fasta.idx".
> >
> > What's going on here?
> >
> > Thanks a lot!
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>--
>Jason Stajich
>Miller Research Fellow
>University of California, Berkeley
>lab: 510.642.8441
>http://pmb.berkeley.edu/~taylor/people/js.html
>http://fungalgenomes.org/

From johnston at biochem.ucl.ac.uk Tue Feb 6 06:52:08 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Tue, 6 Feb 2007 11:52:08 +0000 (GMT)
Subject: [Bioperl-l] RNA folding
Message-ID:

Hello,

I've just joined the list - I'm a Bioinformatics PhD student at Essex University doing transcriptomics-related things: mainly microarray analysis and, more recently, RNA structure prediction.

I was thinking about having a go at writing a bioperl-run wrapper around some of the structure prediction stuff, but according to the wiki this is being done already (at least for the Vienna tools). I spoke to Albert Vilella at the EBI the other day and he said Chris Fields was the man to speak to. So could he (or anyone) let me know what the current state of RNA structure prediction tools in bioperl is?
Cheers,
Cass xx

From marian.thieme at lycos.de Tue Feb 6 08:52:10 2007
From: marian.thieme at lycos.de (Marian Thieme)
Date: Tue, 06 Feb 2007 14:52:10 +0100
Subject: [Bioperl-l] dbSNP
Message-ID: <45C8880A.7030702@lycos.de>

Hello all,

I looked in the documentation for a method/class/function/script that can generate a SNP assay suitable for submission to dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/ http://www.ncbi.nlm.nih.gov/projects/SNP/how_to_submit.html).

I didn't find such code, but I noticed that there is at least an XML parser to read dbSNP reports. Does anybody know if there is also an output class to generate dbSNP reports? I could imagine that at least the SNP assay section is worth implementing. This example is given by NCBI:

TYPE:SNPASSAY
HANDLE:WI
BATCH: 1.98
MOLTYPE:Genomic
METHOD:RESEQ
SYN NAMES:WI-SNP,DnaId,MapDna
COMMENT: Here is where some public comment that applies to the entire batch of SNPS could be put.
PRIVATE: Here is where a note to NCBI regarding processing that would not be seen by the outside, could be put. Note that these are is not exactly real SNPs, as the data were modified.
||
SNP:WI|WIAF-1234567
SYNONYM:EST4291092,EST8291092,EST7291092
ACCESSION:H30533
LENGTH:101
5'_ASSAY:GGCAGGGAAGGAAAATCCTAGGGNCAGCATTGGGGAGGGGGGGACTCTG
OBSERVED:C/T
3'_ASSAY:TAAATTTATTGGGCAACAGGCTGCAGGTGAGGGGGCTGACAGGAGGAGGGA
||
SNP:WI|WIAF-1722
SYNONYM:STS-T17494,STS-T17494,STS-T17494
ACCESSION:T17494
LENGTH:269
5'_FLANK:CTTTCCCTCATCCCCTCTTCCACCACACCATCCCGGAACAAGTGCTCCAGGATT
5'_ASSAY:CCCTGCCCACTGGCCATTTTGGAGTGTGTCC
OBSERVED:A/T
3'_ASSAY:GTGGGTAGCAATGTGGAAACCACCAGGGCCTTTGTGGAGAAAA
3'_FLANK:TGGAGGGGGTTGAGGGAGTCCCAGGAGGGGCTTATTTGAGGGCCTTTGCCACTT
GCTCATAGGCGAGCTCGATCTCCTCATCATCTGGACAGGTGGAAGCGAATTCTT
CCCGGGCGTAGGCATTGCTCAAGTACCGAT
||

Regards,
Marian

P.S. This is not in contradiction to my first request about the brackets notation. We need both formats.
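The SNPASSAY layout in the NCBI example above is line-oriented: one `KEY:value` line per batch field, a `||` line closing the header block, then one `KEY:value` line per SNP field with `||` closing each record. As a minimal sketch under stated assumptions (this is not an existing BioPerl class), a writer for that layout could look like the following. `format_snpassay_batch` is a hypothetical name, the field lists are copied only from the example above, LENGTH is passed through as given rather than computed, and long flanking sequences are not wrapped, so NCBI's how_to_submit page should be checked for required fields and line-length rules.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: render one SNPASSAY batch as
# dbSNP-style "KEY:value" lines. Field names and their order are taken from
# the NCBI example quoted in the message; undefined fields are skipped.
my @HEADER_FIELDS = ('TYPE', 'HANDLE', 'BATCH', 'MOLTYPE', 'METHOD',
                     'COMMENT', 'PRIVATE');
my @SNP_FIELDS    = ('SNP', 'SYNONYM', 'ACCESSION', 'LENGTH',
                     "5'_FLANK", "5'_ASSAY", 'OBSERVED',
                     "3'_ASSAY", "3'_FLANK");

sub format_snpassay_batch {
    my ($header, $snps) = @_;
    my $out = '';

    # Batch header block, closed by "||".
    for my $key (@HEADER_FIELDS) {
        $out .= "$key:$header->{$key}\n" if defined $header->{$key};
    }
    $out .= "||\n";

    # One record per SNP, each closed by "||".
    for my $snp (@$snps) {
        for my $key (@SNP_FIELDS) {
            $out .= "$key:$snp->{$key}\n" if defined $snp->{$key};
        }
        $out .= "||\n";
    }
    return $out;
}

# Usage: a cut-down version of the first record in the NCBI example.
print format_snpassay_batch(
    { TYPE => 'SNPASSAY', HANDLE => 'WI', MOLTYPE => 'Genomic', METHOD => 'RESEQ' },
    [ { SNP        => 'WI|WIAF-1234567',
        ACCESSION  => 'H30533',
        LENGTH     => 101,
        "5'_ASSAY" => 'GGCAGGGAAGGAAAATCCTAGGGNCAGCATTGGGGAGGGGGGGACTCTG',
        OBSERVED   => 'C/T',
        "3'_ASSAY" => 'TAAATTTATTGGGCAACAGGCTGCAGGTGAGGGGGCTGACAGGAGGAGGGA' } ],
);
```

Keeping the field order in two arrays rather than relying on hash order means the output order is deterministic and matches the example, whatever order the caller builds the hashes in.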
From cjfields at uiuc.edu Tue Feb 6 11:45:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Feb 2007 10:45:36 -0600
Subject: [Bioperl-l] RNA folding
In-Reply-To:
References:
Message-ID:

Actually, the only RNA tool wrappers I have made are ones for ERPIN, RNAMotif, and Infernal (the only one in bioperl-run CVS at this time is RNAMotif). I am planning on writing up wrappers for Vienna, UNAFold, and a few others but haven't really started in.

Here's where I'm at right now...

I am writing up a new set of AnnotationI classes which positionally describe data (Meta), which I hope will help deal with this stuff. These would be similar in nature to Heikki's Bio::Seq::Meta classes:

http://bioperl.org/pipermail/bioperl-l/2006-December/024414.html

I would use a regular Bio::SeqI and store the structural data and anything else (such as energy calculations, etc.) as Annotation objects in an AnnotationCollection, and then write up a series of SeqIO modules to get data into/out of the designated structure formats, like UNAFold ct, RNAML, and so on.
Each sequence would then be capable of holding more than one structural Annotation (i.e. it could represent different folding pathways, alternative RNA folds, and so on).

At this point I represent the data as an array of hashes where $array[0] is nt 1 and the hash keys indicate the type of interaction, the base interacted with, etc. The text representation would be simple Eddy WUSS (Rfam-like) format by default, which is capable of representing some complex data (pseudoknots, for instance), is compact, and is documented (via the Infernal manual). Tags will probably switch to more ontologically relevant terms (probably from RNAML or RNA Ontology), but in general it is something like this:

[
  {'interaction' => 'WC', 'base' => 24},
  {'interaction' => 'WC', 'base' => 23},
  {'interaction' => 'SS'},
  ...
]

In this implementation every seq position would have some kind of interaction designation, though that's open for debate as it could just be simple text or undef for single-stranded regions. This is also scalable based on the complexity of the data: if one wanted to add tertiary/quaternary interactions, location, base modifications, remote sequence interactions, etc., extra key/value pairs could be used. Conversely, if one only wanted secondary structure (for drawing RNA structures, for example), then only that data would be parsed.

If you (or anyone listening) have any suggestions I would greatly appreciate them.

chris

From johnsonm at gmail.com Tue Feb 6 18:53:49 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Tue, 6 Feb 2007 17:53:49 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To:
References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
Message-ID:

Okay, I need to get something going for a project I'm working on. Options:

1) Stick it all in one module: This can get a bit ugly, as Glimmer, as opposed to GlimmerM and GlimmerHMM, does not explicitly identify itself in the prediction report.
You can pick up on some unique things in the output file, but you don't know what you've got until you're actually parsing it. Unless you require a format argument up front, then you can split the parsing code up into different functions. 2) Two modules, one for GlimmerM/GlimmerHMM and one for Glimmer2/Glimmer3. With or without an abstract dispatch front end. I suppose at this point, after getting my hands dirty, I'd prefer 1), with an explicit -format => Glimmer2/3/M/HMM arg required in the constructor. Though I'm not opposed to 2) if that is what it takes to get it into Bioperl. If we can achieve some sort of consensus without too much bloodshed, I'll shoot y'all some patches and we can consider this issue checked off the list. On 9/20/06, Mark Johnson wrote: > > I think it's going to be at least two modules, one for the > prokaryotic stuff and one for the eukaryotic. And really, the > prokaryotic stuff is different enough to warrant two modules. So three > different parsers. Could do it in one, but it would be ugly and > nasty. However, this does not preclude three parsers and one abstract > interface, which is your excellent suggestion. > Oh, and excuse me, but I have a bit of a rant here, after dealing > with parsers and pipelines for the last few months. Parsers should > not load the whole input file into RAM to parse it. And Pipelines > using the parsers (Ensembl / biopipe) should not stuff the whole > result set from the parser into a single array. When you're trying to > annotate assemblies, it sucks to have to split up contigs/supercontigs > because the whole result set won't fit into RAM on a 12 gig blade. > Sheesh. Though this doesn't matter for bacterial genomes, as they're > tiny (by comparison to vertebrates). There, sorry, been saving up > that frustration for a while. No offense meant, hope I didn't tick > anybody off. 
8) > Torsten: You sound like you know what you're doing with respect > to Bioperl more than I do, and I know I don't have CVS access, so I'll > defer to you. I'd be happy to help out, though. > > > On 9/20/06, Hilmar Lapp wrote: > > > > On Sep 19, 2006, at 9:13 PM, Torsten Seemann wrote: > > > > > I'm not sure whether to > > > > > > 1. parse them all under the same module, perhaps with a > > > -format=>'glimmerXXX' parameter > > > > > > 2. create a single new module Glimmer2 and Glimmer3 > > > > > > 3. create two new modules, one for Glimmer2 and one for Glimmer3, > > > given > > > they are different outputs both in syntax and number of output files > > > > > > Any advice from Bioperl 'old timers' appreciated ;-) > > > > > > > If at all possible I'd favor 1), with e.g. Bio::Tools::GFF being an > > example for how this can work. > > > > If this would amount to basically 4 modules stringed together into > > one file (because the parsing code can't share much if anything > > between the flavors), it'd still be advantageous to have a single > > frontend module that would then dispatch. > > > > -hilmar > > > > -- > > =========================================================== > > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > > =========================================================== > > > > > > > > > > > > > From jason at bioperl.org Tue Feb 6 19:33:11 2007 From: jason at bioperl.org (Jason Stajich) Date: Tue, 6 Feb 2007 16:33:11 -0800 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> Message-ID: <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> I definitely vote for 1) - worst case you have 4 separate methods if there is no good way to condense the parsing for each format and require the user to specify the format. I have no problem with requiring user to specify what program she used - if we can be fancy and guess the format later (i.e. 
guess format in SeqIO) -then that's icing. -jason On Feb 6, 2007, at 3:53 PM, Mark Johnson wrote: > Okay, I need to get something going for a project I'm working on. > Options: > > 1) Stick it all in one module: This can get a bit ugly, as > Glimmer, as > opposed to GlimmerM and GlimmerHMM, does not explicitly identify > itself in > the prediction report. You can pick up on some unique things in > the output > file, but you don't know what you've got until you're actually > parsing it. > Unless you require a format argument up front, then you can split the > parsing code up into different functions. > 2) Two modules, one for GlimmerM/GlimmerHMM and one for Glimmer2/ > Glimmer3. > With or without an abstract dispatch front end. > > I suppose at this point, after getting my hands dirty, I'd prefer > 1), with > an explicit -format => Glimmer2/3/M/HMM arg required in the > constructor. > Though I'm not opposed to 2) if that is what it takes to get it into > Bioperl. > > If we can achieve some sort of consensus without too much > bloodshed, I'll > shoot y'all some patches and we can consider this issue checked off > the > list. > > On 9/20/06, Mark Johnson wrote: >> >> I think it's going to be at least two modules, one for the >> prokaryotic stuff and one for the eukaryotic. And really, the >> prokaryotic stuff is different enough to warrant two modules. So >> three >> different parsers. Could do it in one, but it would be ugly and >> nasty. However, this does not preclude three parsers and one >> abstract >> interface, which is your excellent suggestion. >> Oh, and excuse me, but I have a bit of a rant here, after dealing >> with parsers and pipelines for the last few months. Parsers should >> not load the whole input file into RAM to parse it. And Pipelines >> using the parsers (Ensembl / biopipe) should not stuff the whole >> result set from the parser into a single array. 
When you're >> trying to >> annotate assemblies, it sucks to have to split up contigs/ >> supercontigs >> because the whole result set won't fit into RAM on a 12 gig blade. >> Sheesh. Though this doesn't matter for bacterial genomes, as they're >> tiny (by comparison to vertebrates). There, sorry, been saving up >> that frustration for a while. No offense meant, hope I didn't tick >> anybody off. 8) >> Torsten: You sound like you know what you're doing with respect >> to Bioperl more than I do, and I know I don't have CVS access, so >> I'll >> defer to you. I'd be happy to help out, though. >> >> >> On 9/20/06, Hilmar Lapp wrote: >>> >>> On Sep 19, 2006, at 9:13 PM, Torsten Seemann wrote: >>> >>>> I'm not sure whether to >>>> >>>> 1. parse them all under the same module, perhaps with a >>>> -format=>'glimmerXXX' parameter >>>> >>>> 2. create a single new module Glimmer2 and Glimmer3 >>>> >>>> 3. create two new modules, one for Glimmer2 and one for Glimmer3, >>>> given >>>> they are different outputs both in syntax and number of output >>>> files >>>> >>>> Any advice from Bioperl 'old timers' appreciated ;-) >>>> >>> >>> If at all possible I'd favor 1), with e.g. Bio::Tools::GFF being an >>> example for how this can work. >>> >>> If this would amount to basically 4 modules stringed together into >>> one file (because the parsing code can't share much if anything >>> between the flavors), it'd still be advantageous to have a single >>> frontend module that would then dispatch. 
>>> >>> -hilmar >>> >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>> =========================================================== >>> >>> >>> >>> >>> >>> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From torsten.seemann at infotech.monash.edu.au Tue Feb 6 21:36:54 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Wed, 7 Feb 2007 13:36:54 +1100 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> Message-ID: > I definitely vote for 1) - worst case you have 4 separate methods if > there is no good way to condense the parsing for each format and > require the user to specify the format. And make the default -format be what it currently parses, i.e. GlimmerM/GlimmerHMM. > I have no problem with requiring user to specify what program she > used - if we can be fancy and guess the format later (i.e. guess > format in SeqIO) -then that's icing. Agreed. >> Okay, I need to get something going for a project I'm working on. I would normally try to help but I am so swamped with work-work at the moment. Just a reminder that last year I added examples of the different Glimmer outputs to the CVS repository:
./t/data/Glimmer3.predict
./t/data/Glimmer3.detail
./t/data/GlimmerHMM.out
./t/data/Glimmer2.out
./t/data/GlimmerM.out
./t/data/glimmer.out (this was the original one)
Thanks for taking this on!
--Torsten From mitch_skinner at berkeley.edu Tue Feb 6 23:37:35 2007 From: mitch_skinner at berkeley.edu (Mitch Skinner) Date: Tue, 06 Feb 2007 20:37:35 -0800 Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels Message-ID: <45C9578F.2060802@berkeley.edu> Hello, I'm working on an AJAX version of GBrowse (http://genome.biowiki.org), where we're pre-rendering entire chromosomes by breaking them up into tiles. One of the problems we have is that it takes a long time to render all those tiles. One of the things that's slowing the process down (and using lots of RAM) is rendering the gridlines, and it would make things a lot easier (and faster) for us if we could assume that the gridlines were the same for each tile. Since we're only rendering at a particular set of zoom levels (that we have control over), I think this is a reasonable assumption. Given the right set of zoom levels, the assumption works almost all the time, except for one specific case. It has to do with the way draw_grid and map_pt in Bio::Graphics::Panel work for the very first gridline. Here's how draw_grid (in CVS HEAD) computes the first gridline: my $first_tick = $minor * int($self->start/$minor); $first_tick, $minor and $self->start are in base-pair space, which is 1-based. However, if ($self->start < $minor) then $first_tick is 0. This might not be a problem, except that $first_tick is translated into pixel coordinates in map_pt, which expects 1-based bp coordinates. Here are the relevant lines in map_pt: my $val = $flip ? int (0.5 + $pr - ($length - ($_- 1)) * $scale) : int (0.5 + ($_-$offset-1) * $scale); This style of rounding only works for positive numbers; rounding 0.6 by doing int(0.5 + 0.6) gives you 1 as expected, but rounding -0.6 by doing int(0.5 + -0.6) gives you 0. So if the first three gridlines are at 0, 10, and 20 bp, then (assuming $scale is 1, $offset is 0, $flip evaluates false, and pad left is 0) they're drawn at pixels 0, 9, and 19. 
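The truncation effect described above can be reproduced with a few lines of standalone Perl. This is a minimal sketch, assuming only the simplified non-flipped branch of map_pt() (pad_left 0, offset 0, scale 1); `naive_pixel` is a hypothetical helper name, not a Bio::Graphics::Panel method.

```perl
use strict;
use warnings;

# Simplified, non-flipped branch of the map_pt() rounding quoted above:
#   pixel = int(0.5 + (bp - offset - 1) * scale)
# Perl's int() truncates toward zero, so adding 0.5 only rounds correctly
# for non-negative values. naive_pixel() is a hypothetical helper name.
sub naive_pixel {
    my ($bp, $offset, $scale) = @_;
    return int(0.5 + ($bp - $offset - 1) * $scale);
}

# int(0.5 + 0.6) is 1, but int(0.5 - 0.6) truncates -0.1 to 0, not -1.
printf "round(0.6) = %d, round(-0.6) = %d\n", int(0.5 + 0.6), int(0.5 - 0.6);

# Gridlines at 0, 10 and 20 bp with scale 1 and offset 0.
my @pixels = map { naive_pixel($_, 0, 1) + 0 } (0, 10, 20);   # "+ 0" normalises any -0
print join(', ', @pixels), "\n";    # 0, 9, 19
```

The first interval comes out 9 pixels wide because the tick at 0 bp maps to -1 before rounding, and truncation pulls it up to 0.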
I think that there should be gridlines at pixels 0, 10, and 20. The fact that currently the first interval is 9 pixels and the second is 10 pixels is breaking my hopeful assumption about the gridlines. AFAICT my problems are solved if we make two changes: change the above line from draw_grid to this: my $first_tick = 1 + $minor * int(($start - 1)/$minor); and change the lines from map_pt to this: my $val = $flip ? ($pr - ($length - ($_- 1)) * $scale) : (($_-$offset-1) * $scale); $val = int($val + .5 * ($val <=> 0)); Does this make sense? If people agree that these changes are right then I can also produce a proper patch if y'all would prefer that. Regards, Mitch From lstein at cshl.edu Wed Feb 7 07:17:22 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Wed, 7 Feb 2007 07:17:22 -0500 Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels In-Reply-To: <45C9578F.2060802@berkeley.edu> References: <45C9578F.2060802@berkeley.edu> Message-ID: <6dce9a0b0702070417k692fedcchfac5b1f0e01f72bd@mail.gmail.com> Hi Mitch, Zero is not a forbidden coordinate, since gbrowse also works on genetic maps which have negative and floating point coordinates. You've simply picked up a boundary case where the rounding isn't working properly. I will fix this now. Lincoln On 2/6/07, Mitch Skinner wrote: > > Hello, > > I'm working on an AJAX version of GBrowse (http://genome.biowiki.org), > where we're pre-rendering entire chromosomes by breaking them up into > tiles. One of the problems we have is that it takes a long time to > render all those tiles. One of the things that's slowing the process > down (and using lots of RAM) is rendering the gridlines, and it would > make things a lot easier (and faster) for us if we could assume that the > gridlines were the same for each tile. Since we're only rendering at a > particular set of zoom levels (that we have control over), I think this > is a reasonable assumption. 
> > Given the right set of zoom levels, the assumption works almost all the > time, except for one specific case. It has to do with the way draw_grid > and map_pt in Bio::Graphics::Panel work for the very first gridline. > > Here's how draw_grid (in CVS HEAD) computes the first gridline: > > my $first_tick = $minor * int($self->start/$minor); > > $first_tick, $minor and $self->start are in base-pair space, which is > 1-based. However, if ($self->start < $minor) then $first_tick is 0. > This might not be a problem, except that $first_tick is translated into > pixel coordinates in map_pt, which expects 1-based bp coordinates. Here > are the relevant lines in map_pt: > > my $val = $flip > ? int (0.5 + $pr - ($length - ($_- 1)) * $scale) > : int (0.5 + ($_-$offset-1) * $scale); > > This style of rounding only works for positive numbers; rounding 0.6 by > doing int(0.5 + 0.6) gives you 1 as expected, but rounding -0.6 by doing > int(0.5 + -0.6) gives you 0. So if the first three gridlines are at 0, > 10, and 20 bp, then (assuming $scale is 1, $offset is 0, $flip evaluates > false, and pad left is 0) they're drawn at pixels 0, 9, and 19. > > I think that there should be gridlines at pixels 0, 10, and 20. The > fact that currently the first interval is 9 pixels and the second is 10 > pixels is breaking my hopeful assumption about the gridlines. > > AFAICT my problems are solved if we make two changes: > change the above line from draw_grid to this: > my $first_tick = 1 + $minor * int(($start - 1)/$minor); > and change the lines from map_pt to this: > > my $val = $flip > ? ($pr - ($length - ($_- 1)) * $scale) > : (($_-$offset-1) * $scale); > $val = int($val + .5 * ($val <=> 0)); > > Does this make sense? If people agree that these changes are right then > I can also produce a proper patch if y'all would prefer that. 
> > Regards, > Mitch > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Wed Feb 7 07:18:40 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Wed, 7 Feb 2007 07:18:40 -0500 Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels In-Reply-To: <45C9578F.2060802@berkeley.edu> References: <45C9578F.2060802@berkeley.edu> Message-ID: <6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com> However, I'm also very interested in why grid-drawing takes so long. When I've profiled drawing, neither grid drawing nor map_pt() consume any significant amount of time. Lincoln On 2/6/07, Mitch Skinner wrote: > > Hello, > > I'm working on an AJAX version of GBrowse (http://genome.biowiki.org), > where we're pre-rendering entire chromosomes by breaking them up into > tiles. One of the problems we have is that it takes a long time to > render all those tiles. One of the things that's slowing the process > down (and using lots of RAM) is rendering the gridlines, and it would > make things a lot easier (and faster) for us if we could assume that the > gridlines were the same for each tile. Since we're only rendering at a > particular set of zoom levels (that we have control over), I think this > is a reasonable assumption. > > Given the right set of zoom levels, the assumption works almost all the > time, except for one specific case. It has to do with the way draw_grid > and map_pt in Bio::Graphics::Panel work for the very first gridline. 
> > Here's how draw_grid (in CVS HEAD) computes the first gridline: > > my $first_tick = $minor * int($self->start/$minor); > > $first_tick, $minor and $self->start are in base-pair space, which is > 1-based. However, if ($self->start < $minor) then $first_tick is 0. > This might not be a problem, except that $first_tick is translated into > pixel coordinates in map_pt, which expects 1-based bp coordinates. Here > are the relevant lines in map_pt: > > my $val = $flip > ? int (0.5 + $pr - ($length - ($_- 1)) * $scale) > : int (0.5 + ($_-$offset-1) * $scale); > > This style of rounding only works for positive numbers; rounding 0.6 by > doing int(0.5 + 0.6) gives you 1 as expected, but rounding -0.6 by doing > int(0.5 + -0.6) gives you 0. So if the first three gridlines are at 0, > 10, and 20 bp, then (assuming $scale is 1, $offset is 0, $flip evaluates > false, and pad left is 0) they're drawn at pixels 0, 9, and 19. > > I think that there should be gridlines at pixels 0, 10, and 20. The > fact that currently the first interval is 9 pixels and the second is 10 > pixels is breaking my hopeful assumption about the gridlines. > > AFAICT my problems are solved if we make two changes: > change the above line from draw_grid to this: > my $first_tick = 1 + $minor * int(($start - 1)/$minor); > and change the lines from map_pt to this: > > my $val = $flip > ? ($pr - ($length - ($_- 1)) * $scale) > : (($_-$offset-1) * $scale); > $val = int($val + .5 * ($val <=> 0)); > > Does this make sense? If people agree that these changes are right then > I can also produce a proper patch if y'all would prefer that. > > Regards, > Mitch > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From johnsonm at gmail.com Wed Feb 7 11:50:05 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Wed, 7 Feb 2007 10:50:05 -0600 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> Message-ID: Well, each format has some unique features. If the user declines to specify the format, I can figure it out, but it will probably involve scanning the input file twice. I'll take a look. I can do all the parsing in one function, in fact I have, just to see how nasty it would end up being. I just can't stomach having the code that tightly coupled and hard to read. In the end it'll probably be three functions. GlimmerM/HMM are pretty close. Maybe two; Glimmer2 and Glimmer3 aren't *that* different, either. On 2/6/07, Jason Stajich wrote: > > I definitely vote for 1) - worst case you have 4 separate methods if there > is no good way to condense the parsing for each format and require the user > to specify the format. > > I have no problem with requiring user to specify what program she used - > if we can be fancy and guess the format later (i.e. guess format in SeqIO) > -then that's icing. > > -jason > > From adsj at novozymes.com Wed Feb 7 12:11:32 2007 From: adsj at novozymes.com (Adam Sjøgren) Date: Wed, 07 Feb 2007 18:11:32 +0100 Subject: [Bioperl-l] Bioperl 1.4 to 1.5.2, adding qualifiers to Bio::Seq-objects Message-ID: <8764adoptn.fsf@topper.koldfront.dk> Hi.
I am transitioning from Bioperl 1.4 to 1.5.2, and the qualifiers I add to features in Bio::Seq objects have stopped appearing when I output them as EMBL or GenBank-files. Below is a test-script that exercises the problem. I guess I should be doing something else when adding qualifiers, now with 1.5.2 (as reading an EMBL-file with Bio::SeqIO and outputting it again of course works perfectly), but I can't deduce what from perldoc Bio::SeqFeature::Generic - it still lists the add_tag_value method, and calling it doesn't croak nor warn. I have found some comments on this in the release notes of 1.5.0 on the Bioperl wiki, but I must admit I wasn't able to extract what methods I should be calling instead. If someone could point me to the relevant documentation or tell me what method to use instead, I would be happy as a clam. Best regards, Adam
===
use Test::More tests=>2;
use strict;
use warnings;

use Bio::Seq;
use Bio::SeqFeature::Generic;
use IO::String;
use Bio::SeqIO;

my $seq=Bio::Seq->new(
    -seq=>'actgactgactg',
);
$seq->display_id('D27');
$seq->accession_number('DB:D27');

my $seq_feature=Bio::SeqFeature::Generic->new(
    -strand=>1,
    -primary=>'source',
);
$seq_feature->set_attributes(-start=>2, -end=>8);
$seq_feature->add_tag_value(note=>'TEST');
$seq_feature->add_tag_value(db_xref=>'DB:D27');
$seq->add_SeqFeature($seq_feature);

my $raw='';
my $fh=IO::String->new($raw);
my $out=Bio::SeqIO->new(-format=>'EMBL', -fh=>$fh);
$out->write_seq($seq);

ok($raw=~m!/note!, 'Qualifier note found');
ok($raw=~m!/db_xref!, 'Qualifier db_xref found');
===
-- Adam Sjøgren adsj at novozymes.com From cjfields at uiuc.edu Wed Feb 7 12:50:13 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 7 Feb 2007 11:50:13 -0600 Subject: [Bioperl-l] Bioperl 1.4 to 1.5.2, adding qualifiers to Bio::Seq-objects In-Reply-To: <8764adoptn.fsf@topper.koldfront.dk> References: <8764adoptn.fsf@topper.koldfront.dk> Message-ID: On Feb 7, 2007, at 11:11 AM, Adam Sjøgren wrote: > Hi.
> > > I am transitioning from Bioperl 1.4 to 1.5.2, and the qualifiers I add > to features in Bio::Seq objects have stopped appearing when I output > them as EMBL or GenBank-files. > > Below is a test-script that exercises the problem. > > I guess I should be doing something else when adding qualifiers, now > with 1.5.2 (as reading an EMBL-file with Bio::SeqIO and outputting it > again of course works perfectly), but I can't deduce what from perldoc > Bio::SeqFeature::Generic - it still lists the add_tag_value method, > and calling it doesn't croak nor warn. > > I have found some comments on this in the release notes of 1.5.0 on > the Bioperl wiki, but I must admit I wasn't able to extract what > methods I should be calling instead. > > If someone could point me to the relevant documentation or tell me > what method to use instead, I would be happy as a clam. > > > Best regards, > > Adam ... This works for me using bioperl-live (Mac OS X):
ok 1 - Qualifier note found
ok 2 - Qualifier db_xref found
If I print the string I get:
ID   DB:D27; SV 1; linear; unassigned DNA; STD; UNC; 12 BP.
XX
AC   DB:D27;
XX
XX
FH   Key             Location/Qualifiers
FH
FT   source          2..8
FT                   /db_xref="DB:D27"
FT                   /note="TEST"
XX
SQ   Sequence 12 BP; 3 A; 3 C; 3 G; 3 T; 0 other;
     actgactgac tg                                                            12
//
GenBank also works:
LOCUS       D27                       12 bp    dna     linear   UNK
ACCESSION   DB:D27
FEATURES             Location/Qualifiers
     source          2..8
                     /db_xref="DB:D27"
                     /note="TEST"
BASE COUNT        3 a      3 c      3 g      3 t
ORIGIN
        1 actgactgac tg
//
If you haven't uninstalled 1.4, make sure you aren't running 1.4 or mixing the two versions (you can check by using 'perldoc -l Bio::Root::Root').
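`perldoc -l` reports only the copy of a module that wins. To see whether an older install or a locally modified copy is shadowing another one, a small standalone script can list every matching file on @INC. This is a sketch; `copies_in_inc` is a hypothetical helper, not part of Bioperl. Printing `$INC{'Bio/Root/Root.pm'}` after `use Bio::Root::Root` similarly shows which file was actually loaded.

```perl
use strict;
use warnings;
use File::Spec;

# List every copy of a module that Perl could find via @INC. require()
# loads the first one, so more than one hit means an older install or a
# locally patched copy may be shadowing the version you expect.
sub copies_in_inc {
    my ($module) = @_;
    my $rel = File::Spec->catfile(split /::/, $module) . '.pm';
    return grep { -f $_ }
           map  { File::Spec->catfile($_, $rel) }
           grep { !ref $_ } @INC;    # skip code-ref hooks that can live in @INC
}

my @modules = @ARGV ? @ARGV
                    : qw(Bio::Root::Root Bio::Seq Bio::SeqFeature::Generic);
for my $module (@modules) {
    my @hits = copies_in_inc($module);
    printf "%-26s %d copy(ies)\n", $module, scalar @hits;
    print "    $_\n" for @hits;
}
```

Run it with module names as arguments (for example `Bio::Seq Bio::SeqFeature::Generic`); any module reported with more than one copy is worth checking.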
chris From cjfields at uiuc.edu Wed Feb 7 13:04:33 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 7 Feb 2007 12:04:33 -0600 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> Message-ID: <91A3A651-C0D5-495F-941F-05B8AA0DDA60@uiuc.edu> On Feb 7, 2007, at 10:50 AM, Mark Johnson wrote: > Well, each format has some unique features. If the user > declines to > specify the format, I can figure it out, but it will probably involve > scanning the input file twice. I'll take a look. > I can do all the parsing in one function, in fact I have, just > to see > how nasty it would end up being. I just can't stomach having the > code that > tightly coupled and hard to read. In the end it'll probably be three > functions. GlimmerM/HMM are pretty close. Maybe two, Glimmer2 and > Glimmer3 aren't *that* different, either. I don't see a problem with passing off the parse to a defined class method either right off or mid-parse. I'm doing something like this with a revamped GenBank parser:

# declare local to module
my %GLIMMER_METHODS = (
    'GlimmerHMM' => '_parsehmm',
    'Glimmer'    => '_parsenormal',
    # ....others if needed
    '_DEFAULT_'  => '_parseabnormal',
);
...

Then either preparse part of file using _readline() to determine format, or use -format and bypass preparsing:

sub next_thingy {
    ...
    if (!$format) {
        while (my $line = $self->_readline()) {
            if ($line =~ m{(something)}) {
                $format = $1;
                $self->_pushback($line);
                last;
            }
        }
    }
    my $method = (exists $GLIMMER_METHODS{$format})
               ? $GLIMMER_METHODS{$format}
               : $GLIMMER_METHODS{'_DEFAULT_'};   # fallback to this one
    return $self->$method();   # hand off parsing flow to the proper parser
    ...
}

# all parser variants would have this structure:
sub _parsehmm {
    my $self = shift;
    ... init stuff here
    while (my $line = $self->_readline()) {
        ... do stuff until END of next prediction/report
    }
    ...
return data if any } chris > On 2/6/07, Jason Stajich wrote: >> >> I definitely vote for 1) - worst case you have 4 separate methods >> if there >> is no good way to condense the parsing for each format and require >> the user >> to specify the format. >> >> I have no problem with requiring user to specify what program she >> used - >> if we can be fancy and guess the format later (i.e. guess format >> in SeqIO) >> -then that's icing. >> >> -jason >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From johnston at biochem.ucl.ac.uk Wed Feb 7 13:56:52 2007 From: johnston at biochem.ucl.ac.uk (Caroline Johnston) Date: Wed, 7 Feb 2007 18:56:52 +0000 (GMT) Subject: [Bioperl-l] RNA folding In-Reply-To: References: Message-ID: Thanks Chris. Storing the interaction data as a hash according to an ontology and using an extended bracket notation as the string representation seems to make sense, but I'm still unsure how this is supposed to be attached to the Seq objects. You reckon it should be an AnnotationI? I'm not sure I understand the distinction between annotations and features. From the docs I got the impression that Features were like annotation on bits of sequences and had a reference to the sequence to which they belong, whereas annotations don't. If that's the case though, why would RNA structure be an annotation rather than a feature? If not, what is the distinction between them? Are the positional Annotation subclasses you're developing intended to replace features? Have I got the wrong end of the stick entirely? 
Cheers, Cass On Tue, 6 Feb 2007, Chris Fields wrote: > Actually, the only RNA tool wrappers I have made are ones for ERPIN, > RNAMotif, and Infernal (the only one in bioperl-run CVS at this time > is RNAMotif). I am planning on writing up wrappers for Vienna, > UNAFold, and a few others but haven't really started in. Here's > where I'm at right now... > > I am writing up a new set of AnnotationI classes which positionally > describe data (Meta) which I hope will help deal with this stuff. > These would be similar in nature to Heikki's Bio::Seq::Meta classes: > > http://bioperl.org/pipermail/bioperl-l/2006-December/024414.html > > I would use a regular Bio::SeqI and store the structural data and > anything else (such as energy calculations, etc) as Annotation > objects in an AnnotationCollection, and then write up a series of > SeqIO modules to get data into/out of the designated structure > formats, like UNAfold ct, RNAML, and so on. Each sequence would then > be capable of holding more than one structural Annotation (i.e. could > represent different folding pathways, alternative RNA folds, and so on). > > At this point I represent the data as an array of hashes where $array > [0] is nt 1 and the hash keys indicate the type of interaction, base > interacted with, etc. The text representation would be as simple > Eddy WUSS (Rfam-like) format by default, which is capable of > representing some complex data (pseudoknots, for instance), is > compact, and is documented (via the Infernal manual). Tags will > probably switch to more ontologically relevant terms (probably from > RNAML or RNA Ontology), but in general it is something like this: > > [ > {'interaction' => 'WC', > 'base' => 24}, > {'interaction' => 'WC', > 'base' => 23}, > {'interaction' => 'SS'}, > ... > ] > > In this implementation every seq position would have some kind of > interaction designation, though that's open for debate as it could > just be simple text or undef for single-stranded regions. 
> > This is also scalable based on complexity of the data: if one wanted > to add tert/quaternary interactions, location, base modifications, > remote sequence interactions, etc., extra key/value pairs could be > used. Conversely, if one only wanted sec structure (for drawing RNA > structures, for example), then only that data would be parsed. > > If you (or anyone listening) have any suggestions I would greatly > appreciate them. > > chris > > From cjfields at uiuc.edu Wed Feb 7 17:15:44 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 7 Feb 2007 16:15:44 -0600 Subject: [Bioperl-l] RNA folding In-Reply-To: References: Message-ID: <7360B66F-6AF3-4CB1-8343-0A19E42AD7F8@uiuc.edu> On Feb 7, 2007, at 12:56 PM, Caroline Johnston wrote: > Thanks Chris. > > Storing the interaction data as a hash according to an ontology and > using > an extended bracket notation as the string representation seems to > make > sense, but I'm still unsure how this is supposed to be > attached to the Seq objects. You reckon it should be an AnnotationI? As long as it describes everything in the object and there is a reasonable way of textually representing the data, I think you can attach anything as annotation. A recent example is the addition of trees as annotation. Also, Annotation can be used to describe alignments (such as the structure consensus string in Rfam alignments), or added to SeqFeatures. The class just needs to implement AnnotatableI. > I'm not sure I understand the distinction between annotations and > features. From the docs I got the impression that Features were like > annotation on bits of sequences and had a reference to the sequence to > which they belong, whereas annotations don't. If that's the case > though, > why would RNA structure be an annotation rather than a feature? If > not, > what is the distinction between them? Are the positional Annotation > subclasses you're developing intended to replace features?
Have I > got the > wrong end of the stick entirely? > > Cheers, > Cass The key distinction between seqfeatures and annotations is that annotations are normally associated with the entire sequence record, while seqfeatures normally describe a part of the sequence (and thus have a location on the sequence). There are a few exceptions, but in general that's the case. The HOWTO gives a bit more background: http://www.bioperl.org/wiki/HOWTO:Feature-Annotation Using annotations or seqfeatures in a case like this may be completely dependent on one's point of view. For instance, one implementation I had considered was adding an interface to Bio::Seq which would allow Seq objects to also have Bio::Structure objects, since my view is that any sequence could (optionally) have a structure associated with it. However, I reasoned that a sequence could actually have multiple structures (RNA, ssDNA, and protein can have several alternative folds or different folding pathways, for instance). Instead of splitting up each structure into individual seqfeatures (each of which would have to be tagged with the relevant structure and score info), I could have one class encompass all of that data in a reasonable way. Hence I used Annotation. BTW, this isn't meant to replace features in any way. It would be primarily used to describe (1) a sequence as a whole, such as a tRNA sequence, (2) a seqfeature, such as a tRNA, rRNA, riboswitch, etc. in a genome sequence, or (3) a conserved structure in an alignment, such as Rfam Stockholm output. I'll add that the option of splitting the data into seqfeatures isn't ruled out. It would be a matter of using a helper method, maybe in SeqUtils or directly in Annotation::Meta or whatever I end up calling it. I plan on adding something along those lines at some point.
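As a rough, standalone illustration of such a helper (not Bioperl code, and not the Annotation::Meta design, which the thread leaves open), the sketch below parses a plain dot-bracket string into the array-of-hashes layout from Chris's earlier message, then groups paired positions into stems whose 1-based ranges could seed seqfeatures. `bracket_to_pairs` and `stems` are hypothetical names, and only plain `(`, `)` and `.` are handled, not full WUSS.

```perl
use strict;
use warnings;

# Parse a plain dot-bracket string into the per-position layout described
# in this thread: $pos->[0] describes nucleotide 1. '(' and ')' pair; any
# other character is treated as unpaired. WUSS extensions (<>, [], {},
# pseudoknot letters) are not handled. Tag names follow Chris's example;
# every pair is labelled 'WC' although dot-bracket cannot tell G-U pairs apart.
sub bracket_to_pairs {
    my ($struct) = @_;
    my (@pos, @stack);
    my @chars = split //, $struct;
    for my $i (0 .. $#chars) {
        if ($chars[$i] eq '(') {
            push @stack, $i;
        } elsif ($chars[$i] eq ')') {
            die "unbalanced ')' at position ", $i + 1, "\n" unless @stack;
            my $j = pop @stack;
            $pos[$i] = { interaction => 'WC', base => $j + 1 };
            $pos[$j] = { interaction => 'WC', base => $i + 1 };
        } else {
            $pos[$i] = { interaction => 'SS' };
        }
    }
    die "unbalanced '(' at position ", $stack[-1] + 1, "\n" if @stack;
    return \@pos;
}

# Group pairs into stems: runs i, i+1, ... whose partners are p, p-1, ...
# Each stem is a plain hash of 1-based coordinates.
sub stems {
    my ($pos) = @_;
    my @stems;
    for my $i (1 .. scalar @$pos) {
        my $p = $pos->[$i - 1]{base};
        next unless defined $p && $p > $i;   # visit each pair from its 5' side
        my $last = $stems[-1];
        if ($last && $last->{end5} == $i - 1 && $last->{start3} == $p + 1) {
            @{$last}{qw(end5 start3)} = ($i, $p);    # extend the current stem
        } else {
            push @stems, { start5 => $i, end5 => $i, start3 => $p, end3 => $p };
        }
    }
    return @stems;
}

my $pos = bracket_to_pairs('((..))..((...))');
printf "nt 1 pairs with nt %d\n", $pos->[0]{base};                          # 6
printf "stem %d-%d / %d-%d\n", @{$_}{qw(start5 end5 start3 end3)} for stems($pos);
# stem 1-2 / 5-6
# stem 9-10 / 14-15
```

Bulges and interior loops simply end one stem and start the next; a real helper would likely also need to carry energies and structure identifiers.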
chris From mitch_skinner at berkeley.edu Wed Feb 7 18:26:53 2007 From: mitch_skinner at berkeley.edu (Mitch Skinner) Date: Wed, 07 Feb 2007 15:26:53 -0800 Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels In-Reply-To: <6dce9a0b0702070417k692fedcchfac5b1f0e01f72bd@mail.gmail.com> References: <45C9578F.2060802@berkeley.edu> <6dce9a0b0702070417k692fedcchfac5b1f0e01f72bd@mail.gmail.com> Message-ID: <45CA603D.1070901@berkeley.edu> Lincoln Stein wrote: > Zero is not a forbidden coordinate, since gbrowse also works on > genetic maps which have negative and floating point coordinates. > You've simply picked up a boundary case where the rounding isn't > working properly. I will fix this now. Thanks for the fix. What do you think of the following case? This is something I actually ran into. Suppose you have: the original draw_grid: my $first_tick = $minor * int($self->start/$minor); and my version of map_pt: my $val = $flip ? ($pr - ($length - ($_- 1)) * $scale) : (($_-$offset-1) * $scale); $val = int($val + .5 * ($val <=> 0)); and scale=0.5, offset=0, pad_left=0, flip=0, and minor=10. Our tiles are currently 1000px wide. So the first gridline will be at 0bp => -1px and the 201st gridline will be at 2000bp => 1000px. So the first tile will not have a gridline at its 0th pixel but the second tile will have one there. Last night I was thinking that this was an artifact of having gridlines start at 0bp but now I'm thinking this is just because rounding half-pixels leaves an extra space when crossing zero. Which is not unreasonable; it just invalidates the assumption I was hoping to make that the gridlines are the same for each tile. Maybe it's just unreasonable to think that floating point calculations will give pixel-exact results. Or I may just be barking up the wrong tree entirely. Perhaps it's time to reconsider at a higher level (see my next message).
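The extra pixel when crossing zero can be checked numerically. A minimal sketch, assuming scale 0.5, offset 0 and the non-flipped branch: it compares the proposed round-half-away-from-zero step with a `floor($v + 0.5)` alternative. The floor variant is an assumption added for comparison, not what Bio::Graphics::Panel does, and whether it suits the negative and fractional coordinates of the genetic maps Lincoln mentions has not been checked.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Mitch's proposed rounding: round half away from zero.
sub round_away { my ($v) = @_; return int($v + 0.5 * ($v <=> 0)); }

# Alternative, assumed here for comparison only: round half up with floor().
# floor($v + $n + 0.5) == floor($v + 0.5) + $n for any integer $n, so a
# shift by a whole number of pixels moves every rounded value equally,
# whichever side of zero it falls on.
sub round_up { my ($v) = @_; return floor($v + 0.5); }

# Non-flipped map_pt() with offset 0: (bp - 1) * scale, then round.
sub to_px {
    my ($bp, $scale, $round) = @_;
    return $round->(($bp - 1) * $scale);
}

my $scale = 0.5;    # minor = 10 bp, so gridlines should be 5 px apart
for my $bp (0, 10, 2000) {
    printf "%5d bp -> away: %5d   floor: %5d\n",
        $bp, to_px($bp, $scale, \&round_away), to_px($bp, $scale, \&round_up);
}
```

With rounding away from zero the ticks at 0, 10 and 2000 bp land on -1, 5 and 1000 px, so the first interval is 6 px; with floor() they land on 0, 5 and 1000 px, and every 1000 px tile starts on a gridline.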
Mitch From mitch_skinner at berkeley.edu Wed Feb 7 18:28:11 2007 From: mitch_skinner at berkeley.edu (Mitch Skinner) Date: Wed, 07 Feb 2007 15:28:11 -0800 Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels In-Reply-To: <6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com> References: <45C9578F.2060802@berkeley.edu> <6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com> Message-ID: <45CA608B.80907@berkeley.edu> Lincoln Stein wrote: > However, I'm also very interested in why grid-drawing takes so long. > When I've profiled drawing, neither grid drawing nor map_pt() consume > any significant amount of time. Well, the approach that we've been taking is to hand Bio::Graphics::Panel a fake GD object that stores all of the graphical primitives (line, rectangle, filledRectangle, etc. + their parameters) and then draws them later in chunks (for each tile, we draw all the primitives that overlap its pixel boundaries). We're doing this because trying to create a real GD object that's hundreds of millions of pixels wide takes too much RAM. But storing all the gridlines (for a whole chromosome, at a high zoom level) also takes a lot of RAM, and getting the gridlines for the current tile and translating their coordinates into the coordinate space of the tile also takes a fair amount of CPU. The gridline hack I've been experimenting with (that prompted these emails) was motivated by the hope that the gridlines were regular enough that we wouldn't have to store them explicitly, but just draw the same gridlines over and over again. It runs almost twice as fast as the version that explicitly stores the gridlines. So the main slowdown is not in draw_grid or map_pt, but in our code that's storing/retrieving and translating the gridlines. Which we are also looking into speeding up. But the memory usage is harder to reduce; I've experimented with trying to compress the gridline data but it seems easier to just have the panel draw the grid directly. 
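The record-then-replay approach described above can be sketched without GD at all. This is a hypothetical stand-in (`PrimitiveRecorder` and `for_tile` are invented names) that records only line() and rectangle() calls, using GD::Image's (x1, y1, x2, y2, colour) argument order, and then returns the primitives overlapping a tile, shifted into tile coordinates. Colours here are arbitrary scalars rather than GD colour indexes.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a "fake GD object": it records drawing calls
# instead of rendering them. Only two primitives are modelled.
package PrimitiveRecorder;

sub new { my $class = shift; return bless { prims => [] }, $class; }

# Same argument order as GD::Image: (x1, y1, x2, y2, colour).
sub line      { my $self = shift; push @{ $self->{prims} }, [ 'line', @_ ];      return; }
sub rectangle { my $self = shift; push @{ $self->{prims} }, [ 'rectangle', @_ ]; return; }

# Return primitives whose x-extent overlaps [$tile_x, $tile_x + $width),
# with x coordinates shifted into the tile's own coordinate space.
sub for_tile {
    my ($self, $tile_x, $width) = @_;
    my @out;
    for my $p (@{ $self->{prims} }) {
        my ($op, $x1, $y1, $x2, $y2, @rest) = @$p;
        my ($lo, $hi) = $x1 <= $x2 ? ($x1, $x2) : ($x2, $x1);
        next if $hi < $tile_x || $lo >= $tile_x + $width;
        push @out, [ $op, $x1 - $tile_x, $y1, $x2 - $tile_x, $y2, @rest ];
    }
    return @out;
}

package main;

my $rec = PrimitiveRecorder->new;
$rec->line($_, 0, $_, 100, 'grey') for map { $_ * 10 } 0 .. 299;   # gridlines every 10 px
$rec->rectangle(995, 20, 1040, 30, 'black');                        # feature spanning tiles 0 and 1

my @tile1 = $rec->for_tile(1000, 1000);
printf "%d primitives on tile 1\n", scalar @tile1;                  # 100 gridlines + 1 rectangle
```

for_tile() scans every stored primitive for each tile, which is the per-tile CPU cost described above; bucketing primitives by tile index when they are recorded would avoid the scan, at some cost in memory.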
The more I read the Panel code, the more I think it would be nice to make more use of it. One of the reasons that we're trying to fool it right now is that there seem to be a number of behaviors in it (and/or in the glyphs?) that take the current image boundaries into account (drawing an arrow where a feature runs off the edge of the image, etc.). But in our browser each tile is supposed to mesh seamlessly with its neighbor, so if there's an easy way to turn off those edge-aware behaviors, that would be pretty interesting.

Ian has also suggested that it might be better to store less information than the full set of graphics primitives. For example, we could just store the Panel's glyph boxes and use their (pixel bound)->feature information to decide which features need to be drawn for each tile.

I'm going to be spending some time reading the Bio::Graphics code in more depth. I'd also welcome suggestions from you or anyone on the list.

Thanks,
Mitch

From sdbrown at annular.org Wed Feb 7 18:41:13 2007
From: sdbrown at annular.org (Steven Brown)
Date: Wed, 7 Feb 2007 15:41:13 -0800
Subject: [Bioperl-l] Bio::Restriction::Analysis cut site problem in 1.5.2
Message-ID: <3828F5B9-BFF8-47F4-8228-3BEA036513A0@annular.org>

The module seems to have trouble handling the cut-site specifiers that surround the sequence that the enzyme is specific for. The error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Bad end parameter (22). End must be less than the total length of sequence (total=6)
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.8.6/Bio/Root/Root.pm:328
STACK: Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:371
STACK: Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:884
STACK: Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:785
STACK: Bio::Restriction::Analysis::cut /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:369
STACK: Bio::Restriction::Analysis::cutters /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:678
---snip (my script line)---
-----------------------------------------------------------

The offending enzyme:

---snip---
<1>AcuI
<2>Eco57I,Bsp6II,BspD6II,BspKT5I,Eco112I,Eco125I,FsfI
<3>CTGAAG(16/14)
---snip---

If I get rid of the (16/14) the error disappears and the right sequence site is matched. It seems like maybe a decision was made not to analyze enzymes with remote cut positions, or the code wouldn't throw the error...? Any information on this would be helpful.

Thanks,
Steve

From adsj at novozymes.com Thu Feb 8 03:55:50 2007
From: adsj at novozymes.com (Adam Sjøgren)
Date: Thu, 08 Feb 2007 09:55:50 +0100
Subject: [Bioperl-l] Bioperl 1.4 to 1.5.2, adding qualifiers to Bio::Seq-objects
References: <8764adoptn.fsf@topper.koldfront.dk>
Message-ID: <87fy9hqb8p.fsf@topper.koldfront.dk>

On Wed, 7 Feb 2007 11:50:13 -0600, Chris wrote:

> This works for me using bioperl-live (Mac OS X):

> ok 1 - Qualifier note found
> ok 2 - Qualifier db_xref found

*slaps forehead*

Thanks for the test - your diagnosis was spot on:

> If you haven't uninstalled 1.4, make sure you aren't running 1.4 or
> mixing the two versions (you can check by using 'perldoc -l
> Bio::Root::Root').
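`perldoc -l` reports the copy of a module that Perl will load; a stale or locally patched copy that comes earlier in @INC silently wins over the installed release. A small pure-Perl sketch can list every copy of a module along @INC in search order so shadowed files stand out. The `find_module_copies` helper is hypothetical, not a BioPerl or core function.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Return every file that could satisfy `require $module`, in @INC search
# order (or in the order of @dirs, if given). The first entry is the one
# Perl loads; any further entries are shadowed copies.
sub find_module_copies {
    my ($module, @dirs) = @_;
    @dirs = grep { !ref } @INC unless @dirs;    # skip code-ref hooks in @INC
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    my @found;
    for my $dir (@dirs) {
        my $path = File::Spec->catfile($dir, @parts);
        push @found, $path if -f $path;
    }
    return @found;
}

# Usage: perl find_copies.pl Bio::Root::Root Bio::Seq Bio::SeqFeature::Generic
for my $module (@ARGV) {
    my @copies = find_module_copies($module);
    unless (@copies) {
        print "$module: not found in \@INC\n";
        next;
    }
    print "$module\n  loaded from: $copies[0]\n";
    print "  shadowed:    $_\n" for @copies[ 1 .. $#copies ];
}
```

Any module that prints a "shadowed" line has more than one copy on the search path, which is the situation to rule out when mixing 1.4 and 1.5.2 installs.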
I had a modified version of Bio::Seq and Bio::SeqFeature::Generic in my @INC (added, and promptly forgotten, writing the patch mentioned here: ). Removing those and patching 1.5.2 fixed my self-inflicted problem.

Thanks again!

Adam

-- 
Adam Sjøgren
adsj at novozymes.com

From heikki at sanbi.ac.za Thu Feb 8 04:39:47 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 8 Feb 2007 11:39:47 +0200
Subject: [Bioperl-l] Bio::Restriction::Analysis cut site problem in 1.5.2
In-Reply-To: <3828F5B9-BFF8-47F4-8228-3BEA036513A0@annular.org>
References: <3828F5B9-BFF8-47F4-8228-3BEA036513A0@annular.org>
Message-ID: <200702081139.48125.heikki@sanbi.ac.za>

The error comes from Bio::PrimarySeq::subseq when it tries to cut beyond an existing sequence. Maybe your sequence has a restriction site that is near the end of your sequence? This is a special case which has not been taken into account in the Bio::Restriction::Analysis::_cuts method. The question is: should the site be detected if its cut site is not within the studied sequence?

Please submit a bugzilla bug, so this gets solved. I probably do not have time to tweak the code myself.

 -Heikki

-- 
Heikki Lehvaslaiho            heikki at_sanbi _ac _za
Associate Professor           skype: heikki_lehvaslaiho
SANBI, South African National Bioinformatics Institute
University of Western Cape, South Africa
Phone: +27 21 959 2096   FAX: +27 21 959 2512

From cjfields at uiuc.edu Thu Feb 8 09:20:26 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Feb 2007 08:20:26 -0600
Subject: [Bioperl-l] BLASTXML changes (good this time!)
Message-ID: 

All,

BLAST XML parsing should now work for any CPAN-based XML::SAX parser!

XML::SAX::PurePerl (comes with XML::SAX, the slowest)
XML::SAX::Expat
XML::SAX::ExpatXS (the fastest)
XML::LibXML::SAX
XML::LibXML::SAX::Parser

Grant McLean has updated XML::SAX on CPAN to fix an XML::SAX::PurePerl bug, so using that parser will necessitate an XML::SAX upgrade. I had also found a bug in the SAX handler which chopped off a large chunk of the description for hits; that is now fixed in CVS.

If Sendu is out there, I think we can safely remove any dependencies beyond XML::SAX 0.15 for the next release. Should I go ahead and modify Build.PL?

chris

From lstein at cshl.edu Thu Feb 8 10:51:49 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 8 Feb 2007 10:51:49 -0500
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <45CA608B.80907@berkeley.edu>
References: <45C9578F.2060802@berkeley.edu>
	<6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com>
	<45CA608B.80907@berkeley.edu>
Message-ID: <6dce9a0b0702080751m210e4d44k3e5c38bfdd3ee9ea@mail.gmail.com>

Hi,

I like the approach you're taking (creating a fake GD object that stores the graphics primitives).
Perhaps the best thing to do is to subclass Panel itself so that it doesn't draw the gridlines (or turn gridlines off completely). Then you can draw gridlines after the fact in each tile as needed.

Lincoln

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From Kevin.M.Brown at asu.edu Thu Feb 8 10:28:30 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 8 Feb 2007 08:28:30 -0700
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
References: <45C9578F.2060802@berkeley.edu>
	<6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com>
	<45CA608B.80907@berkeley.edu>
Message-ID: <1A4207F8295607498283FE9E93B775B402AAC1D0@EX02.asurite.ad.asu.edu>

> The more I read the Panel code, the more I think it would be
> nice to make more use of it. One of the reasons that we're
> trying to fool it right now is that there seem to be a number
> of behaviors in it (and/or in the glyphs?) that take the
> current image boundaries into account (drawing an arrow where
> a feature runs off the edge of the image, etc.).
> But in our browser each tile is supposed to mesh seamlessly with
> its neighbor, so if there's an easy way to turn off those
> edge-aware behaviors that would be pretty interesting.

I think the glyphs try to deal with edges because if they didn't, they would flow out into whatever right or left padding had been placed around the image when the panel was created.

Something I've noticed is that when I create tiles for the chromosomes I'm working on, the panels don't line up because the bump position in one panel is not accounted for when the next panel is drawn.

From sheris at eps.berkeley.edu Thu Feb 8 12:42:27 2007
From: sheris at eps.berkeley.edu (Sheri Simmons)
Date: Thu, 08 Feb 2007 09:42:27 -0800
Subject: [Bioperl-l] bioperl newbie needs help with extracting cds info
Message-ID: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu>

Hi,

I'm a newbie to BioPerl so apologies if this is a very basic question. I am trying to parse GenBank files with the goal of creating concatenated gene lists in nucleic acid or amino acid format. It is working fine, except for one thing: I need to create gene labels incorporating information on whether the gene is on the complementary strand or not ("complement" in the CDS tag). How can I get Bioperl to tell me whether the CDS tag value includes the word "complement"?

Thanks
Sheri

From george.heller at yahoo.com Thu Feb 8 13:54:41 2007
From: george.heller at yahoo.com (George Heller)
Date: Thu, 8 Feb 2007 10:54:41 -0800 (PST)
Subject: [Bioperl-l] Perl script to extract from ncbi
Message-ID: <178139.85769.qm@web56506.mail.re3.yahoo.com>

Hi all,

I have a question regarding extracting data from NCBI. I have a database to store the sequence data, but the files I have loaded into it don't have a proper description line specified. Based on the accession number, I need to find out what the genus and species name (organism name) is from NCBI.

I have about 1500 records for which I need to extract the names from NCBI.
Any ideas of how I can go about writing a perl script for extracting this information from NCBI?

Thanks!
George.

From Kevin.M.Brown at asu.edu Thu Feb 8 14:11:50 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 8 Feb 2007 12:11:50 -0700
Subject: [Bioperl-l] bioperl newbie needs help with extracting cds info
References: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu>
Message-ID: <1A4207F8295607498283FE9E93B775B402AAC29A@EX02.asurite.ad.asu.edu>

When you extract the features, just call the strand method on each returned feature to find out:

    my @features = $seq->all_SeqFeatures;

    # keep only the CDS features (select them by primary tag)
    for my $f (@features) {
        my $tag = $f->primary_tag;
        if ($tag eq 'CDS') {
            print $f->strand . "\n";   # 1 = forward strand, -1 = complement
        }
    }

From barry.moore at genetics.utah.edu Thu Feb 8 14:35:03 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 8 Feb 2007 12:35:03 -0700
Subject: [Bioperl-l] bioperl newbie needs help with extracting cds info
In-Reply-To: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu>
References: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu>
Message-ID: 

Sheri-

The Bio::SeqFeature::Generic object has a 'strand' method, so you can just call strand on the CDS (or any other) feature like this:

    my @features = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures();
    for my $feature (@features) {
        my $strand = $feature->strand;   # -1 means the CDS is on the complement strand
    }

Barry

From torsten.seemann at infotech.monash.edu.au Thu Feb 8 23:18:33 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 9 Feb 2007 15:18:33 +1100
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: 
References: 
Message-ID: 

Chris,

> BLAST XML parsing should now work for any CPAN-based XML::SAX parser!
> XML::SAX::PurePerl (comes with XML::SAX, the slowest)
> XML::SAX::Expat
> XML::SAX::ExpatXS (the fastest)
> XML::LibXML::SAX
> XML::LibXML::SAX::Parser

That's excellent news - thanks for all the work you have put in on this one. I'm impressed.

This is a good opportunity to encourage people who use Bio::SearchIO for BLAST parsing to switch to 'blastxml' format over 'blast'. Although the latter is more human readable, it perennially requires parser source changes to cope with the variations and new formatting introduced with each new NCBI BLAST release. Best to use "-m 7" XML format, and convert as appropriate using one of the Bio::SearchIO::Writer:: classes.

--Torsten

From cjfields at uiuc.edu Fri Feb 9 08:58:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Feb 2007 07:58:24 -0600
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: 
References: 
Message-ID: <4FC966A7-7A03-41D9-ABF7-6ACD888720FB@uiuc.edu>

On Feb 8, 2007, at 10:18 PM, Torsten Seemann wrote:

> Chris,
>
>> BLAST XML parsing should now work for any CPAN-based XML::SAX parser!
>> XML::SAX::PurePerl (comes with XML::SAX, the slowest)
>> XML::SAX::Expat
>> XML::SAX::ExpatXS (the fastest)
>> XML::LibXML::SAX
>> XML::LibXML::SAX::Parser
>
> That's excellent news - thanks for all the work you have put in on
> this one. I'm impressed.

Jason did most of the hard work; I just tinkered with it until it worked (and pestered a few perl XML guys along the way). Thanks Grant and Bj?rn!

> This is a good opportunity to encourage people who use Bio::SearchIO
> for BLAST parsing to switch to 'blastxml' format over 'blast'.
> Although the latter is more human readable, it perennially requires
> parser source changes to cope with the variations and new formatting
> introduced with each new NCBI BLAST release. Best to use "-m 7" XML
> format, and convert as appropriate using one of the
> Bio::SearchIO::Writer:: classes.
>
> --Torsten

I'll try getting some benchmarks for the different parsers up today on the wiki if I have time.

Strangely enough, NCBI changed a few things about BLAST XML a few releases back w/o mentioning it to anyone (it was a silent bug in BLAST XML parsing which I fixed recently). If you sent in multiple queries in older versions of BLAST you would get all of the BLAST XML reports concatenated together, which required preparsing the reports to carve up the XML prior to parsing. Now they treat it like PSI-BLAST, where multiple queries = multiple iterations, so you get one long XML BLAST report where each iteration = Result. The current parser should handle both as it just caches the other results and returns them one at a time prior to new parses, but I wouldn't recommend parsing a huge BLAST XML file with hundreds of queries as you'll quickly run out of memory! If they get Perl SAX2 up to date with Expat they'll eventually add parse_chunk() and pause_parse() for each parser. Until then...

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cuiw at ncbi.nlm.nih.gov Fri Feb 9 09:20:10 2007
From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C])
Date: Fri, 9 Feb 2007 09:20:10 -0500
Subject: [Bioperl-l] Perl script to extract from ncbi
In-Reply-To: <178139.85769.qm@web56506.mail.re3.yahoo.com>
References: <178139.85769.qm@web56506.mail.re3.yahoo.com>
Message-ID: <18C407FD4FFB424292D769FBD68C1987020BBC58@NIHCESMLBX8.nih.gov>

This is an example for fetching two GenBank records (id=124504630,110665734) in XML format. Organism names like 'Rattus norvegicus' can be parsed from the XML.
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=124504630,110665734&retmode=xml&rettype=gb

Or you can get TaxIds and translate them into real names:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=124504630,110665734&retmode=xml

Wenwu Cui, PhD

From bosborne11 at verizon.net Fri Feb 9 12:51:19 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 09 Feb 2007 12:51:19 -0500
Subject: [Bioperl-l] Perl script to extract from ncbi
In-Reply-To: <178139.85769.qm@web56506.mail.re3.yahoo.com>
Message-ID: 

George,

http://www.bioperl.org/wiki/HOWTO:Beginners#Retrieving_a_sequence_from_a_database

Brian O.

From johnston at biochem.ucl.ac.uk Fri Feb 9 14:23:41 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Fri, 9 Feb 2007 19:23:41 +0000 (GMT)
Subject: [Bioperl-l] WrapperBase
Message-ID: 

Hi,

Could WrapperBase::executable warn you if it doesn't find the exe in program_path? At the moment it just silently goes ahead and uses one in the system path if it exists.

Cass.

I've never used diff, so not sure if this is right, but:

305,308c305,314
<     if( $prog_path && -e $prog_path && -x $prog_path ) {
<         $self->{'_pathtoexe'} = $prog_path;
<     } else {
<         my $exe;
---
>     if($prog_path){
>         if(-e $prog_path && -x $prog_path){
>             $self->{'_pathtoexe'} = $prog_path;
>         }
>         else{
>             $self->warn("executable not found in $prog_path, trying system path...") if $warn;
>         }
>     }
>     unless ($self->{'_pathtoexe'}){
>         my $exe;
335a342

From bix at sendu.me.uk Fri Feb 9 17:38:59 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 09 Feb 2007 22:38:59 +0000
Subject: [Bioperl-l] WrapperBase
In-Reply-To: 
References: 
Message-ID: <45CCF803.9030004@sendu.me.uk>

Caroline Johnston wrote:
> Hi,
>
> Could WrapperBase::executable warn you if it doesn't find the exe in
> program_path? At the moment it just silently goes ahead and uses one in
> the system path if it exists.

No, I think not.
That would be very annoying when using wrappers for programs that you just have in your system path.

What specific problem are you encountering with the current behaviour?

From bix at sendu.me.uk Fri Feb 9 17:40:33 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 09 Feb 2007 22:40:33 +0000
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: 
References: 
Message-ID: <45CCF861.8030000@sendu.me.uk>

Chris Fields wrote:
> If Sendu is out there, I think we can safely remove any dependencies
> beyond XML::SAX 0.15 for the next release. Should I go ahead and
> modify Build.PL?

Sure, good to hear.

From cjfields at uiuc.edu Fri Feb 9 22:42:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Feb 2007 21:42:24 -0600
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <45CCF861.8030000@sendu.me.uk>
References: <45CCF861.8030000@sendu.me.uk>
Message-ID: 

On Feb 9, 2007, at 4:40 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> If Sendu is out there, I think we can safely remove any dependencies
>> beyond XML::SAX 0.15 for the next release. Should I go ahead and
>> modify Build.PL?
>
> Sure, good to hear.

I added a version dependency for XML::SAX (v. 0.15) for the PurePerl fix. That likely obviates the need for a Bundle for XML::Simple. Not too pressing; we can determine that before the next release.

chris

From johnston at biochem.ucl.ac.uk Sat Feb 10 11:27:53 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Sat, 10 Feb 2007 16:27:53 +0000 (GMT)
Subject: [Bioperl-l] WrapperBase
In-Reply-To: <45CCF803.9030004@sendu.me.uk>
References: <45CCF803.9030004@sendu.me.uk>
Message-ID: 

> No, I think not. That would be very annoying when using wrappers for
> programs that you just have in your system path.

Hmm, maybe I misunderstood what the program_path was for?
The executable method goes straight to the system path unless program_path is set, so I assumed you would only set program_path if you specifically wanted it to look somewhere else. You wouldn't get a warning if you didn't specify a program_path and just left it to look in the system path.

> What specific problem are you encountering with the current behaviour?

One version of an executable is in /usr/local, and another version, which I wanted to use, is in my home directory. The program_path method gets a path from an environment variable, which was set to ~/. I didn't realise I had the wrong permissions on the executable though, and it was silently failing to use my version and using the one in /usr/local instead.

Cass

From george.heller at yahoo.com Sat Feb 10 15:35:18 2007
From: george.heller at yahoo.com (George Heller)
Date: Sat, 10 Feb 2007 12:35:18 -0800 (PST)
Subject: [Bioperl-l] Error while parsing
Message-ID: <162150.76282.qm@web56511.mail.re3.yahoo.com>

Hi all,

I am in the process of parsing a few files, actually blast results, but happen to get the following error:

------------- EXCEPTION -------------
MSG: Can't get HSPs: data not collected.
STACK Bio::Search::Hit::GenericHit::hsp /usr/lib/perl5/site_perl/5.8.5/Bio/Search/Hit/GenericHit.pm:649
STACK toplevel parser.pl:31
--------------------------------------

I am not sure if this is a bug, or if there is something I am doing wrong. Any pointers are appreciated.

Thanks!
George.
From cjfields at uiuc.edu Sat Feb 10 17:56:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 10 Feb 2007 16:56:19 -0600
Subject: [Bioperl-l] Error while parsing
In-Reply-To: <162150.76282.qm@web56511.mail.re3.yahoo.com>
References: <162150.76282.qm@web56511.mail.re3.yahoo.com>
Message-ID: 

On Feb 10, 2007, at 2:35 PM, George Heller wrote:

> Hi all,
>
> I am in the process of parsing a few files, actually blast
> results, but happen to get the following error:
>
> ------------- EXCEPTION -------------
> MSG: Can't get HSPs: data not collected.
> STACK Bio::Search::Hit::GenericHit::hsp /usr/lib/perl5/site_perl/5.8.5/Bio/Search/Hit/GenericHit.pm:649
> STACK toplevel parser.pl:31
> --------------------------------------
>
> I am not sure if this is a bug, or is there something I am doing
> wrong. Any pointers are appreciated.
>
> Thanks!
> George.

We'll need more to go on than that. If the bioperl version is v1.5.2, please file a bug via the bioperl bugzilla:

http://bugzilla.open-bio.org/

Don't forget to attach a test file which triggers the bug using the 'Create a new attachment' link after the report has been filed.

chris

From sac at bioperl.org Sat Feb 10 22:56:10 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Sat, 10 Feb 2007 19:56:10 -0800
Subject: [Bioperl-l] Error while parsing
In-Reply-To: <162150.76282.qm@web56511.mail.re3.yahoo.com>
References: <162150.76282.qm@web56511.mail.re3.yahoo.com>
Message-ID: <8f200b4c0702101956h53fea96dm241126c680d64ab4@mail.gmail.com>

Your report may be lacking HSP alignments for the hit you are attempting to process. Note that by default, blast will report twice as many one-line descriptions as it will alignments:

  -v  Number of database sequences to show one-line descriptions for (V) [Integer]
        default = 500
  -b  Number of database sequence to show alignments for (B) [Integer]
        default = 250

Verify that this isn't the case for your error. If not, go ahead and file a bug report.
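Hits that appear only in the one-line descriptions (beyond the -b cutoff) have no alignment section, and calling `hsp()` on them throws "Can't get HSPs: data not collected." A defensive sketch, assuming the BioPerl 1.5.x Bio::SearchIO interface (`next_result`, `next_hit`, `next_hsp`) and assuming, as in Bio::Search::Hit::GenericHit, that `next_hsp()` returns false rather than throwing when no HSPs were collected. The `hsps_of` helper and the script layout are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect a hit's HSPs via next_hsp(). Unlike hsp(), which throws
# "Can't get HSPs: data not collected." when the report has no alignment
# section for the hit, next_hsp() is assumed to simply return false.
sub hsps_of {
    my ($hit) = @_;
    my @hsps;
    while (my $hsp = $hit->next_hsp) {
        push @hsps, $hsp;
    }
    return @hsps;
}

# Usage: perl parse_blast.pl report.blast   (needs BioPerl installed)
if (@ARGV) {
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new(-format => 'blast', -file => $ARGV[0]);
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            my @hsps = hsps_of($hit);
            unless (@hsps) {
                # Listed in the one-line descriptions only (beyond -b).
                warn "skipping ", $hit->name, ": no alignments in report\n";
                next;
            }
            printf "%s\t%s\t%d HSP(s)\tfirst HSP e-value %s\n",
                $result->query_name, $hit->name, scalar(@hsps), $hsps[0]->evalue;
        }
    }
}
```

The other route is to rerun blastall with -b set as high as -v (both 500, for example), so that every hit listed in the descriptions should also carry alignments.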
Attach the report (zipped if big) as well as the relevant portion of your processing script.

Steve

From jay at jays.net Sun Feb 11 09:24:55 2007
From: jay at jays.net (Jay Hannah)
Date: Sun, 11 Feb 2007 08:24:55 -0600
Subject: [Bioperl-l] wiki: Database error when attempting to change preferences (1054: Unknown column 'user_newpass_time')
Message-ID: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>

Just a heads-up -- I wanted to check the "E-mail me when a page I'm watching is changed" box in my preferences

http://www.bioperl.org/wiki/Special:Preferences

But I can't. Even if I change nothing and hit the Save button I get this:

----------
Database error

A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was:

(SQL query hidden)

from within function "User::saveSettings". MySQL returned error "1054: Unknown column 'user_newpass_time' in 'field list' (localhost)".
----------

(Yes, it literally says "(SQL query hidden)". That wasn't me for the purposes of this email.
-grin-) Thanks, j seqlab.net http://www.bioperl.org/wiki/User:Jhannah Username: Jhannah User ID: 51 From jay at jays.net Sun Feb 11 10:16:13 2007 From: jay at jays.net (Jay Hannah) Date: Sun, 11 Feb 2007 09:16:13 -0600 Subject: [Bioperl-l] wiki: Database error when attempting to change preferences (1054: Unknown column 'user_newpass_time') In-Reply-To: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net> References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net> Message-ID: <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net> Hmm.... The error appears to not be limited to changing preferences. I tried to update a couple different pages and got errors like this: ------ Database error A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was: (SQL query hidden) from within function "Article::updateRedirectOn". MySQL returned error "1146: Table 'perlwikidb.redirect' doesn't exist (localhost)". ------ So all changes to the wiki aren't working right now? j seqlab.net http://www.bioperl.org/wiki/User:Jhannah From jason at bioperl.org Sun Feb 11 15:18:15 2007 From: jason at bioperl.org (Jason Stajich) Date: Sun, 11 Feb 2007 12:18:15 -0800 Subject: [Bioperl-l] wiki: Database error when attempting to change preferences (1054: Unknown column 'user_newpass_time') In-Reply-To: <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net> References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net> <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net> Message-ID: <3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org> Should be fine now - I did an upgrade to mediawiki 1.9 this weekend and i think the upgrade script didn't finish. In the future system support requests should go to support - AT - open-bio.org so we can track them. -jason On Feb 11, 2007, at 7:16 AM, Jay Hannah wrote: > Hmm.... The error appears to not be limited to changing preferences. 
> I tried to update a couple different pages and got errors like this:
>
> ------
> Database error
> A database query syntax error has occurred. This may indicate a bug
> in the software. The last attempted database query was:
>
> (SQL query hidden)
>
> from within function "Article::updateRedirectOn". MySQL returned
> error "1146: Table 'perlwikidb.redirect' doesn't exist (localhost)".
> ------
>
> So all changes to the wiki aren't working right now?
>
> j
> seqlab.net
> http://www.bioperl.org/wiki/User:Jhannah

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/

From cjfields at uiuc.edu Sun Feb 11 15:51:53 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 11 Feb 2007 14:51:53 -0600
Subject: [Bioperl-l] wiki: Database error when attempting to change preferences (1054: Unknown column 'user_newpass_time')
In-Reply-To: <3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org>
References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net> <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net> <3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org>
Message-ID:

Is there a good place on the main wiki page to prominently display this? I wanted to place something at the top of the main page but I didn't know if we wanted to post the support email address on the page itself.

chris

On Feb 11, 2007, at 2:18 PM, Jason Stajich wrote:

> Should be fine now - I did an upgrade to mediawiki 1.9 this weekend
> and i think the upgrade script didn't finish.
>
> In the future system support requests should go to support - AT -
> open-bio.org so we can track them.
>
> -jason
> On Feb 11, 2007, at 7:16 AM, Jay Hannah wrote:
>
>> Hmm.... The error appears to not be limited to changing preferences.
>> I tried to update a couple different pages and got errors like this:
>>
>> ------
>> Database error
>> A database query syntax error has occurred. This may indicate a bug
>> in the software. The last attempted database query was:
>>
>> (SQL query hidden)
>>
>> from within function "Article::updateRedirectOn". MySQL returned
>> error "1146: Table 'perlwikidb.redirect' doesn't exist (localhost)".
>> ------
>>
>> So all changes to the wiki aren't working right now?
>>
>> j
>> seqlab.net
>> http://www.bioperl.org/wiki/User:Jhannah
>
> --
> Jason Stajich
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
> http://pmb.berkeley.edu/~taylor/people/js.html
> http://fungalgenomes.org/

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From jay at jays.net Sun Feb 11 15:56:53 2007
From: jay at jays.net (Jay Hannah)
Date: Sun, 11 Feb 2007 14:56:53 -0600
Subject: [Bioperl-l] wiki: Database error when attempting to change preferences (1054: Unknown column 'user_newpass_time')
In-Reply-To:
References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net> <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net> <3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org>
Message-ID:

On Feb 11, 2007, at 2:51 PM, Chris Fields wrote:

> Is there a good place on the main wiki page to prominently display
> this? I wanted to place something at the top of the main page but
> I didn't know if we wanted to post the support email address on the
> page itself.
I added it here:

http://www.bioperl.org/wiki/About_site

Which is linked from all pages via the left-hand bar:

community | About this site

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah

From agd27 at cornell.edu Sun Feb 11 12:47:03 2007
From: agd27 at cornell.edu (Adam Diehl)
Date: Sun, 11 Feb 2007 12:47:03 -0500
Subject: [Bioperl-l] Getting GFF output in UCSC-specific format
Message-ID: <45CF5697.60703@cornell.edu>

Good morning folks,

I've got sort of a newbie question regarding how to get gff's out of Bio::Tools::GFF objects that are formatted according to the UCSC browser conventions, described here:

http://genome.ucsc.edu/goldenPath/help/customTrack.html#GFF

(Ignore the custom track headers and what-not. I just need the fields to be set up according to the descriptions in 1 - 9).

The write_feature($feature) method isn't doing it for me, as I get lines like the following (newlines excepted):

chr1 EMBL/GenBank/SwissProt gene 1712 2848 . + . db_xref=GeneID:4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002
chr1 EMBL/GenBank/SwissProt CDS 1712 2848 . + . EC_number=2.7.7.7;codon_start=1;db_xref=GI:94989511,GeneID:4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002;product=DNA+polymerase+III%2C+beta+chain;protein_id=YP_597611.1;transl_table=11;translation=MIQFSINRTLFIHALNATKRAISTKNAIPILSSIKIEVTSTGVTLTGSNGQISIENTIPVSNENAGLLITSPGAILLEASFFINIISSLPDISINVKEIEQHQVVLTSGKSEITLKGKDVDQYPRLQEVSTENPLILKTKLLKSIIAETAFAASLQESRPILTGVHIVLSNHKDFKAVATDSHRMSQRLITLENTSADFDVVIPSKSLREFSAVFTDDIETVEVFFSPSQILFRSEHISFYTRLLEGNYPDTDRLLMTEFETEVVFNTQSLRHAMERAFLISNATQNGTVKLEITQNHISAHVNSPEVGKVNEDLDIVSQSGSDLTISFNPTYLIESLKAIKSETVKIHFLSPVRPFTLTPGDEEESFIQLITPVRTN

As you can see, field 8, which should be frame according to UCSC conventions, is blank, and field 9, group according to UCSC, has frame, along with ID, etc. All this extra stuff causes the UCSC browser to choke.
First off, it can't identify which features are the same (it does this by matching the group field), and second, it can't interpret the CDS's into translated proteins because it lacks frame data.

Basically what I need to do is, for CDS features, extract frame (or codon_start, as it were), from the last field, parse out the integer value and store that in field 8 (as frame), then parse out locus_tag from the last field, clear out everything else and store the locus_tag only in that field (preferably without the qualifier locus_tag=). For feature type gene, I just want to do the last step, so that the gene and CDS features for the same feature have matching group fields that are as simple as possible. Let me know if this is not clear.

The way I've been trying to do this is by stringifying each gff object, splitting into an array, @tmp1, splitting @tmp1[8] into @tmp2 with the following code: my @tmp2 = split /\;\, $tmp1[8]; and finally, trying to parse out the bits I need with regular expressions and store back to @tmp1[n]. -- This does not work, because perl wants to interpret every / + etc. as a metacharacter!

I am assuming there's a simple way to get at each value in the last field of the gff object using methods supplied by Bio::Tools::GFF, but the API docs seem a bit lacking in this area. Could anyone steer me towards what I need to know to do this? Please let me know if I can clarify any details!

Cheers,
Adam Diehl

From jason at bioperl.org Sun Feb 11 18:29:16 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 11 Feb 2007 15:29:16 -0800
Subject: [Bioperl-l] Getting GFF output in UCSC-specific format
In-Reply-To: <45CF5697.60703@cornell.edu>
References: <45CF5697.60703@cornell.edu>
Message-ID:

I assume you are getting your features from a Bio::SeqIO parse of a Genbank file? You get back Bio::SeqFeature::Generic objects, so you want to look at the docs for that module to see what the API is.
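The sketch below does the column-9 cleanup described in this thread on one already-written GFF line, treated as plain text. It does not go through the Bio::Tools::GFF or Bio::SeqFeature::Generic API. It rests on three assumptions:

- columns are tab-separated;
- attributes use the key=value;key=value style shown in the output above;
- GenBank codon_start 1, 2 or 3 corresponds to GFF frame 0, 1 or 2.

`parse_attributes` and `ucsc_gff_line` are hypothetical helper names. Splitting on a literal `;` with `split /;/` needs no escaping.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a GFF column 9 written as key=value;key=value
# into a hash of array refs (one entry per comma-separated value).
sub parse_attributes {
    my ($col9) = @_;
    my %attr;
    for my $pair (split /;/, $col9) {              # ';' is not special in a regex
        my ($key, $value) = split /=/, $pair, 2;   # limit 2 keeps any '=' in the value
        next unless defined $key && length $key;
        $value = '' unless defined $value;
        for my $v (split /,/, $value) {            # split before unescaping %2C
            $v =~ tr/+/ /;                         # '+' was written for spaces
            $v =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
            push @{ $attr{$key} }, $v;
        }
    }
    return \%attr;
}

# Hypothetical helper: rewrite one tab-separated GFF line so that column 8
# holds the frame (CDS only) and column 9 holds only the locus_tag value.
sub ucsc_gff_line {
    my ($line) = @_;
    chomp $line;
    my @col = split /\t/, $line, 9;
    my $attr = parse_attributes($col[8]);
    if ($col[2] eq 'CDS' && $attr->{codon_start}) {
        # assumption: codon_start 1/2/3 corresponds to GFF frame 0/1/2
        $col[7] = $attr->{codon_start}[0] - 1;
    }
    $col[8] = $attr->{locus_tag} ? $attr->{locus_tag}[0] : '.';
    return join "\t", @col;
}

my $cds = join "\t", 'chr1', 'EMBL/GenBank/SwissProt', 'CDS', 1712, 2848, '.', '+', '.',
    'EC_number=2.7.7.7;codon_start=1;gene=dnaN;locus_tag=MGAS10270_Spy0002';
print ucsc_gff_line($cds), "\n";
```

For the shortened CDS line in the example, this prints the same nine columns with 0 in column 8 and MGAS10270_Spy0002 in column 9. A gene line keeps '.' in column 8 but gets the same group value. That shared group value is what lets the two features be matched up.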
you will need to set frame via

$feature->frame($frame)

You are going to have to determine the frame yourself if that isn't part of the feature; we don't calculate it for you.

For the 9th column, this is available through the tags methods has_tag, add_tag_value, get_tag_values, get_all_tags, and remove_tag, so you can remove all the tags you don't want through remove_tag (or if you want to remove them all):

my $locus;
for my $tag ( $feature->get_all_tags ) {
    if( $tag eq 'locus_tag' ) {
        # save the locus_tag when we see it
        ($locus) = $feature->get_tag_values($tag);
    }
    $feature->remove_tag($tag);
}

You will also want to set the GFF format when you call Bio::Tools::GFF - I think the UCSC site is only supporting GFF1. I don't know exactly how you set the tag then, when they aren't paired with key=>value; you'll need to set the tag to 'group', so

$feature->add_tag_value('group', $locus);

If this is all unsatisfactory you can easily write your own GFF writer for your flavor of the data with something like:

print join("\t", $feat->seq_id, $feat->source_tag, $feat->primary_tag,
           $feat->start, $feat->end, $feat->score,
           $feat->strand > 0 ? '+' : '-', $feat->frame, $locus), "\n";

-jason

On Feb 11, 2007, at 9:47 AM, Adam Diehl wrote:

> Good morning folks,
>
> I've got sort of a newbie question regarding how to get gff's out of
> Bio::Tools::GFF objects that are formatted according to the UCSC browser
> conventions, described here:
>
> http://genome.ucsc.edu/goldenPath/help/customTrack.html#GFF
> (Ignore the custom track headers and what-not. I just need the fields to
> be set up according to the descriptions in 1 - 9).
>
> The write_feature($feature) method isn't doing it for me, as I get lines
> like the following (newlines excepted):
>
> chr1 EMBL/GenBank/SwissProt gene 1712 2848 . + . db_xref=GeneID:4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002
> chr1 EMBL/GenBank/SwissProt CDS 1712 2848 . + .
> EC_number=2.7.7.7;codon_start=1;db_xref=GI:94989511,GeneID:
> 4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002;product=DNA+polymerase
> +III%2C+beta+chain;protein_
> id=YP_597611.1;transl_table=11;translation=MIQFSINRTLFIHALNATKRAISTKNA
> IPILSSIKIEVTSTGVTLTGSNGQISIENTIPVSNENAGLLITSPGAILLEASFFINIISSLPDISINVK
> EIEQHQVVLTSGKSEITLKGKDVDQYPRLQEVSTENPLILKTKLLKSIIAETAFAASLQESRPILTGVHI
> VLSNHKDFKAVATDSHRMSQRLIT
> LENTSADFDVVIPSKSLREFSAVFTDDIETVEVFFSPSQILFRSEHISFYTRLLEGNYPDTDRLLMTEFE
> TEVVFNTQSLRHAMERAFLISNATQNGTVKLEITQNHISAHVNSPEVGKVNEDLDIVSQSGSDLTISFNP
> TYLIESLKAIKSETVKIHFLSPVRPFTLTPGDEEESFIQLITPVRTN
>
> As you can see, field 8, which should be frame according to UCSC
> conventions is blank, and field 9, group according to UCSC, has frame,
> along with ID, etc. All this extra stuff causes the UCSC browser to
> choke. First off, it can't identify which features are the same (it does
> this by matching the group field), and second, it can't interpret the
> CDS's into translated proteins because it lacks frame data.
>
> Basically what I need to do is, for CDS features, extract frame (or
> codon_start, as it were), from the last field, parse out the integer
> value and store that in field 8 (as frame), then parse out locus_tag
> from the last field, clear out everything else and store the locus_tag
> only in that field (preferably without the qualifier locus_tag=). For
> feature type gene, I just want to do the last step, so that the gene and
> CDS features for the same feature have matching group fields that are as
> simple as possible. Let me know if this is not clear.
>
> The way I've been trying to do this is by stringifying each gff object,
> splitting into an array, @tmp1, splitting @tmp1[8] into @tmp2 with the
> following code: my @tmp2 = split /\;\, $tmp1[8]; and finally, trying to
> parse out the bits I need with regular expressions and store back to
> @tmp1[n]. -- This does not work, because perl wants to interpret
> every / + etc.
> as a metacharacter!
>
> I am assuming there's a simple way to get at each value in the last
> field of the gff object using methods supplied by Bio::Tools::GFF, but
> the API docs seem a bit lacking in this area. Could anyone steer me
> towards what I need to know to do this? Please let me know if I can
> clarify any details!
>
> Cheers,
> Adam Diehl

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/

From bix at sendu.me.uk Sun Feb 11 18:39:15 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 11 Feb 2007 23:39:15 +0000
Subject: [Bioperl-l] WrapperBase
In-Reply-To:
References: <45CCF803.9030004@sendu.me.uk>
Message-ID: <45CFA923.8010201@sendu.me.uk>

Caroline Johnston wrote:
>> No, I think not. That would be very annoying when using wrappers for
>> programs that you just have in your system path.
>
> Hmm, maybe I misunderstood what the program_path was for? The executable
> method goes straight to the system path unless program_path is set, so I
> assumed you would only set program_path if you specifically wanted it to
> look somewhere else. You wouldn't get a warning if you didn't specify a
> program_path and just left it to look in the system path.

Yes, sorry. Having now actually looked at your patch it seems fine. I'll commit it unless someone beats me to it.

From flope004 at hotmail.com Sun Feb 11 21:40:08 2007
From: flope004 at hotmail.com (Wolverine Fran)
Date: Mon, 12 Feb 2007 03:40:08 +0100
Subject: [Bioperl-l] TreeIO, how it works?
Message-ID:

Hi,

I have a problem. I don't understand how TreeIO reads the trees:
my input: ((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2));

An unrooted tree with 4 tips and 2 internal nodes.
when I asked for:
print "Total number of nodes ",$tree->number_nodes;

I get 6 but when I ask for:
foreach my $node (@nodes) {
    print $node->internal_id,",";
}
I get 6,0,1,2,3,4,5. Total 7.

The root is number 6 and 2 and 5 are my internal nodes.
If I set the root to be number 5 this node 6 is still present.
Why? what is the node 6?

when I try the following:
$node5 = $tree->find_node(-internal_id => '5');
$node6 = $tree->find_node(-internal_id => '6');
$node2 = $tree->find_node(-internal_id => '2');
$distance1 = $tree->distance(-nodes =>[$node5,$node2]);
$distance2 = $tree->distance(-nodes =>[$node5,$node6]);
$distance3 = $tree->distance(-nodes =>[$node2,$node6]);
or any other distance I get 2 warnings:
-------------------- WARNING ---------------------
MSG: Must provide a valid array reference for -nodes
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Could not find distance!
---------------------------------------------------
What am I doing incorrectly?

I am practicing with AlignIO and TreeIO to calculate the maximum likelihood for a given tree. So, other information about that would be of great help. I am practicing with this to see how Bioperl can help me with more complex problems.

Thank you very much for your help!

From jason at bioperl.org Sun Feb 11 22:05:18 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 11 Feb 2007 19:05:18 -0800
Subject: [Bioperl-l] TreeIO, how (does) it work?
In-Reply-To:
References:
Message-ID: <60D2DD3B-2ED1-4A92-A9FA-2875FEAA28CC@bioperl.org>

On Feb 11, 2007, at 6:40 PM, Wolverine Fran wrote:

> Hi,
>
> I have a problem.
> I don't understand how TreeIO reads the trees:
> my input: ((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2));
>
> An unrooted tree with 4 tips and 2 internal nodes.
> when I asked for:
> print "Total number of nodes ",$tree->number_nodes;
>
> I get 6 but when I ask for:
> foreach my $node (@nodes) {
> print $node->internal_id,",";
> }
> I get 6,0,1,2,3,4,5. Total 7.
>
> The root is number 6 and 2 and 5 are my internal nodes.
> If I set the root to be number 5 this node 6 is still present.
> Why? what is the node 6?

Node 6 is to hold the root or a fake root with a trifurcation for unrooted trees. Did you actually call the reroot method to set the root to node 5?

>
> when I try the following:
> $node5 = $tree->find_node(-internal_id => '5');
> $node6 = $tree->find_node(-internal_id => '6');
> $node2 = $tree->find_node(-internal_id => '2');
> $distance1 = $tree->distance(-nodes =>[$node5,$node2]);
> $distance2 = $tree->distance(-nodes =>[$node5,$node6]);
> $distance3 = $tree->distance(-nodes =>[$node2,$node6]);
> or any other distance I get 2 warnings:
> -------------------- WARNING ---------------------
> MSG: Must provide a valid array reference for -nodes
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
> MSG: Could not find distance!
> ---------------------------------------------------
> What am I doing incorrectly?
>

The distance method is just summing branch lengths on the path between two nodes. Is that what you are trying to do?

The error message you report doesn't make sense as "Must provide a valid array reference for -nodes" is only printed when you call is_monophyletic or is_paraphyletic as far as I can tell.

What version of bioperl are you using?

> I am practicing with AlignIO and TreeIO to calculate the maximum
> likelihood for a given tree. So, other information about that would be
> of great help. I am practicing with this to see how Bioperl can help me
> with more complex problems.
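Here is a minimal standalone sketch that makes both points concrete for the tree in the question. It is plain Perl, not Bio::TreeIO, and `parse_newick`, `distance` and `find_by_name` are hypothetical helpers.

- It creates an explicit node for the outermost parentheses, so the 4 leaves, 2 internal nodes and that root give 7 nodes.
- It computes a distance by summing branch lengths from each node up to their lowest common ancestor.

It handles only unquoted labels and `:length` values.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Minimal Newick reader (sketch): nested parentheses, unquoted names and
# ':length' values only; no quoted labels, comments or support values.
# Every node, including the root, becomes a hash with a parent link.
sub parse_newick {
    my ($text) = @_;
    $text =~ s/\s+//g;
    $text =~ s/;$//;
    my $pos = 0;
    my @nodes;
    my $parse;
    $parse = sub {
        my ($parent) = @_;
        my $node = { parent => $parent, children => [], name => '', length => 0 };
        push @nodes, $node;
        if (substr($text, $pos, 1) eq '(') {
            do {
                $pos++;                                   # skip '(' or ','
                push @{ $node->{children} }, $parse->($node);
            } while (substr($text, $pos, 1) eq ',');
            $pos++;                                       # skip ')'
        }
        if (substr($text, $pos) =~ /^([^:,();]+)/) {      # optional label
            $node->{name} = $1;
            $pos += length $1;
        }
        if (substr($text, $pos) =~ /^:([-+0-9.eE]+)/) {   # optional branch length
            $node->{length} = $1;
            $pos += 1 + length $1;
        }
        return $node;
    };
    my $root = $parse->(undef);
    undef $parse;    # break the closure's reference to itself
    return ($root, \@nodes);
}

sub path_to_root {
    my ($node) = @_;
    my @path;
    for (my $n = $node; $n; $n = $n->{parent}) { push @path, $n }
    return @path;
}

# Distance = branch lengths from each node up to their lowest common ancestor.
sub distance {
    my ($x, $y) = @_;
    my %above_x = map { ($_ => 1) } path_to_root($x);   # refs stringify to unique keys
    my ($lca) = grep { $above_x{$_} } path_to_root($y);
    my $d = 0;
    for my $start ($x, $y) {
        for (my $n = $start; $n != $lca; $n = $n->{parent}) { $d += $n->{length} }
    }
    return $d;
}

sub find_by_name {
    my ($nodes, $name) = @_;
    my ($found) = grep { $_->{name} eq $name } @$nodes;
    return $found;
}

my ($root, $nodes) = parse_newick('((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2));');
my $leaves = grep { !@{ $_->{children} } } @$nodes;
printf "%d nodes (%d leaves)\n", scalar @$nodes, $leaves;
printf "dog-human distance: %.2f\n",
    distance(find_by_name($nodes, 'dog'), find_by_name($nodes, 'human'));
```

With the unrooted example this prints 7 nodes (4 leaves) and a dog-human distance of 0.31, which is 0.04 + 0.12 + 0.15 because the human/mouse clade has no branch length of its own. The rooted variant from later in the thread also yields 7 nodes. The sketch does not try to reproduce the order in which BioPerl assigns internal ids.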
>

You are trying to calculate the likelihood of a tree or are you trying to generate a ML tree from an alignment?

> Thank you very much for your help!
>

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/

From er at xs4all.nl Mon Feb 12 08:03:06 2007
From: er at xs4all.nl (Erik)
Date: Mon, 12 Feb 2007 14:03:06 +0100 (CET)
Subject: [Bioperl-l] bioperl wiki changes rss / atom
In-Reply-To:
References: <162150.76282.qm@web56511.mail.re3.yahoo.com>
Message-ID: <20181.156.83.0.124.1171285386.squirrel@webmail.xs4all.nl>

Hi,

The bioperl wiki changes rss / atom feed has two leading empty lines which invalidate the xml:

XML Parsing Error: xml declaration not at start of external entity
Location: http://www.bioperl.org/w/index.php?title=Special:Recentchanges&feed=rss
Line Number 3, Column 1:
^

Could those be removed? (I didn't see a way to do it myself). Might be a useful feed :)

thanks,

Erik

From cjfields at uiuc.edu Mon Feb 12 09:52:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 12 Feb 2007 08:52:44 -0600
Subject: [Bioperl-l] bioperl wiki changes rss / atom
In-Reply-To: <20181.156.83.0.124.1171285386.squirrel@webmail.xs4all.nl>
References: <162150.76282.qm@web56511.mail.re3.yahoo.com> <20181.156.83.0.124.1171285386.squirrel@webmail.xs4all.nl>
Message-ID:

I have forwarded this to support at open-bio.org, which should take care of it.
chris

On Feb 12, 2007, at 7:03 AM, Erik wrote:

> Hi,
>
>
> The bioperl wiki changes rss / atom feed has two leading empty lines
> which invalidate the xml:
>
> XML Parsing Error: xml declaration not at start of external entity
> Location:
> http://www.bioperl.org/w/index.php?title=Special:Recentchanges&feed=rss
> Line Number 3, Column 1:
> ^
>
> Could those be removed? (I didn't see a way to do it myself). Might be a
> useful feed :)
>
>
> thanks,
>
> Erik

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From sm8 at sanger.ac.uk Mon Feb 12 12:12:00 2007
From: sm8 at sanger.ac.uk (Stephen Montgomery)
Date: Mon, 12 Feb 2007 17:12:00 -0000
Subject: [Bioperl-l] subtract for Bio::RangeI.pm
Message-ID:

Hi -

Here is a subtract function for the Bio::RangeI class. (To be added if interested)

All the best!
Stephen Montgomery

//ADD TO BIO::RANGEI

=head2 subtract

  Title   : subtract
  Usage   : my $subtracted = $r1->subtract($r2)
  Function: Subtract range r2 from range r1
  Args    : arg #1 = a range to subtract from this one (mandatory)
            arg #2 = strand option ('strong', 'weak', 'ignore') (optional)
  Returns : undef if they do not overlap or r2 contains this RangeI,
            or an arrayref of Range objects (this is an array since some
            instances where the subtract range is enclosed within this
            range will result in the creation of two new disjoint ranges)

=cut

sub subtract {
    my ($self, $range, $so) = @_;
    $self->throw("missing arg: you need to pass in another feature")
        unless $range;
    return unless $self->_testStrand($range, $so);

    if ($self eq "Bio::RangeI") {
        $self = "Bio::Range";
        $self->warn("calling static methods of an interface is deprecated; use $self instead");
    }
    $range->throw("Input a Bio::RangeI object") unless $range->isa('Bio::RangeI');

    if (!$self->overlaps($range)) {
        return undef;
    }

    ##Subtracts everything
    if ($range->contains($self)) {
        return undef;
    }

    my ($start, $end, $strand) = $self->intersection($range, $so);

    ##Subtract intersection from $self range
    my @outranges = ();
    if ($self->start < $start) {
        push(@outranges,
             $self->new('-start'  => $self->start,
                        '-end'    => $start - 1,
                        '-strand' => $self->strand,
                       ));
    }
    if ($self->end > $end) {
        push(@outranges,
             $self->new('-start'  => $end + 1,
                        '-end'    => $self->end,
                        '-strand' => $self->strand,
                       ));
    }
    return \@outranges;
}

//UNIT TEST

#!/usr/bin/perl
use strict;
use Bio::SeqFeature::Generic;
use Data::Dumper;
use Test;
plan tests => 13;

my $feature1 = new Bio::SeqFeature::Generic ( -start => 1, -end => 1000, -strand => 1);
my $feature2 = new Bio::SeqFeature::Generic ( -start => 100, -end => 900, -strand => -1);

my $subtracted = $feature1->subtract($feature2);
ok(defined($subtracted));
ok(scalar(@$subtracted) == 2);
foreach my $range (@$subtracted) {
    ok($range->start == 1 || $range->start == 901);
    ok($range->end == 99 || $range->end == 1000);
}

my $subtracted = $feature2->subtract($feature1);
ok(!defined($subtracted));

my $subtracted = $feature1->subtract($feature2, 'weak');
ok(!defined($subtracted));

my $subtracted = $feature1->subtract($feature2, 'strong');
ok(!defined($subtracted));

my $feature3 = new Bio::SeqFeature::Generic ( -start => 500, -end => 1500, -strand => 1);
my $subtracted = $feature1->subtract($feature3);
ok(defined($subtracted));
ok(scalar(@$subtracted) == 1);
my $subtracted_i = @$subtracted[0];
ok($subtracted_i->start == 1);
ok($subtracted_i->end == 499);

From flope004 at hotmail.com Mon Feb 12 13:07:12 2007
From: flope004 at hotmail.com (Wolverine Fran)
Date: Mon, 12 Feb 2007 19:07:12 +0100
Subject: [Bioperl-l] TreeIO, how (does) it work?
In-Reply-To: <60D2DD3B-2ED1-4A92-A9FA-2875FEAA28CC@bioperl.org>
Message-ID:

thanks for your reply!

I am using Bioperl 1.4.

>Node 6 is to hold the root or a fake root with a trifurcation for
>unrooted trees. Did you actually call the reroot method to set the
>root to node 5?
Yes, I tried the following with the same result:
$tree->reroot($tree->find_node(-internal_id => '5'));
or
$tree->set_root_node($tree->find_node(-internal_id => '5'));

Even if I use a rooted tree: (((dog:0.04,cat:0.08):0.12,human:0.15):0.1,mouse:0.1);
I get the node #6. So, is it always present? Am I not representing properly a rooted tree in newick format?

>The distance method is just summing branch lengths on the path
>between two nodes. Is that what you are trying to do?
>
>The error message you report doesn't make sense as
>"Must provide a valid array reference for -nodes"
>is only printed when you call is_monophyletic or is_paraphyletic as
>far as I can tell.

I do not know yet what I was doing incorrectly but now it works.
Yes, I was using the distance method to know where the node 6 was located. For the unrooted tree, node 6 was node 5 (an internal node) and for the rooted tree node 6 was 0.1 from the mouse leaf and the internal node (root).
The error message: "Must provide a valid array reference for -nodes" is shown if I indicate a node which is not present in the tree.

>You are trying to calculate the likelihood of a tree or are you
>trying to generate a ML tree from an alignment?

I am trying to calculate the likelihood of a tree, as a practice. Probably there are other bioperl modules, besides AlignIO and TreeIO, which can help me in the process and I do not know them.

Again, thank you for your time!

From dmessina at wustl.edu Mon Feb 12 12:49:49 2007
From: dmessina at wustl.edu (David Messina)
Date: Mon, 12 Feb 2007 11:49:49 -0600
Subject: [Bioperl-l] subtract for Bio::RangeI.pm
In-Reply-To:
References:
Message-ID: <1574ACCF-92D5-4DEC-AD04-14EB7767F22A@wustl.edu>

Stephen,

Great, thanks for this.
Could you submit it to Bugzilla as an enhancement?

http://bugzilla.open-bio.org/

Thanks,
Dave

From jason at bioperl.org Mon Feb 12 13:38:11 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 12 Feb 2007 10:38:11 -0800
Subject: [Bioperl-l] TreeIO, how (does) it work?
In-Reply-To:
References:
Message-ID:

I would definitely suggest getting ahold of bioperl 1.5.2 as I seem to remember there are several fixes in the tree module code for re-rooting a tree.

-jason

On Feb 12, 2007, at 10:07 AM, Wolverine Fran wrote:

> thanks for your reply!
>
> I am using Bioperl 1.4.
>
>> Node 6 is to hold the root or a fake root with a trifurcation for
>> unrooted trees. Did you actually call the reroot method to set the
>> root to node 5?
>
> Yes, I tried the following with the same result:
> $tree->reroot($tree->find_node(-internal_id => '5'));
> or
> $tree->set_root_node($tree->find_node(-internal_id => '5'));
>
> Even if I use a rooted tree: (((dog:0.04,cat:0.08):0.12,human:0.15):0.1,mouse:0.1);
> I get the node #6. So, is it always present? Am I not representing
> properly a rooted tree in newick format?
>
>> The distance method is just summing branch lengths on the path
>> between two nodes. Is that what you are trying to do?
>>
>> The error message you report doesn't make sense as
>> "Must provide a valid array reference for -nodes"
>> is only printed when you call is_monophyletic or is_paraphyletic as
>> far as I can tell.
>
> I do not know yet what I was doing incorrectly but now it works.
> Yes, I was using the distance method to know where the node 6 was
> located. For the unrooted tree, node 6 was node 5 (an internal
> node) and for the rooted tree node 6 was 0.1 from the mouse leaf
> and the internal node (root).
> The error message: "Must provide a valid array reference for -nodes"
> is shown if I indicate a node which is not present in the tree.
>
>> You are trying to calculate the likelihood of a tree or are you
>> trying to generate a ML tree from an alignment?
>
> I am trying to calculate the likelihood of a tree, as a practice.
> Probably there are other bioperl modules, besides AlignIO and
> TreeIO, which can help me in the process and I do not know them.
>
> Again, thank you for your time!
>

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/

From johnsonm at gmail.com Mon Feb 12 18:13:09 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 12 Feb 2007 17:13:09 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To:
References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
Message-ID:

On 2/7/07, Mark Johnson wrote:
>
> Well, each format has some unique features. If the user declines to
> specify the format, I can figure it out, but it will probably involve
> scanning the input file twice. I'll take a look.
> I can do all the parsing in one function, in fact I have, just to see
> how nasty it would end up being. I just can't stomach having the code that
> tightly coupled and hard to read. In the end it'll probably be three
> functions. GlimmerM/HMM are pretty close. Maybe two, Glimmer2 and
> Glimmer3 aren't *that* different, either.

I've got a 4-in-1 parser roughed in per Chris Fields' suggestion. Two actual parsing routines (prokaryotic and eukaryotic). You can specify -format as an arg to the constructor (Glimmer, GlimmerM, GlimmerHMM), or it will look through the input until it can figure out what it is looking at.
I've got one main issue to solve; the rest is just stuff like updating the POD. Torsten Seemann very helpfully added example output for all 4 formats to t/data. Looking at GlimmerHMM.out, the first line is 'GlimmerHMM'. However, I think there is a bug in the existing _parse_predictions. Shouldn't this:

    } elsif( /^(Glimmer\S*)$/ ) { # GlimmerHMM has no version
        $source = $1;
        next;
    }

be this instead:

    } elsif( /^(GlimmerHMM\S*)$/ ) { # GlimmerHMM has no version
        $source = $1;
        next;
    }

I lifted that bit of code to do format detection... we don't have GlimmerHMM installed locally, so I'm assuming Torsten's output is correct and the above is a bug. Guess I'll go check bugzilla...

From torsten.seemann at infotech.monash.edu.au Mon Feb 12 21:07:40 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 13 Feb 2007 13:07:40 +1100
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To:
References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
Message-ID:

Mark,

> I've got one main issue to solve, the rest is just stuff like updating the POD. Torsten Seemann very helpfully added example output for all 4 formats to t/data. Looking at GlimmerHMM.out, the first line is 'GlimmerHMM'. However, I think there is a bug in the existing _parse_predictions:
> Shouldn't this:
> } elsif( /^(Glimmer\S*)$/ ) { # GlimmerHMM has no version
> be this instead:
> } elsif( /^(GlimmerHMM\S*)$/ ) { # GlimmerHMM has no version

I think it should be what it says, or perhaps now /^(Glimmer(M|HMM))/. Here's why:

I came onto the scene at Glimmer.pm rev 1.4. At that stage it only parsed GlimmerM. I noted that GlimmerHMM was the same output format as GlimmerM, except for the first line. So in rev 1.5 I modified the regexp to match both, i.e. \S*. This would also hopefully match any other Glimmer-clone formats that arose. I also fixed the pdocs to say this, and added tests to t/Genpred.t.
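Torsten's point, that `\S*` lets a single first-line check accept both GlimmerM and GlimmerHMM output, can be checked with a small standalone sketch. It is not the Bio::Tools::Glimmer code itself: the two helper names are hypothetical, and the trailing `$` on the narrower pattern is an addition to the `/^(Glimmer(M|HMM))/` form suggested above.

```perl
use strict;
use warnings;

# Permissive check, as in the existing _parse_predictions branch: a first line
# consisting of "Glimmer" followed only by non-whitespace characters.
sub source_permissive {
    my ($line) = @_;
    chomp $line;
    return $line =~ /^(Glimmer\S*)$/ ? $1 : undef;
}

# Narrower variant restricted to the two known names; the end anchor is an
# addition here, so something like "GlimmerHMMfoo" is rejected.
sub source_strict {
    my ($line) = @_;
    chomp $line;
    return $line =~ /^(Glimmer(?:M|HMM))$/ ? $1 : undef;
}

# Note the \S / \s mix-up mentioned later in the thread: /^(Glimmer\s*)$/
# only allows trailing whitespace, so it never matches "GlimmerHMM".
for my $first_line ("GlimmerM", "GlimmerHMM", "Glimmer HMM") {
    my $p = source_permissive($first_line);
    my $s = source_strict($first_line);
    print "$first_line: permissive=", (defined $p ? $p : 'no match'),
          " strict=", (defined $s ? $s : 'no match'), "\n";
}
```

The permissive form also accepts any future "Glimmer<something>" header, which is the extensibility Torsten mentions; the strict form trades that for fewer false positives.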
    % cvs diff -r 1.4 -r 1.5 Bio/Tools/Glimmer.pm
    % cvs diff -r 1.15 -r 1.16 t/Genpred.t

I then planned to extend support to Glimmer2 and Glimmer3. I added the 4 test files (t/Glimmer*.out) but never wrote the code. This is where you have come in, Mark :-)

> I lifted that bit of code to do format detection... we don't have GlimmerHMM installed locally, so I'm assuming Torsten's output is correct and the above is a bug. Guess I'll go check bugzilla...

I'm pretty sure my 4 test files are correct; I spent a lot of time ensuring they were consistent etc., as I was getting very confused by the different "glimmer" versions!

Hope this all helps,

--Torsten

From avilella at gmail.com Tue Feb 13 08:20:15 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 13 Feb 2007 13:20:15 +0000
Subject: [Bioperl-l] number of gaps for the other sequences in an alignment
Message-ID: <358f4d650702130520n269419cfkb9cb6dac8feaaa5c@mail.gmail.com>

Hi,

It would be great if we could have a method to count, given one sequence in an alignment, the number of gaps present in the rest of the sequences of the alignment.

That is, for each nucleotide/amino acid position of the sequence of interest, look at the column in the alignment, count the gaps, then sum them over the rest of the non-gapped columns in the sequence of interest. Has anyone tried this before?

My idea is to end up with a coefficient of indel contribution for each of the sequences in the alignment, with this coefficient being high when one sequence forces a lot of gaps to be inserted in the final alignment in order to accommodate it.

I would say that the best place for this is either using methods already available in SimpleAlign, or having something new added there.

Looking forward to your comments,

Cheers,

Albert.
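Albert's proposed count can be sketched in plain Perl on aligned strings. This is not an existing SimpleAlign method. It assumes a hypothetical input hash of id => aligned string, treats '-' and '.' as gap characters (an assumption; adjust to your alignment), walks each column once with `substr`, and credits every gap in that column to each sequence that has a residue there.

```perl
use strict;
use warnings;

# Gap characters are an assumption; change to match your alignment.
my $GAP = qr/^[-.]$/;

# %aln: sequence id => aligned string (all strings the same length).
# Returns id => total number of gaps in the *other* sequences, summed over
# the columns where this sequence has a residue.
sub gap_contribution {
    my (%aln) = @_;
    my @ids = sort keys %aln;
    die "empty alignment\n" unless @ids;
    my $len = length $aln{ $ids[0] };
    for my $id (@ids) {
        die "'$id' has length ", length $aln{$id}, ", expected $len\n"
            unless length $aln{$id} == $len;
    }

    my %total = map { ($_ => 0) } @ids;
    for my $col (0 .. $len - 1) {
        # One character per sequence for this column.
        my %char   = map { ($_ => substr($aln{$_}, $col, 1)) } @ids;
        my $n_gaps = grep { $char{$_} =~ $GAP } @ids;
        for my $id (@ids) {
            # Residue column for $id: every gap here belongs to another sequence.
            $total{$id} += $n_gaps unless $char{$id} =~ $GAP;
        }
    }
    return %total;
}

my %counts = gap_contribution(
    s1 => 'ACGTAC',
    s2 => 'A--TAC',
    s3 => 'ACG-AC',
);
print "$_\t$counts{$_}\n" for sort keys %counts;
```

Here s1 scores highest because it has residues opposite every gap. Dividing each total by (residue columns x (number of sequences - 1)) would be one possible way to turn the count into a 0 to 1 coefficient; that normalisation is our choice, not part of the proposal. With BioPerl, the input could presumably be built from `map { ($_->display_id, $_->seq) } $aln->each_seq`, provided the display ids are unique.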
From bix at sendu.me.uk Tue Feb 13 11:09:09 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 13 Feb 2007 16:09:09 +0000
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts
Message-ID: <45D1E2A5.6060104@sendu.me.uk>

I have some raw sequences in a Bio::DB::SeqFeature::Store mysql database and wanted to associate some basic information with them, like exon positions. I thought of creating Bio::SeqFeature::Gene::Transcript objects and storing them so I could later use features() to see what other features overlapped exons.

I ran into a fatal error that can be replicated with the following simplified one-liner:

    perl -MBio::DB::SeqFeature::Store -MBio::SeqFeature::Gene::Transcript -e '
      $db = Bio::DB::SeqFeature::Store->new(-adaptor => "DBI::mysql", -dsn => "dbi:mysql:test");
      $trans = Bio::SeqFeature::Gene::Transcript->new(-start => 1, -end => 2, -seq_id => "test");
      $db->store($trans);
      @trans = $db->features(-seqid => "test", -type => "transcript");
      print "@trans\n";'

    code sub {
        package Bio::SeqFeature::Generic;
        use strict 'refs';
        my $self = shift @_;
        foreach my $f (@{[] unless $$self{'_gsf_sub_array'};}) {
            $f = undef;
        }
        $$self{'_gsf_seq'} = undef;
        foreach my $t (keys %{{} unless $$self{'_gsf_tag_hash'};}) {
            $$self{'_gsf_tag_hash'}{$t} = undef;
            delete $$self{'_gsf_tag_hash'}{$t};
        }
    } did not evaluate to a subroutine reference, at /.../Bio/DB/SeqFeature/Store.pm line 2280

Is this a bug? Or am I taking the wrong approach?

From johnsonm at gmail.com Tue Feb 13 15:10:23 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Tue, 13 Feb 2007 14:10:23 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To:
References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
Message-ID:

You're quite correct. I wasn't paying enough attention. That does work just fine.
I fat-fingered something somewhere else, broke my version of the module for GlimmerHMM, hallucinated, and confused \S and \s. 8)

All I have left now is to fix up the POD documentation and such, and then I can send the module along and somebody can make whatever tweaks and check it in. Shall I open a ticket in Bugzilla for this and attach diffs, or just send them along to somebody to take care of directly?

Oh, one thing I have not mentioned: I also added a -seqname argument. Glimmer2 does not provide any kind of sequence identifier in the output, and only processes the first sequence in a fasta file. It would be tedious to have to code around this by fixing up the predictions after they are produced, so I added the option to provide this missing info up front, hopefully allowing downstream code not to care as much or need a special case for fixing up Glimmer2 predictions.

On 2/12/07, Torsten Seemann wrote:
> I think it should be what it says, or perhaps now /^(Glimmer(M|HMM))/. Here's why:
>
> I came onto the scene at Glimmer.pm rev 1.4. At that stage it only parsed GlimmerM. I noted that GlimmerHMM was the same output format as GlimmerM, except for the first line. So in rev 1.5 I modified the regexp to match both, i.e. \S*. This would also hopefully match any other Glimmer-clone formats that arose. I also fixed the pdocs to say this, and added tests to t/Genpred.t.
> % cvs diff -r 1.4 -r 1.5 Bio/Tools/Glimmer.pm
> % cvs diff -r 1.15 -r 1.16 t/Genpred.t
>
> I then planned to extend support to Glimmer2 and Glimmer3. I added the 4 test files (t/Glimmer*.out) but never wrote the code. This is where you have come in, Mark :-)
>
> > I lifted that bit of code to do format detection... we don't have GlimmerHMM installed locally, so I'm assuming Torsten's output is correct and the above is a bug. Guess I'll go check bugzilla...
>
> I'm pretty sure my 4 test files are correct; I spent a lot of time ensuring they were consistent etc., as I was getting very confused by the different "glimmer" versions!
>
> Hope this all helps,
>
> --Torsten
>

From cjfields at uiuc.edu Tue Feb 13 15:47:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Feb 2007 14:47:19 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To:
References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
Message-ID:

You'll also want to update whatever relevant tests there are for Glimmer; looks like they are in Genpred.t.

chris

On Feb 13, 2007, at 2:10 PM, Mark Johnson wrote:

> You're quite correct. I wasn't paying enough attention. That does work just fine. I fat-fingered something somewhere else, broke my version of the module for GlimmerHMM, hallucinated and confused \S and \s. 8)
> All I have left now is to fix up the POD documentation and such and then I can send the module along and somebody can make whatever tweaks and check it in. Shall I open a ticket in Bugzilla for this and attach diffs, or just send them along to somebody to take care of directly?
> Oh, one thing I have not mentioned. I also added a -seqname argument. Glimmer2 does not provide any kind of sequence identifier in the output, and only processes the first sequence in a fasta file. It would be tedious to have to code around this by fixing up the predictions after they are produced, so I added the option to provide this missing info up front, hopefully allowing downstream code not to care as much or need a special case for fixing up Glimmer2 predictions.
>
> On 2/12/07, Torsten Seemann wrote:
>
>> I think it should be what it says, or perhaps now /^(Glimmer(M|HMM))/. Here's why:
>>
>> I came onto the scene at Glimmer.pm rev 1.4. At that stage it only parsed GlimmerM.
>> I noted that GlimmerHMM was the same output format as GlimmerM, except for the first line. So in rev 1.5 I modified the regexp to match both, i.e. \S*. This would also hopefully match any other Glimmer-clone formats that arose. I also fixed the pdocs to say this, and added tests to t/Genpred.t.
>> % cvs diff -r 1.4 -r 1.5 Bio/Tools/Glimmer.pm
>> % cvs diff -r 1.15 -r 1.16 t/Genpred.t
>>
>> I then planned to extend support to Glimmer2 and Glimmer3. I added the 4 test files (t/Glimmer*.out) but never wrote the code. This is where you have come in, Mark :-)
>>
>>> I lifted that bit of code to do format detection... we don't have GlimmerHMM installed locally, so I'm assuming Torsten's output is correct and the above is a bug. Guess I'll go check bugzilla...
>>
>> I'm pretty sure my 4 test files are correct; I spent a lot of time ensuring they were consistent etc., as I was getting very confused by the different "glimmer" versions!
>>
>> Hope this all helps,
>>
>> --Torsten
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From thokeller at gmail.com Tue Feb 13 17:00:06 2007
From: thokeller at gmail.com (Thomas Keller)
Date: Tue, 13 Feb 2007 14:00:06 -0800
Subject: [Bioperl-l] update/install problem
Message-ID:

Could someone suggest a workaround or fix for this error?

    $ sudo fink update bioperl-pm586
    Information about 5850 packages read in 2 seconds.
    The package 'bioperl-pm586' will be built and installed.
    The package 'xml-sax-pm586' will be installed.
    The package 'xml-sax-writer-pm586' will be built and installed.
    The package 'xml-filter-buffertext-pm586' will be built and installed.
    The following package will be installed or updated:
     bioperl-pm586
    The following 3 additional packages will be installed:
     xml-filter-buffertext-pm586 xml-sax-pm586 xml-sax-writer-pm586
    Do you want to continue? [Y/n] Y
    /sw/bin/dpkg-lockwait -i /sw/fink/dists/unstable/main/binary-darwin-powerpc/libs/perlmods/xml-sax-pm586_0.13-2_darwin-powerpc.deb
    (Reading database ... 48029 files and directories currently installed.)
    Preparing to replace xml-sax-pm586 0.13-2 (using .../xml-sax-pm586_0.13-2_darwin-powerpc.deb) ...
    Unpacking replacement xml-sax-pm586 ...
    Setting up xml-sax-pm586 (0.13-2) ...
    update-perl586-sax-parsers: adding Perl SAX parser module info file of XML::SAX::PurePerl...
    Can't locate object method "save_parsers_debian" via package "XML::SAX" at /sw/sbin/update-perl586-sax-parsers line 96.
    /sw/bin/dpkg: error processing xml-sax-pm586 (--install):
     subprocess post-installation script returned error exit status 22
    Errors were encountered while processing:
     xml-sax-pm586
    ### execution of /sw/bin/dpkg-lockwait failed, exit code 1
    Failed: can't install package xml-sax-pm586-0.13-2

--
Tom Keller
"Ecrasez l'Infame!" -- Voltaire

From sac at bioperl.org Tue Feb 13 18:00:46 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Tue, 13 Feb 2007 15:00:46 -0800
Subject: [Bioperl-l] Bio::Root::Utilities.pm
Message-ID: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com>

I noticed that Bio::Root::Utilities was purged from bioperl-live for the 1.5.2 release, but I'd like us to consider adding it back. I agree that the other purged Root modules were ancient relics of the past, but Bio::Root::Utilities.pm still has signs of life (at least I still find occasion to use it, or refer to code in it).

I know that it's not currently used by any other modules in Bioperl, but there are likely some legacy scripts out there that rely on it.
Probably most of those scripts are ones I've written, but there have been substantive commits by others in the not-too-distant past (Dec 2005), so at least some folks besides myself are using it and may hesitate to upgrade their bioperl installation if it's absent.

I'm all for avoiding bloat in the codebase and am eager to see Bioperl be more lean and mean, but I'd like to keep this module around. I'll agree to add some tests for it as well as clean some things up (e.g., use Bio::Root::IO to get the temp file name).

Cheers,
Steve
--
Steve Chervitz
sac at bioperl.org

From cjfields at uiuc.edu Tue Feb 13 20:29:03 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Feb 2007 19:29:03 -0600
Subject: [Bioperl-l] update/install problem
In-Reply-To:
References:
Message-ID:

On Feb 13, 2007, at 4:00 PM, Thomas Keller wrote:

> Could someone suggest a workaround or fix for this error?
>
> $ sudo fink update bioperl-pm586
> Information about 5850 packages read in 2 seconds.
> The package 'bioperl-pm586' will be built and installed.
> The package 'xml-sax-pm586' will be installed.
> The package 'xml-sax-writer-pm586' will be built and installed.
> The package 'xml-filter-buffertext-pm586' will be built and installed.
> The following package will be installed or updated:
> bioperl-pm586
> The following 3 additional packages will be installed:
> xml-filter-buffertext-pm586 xml-sax-pm586 xml-sax-writer-pm586
> Do you want to continue? [Y/n] Y
> /sw/bin/dpkg-lockwait -i /sw/fink/dists/unstable/main/binary-darwin-powerpc/libs/perlmods/xml-sax-pm586_0.13-2_darwin-powerpc.deb
> (Reading database ... 48029 files and directories currently installed.)
> Preparing to replace xml-sax-pm586 0.13-2 (using .../xml-sax-pm586_0.13-2_darwin-powerpc.deb) ...
> Unpacking replacement xml-sax-pm586 ...
> Setting up xml-sax-pm586 (0.13-2) ...
> update-perl586-sax-parsers: adding Perl SAX parser module info file of XML::SAX::PurePerl...
> Can't locate object method "save_parsers_debian" via package "XML::SAX" at /sw/sbin/update-perl586-sax-parsers line 96.
> /sw/bin/dpkg: error processing xml-sax-pm586 (--install):
> subprocess post-installation script returned error exit status 22
> Errors were encountered while processing:
> xml-sax-pm586
> ### execution of /sw/bin/dpkg-lockwait failed, exit code 1
> Failed: can't install package xml-sax-pm586-0.13-2

The fink installation seems to be hanging on XML::SAX, not bioperl. You could try installing XML::SAX (now at v. 0.15) via CPAN using 'sudo cpan'; I updated just recently w/o problems.

As an aside, you could similarly install bioperl directly from CPAN (which I also haven't had any problems with). The installation allows for installing optional modules.

chris

From cjfields at uiuc.edu Tue Feb 13 22:41:31 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Feb 2007 21:41:31 -0600
Subject: [Bioperl-l] Bio::Root::Utilities.pm
In-Reply-To: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com>
References: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com>
Message-ID: <1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu>

On Feb 13, 2007, at 5:00 PM, Steve Chervitz wrote:

> I noticed that Bio::Root::Utilities was purged from bioperl-live for the 1.5.2 release, but I'd like us to consider adding it back. I agree that the other purged Root modules were ancient relics of the past, but Bio::Root::Utilities.pm still has signs of life (at least I still find occasion to use it, or refer to code in it).
>
> I know that it's not currently used by any other modules in Bioperl, but there are likely some legacy scripts out there that rely on it. Probably most of those scripts are ones I've written, but there have been substantive commits by others in the not-too-distant past (Dec 2005), so at least some folks besides myself are using it and may hesitate to upgrade their bioperl installation if it's absent.
>
> I'm all for avoiding bloat in the codebase and am eager to see Bioperl be more lean and mean, but I'd like to keep this module around. I'll agree to add some tests for it as well as clean some things up (e.g., use Bio::Root::IO to get temp file name).
>
> Cheers,
> Steve
> --
> Steve Chervitz
> sac at bioperl.org

I don't have a problem with adding it back, esp. if tests are added. Everything in Bio::Root* not tied to a module was yanked out when no one spoke up about cleaning up Bio::Root* modules:

http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12792/focus=12839

Maybe others disagree?

chris

From bix at sendu.me.uk Wed Feb 14 03:00:35 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 14 Feb 2007 08:00:35 +0000
Subject: [Bioperl-l] update/install problem
In-Reply-To:
References:
Message-ID: <45D2C1A3.9060300@sendu.me.uk>

Chris Fields wrote:
> As an aside, you could similarly install bioperl directly from CPAN (which I also haven't had any problems with).

Indeed. If you follow the unix instructions at http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix you should have a problem-free complete install under Mac OS X.

From bix at sendu.me.uk Wed Feb 14 09:08:22 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 14 Feb 2007 14:08:22 +0000
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To:
References: <45CCF861.8030000@sendu.me.uk>
Message-ID: <45D317D6.5070903@sendu.me.uk>

Chris Fields wrote:
>
> On Feb 9, 2007, at 4:40 PM, Sendu Bala wrote:
>
>> Chris Fields wrote:
>>> If Sendu is out there, I think we can safely remove any dependencies beyond XML::SAX 0.15 for the next release. Should I go ahead and modify Build.PL?
>>
>> Sure, good to hear.
>
> I added a version dependency for XML::SAX (v. 0.15) for the PurePerl fix. That likely obviates the need for a Bundle for XML::Simple. Not too pressing; we can determine that before the next release.

The bundle is now obsolete.
Does anything in Bioperl, or any of its dependencies, now make use of the expat library? If not, I can remove mention of it from the install documentation.

From bix at sendu.me.uk Wed Feb 14 09:02:39 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 14 Feb 2007 14:02:39 +0000
Subject: [Bioperl-l] DB.t failures
Message-ID: <45D3167F.2000608@sendu.me.uk>

DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer getting sequences back from NCBI in the order we requested them in batch mode.

Is this a change at NCBI? Is there some way we can make sure to return the sequences in the expected order? Or shouldn't the order be expected (should the test script be altered)?

From cjfields at uiuc.edu Wed Feb 14 09:37:07 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 14 Feb 2007 08:37:07 -0600
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <45D3167F.2000608@sendu.me.uk>
References: <45D3167F.2000608@sendu.me.uk>
Message-ID: <49A5C7D3-8D63-452C-B0EA-6F7144F85E35@uiuc.edu>

Confirmed on this end. It's possible that the default sort order from eutils is different now, though I haven't seen anything on the eutils mail list. There may be a way to set the sort order via the base URL; I'll check into it later today. I'm still digging myself out from the Midwest blizzard.

chris

On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:

> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer getting sequences back from NCBI in the order we requested them in batch mode.
>
> Is this a change at NCBI? Is there some way we can make sure to return the sequences in the expected order? Or shouldn't the order be expected (should the test script be altered)?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Wed Feb 14 09:42:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 14 Feb 2007 08:42:05 -0600
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <45D317D6.5070903@sendu.me.uk>
References: <45CCF861.8030000@sendu.me.uk> <45D317D6.5070903@sendu.me.uk>
Message-ID:

On Feb 14, 2007, at 8:08 AM, Sendu Bala wrote:

> Chris Fields wrote:
>>
>> On Feb 9, 2007, at 4:40 PM, Sendu Bala wrote:
>>
>>> Chris Fields wrote:
>>>> If Sendu is out there, I think we can safely remove any dependencies beyond XML::SAX 0.15 for the next release. Should I go ahead and modify Build.PL?
>>>
>>> Sure, good to hear.
>>
>> I added a version dependency for XML::SAX (v. 0.15) for the PurePerl fix. That likely obviates the need for a Bundle for XML::Simple. Not too pressing; we can determine that before the next release.
>
> The bundle is now obsolete. Does anything in Bioperl, or any of its dependencies, now make use of the expat library? If not, I can remove mention of it from the install documentation.

I'll try getting something up about XML::SAX on the wiki today. XML::Parser, though, still requires expat AFAIK:

http://www.bioperl.org/wiki/BioPerl_Dependencies

chris

From kellert at ohsu.edu Tue Feb 13 17:43:24 2007
From: kellert at ohsu.edu (Thomas J Keller)
Date: Tue, 13 Feb 2007 14:43:24 -0800
Subject: [Bioperl-l] HowTo:SearchIO
Message-ID:

Greetings,

I've been away from programming and informatics for many months. Hoping to get back into it, I thought it would be good to review the tutorials. I tried the code in the tutorial on the sample blast report in the tutorial and it worked fine. So I ran a blastx search, saved the results and tried to parse them: it gave the "... parsing" message, but no other results were reported.

Any suggestions?

Thanks,
Tom

Tom Keller, Ph.D.
kellert at ohsu.edu
503-494-2442
6339b Basic Science Bldg
http://www.ohsu.edu/research/core

From mrouard at gmail.com Wed Feb 14 06:23:47 2007
From: mrouard at gmail.com (Mathieu Rouard)
Date: Wed, 14 Feb 2007 12:23:47 +0100
Subject: [Bioperl-l] get the sequence of a column in a multiple alignment
Message-ID:

Dear all,

I am starting to use the bioperl API to parse multiple alignments, and I am wondering what is the most efficient way to extract all the columns from an alignment (all the AAs at position 1, position 2, etc.). I quickly implemented this simple code, but it becomes quite slow as the length of the sequences increases.

    use Bio::AlignIO;

    my $stream = Bio::AlignIO->new(-file     => $inputfilename,
                                   '-format' => 'stockholm');

    my $aln = $stream->next_aln();

    my $length = $aln->length();
    my @columns;    # one string per alignment column

    for (my $i = 1; $i <= $length; $i++) {
        my $aa;
        foreach my $seq ($aln->each_seq()) {
            my $obj = $seq->trunc($i, $i);
            $aa .= $obj->seq;
        }
        # need to track the column number and the sequence of the column
        push @columns, $aa;
    }

Would you have any other suggestion?

thanks
Mathieu

From avilella at gmail.com Wed Feb 14 10:29:02 2007
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 14 Feb 2007 15:29:02 +0000
Subject: [Bioperl-l] get the sequence of a column in a multiple alignment
In-Reply-To:
References:
Message-ID: <358f4d650702140729u4dae2847qc8eeeb45b20faca4@mail.gmail.com>

there is a slice method:

    $mini_aln = $aln->slice(20,30); # get a block of columns

    Title    : slice
    Usage    : $aln2 = $aln->slice(20,30)
    Function : Creates a slice from the alignment inclusive of start and
               end columns, and the first column in the alignment is denoted 1.
               Sequences with no residues in the slice are excluded from the
               new alignment and a warning is printed. Slice beyond the length of
               the sequence does not do padding.
    Returns  : A Bio::SimpleAlign object
    Args     : Positive integer for start column, positive integer for end column,
               optional boolean which if true will keep gap-only columns in the newly
               created slice.
               Example: $aln2 = $aln->slice(20,30,1)

but I don't know how well it behaves for lots of sequences :)

On 2/14/07, Mathieu Rouard wrote:
> Dear all,
>
> I am starting to use the bioperl API to parse multiple alignments, and I am wondering what is the most efficient way to extract all the columns from an alignment (all the AAs at position 1, position 2, etc.). I quickly implemented this simple code, but it becomes quite slow as the length of the sequences increases.
>
> my $stream = Bio::AlignIO->new(-file => $inputfilename,
>                                '-format' => 'stockholm');
>
> my $aln = $stream->next_aln();
>
> my $length = $aln->length();
> my @columns;
>
> for (my $i=1;$i<=$length;$i++) {
>     my $aa;
>     foreach my $seq ($aln->each_seq()) {
>         my $obj = $seq->trunc($i,$i);
>         $aa .= $obj->seq;
>     }
>     # need to track the column number and the sequence of the column
>     push @columns, $aa;
> }
>
> Would you have any other suggestion?
>
> thanks
> Mathieu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From jason at bioperl.org Wed Feb 14 11:59:49 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 14 Feb 2007 08:59:49 -0800
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To:
References:
Message-ID:

As always, reporting the versions of BLAST and Bioperl you have installed will help someone diagnose whether this is a fixed problem or not. If you trawl through the list archives, you'll see that chris and others have been playing cat and mouse with the text output from NCBI BLAST, which appears to be an ever-evolving beast.

So the best advice right now is to get the latest bioperl from CVS to ensure you have all the patches needed to parse this version. If it still fails, then the standard response will be to submit the report as an attachment to a new bug report on bugzilla.
thanks,
-jason

On Feb 13, 2007, at 2:43 PM, Thomas J Keller wrote:

> Greetings,
> I've been away from programming and informatics for many months. Hoping to get back into it, I thought it would be good to review the tutorials.
> I tried the code in the tutorial on the sample blast report in the tutorial and it worked fine. So I ran a blastx search and saved the results and tried to parse them: It gave the "... parsing" message, but no other results get reported.
>
> Any suggestions?
>
> Thanks,
> Tom
>
> Tom Keller, Ph.D.
> kellert at ohsu.edu
> 503-494-2442
> 6339b Basic Science Bldg
> http://www.ohsu.edu/research/core
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/

From dmessina at wustl.edu Wed Feb 14 11:58:45 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 14 Feb 2007 10:58:45 -0600
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To:
References:
Message-ID: <6E3CAB6B-9F9E-46FD-9021-50D7FE011860@wustl.edu>

Hi Tom,

Could you tell us what version of BioPerl you are using, and what specific example is failing for you? And could you post your code? That would make it easier to diagnose the problem.

Thanks,
Dave

--
Dave Messina
Senior Programmer/Analyst, Assembly Group
WashU Genome Sequencing Center
dmessina a t wustl.edu
314-286-1415

From cjfields at uiuc.edu Wed Feb 14 12:28:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 14 Feb 2007 11:28:24 -0600
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To:
References:
Message-ID: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu>

I would also strongly encourage switching to XML-based parsing, which is much more stable now.
Here's the link to the NCBI response re: BLAST report parsing:

http://bioperl.org/wiki/NCBI_Blast_email

chris (taking a break from shoveling snow...)

On Feb 14, 2007, at 10:59 AM, Jason Stajich wrote:

> As always, reporting the versions of BLAST and Bioperl you have installed will help someone diagnose whether this is a fixed problem or not. If you trawl through the list archives, you'll see that chris and others have been playing cat and mouse with the text output from NCBI BLAST, which appears to be an ever-evolving beast.
>
> So the best advice right now is to get the latest bioperl from CVS to ensure you have all the patches needed to parse this version. If it still fails, then the standard response will be to submit the report as an attachment to a new bug report on bugzilla.
>
> thanks,
> -jason
>
> On Feb 13, 2007, at 2:43 PM, Thomas J Keller wrote:
>
>> Greetings,
>> I've been away from programming and informatics for many months. Hoping to get back into it, I thought it would be good to review the tutorials.
>> I tried the code in the tutorial on the sample blast report in the tutorial and it worked fine. So I ran a blastx search and saved the results and tried to parse them: It gave the "... parsing" message, but no other results get reported.
>>
>> Any suggestions?
>>
>> Thanks,
>> Tom
>>
>> Tom Keller, Ph.D.
>> kellert at ohsu.edu
>> 503-494-2442
>> 6339b Basic Science Bldg
>> http://www.ohsu.edu/research/core
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
> http://pmb.berkeley.edu/~taylor/people/js.html
> http://fungalgenomes.org/
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From sac at bioperl.org Wed Feb 14 13:20:17 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Wed, 14 Feb 2007 10:20:17 -0800
Subject: [Bioperl-l] Bio::Root::Utilities.pm
In-Reply-To: <1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu>
References: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com> <1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu>
Message-ID: <8f200b4c0702141020x5815018bvd69873222582d9d5@mail.gmail.com>

On 2/13/07, Chris Fields wrote:
>
> On Feb 13, 2007, at 5:00 PM, Steve Chervitz wrote:
>
> > I noticed that Bio::Root::Utilities was purged from bioperl-live for the 1.5.2 release, but I'd like us to consider adding it back. I agree that the other purged Root modules were ancient relics of the past, but Bio::Root::Utilities.pm still has signs of life (at least I still find occasion to use it, or refer to code in it).
> >
> > I know that it's not currently used by any other modules in Bioperl, but there are likely some legacy scripts out there that rely on it.
> > Probably > > most of those scripts are ones I've written, but there have been > > substantive > > commits by others in the not-too-distant past (Dec 2005), so at > > least some > > folks besides myself are using it and may hesitate to upgrade their > > bioperl > > installation if it's absent. > > > > I'm all for avoiding bloat in the codebase and am eager to see > > Bioperl be > > more lean and mean, but I'd like to keep this module around. I'll > > agree to > > add some tests for it as well as clean some things up (e.g., use > > Bio::Root::IO to get temp file name). > > > > Cheers, > > Steve > > -- > > Steve Chervitz > > sac at bioperl.org > > I don't have a problem with adding it back, esp. if tests are added. > Everything in Bio::Root* not tied to a module was yanked out when no > one spoke up about cleaning up Bio::Root* modules: > > http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12792/ > focus=12839 > > Maybe others disagree? > > chris > Sorry I missed out on that thread. I had some trouble with my bioperl-l email delivery getting disabled due to excessive bounces, and it took me a while to catch it. Bio::Root::Utilities is quite a grab bag of miscellaneous general functions that are occasionally useful for perl scripting (e.g., determining end-of-line characters, sending email, etc.). The code could definitely use a review, and maybe an example script to advertise it. I can look into this, and suggestions are welcome.
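Steve mentions that Bio::Root::Utilities can determine end-of-line characters. Its actual method names and return values are not shown in this thread, so the following is an independent, hypothetical helper (`guess_eol` is an invented name, not the module's API). It is only a minimal sketch of the idea: look for CRLF first, then a bare LF, then a bare CR.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT the Bio::Root::Utilities API): classify the
# line-ending convention of a chunk of text. CRLF anywhere wins, so a
# file with mixed endings is reported as 'dos'.
sub guess_eol {
    my ($text) = @_;
    return 'dos'  if $text =~ /\r\n/;   # CR+LF: Windows/DOS
    return 'unix' if $text =~ /\n/;     # bare LF: Unix, Mac OS X
    return 'mac'  if $text =~ /\r/;     # bare CR: classic Mac OS
    return 'none';                      # no line terminator at all
}
```

When checking a real file, read it with the `:raw` layer (`open my $fh, '<:raw', $path`) so that Perl does not translate CRLF to LF on Windows before the text reaches the helper.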
Steve From dmessina at wustl.edu Wed Feb 14 13:55:18 2007 From: dmessina at wustl.edu (David Messina) Date: Wed, 14 Feb 2007 12:55:18 -0600 Subject: [Bioperl-l] HowTo:SearchIO In-Reply-To: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu> References: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu> Message-ID: <49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu> On Feb 14, 2007, at 11:28 AM, Chris Fields wrote: > I would also strongly encourage switching to using XML-based parsing, Unless anyone objects, I would be happy to update the HOWTO to suggest people make the switch and give an example of XML parsing. The Bio::SearchIO synopsis is already an XML example. However, there's no warning about text-based parsing nor a suggestion to use XML that I can see -- perhaps should be added? Dave From cjfields at uiuc.edu Wed Feb 14 15:12:21 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 14 Feb 2007 14:12:21 -0600 Subject: [Bioperl-l] HowTo:SearchIO In-Reply-To: <49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu> References: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu> <49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu> Message-ID: On Feb 14, 2007, at 12:55 PM, David Messina wrote: > > On Feb 14, 2007, at 11:28 AM, Chris Fields wrote: > >> I would also strongly encourage switching to using XML-based parsing, > > Unless anyone objects, I would be happy to update the HOWTO to > suggest people make the switch and give an example of XML parsing. > > The Bio::SearchIO synopsis is already an XML example. However, > there's no warning about text-based parsing nor a suggestion to use > XML that I can see -- perhaps should be added? > > Dave We should probably add something specifically for BLAST, yes. Other text parsers should be fine. Personally, I use XML or tabular output parsing simply b/c they are faster and do what I need. 
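Chris mentions parsing tabular BLAST output as a faster alternative to the text report. Bio::SearchIO already ships readers for XML (`-format => 'blastxml'`) and tabular (`-format => 'blasttable'`) output, and those are the route to prefer. The sketch below is only a dependency-free illustration of the 12-column layout that legacy `blastall -m 8` writes (`-m 9` adds `#` comment lines). The field names are labels chosen here for readability, not names used by NCBI or Bio::SearchIO.

```perl
use strict;
use warnings;

# Column order of blastall -m 8 / -m 9 tabular output; the labels are our own.
my @FIELDS = qw(query subject pct_identity aln_length mismatches gap_opens
                q_start q_end s_start s_end evalue bit_score);

# Turn one tabular line into a hashref; returns nothing for comment lines,
# blank lines, or lines that do not have exactly 12 tab-separated columns.
sub parse_tabular_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^\s*(?:#|$)/;   # -m 9 comment line or blank line
    my @cols = split /\t/, $line;
    return unless @cols == @FIELDS;
    my %hit;
    @hit{@FIELDS} = @cols;              # hash slice: label => column value
    return \%hit;
}
```

Because each hit is a single line, a file can be processed with a plain `while (<$fh>)` loop, which is a large part of why tabular parsing is fast and robust to changes in the text report layout.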
I think we'll need to retain the capability for text-based BLAST parsing, but it will become extremely bloated long-term if we plan on continuing support for parsing all versions and flavors of BLAST, particularly if NCBI continues to change the output. chris From dmessina at wustl.edu Wed Feb 14 15:46:31 2007 From: dmessina at wustl.edu (David Messina) Date: Wed, 14 Feb 2007 14:46:31 -0600 Subject: [Bioperl-l] HowTo:SearchIO In-Reply-To: References: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu> <49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu> Message-ID: <136DA052-B9FD-4547-B262-EC6E38B47392@wustl.edu> On Feb 14, 2007, at 2:12 PM, Chris Fields wrote: > We should probably add something specifically for BLAST, yes. > Other text parsers should be fine. Good point -- I'll make it clear it's only pertinent to BLAST. > I think we'll need to retain the capability for text-based BLAST > parsing, Agreed. Through the 1.6 release at least, I would think. > particularly if NCBI continues to change the output. Well, clearly the solution is not to use the NCBI flavor of BLAST. :) Dave (look at my email address) From jay at jays.net Thu Feb 15 08:08:56 2007 From: jay at jays.net (Jay Hannah) Date: Thu, 15 Feb 2007 07:08:56 -0600 Subject: [Bioperl-l] DB.t failures In-Reply-To: <45D3167F.2000608@sendu.me.uk> References: <45D3167F.2000608@sendu.me.uk> Message-ID: On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: > DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer > getting sequences back from NCBI in the order we requested them in > batch > mode. Is this the same result you get? DIED. FAILED tests 59-60, 63-64, 67-68, 71-72 Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 okay, 85.84%) Failed Test Stat Wstat Total Fail Failed List of Failed ------------------------------------------------------------------------ ------- DB.t 8 2048 113 8 7.08% 59-60 63-64 67-68 71-72 8 subtests skipped. 
Thanks, j seqlab.net http://www.bioperl.org/wiki/User:Jhannah From bix at sendu.me.uk Thu Feb 15 08:37:32 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 15 Feb 2007 13:37:32 +0000 Subject: [Bioperl-l] DB.t failures In-Reply-To: References: <45D3167F.2000608@sendu.me.uk> Message-ID: <45D4621C.6040309@sendu.me.uk> Jay Hannah wrote: > On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: >> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer >> getting sequences back from NCBI in the order we requested them in >> batch >> mode. > > Is this the same result you get? > > > DIED. FAILED tests 59-60, 63-64, 67-68, 71-72 > Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 > okay, 85.84%) > Failed Test Stat Wstat Total Fail Failed List of Failed > ------------------------------------------------------------------------ > ------- > DB.t 8 2048 113 8 7.08% 59-60 63-64 67-68 71-72 > 8 subtests skipped. Yes, those fails are all caused by results in the wrong order (I believe). From cjfields at uiuc.edu Thu Feb 15 09:22:09 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 08:22:09 -0600 Subject: [Bioperl-l] DB.t failures In-Reply-To: <45D4621C.6040309@sendu.me.uk> References: <45D3167F.2000608@sendu.me.uk> <45D4621C.6040309@sendu.me.uk> Message-ID: On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote: > Jay Hannah wrote: >> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: >>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer >>> getting sequences back from NCBI in the order we requested them in >>> batch >>> mode. >> >> Is this the same result you get? >> >> >> DIED. FAILED tests 59-60, 63-64, 67-68, 71-72 >> Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 >> okay, 85.84%) >> Failed Test Stat Wstat Total Fail Failed List of Failed >> --------------------------------------------------------------------- >> --- >> ------- >> DB.t 8 2048 113 8 7.08% 59-60 63-64 67-68 71-72 >> 8 subtests skipped. 
> > Yes, those fails are all caused by results in the wrong order (I > believe). I'm fixing those now so it doesn't depend on order and will commit in the next few minutes. chris From bix at sendu.me.uk Thu Feb 15 09:37:00 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 15 Feb 2007 14:37:00 +0000 Subject: [Bioperl-l] DB.t failures In-Reply-To: <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu> References: <45D3167F.2000608@sendu.me.uk> <45D4621C.6040309@sendu.me.uk> <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu> Message-ID: <45D4700C.8020305@sendu.me.uk> Chris Fields wrote: > > On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote: > >> Jay Hannah wrote: >>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: >>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer >>>> getting sequences back from NCBI in the order we requested them in >>>> batch mode. > > Okay, I committed a fix for that. I hope there are many users who > depend on the returned sequence order for anything! s/are/aren't/ ? I suspect there might be, and its certainly a reasonable assumption to make. Did you not see an easy way of maintaining the order? From cjfields at uiuc.edu Thu Feb 15 09:28:46 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 08:28:46 -0600 Subject: [Bioperl-l] DB.t failures In-Reply-To: <45D4621C.6040309@sendu.me.uk> References: <45D3167F.2000608@sendu.me.uk> <45D4621C.6040309@sendu.me.uk> Message-ID: <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu> On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote: > Jay Hannah wrote: >> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: >>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer >>> getting sequences back from NCBI in the order we requested them in >>> batch >>> mode. >> >> Is this the same result you get? >> >> >> DIED. 
FAILED tests 59-60, 63-64, 67-68, 71-72 >> Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 >> okay, 85.84%) >> Failed Test Stat Wstat Total Fail Failed List of Failed >> --------------------------------------------------------------------- >> --- >> ------- >> DB.t 8 2048 113 8 7.08% 59-60 63-64 67-68 71-72 >> 8 subtests skipped. > > Yes, those fails are all caused by results in the wrong order (I > believe). Okay, I committed a fix for that. I hope there are many users who depend on the returned sequence order for anything! chris From michael.watson at bbsrc.ac.uk Thu Feb 15 09:44:27 2007 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu, 15 Feb 2007 14:44:27 -0000 Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk> Hi OK I have some great images out of this glyph, but I can't see the axis, and nor is it labelled (ie does it go from 0 - 100%?) so isn't great for publication. The docs say: "NOTE: -gc_window=>'auto' gives nice results and is recommended for drawing GC content. The GC content axes draw slightly outside the panel, so you may wish to add some extra padding on the right and left. " Any idea how to do this? Basically, I want a nice GC graph with the axis quite clearly labelled, and a nice "%GC" title next to it :) Thanks Mick The information contained in this message may be confidential or legally privileged and is intended solely for the addressee. If you have received this message in error please delete it & notify the originator immediately. Unauthorised use, disclosure, copying or alteration of this message is forbidden & may be unlawful. The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Institute. This email and associated attachments has been checked locally for viruses but we can accept no responsibility once it has left our systems. 
Communications on Institute computers are monitored to secure the effective operation of the systems and for other lawful purposes. From nehadnahar at yahoo.co.in Thu Feb 15 10:28:42 2007 From: nehadnahar at yahoo.co.in (Neha Nahar) Date: Thu, 15 Feb 2007 15:28:42 +0000 (GMT) Subject: [Bioperl-l] Convert newick to nexus format In-Reply-To: <84703383-600F-42F4-A860-DD0D1C43EE83@bioperl.org> Message-ID: <777943.33252.qm@web8404.mail.in.yahoo.com> Thank you Jason. I ran the tests and they failed, so I re-installed the bioperl module and now it works fine. Regards, Neha. Jason Stajich wrote: Something is wrong with your install I am guessing - can you run the tests? Go to bioperl directory: $ perl t/TreeIO.t can you describe how you installed bioperl? On Feb 5, 2007, at 11:58 AM, Neha Nahar wrote: > > Hi, > Thank you for the code. > I tried it but I still get the same exception. > > ------------- EXCEPTION ------------- > MSG: Cannot call method write_tree on Bio::TreeIO object must use a > subclass > STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/ > 5.8.8/Bio/TreeIO/nexus.pm:170 > STACK toplevel ./nexus1.pl:18 > > > Please find attached the perl file(nexus.pl). > > > Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/ > ~sendu/bioperl/Bio/TreeIO.pm > > Please let me know if I am using the correct version.If not, please > point me to the latest one. > > Thank you. > Regards, > nnahar > > > > > > Jason Stajich wrote:please cc the mailing list > when asking a question or followup. > > Sorry I don't know what you are doing wrong - you didn't resend > your code so I don't know if you still have a typo. 
> > This code works fine for me > > use Bio::TreeIO; > use strict; > my ($filein,$fileout) = @ARGV; > my ($format,$oformat) = qw(newick nexus); > my $in = Bio::TreeIO->new(-file => $filein, -format => $format);my > $out= Bio::TreeIO->new(-format => $oformat, -file => ">$fileout"); > > > while( my $t = $in->next_tree ) { > $out->write_tree($t); > } > > > > On Feb 5, 2007, at 11:24 AM, Neha Nahar wrote: > > Thank you very much for the reply. > > > I fixed the code as per your suggestion,but now am getting a > different error: > > > $ ./nexus.pl mrp-input.txt nexus.out > newickfile=mrp-input.txt, nexusfile=nexus.out > > > ------------- EXCEPTION ------------- > MSG: Cannot call method write_tree on Bio::TreeIO object must use a > subclass > STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/ > 5.8.8/Bio/TreeIO/nexus.pm:170 > STACK toplevel ./nexus.pl:23 > > > -------------------------------------- > > > Please help me out with this script. > > > Thank you. > Regards, > Neha > > > > > > > > > Jason Stajich wrote: you want to write the TREE > out not the TREE WRITER. > > > > > $treeout->write_tree($tree) > > > not > $treeout->write_tree($treeout); > > > On Feb 5, 2007, at 9:59 AM, Neha Nahar wrote: > > > Hello everyone, > > > > > I am trying to convert newick tree to nexus format. 
> Using the script (referred from an email from George dated Wed Sep > 22 11:52:47 EDT 2004) : > > > > > /*------------------------------------------------------------*/ > > > > > $ cat nexus.pl > #!/usr/bin/perl -w > > > > > use Bio::TreeIO; > > > > > ($NEWICKFILE, $NEXUSFILE) = @ARGV; > print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n"; > my $treeio = new Bio::TreeIO(-format => 'newick', -file => > "$NEWICKFILE"); > my $treeout = new Bio::TreeIO(-format => 'nexus', -file => "> > $NEXUSFILE"); > while(my $tree = $treeio->next_tree) { > $treeout->write_tree($treeout); > } > > > > > exit 0; > > > > > > > > > /*------------------------------------------------------------*/ > > > > > Running the script through command line: > Gives the following error: > > > > > $ ./nexus.pl mrp-input.txt nexus.out > newickfile=mrp-input.txt, nexusfile=nexus.out > > > > > ------------- EXCEPTION ------------- > MSG: Cannot call method write_tree on Bio::TreeIO object must use a > subclass > STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/ > 5.8.8/Bio/TreeIO/nexus.pm:170 > STACK toplevel ./nexus.pl:23 > > > > > -------------------------------------- > > > > > > > > > Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/ > ~sendu/bioperl/Bio/TreeIO.pm > > > > > Questions:- > > > > > 1. Please let me know if I am using the correct version. > If not, please point me to the latest one. > > > > > 2. Provided that the version I am using is the right one, please > let me know what is wrong with the script. > > > > > Thank you. > Regards, > Neha. > > > > > > > > > > > > > > > > > > > > > -Neha Nahar > " Work for cause and not for applause, live to express and not to > impress !" > > > > > --------------------------------- > Here's a new way to find what you're looking for - Yahoo!
Answers > > > > > -- > Jason Stajich > Miller Research Fellow > University of California, Berkeley > lab: 510.642.8441 > > > http://pmb.berkeley.edu/~taylor/people/js.html > http://fungalgenomes.org/ > > > > > > > > > > > > > -Neha Nahar > " Work for cause and not for applause, live to express and not to > impress !" > > > --------------------------------- > Here's a new way to find what you're looking for - Yahoo! Answers > > > -- > Jason Stajich > Miller Research Fellow > University of California, Berkeley > lab: 510.642.8441 > > http://pmb.berkeley.edu/~taylor/people/js.html > http://fungalgenomes.org/ > > > > > > > > > -Neha Nahar > " Work for cause and not for applause, live to express and not > to impress !" > > --------------------------------- > Here's a new way to find what you're looking for - Yahoo! Answers > > > -Neha Nahar > " Work for cause and not for applause, live to express and not to > impress !" > > --------------------------------- > Here's a new way to find what you're looking for - Yahoo! Answers > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -Neha Nahar " Work for cause and not for applause, live to express and not to impress !" --------------------------------- Here's a new way to find what you're looking for - Yahoo!
Answers From cjfields at uiuc.edu Thu Feb 15 10:44:23 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 09:44:23 -0600 Subject: [Bioperl-l] DB.t failures In-Reply-To: <45D4700C.8020305@sendu.me.uk> References: <45D3167F.2000608@sendu.me.uk> <45D4621C.6040309@sendu.me.uk> <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu> <45D4700C.8020305@sendu.me.uk> Message-ID: <809A1A9D-0344-412D-98D7-EA852EFED63D@uiuc.edu> On Feb 15, 2007, at 8:37 AM, Sendu Bala wrote: > Chris Fields wrote: >> >> On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote: >> >>> Jay Hannah wrote: >>>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: >>>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no >>>>> longer >>>>> getting sequences back from NCBI in the order we requested them in >>>>> batch mode. >> >> Okay, I committed a fix for that. I hope there are many users who >> depend on the returned sequence order for anything! > > s/are/aren't/ ? Yes, my oops. > I suspect there might be, and its certainly a reasonable assumption to > make. Did you not see an easy way of maintaining the order? I haven't looked (been busy the last few days), but I think there is a way via efetch. We could add in something to the default base URL if there is something or (probably better) add a sort_order() method to designate a particular sort order, defaulting to the old order if not set. 
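Sendu asks whether there is an easy way to keep the requested order. Independently of any sort parameter on the NCBI side, the order can be restored on the client once all records have arrived. The helper below is a hypothetical sketch (`restore_order` is not a BioPerl method); records are plain `[id, payload]` pairs so that it runs without BioPerl. With Bio::Seq objects the key would come from something like `$seq->accession_number`, and it must be the same kind of identifier that was requested (GI vs. accession, with or without version, is a common mismatch).

```perl
use strict;
use warnings;

# Hypothetical helper: return records in the order their IDs were requested.
# $requested is an arrayref of IDs; $returned is an arrayref of [id, payload]
# pairs in whatever order the server sent them. IDs with no returned record
# are silently dropped.
sub restore_order {
    my ($requested, $returned) = @_;
    my %by_id = map { $_->[0] => $_ } @$returned;   # index records by ID
    return grep { defined } map { $by_id{$_} } @$requested;
}
```

A test that only needs to know the right records came back, which is what the DB.t fix aims for, can instead compare `join ',', sort @requested` with the sorted returned IDs.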
chris From lstein at cshl.edu Thu Feb 15 13:53:13 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 15 Feb 2007 13:53:13 -0500 Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com> Hi Michael, When you set up the panel, do this: Bio::Graphics::Panel->new(-blah -blah, -pad_left => 20, -pad_right => 20); This will leave enough room on the left and right for you to see the Y axis. Otherwise it runs off the edge of the image (ok, this is a mis-design, but it was the only way to solve a chicken-and-egg problem about who gets to say how wide the panel is) Lincoln On 2/15/07, michael watson (IAH-C) wrote: > > Hi > > OK I have some great images out of this glyph, but I can't see the axis, > and nor is it labelled (ie does it go from 0 - 100%?) so isn't great for > publication. The docs say: > > "NOTE: -gc_window=>'auto' gives nice results and is recommended for > drawing GC content. The GC content axes draw slightly outside the > panel, so you may wish to add some extra padding on the right and > left. " > > Any idea how to do this? > > Basically, I want a nice GC graph with the axis quite clearly labelled, > and a nice "%GC" title next to it :) > > Thanks > > Mick
-- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From johnsonm at gmail.com Thu Feb 15 14:24:08 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Thu, 15 Feb 2007 13:24:08 -0600 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: References: <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> Message-ID: Done. Bug opened in Bugzilla, diffs attached including new/updated tests: http://bugzilla.open-bio.org/show_bug.cgi?id=2206 Can somebody grab that, take a look, tweak to taste, test and commit?
Tests > pass on my end presently. > > On 2/13/07, Chris Fields wrote: >> >> You'll also want to update whatever relevant tests there are for >> Glimmer; looks like they are in GenPred.t. >> >> chris Done; everything passed on this end as well, no tweaking necessary. If there are problems we'll definitely hear about it down the road (Glimmer is a popular tool), but I think you'll be fine. Thanks Mark! chris From cjfields at uiuc.edu Thu Feb 15 14:46:07 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 13:46:07 -0600 Subject: [Bioperl-l] DB.t failures In-Reply-To: <809A1A9D-0344-412D-98D7-EA852EFED63D@uiuc.edu> References: <45D3167F.2000608@sendu.me.uk> <45D4621C.6040309@sendu.me.uk> <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu> <45D4700C.8020305@sendu.me.uk> <809A1A9D-0344-412D-98D7-EA852EFED63D@uiuc.edu> Message-ID: On Feb 15, 2007, at 9:44 AM, Chris Fields wrote: > > On Feb 15, 2007, at 8:37 AM, Sendu Bala wrote: > >> Chris Fields wrote: >>> >>> On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote: >>> >>>> Jay Hannah wrote: >>>>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: >>>>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no >>>>>> longer >>>>>> getting sequences back from NCBI in the order we requested >>>>>> them in >>>>>> batch mode. >>> >>> Okay, I committed a fix for that. I hope there are many users who >>> depend on the returned sequence order for anything! >> >> s/are/aren't/ ? > > Yes, my oops. > >> I suspect there might be, and its certainly a reasonable >> assumption to >> make. Did you not see an easy way of maintaining the order? > > I haven't looked (been busy the last few days), but I think there is > a way via efetch. > > We could add in something to the default base URL if there is > something or (probably better) add a sort_order() method to designate > a particular sort order, defaulting to the old order if not set. 
> > chris Delving into it further, the problem only occurs when using get_seq_stream() directly in batch mode, which is likely only used by developers for testing. The sort issue only pops up when eposting IDs using that mode; retrieved seqs are returned in a different order than through a direct efetch query (the default with get_Stream* or get_Seq* methods). No use of the 'sort' parameter works to get around that problem, not a complete surprise since it is supposed to only work for PubMed, but since the method is rarely used I'll just leave the bullet-proofed tests alone. chris From letondal at pasteur.fr Thu Feb 15 15:23:55 2007 From: letondal at pasteur.fr (Catherine Letondal) Date: Thu, 15 Feb 2007 21:23:55 +0100 Subject: [Bioperl-l] Problem parsing clustalw format ni AlignIO Message-ID: Hi bioperlers, I have a script called protal2dna (http://bioweb.pasteur.fr/seqanal/interfaces/protal2dna.html, see attachment #1) that realigns DNA sequences given their sequences + the corresponding protein alignment (sequences have to be in the same order or named equivalently). We have a parsing problem reported from the AlignIO class when users enter some clustalw file (see attachment #2 for an example): % protal2dna alig-protal2dna.dat dna-protal2dna.data no alignment available in 'clustalw' format from file 'alig-protal2dna.dat' % I have tried with bioperl 1.4. I have looked in the archive and in the BUGS, but found nothing. Is there any bug fix for this? I also provide the DNA sequences file if you want to test. Thanks a lot in advance, -- Catherine Letondal -- Institut Pasteur www.pasteur.fr/~letondal -------------- next part -------------- A non-text attachment was scrubbed... Name: protal2dna Type: application/octet-stream Size: 11093 bytes Desc: not available URL: -------------- next part -------------- -------------- next part -------------- A non-text attachment was scrubbed...
Name: alig-protal2dna.dat Type: application/octet-stream Size: 12022 bytes Desc: not available URL: -------------- next part -------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: dna-protal2dna.data Type: application/octet-stream Size: 7739 bytes Desc: not available URL: From Kevin.M.Brown at asu.edu Thu Feb 15 16:38:25 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 15 Feb 2007 14:38:25 -0700 Subject: [Bioperl-l] Problem parsing clustalw format ni AlignIO In-Reply-To: References: Message-ID: <1A4207F8295607498283FE9E93B775B402BA7764@EX02.asurite.ad.asu.edu> Did you try Bioperl 1.5.2 to see if updates to it might fix the issue? IIRC 1.4 is nearly 2 years old now. 1.5.2 was released within the last few months. > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Catherine Letondal > Sent: Thursday, February 15, 2007 1:24 PM > To: bioperl-l > Cc: Catherine Letondal; Katja Schuerer > Subject: [Bioperl-l] Problem parsing clustalw format ni AlignIO > > Hi bioperlers, > > I have a script called protal2dna > (http://bioweb.pasteur.fr/seqanal/interfaces/protal2dna.html, > see attachment #1) that realign DNA sequences giving their > sequences + the corresponding protein alignment (sequences > have to be in the same order or named equivalently). We have > a parsing problem reported from the AlignIO class when users > enter some clustalw file (see attachment #2 for an example): > > % protal2dna alig-protal2dna.dat dna-protal2dna.data no > alignment available in 'clustalw' format from file > 'alig-protal2dna.dat' > % > > I have tried with bioperl 1.4. I have looked in the archive > and in the BUGS, but found nothing? > Is there any bug fix for this? I also provide the DNA > sequences file if you want to test. 
> > Thanks a lot in advance, > > -- > Catherine Letondal -- Institut Pasteur > www.pasteur.fr/~letondal > > From cjfields at uiuc.edu Thu Feb 15 16:50:54 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 15:50:54 -0600 Subject: [Bioperl-l] Bio::Root::Utilities.pm In-Reply-To: <8f200b4c0702141020x5815018bvd69873222582d9d5@mail.gmail.com> References: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com> <1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu> <8f200b4c0702141020x5815018bvd69873222582d9d5@mail.gmail.com> Message-ID: On Feb 14, 2007, at 12:20 PM, Steve Chervitz wrote: ... >> >> I don't have a problem with adding it back, esp. if tests are added. >> Everything in Bio::Root* not tied to a module was yanked out when no >> one spoke up about cleaning up Bio::Root* modules: >> >> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12792/ >> focus=12839 >> >> Maybe others disagree? >> >> chris >> > > Sorry I missed out on that thread. I had some trouble with my > bioperl-l > email delivery getting disabled due to excessive bounces, and it > took me a > while to catch it. > > Bio::Root::Utilities is quite a grab bag of miscellaneous general > functions > that are occasionally useful for perl scripting (e.g., determining > end-of-line characters, sending email, etc.). The code could > definitely use > a review, and maybe an example script to advertise it. I can look > into this, > and suggestions are welcome. > > Steve Steve, I have added Root::Utilities back to CVS but I didn't know if I should add back the other related Root modules (didn't know what your future plans were for them). Could the Bio::Root::Global and Bio::Root::Object stuff be consolidated into Bio::Root::Utilities or would that be too problematic? None of the other Bio* modules currently use them. Personally, I use Date::Manip for anything that requires date/time manipulation (updating seq records based on dates, for instance). 
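Before (or alongside) upgrading as Kevin suggests, two input problems are cheap to rule out on files uploaded through a web form: Windows (CRLF) line endings, and a first line that is not the `CLUSTAL ...` header that ClustalW writes at the top of its output. Neither is known to be the cause in this report, since the attachment is not visible in the archive. The helper below is a hypothetical pre-flight check (`clustal_problems` is not part of Bio::AlignIO) and assumes the whole file has been slurped into a string.

```perl
use strict;
use warnings;

# Hypothetical pre-flight check (not part of Bio::AlignIO): list obvious
# problems in text that is supposed to be ClustalW output. Returns an empty
# list when nothing suspicious is found.
sub clustal_problems {
    my ($text) = @_;
    my @problems;
    push @problems, 'CRLF (DOS) line endings' if $text =~ /\r\n/;
    my ($first) = grep { /\S/ } split /\r?\n/, $text;   # first non-blank line
    if (!defined $first) {
        push @problems, 'file is empty';
    }
    elsif ($first !~ /^CLUSTAL/) {
        push @problems, 'first non-blank line is not a CLUSTAL header';
    }
    return @problems;
}
```

If the check comes back clean and the current Bioperl release still fails, the file is a good candidate to attach to a bug report.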
Some of the other utilities could come in handy, though. Don't know if that helps... chris From cjfields at uiuc.edu Thu Feb 15 16:51:58 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 15:51:58 -0600 Subject: [Bioperl-l] XEMBL deprecation Message-ID: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> I have gone ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService both for deprecation in the wiki and in CVS (though I haven't set any timeline): http://www.bioperl.org/wiki/Deprecated_modules The XEMBL web services are no longer available, and it looks like everything is running through DBFetch now. The XEMBL tests are skipped if no server is detected, so they shouldn't cause any problems with Bioperl installations. Lincoln, was there anything to salvage from these? I noticed they used SOAP::Lite, so maybe we could convert these over to a SOAP-based interface to DBFetch web services? chris From johnsonm at gmail.com Thu Feb 15 17:29:37 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Thu, 15 Feb 2007 16:29:37 -0600 Subject: [Bioperl-l] Bio::Tools::Run::Glimmer? Message-ID: Now that I've got Bio::Tools::Glimmer parsing Glimmer2 and Glimmer3 output, I suppose I might as well go and write Bio::Tools::Run::Glimmer. I suspect another 4-in-1 module may be possible. Now that I think about it, I'll need one for GeneMark, too. Comments? Suggestions on a good module to use as a template? From hlapp at gmx.net Thu Feb 15 20:18:56 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 15 Feb 2007 20:18:56 -0500 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> Message-ID: <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net> On Feb 15, 2007, at 4:51 PM, Chris Fields wrote: > The XEMBL web services are no longer available What happens if someone invokes the module? Should it maybe return nothing and warn()? 
I don't think it's a good idea if the module just silently does not function because its backend is no more. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Thu Feb 15 20:48:12 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 19:48:12 -0600 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net> References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net> Message-ID: <00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu> On Feb 15, 2007, at 7:18 PM, Hilmar Lapp wrote: > On Feb 15, 2007, at 4:51 PM, Chris Fields wrote: > >> The XEMBL web services are no longer available > > What happens if someone invokes the module? Should it maybe return > nothing and warn()? I don't think it's a good idea if the module > just silently does not function because its backend is no more. > > -hilmar Yes, I thought the same. I have added a warn() noting the deprecation to the XEMBL constructor and removed XEMBL tests from CVS. The modules are still there for the time being. I actually worry more about the internals; it would be a shame to toss them altogether. Would it be worth it to shift this towards a SOAP-based interface to DBFetch? Or, more precisely, how much trouble would it be to do so? chris From hlapp at gmx.net Thu Feb 15 20:54:29 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 15 Feb 2007 20:54:29 -0500 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: <00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu> References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net> <00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu> Message-ID: Well, if dbFetch doesn't have a SOAP based interface, how would you want to do this?
-hilmar On Feb 15, 2007, at 8:48 PM, Chris Fields wrote: > On Feb 15, 2007, at 7:18 PM, Hilmar Lapp wrote: > >> On Feb 15, 2007, at 4:51 PM, Chris Fields wrote: >> >>> The XEMBL web services are no longer available >> >> What happens if someone invokes the module? Should it maybe return >> nothing and warn()? I don't think it's a good idea if the module >> just silently does not function because its backend is no more. >> >> -hilmar > > Yes, I thought the same. I have added a warn() noting the > deprecation to the XEMBL constructor and removed XEMBL tests from > CVS. The modules are still there for the time being. > > I actually worry more about the internals; it would be a shame to > toss them altogether. Would it be worth it to shift this towards a > SOAP-based interface to DBFetch? Or, more precisely, how much > trouble would it be to do so? > > chris -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Thu Feb 15 20:59:46 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 19:59:46 -0600 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net> <00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu> Message-ID: <8C7E18C6-B38D-4E15-BE9C-84256B09C312@uiuc.edu> On Feb 15, 2007, at 7:54 PM, Hilmar Lapp wrote: > Well, if dbFetch doesn't have a SOAP based interface, how would you > want to do this? > > -hilmar DBfetch has a SOAP-based interface: http://www.ebi.ac.uk/Tools/webservices/services/dbfetch Just not sure how easy it would be to switch XEMBL code over to using it. We already have Bio::DB::DBFetch so it may be redundant, but I don't recall any other SOAP-based tools in BioPerl beyond some stuff in bioperl-run (and I'm not sure how up-to-date the DBFetch module is).
chris From jimhu at tamu.edu Fri Feb 16 00:20:09 2007 From: jimhu at tamu.edu (Jim Hu) Date: Thu, 15 Feb 2007 23:20:09 -0600 Subject: [Bioperl-l] Pathway tools output parser In-Reply-To: References: Message-ID: <1632E2BF-4402-47DE-B750-9763E02711D2@tamu.edu> Hi Chris, I need to check the list more often! I never got an answer here, but Eric Just pointed out a perl api at TAIR that's linked from the BioCyc site. I've used the lisp parser functions from that to move the data to a perl array of arrays, and I'm working on creating object classes for BioCyc objects, starting with genes and products. I need to look at the appropriate ways to link this up to the existing codebase for interconverting to Chado and other BioPerl data types. Jim ===================================== Jim Hu Associate Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 On Feb 7, 2007, at 12:07 AM, Chris Mungall wrote: > > Hi Jim > > Did you ever get an answer to this? I'm interested in storing > pathway data > in Chado & I remember enough lisp to get it into something perl- > manageable > like XML > > On Thu, 25 Jan 2007, Jim Hu wrote: > >> Is there a module to parse the lisp object files from Peter Karp's >> Pathway Tools? I need a parser to convert the gene and protein >> objects in EcoCyc releases into something that can be imported into >> Chado. >> ===================================== >> Jim Hu >> Associate Professor >> Dept. of Biochemistry and Biophysics >> 2128 TAMU >> Texas A&M Univ. 
>> College Station, TX 77843-2128 >> 979-862-4054 >> > From lstein at cshl.edu Fri Feb 16 08:35:19 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 16 Feb 2007 08:35:19 -0500 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts In-Reply-To: <45D1E2A5.6060104@sendu.me.uk> References: <45D1E2A5.6060104@sendu.me.uk> Message-ID: <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com> Hi, Older versions of Storable can't deal with features that contain subroutine refs. You should get the current version from CPAN. Note that there is a slight security problem here if you don't trust the objects stored in the database. If they contain code refs, the code will be evaluated during deserialization. Lincoln On 2/13/07, Sendu Bala wrote: > > I have some raw sequences in a Bio::DB::SeqFeature::Store mysql database > and wanted to associate some basic information with them, like exon > positions. I thought of creating Bio::SeqFeature::Gene::Transcript > objects and storing them so I could later use features() to see what > other features overlapped exons.
I ran into a fatal error that can be > replicated with the following simplified one-liner: > > perl -MBio::DB::SeqFeature::Store -MBio::SeqFeature::Gene::Transcript -e > '$db = Bio::DB::SeqFeature::Store->new(-adaptor => "DBI::mysql", -dsn => > "dbi:mysql:test"); $trans = > Bio::SeqFeature::Gene::Transcript->new(-start => 1, -end => 2, -seq_id > => "test"); $db->store($trans); @trans = $db->features(-seqid => $id, > -type => "transcript"); print "@trans\n";' > > code sub { > package Bio::SeqFeature::Generic; > use strict 'refs'; > my $self = shift @_; > foreach my $f (@{[] unless $$self{'_gsf_sub_array'};}) { > $f = undef; > } > $$self{'_gsf_seq'} = undef; > foreach my $t (keys %{{} unless $$self{'_gsf_tag_hash'};}) { > $$self{'_gsf_tag_hash'}{$t} = undef; > delete $$self{'_gsf_tag_hash'}{$t}; > } > } did not evaluate to a subroutine reference, at > /.../Bio/DB/SeqFeature/Store.pm line 2280 > > > Is this a bug? Or am I taking the wrong approach? > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Fri Feb 16 08:47:29 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 16 Feb 2007 08:47:29 -0500 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts In-Reply-To: <45D5B42A.1080303@sendu.me.uk> References: <45D1E2A5.6060104@sendu.me.uk> <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com> <45D5B42A.1080303@sendu.me.uk> Message-ID: <6dce9a0b0702160547s5873cd2bg2c5cf09779138249@mail.gmail.com> Hi Sendu, I'll do a little digging and let you know.
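Lincoln's point about code refs can be tried out with core Perl alone. What follows is a minimal sketch under stated assumptions, not the serialization code inside Bio::DB::SeqFeature::Store: it assumes Storable 2.05 or newer (the version recommended later in this thread), and the hash with a 'cleanup' code ref is a made-up stand-in for a feature object. Setting $Storable::Deparse lets freeze() store a code ref as B::Deparse source text. Setting $Storable::Eval lets thaw() string-eval that text back into a sub, and that eval step is the deserialization risk Lincoln describes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Assumption: CODE-ref support needs Storable 2.05+ (with B::Deparse).
# VERSION() dies if the installed Storable is older.
Storable->VERSION(2.05);

# Hypothetical feature-like record holding a code ref.
my $feature = {
    seq_id  => 'test',
    start   => 1,
    end     => 2,
    cleanup => sub { my $self = shift; return "cleaned $self->{seq_id}" },
};

my $frozen;
{
    local $Storable::Deparse = 1;    # freeze CODE refs as deparsed source
    $frozen = freeze($feature);
}

my $copy;
{
    local $Storable::Eval = 1;       # thaw evals that source: trusted data only!
    $copy = thaw($frozen);
}
print $copy->{cleanup}->($copy), "\n";    # prints "cleaned test"

# With $Storable::Eval false (the default), thaw() dies rather than eval code.
my $refused = !eval { thaw($frozen); 1 };
print $refused ? "thaw refused without \$Storable::Eval\n" : "thawed\n";
```

Leaving $Storable::Eval false is the safer default whenever the stored objects cannot be trusted.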
Lincoln On 2/16/07, Sendu Bala wrote: > > Lincoln Stein wrote: > > Hi, > > > > Older versions of Storable can't deal with features that contain > > subroutine refs. You should get the current version from CPAN. > > Do you have any idea which version of Storable first supported this? I > can specify that version in Bioperl's Build.PL. > > (else I'll just specify the latest version) > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Fri Feb 16 08:52:30 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 16 Feb 2007 08:52:30 -0500 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts In-Reply-To: <45D5B42A.1080303@sendu.me.uk> References: <45D1E2A5.6060104@sendu.me.uk> <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com> <45D5B42A.1080303@sendu.me.uk> Message-ID: <6dce9a0b0702160552l11417675kec7b5b4537cbfa70@mail.gmail.com> It looks like 2.05 or higher is the Storable version to use. It requires B::Deparse, which is (I think) standard on perl 5.6 or higher. Lincoln On 2/16/07, Sendu Bala wrote: > > Lincoln Stein wrote: > > Hi, > > > > Older versions of Storable can't deal with features that contain > > subroutine refs. You should get the current version from CPAN. > > Do you have any idea which version of Storable first supported this? I > can specify that version in Bioperl's Build.PL. > > (else I'll just specify the latest version) > -- Lincoln D.
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Fri Feb 16 08:55:06 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 16 Feb 2007 08:55:06 -0500 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> Message-ID: <6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com> I like the idea of converting these over to use DBFetch's SOAP services. On the other hand, it isn't likely that I'm going to have time to do this anytime soon. Probably the best thing to do is to issue a warning and return undef if someone tries to use the XEMBL module. I'll make that change. Lincoln On 2/15/07, Chris Fields wrote: > > I have gone ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService > both for deprecation in the wiki and in CVS (though I haven't set any > timeline): > > http://www.bioperl.org/wiki/Deprecated_modules > > The XEMBL web services are no longer available, and it looks like > everything is running through DBFetch now. The XEMBL tests are > skipped if no server is detected, so they shouldn't cause any > problems with Bioperl installations. > > Lincoln, was there anything to salvage from these? I noticed they > used SOAP::Lite, so maybe we could convert these over to a SOAP-based > interface to DBFetch web services? > > chris > -- Lincoln D.
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Fri Feb 16 08:55:47 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 16 Feb 2007 08:55:47 -0500 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: <6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com> References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> <6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com> Message-ID: <6dce9a0b0702160555s5b4d7d70n936c603bc79dd40d@mail.gmail.com> Oh, looks like someone has inserted the warnings already. Good. Lincoln On 2/16/07, Lincoln Stein wrote: > > I like the idea of converting these over to use DBFetch's SOAP services. > On the other hand, it isn't likely that I'm going to have time to do this > anytime soon. > > Probably the best thing to do is to issue a warning and return undef if > someone tries to use the XEMBL module. I'll make that change. > > Lincoln > > On 2/15/07, Chris Fields wrote: > > > > I have gone ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService > > both for deprecation in the wiki and in CVS (though I haven't set any > > timeline): > > > > http://www.bioperl.org/wiki/Deprecated_modules > > > > The XEMBL web services are no longer available, and it looks like > > everything is running through DBFetch now. The XEMBL tests are > > skipped if no server is detected, so they shouldn't cause any > > problems with Bioperl installations. > > > > Lincoln, was there anything to salvage from these? I noticed they > > used SOAP::Lite, so maybe we could convert these over to a SOAP-based > > interface to DBFetch web services? > > > > chris > > > > > > -- > Lincoln D.
Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From bix at sendu.me.uk Fri Feb 16 08:56:50 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 16 Feb 2007 13:56:50 +0000 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts In-Reply-To: <6dce9a0b0702160552l11417675kec7b5b4537cbfa70@mail.gmail.com> References: <45D1E2A5.6060104@sendu.me.uk> <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com> <45D5B42A.1080303@sendu.me.uk> <6dce9a0b0702160552l11417675kec7b5b4537cbfa70@mail.gmail.com> Message-ID: <45D5B822.6080908@sendu.me.uk> Lincoln Stein wrote: > It looks like 2.05 or higher is the Storable version to use. It requires > B::Deparse, which is (I think) standard on perl 5.6 or higher. Thanks, now recommended in Build.PL From cjfields at uiuc.edu Fri Feb 16 09:05:08 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 16 Feb 2007 08:05:08 -0600 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: <6dce9a0b0702160555s5b4d7d70n936c603bc79dd40d@mail.gmail.com> References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> <6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com> <6dce9a0b0702160555s5b4d7d70n936c603bc79dd40d@mail.gmail.com> Message-ID: I added the warning yesterday. We can add something to the project priority list on modifying XEMBL to use DBFetch instead; I like the SOAP-based interface. I am thinking of a similar interface for NCBI eutils but I haven't had time to work on it. 
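Hilmar and Lincoln both suggest that a module whose backend has disappeared should warn and return undef rather than fail silently. Below is a minimal plain-Perl sketch of that pattern. The package name My::DB::RetiredService and its -server argument are hypothetical, and this is not the code actually committed to Bio::DB::XEMBL.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical module whose remote web service no longer exists.
package My::DB::RetiredService;

sub new {
    my ($class, %args) = @_;
    # Warn loudly instead of failing silently later on.
    warn "$class is deprecated: its web service is no longer available\n";
    return;    # undef in scalar context, empty list in list context
}

package main;

# Capture warnings so the behaviour can be checked.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $db = My::DB::RetiredService->new(-server => 'example');
print defined $db ? "got an object\n" : "constructor returned undef\n";
print 'warnings seen: ', scalar(@warnings), "\n";   # prints "warnings seen: 1"
print "warning text: $warnings[0]";
```

A bare `return` gives undef in scalar context and an empty list in list context, so a caller that tests the result sees the failure either way.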
chris On Feb 16, 2007, at 7:55 AM, Lincoln Stein wrote: > Oh, looks like someone has inserted the warnings already. Good. > > Lincoln > > On 2/16/07, Lincoln Stein wrote: I like the idea > of converting these over to use DBFetch's SOAP services. On the > other hand, it isn't likely that I'm going to have time to do this > anytime soon. > > Probably the best thing to do is to issue a warning and return > undef if someone tries to use the XEMBL module. I'll make that > change. > > Lincoln > > > On 2/15/07, Chris Fields < cjfields at uiuc.edu> wrote: I have gone > ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService > both for deprecation in the wiki and in CVS (though I haven't set any > timeline): > > http://www.bioperl.org/wiki/Deprecated_modules > > The XEMBL web services are no longer available, and it looks like > everything is running through DBFetch now. The XEMBL tests are > skipped if no server is detected, so they shouldn't cause any > problems with Bioperl installations. > > Lincoln, was there anything to salvage from these? I noticed they > used SOAP::Lite, so maybe we could convert these over to a SOAP-based > interface to DBFetch web services? > > chris > > > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > > > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Fri Feb 16 08:39:54 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 16 Feb 2007 13:39:54 +0000 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts In-Reply-To: <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com> References: <45D1E2A5.6060104@sendu.me.uk> <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com> Message-ID: <45D5B42A.1080303@sendu.me.uk> Lincoln Stein wrote: > Hi, > > Older versions of Storable can't deal with features that contain > subroutine refs. You should get the current version from CPAN. Do you have any idea which version of Storable first supported this? I can specify that version in Bioperl's Build.PL. (else I'll just specify the latest version) From eu at otelo-online.de Sat Feb 17 07:55:08 2007 From: eu at otelo-online.de (eu at otelo-online.de) Date: Sat, 17 Feb 2007 13:55:08 +0100 (CET) Subject: [Bioperl-l] Bioperl Module OddCodes(help) Message-ID: <29037001.1171716908969.JavaMail.ngmail@webmail18> Hello @all, i want translate a Sequence in Fasta Format only to acidic,basic and polar dependent on the pH. OddCodes Module can only to acidic,basic, polar and hydrophobic. And i think on default pH. Can somebody help me? I don't know whether it is possible? Because i need for each amino acid a positive, negative charge and unchargedly. thx From The_Polymorph at rocketmail.com Sun Feb 18 14:08:34 2007 From: The_Polymorph at rocketmail.com (Caitlin) Date: Sun, 18 Feb 2007 11:08:34 -0800 (PST) Subject: [Bioperl-l] Missing ppm for Bioperl 1.5.2_100(?) Message-ID: <148421.50501.qm@web50801.mail.yahoo.com> Hi.
In an attempt to upgrade my Bioperl install from 1.5.2 RC5 to 1.5.2_100, I noticed the ppm was not found on the activestate repositories. Thanks, ~Caitlin From bix at sendu.me.uk Sun Feb 18 15:36:03 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 18 Feb 2007 20:36:03 +0000 Subject: [Bioperl-l] Missing ppm for Bioperl 1.5.2_100(?) In-Reply-To: <148421.50501.qm@web50801.mail.yahoo.com> References: <148421.50501.qm@web50801.mail.yahoo.com> Message-ID: <45D8B8B3.4000408@sendu.me.uk> Caitlin wrote: > Hi. > > In an attempt to upgrade my Bioperl install from 1.5.2 RC5 to > 1.5.2_100, I noticed the ppm was not found on the activestate > repositories. Follow the install instructions: http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows It's not in the normal activestate repository, but on bioperl.org. From t.nugent at cs.ucl.ac.uk Mon Feb 19 12:29:48 2007 From: t.nugent at cs.ucl.ac.uk (Tim Nugent) Date: Mon, 19 Feb 2007 17:29:48 +0000 Subject: [Bioperl-l] Module to draw transmembrane protein toplogoy Message-ID: <45D9DE8C.2010301@cs.ucl.ac.uk> Hi everyone, I've written a perl module to display transmembrane protein topology using GD. There are various options, including labels, helix/loop dimensions, colour schemes etc but it only requires a string or array containing the protein topology (e.g. transmembrane helix start/stop points). It produces output like this: http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png using the code at the bottom. Here is the module: http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm I've never submitted anything to Bioperl before - is this sort of thing likely to be of use to others? I imagine it would sit alongside some of the Bio::Graphics stuff.
Best wishes,

Tim

#!/usr/bin/perl

use strict;
use warnings;
use lib '/scratch0/NOT_BACKED_UP/tnugent/perl5lib'; ## path to module
use DrawTransmembrane;

my @topology = (20,45,59,70,86,109,145,168,194,220);

my %labels = ('5' => '5 - Sulphation Site',
              '21' => '1st Helix',
              '47' => '40 - Mutation',
              '60' => 'Voltage Sensor',
              '72' => '72 - Mutation 2',
              '73' => '73 - Mutation 3',
              '138' => '138 - Glycosylation Site',
              '170' => '170 - Phosphorylation Site',
              '200' => 'Last Helix');

my $im = DrawTransmembrane->draw_transmembrane(-title => 'This is a cartoon displaying transmembrane helices.',
                                               -topology => \@topology,
                                               -n_terminal => 'out',
                                               -helix_width => 48,
                                               -helix_height => 125,
                                               -short_loop_limit => 10,
                                               -long_loop_limit => 35,
                                               -loop_width => 25,
                                               -colour_scheme => 'yellow',
                                               -labels => \%labels,
                                               -text_offset => -10);

## print the .png file
my $output = 'test.png';
open(OUTPUT, ">$output");
binmode OUTPUT;
print OUTPUT $im->png;
close OUTPUT;

my $system = `display $output`;

-- Tim Nugent (MRes) Research Student Bioinformatics Unit Department of Computer Science University College London Gower Street London WC1E 6BT Tel: 020-7679-0410 t.nugent at ucl.ac.uk From bix at sendu.me.uk Mon Feb 19 12:42:23 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 19 Feb 2007 17:42:23 +0000 Subject: [Bioperl-l] t/FeatureHolder.x Message-ID: <45D9E17F.4030302@sendu.me.uk> Is this supposed to work? It doesn't get run in the test suite normally because of its name.
With a live checkout I get: ./Build test --test_files t/FeatureHolder.x --verbose t/FeatureHolder....1..6 ok 1 ok 2 Set group tag to: locus_tag GROUPS: GROUP [?]:source [snip] resolved pair Bio::SeqFeature::Generic=HASH(0x1375dc0) Bio::SeqFeature::Generic=HASH(0x1362830) UNFLATTENING GROUP: GROUP [?]:gene UNFLATTENING GROUP: GROUP [?]:repeat_region UNFLATTENING GROUP: GROUP [?]:gene UNFLATTENING GROUP: GROUP [?]:repeat_region UNFLATTENING GROUP: GROUP [BG:DS07721.3]:gene mRNA CDS UNFLATTENING GROUP: GROUP [BG:DS07721.6]:gene mRNA CDS ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: DUPLICATE ID: AAF53399.1 STACK: Error::throw STACK: Bio::Root::Root::throw /home/sendu/src/bioperl/core/blib/lib/Bio/Root/Root.pm:359 STACK: Bio::SeqFeature::Tools::IDHandler::create_hierarchy_from_ParentIDs /home/sendu/src/bioperl/core/blib/lib/Bio/SeqFeature/Tools/IDHandler.pm:175 STACK: Bio::FeatureHolderI::create_hierarchy_from_ParentIDs /home/sendu/src/bioperl/core/blib/lib/Bio/FeatureHolderI.pm:245 STACK: t/FeatureHolder.x:68 ----------------------------------------------------------- dubious Test returned status 255 (wstat 65280, 0xff00) DIED. FAILED tests 3-6 Failed 4/6 tests, 33.33% okay Failed Test Stat Wstat Total Fail List of Failed ------------------------------------------------------------------------------- t/FeatureHolder.x 255 65280 6 8 3-6 Failed 1/1 test scripts. 4/6 subtests failed. Files=1, Tests=6, 1 wallclock secs ( 0.55 cusr + 0.04 csys = 0.59 CPU) Failed 1/1 test programs. 4/6 subtests failed. It also fails quite differently with 1.5.2. From cjfields at uiuc.edu Mon Feb 19 15:04:20 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 19 Feb 2007 14:04:20 -0600 Subject: [Bioperl-l] t/FeatureHolder.x In-Reply-To: <45D9E17F.4030302@sendu.me.uk> References: <45D9E17F.4030302@sendu.me.uk> Message-ID: <534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu> Looks like that's some of Chris Mungall's stuff for GFF3. Don't know if he's stalking the mail list. 
Wonder if this has anything to do with the feature/annotation changes around rel 1.5. (the other) chris On Feb 19, 2007, at 11:42 AM, Sendu Bala wrote: > Is this supposed to work? It doesn't get run in the test suite > normally > because of its name. > > With a live checkout I get: > ./Build test --test_files t/FeatureHolder.x --verbose > t/FeatureHolder....1..6 ... From cjfields at uiuc.edu Mon Feb 19 16:24:04 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 19 Feb 2007 15:24:04 -0600 Subject: [Bioperl-l] Module to draw transmembrane protein toplogoy In-Reply-To: <45D9DE8C.2010301@cs.ucl.ac.uk> References: <45D9DE8C.2010301@cs.ucl.ac.uk> Message-ID: <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> I think this is pretty nice! We can add the code and test script to bugzilla and (if someone has time) try to see where it might fit in, though Bio::Graphics sounds like a good spot. Anyone else have ideas on where this could go? chris On Feb 19, 2007, at 11:29 AM, Tim Nugent wrote: > Hi everyone, > > I've written a perl module to display transmembrane protein topology > using GD. There are various options, including labels, helix/loop > dimensions, colour schemes etc but it only requires a string or array > containing the protein topology (e.g. transmembrane helix start/stop > points). It produces output like this: > > http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png > http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png > > using the code at the bottom. > > Here is the module: > http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm > > I've never submitted anything to Bioperl before - is this sort of > thing > likely to be of use to others? I imagine it would sit alongside > some of > the Bio::Graphics stuff.
> > Best wishes, > > Tim > > #!/usr/bin/perl > > use strict; > use warnings; > use lib '/scratch0/NOT_BACKED_UP/tnugent/perl5lib'; ## path to module > use DrawTransmembrane; > > my @topology = (20,45,59,70,86,109,145,168,194,220); > > my %labels = ('5' => '5 - Sulphation Site', > '21' => '1st Helix', > '47' => '40 - Mutation', > '60' => 'Voltage Sensor', > '72' => '72 - Mutation 2', > '73' => '73 - Mutation 3', > '138' => '138 - Glycosylation Site', > '170' => '170 - Phosphorylation Site', > '200' => 'Last Helix'); > > my $im = DrawTransmembrane->draw_transmembrane(-title => 'This is a > cartoon displaying transmembrane helices.', > -topology => > \@topology, > -n_terminal => 'out', > -helix_width => 48, > -helix_height => 125, > -short_loop_limit > => 10, > -long_loop_limit => > 35, > -loop_width => 25, > -colour_scheme => > 'yellow', > -labels => \%labels, > -text_offset => -10); > > ## print the .png file > my $output = 'test.png'; > open(OUTPUT, ">$output"); > binmode OUTPUT; > print OUTPUT $im->png; > close OUTPUT; > > my $system = `display $output`; > > -- > Tim Nugent (MRes) > Research Student > Bioinformatics Unit > Department of Computer Science > University College London > Gower Street > London WC1E 6BT > Tel: 020-7679-0410 > t.nugent at ucl.ac.uk Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjm at fruitfly.org Mon Feb 19 17:23:56 2007 From: cjm at fruitfly.org (Chris Mungall) Date: Mon, 19 Feb 2007 14:23:56 -0800 Subject: [Bioperl-l] t/FeatureHolder.x In-Reply-To: <534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu> References: <45D9E17F.4030302@sendu.me.uk> <534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu> Message-ID: On Feb 19, 2007, at 12:04 PM, Chris Fields wrote: > Looks like that's some of Chris Mungall's stuff for GFF3. Don't know > if he's stalking the mail list. occasionally.. > Wonder if this has anything to do with the feature/annotation changes > around rel 1.5. possibly even before then. there was a reason for the .x suffix... I think it was intended to denote requirements; tests that don't pass yet but should in the future anyway, this file can go > (the other) chris > > On Feb 19, 2007, at 11:42 AM, Sendu Bala wrote: > >> Is this supposed to work? It doesn't get run in the test suite >> normally >> because of its name. >> >> With a live checkout I get: >> ./Build test --test_files t/FeatureHolder.x --verbose >> t/FeatureHolder....1..6 > ... > From torsten.seemann at infotech.monash.edu.au Mon Feb 19 18:20:48 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 20 Feb 2007 10:20:48 +1100 Subject: [Bioperl-l] Bioperl Module OddCodes(help) In-Reply-To: <29037001.1171716908969.JavaMail.ngmail@webmail18> References: <29037001.1171716908969.JavaMail.ngmail@webmail18> Message-ID: > i want translate a Sequence in Fasta Format only to acidic,basic and polar dependent on the pH. > OddCodes Module can only to acidic,basic, polar and hydrophobic. And i think on default pH. > Can somebody help me? I don't know whether it is possible?
> Because i need for each amino acid a positive, negative charge and unchargedly. The latest released Bioperl 1.5.x has a charge() function which does what you want: http://doc.bioperl.org/releases/bioperl-1.5.2/Bio/Tools/OddCodes.html It returns A, N, C for the charges. --Torsten From bix at sendu.me.uk Tue Feb 20 06:18:14 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 20 Feb 2007 11:18:14 +0000 Subject: [Bioperl-l] Bio::Graphics::FeatureBase seq_id question Message-ID: <45DAD8F6.1030409@sendu.me.uk> Bio::Graphics::FeatureBase::seq_id is currently implemented as a read-only alias to ref(): sub seq_id { shift->ref() } What is the reasoning behind this? Can it be made to handle setting of the value as well?: sub seq_id { shift->ref(@_) } Cheers, Sendu. From cjfields at uiuc.edu Tue Feb 20 08:39:11 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Feb 2007 07:39:11 -0600 Subject: [Bioperl-l] t/FeatureHolder.x In-Reply-To: References: <45D9E17F.4030302@sendu.me.uk> <534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu> Message-ID: <67E26F10-67D5-405E-A00E-826EF51C476F@uiuc.edu> On Feb 19, 2007, at 4:23 PM, Chris Mungall wrote: > On Feb 19, 2007, at 12:04 PM, Chris Fields wrote: > >> Looks like that's some of Chris Mungall's stuff for GFF3. Don't know >> if he's stalking the mail list. > > occasionally.. > >> Wonder if this has anything to do with the feature/annotation changes >> around rel 1.5. > > possibly even before then. > > there was a reason for the .x suffix... I think it was intended to > denote requirements; tests that don't pass yet but should in the > future > > anyway, this file can go Chris, I removed it from CVS. Thanks! (the other) chris besides chris D. P.S. I may have some Data::Stag questions for you at some point. I'm guessing you're still at fruitfly.org?
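On the OddCodes question earlier in this thread: if a pH-dependent positive/negative/uncharged label per residue is what is needed, one option is to compare the pH with each ionizable side chain's pKa. This is an independent sketch, not the algorithm behind Bio::Tools::OddCodes::charge(). The pKa values are assumed, approximate textbook figures for free amino acids (sources differ), termini and protein environment are ignored, and the charge_string() helper with its '+', '-', '0' codes is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Approximate side-chain pKa values for free amino acids (assumed textbook
# values; published figures differ, and protein context shifts them).
my %acidic_pka = (D => 3.65, E => 4.25, C => 8.18, Y => 10.07);
my %basic_pka  = (H => 6.00, K => 10.53, R => 12.48);

# Hypothetical helper: one symbol per residue at the given pH.
#   '-' acidic side chain mostly deprotonated (pH above its pKa)
#   '+' basic side chain mostly protonated   (pH below its pKa)
#   '0' everything else, treated as uncharged
sub charge_string {
    my ($seq, $ph) = @_;
    my $out = '';
    for my $aa (split //, uc $seq) {
        if (exists $acidic_pka{$aa} && $ph > $acidic_pka{$aa}) {
            $out .= '-';
        }
        elsif (exists $basic_pka{$aa} && $ph < $basic_pka{$aa}) {
            $out .= '+';
        }
        else {
            $out .= '0';
        }
    }
    return $out;
}

print charge_string('MKDEHRCY', 7.0), "\n";   # prints 0+--0+00
print charge_string('MKDEHRCY', 2.0), "\n";   # prints 0+00++00
```

For a FASTA file, the sequence strings could come from Bio::SeqIO (calling seq on each object returned by next_seq). The pKa table is the part to adjust for a particular protein or buffer.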
From cjfields at uiuc.edu Tue Feb 20 08:29:20 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Feb 2007 07:29:20 -0600 Subject: [Bioperl-l] Fwd: help on remote blast References: <20070220073200.M42567@bic.boseinst.ernet.in> Message-ID: <6CC54E14-0581-45AF-8F12-E500A2FFDE86@uiuc.edu> Sanjib, You shouldn't email the developers directly. Questions like this should go to the bioperl mail list in case I (or others) can't answer them immediately. chris Begin forwarded message: > From: "Sanjib Kumar Gupta" > Date: February 20, 2007 1:32:00 AM CST > To: cjfields at uiuc.edu > Subject: help on remote blast > > Dear Dr. Chris > I am a very new user of bioperl and have been using the script for > retrieving some blast sequences. But suddenly it has stopped > retrieving: > #perl n9.pl > te.pep > waiting........ > for a long time. > > I am attaching the file. Can you please tell me what I should do so > that it > runs again? > > > -- > Sanjib Kumar Gupta > Bioinformatics Centre > Bose Institute > Kolkata 700054, INDIA > Phone : +91-33-2355 6626, 2816, 2355 4766 > Fax : +91-33-2355 3886 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: n9.pl URL: -------------- next part -------------- Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From t.nugent at cs.ucl.ac.uk Tue Feb 20 09:31:20 2007 From: t.nugent at cs.ucl.ac.uk (Tim Nugent) Date: Tue, 20 Feb 2007 14:31:20 +0000 Subject: [Bioperl-l] Module to draw transmembrane protein topology In-Reply-To: <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> References: <45D9DE8C.2010301@cs.ucl.ac.uk> <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> Message-ID: <45DB0638.1030001@cs.ucl.ac.uk> Thanks Chris, glad it's appreciated. Is there anything else I can do? If anyone has any requests/suggestions please let me know too. Best wishes, Tim Chris Fields wrote: > I think this is pretty nice!
We can add the code and test script to > bugzilla and (if someone has time) try to see where it might fit in, > though Bio::Graphics sounds like a good spot. > > Anyone else have ideas on where this could go? > > chris > > On Feb 19, 2007, at 11:29 AM, Tim Nugent wrote: > >> Hi everyone, >> >> I've written a perl module to display transmembrane protein topology >> using GD. There are various options, including labels, helix/loop >> dimensions, colour schemes etc but it only requires a string or array >> containing the protein topology (e.g. transmembrane helix start/stop >> points). It produces output like this: >> >> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png >> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png >> >> using the code at the bottom. >> >> Here is a the module: >> http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm >> >> I've never submitted anything to Bioperl before - is this sort of >> thing >> likely to be of use to others? I imagine it would sit alongside >> some of >> the Bio::Graphics stuff. 
>> >> Best wishes, >> >> Tim >> >> #!/usr/bin/perl >> >> use strict; >> use warnings; >> use lib '/scratch0/NOT_BACKED_UP/tnugent/perl5lib'; ## path to module >> use DrawTransmembrane; >> >> my @topology = (20,45,59,70,86,109,145,168,194,220); >> >> my %labels = ('5' => '5 - Sulphation Site', >> '21' => '1st Helix', >> '47' => '40 - Mutation', >> '60' => 'Voltage Sensor', >> '72' => '72 - Mutation 2', >> '73' => '73 - Mutation 3', >> '138' => '138 - Glycosylation Site', >> '170' => '170 - Phosphorylation Site', >> '200' => 'Last Helix'); >> >> my $im = DrawTransmembrane->draw_transmembrane(-title => 'This is a >> cartoon displaying transmembrane helices.', >> -topology => >> \@topology, >> -n_terminal => 'out', >> -helix_width => 48, >> -helix_height => 125, >> -short_loop_limit >> => 10, >> -long_loop_limit => >> 35, >> -loop_width => 25, >> -colour_scheme => >> 'yellow', >> -labels => \%labels, >> -text_offset => -10); >> >> ## print the .png file >> my $output = 'test.png'; >> open(OUTPUT, ">$output"); >> binmode OUTPUT; >> print OUTPUT $im->png; >> close OUTPUT; >> >> my $system = `display $output`; >> >> -- >> Tim Nugent (MRes) >> Research Student >> Bioinformatics Unit >> Department of Computer Science >> University College London >> Gower Street >> London WC1E 6BT >> Tel: 020-7679-0410 >> t.nugent at ucl.ac.uk >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Tim Nugent (MRes) Research Student Bioinformatics Unit Department of Computer Science University College London Gower Street London WC1E 6BT Tel: 020-7679-0410 t.nugent at ucl.ac.uk From marian.thieme at lycos.de Tue Feb 20 08:34:24 2007 From: marian.thieme at lycos.de (marian thieme) Date: Tue, 20 Feb 2007 13:34:24 +0000 Subject: [Bioperl-l] Alignment Message-ID: <188661178021328@lycos-europe.com> Hi all, perhaps somebody can comment on the following matter: I have a series of sequences which should be aligned against a reference sequence. In this special case we don't need to calculate anything; we only need to represent the sequences and get, for instance, some columns of interest. The problem is that some sequences have gaps, and we need to represent gaps in the reference sequence as well as in some individual sequences. Question: Can I use LocatableSeq to describe sequences with gaps and add those sequences to the alignment? If so, how should I understand the example in the docs: use Bio::LocatableSeq; my $seq = new Bio::LocatableSeq(-seq => "CAGT-GGT",-id => "seq1", -start => 1,-end => 7); Does the "-" sign represent a gap? If this sequence starts at position 1, why does it end at position 7, when counting the gap there are 8 positions? Can the SimpleAlign object handle the gap? Thanks for your attention, Marian
From cjfields at uiuc.edu Tue Feb 20 09:40:38 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Feb 2007 08:40:38 -0600 Subject: [Bioperl-l] Module to draw transmembrane protein topology In-Reply-To: <45DB0638.1030001@cs.ucl.ac.uk> References: <45D9DE8C.2010301@cs.ucl.ac.uk> <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> <45DB0638.1030001@cs.ucl.ac.uk> Message-ID: You can add the module and test code (the script) to bugzilla: http://www.bioperl.org/wiki/Bugs http://bugzilla.open-bio.org/ Basically, file a new bug report but note that it is an enhancement request when filling it out. Attach the code and test script to the report after it is generated (note that it may be easier to add all of the files together as a zipped archive). I think you could also add the graphical output as a binary file if they are huge files. chris On Feb 20, 2007, at 8:31 AM, Tim Nugent wrote: > Thanks Chris, glad it's appreciated. > > Is there anything else I can do? If anyone has any requests/ > suggestions please let me know too. > > Best wishes, > > Tim > > Chris Fields wrote: >> I think this is pretty nice! We can add the code and test script >> to bugzilla and (if someone has time) try to see where it might >> fit in, though Bio::Graphics sounds like a good spot. >> Anyone else have ideas on where this could go? >> chris >> On Feb 19, 2007, at 11:29 AM, Tim Nugent wrote: >>> Hi everyone, >>> >>> I've written a perl module to display transmembrane protein topology >>> using GD. There are various options, including labels, helix/loop >>> dimensions, colour schemes etc but it only requires a string or >>> array >>> containing the protein topology (e.g. transmembrane helix start/stop >>> points). It produces output like this: >>> >>> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png >>> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png >>> >>> using the code at the bottom.
>>> >>> Here is a the module: >>> http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm >>> >>> I've never submitted anything to Bioperl before - is this sort >>> of thing >>> likely to be of use to others? I imagine it would sit alongside >>> some of >>> the Bio::Graphics stuff. >>> >>> Best wishes, >>> >>> Tim >>> >>> #!/usr/bin/perl >>> >>> use strict; >>> use warnings; >>> use lib '/scratch0/NOT_BACKED_UP/tnugent/perl5lib'; ## path to >>> module >>> use DrawTransmembrane; >>> >>> my @topology = (20,45,59,70,86,109,145,168,194,220); >>> >>> my %labels = ('5' => '5 - Sulphation Site', >>> '21' => '1st Helix', >>> '47' => '40 - Mutation', >>> '60' => 'Voltage Sensor', >>> '72' => '72 - Mutation 2', >>> '73' => '73 - Mutation 3', >>> '138' => '138 - Glycosylation Site', >>> '170' => '170 - Phosphorylation Site', >>> '200' => 'Last Helix'); >>> >>> my $im = DrawTransmembrane->draw_transmembrane(-title => 'This is a >>> cartoon displaying transmembrane helices.', >>> -topology => >>> \@topology, >>> -n_terminal => >>> 'out', >>> -helix_width => 48, >>> -helix_height => >>> 125, >>> - >>> short_loop_limit => 10, >>> -long_loop_limit >>> => 35, >>> -loop_width => 25, >>> -colour_scheme >>> => 'yellow', >>> -labels => \%labels, >>> -text_offset => >>> -10); >>> >>> ## print the .png file >>> my $output = 'test.png'; >>> open(OUTPUT, ">$output"); >>> binmode OUTPUT; >>> print OUTPUT $im->png; >>> close OUTPUT; >>> >>> my $system = `display $output`; >>> >>> -- >>> Tim Nugent (MRes) >>> Research Student >>> Bioinformatics Unit >>> Department of Computer Science >>> University College London >>> Gower Street >>> London WC1E 6BT >>> Tel: 020-7679-0410 >>> t.nugent at ucl.ac.uk >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. 
Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Tim Nugent (MRes) > Research Student > Bioinformatics Unit > Department of Computer Science > University College London > Gower Street > London WC1E 6BT > Tel: 020-7679-0410 > t.nugent at ucl.ac.uk Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From avilella at gmail.com Tue Feb 20 10:30:17 2007 From: avilella at gmail.com (Albert Vilella) Date: Tue, 20 Feb 2007 15:30:17 +0000 Subject: [Bioperl-l] Alignment In-Reply-To: <188661178021328@lycos-europe.com> References: <188661178021328@lycos-europe.com> Message-ID: <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com> I think the SimpleAlign object contains a set of sequences, each of which is a LocatableSeq object. These LocatableSeq objects will have gaps, represented by '-' or whatever other symbol is specified (I think there are methods for it), and then one can use methods like column_from_residue_number to map the coordinates between the primary sequence and the aligned sequence. The perldoc for LocatableSeq has some examples on how to use these methods. [Hopefully I haven't written any lie in this message], Cheers, Albert. On 2/20/07, marian thieme wrote: > Hi all, > > perhaps somebody can give some comments in the following matter: > > I have a series of sequences which should be aligned against a reference sequence. > In this special case we dont need to calculate anything, we only need to represent the sequences and get for instance some columns of interest. > The problem now is, that some sequences have gaps and we need to represent gaps in the rewference sequence as well as in some individual sequences. 
> > Question: Can I use LocatableSeq to describe sequences with gaps and to add the sequence to the alignment ? > If yes how I have to understand the example in the doc: > use Bio::LocatableSeq; > my $seq = new Bio::LocatableSeq(-seq => "CAGT-GGT",-id => "seq1", -start => 1,-end => 7); > > Does the "-" sign represents a gap ? When this sequence starts at position 1 > why it ends at position 7, because when considering the gap, there are 8 positions. > Does the SimpleAlign object can treat the gap ? > > > Thanks for your attention, > Marian > > Benachrichtigung bei E-Mail Empfang! - http://mail.lycos.de/app/lycosinside/setupLI.exe > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Tue Feb 20 10:30:15 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Feb 2007 09:30:15 -0600 Subject: [Bioperl-l] Module to draw transmembrane protein topology In-Reply-To: <45DB0638.1030001@cs.ucl.ac.uk> References: <45D9DE8C.2010301@cs.ucl.ac.uk> <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> <45DB0638.1030001@cs.ucl.ac.uk> Message-ID: <4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu> Sorry, I sent that last one off prematurely. I could see this being used as a very useful utility if a Bioperl object had SeqFeatures which described transmembrane regions, or if output from something like TMHMM were parsed and used for input. Don't know if it's included, but if not you probably should allow labeling of the intracellular/extracellular space to designate periplasmic space, mitochondrial matrix, thylakoid, etc. I think Bio::Graphics namespace is definitely the place to go. If I ever get around to writing up the RNA structural stuff I may put something there myself. chris On Feb 20, 2007, at 8:31 AM, Tim Nugent wrote: > Thanks Chris, glad it's appreciated. > > Is there anything else I can do? 
If anyone has any requests/ > suggestions > please let me know too. > > Best wishes, > > Tim From cjfields at uiuc.edu Tue Feb 20 10:49:56 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Feb 2007 09:49:56 -0600 Subject: [Bioperl-l] Alignment In-Reply-To: <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com> References: <188661178021328@lycos-europe.com> <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com> Message-ID: <97E36074-1CF4-4348-85AB-DF23F1048727@uiuc.edu> On Feb 20, 2007, at 9:30 AM, Albert Vilella wrote: > I think the SimpleAlign object contains a set of sequences, each of > which is a LocatableSeq object. > > These LocatableSeq objects will have gaps, represented by '-' or > whatever other symbol is specified (I think there are methods for it), > and then one can use methods like column_from_residue_number to map > the coordinates between the primary sequence and the aligned sequence. > The perldoc for LocatableSeq has some examples on how to use these > methods. > > [Hopefully I haven't written any lie in this message], > > Cheers, > > Albert. No lies. The comparison methods are in SimpleAlign; if you look in SimpleAlign.t you'll see several demos of how to go about adding LocatableSeqs to a SimpleAlign object and then using SimpleAlign methods on them. chris PS (to marian): I'm a bit behind this week, so the bracket_strings stuff is lagging behind; I'm writing up some stuff on a deadline.
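The coordinate arithmetic behind Marian's "why 7?" question can be sketched briefly. In Bio::LocatableSeq, start and end refer to residues of the ungapped sequence, so the gap in CAGT-GGT is not counted: 7 residues starting at 1 end at 7. The helpers below are hypothetical and only mimic the kind of mapping LocatableSeq's own column_from_residue_number performs. They assume '-' and '.' are the only gap characters.

```perl
use strict;
use warnings;

# Hypothetical helpers mimicking Bio::LocatableSeq coordinate logic.
# Assumption: '-' and '.' are the only gap characters.

# Residue coordinates skip gaps: end = start + (number of residues) - 1.
sub residue_end {
    my ($gapped, $start) = @_;
    my $residues = ($gapped =~ tr/-.//c);   # counts characters that are not gaps
    return $start + $residues - 1;
}

# 1-based alignment column holding residue number $resnum, or undef if absent.
sub column_from_residue {
    my ($gapped, $start, $resnum) = @_;
    my $current = $start - 1;
    for my $col (1 .. length $gapped) {
        my $char = substr $gapped, $col - 1, 1;
        next if $char eq '-' || $char eq '.';
        return $col if ++$current == $resnum;
    }
    return;
}

my $aligned = 'CAGT-GGT';
print residue_end($aligned, 1), "\n";             # 7
print column_from_residue($aligned, 1, 5), "\n";  # 6: the G right after the gap
```

In practice, add the LocatableSeq objects to a Bio::SimpleAlign as Chris describes and let BioPerl do this mapping; the sketch only makes the counting rule explicit.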
From t.nugent at cs.ucl.ac.uk Tue Feb 20 10:50:10 2007 From: t.nugent at cs.ucl.ac.uk (Tim Nugent) Date: Tue, 20 Feb 2007 15:50:10 +0000 Subject: [Bioperl-l] Module to draw transmembrane protein topology In-Reply-To: <4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu> References: <45D9DE8C.2010301@cs.ucl.ac.uk> <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> <45DB0638.1030001@cs.ucl.ac.uk> <4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu> Message-ID: <45DB18B2.8070004@cs.ucl.ac.uk> Labeling of inside/outside and membrane is already possible via -inside_label, -outside_label and -membrane_label tags, defaults are intracellular, extracellular and plasma membrane. Was definitely going to add an input/parser for MEMSAT, developed here at UCL, and probably a few other popular TM predictors too, e.g. PHOBIUS, TMHMM etc. Can already accept topology in the string format used by OPM (http://opm.phar.umich.edu/). Tim Chris Fields wrote: > Sorry, I sent that last one off prematurely. > > I could see this being used as a very useful utility if a Bioperl object > had SeqFeatures which described transmembrane regions, or if output from > something like TMHMM were parsed and used for input. Don't know if it's > included, but if not you probably should allow labeling of the > intracellular/extracellular space to designate periplasmic space, > mitochondrial matrix, thylakoid, etc. > > I think Bio::Graphics namespace is definitely the place to go. If I > ever get around to writing up the RNA structural stuff I may put > something there myself. > > chris > > On Feb 20, 2007, at 8:31 AM, Tim Nugent wrote: > >> Thanks Chris, glad it's appreciated. >> >> Is there anything else I can do? If anyone has any requests/suggestions >> please let me know too. 
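Tim's module takes helix boundaries as a flat list (start1, end1, start2, end2, ...). Whatever a MEMSAT or TMHMM parser, or a set of seqfeatures, produces would need to be reduced to that shape. The hypothetical helpers below (`helix_pairs`, `loop_lengths`, not part of the module) pair up the `@topology` array from Tim's script and compute the loop lengths between helices. With real Bio::SeqFeatureI objects already filtered down to the helices, the flat list could be built with `map { ($_->start, $_->end) } @helix_features`.

```perl
use strict;
use warnings;

# Hypothetical helper: pair a flat list of helix boundaries
# (start1, end1, start2, end2, ...) into [start, end] array refs.
sub helix_pairs {
    my @points = @_;
    die "odd number of helix boundaries\n" if @points % 2;
    my @helices;
    while (my ($start, $end) = splice @points, 0, 2) {
        push @helices, [ $start, $end ];
    }
    return @helices;
}

# Hypothetical helper: number of residues in each loop between consecutive helices.
sub loop_lengths {
    my @helices = @_;
    return map { $helices[$_][0] - $helices[ $_ - 1 ][1] - 1 } 1 .. $#helices;
}

my @topology = (20, 45, 59, 70, 86, 109, 145, 168, 194, 220);   # from Tim's script
my @helices  = helix_pairs(@topology);
printf "helix %d: %d-%d\n", $_ + 1, @{ $helices[$_] } for 0 .. $#helices;
print 'loop lengths: ', join(',', loop_lengths(@helices)), "\n";   # 13,15,35,25
```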
>> >> Best wishes, >> >> Tim > > -- Tim Nugent (MRes) Research Student Bioinformatics Unit Department of Computer Science University College London Gower Street London WC1E 6BT Tel: 020-7679-0410 t.nugent at ucl.ac.uk From cjfields at uiuc.edu Tue Feb 20 11:09:00 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Feb 2007 10:09:00 -0600 Subject: [Bioperl-l] Module to draw transmembrane protein topology In-Reply-To: <45DB18B2.8070004@cs.ucl.ac.uk> References: <45D9DE8C.2010301@cs.ucl.ac.uk> <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> <45DB0638.1030001@cs.ucl.ac.uk> <4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu> <45DB18B2.8070004@cs.ucl.ac.uk> Message-ID: On Feb 20, 2007, at 9:50 AM, Tim Nugent wrote: > Labeling of inside/outside and membrane is already possible via -inside_label, -outside_label and -membrane_label tags, defaults are > intracellular, extracellular and plasma membrane. > > Was definitely going to add an input/parser for MEMSAT, developed > here at UCL, and probably a few other popular TM predictors too, > e.g. PHOBIUS, TMHMM etc. Can already accept topology in the string > format used by OPM (http://opm.phar.umich.edu/). > > Tim I'll definitely have to take a closer look at it when I have time. My guess is the best fit for the data would be seqfeatures, either in a collection or attached to a Bio::Seq. As for the parsers, you can look at the Bio::Tools::Tmhmm module, which scans Tmhmm output and converts everything to seqfeatures. chris From lstein at cshl.edu Tue Feb 20 12:25:24 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 20 Feb 2007 12:25:24 -0500 Subject: [Bioperl-l] Bio::Graphics::FeatureBase seq_id question In-Reply-To: <45DAD8F6.1030409@sendu.me.uk> References: <45DAD8F6.1030409@sendu.me.uk> Message-ID: <6dce9a0b0702200925g74d2db53j3252cca8a41765b@mail.gmail.com> Just an oversight. I'll fix it.
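The change Sendu proposed, and Lincoln agrees to make, comes down to forwarding @_ instead of dropping it. The sketch below uses a hypothetical `FeatureSketch` class, not Bio::Graphics::FeatureBase itself, to show both aliases side by side.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a feature class. It is NOT Bio::Graphics::FeatureBase;
# it only reproduces the accessor pattern under discussion.
package FeatureSketch;

sub new {
    my ($class, %args) = @_;
    return bless { _ref => $args{-seq_id} }, $class;
}

# Combined getter/setter: stores a new value when one is passed.
sub ref {
    my $self = shift;
    $self->{_ref} = shift if @_;
    return $self->{_ref};
}

# Current behaviour: read-only alias; arguments are silently dropped.
sub seq_id_readonly { shift->ref() }

# Proposed behaviour: forward @_ so seq_id() can set as well as get.
sub seq_id { shift->ref(@_) }

package main;

my $feature = FeatureSketch->new(-seq_id => 'chr1');
$feature->seq_id_readonly('chr2');     # ignored
print $feature->seq_id, "\n";          # chr1
$feature->seq_id('chr3');              # stored via ref()
print $feature->ref, "\n";             # chr3
```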
Lincoln On 2/20/07, Sendu Bala wrote: > > Bio::Graphics::FeatureBase::seq_id is currently implemented as a > read-only alias to ref(): > sub seq_id { shift->ref() } > > > What is the reasoning behind this? Can it be made to handle setting of > the value as well?: > sub seq_id { shift->ref(@_) } > > > Cheers, > Sendu. > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From khan at cshl.edu Tue Feb 20 15:42:12 2007 From: khan at cshl.edu (Khan, Sohail) Date: Tue, 20 Feb 2007 15:42:12 -0500 Subject: [Bioperl-l] parsing a list of ids to a fasta file. Message-ID: Dear list, I am new to Bio-Perl. I have the following question: I have a list of ids, which I would like to parse against a large fasta file to retrieve the Seq for parsed ids. I appreciate any suggestions. Thanks. Khan From michael.watson at bbsrc.ac.uk Tue Feb 20 16:33:19 2007 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Tue, 20 Feb 2007 21:33:19 -0000 Subject: [Bioperl-l] parsing a list of ids to a fasta file. References: Message-ID: <8975119BCD0AC5419D61A9CF1A923E95020680FD@iahce2ksrv1.iah.bbsrc.ac.uk> Suggest you use Bio::Index::Fasta to create an index for the fasta file and then a simple script to retrieve sequences using that index. Or just use the pre-written bp_index.PLS and bp_fetch.PLS scripts. http://www.bioperl.org/wiki/Module:Bio::Index::Fasta ________________________________ From: bioperl-l-bounces at lists.open-bio.org on behalf of Khan, Sohail Sent: Tue 20/02/2007 8:42 PM To: Bioperl-l at lists.open-bio.org Subject: [Bioperl-l] parsing a list of ids to a fasta file. Dear list, I am new to Bio-Perl. I have the following question: I have a list of ids, which I would like to parse against a large fasta file to retrieve the Seq for parsed ids. I appreciate any suggestions. Thanks. 
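Michael's Bio::Index::Fasta suggestion pays off when you fetch from the same file repeatedly. For a one-off extraction, a different approach also works: a single streaming pass with a hash of wanted IDs. This is a minimal sketch. It assumes one ID per line in the list file and that the ID is the first word after '>' in each header. NCBI-style `gi|...|` headers would need a different regex. `filter_fasta` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: copy to $out_fh only the FASTA records whose ID appears
# in $ids_fh (one ID per line). The ID is assumed to be the first word after '>'.
sub filter_fasta {
    my ($ids_fh, $fasta_fh, $out_fh) = @_;
    my %wanted;
    while (my $id = <$ids_fh>) {
        $id =~ s/^\s+|\s+$//g;          # trim whitespace and the newline
        $wanted{$id} = 1 if length $id;
    }
    my $keep = 0;
    while (my $line = <$fasta_fh>) {
        $keep = exists $wanted{$1} if $line =~ /^>(\S+)/;   # new record header
        print {$out_fh} $line if $keep;
    }
    return;
}

# Usage: perl filter_fasta.pl ids.txt big.fasta > subset.fasta
if (@ARGV == 2) {
    open my $ids,   '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    open my $fasta, '<', $ARGV[1] or die "Cannot open $ARGV[1]: $!";
    filter_fasta($ids, $fasta, \*STDOUT);
}
```

Memory use grows with the size of the ID list, not the FASTA file. If you will pull different subsets from the same file many times, the index approach avoids rescanning it.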
Khan _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From neetisomaiya at gmail.com Wed Feb 21 03:19:14 2007 From: neetisomaiya at gmail.com (neeti somaiya) Date: Wed, 21 Feb 2007 13:49:14 +0530 Subject: [Bioperl-l] need help in Bio-SCF Message-ID: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com> Hi All, I downloaded module Bio-SCF-1.01from CPAN. And I am trying to install it when I got the following error. Can someone please guide me. [root at ps2288 Bio-SCF-1.01]# perl Makefile.PL Checking if your kit is complete... Looks good Note (probably harmless): No library found for -lread Writing Makefile for Bio::SCF [root at ps2288 Bio-SCF-1.01]# make cp SCF.pm blib/lib/Bio/SCF.pm cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm /usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp -typemap /usr/lib/perl5/5.8.5/ExtUtils/typemap SCF.xs > SCF.xsc && mv SCF.xsc SCF.c Please specify prototyping behavior for SCF.xs (see perlxs manual) gcc -c -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386 -mtune=pentium4 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" -fPIC "-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE" -DLITTLE_ENDIAN SCF.c SCF.xs:12:24: io_lib/scf.h: No such file or directory SCF.xs:13:26: io_lib/mFILE.h: No such file or directory SCF.xs: In function `XS_Bio__SCF_get_scf_pointer': SCF.xs:27: error: `Scf' undeclared (first use in this function) SCF.xs:27: error: (Each undeclared identifier is reported only once SCF.xs:27: error: for each function it appears in.) 
SCF.xs:27: error: `scf_data' undeclared (first use in this function) SCF.xs: In function `XS_Bio__SCF_get_scf_fpointer': SCF.xs:66: error: `Scf' undeclared (first use in this function) SCF.xs:66: error: `scf_data' undeclared (first use in this function) SCF.xs:68: error: `mFILE' undeclared (first use in this function) SCF.xs:68: error: `mf' undeclared (first use in this function) SCF.xs: In function `XS_Bio__SCF_scf_free': SCF.xs:89: error: `Scf' undeclared (first use in this function) SCF.xs:89: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_get_comments': SCF.xs:95: error: `Scf' undeclared (first use in this function) SCF.xs:95: error: `scf_data' undeclared (first use in this function) SCF.xs:95: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_set_comments': SCF.xs:108: error: `Scf' undeclared (first use in this function) SCF.xs:108: error: `scf_data' undeclared (first use in this function) SCF.xs:108: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_scf_write': SCF.xs:121: error: `Scf' undeclared (first use in this function) SCF.xs:121: error: `scf_data' undeclared (first use in this function) SCF.xs:121: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_scf_fwrite': SCF.xs:135: error: `mFILE' undeclared (first use in this function) SCF.xs:135: error: `mf' undeclared (first use in this function) SCF.xs:137: error: `Scf' undeclared (first use in this function) SCF.xs:137: error: `scf_data' undeclared (first use in this function) SCF.xs:137: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_get_from_header': SCF.xs:159: error: `Scf' undeclared (first use in this function) SCF.xs:159: error: `scf_data' undeclared (first use in this function) SCF.xs:159: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_get_at': SCF.xs:186: error: `Scf' undeclared (first use in this function) SCF.xs:186: error: `scf_data' undeclared (first use in this 
function) SCF.xs:186: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_set_base_at': SCF.xs:242: error: `Scf' undeclared (first use in this function) SCF.xs:242: error: `scf_data' undeclared (first use in this function) SCF.xs:242: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_set_at': SCF.xs:255: error: `Scf' undeclared (first use in this function) SCF.xs:255: error: `scf_data' undeclared (first use in this function) SCF.xs:255: error: syntax error before ')' token make: *** [SCF.o] Error 1 -- -Neeti Even my blood says, B positive
From sdavis2 at mail.nih.gov Wed Feb 21 06:17:50 2007 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed, 21 Feb 2007 06:17:50 -0500 Subject: [Bioperl-l] need help in Bio-SCF In-Reply-To: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com> References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com> Message-ID: <200702210617.50616.sdavis2@mail.nih.gov> On Wednesday 21 February 2007 03:19, neeti somaiya wrote: > Hi All, > > I downloaded module > Bio-SCF-1.01from CPAN. > And I am trying to install it when I got the following error. Can someone > please guide me. You will probably need to read the INSTALL document. You need to install a couple of libraries first. Looks like you don't have the staden io-lib installed. > [root at ps2288 Bio-SCF-1.01]# perl Makefile.PL > Checking if your kit is complete...
> Looks good > Note (probably harmless): No library found for -lread > Writing Makefile for Bio::SCF > > [root at ps2288 Bio-SCF-1.01]# make > cp SCF.pm blib/lib/Bio/SCF.pm > cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm > /usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp -typemap > /usr/lib/perl5/5.8.5/ExtUtils/typemap SCF.xs > SCF.xsc && mv SCF.xsc SCF.c > Please specify prototyping behavior for SCF.xs (see perlxs manual) > gcc -c -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING > -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE > -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386 > -mtune=pentium4 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" -fPIC > "-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE" -DLITTLE_ENDIAN > SCF.c SCF.xs:12:24: io_lib/scf.h: No such file or directory > SCF.xs:13:26: io_lib/mFILE.h: No such file or directory > SCF.xs: In function `XS_Bio__SCF_get_scf_pointer': > SCF.xs:27: error: `Scf' undeclared (first use in this function) > SCF.xs:27: error: (Each undeclared identifier is reported only once > SCF.xs:27: error: for each function it appears in.) 
> SCF.xs:27: error: `scf_data' undeclared (first use in this function) > SCF.xs: In function `XS_Bio__SCF_get_scf_fpointer': > SCF.xs:66: error: `Scf' undeclared (first use in this function) > SCF.xs:66: error: `scf_data' undeclared (first use in this function) > SCF.xs:68: error: `mFILE' undeclared (first use in this function) > SCF.xs:68: error: `mf' undeclared (first use in this function) > SCF.xs: In function `XS_Bio__SCF_scf_free': > SCF.xs:89: error: `Scf' undeclared (first use in this function) > SCF.xs:89: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_get_comments': > SCF.xs:95: error: `Scf' undeclared (first use in this function) > SCF.xs:95: error: `scf_data' undeclared (first use in this function) > SCF.xs:95: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_set_comments': > SCF.xs:108: error: `Scf' undeclared (first use in this function) > SCF.xs:108: error: `scf_data' undeclared (first use in this function) > SCF.xs:108: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_scf_write': > SCF.xs:121: error: `Scf' undeclared (first use in this function) > SCF.xs:121: error: `scf_data' undeclared (first use in this function) > SCF.xs:121: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_scf_fwrite': > SCF.xs:135: error: `mFILE' undeclared (first use in this function) > SCF.xs:135: error: `mf' undeclared (first use in this function) > SCF.xs:137: error: `Scf' undeclared (first use in this function) > SCF.xs:137: error: `scf_data' undeclared (first use in this function) > SCF.xs:137: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_get_from_header': > SCF.xs:159: error: `Scf' undeclared (first use in this function) > SCF.xs:159: error: `scf_data' undeclared (first use in this function) > SCF.xs:159: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_get_at': > SCF.xs:186: error: `Scf' undeclared (first use in this function) > 
SCF.xs:186: error: `scf_data' undeclared (first use in this function) > SCF.xs:186: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_set_base_at': > SCF.xs:242: error: `Scf' undeclared (first use in this function) > SCF.xs:242: error: `scf_data' undeclared (first use in this function) > SCF.xs:242: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_set_at': > SCF.xs:255: error: `Scf' undeclared (first use in this function) > SCF.xs:255: error: `scf_data' undeclared (first use in this function) > SCF.xs:255: error: syntax error before ')' token > make: *** [SCF.o] Error 1 From sdavis2 at mail.nih.gov Wed Feb 21 06:17:50 2007 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed, 21 Feb 2007 06:17:50 -0500 Subject: [Bioperl-l] need help in Bio-SCF In-Reply-To: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com> References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com> Message-ID: <200702210617.50616.sdavis2@mail.nih.gov> On Wednesday 21 February 2007 03:19, neeti somaiya wrote: > Hi All, > > I downloaded module > Bio-SCF-1.01from CPAN. > And I am trying to install it when I got the following error. Can someone > please guide me. You will probably need to read the INSTALL document. You need to install a couple of libraries first. Looks like you don't have the staden io-lib installed. > [root at ps2288 Bio-SCF-1.01]# perl Makefile.PL > Checking if your kit is complete... 
> Looks good > Note (probably harmless): No library found for -lread > Writing Makefile for Bio::SCF > > [root at ps2288 Bio-SCF-1.01]# make > cp SCF.pm blib/lib/Bio/SCF.pm > cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm > /usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp -typemap > /usr/lib/perl5/5.8.5/ExtUtils/typemap SCF.xs > SCF.xsc && mv SCF.xsc SCF.c > Please specify prototyping behavior for SCF.xs (see perlxs manual) > gcc -c -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING > -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE > -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386 > -mtune=pentium4 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" -fPIC > "-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE" -DLITTLE_ENDIAN > SCF.c SCF.xs:12:24: io_lib/scf.h: No such file or directory > SCF.xs:13:26: io_lib/mFILE.h: No such file or directory > SCF.xs: In function `XS_Bio__SCF_get_scf_pointer': > SCF.xs:27: error: `Scf' undeclared (first use in this function) > SCF.xs:27: error: (Each undeclared identifier is reported only once > SCF.xs:27: error: for each function it appears in.) 
> SCF.xs:27: error: `scf_data' undeclared (first use in this function) > SCF.xs: In function `XS_Bio__SCF_get_scf_fpointer': > SCF.xs:66: error: `Scf' undeclared (first use in this function) > SCF.xs:66: error: `scf_data' undeclared (first use in this function) > SCF.xs:68: error: `mFILE' undeclared (first use in this function) > SCF.xs:68: error: `mf' undeclared (first use in this function) > SCF.xs: In function `XS_Bio__SCF_scf_free': > SCF.xs:89: error: `Scf' undeclared (first use in this function) > SCF.xs:89: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_get_comments': > SCF.xs:95: error: `Scf' undeclared (first use in this function) > SCF.xs:95: error: `scf_data' undeclared (first use in this function) > SCF.xs:95: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_set_comments': > SCF.xs:108: error: `Scf' undeclared (first use in this function) > SCF.xs:108: error: `scf_data' undeclared (first use in this function) > SCF.xs:108: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_scf_write': > SCF.xs:121: error: `Scf' undeclared (first use in this function) > SCF.xs:121: error: `scf_data' undeclared (first use in this function) > SCF.xs:121: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_scf_fwrite': > SCF.xs:135: error: `mFILE' undeclared (first use in this function) > SCF.xs:135: error: `mf' undeclared (first use in this function) > SCF.xs:137: error: `Scf' undeclared (first use in this function) > SCF.xs:137: error: `scf_data' undeclared (first use in this function) > SCF.xs:137: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_get_from_header': > SCF.xs:159: error: `Scf' undeclared (first use in this function) > SCF.xs:159: error: `scf_data' undeclared (first use in this function) > SCF.xs:159: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_get_at': > SCF.xs:186: error: `Scf' undeclared (first use in this function) > 
SCF.xs:186: error: `scf_data' undeclared (first use in this function) > SCF.xs:186: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_set_base_at': > SCF.xs:242: error: `Scf' undeclared (first use in this function) > SCF.xs:242: error: `scf_data' undeclared (first use in this function) > SCF.xs:242: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_set_at': > SCF.xs:255: error: `Scf' undeclared (first use in this function) > SCF.xs:255: error: `scf_data' undeclared (first use in this function) > SCF.xs:255: error: syntax error before ')' token > make: *** [SCF.o] Error 1 From cjfields at uiuc.edu Wed Feb 21 07:08:57 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 21 Feb 2007 06:08:57 -0600 Subject: [Bioperl-l] need help in Bio-SCF In-Reply-To: <200702210617.50616.sdavis2@mail.nih.gov> References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com> <200702210617.50616.sdavis2@mail.nih.gov> Message-ID: <40C288FE-C74C-4B3F-A835-1A5C563B2B8E@uiuc.edu> On Feb 21, 2007, at 5:17 AM, Sean Davis wrote: > On Wednesday 21 February 2007 03:19, neeti somaiya wrote: >> Hi All, >> >> I downloaded module >> Bio-SCF-1.01from CPAN. >> And I am trying to install it when I got the following error. Can >> someone >> please guide me. > > You will probably need to read the INSTALL document. You need to > install a > couple of libraries first. Looks like you don't have the staden io- > lib > installed. Just to note, this module isn't part of BioPerl (I don't even think it has a Bioperl interface). You'll probably need to contact Lincoln for details on using this module. One thing you may run into is errors with the version of io_lib installed (a problem I've encountered with bioperl-ext), probably from API changes. If you run into problems with newer versions of io_lib you should try downgrading to io_lib 1.8.11 or 1.8.12. 
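Every compiler error in the log above cascades from the first two: SCF.xs could not find io_lib/scf.h and io_lib/mFILE.h, so the Scf and mFILE types are undeclared, and the Makefile.PL note "No library found for -lread" points the same way. Below is a minimal core-Perl sketch of a pre-flight check one could run before `perl Makefile.PL`. The include directories are assumptions (adjust them to wherever io_lib was installed), and the check only confirms that the headers exist, not that the installed io_lib version is API-compatible.

```perl
use strict;
use warnings;
use File::Spec;

# Headers included by Bio::SCF's SCF.xs (taken from the error log above).
my @required_headers = ('io_lib/scf.h', 'io_lib/mFILE.h');

# Return the headers from @required_headers that are not found under any
# of the given include directories.
sub missing_headers {
    my @include_dirs = @_;
    my @missing;
    for my $header (@required_headers) {
        my $found = 0;
        for my $dir (@include_dirs) {
            my $path = File::Spec->catfile($dir, split m{/}, $header);
            if (-f $path) { $found = 1; last }
        }
        push @missing, $header unless $found;
    }
    return @missing;
}

# Assumed default locations; add your own io_lib prefix if it differs.
my @dirs = ('/usr/include', '/usr/local/include');
if (my @missing = missing_headers(@dirs)) {
    print "Missing: @missing (install the Staden io_lib first)\n";
}
else {
    print "io_lib headers found\n";
}
```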
From neetisomaiya at gmail.com Wed Feb 21 07:25:26 2007 From: neetisomaiya at gmail.com (neeti somaiya) Date: Wed, 21 Feb 2007 17:55:26 +0530 Subject: [Bioperl-l] need help in Bio-SCF In-Reply-To: <200702210617.50616.sdavis2@mail.nih.gov> References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com> <200702210617.50616.sdavis2@mail.nih.gov> Message-ID: <764978cf0702210425j544330bbr12c86a89960dbb66@mail.gmail.com> Thanks. It resolved my problem. On 2/21/07, Sean Davis wrote: > > On Wednesday 21 February 2007 03:19, neeti somaiya wrote: > > Hi All, > > > > I downloaded module > > Bio-SCF-1.01from CPAN. > > And I am trying to install it when I got the following error. Can > someone > > please guide me. > > You will probably need to read the INSTALL document. You need to install > a > couple of libraries first. Looks like you don't have the staden io-lib > installed. > > > > [root at ps2288 Bio-SCF-1.01]# perl Makefile.PL > > Checking if your kit is complete... > > Looks good > > Note (probably harmless): No library found for -lread > > Writing Makefile for Bio::SCF > > > > [root at ps2288 Bio-SCF-1.01]# make > > cp SCF.pm blib/lib/Bio/SCF.pm > > cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm > > /usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp -typemap > > /usr/lib/perl5/5.8.5/ExtUtils/typemap SCF.xs > SCF.xsc && mv SCF.xsc > SCF.c > > Please specify prototyping behavior for SCF.xs (see perlxs manual) > > gcc -c -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING > > -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE > > -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386 > > -mtune=pentium4 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" -fPIC > > "-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE" -DLITTLE_ENDIAN > > SCF.c SCF.xs:12:24: io_lib/scf.h: No such file or directory > > SCF.xs:13:26: io_lib/mFILE.h: No such file or directory > > SCF.xs: In function `XS_Bio__SCF_get_scf_pointer': > > SCF.xs:27: error: `Scf' 
undeclared (first use in this function) > > SCF.xs:27: error: (Each undeclared identifier is reported only once > > SCF.xs:27: error: for each function it appears in.) > > SCF.xs:27: error: `scf_data' undeclared (first use in this function) > > SCF.xs: In function `XS_Bio__SCF_get_scf_fpointer': > > SCF.xs:66: error: `Scf' undeclared (first use in this function) > > SCF.xs:66: error: `scf_data' undeclared (first use in this function) > > SCF.xs:68: error: `mFILE' undeclared (first use in this function) > > SCF.xs:68: error: `mf' undeclared (first use in this function) > > SCF.xs: In function `XS_Bio__SCF_scf_free': > > SCF.xs:89: error: `Scf' undeclared (first use in this function) > > SCF.xs:89: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_get_comments': > > SCF.xs:95: error: `Scf' undeclared (first use in this function) > > SCF.xs:95: error: `scf_data' undeclared (first use in this function) > > SCF.xs:95: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_set_comments': > > SCF.xs:108: error: `Scf' undeclared (first use in this function) > > SCF.xs:108: error: `scf_data' undeclared (first use in this function) > > SCF.xs:108: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_scf_write': > > SCF.xs:121: error: `Scf' undeclared (first use in this function) > > SCF.xs:121: error: `scf_data' undeclared (first use in this function) > > SCF.xs:121: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_scf_fwrite': > > SCF.xs:135: error: `mFILE' undeclared (first use in this function) > > SCF.xs:135: error: `mf' undeclared (first use in this function) > > SCF.xs:137: error: `Scf' undeclared (first use in this function) > > SCF.xs:137: error: `scf_data' undeclared (first use in this function) > > SCF.xs:137: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_get_from_header': > > SCF.xs:159: error: `Scf' undeclared (first use in this function) > > 
SCF.xs:159: error: `scf_data' undeclared (first use in this function) > > SCF.xs:159: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_get_at': > > SCF.xs:186: error: `Scf' undeclared (first use in this function) > > SCF.xs:186: error: `scf_data' undeclared (first use in this function) > > SCF.xs:186: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_set_base_at': > > SCF.xs:242: error: `Scf' undeclared (first use in this function) > > SCF.xs:242: error: `scf_data' undeclared (first use in this function) > > SCF.xs:242: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_set_at': > > SCF.xs:255: error: `Scf' undeclared (first use in this function) > > SCF.xs:255: error: `scf_data' undeclared (first use in this function) > > SCF.xs:255: error: syntax error before ')' token > > make: *** [SCF.o] Error 1 > -- -Neeti Even my blood says, B positive
From jay at jays.net Tue Feb 20 19:27:01 2007 From: jay at jays.net (Jay Hannah) Date: Tue, 20 Feb 2007 18:27:01 -0600 Subject: [Bioperl-l] Alignment In-Reply-To: <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com> References: <188661178021328@lycos-europe.com> <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com> Message-ID: > On 2/20/07, marian thieme wrote: >> I have a series of sequences which should be aligned against a >> reference sequence. >> In this special case we dont need to calculate anything, we only need >> to represent the sequences and get for instance some columns of >> interest. >> The problem now is, that some sequences have gaps and we need to >> represent gaps in the rewference sequence as well as in some >> individual sequences. On Feb 20, 2007, at 9:30 AM, Albert Vilella wrote: > I think the SimpleAlign object contains a set of sequences, each of > which is a LocatableSeq object. Fascinating. In my BLAST-centric universe I went and rolled my own solution for SeqLab where I hold onto the Bio::Seq from the reference sequences and then hold onto the Bio::Search::HSP::GenericHSP objects for all my BLAST hits.
From that dataset I can write whatever reports I want and/or perform any subsequent actions. I wonder if I should have done that differently... What typically creates .pfam files? j seqlab.net http://www.bioperl.org/wiki/User:Jhannah From cjfields at uiuc.edu Wed Feb 21 08:36:02 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 21 Feb 2007 07:36:02 -0600 Subject: [Bioperl-l] Alignment In-Reply-To: References: <188661178021328@lycos-europe.com> <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com> Message-ID: <2233F0EE-94FE-42F0-B8E5-1BE14A25C0D4@uiuc.edu> On Feb 20, 2007, at 6:27 PM, Jay Hannah wrote: ... > > On Feb 20, 2007, at 9:30 AM, Albert Vilella wrote: >> I think the SimpleAlign object contains a set of sequences, each of >> which is a LocatableSeq object. > > Fascinating. In my BLAST-centric universe I went and rolled my own > solution for SeqLab where I hold onto the Bio::Seq from the reference > sequences and then hold onto the Bio::Search::HSP::GenericHSP objects > for all my BLAST hits. From that dataset I can write whatever > reports I > want and/or perform any subsequent actions. I wonder if I should have > done that differently... > > What typically creates .pfam files? > > j > seqlab.net > http://www.bioperl.org/wiki/User:Jhannah Pfam alignments come in two formats (pfam and stockholm) that can both be parsed into SimpleAlign objects via Bio::AlignIO:

use Bio::AlignIO;

my $alnin = Bio::AlignIO->new(-format => 'stockholm',
                              -file   => 'dho.sto');
while (my $aln = $alnin->next_aln) {
    # do stuff to $aln (a Bio::SimpleAlign object)
}

Personally I stick with Stockholm as it's a richer format (with annotations and so on), but the parser was rewritten recently (by moi!) so may have some bugs still. I'm a bit confused as to what you do with BLAST files.
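For Marian's original question quoted above (sequences aligned against a reference, gaps in the reference as well as in individual sequences, and only some columns of interest), the core bookkeeping can be sketched in plain Perl. This is a hypothetical helper on toy data, not the SimpleAlign API; it assumes gaps are written as '-' and that all aligned strings have the same length. Bio::SimpleAlign offers comparable helpers (for example column_from_residue_number), so check its documentation for exact arguments before rolling your own.

```perl
use strict;
use warnings;

# Map a 1-based residue position in the ungapped reference sequence to the
# 1-based alignment column that holds it, skipping '-' gap characters.
sub ref_pos_to_column {
    my ($aligned_ref, $ref_pos) = @_;
    my $residues = 0;
    for my $col (1 .. length $aligned_ref) {
        next if substr($aligned_ref, $col - 1, 1) eq '-';
        return $col if ++$residues == $ref_pos;
    }
    return;    # position lies past the end of the reference
}

# Return, for each sequence, the characters found in the given columns.
sub extract_columns {
    my ($aligned, @cols) = @_;    # $aligned: hashref of name => aligned string
    my %picked;
    for my $name (keys %$aligned) {
        $picked{$name} = join '', map { substr($aligned->{$name}, $_ - 1, 1) } @cols;
    }
    return \%picked;
}

# Toy alignment (hypothetical data): the reference has a gap in column 3.
my %aln = (
    reference => 'AC-GT',
    seq1      => 'ACTGT',
    seq2      => 'A--GT',
);

# Reference residues 2 and 3 (C and G) sit in alignment columns 2 and 4.
my @cols   = map { ref_pos_to_column($aln{reference}, $_) } (2, 3);
my $picked = extract_columns(\%aln, @cols);
print "$_: $picked->{$_}\n" for sort keys %$picked;
```

Working in reference coordinates and translating to columns keeps the "columns of interest" stable even when other sequences introduce gaps into the reference row.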
You can generate a SimpleAlign right from the HSP for most SearchIO parsers: http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods chris From sanjib at bic.boseinst.ernet.in Wed Feb 21 01:12:06 2007 From: sanjib at bic.boseinst.ernet.in (Sanjib Kumar Gupta) Date: Wed, 21 Feb 2007 11:42:06 +0530 Subject: [Bioperl-l] help on remote blast In-Reply-To: <20070220073200.M42567@bic.boseinst.ernet.in> References: <20070220073200.M42567@bic.boseinst.ernet.in> Message-ID: <20070221061206.M37845@bic.boseinst.ernet.in> Hi, I have been running this script for some time and it was working fine. I am using this Linux machine with a live IP (no proxy), but suddenly it has stopped working with these errors: waiting...waiting... -------------------- WARNING --------------------- MSG: An Error Occurred
500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad hostname 'www.ncbi.nlm.nih.gov') --------------------------------------------------- xx.pep -------------------- WARNING --------------------- MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5 Content-Length: 497 Content-Type: application/x-www-form-urlencoded DATABASE=nr&QUERY=%3ENM_005277.3%3B_1414961_1+837% 0AMEENMEEGQTQKGCFECCIKCLGGIPYASLIATILLYAGVALFCGCGHEALSGTVNILQTYFEMARTAGDTLDVF TMIDIFKYVIYGIAAAFFVYGILLMVEGFFTTGAIKDLYGDFKITTCGRCVSAWFIMLTYLFMLAWLGVTAFTSLPV YMYFNLWTICRNTTLVEGANLCLDLRQFGIVTIGEEKKICTVSENFLRMCESTELNMTFHLFIVALAGAGAAVIAMV HYLMVLSANWAYVKDACRMQKYEDIKSKEEQELHDIHSTRSKERLNAYT*&COMPOSITION_BASED_STATISTI CS=off&EXPECT=1e- 10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&MATRIX_NAME=BLOSUM62&ENTREZ_ QUERY=Xenopus+laevis[Organism]&PROGRAM=blastp An Error Occurred
500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad hostname 'www.ncbi.nlm.nih.gov') --------------------------------------------------- waiting...waiting... -------------------- WARNING --------------------- MSG: An Error Occurred
500 Internal Server Error --------------------------------------------------- Though I am able to see the NCBI page from a browser, I am unable to ping or traceroute the server. Please help me. -- Sanjib Kumar Gupta Bioinformatics Centre Bose Institute Kolkata 700054, INDIA Phone : +91-33-2355 6626, 2816, 2355 4766 Fax : +91-33-2355 3886 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: n9.pl URL: From granjeau at tagc.univ-mrs.fr Wed Feb 21 08:50:39 2007 From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137) Date: Wed, 21 Feb 2007 14:50:39 +0100 Subject: [Bioperl-l] Adding empty member list in Bio::Cluster::SequenceFamily Message-ID: <45DC4E2F.4060804@tagc.univ-mrs.fr> Hello! This was not clear to me, but I found a workaround by checking for an empty list before adding. Here is what I noticed. Adding an empty list () as members is not the same as adding a reference to an empty list [], of course, but one might expect the two to behave the same. Calling get_members in the second case, I got a list of 0 members, but in the first case I got 1 member, which is not an object at all. I am warned now, but maybe the documentation should emphasize passing an array reference. Best regards, --Samuel

use Bio::Cluster::SequenceFamily;

my $f = Bio::Cluster::SequenceFamily->new( -id => 'aa' );
$f->add_members( () );
print scalar $f->get_members(), "\n";   # 1

my $g = Bio::Cluster::SequenceFamily->new( -id => 'aa' );
$g->add_members( [] );
print scalar $g->get_members(), "\n";   # 0

From stephen.marshall at novartis.com Wed Feb 21 12:01:00 2007 From: stephen.marshall at novartis.com (stephen.marshall at novartis.com) Date: Wed, 21 Feb 2007 12:01:00 -0500 Subject: [Bioperl-l] Parsing kegg files Message-ID: Hello, I'm trying to parse a KEGG file and I can't seem to get at the pathway information... Here's a snippet of my code.
I only see dblink and description as annotation.

use Bio::SeqIO;

my $filename = shift @ARGV;   # path to the KEGG flat file

my $stream = Bio::SeqIO->new(-file => $filename, -format => 'KEGG');

while ( my $seq = $stream->next_seq() ) {
    # do something with $seq
    my $id = $seq->display_id();
    print "$id:";
    my $ann = $seq->annotation();
    foreach my $key ( $ann->get_all_annotation_keys() ) {
        my @values = $ann->get_Annotations($key);
        foreach my $value ( @values ) {
            print "Annotation: ", $key, " value: ", $value->as_text, "\n";
        }
    }
}

_________________________ CONFIDENTIALITY NOTICE The information contained in this e-mail message is intended only for the exclusive use of the individual or entity named above and may contain information that is privileged, confidential or exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivery of the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by e-mail and delete the material from any computer. Thank you. From prateek.vit at gmail.com Wed Feb 21 12:40:25 2007 From: prateek.vit at gmail.com (prateek singh yadav) Date: Wed, 21 Feb 2007 23:10:25 +0530 Subject: [Bioperl-l] Problem in BioPerl Installation Message-ID: Hello all, I was trying to install BioPerl on my Red Hat Linux (EL) system using CPAN, but CPAN shows this problem: [root at HX342SBC054 Desktop]# cpan Terminal does not support AddHistory.
cpan shell -- CPAN exploration and modules installation (v1.7601) ReadLine support available (try 'install Bundle::CPAN') cpan> get bioperl CPAN: Storable loaded ok Going to read /root/.cpan/Metadata Warning: Found only 25 objects in /root/.cpan/Metadata Going to read /root/.cpan/sources/authors/01mailrc.txt.gz Going to read /root/.cpan/sources/modules/02packages.details.txt.gz Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not contain a Line-Count header. Please check the validity of the index file by comparing it to more than one CPAN mirror. I'll continue but problems seem likely to happen. Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not contain a Last-Updated header. Please check the validity of the index file by comparing it to more than one CPAN mirror. I'll continue but problems seem likely to happen. Going to read /root/.cpan/sources/modules/03modlist.data.gz Can't locate object method "data" via package "CPAN::Modulelist" (perhaps you forgot to load "CPAN::Modulelist"?) at (eval 13) line 1. at /usr/lib/perl5/5.8.5/CPAN.pm line 3406 CPAN::Index::rd_modlist('CPAN::Index', '/root/.cpan/sources/modules/03modlist.data.gz') called at /usr/lib/perl5/5.8.5/CPAN.pm line 3129 CPAN::Index::reload('CPAN::Index') called at /usr/lib/perl5/5.8.5/CPAN.pm line 675 CPAN::exists('CPAN=HASH(0x8548f20)', 'CPAN::Module', 'bioperl') called at /usr/lib/perl5/5.8.5/CPAN.pm line 1842 CPAN::Shell::expandany('CPAN::Shell', 'bioperl') called at /usr/lib/perl5/5.8.5/CPAN.pm line 2078 CPAN::Shell::rematein('CPAN::Shell', 'get', 'bioperl') called at /usr/lib/perl5/5.8.5/CPAN.pm line 2157 CPAN::Shell::get('CPAN::Shell', 'bioperl') called at /usr/lib/perl5/5.8.5/CPAN.pm line 201 eval {...} called at /usr/lib/perl5/5.8.5/CPAN.pm line 201 CPAN::shell() called at /usr/bin/cpan line 193 cpan> Can anyone tell me how to configure CPAN again, or how to install BioPerl on Linux with all its dependencies?
Because I think I have a problem in CPAN configuration. Regards, Prateek -- Prateek Singh 3rd year Bioinformatics(BTech) Vellore Institute Of Technology Vellore-632014 prateek.vit at gmail.com From bosborne11 at verizon.net Wed Feb 21 12:29:40 2007 From: bosborne11 at verizon.net (Brian Osborne) Date: Wed, 21 Feb 2007 12:29:40 -0500 Subject: [Bioperl-l] Parsing kegg files In-Reply-To: Message-ID: Stephen, I don't know what your eventual goals are but you might want to take a look at bioperl-network. However, there are problems with this package. One, it only parses DIP tab-delimited and PSI-MI and it does this last one only partially (you will get the graph though). Two, it seems to have only a single developer interested in it, that's me, and few users. In my Bioperl experience projects like this tend to fade away. http://www.bioperl.org/wiki/Network_package Brian O. On 2/21/07 12:01 PM, "stephen.marshall at novartis.com" wrote: > Hello > I"m trying to parse a Kegg file and I can't seem to get at the pathway > information... Here's a snippet of my code. I only see dblink and > description as annotation > > use Bio::SeqIO; > > my $stream = Bio::SeqIO->new(-file => $filename, -format => 'KEGG'); > > while ( my $seq = $stream->next_seq() ) { > # do something with $seq > my $id = $seq->display_id(); > print "$id:"; > my $ann = $seq->annotation(); > foreach my $key ( $ann->get_all_annotation_keys() ) { > my @values = $ann->get_Annotations($key); > foreach my $value ( @values ) { > print "Annotation: ",$key," value: > ",$value->as_text,"\n"; > } > } > > }
From arareko at campus.iztacala.unam.mx Wed Feb 21 13:18:37 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Wed, 21 Feb 2007 12:18:37 -0600 Subject: [Bioperl-l] Problem in BioPerl Installation In-Reply-To: References: Message-ID: <45DC8CFD.1060108@campus.iztacala.unam.mx> You can always rebuild your CPAN configuration by deleting the existing .cpan/ directory in root's $HOME dir (quick & dirty trick), then invoke CPAN again from root's shell to rebuild the config: # perl -MCPAN -e shell Hope this helps. Regards, Mauricio. prateek singh yadav wrote: > Hello all, > > I was trying to install Bioperl on my redhat linux (EL) using CPAN. but CPAN > shows this problem. > > > [root at HX342SBC054 Desktop]# cpan > Terminal does not support AddHistory. > > cpan shell -- CPAN exploration and modules installation (v1.7601) > ReadLine support available (try 'install Bundle::CPAN') > > cpan> get bioperl > CPAN: Storable loaded ok > Going to read /root/.cpan/Metadata > Warning: Found only 25 objects in /root/.cpan/Metadata > Going to read /root/.cpan/sources/authors/01mailrc.txt.gz > Going to read /root/.cpan/sources/modules/02packages.details.txt.gz > Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not > contain a Line-Count header. > Please check the validity of the index file by comparing it to more > than one CPAN mirror.
I'll continue but problems seem likely to > happen. > Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not > contain a Last-Updated header. > Please check the validity of the index file by comparing it to more > than one CPAN mirror. I'll continue but problems seem likely to > happen. > Going to read /root/.cpan/sources/modules/03modlist.data.gz > Can't locate object method "data" via package "CPAN::Modulelist" (perhaps > you forgot to load "CPAN::Modulelist"?) at (eval 13) line 1. > at /usr/lib/perl5/5.8.5/CPAN.pm line 3406 > CPAN::Index::rd_modlist('CPAN::Index', > '/root/.cpan/sources/modules/03modlist.data.gz') called at > /usr/lib/perl5/5.8.5/CPAN.pm line 3129 > CPAN::Index::reload('CPAN::Index') called at > /usr/lib/perl5/5.8.5/CPAN.pm line 675 > CPAN::exists('CPAN=HASH(0x8548f20)', 'CPAN::Module', 'bioperl') > called at /usr/lib/perl5/5.8.5/CPAN.pm line 1842 > CPAN::Shell::expandany('CPAN::Shell', 'bioperl') called at > /usr/lib/perl5/5.8.5/CPAN.pm line 2078 > CPAN::Shell::rematein('CPAN::Shell', 'get', 'bioperl') called at > /usr/lib/perl5/5.8.5/CPAN.pm line 2157 > CPAN::Shell::get('CPAN::Shell', 'bioperl') called at > /usr/lib/perl5/5.8.5/CPAN.pm line 201 > eval {...} called at /usr/lib/perl5/5.8.5/CPAN.pm line 201 > CPAN::shell() called at /usr/bin/cpan line 193 > > cpan> > > Can anyone give me direction how to configure cpan again or how to install > BioPerl on linux with its complete dependencies. Because I think I have a > problem in CPAN configuration. 
> > Regards, > Prateek > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From hlapp at gmx.net Wed Feb 21 13:33:17 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 21 Feb 2007 13:33:17 -0500 Subject: [Bioperl-l] Adding empty member list in Bio::Cluster::SequenceFamily In-Reply-To: <45DC4E2F.4060804@tagc.univ-mrs.fr> References: <45DC4E2F.4060804@tagc.univ-mrs.fr> Message-ID: <5B31EEBD-FFE5-4A0F-BB05-DF7297103BBD@gmx.net> Fixed in CVS HEAD. -hilmar On Feb 21, 2007, at 8:50 AM, Samuel GRANJEAUD - IR/IFR137 wrote: > Hello! > > Not clear to me, but I find a work around by checking for empty list > before adding, here is what I noticed. Adding as members an empty list > () is not the same as adding a reference to an empty list [], of > course, > but could be thought to be the same. Calling get_members, for the > second > case, I got a list of 0 member, but in the first case I got of 1 > member, > which is not an object at all. I am warned now, but may be the > documentation should emphasize on using by the reference call. > > Best regards, > --Samuel > > > use Bio::Cluster::SequenceFamily; > > $f = new Bio::Cluster::SequenceFamily( -id => 'aa' ); > $f->add_members( () ); > print scalar $f->get_members(); > # 1 > $g = new Bio::Cluster::SequenceFamily( -id => 'aa' ); > $g->add_members( [] ); > print scalar $g->get_members(); > # 0 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Wed Feb 21 14:12:57 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 21 Feb 2007 13:12:57 -0600 Subject: [Bioperl-l] GenBank accession bug?
Message-ID: <11B83C2C-7BFE-48B5-A20C-7EAF282F39BD@uiuc.edu> Dmitry, I'm forwarding this to the mail list. In the future please post/ respond to the regular mail list so other BioPerl developers/users can comment. You'll get feedback much faster here (and maybe even some support!). The issue at hand is whether we can support GenBank accessions/ display_id/version with your naming scheme. My feeling is that support for nonalphanumerics was removed to be compliant with the GenBank standard for accessions, though I may be wrong. Maybe someone who was around during bioperl 1.2 can elaborate more? From http://bugzilla.open-bio.org/show_bug.cgi?id=2214 -------------------------------------------------- .... Thanks for verbose explanation. It seems that I would need to apply my local patches to the BioPerl module(s). With BioPerl-1.2 there was no problem with '-' in sequence names. The problem is that in the project we participate (Vizier project) following sequence name convention was adopted: VZ##-(or)-<$$> VZ Stands for Vizier ## Your 2-digits Partner ID within the VIZIER consortium Virus name according to the ICTV nomenclature; , If sequence has not been assigned a GenBank LOCUS ID, available strain designation, short as possible, should be used <$$> Unique 2-digits number on your discretion to label sequence variant -------------------------------------------------- chris From gabriel.cardona at uib.es Thu Feb 22 04:33:14 2007 From: gabriel.cardona at uib.es (gcardona) Date: Thu, 22 Feb 2007 01:33:14 -0800 (PST) Subject: [Bioperl-l] bioperl-1.5.2_100 ... not found Message-ID: <9096740.post@talk.nabble.com> Hello, I am trying to install Bioperl on a Windows system, following the installation notes in http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows When the Perl Package Manager tries to download bioperl-1.5.2_100, it cannot find the package and answers: Downloading bioperl-1.5.2_100 ... 
not found I've looked the contents of http://bioperl.org/DIST and in package.xml the version for bioperl is bioperl-1.5.2_100, but in that folder the available version is bioperl-1.5.2_102 Is this a bug? or should I download and install manually? Thank you in advance, Gabriel Cardona -- View this message in context: http://www.nabble.com/bioperl-1.5.2_100-...-not-found-tf3271747.html#a9096740 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From bix at sendu.me.uk Thu Feb 22 07:35:14 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 22 Feb 2007 12:35:14 +0000 Subject: [Bioperl-l] bioperl-1.5.2_100 ... not found In-Reply-To: <9096740.post@talk.nabble.com> References: <9096740.post@talk.nabble.com> Message-ID: <45DD8E02.1070404@sendu.me.uk> gcardona wrote: > Hello, > > I am trying to install Bioperl on a Windows system, following the > installation notes in > http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows > When the Perl Package Manager tries to download bioperl-1.5.2_100, it cannot > find the package and answers: > Downloading bioperl-1.5.2_100 ... not found > > I've looked the contents of > http://bioperl.org/DIST > and in package.xml the version for bioperl is bioperl-1.5.2_100, but in that > folder the available version is bioperl-1.5.2_102 > Is this a bug? or should I download and install manually? Sorry, my mistake. I accidentally moved the ppm to a different folder. It should work now though. I may make a 1.5.2_102 ppm at some point, but there are no relevant differences between _102 and _100 as far as Windows users are concerned. From enrique_rulz at yahoo.com Thu Feb 22 15:41:37 2007 From: enrique_rulz at yahoo.com (Kurt Gobain) Date: Thu, 22 Feb 2007 12:41:37 -0800 (PST) Subject: [Bioperl-l] Sequence matching problem! Message-ID: <9107936.post@talk.nabble.com> Hi every1.. 
I m facing a great deal of problem in simple pattern matching between sequence & a pattern ..Program shod be designed such a way that it shod be able do two things 1) normal matching...For eg: GATCAAT....if TC is entered... output shod be 2...2) matching using spl character..In same example if C*T value is entered It shod give o/p as 3 & seq to b displayed is CAAT..I m easily getting 1st part...But in 2nd part Its giving sum problem..output I m gettin as 1 instead of 3...Code is really simple! #!/usr/bin/perl $alphabet = "GATCAAT"; $pattern= "C*T "; $alphabet =~ /($pattern)/i; print "The entire '$pattern' match began at $-[0] and ended at $+[0]\n"; ==================== OUTPUT! The entire C*T match began at 1 and ended at 2 ==================== but the o/p shod be 3???? & Is there n e chance I can get seq too..I mean instead of C*T'' i need 'CAAT'...???? Well..Its not compulsion to use regex....But I find it quite simple..can there be n e other method?? Thanx in advance! Kurt! -- View this message in context: http://www.nabble.com/Sequence-matching-problem%21-tf3275153.html#a9107936 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cjfields at uiuc.edu Thu Feb 22 16:01:03 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 22 Feb 2007 15:01:03 -0600 Subject: [Bioperl-l] GenBank accession bug? In-Reply-To: <51879.10.0.7.57.1172176272.squirrel@gscmail.wustl.edu> References: <11B83C2C-7BFE-48B5-A20C-7EAF282F39BD@uiuc.edu> <51879.10.0.7.57.1172176272.squirrel@gscmail.wustl.edu> Message-ID: <028E16D7-036A-44DA-BECD-F910BEA58E53@uiuc.edu> On Feb 22, 2007, at 2:31 PM, dmessina at watson.wustl.edu wrote: >> The issue at hand is whether we can support GenBank accessions/ >> display_id/version with your naming scheme. > > Chris, I'm a little unsure of what you're saying here (which might > mean > that you're already saying what I'm about to...say). 
Do you mean it > might > be tricky to support both the Genbank standard and Dmitry's > simultaneously? > > I would argue any arbitrary ID should be supported as long as that > ID is a > contiguous non-space word (\S+). > > Actually the existing accession regex looks like it already > supports IDs > with '-': > > /^ACCESSION\s+(\S.*\S)/ > > It's only the version regex which doesn't (\w doesn't include '-'): > > /^\w+\.(\d+)/ > > > Anyone else have thoughts or comments on this? Off the top of my > head, I > can't think of any issues that might arise from doing so (apart from > having to modify all of the SeqIO modules to support it). > > Dave You're right; the argument comes down simply to whether we would support \S+ or just \w+. I'm neutral on this myself, but I wonder how allowing \S+ would affect other modules (for instance, indexing for a flat db), where one might just use \w+ for accessions, expecting them to be GenBank- or EMBL-like alphanumerics. The fact that \S+ was supported in the past (as indicated in the bug report) and then wasn't post 1.2 makes me think there was a reason for someone going in and modifying it, but that was before my time on the group. I'll have a look at the CVS history when I have time to see what I can dig up. chris From mkiwala at watson.wustl.edu Thu Feb 22 15:36:33 2007 From: mkiwala at watson.wustl.edu (Michael Kiwala) Date: Thu, 22 Feb 2007 14:36:33 -0600 Subject: [Bioperl-l] Bio::AnnotatableI vs Bio::FeatureHolderI Message-ID: <45DDFED1.1090503@watson.wustl.edu> Are Bio::AnnotatableI and Bio::FeatureHolderI competing interfaces? I get the impression they are designed to do similar things. If so is one deprecated and the other preferred? If their responsibilities are orthogonal to each other, what sorts of tasks are suited to each? Thanks, Michael From dmessina at wustl.edu Thu Feb 22 15:53:01 2007 From: dmessina at wustl.edu (Dave Messina) Date: Thu, 22 Feb 2007 14:53:01 -0600 (CST) Subject: [Bioperl-l] GenBank accession bug? 
Message-ID: <51923.10.0.7.57.1172177581.squirrel@gscmail.wustl.edu> > The issue at hand is whether we can support GenBank accessions/ > display_id/version with your naming scheme. Chris, I'm a little unsure of what you're saying here (which might mean that you're already saying what I'm about to...say). Do you mean it might be tricky to support both the Genbank standard and Dmitry's simultaneously? I would argue any arbitrary ID should be supported as long as that ID is a contiguous non-space word (\S+). Actually the existing accession regex looks like it already supports IDs with '-': /^ACCESSION\s+(\S.*\S)/ It's only the version regex which doesn't (\w doesn't include '-'): /^\w+\.(\d+)/ Anyone else have thoughts or comments on this? Off the top of my head, I can't think of any issues that might arise from doing so (apart from having to modify all of the SeqIO modules to support it). Dave From heikki at sanbi.ac.za Fri Feb 23 03:25:39 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Fri, 23 Feb 2007 10:25:39 +0200 Subject: [Bioperl-l] Sequence matching problem! In-Reply-To: <9107936.post@talk.nabble.com> References: <9107936.post@talk.nabble.com> Message-ID: <200702231025.39416.heikki@sanbi.ac.za> Kurt, There are few things in your code to note: - regexp /C*T/ matches any T preceded by zero or more Cs, not what you meant - $- and $+ are among the "expensive" perl functions worth not using unless you have to. Using them once in your code slows execution down considerable. There is always an other way. - Keep in mind what you want to use the match positions for: Human readable locations usually start counting with 1 but perl code uses 0 as the first location. The code below assumes you want to print the locations out. Study my example code below. 
Yours, -Heikki ################################################################### #!/usr/bin/perl $seq = "GATCAAT"; #$pattern= 'C*T'; $pattern= 'C.*T'; while ($seq =~ m/($pattern)/gi) { $match = $1; $end = pos($seq); $start = $end - length($match) +1; print "$match : $start - $end\n"; } ################################################################### On Thursday 22 February 2007 22:41:37 Kurt Gobain wrote: > Hi every1.. > I m facing a great deal of problem in simple pattern matching between > sequence & a pattern ..Program shod be designed such a way that it shod be > able do two things 1) normal matching...For eg: GATCAAT....if TC is > entered... output shod be 2...2) matching using spl character..In same > example if C*T value is entered It shod give o/p as 3 & seq to b displayed > is CAAT..I m easily getting 1st part...But in 2nd part Its giving sum > problem..output I m gettin as 1 instead of 3...Code is really simple! > > #!/usr/bin/perl > $alphabet = "GATCAAT"; > $pattern= "C*T "; > > $alphabet =~ /($pattern)/i; > > print "The entire '$pattern' match began at $-[0] and ended at $+[0]\n"; > > ==================== > OUTPUT! > The entire C*T match began at 1 and ended at 2 > ==================== > > but the o/p shod be 3???? > & Is there n e chance I can get seq too..I mean instead of C*T'' i need > 'CAAT'...???? > > Well..Its not compulsion to use regex....But I find it quite simple..can > there be n e other method?? > > Thanx in advance! > Kurt! 
-- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From avilella at gmail.com Fri Feb 23 04:59:49 2007 From: avilella at gmail.com (Albert Vilella) Date: Fri, 23 Feb 2007 09:59:49 +0000 Subject: [Bioperl-l] Sequence matching problem! In-Reply-To: <200702231025.39416.heikki@sanbi.ac.za> References: <9107936.post@talk.nabble.com> <200702231025.39416.heikki@sanbi.ac.za> Message-ID: <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com> now that we are at this pattern matching thread, I was wondering if any perl guru could enlighten me on the issue of matching exact sequence patterns on a gapped target sequence. E.g.: my $seq = "CGATCAACGAATCGTACGTACTC"; my $gapped_seq = "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG"; and one would like to get as a result: "CG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTC" which is the match of $seq but in $gapped_seq. Cheers, Albert. On 2/23/07, Heikki Lehvaslaiho wrote: > Kurt, > > There are few things in your code to note: > > - regexp /C*T/ matches any T preceded by zero or more Cs, > not what you meant > - $- and $+ are among the "expensive" perl functions worth > not using unless you have to. Using them once in your > code slows execution down considerable. There is always > an other way. > - Keep in mind what you want to use the match positions for: > Human readable locations usually start counting with 1 but > perl code uses 0 as the first location. The code below assumes > you want to print the locations out. > > Study my example code below. 
> > Yours, > -Heikki > > ################################################################### > #!/usr/bin/perl > $seq = "GATCAAT"; > #$pattern= 'C*T'; > $pattern= 'C.*T'; > > while ($seq =~ m/($pattern)/gi) { > > $match = $1; > $end = pos($seq); > $start = $end - length($match) +1; > > print "$match : $start - $end\n"; > } > > ################################################################### > > > On Thursday 22 February 2007 22:41:37 Kurt Gobain wrote: > > Hi every1.. > > I m facing a great deal of problem in simple pattern matching between > > sequence & a pattern ..Program shod be designed such a way that it shod be > > able do two things 1) normal matching...For eg: GATCAAT....if TC is > > entered... output shod be 2...2) matching using spl character..In same > > example if C*T value is entered It shod give o/p as 3 & seq to b displayed > > is CAAT..I m easily getting 1st part...But in 2nd part Its giving sum > > problem..output I m gettin as 1 instead of 3...Code is really simple! > > > > #!/usr/bin/perl > > $alphabet = "GATCAAT"; > > $pattern= "C*T "; > > > > $alphabet =~ /($pattern)/i; > > > > print "The entire '$pattern' match began at $-[0] and ended at $+[0]\n"; > > > > ==================== > > OUTPUT! > > The entire C*T match began at 1 and ended at 2 > > ==================== > > > > but the o/p shod be 3???? > > & Is there n e chance I can get seq too..I mean instead of C*T'' i need > > 'CAAT'...???? > > > > Well..Its not compulsion to use regex....But I find it quite simple..can > > there be n e other method?? > > > > Thanx in advance! > > Kurt! 
> > > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From js5 at sanger.ac.uk Fri Feb 23 06:34:37 2007 From: js5 at sanger.ac.uk (James Smith) Date: Fri, 23 Feb 2007 11:34:37 +0000 (GMT) Subject: [Bioperl-l] Sequence matching problem! In-Reply-To: <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com> References: <9107936.post@talk.nabble.com> <200702231025.39416.heikki@sanbi.ac.za> <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com> Message-ID: On Fri, 23 Feb 2007, Albert Vilella wrote: > now that we are at this pattern matching thread, I was wondering if > any perl guru could enlighten me on the issue of matching exact > sequence patterns on a gapped target sequence. E.g.: > > my $seq = "CGATCAACGAATCGTACGTACTC"; > my $gapped_seq = > "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG"; > > and one would like to get as a result: > > "CG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTC" > > which is the match of $seq but in $gapped_seq. Try... 
my $seq = "CGATCAACGAATCGTACGTACTC"; my $gapped_seq = "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG"; my $regexp = '('.join('-*?',split//,$seq).')'; if( $gapped_seq =~ /$regexp/ ) { print "Match is $1\n"; } else { print "No match\n"; } (not sure on the efficiency if $seq is long tho') James > > Cheers, From khoueiry at ibdm.univ-mrs.fr Fri Feb 23 08:09:33 2007 From: khoueiry at ibdm.univ-mrs.fr (pierre) Date: Fri, 23 Feb 2007 14:09:33 +0100 Subject: [Bioperl-l] Sequence matching problem! In-Reply-To: <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com> References: <9107936.post@talk.nabble.com> <200702231025.39416.heikki@sanbi.ac.za> <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com> Message-ID: <1172236173.4309.6.camel@ciona-pierre> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From neetisomaiya at gmail.com Fri Feb 23 07:27:28 2007 From: neetisomaiya at gmail.com (neeti somaiya) Date: Fri, 23 Feb 2007 17:57:28 +0530 Subject: [Bioperl-l] need help urgently - needle output parsing Message-ID: <764978cf0702230427x5b5acf73y6538527ade3fd453@mail.gmail.com> Hi, I am using needle alignment tool (standalone, on a linux machine), and then I am using Bioperl to parse the output. All data - sequence files and alignment outputs are attached with this mail. 
I have 2 small sequences :- 693.seq and revcomp693.seq I have 2 big sequences :- 80768-4291-5639.84809_84810_84809_1.scf.seq and 80768-4291-5639.84809_84810_84810_1.scf.seq All these are in fasta format Now I am doing the following :- 1) Aligning 80768-4291-5639.84809_84810_84809_1.scf.seq and 693.seq - output file is 80768-4291-5639.84809_84810_84809_1.scf.out parsing the output gives me the alignment start in 'traceseq' as 97 2) Aligning 80768-4291-5639.84809_84810_84809_1.scf.seq and revcomp693.seq - output file is 80768-4291-5639.84809_84810_84809_1.scf.comp.out parsing the output gives me the alignment start in 'traceseq' as 91 All this is correct. Now I am doing the following :- 1) Aligning 80768-4291-5639.84809_84810_84810_1.scf.seq and 693.seq - output file is 80768-4291-5639.84809_84810_84810_1.scf.out parsing the output gives me the alignment start in 'traceseq' as 341 (this is correct) 2) Aligning 80768-4291-5639.84809_84810_84810_1.scf.seq and revcomp693.seq - output file is 80768-4291-5639.84809_84810_84810_1.scf.comp.out parsing the output gives me the alignment start in 'traceseq' as 341 (this is incorrect, correct position is 330) Part of my code is as follows :- --------------------------------------------- # running needle `$needle_path./needle $trace.seq $snp_position_on_con.seq -gapopen 10.0 -gapextend 0.5 $output`; # parsing needle output my $str = Bio::AlignIO->new(-format => 'emboss',-file => $output); my $aln = $str->next_aln(); my $pos = $aln->column_from_residue_number('original',1); $logger->info("Alignment pos is $pos"); #################################### # running needle `$needle_path./needle $trace.seq revcomp$snp_position_on_con.seq -gapopen 10.0 -gapextend 0.5 $comp_output`; # parsing needle output my $comp_str = Bio::AlignIO->new(-format => 'emboss',-file => $comp_output); my $comp_aln = $comp_str->next_aln(); my $comp_pos = $comp_aln->column_from_residue_number('revcomp',1); $logger->info("Alignment pos is $comp_pos"); Can someone
please tell me what is going wrong here? -- -Neeti Even my blood says, B positive -------------- next part -------------- A non-text attachment was scrubbed... Name: data.zip Type: application/zip Size: 4456 bytes Desc: not available URL: From bix at sendu.me.uk Fri Feb 23 08:55:24 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 23 Feb 2007 13:55:24 +0000 Subject: [Bioperl-l] Sequence matching problem! In-Reply-To: References: <9107936.post@talk.nabble.com> <200702231025.39416.heikki@sanbi.ac.za> <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com> Message-ID: <45DEF24C.1010303@sendu.me.uk> James Smith wrote: > On Fri, 23 Feb 2007, Albert Vilella wrote: > >> now that we are at this pattern matching thread, I was wondering if >> any perl guru could enlighten me on the issue of matching exact >> sequence patterns on a gapped target sequence. E.g.: >> >> my $seq = "CGATCAACGAATCGTACGTACTC"; >> my $gapped_seq = >> "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG"; >> >> and one would like to get as a result: >> >> "CG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTC" >> >> which is the match of $seq but in $gapped_seq. > > Try... > > my $seq = "CGATCAACGAATCGTACGTACTC"; > my $gapped_seq = > "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG"; > > my $regexp = '('.join('-*?',split//,$seq).')'; > > if( $gapped_seq =~ /$regexp/ ) { > print "Match is $1\n"; > } else { > print "No match\n"; > } That's great stuff. If you were matching thousands of different $seq against the same very large $gapped_seq, and only needed the first match of $seq in $gapped_seq, the alternative to the above approach (remove the gaps from $gapped_seq and do index() matching) will be faster. 
Here's one (overly long-winded) way of implementing it, that I found to take ~2s vs ~22s for the above regex approach when doing the job on 999999 copies of $seq: #!/usr/bin/perl -w use strict; use warnings; my $gapped_seq = "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG"; # note the total gap-length at position in gapless 0-based coords my @gap_lengths; my $gap_length = 0; while ($gapped_seq =~ /(-+)/g) { my $match = $1; my $prev_length = $gap_length; $gap_length += length($match); my $end = pos($gapped_seq) - $gap_length - 1; push(@gap_lengths, $prev_length) for (1..$end-$#gap_lengths); } push(@gap_lengths, $gap_length) for (1..(length($gapped_seq) - @gap_lengths - $gap_length)); # remove the gaps my $gapless_seq = $gapped_seq; $gapless_seq =~ s/-//g; # now for each of thousands of seqs... my $seq = 'CGATCAACGAATCGTACGTACTC'; my @seqs; for (1..999999) { push(@seqs, $seq); } foreach my $seq (@seqs) { my $start = index($gapless_seq, $seq); if ($start == -1) { print "No match found for seq '$seq'\n"; next; } my $end = $start + length($seq) - 1; # calculate the coords in $gapped_seq $start = $start + $gap_lengths[$start]; $end = $end + $gap_lengths[$end]; my $result = substr($gapped_seq, $start, ($end - $start + 1)); #print $result, "\n"; } exit; From MEC at stowers-institute.org Fri Feb 23 10:54:57 2007 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Fri, 23 Feb 2007 09:54:57 -0600 Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with multiple values In-Reply-To: <6dce9a0b0701301446w7fc31d6eufe27442fecd0f20e@mail.gmail.com> Message-ID: Lincoln, and other Bio::DB::SeqFeature wanderers: I find that generating GFF from a Bio::DB::SeqFeature using gff3_string does not respect the following: "Multiple attributes of the same type are indicated by separating the values with the comma "," character" (c.f. 
http://www.sequenceontology.org/gff3.shtml) This one-liner demonstrates the problem: perl -MBio::DB::SeqFeature -e 'print Bio::DB::SeqFeature->new(-seq_id => "J", -start => 1, -end => 2, -primary_tag => 'PH', -source => 'A', -name => 'mec', -attributes => {foo => [qw(bar blat)]})->gff3_string' J A PH 1 2 . . . foo=bar;foo=blat;Name=mec Do you agree this is a problem? The fix is in the post-sig patch to /Bio/DB/SeqFeature/NormalizedFeature.pm, in which I also took the stylistic privilege of promoting any ID, Parent, or Name attribute to the front of column 9, so output is now: J A PH 1 2 . . . Name=mec;foo=bar,blat Do you agree this is better? I am poised to commit it, as well as the functionally same patch to the equivalent function in Bio/Graphics/FeatureBase.pm All clear? -- Malcolm Cook *** NormalizedFeature.pm 2 Feb 2007 21:05:42 -0000 1.25 --- NormalizedFeature.pm 23 Feb 2007 15:37:01 -0000 *************** *** 481,494 **** next if $t eq 'load_id'; next if $t eq 'parent_id'; foreach (@values) { s/\s+$// } # get rid of trailing whitespace ! ! push @result,join '=',$self->escape($t),$self->escape($_) foreach @values; } my $id = $self->primary_id; my $name = $self->display_name; ! push @result,"ID=".$self->escape($id) if defined $id; ! push @result,"Parent=".$self->escape($parent->primary_id) if defined $parent; ! push @result,"Name=".$self->escape($name) if defined $name; return join ';',@result; } --- 481,498 ---- next if $t eq 'load_id'; next if $t eq 'parent_id'; foreach (@values) { s/\s+$// } # get rid of trailing whitespace ! ! push @result,join '=',$self->escape($t),$self->escape($_) foreach @values; ! # NO! Multiple attributes of the same type are indicated by ! # separating the values with the comma "," character - per ! # http://www.sequenceontology.org/gff3.shtml. Do it this way: ! #push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values); } my $id = $self->primary_id; my $name = $self->display_name; !
unshift @result,"ID=".$self->escape($id) if defined $id; ! unshift @result,"Parent=".$self->escape($parent->primary_id) if defined $parent; ! unshift @result,"Name=".$self->escape($name) if defined $name; return join ';',@result; } From MEC at stowers-institute.org Fri Feb 23 12:08:11 2007 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Fri, 23 Feb 2007 11:08:11 -0600 Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes withmultiple values In-Reply-To: Message-ID: Oy, I hit send too soon. The patch I sent had my new attribute encoder commented out. It should've been: *** NormalizedFeature.pm 2 Feb 2007 21:05:42 -0000 1.25 --- NormalizedFeature.pm 23 Feb 2007 17:06:37 -0000 *************** *** 481,494 **** next if $t eq 'load_id'; next if $t eq 'parent_id'; foreach (@values) { s/\s+$// } # get rid of trailing whitespace ! ! push @result,join '=',$self->escape($t),$self->escape($_) foreach @values; } my $id = $self->primary_id; my $name = $self->display_name; ! push @result,"ID=".$self->escape($id) if defined $id; ! push @result,"Parent=".$self->escape($parent->primary_id) if defined $parent; ! push @result,"Name=".$self->escape($name) if defined $name; return join ';',@result; } --- 481,497 ---- next if $t eq 'load_id'; next if $t eq 'parent_id'; foreach (@values) { s/\s+$// } # get rid of trailing whitespace ! # push @result,join '=',$self->escape($t),$self->escape($_) foreach @values; ! # NO! Multiple attributes of the same type are indicated by ! # separating the values with the comma "," character - per ! # http://www.sequenceontology.org/gff3.shtml. Do it this way: ! push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values); } my $id = $self->primary_id; my $name = $self->display_name; ! unshift @result,"ID=".$self->escape($id) if defined $id; ! unshift @result,"Parent=".$self->escape($parent->primary_id) if defined $parent; !
unshift @result,"Name=".$self->escape($name) if defined $name; return join ';',@result; } Malcolm From lstein at cshl.edu Fri Feb 23 12:16:01 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 23 Feb 2007 12:16:01 -0500 Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with multiple values In-Reply-To: References: <6dce9a0b0701301446w7fc31d6eufe27442fecd0f20e@mail.gmail.com> Message-ID: <6dce9a0b0702230916o684e490cu9d01d3be3e2ae061@mail.gmail.com> Hi Malcolm, You're quite right, and I appreciate your work in tracking down and fixing it. Before you commit the patch, can you confirm that the loader is working correctly so that comma-separated values are read back into the data structure as multiple attributes? Lincoln On 2/23/07, Cook, Malcolm wrote: > > Lincoln, and other Bio::DB::SeqFeature wanderers: > > I find that generating GFF from a Bio::DB::SeqFeature using gff3_string > does not respect the following: > > "Multiple attributes of the same type are indicated by separating the > values with the comma "," character" (c.f. > http://www.sequenceontology.org/gff3.shtml) > > This one-liner demonstrates the problem: > > perl -MBio::DB::SeqFeature -e 'print Bio::DB::SeqFeature->new(-seq_id => > "J", -start => 1, -end => 2, -primary_tag => 'PH', -source => 'A', > -name => 'mec', -attributes => {foo => [qw(bar blat)]})->gff3_string' > J A PH 1 2 . . . > foo=bar;foo=blat;Name=mec > > Do you agree this is a problem? > > The fix is in the post-sig patch to > /Bio/DB/SeqFeature/NormalizedFeature.pm, in which I also took the > stylistic privilege of promoting any ID, Parent, or Name attribute to > the front of column 9, so output is now: > > J A PH 1 2 . . . > Name=mec;foo=bar,blat > > Do you agree this is better? > > I am poised to commit it, as well as the functionally same patch to the > equivalent function in Bio/Graphics/FeatureBase.pm > > All clear?
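A minimal, BioPerl-independent sketch of the column-9 round trip under discussion. On output it writes one tag=value1,value2 field per tag. On input it merges repeated tags, so "foo=bar;foo=blat" and "foo=bar,blat" decode to the same structure. The subroutine names are hypothetical. The escaped character set (tab, newline, carriage return, %, ;, =, & and ,) is an assumption based on the GFF3 reserved characters and is not taken from Bio::DB::SeqFeature's own escape() method.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-escape characters assumed reserved in GFF3 column 9.
sub gff3_escape {
    my $s = shift;
    $s =~ s/([\t\n\r%;=&,])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

sub gff3_unescape {
    my $s = shift;
    $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $s;
}

# Build column 9 from ordered [tag, [values]] pairs, writing each tag once
# with its values comma-joined (callers choose the order, e.g. Name first).
sub encode_attributes {
    my @fields;
    for my $pair (@_) {
        my ($tag, $values) = @$pair;
        push @fields, gff3_escape($tag) . '=' . join(',', map { gff3_escape($_) } @$values);
    }
    return join ';', @fields;
}

# Parse column 9 into { tag => [values] }; repeated tags are merged.
# Values are split on ',' before unescaping, so escaped commas survive.
sub decode_attributes {
    my $col9 = shift;
    my %attr;
    for my $field (split /;/, $col9) {
        next unless length $field;
        my ($tag, $value) = split /=/, $field, 2;
        push @{ $attr{ gff3_unescape($tag) } },
            map { gff3_unescape($_) } split /,/, (defined $value ? $value : '');
    }
    return \%attr;
}

my $col9 = encode_attributes([Name => ['mec']], [foo => ['bar', 'blat']]);
print "$col9\n";
```

With the feature from the one-liner above, this prints Name=mec;foo=bar,blat, the output format proposed in the patch. Splitting on "," before unescaping is why a literal comma inside a value has to be written as %2C.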
> > -- Malcolm Cook > > > > *** NormalizedFeature.pm 2 Feb 2007 21:05:42 -0000 1.25 > --- NormalizedFeature.pm 23 Feb 2007 15:37:01 -0000 > *************** > *** 481,494 **** > next if $t eq 'load_id'; > next if $t eq 'parent_id'; > foreach (@values) { s/\s+$// } # get rid of trailing whitespace > ! > ! push @result,join '=',$self->escape($t),$self->escape($_) foreach > @values; > } > my $id = $self->primary_id; > my $name = $self->display_name; > ! push @result,"ID=".$self->escape($id) if defined > $id; > ! push @result,"Parent=".$self->escape($parent->primary_id) if defined > $parent; > ! push @result,"Name=".$self->escape($name) if > defined $name; > return join ';',@result; > } > > --- 481,498 ---- > next if $t eq 'load_id'; > next if $t eq 'parent_id'; > foreach (@values) { s/\s+$// } # get rid of trailing whitespace > ! > ! push @result,join '=',$self->escape($t),$self->escape($_) foreach > @values; > ! # NO! Multiple attributes of the same type are indicated by > ! # separating the values with the comma "," character - per > ! # http://www.sequenceontology.org/gff3.shtml. Do it this way: > ! #push @result,join '=',$self->escape($t),join(',', map > {$self->escape($_)} @values); > } > my $id = $self->primary_id; > my $name = $self->display_name; > ! unshift @result,"ID=".$self->escape($id) if > defined $id; > ! unshift @result,"Parent=".$self->escape($parent->primary_id) if > defined $parent; > ! unshift @result,"Name=".$self->escape($name) if > defined $name; > return join ';',@result; > } > > > > -- Lincoln D.
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From aaron.j.mackey at gsk.com Fri Feb 23 09:36:18 2007 From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com) Date: Fri, 23 Feb 2007 09:36:18 -0500 Subject: [Bioperl-l] Bio::AnnotatableI vs Bio::FeatureHolderI In-Reply-To: <45DDFED1.1090503@watson.wustl.edu> Message-ID: The fundamental difference (in my mind) between a feature and an annotation, is that a feature has a location/range, and thus the information represented in the feature is applicable only to that location/range. An annotation, on the other hand, is "global", or at least non-localizable (note: a feature with a "fuzzy" location of "somewhere along this sequence, but I'm not sure where" is still not global - if you did/could know the location, you'd describe it as a feature, so it shouldn't be represented with an annotation). -Aaron bioperl-l-bounces at lists.open-bio.org wrote on 02/22/2007 03:36:33 PM: > Are Bio::AnnotatableI and Bio::FeatureHolderI competing interfaces? > > I get the impression they are designed to do similar things. If so is > one deprecated and the other preferred? > > If their responsibilities are orthogonal to each other, what sorts of > tasks are suited to each? > > Thanks, > Michael > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From MEC at stowers-institute.org Fri Feb 23 13:46:00 2007 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Fri, 23 Feb 2007 12:46:00 -0600 Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with multiple values In-Reply-To: <6dce9a0b0702230916o684e490cu9d01d3be3e2ae061@mail.gmail.com> Message-ID: Lincoln, OK. I'll do that... 
...let's see, a quick squiz at Bio/DB/SeqFeature/Store/ .... ...ok - parse_attributes _looks_ right to me ...so, let's try it #load a feature into a new database: bp_seqfeature_load.PLS -dsn 'dbi:mysql:database=test;host=mysql-dev' -create -user test -pass test <(echo -e "J\tA\tPH\t1\t2\t.\t.\t.\tfoo=bar,blat;Name=mec\n") #It loaded ok. Now, let's print it out in GFF3: perl -MBio::DB::SeqFeature::Store -e 'foreach (Bio::DB::SeqFeature::Store->new(-dsn => "dbi:mysql:database=test;host=mysql-dev;user=test;password=test")->features(-type => "PH:A")) {print $_->gff3_string . "\n"}' J A PH 1 2 . . . Name=mec;ID=1;foo=bar,blat #output looks good to me Note, I tried loading attributes foo=bar;foo=blat and it came back foo=bar,blat. So, you can load either way. I'll commit later today. --Malcolm ________________________________ From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein Sent: Friday, February 23, 2007 11:16 AM To: Cook, Malcolm Cc: bioperl list; lstein at cshl.org Subject: Re: Bio::DB::SeqFeature to GFF mishandles attributes with multiple values Hi Malcom, You're quite right, and I appreciate your work in tracking down and fixing it. Before you commit the patch, can you confirm that the loader is working correctly so that comma-separated values are read back into the data structure as multiple attributes? Lincoln On 2/23/07, Cook, Malcolm wrote: Lincoln, and other Bio::DB::SeqFeature wanderers: I find that generating GFF from a Bio::DB::SeqFeature using gff3_string does not respect the following: "Multiple attributes of the same type are indicated by separating the values with the comma "," character" (c.f. http://www.sequenceontology.org/gff3.shtml) This one-liner demonstrates the problem: perl -MBio::DB::SeqFeature -e 'print Bio::DB::SeqFeature->new(-seq_id => "J", -start => 1, -end => 2, -primary_tag => 'PH', -source => 'A', -name => 'mec', -attributes => {foo => [qw(bar blat)]})->gff3_string' J A PH 1 2 . . .
foo=bar;foo=blat;Name=mec Do you agree this is a problem? The fix is in the post-sig patch to /Bio/DB/SeqFeature/NormalizedFeature.pm, in which I also took the stylistic privilege of promoting any ID, Parent, or Name attribute to the front of column 9, so output is now: J A PH 1 2 . . . Name=mec;foo=bar,blat Do you agree this is better? I am poised to commit it, as well as the functionally identical patch to the equivalent function in Bio/Graphics/FeatureBase.pm All clear? -- Malcolm Cook *** NormalizedFeature.pm 2 Feb 2007 21:05:42 -0000 1.25 --- NormalizedFeature.pm 23 Feb 2007 15:37:01 -0000 *************** *** 481,494 **** next if $t eq 'load_id'; next if $t eq 'parent_id'; foreach (@values) { s/\s+$// } # get rid of trailing whitespace ! ! push @result,join '=',$self->escape($t),$self->escape($_) foreach @values; } my $id = $self->primary_id; my $name = $self->display_name; ! push @result,"ID=".$self->escape($id) if defined $id; ! push @result,"Parent=".$self->escape($parent->primary_id) if defined $parent; ! push @result,"Name=".$self->escape($name) if defined $name; return join ';',@result; } --- 481,498 ---- next if $t eq 'load_id'; next if $t eq 'parent_id'; foreach (@values) { s/\s+$// } # get rid of trailing whitespace ! ! push @result,join '=',$self->escape($t),$self->escape($_) foreach @values; ! # NO! Multiple attributes of the same type are indicated by ! # separating the values with the comma "," character - per ! # http://www.sequenceontology.org/gff3.shtml. Do it this way: ! #push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values); } my $id = $self->primary_id; my $name = $self->display_name; ! unshift @result,"ID=".$self->escape($id) if defined $id; ! unshift @result,"Parent=".$self->escape($parent->primary_id) if defined $parent; ! unshift @result,"Name=".$self->escape($name) if defined $name; return join ';',@result; } -- Lincoln D.
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Fri Feb 23 13:49:44 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 23 Feb 2007 12:49:44 -0600 Subject: [Bioperl-l] Bio::AnnotatableI vs Bio::FeatureHolderI In-Reply-To: References: Message-ID: To add to that, there's a HOWTO describing the differences: http://www.bioperl.org/wiki/HOWTO:Feature-Annotation I agree w/ Aaron; if it has a location it's a feature, otherwise it's an annotation. chris On Feb 23, 2007, at 8:36 AM, aaron.j.mackey at gsk.com wrote: > The fundamental difference (in my mind) between a feature and an > annotation, is that a feature has a location/range, and thus the > information represented in the feature is applicable only to that > location/range. An annotation, on the other hand, is "global", or at > least non-localizable (note: a feature with a "fuzzy" location of > "somewhere along this sequence, but I'm not sure where" is still not > global - if you did/could know the location, you'd describe it as a > feature, so it shouldn't be represented with an annotation). > > -Aaron > > bioperl-l-bounces at lists.open-bio.org wrote on 02/22/2007 03:36:33 PM: > >> Are Bio::AnnotatableI and Bio::FeatureHolderI competing interfaces? >> >> I get the impression they are designed to do similar things. If >> so is >> one deprecated and the other preferred? >> >> If their responsibilities are orthogonal to each other, what sorts of >> tasks are suited to each? 
>> >> Thanks, >> Michael >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From lstein at cshl.edu Fri Feb 23 16:20:26 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 23 Feb 2007 16:20:26 -0500 Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with multiple values In-Reply-To: References: <6dce9a0b0702230916o684e490cu9d01d3be3e2ae061@mail.gmail.com> Message-ID: <6dce9a0b0702231320j1f24d4b4oe33bce6d2da96db7@mail.gmail.com> Excellent! Lincoln On 2/23/07, Cook, Malcolm wrote: > > Lincoln, > > OK. I'll do that... > > ...let's see, a quick squiz at Bio/DB/SeqFeature/Store/ .... > > ...ok - parse_attributes _looks_ right to me > > ...so, let's try it > > #load a feature into a new database: > > bp_seqfeature_load.PLS -dsn 'dbi:mysql:database=test;host=mysql-dev' > -create -user test -pass test <(echo -e "J\tA\tPH\t1\t2\t.\t.\t.\tfoo=bar, > blat;Name=mec\n") > > #It loaded ok. Now, let's print it out in GFF3: > > perl -MBio::DB::SeqFeature::Store -e 'foreach > (Bio::DB::SeqFeature::Store->new(-dsn => > "dbi:mysql:database=test;host=mysql-dev;user=test;password=test")->features(-type > => "PH:A")) {print $_->gff3_string . "\n"}' > J A PH 1 2 . . . Name=mec;ID=1;foo=bar,blat > > #output looks good to me > > Note, I tried loading attributes foo=bar;foo=blat and it came back > foo=bar,blat. So, you can load either way. > > I'll commit later today. 
> > --Malcolm > > > ------------------------------ > *From:* lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] *On > Behalf Of *Lincoln Stein > *Sent:* Friday, February 23, 2007 11:16 AM > *To:* Cook, Malcolm > *Cc:* bioperl list; lstein at cshl.org > *Subject:* Re: Bio::DB::SeqFeature to GFF mishandles attributes with > multiple values > > Hi Malcom, > > You're quite right, and I appreciate your work in tracking down and fixing > it. Before you commit the patch, can you confirm that the loader is working > correctly so that comma-separated values are read back into the data > structure as multiple attributes? > > Lincoln > > On 2/23/07, Cook, Malcolm wrote: > > > > Lincoln, and other Bio::DB::SeqFeature wanderers: > > > > I find that generating GFF from a Bio::DB::SeqFeature using gff3_string > > does not respect the following: > > > > "Multiple attributes of the same type are indicated by separating the > > values with the comma "," character" (c.f. > > http://www.sequenceontology.org/gff3.shtml) > > > > This one-liner demonstrates the problem: > > > > perl -MBio::DB::SeqFeature -e 'print Bio::DB::SeqFeature->new(-seq_id => > > "J", -start => 1, -end => 2, -primary_tag => 'PH', -source => 'A', > > -name => 'mec', -attributes => {foo => [qw(bar blat)]})->gff3_string' > > J A PH 1 2 . . . > > foo=bar;foo=blat;Name=mec > > > > Do you agree this is a problem? > > > > The fix is in the post-sig patch to > > /Bio/DB/SeqFeature/NormalizedFeature.pm, in which I also took the > > stylistic privilege of promoting any ID, Parent, or Name attribute to > > the front of column 9, so output is now: > > > > J A PH 1 2 . . . > > Name=mec;foo=bar,blat > > > > Do you agree this is better? > > > > I am poised to commit it, as well as the functionally same patch to the > > equivilent function in Bio/Graphics/FeatureBase.pm > > > > All clear? 
> > > > -- Malcolm Cook > > > > > > > > *** NormalizedFeature.pm 2 Feb 2007 21:05:42 -0000 1.25 > > --- NormalizedFeature.pm 23 Feb 2007 15:37:01 -0000 > > *************** > > *** 481,494 **** > > next if $t eq 'load_id'; > > next if $t eq 'parent_id'; > > foreach (@values) { s/\s+$// } # get rid of trailing whitespace > > ! > > ! push @result,join '=',$self->escape($t),$self->escape($_) foreach > > @values; > > } > > my $id = $self->primary_id; > > my $name = $self->display_name; > > ! push @result,"ID=".$self->escape($id) if defined > > > > $id; > > ! push @result,"Parent=".$self->escape($parent->primary_id) if defined > > $parent; > > ! push @result,"Name=".$self->escape($name) if > > defined $name; > > return join ';',@result; > > } > > > > --- 481,498 ---- > > next if $t eq 'load_id'; > > next if $t eq 'parent_id'; > > foreach (@values) { s/\s+$// } # get rid of trailing whitespace > > ! > > ! push @result,join '=',$self->escape($t),$self->escape($_) foreach > > > > @values; > > ! # NO! Multiple attributes of the same type are indicated by > > ! # separating the values with the comma "," character - per > > ! # http://www.sequenceontology.org/gff3.shtml. Do it this way: > > ! #push @result,join '=',$self->escape($t),join(',', map > > {$self->escape($_)} @values); > > } > > my $id = $self->primary_id; > > my $name = $self->display_name; > > ! unshift @result,"ID=".$self->escape($id) if > > defined $id; > > ! unshift @result,"Parent=".$self->escape($parent->primary_id) if > > defined $parent; > > ! unshift @result,"Name=".$self->escape($name) if > > defined $name; > > return join ';',@result; > > } > > > > > > > > > > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > > -- Lincoln D.
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From enrique_rulz at yahoo.com Sat Feb 24 16:23:59 2007 From: enrique_rulz at yahoo.com (Kurt Gobain) Date: Sat, 24 Feb 2007 13:23:59 -0800 (PST) Subject: [Bioperl-l] Sequence matching problem! In-Reply-To: <200702231025.39416.heikki@sanbi.ac.za> References: <9107936.post@talk.nabble.com> <200702231025.39416.heikki@sanbi.ac.za> Message-ID: <9137941.post@talk.nabble.com> Heikki Lehvaslaiho wrote: > > Kurt, > > There are a few things in your code to note: > > - regexp /C*T/ matches any T preceded by zero or more Cs, > not what you meant > - $- and $+ are among the "expensive" perl functions best > avoided unless you have to. Using them once in your > code slows execution down considerably. There is always > another way. > - Keep in mind what you want to use the match positions for: > human-readable locations usually start counting with 1 but > perl code uses 0 as the first location. The code below assumes > you want to print the locations out. > > Study my example code below. > > Yours, > -Heikki > > ################################################################### > #!/usr/bin/perl > $seq = "GATCAAT"; > #$pattern= 'C*T'; > $pattern= 'C.*T'; > > while ($seq =~ m/($pattern)/gi) { > > $match = $1; > $end = pos($seq); > $start = $end - length($match) +1; > > print "$match : $start - $end\n"; > } > > ################################################################### > > Thanx for the instant reply!...Sorry cudn reply earlier.. Code works perfectly fine...but...sum time its not givin reqd o/p..For eg. If I type sequence as "GATCAAGTCAGGAT" & pattern to be matched as T.*A..then o/p which I am getting frm above prog is TCAAGTCAGGA instead of TCA...
& 1 more thing Is there n e chance by which I can replace T*A to T.*A cos the code which I need to write says T*A shod be only the input not T.*A..So Can we use replacment reg ex...sumthing like $pattern =~ s/.*/*/...or sumthing else... But its kinda givin sum error again...Dam! Regex is really hairy!!...:P N e ways thanx a lot again for the code...Hope to listen frm you soon! Kurt! -- View this message in context: http://www.nabble.com/Sequence-matching-problem%21-tf3275153.html#a9137941 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From biology0046 at hotmail.com Sat Feb 24 23:14:51 2007 From: biology0046 at hotmail.com (=?gb2312?B?va0gzsTi/Q==?=) Date: Sun, 25 Feb 2007 04:14:51 +0000 Subject: [Bioperl-l] how to change align output format Message-ID: Dear all: I have a problem changing the output format of a clustal alignment. I use the Bio::Tools::Run::Alignment::Clustalw module to carry out a multiple sequence alignment, then I use the Bio::AlignIO module to write out the alignment. The script looks like this:

my $aln_out=Bio::AlignIO->new(-file=>">./clustal/${outfilename}.aln",-format=>'clustalw');

The output:

dana_GLEANR_16071 MSKMKMLPVQLSLNQLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dere_GLEANR_9270  ..............S.............................................
FBgn0000097       ..............S.............................................
dsec_GLEANR_671   ..............S.............................................
dsim_GLEANR_6613  ..............S.............................................
dyak_GLEANR_1669  ..............S.............................................
                                .

dana_GLEANR_16071 VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dere_GLEANR_9270  ............................................................
FBgn0000097       ............................................................
dsec_GLEANR_671   ............................................................
dsim_GLEANR_6613  ............................................................
dyak_GLEANR_1669  ............................................................

But I want the output format below, which does not replace identical residues with the "." character:

dere_GLEANR_9270  MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dyak_GLEANR_1669  MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dsec_GLEANR_671   MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dsim_GLEANR_6613  MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
FBgn0000097       MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dana_GLEANR_16071 MSKMKMLPVQLSLNQLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
                  **************.*********************************************

dere_GLEANR_9270  VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dyak_GLEANR_1669  VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dsec_GLEANR_671   VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dsim_GLEANR_6613  VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
FBgn0000097       VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dana_GLEANR_16071 VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
                  ************************************************************

Are there any parameters in the package that can be changed so that I can get the second output format? Thank you! Sincerely, Jiang From bix at sendu.me.uk Sun Feb 25 05:53:48 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 25 Feb 2007 10:53:48 +0000 Subject: [Bioperl-l] [Fwd: Bio::Graph::SimpleGraph] Message-ID: <45E16ABC.3060405@sendu.me.uk> Tels, I've forwarded this to the author of the module, Nat Goodman, and to the Bioperl mailing list (http://www.bioperl.org/wiki/Mailing_lists#Main_BioPerl_list).
But actually we have Bio::Graph::* as tentatively deprecated: http://www.bioperl.org/wiki/Deprecated_modules#Bio::Graph_modules so any further work on it doesn't seem worthwhile. -------- Original Message -------- Subject: Bio::Graph::SimpleGraph Date: Sat, 24 Feb 2007 12:07:31 +0100 From: Tels Moin, I just stumbled over Bio::Graph::SimpleGraph and read this comment: "This is a simple, hopefully fast undirected graph package. The only reason this exists is that the standard CPAN Graph pacakge, Graph::Base, is seriously broken." Really sad to see people always reinventing the wheel :/ Anyway, I wonder if you would like to make your module support Graph::Easy (http://search.cpan.org/~tels/Graph-Easy/)? I would be willing to submit patches and do testing/documentation for that. All the best, Tels From bix at sendu.me.uk Sun Feb 25 05:45:21 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 25 Feb 2007 10:45:21 +0000 Subject: [Bioperl-l] Sequence matching problem! In-Reply-To: <9137941.post@talk.nabble.com> References: <9107936.post@talk.nabble.com> <200702231025.39416.heikki@sanbi.ac.za> <9137941.post@talk.nabble.com> Message-ID: <45E168C1.80306@sendu.me.uk> Kurt Gobain wrote: > Code works perfectly fine...but...sum time its not givin reqd o/p..For eg. > If I type sequence as "GATCAAGTCAGGAT" & pattern to be matched as T.*A..then > o/p which I am getting frm above prog is TCAAGTCAGGA instead of TCA... > & 1 more thing Is there n e chance by which I can replace T*A to T.*A cos > the code which I need to write says T*A shod be only the input not T.*A..So > Can we use replacment reg ex...sumthing like > $pattern =~ s/.*/*/...or sumthing else... > But its kinda givin sum error again...Dam! Regex is really hairy!!...:P These aren't Bioperl questions.
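A minimal Perl sketch of both fixes for the quoted questions, assuming that a "*" in the user's input should stand for "any run of residues, as short as possible" and that only non-overlapping matches are wanted. The helper names wildcard_to_regex and find_matches are hypothetical. Unlike a single s/\*/.*?/, the sketch rewrites every "*" and passes the remaining characters through quotemeta so they are matched literally; coordinates are reported 1-based, as in Heikki's example.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a user-supplied pattern such as "T*A" into a
# regex in which each "*" means "any run of residues, as short as possible".
# Every other character is escaped with quotemeta so it is matched literally.
sub wildcard_to_regex {
    my ($pattern) = @_;
    my $re = join '.*?', map { quotemeta($_) } split /\*/, $pattern, -1;
    return qr/$re/i;    # case-insensitive, like the m//gi in Heikki's example
}

# Hypothetical helper: return [match, start, end] for every non-overlapping
# match, with 1-based inclusive coordinates.
sub find_matches {
    my ($seq, $re) = @_;
    my @hits;
    while ($seq =~ /($re)/g) {
        my $end   = pos($seq);                # offset just past the match = 1-based end
        my $start = $end - length($1) + 1;    # 1-based start
        push @hits, [ $1, $start, $end ];
    }
    return @hits;
}

my $seq = 'GATCAAGTCAGGAT';
my $re  = wildcard_to_regex('T*A');           # becomes the non-greedy T.*?A
printf "%s : %d - %d\n", @$_ for find_matches($seq, $re);
```

Run on the sequence from Kurt's message, this should print "TCA : 3 - 5" and "TCA : 8 - 10" rather than the single greedy match TCAAGTCAGGA.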
For regular expression help see: http://perldoc.perl.org/perlretut.html Basically, you want a non-greedy match, so T.*?A instead of T.*A. You can convert T*A by doing s/\*/.*?/ Here are some more regexes for you: s/sum/some/g s/frm/from/g s/n e/any/g etc... From biology0046 at hotmail.com Sun Feb 25 08:28:34 2007 From: biology0046 at hotmail.com (=?gb2312?B?va0gzsTi/Q==?=) Date: Sun, 25 Feb 2007 13:28:34 +0000 Subject: [Bioperl-l] AlignIO problems Message-ID: Hi all, I use the AlignIO module to convert an alignment file. My original file is:

CLUSTAL W(1.81) multiple sequence alignment

dana_GLEANR_11249 MEAIAKHDFSATADDELSFRKTQTLKILNMEDDSNWYRAELDGKEGLIPSNYIEMKNHDW
dere_GLEANR_7213  ...V...................I....................................
dgri_GLEANR_6962  .......................I....................................
FBgn0004638       .......................I....................................
dmoj_GLEANR_6118  ...........N...........I....................................
dper_GLEANR_18885 ...V...................I....................................
dpse_GLEANR_14384 ...V...................I....................................
dsec_GLEANR_3096  .................N.....I....................................
dsim_GLEANR_9744  -----------------------------...............................
dvir_GLEANR_4811  .......................I....................................
dwil_GLEANR_10869 .......................I....................................
dyak_GLEANR_13576 .......................I....................................

dana_GLEANR_11249 YYGRITRADAEKLLSNKHEGAFLIRISESSPGDFSLSVKCPDGVQHFKVLRDAQSKFFLW
dere_GLEANR_7213  ............................................................
dgri_GLEANR_6962  ............................................................
FBgn0004638       ............................................................
dmoj_GLEANR_6118  .................L..........................................
dper_GLEANR_18885 ............................................................
dpse_GLEANR_14384 ............................................................
dsec_GLEANR_3096  ............................................................
dsim_GLEANR_9744  ............................................................
dvir_GLEANR_4811  ............................................................
dwil_GLEANR_10869 ............................................................
dyak_GLEANR_13576 ............................................................

dana_GLEANR_11249 VVKFNSLNELVEYHRTASVSRSQDVKLRDMIPEEMLVQALYDFVPQESGELDFRRGDVIT
dere_GLEANR_7213  ............................................................
dgri_GLEANR_6962  ............................................................
FBgn0004638       ............................................................
dmoj_GLEANR_6118  ..............................V.D...........................
dper_GLEANR_18885 .......................E....................................
dpse_GLEANR_14384 .......................E....................................
dsec_GLEANR_3096  ............................................................
dsim_GLEANR_9744  ............................................................
dvir_GLEANR_4811  ............................................................
dwil_GLEANR_10869 ............................................................
dyak_GLEANR_13576 ............................................................

dana_GLEANR_11249 VTDRSDENWWNGEIGNRKGIFPATYVTPYHS
dere_GLEANR_7213  ...............................
dgri_GLEANR_6962  ...............................
FBgn0004638       ...............................
dmoj_GLEANR_6118  ............Q..................
dper_GLEANR_18885 ...............................
dpse_GLEANR_14384 ...............................
dsec_GLEANR_3096  ...............................
dsim_GLEANR_9744  ...............................
dvir_GLEANR_4811  ...............................
dwil_GLEANR_10869 ...............................
dyak_GLEANR_13576 ...............................
I want to change those "." characters back to the actual residue letters, so I wrote this code:

use Bio::AlignIO;
my $in=Bio::AlignIO->new(-file =>"FBgn0000097.aln",
                         -format => 'clustalw');
my $out=Bio::AlignIO->new(-file=>">../clustalw/0097.aln",
                          -format =>'clustalw');
while (my $aln=$in->next_aln() ){
    $aln->unmatch();
    $aln->set_displayname_flat();
    $out->write_aln($aln);
}

But when I run the code, I get error messages like:

-------------------- WARNING ---------------------
MSG: Got a sequence with no letters in it cannot guess alphabet []
---------------------------------------------------

------------- EXCEPTION -------------
MSG: No sequence with name [dsim_GLEANR_9744/1-182]
STACK Bio::SimpleAlign::displayname /home/src/bioperl/bioperl-live/Bio/SimpleAlign.pm:2307
STACK Bio::SimpleAlign::set_displayname_flat /home/src/bioperl/bioperl-live/Bio/SimpleAlign.pm:2374
STACK toplevel aligntest.pl:11
--------------------------------------

I don't know where the problem is. Jiang From cjfields at uiuc.edu Sun Feb 25 14:58:23 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 25 Feb 2007 13:58:23 -0600 Subject: [Bioperl-l] AlignIO problems In-Reply-To: References: Message-ID: <19EA5F46-D1A4-45B5-B2DB-55194F79215C@uiuc.edu> Bio::AlignIO::clustalw doesn't work with masked sequences; it parses the output quite literally as is, so any [.-] are treated as gaps. If the seqs are 100% identical then you will have a seq with 100% gaps and no sequence, thus giving you the warnings you see. The best way to accomplish what you want is to not mask the sequence alignment to begin with when running clustalw/muscle/whatever. Exactly how are you generating these? When I use clustalw no identity masking occurs by default. chris On Feb 25, 2007, at 7:28 AM, ? ?? wrote: > hi, all, > I use the AlignIO module to convert the alignment file.
> my original file is : > CLUSTAL W(1.81) multiple sequence alignment > > > dana_GLEANR_11249 > MEAIAKHDFSATADDELSFRKTQTLKILNMEDDSNWYRAELDGKEGLIPSNYIEMKNHDW > dere_GLEANR_7213 ...V...................I....................... > ............. > dgri_GLEANR_6962 .......................I....................... > ............. > FBgn0004638 .......................I....................... > ............. > dmoj_GLEANR_6118 ...........N...........I....................... > ............. > dper_GLEANR_18885 ...V...................I....................... > ............. > dpse_GLEANR_14384 ...V...................I....................... > ............. > dsec_GLEANR_3096 .................N.....I....................... > ............. > dsim_GLEANR_9744 > -----------------------------............................... > dvir_GLEANR_4811 .......................I....................... > ............. > dwil_GLEANR_10869 .......................I....................... > ............. > dyak_GLEANR_13576 .......................I....................... > ............. > > > > dana_GLEANR_11249 > YYGRITRADAEKLLSNKHEGAFLIRISESSPGDFSLSVKCPDGVQHFKVLRDAQSKFFLW > dere_GLEANR_7213 ............................................... > ............. > dgri_GLEANR_6962 ............................................... > ............. > FBgn0004638 ............................................... > ............. > dmoj_GLEANR_6118 .................L............................. > ............. > dper_GLEANR_18885 ............................................... > ............. > dpse_GLEANR_14384 ............................................... > ............. > dsec_GLEANR_3096 ............................................... > ............. > dsim_GLEANR_9744 ............................................... > ............. > dvir_GLEANR_4811 ............................................... > ............. > dwil_GLEANR_10869 ............................................... > ............. 
> dyak_GLEANR_13576 ............................................... > ............. > > > > dana_GLEANR_11249 > VVKFNSLNELVEYHRTASVSRSQDVKLRDMIPEEMLVQALYDFVPQESGELDFRRGDVIT > dere_GLEANR_7213 ............................................... > ............. > dgri_GLEANR_6962 ............................................... > ............. > FBgn0004638 ............................................... > ............. > dmoj_GLEANR_6118 ..............................V.D.............. > ............. > dper_GLEANR_18885 .......................E....................... > ............. > dpse_GLEANR_14384 .......................E....................... > ............. > dsec_GLEANR_3096 ............................................... > ............. > dsim_GLEANR_9744 ............................................... > ............. > dvir_GLEANR_4811 ............................................... > ............. > dwil_GLEANR_10869 ............................................... > ............. > dyak_GLEANR_13576 ............................................... > ............. > > > > dana_GLEANR_11249 VTDRSDENWWNGEIGNRKGIFPATYVTPYHS > dere_GLEANR_7213 ............................... > dgri_GLEANR_6962 ............................... > FBgn0004638 ............................... > dmoj_GLEANR_6118 ............Q.................. > dper_GLEANR_18885 ............................... > dpse_GLEANR_14384 ............................... > dsec_GLEANR_3096 ............................... > dsim_GLEANR_9744 ............................... > dvir_GLEANR_4811 ............................... > dwil_GLEANR_10869 ............................... > dyak_GLEANR_13576 ............................... > > > I want to change those "." 
characters back to alphabetic > expression, then i write the code like this: > use Bio::AlignIO; > my $in=Bio::AlignIO->new(-file =>"FBgn0000097.aln", > -format => 'clustalw'); > my $out=Bio::AlignIO->new(-file=>">../clustalw/0097.aln", > -format =>'clustalw'); > while (my $aln=$in->next_aln() ){ > $aln->unmatch(); > $aln->set_displayname_flat(); > $out->write_aln($aln); > } > > but when i run the code, there are error message like: > > -------------------- WARNING --------------------- > MSG: Got a sequence with no letters in it cannot guess alphabet [] > --------------------------------------------------- > > ------------- EXCEPTION ------------- > MSG: No sequence with name [dsim_GLEANR_9744/1-182] > STACK Bio::SimpleAlign::displayname /home/src/bioperl/bioperl-live/ > Bio/SimpleAlign.pm:2307 > STACK Bio::SimpleAlign::set_displayname_flat /home/src/bioperl/ > bioperl-live/Bio/SimpleAlign.pm:2374 > STACK toplevel aligntest.pl:11 > > -------------------------------------- > > I don't know where is the problem. > > Jiang > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cristiangary at gmail.com Sun Feb 25 16:04:57 2007 From: cristiangary at gmail.com (Cristian Gary) Date: Sun, 25 Feb 2007 18:04:57 -0300 Subject: [Bioperl-l] problem with blast report to ncbi webpage Message-ID: <95ef8cd0702251304o45bea6a0tcedc59156cb0cfe4@mail.gmail.com> I have a problem getting BLAST reports from the NCBI server: after waiting on the RIDs, I don't get any results. Is the problem the NCBI server or the BioPerl version? P.S. The same code worked very well three weeks ago.
-- "Knowledge belongs to humanity" "Gnu/linux -------- free your mind...... www.kubuntu.org From granjeau at tagc.univ-mrs.fr Mon Feb 26 04:17:15 2007 From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAU - IR/IFR137) Date: Mon, 26 Feb 2007 10:17:15 +0100 Subject: [Bioperl-l] Reading a XML sequence (UniParc) into a BioSeq object Message-ID: <45E2A59B.6080300@tagc.univ-mrs.fr> Hello! I would like to fill a BioSeq object with the output of a dbfetch request at EI on the UniParc database (which returns only XML; I am interested in the references). If somebody could tell me which BioPerl object to use, or a way to convert it to Swiss format, or has a piece of code that does it (is http://doc.bioperl.org/bioperl-live/Bio/SeqIO/interpro.html a good starting point?), I would appreciate it a lot. Best regards, --Samuel MSTRSVSSSSYRRMFGGPGTASRPSSSRSYVTTSTRTYSLGSALRPSTSRSLYASSPGGV YATRSSAVRLRSSVPGVRLLQDSVDFSLADAINTEFKNTRTNEKVELQELNDRFANYIDK VRFLEQQNKILLAELEQLKGQGKSRLGDLYEEEMRELRRQVDQLTNDKARVEVERDNLAE DIMRLREKLQEEMLQREEAENTLQSFRQDVDNASLARLDLERKVESLQEEIAFLKKLHEE EIQELQAQIQEQHVQIDVDVSKPDLTAALRDVRQQYESVAAKNLQEAEEWYKSKFADLSE AANRNNDALRQAKQESTEYRRQVQSLTCEVDALKGTNESLERQMREMEENFAVEAANYQD TIGRLQDEIQNMKEEMARHLREYQDLLNVKMALDIEIATYRKLLEGEESRISLPLPNFSS LNLRGKHFISL From bix at sendu.me.uk Mon Feb 26 06:46:39 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 26 Feb 2007 11:46:39 +0000 Subject: [Bioperl-l] [Fwd: Bio::Graph::SimpleGraph] In-Reply-To: <45E16ABC.3060405@sendu.me.uk> References: <45E16ABC.3060405@sendu.me.uk> Message-ID: <45E2C89F.1020402@sendu.me.uk> Nat replied, but I messed up the To:s so his reply didn't make it to the list. Here's what he said: Nathan (Nat) Goodman wrote: Hi Tels I agree it's sad to reinvent the wheel, but I don't think that's what happened here. Your module seems to be focused on rendering graphs while my module is concerned with computations on graphs.
In any case, as Sendu notes, SimpleGraph is in the process of being deprecated. I fully support this move. It was intended to be a stopgap until the main Perl Graph module was fixed. Since that has now happened, it's time for SimpleGraph to retire. For the benefit of anyone using Graph: last I checked (six months or more ago), it had serious performance problems on large graphs (probably not too much of a surprise), and also was inexplicably slow on graphs with edge attributes. I see that the latter bug is marked "resolved" in CPAN, but there's no indication of when or how. We've moved to Boost for graphs as large as the human protein interaction network. Best, Nat From sanjib at bic.boseinst.ernet.in Mon Feb 26 00:23:36 2007 From: sanjib at bic.boseinst.ernet.in (Sanjib Kumar Gupta) Date: Mon, 26 Feb 2007 10:53:36 +0530 Subject: [Bioperl-l] Remote blast In-Reply-To: <20070221064743.M54123@bic.boseinst.ernet.in> References: <20070221064743.M54123@bic.boseinst.ernet.in> Message-ID: <20070226052336.M74918@bic.boseinst.ernet.in> Hi, I have been running this script for some time and it was running fine. I am using this Linux machine with a live IP (no proxy). But suddenly it has stopped working with these errors: waiting...waiting... -------------------- WARNING --------------------- MSG: An Error Occurred

An Error Occurred

500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad hostname 'www.ncbi.nlm.nih.gov') --------------------------------------------------- xx.pep -------------------- WARNING --------------------- MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5 Content-Length: 497 Content-Type: application/x-www-form-urlencoded DATABASE=nr&QUERY=%3ENM_005277.3%3B_1414961_1+837% 0AMEENMEEGQTQKGCFECCIKCLGGIPYASLIATILLYAGVALFCGCGHEALSGTVNILQTYFEMARTAGDTLDVF TMIDIFKYVIYGIAAAFFVYGILLMVEGFFTTGAIKDLYGDFKITTCGRCVSAWFIMLTYLFMLAWLGVTAFTSLPV YMYFNLWTICRNTTLVEGANLCLDLRQFGIVTIGEEKKICTVSENFLRMCESTELNMTFHLFIVALAGAGAAVIAMV HYLMVLSANWAYVKDACRMQKYEDIKSKEEQELHDIHSTRSKERLNAYT*&COMPOSITION_BASED_STATISTI CS=off&EXPECT=1e- 10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&MATRIX_NAME=BLOSUM62&ENTREZ_ QUERY=Xenopus+laevis[Organism]&PROGRAM=blastp An Error Occurred

An Error Occurred

500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad hostname 'www.ncbi.nlm.nih.gov') --------------------------------------------------- waiting...waiting... -------------------- WARNING --------------------- MSG: An Error Occurred

An Error Occurred

500 Internal Server Error --------------------------------------------------- Although I am able to see the NCBI page from a browser, I am unable to ping or traceroute to the server. Please help me. -- Sanjib Kumar Gupta Bioinformatics Centre Bose Institute Kolkata 700054, INDIA Phone : +91-33-2355 6626, 2816, 2355 4766 Fax : +91-33-2355 3886 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed...
Name: n9.pl URL: From cjfields at uiuc.edu Mon Feb 26 09:59:21 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 26 Feb 2007 08:59:21 -0600 Subject: [Bioperl-l] Remote blast In-Reply-To: <20070226052336.M74918@bic.boseinst.ernet.in> References: <20070221064743.M54123@bic.boseinst.ernet.in> <20070226052336.M74918@bic.boseinst.ernet.in> Message-ID: I tested this out and got BLAST to work for my test case (a single FASTA seq, since you didn't send any seqs for testing). It keeps querying for the RID in what appears to be an infinite loop (i.e. it doesn't get rid of the RID properly); you can see this if you add '-verbose => 1' to your parameters. I don't have time to delve into it, but from a quick glance it may be due to your looping structure and how you are saving your RIDs. As for your particular error, could it be something as simple as the server being overloaded or down? It does happen from time to time... Beyond that I can't make heads or tails of your script. Was it cobbled together from a bunch of others? If so, you can probably expect some bugs to occur. chris On Feb 25, 2007, at 11:23 PM, Sanjib Kumar Gupta wrote: > Hi > I have been running this script for some time and it was running > fine. I am > using this linux machine with live IP(no proxy). But suudenly it > has stopped > working with this errors > > waiting...waiting... > -------------------- WARNING --------------------- > MSG: > An Error Occurred > >

An Error Occurred

> 500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad > hostname 'www.ncbi.nlm.nih.gov') > > > > --------------------------------------------------- > xx.pep > > -------------------- WARNING --------------------- > MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi > User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5 > Content-Length: 497 > Content-Type: application/x-www-form-urlencoded > > DATABASE=nr&QUERY=%3ENM_005277.3%3B_1414961_1+837% > 0AMEENMEEGQTQKGCFECCIKCLGGIPYASLIATILLYAGVALFCGCGHEALSGTVNILQTYFEMARTA > GDTLDVF > TMIDIFKYVIYGIAAAFFVYGILLMVEGFFTTGAIKDLYGDFKITTCGRCVSAWFIMLTYLFMLAWLGVT > AFTSLPV > YMYFNLWTICRNTTLVEGANLCLDLRQFGIVTIGEEKKICTVSENFLRMCESTELNMTFHLFIVALAGAG > AAVIAMV > HYLMVLSANWAYVKDACRMQKYEDIKSKEEQELHDIHSTRSKERLNAYT*&COMPOSITION_BASED_S > TATISTI > CS=off&EXPECT=1e- > 10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&MATRIX_NAME=BLOSUM62& > ENTREZ_ > QUERY=Xenopus+laevis[Organism]&PROGRAM=blastp > > > An Error Occurred > >

An Error Occurred

> 500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad > hostname 'www.ncbi.nlm.nih.gov') > > > > --------------------------------------------------- > waiting...waiting... > -------------------- WARNING --------------------- > MSG: > An Error Occurred > >

An Error Occurred

> 500 Internal Server Error > > > > --------------------------------------------------- > > Though I am able to see the ncbi page from browser but am unable to > ping ot > trace route to the server. > > Please help me. From cjfields at uiuc.edu Mon Feb 26 10:05:50 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 26 Feb 2007 09:05:50 -0600 Subject: [Bioperl-l] AlignIO problems In-Reply-To: References: Message-ID: <082E0708-6B1C-45CE-B387-429B8B6A8D7A@uiuc.edu> Make sure to keep this on the list, others may have some input. You should be able to test the various sequence objects you're retrieving from Bio::DB::Fasta via Bio::SeqIO to see if they are what you're expecting, then track down the problematic sequences. My guess is the odd seqs are due to the way you are using Bio::DB::Fasta for each of the files. I'm wondering if you are having problems with indices overwriting one another and are thus getting back blank seq objects. You should probably consider just indexing all of your files together; according to the POD you can use a single Bio::DB::Fasta to index all of the files in one go (indicate the path and use '-glob') and retrieve what you need that way. Either that or separating them into separate directories so the indices are also separate. chris On Feb 25, 2007, at 9:50 PM, ? ?? wrote: > Thank you for your help! 
> May be you are right, I use the following code to create my seq > object arrays: > my $outfilename=$dmel; > my $ana_pep_db=Bio::DB::Fasta->new("dana.translation.fasta"); > my $ana_cdna_db=Bio::DB::Fasta->new("dana.cds.fasta"); > my $ere_pep_db=Bio::DB::Fasta->new("dere.translation.fasta"); > my $ere_cdna_db=Bio::DB::Fasta->new("dere.cds.fasta"); > my $mel_pep_db=Bio::DB::Fasta->new("dmel.translation.fasta"); > my $mel_cdna_db=Bio::DB::Fasta->new("dmel.cds.fasta"); > my $sec_pep_db=Bio::DB::Fasta->new("dsec.translation.fasta"); > my $sec_cdna_db=Bio::DB::Fasta->new("dsec.cds.fasta"); > my $sim_pep_db=Bio::DB::Fasta->new("dsim.translation.fasta"); > my $sim_cdna_db=Bio::DB::Fasta->new("dsim.cds.fasta"); > my $yak_pep_db=Bio::DB::Fasta->new("dyak.translation.fasta"); > my $yak_cdna_db=Bio::DB::Fasta->new("dyak.cds.fasta"); > my $ana_pep_obj=$ana_pep_db->get_Seq_by_id($dana); > my $ana_nuc_obj=$ana_cdna_db->get_Seq_by_id($dana); > my $ere_pep_obj=$ere_pep_db->get_Seq_by_id($dere); > my $ere_nuc_obj=$ere_cdna_db->get_Seq_by_id($dere); > my $mel_pep_obj=$mel_pep_db->get_Seq_by_id($dmel); > my $mel_nuc_obj=$mel_cdna_db->get_Seq_by_id($dmel); > my $sec_pep_obj=$sec_pep_db->get_Seq_by_id($dsec); > my $sec_nuc_obj=$sec_cdna_db->get_Seq_by_id($dsec); > my $sim_pep_obj=$sim_pep_db->get_Seq_by_id($dsim); > my $sim_nuc_obj=$sim_cdna_db->get_Seq_by_id($dsim); > my $yak_pep_obj=$yak_pep_db->get_Seq_by_id($ddyak); > my $yak_nuc_obj=$yak_cdna_db->get_Seq_by_id($ddyak); > push @prots, $ana_pep_obj; > push @cdna, $ana_nuc_obj; > push @prots, $ere_pep_obj; > push @cdna, $ere_nuc_obj; > push @prots, $mel_pep_obj; > push @cdna, $mel_nuc_obj; > push @prots, $sec_pep_obj; > push @cdna, $sec_nuc_obj; > push @prots, $sim_pep_obj; > push @cdna, $sim_nuc_obj; > push @prots, $yak_pep_obj; > push @cdna, $yak_nuc_obj; > > then I use the @prots as input for my $aln=$aln_factory->align > (\@prots); > This method will create align files with sequences masked. 
> > But if I use fasta files(not an object) which contain protein > sequences as input, $inputfile='FBgn0000097.pep'; > @params=('outorder'=>'INPUT'); > $factory=Bio::Tools::Run::Alignment::Clustalw->new(@params); > $aln=$factory->align($inputfile); > #$aln->gap_char('-'); > $aln->map_chars('\.','-'); > $aln_out=Bio::AlignIO->new(-file=>">0097.aln",-format=>'clustalw'); > $aln_out->write_aln($aln); > > This methods create files without masking~~~ > I think sequence objects created by "get_Seq_by_id" from sequence > databases directly are not appropriate. > > Thank you for your suggestion again! > > Jiang. > >> From: Chris Fields >> To: ????? >> Subject: Re: [Bioperl-l] AlignIO problems >> Date: Sun, 25 Feb 2007 21:26:34 -0600 >> >> I ran the same using a local fasta formatted file on my system >> which works (no masking). >> >> Of note, the gaps were all marked as '.'. Your gaps were both >> '.' and '-', which may mean that something is wrong with the seq >> objects themselves. Maybe SeqIO is misreading them? >> >> chris >> >> On Feb 25, 2007, at 7:34 PM, ????? wrote: >> >>> I use the Bio::Tools::Run::Alignment::Clustalw module to carry >>> out multiple alignment. >>> my code is: >>> my @clustal_param=('outorder'=>'INPUT'); >>> my $aln_factory=Bio::Tools::Run::Alignment::Clustalw->new >>> (@clustal_param); >>> my $aln=$aln_factory->align(\@prots);###@prots is >>> array of protein sequence objects >>> my $aln_out=Bio::AlignIO->new(-file=>">./dmel_group/ >>> clustal/ ${outfilename}.aln",-format=>'clustalw'); >>> >>> $aln_out->write_aln($aln); >>> This code produce alignment which mask identity residues. >>> But if i use clustalW directly, the output is normal. >>> Thank you for your help~ >>> >>> Jiang >> > > Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From michael.watson at bbsrc.ac.uk Mon Feb 26 11:00:31 2007 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Mon, 26 Feb 2007 16:00:31 -0000 Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna In-Reply-To: <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com> References: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk> <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com> Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2EBD3@iahce2ksrv1.iah.bbsrc.ac.uk> Hi Lincoln/List That's great, the axis now appears, but there are no labels. This in itself isn't a problem, as long as we can assume that the tick marks are at 0, 50% and 100%? If that's true, we can go with what we have, otherwise I'm going to have to figure out a way to label the y-axis Thanks Mick ________________________________ From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein Sent: 15 February 2007 18:53 To: michael watson (IAH-C) Cc: BioPerl-List Subject: Re: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna Hi Michael, When you set up the panel, do this: Bio::Graphics::Panel->new(-blah -blah, -pad_left => 20, -pad_right => 20); This will leave enough room on the left and right for you to see the Y axis. Otherwise it runs off the edge of the image (ok, this is a mis-design, but it was the only way to solve a chicken-and-egg problem about who gets to say how wide the panel is) Lincoln On 2/15/07, michael watson (IAH-C) wrote: Hi OK I have some great images out of this glyph, but I can't see the axis, and nor is it labelled (ie does it go from 0 - 100%?) so isn't great for publication. The docs say: "NOTE: -gc_window=>'auto' gives nice results and is recommended for drawing GC content. The GC content axes draw slightly outside the panel, so you may wish to add some extra padding on the right and left. 
" Any idea how to do this? Basically, I want a nice GC graph with the axis quite clearly labelled, and a nice "%GC" title next to it :) Thanks Mick The information contained in this message may be confidential or legally privileged and is intended solely for the addressee. If you have received this message in error please delete it & notify the originator immediately. Unauthorised use, disclosure, copying or alteration of this message is forbidden & may be unlawful. The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Institute. This email and associated attachments has been checked locally for viruses but we can accept no responsibility once it has left our systems. Communications on Institute computers are monitored to secure the effective operation of the systems and for other lawful purposes. _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Mon Feb 26 12:18:38 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 26 Feb 2007 11:18:38 -0600 Subject: [Bioperl-l] AlignIO problems In-Reply-To: References: Message-ID: <7DF958E6-E233-427F-8901-3FE571CD99BD@uiuc.edu> On Feb 26, 2007, at 9:59 AM, ? ?? wrote: > Thank you! > I have checked the sequences retrieved through lots of Bio:DB > objects work simultaneously. > There are not problems you mentioned, the sequences are not > overwritten. Again, keep this on the list. I have my hands full this month so I will be checking the list only very sporadically; someone else may be able to help you. 
The only explanation for the clustalw output you get is that you are not retrieving the correct sequence in some fundamental way, which to me indicates the bug originates either in the way the sequences are retrieved (i.e. somehow via Bio::DB::Fasta, hence my thought about conflicting indices) or in the way they are converted via Bio::SeqIO, which is used in Bio::Tools::Run::Alignment::Clustalw. When I have used Bio::DB::Fasta in the past I have never had a problem indexing multiple files and retrieving sequences, so short of running tests with your data I can't offer much beyond the above conjecture. chris From jason at bioperl.org Mon Feb 26 13:45:34 2007 From: jason at bioperl.org (Jason Stajich) Date: Mon, 26 Feb 2007 10:45:34 -0800 Subject: [Bioperl-l] Question to Bio::Tools::Run RemoteBlast In-Reply-To: <20070226095515.68810@gmx.net> References: <20070226095515.68810@gmx.net> Message-ID: <2D2DF6D9-6DAE-4BB7-B31B-8C19CCCA7301@bioperl.org> Alex - I am glad to see your interest in the module, but I don't currently have any time to maintain it, so queries should be sent to the BioPerl mailing list. In general we prefer that you don't contact developers directly, but use the mailing list so that others can learn from the questions. Please note there are several tutorials and documentation pages on the website; you will get a better response from people if you can show you have at least tried to use the existing example code to construct your program. -jason On Feb 26, 2007, at 1:55 AM, Alexander Auner wrote: > Dear Jason Stajich, > I hope you can help me. > > I am inspired by your module and would like to work with it. > I am a student at the TFH Wildau. > I have problems understanding the module. > > Could you send me an example? > > The example is to process a text file (FASTA) with NCBI BLAST (Web).
> > Parameters: > Choose database -> Others -> nr > Limit by entrez query -> Campylobacter -> or select from: -> > Bacteria [ORGN] > Expect -> 10 > Other advanced -> -q-1 > > output format > plain text without Graphical Overview > Number of: -> Descriptions -> 10000 > Alignment view -> query-anchored with identities > > All other parameters remain undef. > > Thank you for your help. > > Yours faithfully, Alexander Auner From jason at bioperl.org Mon Feb 26 14:13:00 2007 From: jason at bioperl.org (Jason Stajich) Date: Mon, 26 Feb 2007 11:13:00 -0800 Subject: [Bioperl-l] BioPerl leadership additions Message-ID: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org> Dear BioPerl Users and Developers, I want to announce an addition to the leadership of BioPerl. Christopher Fields and Sendu Bala are now members of the BioPerl Core developer group in recognition of their ongoing leadership in the project. Chris and Sendu were instrumental in the 1.5.2 Developer release and have made a significant commitment and contribution to the quality of the code and the documentation of the project. We have invited them to be part of the core to recognize their work and so that we feel comfortable asking them to do more. ;-) The Core group was established to ensure that someone was responsible for making code releases, vetting new developers for CVS write accounts, and generally dealing with things that might otherwise slip through the cracks. We are very excited to have more people contributing to and maintaining the toolkit. We look forward to their help along with all the other developers, as we work towards a 1.6 release this year. As always, while there is a need for some individuals to lead the project, we encourage contributions from all levels of expertise to improve the code, documentation, and tutorials of the project.
We plan to discuss the progress of the toolkit at this year's Bioinformatics Open Source Conference held in Vienna, Austria in conjunction with the SIG meetings at ISMB. We are trying to use BOSC 2007 as a chance for the developers of Open Bioinformatics Foundation sponsored and related projects to coordinate future development and release cycles. Jason Stajich on behalf of the Core developers From khan at cshl.edu Mon Feb 26 15:29:19 2007 From: khan at cshl.edu (Khan, Sohail) Date: Mon, 26 Feb 2007 15:29:19 -0500 Subject: [Bioperl-l] parsing a list of ids to a fasta file. Message-ID: Thanks Michael. I have the scripts installed. I can pass an id to indexed fasta file and retrieve the seq. However, I was wondering if I can pass a list of ids from a file and get seq. for all the ids? Thanks. -Sohail -----Original Message----- From: michael watson (IAH-C) [mailto:michael.watson at bbsrc.ac.uk] Sent: Tuesday, February 20, 2007 4:33 PM To: Khan, Sohail; Bioperl-l at lists.open-bio.org Subject: RE: [Bioperl-l] parsing a list of ids to a fasta file. Suggest you use Bio::Index::Fasta to create an index for the fasta file and then a simple script to retrieve sequences using that index. Or just use the pre-written bp_index.PLS and bp_fetch.PLS scripts. http://www.bioperl.org/wiki/Module:Bio::Index::Fasta ________________________________ From: bioperl-l-bounces at lists.open-bio.org on behalf of Khan, Sohail Sent: Tue 20/02/2007 8:42 PM To: Bioperl-l at lists.open-bio.org Subject: [Bioperl-l] parsing a list of ids to a fasta file. Dear list, I am new to Bio-Perl. I have the following question: I have a list of ids, which I would like to parse against a large fasta file to retrieve the Seq for parsed ids. I appreciate any suggestions. Thanks. 
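For Sohail's question above (fetching the sequences for a whole list of IDs), here is a minimal core-Perl sketch that streams the large FASTA file once and keeps only the records whose ID is on the list. It assumes one ID per line in the ID file and that the ID is the first whitespace-delimited token of each FASTA header; `fetch_by_ids` and the file names are hypothetical and not part of BioPerl. It is an alternative to, not a replacement for, the Bio::Index::Fasta / bp_fetch route suggested in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): copy every FASTA record whose ID
# (the first whitespace-delimited token after '>') is listed in $id_file
# from $fasta_file to the output filehandle $out. Returns the record count.
sub fetch_by_ids {
    my ($id_file, $fasta_file, $out) = @_;

    # Load the wanted IDs (one per line) into a hash for constant-time lookup
    open my $ids_fh, '<', $id_file or die "Cannot open $id_file: $!";
    my %wanted;
    while (my $id = <$ids_fh>) {
        $id =~ s/^\s+|\s+$//g;            # trim whitespace and the newline
        $wanted{$id} = 1 if length $id;   # skip blank lines
    }
    close $ids_fh;

    # Single pass over the FASTA file: each header line decides whether the
    # sequence lines that follow it are kept or skipped
    open my $fa_fh, '<', $fasta_file or die "Cannot open $fasta_file: $!";
    my ($keep, $count) = (0, 0);
    while (my $line = <$fa_fh>) {
        if ($line =~ /^>(\S+)/) {
            $keep = exists $wanted{$1} ? 1 : 0;
            $count += $keep;
        }
        print {$out} $line if $keep;
    }
    close $fa_fh;
    return $count;
}

# Command-line use: perl fetch_ids.pl ids.txt big.fasta > subset.fasta
if (!caller() && @ARGV == 2) {
    my $n = fetch_by_ids($ARGV[0], $ARGV[1], \*STDOUT);
    warn "Wrote $n record(s)\n";
}
```

A single streaming pass avoids building an index, which suits a one-off extraction; if the same large file will be queried repeatedly, indexing it once as Michael suggested is likely faster overall. The IDs in the list must match the header token exactly.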
Khan _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From arareko at campus.iztacala.unam.mx Mon Feb 26 16:44:49 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Mon, 26 Feb 2007 15:44:49 -0600 Subject: [Bioperl-l] BioPerl leadership additions In-Reply-To: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org> References: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org> Message-ID: <45E354D1.4070600@campus.iztacala.unam.mx> Congrats Chris & Sendu! Very well-deserved. Keep up the great work. Cheers! Mauricio. Jason Stajich wrote: > Dear BioPerl Users and Developers, > > I want to announce a addition in the leadership of BioPerl. > Christopher Fields and and Sendu Bala are now members of the BioPerl > Core developer group to recognize their ongoing leadership in the > project. Chris and Sendu were instrumental in the 1.5.2 Developer > release and have made a significant commitment and contribution to > the quality of the code and the documentation of the project. We > have invited them to be part of the core to recognize their work and > to feel comfortable to ask them to do more. ;-) > > The Core group was established to insure that someone was responsible > for making code releases, vetting new developers for CVS write > accounts, and generally dealing with things that might otherwise slip > through the cracks. We are very excited to have more people > contributing to and maintaining the toolkit. We look forward to > their help along with all the other developers, as we work towards a > 1.6 release release this year. > > As always, while their is a need for some individuals to lead the > project, we encourage contributions from all levels of expertise to > improve the code, documentation, and tutorials of the project. 
> > We plan to discuss the progress of the toolkit at this year's > Bioinformatics Open Source Conference held in Vienna, Austria in > conjunction with the SIG meetings at ISMB. We are trying to use > BOSC 2007 as a chance for the developers of Open Bioinformatics > Foundation sponsored and related projects to coordinate future > development and release cycles. > > Jason Stajich on behalf of the Core developers > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From lubapardo at gmail.com Tue Feb 27 08:26:30 2007 From: lubapardo at gmail.com (Luba Pardo) Date: Tue, 27 Feb 2007 14:26:30 +0100 Subject: [Bioperl-l] parsing blast results Message-ID: <58ff33550702270526q43ab584ci7baf6a09e2584e38@mail.gmail.com> Hi, I am using the module Bio::SearchIO to parse some blast results. I would like to store the ids of the results into an array but I am not sure if this is possible to do it with an existing subroutine. Does anyone have an idea whether there is a method included within the module Bio::SearchIO to do so? Thanks in advance, L.Pardo From cjfields at uiuc.edu Tue Feb 27 09:11:37 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 27 Feb 2007 08:11:37 -0600 Subject: [Bioperl-l] parsing blast results In-Reply-To: <58ff33550702270526q43ab584ci7baf6a09e2584e38@mail.gmail.com> References: <58ff33550702270526q43ab584ci7baf6a09e2584e38@mail.gmail.com> Message-ID: On Feb 27, 2007, at 7:26 AM, Luba Pardo wrote: > Hi, > I am using the module Bio::SearchIO to parse some blast results. I > would > like to store the ids of the results into an array but I am not > sure if this > is possible to do it with an existing subroutine.
Does anyone have > an idea > whether there is a method included within the module Bio::SearchIO > to do so? > Thanks in advance, > L.Pardo Bio::SearchIO doesn't currently have a method to retrieve all the accessions in a BLAST result. The best way to do this is to iterate through the objects: my @accs; while (my $result = $searchio->next_result) { while (my $hit = $result->next_hit) { push @accs, $hit->accession; # do whatever else here... } } print join ',', @accs; I don't think all accessions in the description are parsed out at the moment, just the first one (or the one in the hit table). If you want all of them or if you want the NCBI GI you'll need to parse them out of the description heading ($hit->description). chris From sac at bioperl.org Tue Feb 27 12:59:22 2007 From: sac at bioperl.org (Steve Chervitz) Date: Tue, 27 Feb 2007 09:59:22 -0800 Subject: [Bioperl-l] [Bioperl-announce-l] BioPerl leadership additions In-Reply-To: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org> References: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org> Message-ID: <8f200b4c0702270959n7521f693l915bfabe7ccb7ef7@mail.gmail.com> Welcome to the club, Chris & Sendu. Always good to have an infusion of new blood and capable, motivated hands. Steve On 2/26/07, Jason Stajich wrote: > > Dear BioPerl Users and Developers, > > I want to announce a addition in the leadership of BioPerl. > Christopher Fields and and Sendu Bala are now members of the BioPerl > Core developer group to recognize their ongoing leadership in the > project. Chris and Sendu were instrumental in the 1.5.2 Developer > release and have made a significant commitment and contribution to > the quality of the code and the documentation of the project. We > have invited them to be part of the core to recognize their work and > to feel comfortable to ask them to do more. 
;-) > > The Core group was established to insure that someone was responsible > for making code releases, vetting new developers for CVS write > accounts, and generally dealing with things that might otherwise slip > through the cracks. We are very excited to have more people > contributing to and maintaining the toolkit. We look forward to > their help along with all the other developers, as we work towards a > 1.6 release release this year. > > As always, while their is a need for some individuals to lead the > project, we encourage contributions from all levels of expertise to > improve the code, documentation, and tutorials of the project. > > We plan to discuss the progress of the toolkit at this year's > Bioinformatics Open Source Conference held in Vienna, Austria in > conjunction with the SIG meetings at ISMB. We are trying to use > BOSC 2007 as a chance for the developers of Open Bioinformatics > Foundation sponsored and related projects to coordinate future > development and release cycles. > > Jason Stajich on behalf of the Core developers > > _______________________________________________ > Bioperl-announce-l mailing list > Bioperl-announce-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l > From cjfields at uiuc.edu Tue Feb 27 15:57:40 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 27 Feb 2007 14:57:40 -0600 Subject: [Bioperl-l] Bio::SeqIO::FTHelper Message-ID: Could anyone tell me what FTHelper is used for? From what I gather it rolls up seqfeature data into a lightweight object but then creates a SeqFeature::Generic anyway (at least for GenBank/EMBL/ Swiss), which seems to be a waste of memory and time. Is there something I'm missing (besides my sanity of course)? 
chris From Jay at jays.net Wed Feb 28 04:39:55 2007 From: Jay at jays.net (Jay Hannah) Date: Wed, 28 Feb 2007 03:39:55 -0600 Subject: [Bioperl-l] "Command-Line Bioinformatics" Message-ID: Reading this article: http://www.linuxjournal.com/article/6977 Sequencing the SARS Virus - Linux Journal, Nov 2003 This guy needs Perl and/or BioPerl. :) > The sequence file is in FASTA format consisting of a header line > and the sequence, split into fixed-width lines. The following > counts the number of Gs and Cs in the sequence and presents the > total as a fraction of the total number of bases: > > > grep -v "^>" AY274119.fa | fold -w 1 | > tr "ATGC" "..xx" | sort | uniq -c | > sed 's/[^0-9]//g' | tr -s "\012" " " | > sed 's/\([0-9]*\) \([0-9]*\)/scale = 3; > ?\2 \/ (\1+\2)/' | > bc -i > scale = 3; 12127 / (17624+12127) > .407 > > Out of the 29,751 bases in our sequence, 12,127 are either G or C, > giving a GC content of 41%. BioPerl version: use Bio::SeqIO; my $io = Bio::SeqIO->new( -file => 'AY274119.fa', -format => 'Fasta' ); my $seq = $io->next_seq->seq; print ( ($seq =~ tr/GC/GC/) / length ($seq) ); Command-line Perl: perl -e '$/ = undef; $_ = <>; s/>.*//; s/\n//g; print tr/GC/GC/ / length($_)' AY274119.fa I'm sure you can Perl Golf my stabs at it. :) j seqlab.net http://www.bioperl.org/wiki/User:Jhannah From n.saunders at uq.edu.au Wed Feb 28 05:25:08 2007 From: n.saunders at uq.edu.au (Neil Saunders) Date: Wed, 28 Feb 2007 20:25:08 +1000 Subject: [Bioperl-l] Bio::Factory::EMBOSS, CGI and taint Message-ID: <45E55884.9010908@uq.edu.au> Dear Bioperlers, I'm trying to understand an error that occurs when Bio::Factory::EMBOSS is used in a CGI script. Using BioPerl 1.5.2 on Ubuntu Dapper, Apache 2.0.55, Perl 5.8.7.
If I load this test CGI script (cgi.pl) in a browser: BEGIN CODE ---------- #!/usr/bin/perl -Tw use strict; use CGI; use Bio::Factory::EMBOSS; my $cgi = new CGI; my $f = new Bio::Factory::EMBOSS; print $cgi->header, $cgi->start_html, $cgi->end_html; -------- END CODE I get a 500 server error and the Apache error log reads: [error] [client 192.168.0.3] Premature end of script headers: cgi.pl I can fix this in 2 ways: (1) Move the "my $f = new Bio::Factory::EMBOSS" line to the end of the script, which isn't a very useful fix. (2) Remove the -T switch from the shebang line There seem to be a few old posts on the list regarding "taint-safe" modules. It seems that the new Bio::Factory::EMBOSS object is interfering with the headers in some way, but I'm no CGI.pm guru and wondered if anyone could shed light on this. thanks, Neil -- School of Molecular and Microbial Sciences University of Queensland Brisbane 4072 Australia http://nsaunders.wordpress.com From n.saunders at uq.edu.au Wed Feb 28 05:30:31 2007 From: n.saunders at uq.edu.au (Neil Saunders) Date: Wed, 28 Feb 2007 20:30:31 +1000 Subject: [Bioperl-l] more on Bio::Factory::EMBOSS, CGI and taint Message-ID: <45E559C7.1090308@uq.edu.au> Further to my previous email, adding: BEGIN { $|=1; print "Content-type: text/html\n\n"; use CGI::Carp('fatalsToBrowser'); } to my CGI script generates: Insecure $ENV{PATH} while running with -T switch at /usr/local/share/perl/5.8.7/Bio/Factory/EMBOSS.pm line 251. Neil -- School of Molecular and Microbial Sciences University of Queensland Brisbane 4072 Australia http://nsaunders.wordpress.com From n.saunders at uq.edu.au Wed Feb 28 05:50:58 2007 From: n.saunders at uq.edu.au (Neil Saunders) Date: Wed, 28 Feb 2007 20:50:58 +1000 Subject: [Bioperl-l] CGI taint solved Message-ID: <45E55E92.10608@uq.edu.au> Apologies for running a one-man thread, but I realised that I've now answered my own question regarding errors with CGI, Bio::Factory::EMBOSS and taint. 
Given that the EMBOSS binaries are in /usr/local/bin, adding:

$ENV{'PATH'} = '/usr/local/bin';

near the top of the script does the trick.

Neil
--
School of Molecular and Microbial Sciences
University of Queensland
Brisbane 4072 Australia

http://nsaunders.wordpress.com

From cjfields at uiuc.edu Wed Feb 28 08:39:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Feb 2007 07:39:24 -0600
Subject: Re: [Bioperl-l] CGI taint solved
In-Reply-To: <45E55E92.10608@uq.edu.au>
References: <45E55E92.10608@uq.edu.au>
Message-ID:

That could possibly clobber any other program calls from within the same script (unless they reside in /usr/local/bin), since you're explicitly assigning PATH, not appending:

$ENV{"PATH"} = '/usr/local/bin';

gets me (printing $ENV{"PATH"}):

/usr/local/bin

whereas this:

$ENV{"PATH"} = '/usr/local/bin:' . $ENV{"PATH"};

gets me:

/usr/local/bin:/Users/cjfields/bin:/Users/cjfields/dart/bin:/usr/local/mysql/bin:/usr/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin

There's probably a File::* module that does this safely per OS flavor.

chris

On Feb 28, 2007, at 4:50 AM, Neil Saunders wrote:

> Apologies for running a one-man thread, but I realised that I've
> now answered my own question regarding errors with CGI,
> Bio::Factory::EMBOSS and taint.
>
> Given that the EMBOSS binaries are in /usr/local/bin, adding:
>
> $ENV{'PATH'} = '/usr/local/bin'
>
> near the top of the script does the trick.
>
>
> Neil
> --
> School of Molecular and Microbial Sciences
> University of Queensland
> Brisbane 4072 Australia
>
> http://nsaunders.wordpress.com
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From stefan.kirov at bms.com Wed Feb 28 10:35:31 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 28 Feb 2007 10:35:31 -0500
Subject: Re: [Bioperl-l] CGI taint solved
In-Reply-To:
References: <45E55E92.10608@uq.edu.au>
Message-ID: <45E5A143.3080303@bms.com>

Neil,
I believe this is your situation:
http://wn.cyberwerks.com/2000/0411.html
My advice: any commands executed from within a CGI script should have their path hardcoded whenever possible. If those commands require a different PATH, try writing a wrapper shell script that sets the environment (which should be reset to the default once the shell script terminates). It also depends on the type of environment you have: if it is not secure, you may wish to think hard about how to eliminate all security loopholes with CGI. I am definitely not an expert on this.
Stefan

Chris Fields wrote:
> That could possibly clobber any other program calls from within the
> same script (unless they reside in /usr/local/bin) since you're
> explicitly assigning PATH, not appending:
>
> $ENV{"PATH"} = '/usr/local/bin';
>
> gets me (printing $ENV{"PATH"}):
>
> /usr/local/bin
>
> whereas this:
>
> $ENV{"PATH"} = '/usr/local/bin:' . $ENV{"PATH"};
>
> gets me:
>
> /usr/local/bin:/Users/cjfields/bin:/Users/cjfields/dart/bin:/usr/local/mysql/bin:/usr/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin
>
> There's probably a File::* module that does this safely per OS flavor.
>
> chris
>
> On Feb 28, 2007, at 4:50 AM, Neil Saunders wrote:
>
>> Apologies for running a one-man thread, but I realised that I've
>> now answered my own question regarding errors with CGI,
>> Bio::Factory::EMBOSS and taint.
>>
>> Given that the EMBOSS binaries are in /usr/local/bin, adding:
>>
>> $ENV{'PATH'} = '/usr/local/bin'
>>
>> near the top of the script does the trick.
>>
>>
>> Neil
>> --
>> School of Molecular and Microbial Sciences
>> University of Queensland
>> Brisbane 4072 Australia
>>
>> http://nsaunders.wordpress.com
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From lubapardo at gmail.com Wed Feb 28 12:21:07 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Wed, 28 Feb 2007 18:21:07 +0100
Subject: [Bioperl-l] retrieven ids
Message-ID: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com>

Hi everyone,
I wonder if someone could give some advice on the following: I want to retrieve the DNA coding sequence of a RefSeq protein ID. I do not want to back-translate the protein into DNA. Instead, I want to get the ID of the DNA coding sequence and then retrieve that DNA sequence from GenBank. Is there a module that can return all possible IDs for a sequence, given a protein gi?
Thank you very much in advance,
L. Pardo

From johnston at biochem.ucl.ac.uk Wed Feb 28 12:05:49 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Wed, 28 Feb 2007 17:05:49 +0000 (GMT)
Subject: [Bioperl-l] _rearrange
Message-ID:

hi,

Is there a discussion of the rationale behind the _rearrange method somewhere? I'm probably just being gormless, but I think I'm missing the point a bit.

Is it okay for a method just to expect named params like ->foo(arg1=>'stuff', arg2=>'things'); ?
Cxx

From ckuanglim at yahoo.com Wed Feb 28 10:51:50 2007
From: ckuanglim at yahoo.com (Chan Kuang Lim)
Date: Wed, 28 Feb 2007 07:51:50 -0800 (PST)
Subject: [Bioperl-l] Problem of Installing Bioperl
Message-ID: <459942.77644.qm@web60518.mail.yahoo.com>

I have a problem installing BioPerl on Windows using the command-line installation. In the cmd window, I ran:

ppm-shell
search bioperl
install 2

A lot of files were downloaded, but the next line is:

Unpacking bioperl-1.5.2_100...ppm install failed: Can't extract files from C:.............../Bioperl-1.5.2_100.tgz

Hope you can answer my question. Thank you.

Regards,
Chan Kuang Lim
Malaysia

From cjfields at uiuc.edu Wed Feb 28 13:30:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Feb 2007 12:30:45 -0600
Subject: Re: [Bioperl-l] _rearrange
In-Reply-To:
References:
Message-ID: <25C736A2-2DCA-413B-8F92-D799F583515B@uiuc.edu>

From what I gather it's a convenient utility method that is used for consistent and enforced parameter checking/setting for any method, including the constructor. There are a few modules that don't use _rearrange (Bio::WebAgent::new() comes to mind). It's not required that you use it, but the naming conventions for parameters outlined in _rearrange (in the Bio::Root::RootI POD) are generally enforced for consistency across classes.

As a note, Sendu has committed a related method (_set_from_args) to CVS which works rather well, but I don't think it is in the last release.

chris

On Feb 28, 2007, at 11:05 AM, Caroline Johnston wrote:

> hi,
>
> Is there a discussion of the rationale behind the _rearrange method
> somewhere? I'm probably just being gormless, but I think I'm
> missing the point a bit.
>
> Is it okay for a method just to expect named params like
> ->foo(arg1=>'stuff', arg2=>'things'); ?
>
> Cxx
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From dmessina at wustl.edu Wed Feb 28 14:31:29 2007
From: dmessina at wustl.edu (Dave Messina)
Date: Wed, 28 Feb 2007 13:31:29 -0600 (CST)
Subject: Re: [Bioperl-l] retrieven ids
In-Reply-To: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com>
References: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com>
Message-ID: <57419.10.0.7.57.1172691089.squirrel@gscmail.wustl.edu>

Whenever I'm unsure of how to do something, I first look to see if one of the HOWTOs on bioperl.org covers it. In this case, the Features HOWTO has example code which I think will do what you want.

GenBank records typically have the coding sequence of a protein as a feature, so I would do something like:

- use the RefSeq protein IDs to query Entrez and get back the GenBank records.
- read the Features HOWTO to refresh my memory on the syntax for grabbing features. That HOWTO is at:
  http://www.bioperl.org/wiki/HOWTO:Feature-Annotation
- whip up a little script to loop through the GenBank records one at a time with SeqIO and pull out the cDNA sequence features.

Dave

From bix at sendu.me.uk Wed Feb 28 14:38:46 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Feb 2007 19:38:46 +0000
Subject: Re: [Bioperl-l] _rearrange
In-Reply-To:
References:
Message-ID: <45E5DA46.3020503@sendu.me.uk>

Caroline Johnston wrote:
> hi,
>
> Is there a discussion of the rationale behind the _rearrange method
> somewhere? I'm probably just being gormless, but I think I'm missing the
> point a bit.
>
> Is it okay for a method just to expect named params like
> ->foo(arg1=>'stuff', arg2=>'things'); ?

The Bioperl style for named args is -arg1, and the wrong case is allowed as well. So, make use of _rearrange; it won't do you any harm.
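To make the convention described above concrete, here is a minimal, hypothetical sketch of what a _rearrange()-style helper does. It is not BioPerl's implementation: the real method lives in Bio::Root::RootI and is typically called along the lines of $self->_rearrange([qw(SEQ ID)], @args). The function name rearrange_args is made up for illustration. It accepts -Name, -NAME or name for each parameter and returns the values in a fixed order.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT BioPerl's actual _rearrange(): it maps named
# arguments onto a fixed parameter order, ignoring a leading dash and
# letter case, so -Seq, -SEQ and seq all fill the SEQ slot.
sub rearrange_args {
    my ($order, @args) = @_;
    die "odd number of named arguments\n" if @args % 2;
    my %given;
    for (my $i = 0; $i < @args; $i += 2) {
        (my $key = uc $args[$i]) =~ s/^-//;   # normalize case, drop the dash
        $given{$key} = $args[$i + 1];
    }
    # Values come back in the requested order; absent parameters are undef.
    return map { $given{uc $_} } @$order;
}

my ($seq, $id) = rearrange_args([qw(SEQ ID)], -id => 'AY274119', -Seq => 'ATGC');
print "$id: $seq\n";
```

Unknown keys are silently ignored in this sketch. A stricter version might warn about them, which is one reason a shared helper is handy: every class then gets the same checking behaviour.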
From johnsonm at gmail.com Wed Feb 28 14:59:09 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 28 Feb 2007 13:59:09 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
Message-ID:

I happen to need something like Bio::Tools::Run::Genemark, so I'm coding one up. When I started on the tests for it, I realized I have a problem. I can distribute a FASTA file downloaded from GenBank to use as input, but I can't distribute the model file needed to actually run Genemark (Genemark.hmm for prokaryotes, gmhmmp, in my case).

It took *forever* to get a license, and I'm not thrilled with the prospect of talking them out of a redistributable model file. I'd love to distribute the test, but I don't see how I'm going to be able to. Suggestions?

Also, I've settled on IPC::Run instead of system(). The docs indicate the bits of it I'm using should be OK on Windows, except maybe for Win9X. I don't want to clutter up the console, I don't like embedding stdout/stderr redirection in command strings, and I don't want to have to worry about signal handling (What if the child catches a ctrl-c halfway through parsing? What if the parent does?). Anybody object to that?

One final thing. I'm lazy, I don't want to deal with parsing arguments to the constructor, so I'm just calling _rearrange() to deal with it. The Bio::Tools:: parsers all take dash options, but it looks like a bunch of the stuff in Bio::Tools::Run:: takes dashless args. Objections?

From dmessina at wustl.edu Wed Feb 28 15:14:56 2007
From: dmessina at wustl.edu (Dave Messina)
Date: Wed, 28 Feb 2007 14:14:56 -0600 (CST)
Subject: Re: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
In-Reply-To:
References:
Message-ID: <57538.10.0.7.57.1172693696.squirrel@gscmail.wustl.edu>

> I'm not thrilled with the prospect of talking them out of a redistributable
> model file.

I suppose it's not possible to fake your own, or at least the parts of it you're testing for?
If not, I'd put the tests in a skip block while waiting to hear from the Genemark folks.

> The Bio::Tools:: parsers all take dash options, but it looks like a bunch of
> the stuff in Bio::Tools::Run:: takes dashless args. Objections?

Sendu will chime in I'm sure, but I think he was planning to switch everything in Bio::Tools::Run over to dashed args anyway...

Dave

From bix at sendu.me.uk Wed Feb 28 15:52:23 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Feb 2007 20:52:23 +0000
Subject: Re: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
In-Reply-To:
References:
Message-ID: <45E5EB87.9020106@sendu.me.uk>

Mark Johnson wrote:
> One final thing. I'm lazy, I don't want to deal with parsing arguments
> to the constructor, so I'm just calling _rearrange() to deal with it. The
> Bio::Tools:: parsers all take dash options, but it looks like a bunch of the
> stuff in Bio::Tools::Run:: takes dashless args. Objections?

You can make use of _set_from_args(). See Bio::Tools::Run::Phylo::Gumby for an example.

From bix at sendu.me.uk Wed Feb 28 16:29:32 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Feb 2007 21:29:32 +0000
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
Message-ID: <45E5F43C.9080902@sendu.me.uk>

I have GD 2.35 and GD::SVG 2.33 installed.

I have a working script in which a Bio::Graphics::Panel object is made and output with:

print $panel->png;

This is fine. Changing it to:

print $panel->svg;

gives the error:

Can't locate object method "svg" via package "GD:Image" at /.../Bio/Graphics/Panel.pm line 971, line 192.

Am I supposed to do something else to get this to work?

Cheers,
Sendu.
From crabtree at tigr.ORG Wed Feb 28 16:40:52 2007
From: crabtree at tigr.ORG (Jonathan Crabtree)
Date: Wed, 28 Feb 2007 16:40:52 -0500
Subject: Re: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <45E5F43C.9080902@sendu.me.uk>
References: <45E5F43C.9080902@sendu.me.uk>
Message-ID: <45E5F6E4.80003@tigr.org>

Sendu-

I believe you must set 'image_class' to 'GD::SVG' when you create the Panel (and note that older versions of Bio::Graphics::Panel don't know anything about this parameter). Here's the relevant part of the Panel perldoc:

  -image_class   To create output in scalable vector graphics (SVG),
                 optionally pass the image class parameter 'GD::SVG'.
                 Defaults to using vanilla GD. See the corresponding
                 image_class() method below for details.

Jonathan

Sendu Bala wrote:
> I have GD 2.35 and GD::SVG 2.33 installed.
>
> I have a working script in which a Bio::Graphics::Panel object is made
> and output with:
>
> print $panel->png;
>
> This is fine. Changing it to:
>
> print $panel->svg;
>
> Gives the error:
>
> Can't locate object method "svg" via package "GD:Image" at
> /.../Bio/Graphics/Panel.pm line 971, line 192.
>
>
> Am I supposed to do something else to get this to work?
>
>
> Cheers,
> Sendu.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From bix at sendu.me.uk Wed Feb 28 17:01:17 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Feb 2007 22:01:17 +0000
Subject: Re: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <45E5F6E4.80003@tigr.org>
References: <45E5F43C.9080902@sendu.me.uk> <45E5F6E4.80003@tigr.org>
Message-ID: <45E5FBAD.3030404@sendu.me.uk>

Jonathan Crabtree wrote:
>
> Sendu-
>
> I believe you must set 'image_class' to 'GD::SVG' when you create the
> Panel (and note that older versions of Bio::Graphics::Panel don't know
> anything about this parameter.) Here's the relevant part of the Panel
> perldoc: ...
Oh! I had no idea there was any perldoc for these modules, hiding down there at the bottom. Does anyone want to intersperse the docs?...

From cjfields at uiuc.edu Wed Feb 28 17:10:54 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Feb 2007 16:10:54 -0600
Subject: Re: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
In-Reply-To:
References:
Message-ID: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>

On Feb 28, 2007, at 1:59 PM, Mark Johnson wrote:

> I happen to need something like Bio::Tools::Run::Genemark, so I'm
> coding one up. When I started on the tests for it, I realized I have a
> problem. I can distribute a fasta file downloaded from GenBank to use
> as input, but I can't distribute the model file needed to actually run
> Genemark (Genemark.hmm for prokaryotes, gmhmmp, in my case).
> It took *forever* to get a license, and I'm not thrilled with the
> prospect of talking them out of a redistributable model file. I'd
> love to distribute the test, but I don't see how I'm going to be able
> to. Suggestions?

For bioperl-run tests, you have to have the program installed for the tests to work (otherwise they are passed over). Therefore one would assume that if you had the GeneMark program, you would have the models as well.

You could set up your module to require that an env. variable be set (like the HMMER module, for instance) which contains the executables and/or the models, so that if it isn't set the tests are skipped.

> Also, I've settled on IPC::Run instead of system(). The docs indicate
> the bits of it I'm using should be OK on Windows, except maybe for
> Win9X. I don't want to clutter up the console, I don't like embedding
> stdout/stderr redirection in command strings, and I don't want to have
> to worry about signal handling (What if the child catches a ctrl-c
> halfway through parsing? What if the parent does?). Anybody object to
> that?

I wouldn't worry too much about Win9x. Is IPC::Run in perl core?
Otherwise we'll need to add it to the optional dependencies for bioperl-run.

> One final thing. I'm lazy, I don't want to deal with parsing arguments
> to the constructor, so I'm just calling _rearrange() to deal with it.
> The Bio::Tools:: parsers all take dash options, but it looks like a
> bunch of the stuff in Bio::Tools::Run:: takes dashless args.
> Objections?

Sendu's suggestion (_set_from_args()) is the best. As mentioned in another thread, _rearrange() works as well.

chris

From johnsonm at gmail.com Wed Feb 28 17:29:36 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 28 Feb 2007 16:29:36 -0600
Subject: Re: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
In-Reply-To: <57538.10.0.7.57.1172693696.squirrel@gscmail.wustl.edu>
References: <57538.10.0.7.57.1172693696.squirrel@gscmail.wustl.edu>
Message-ID:

On 2/28/07, Dave Messina wrote:
>
> > I'm not thrilled with the prospect of talking them out of a
> > redistributable model file.
>
> I suppose it's not possible to fake your own, or at least the parts of it
> you're testing for?

We got a gzipped tarball with some model files and a precompiled executable (gmhmmp). As far as building a model file goes, I don't even have two sticks to rub together. I suppose it's possible that it's not actually some weird proprietary format. I'll go dig for some docs... but I don't hold out a lot of hope.

From sukhinder.sandhu at osumc.edu Wed Feb 28 16:49:31 2007
From: sukhinder.sandhu at osumc.edu (Sukhinder Sandhu)
Date: Wed, 28 Feb 2007 16:49:31 -0500
Subject: [Bioperl-l] Problem installing bioperl: plz reply soon. thx
Message-ID:

Hi

I am having trouble installing Bundle::BioPerl through CPAN. I don't know if this has something to do with my having root privileges. Can you please suggest how I can get past this? I shall really appreciate any help.

I am pasting the part of the error that it gives every time it tries to install a module.
######################
CPAN.pm: Going to build G/GA/GAAS/HTML-Parser-3.56.tar.gz

make: *** No rule to make target `/System/Library/Perl/5.8.6/darwin-thread-multi-2level/CORE/config.h', needed by `Makefile'.  Stop.
  /usr/bin/make  -- NOT OK
Running make test
  Can't test without successful make
Running make install
  make had returned bad status, install seems impossible
###############################

Thanks
sukhinder

From sukhinder.sandhu at osumc.edu Tue Feb 27 23:41:43 2007
From: sukhinder.sandhu at osumc.edu (Sukhinder Sandhu)
Date: Tue, 27 Feb 2007 23:41:43 -0500
Subject: [Bioperl-l] Problem installing bioperl-1.5.2_102
Message-ID:

Hi

I am trying to install BioPerl on my Mac OS X machine and am having problems. I tried following the instructions both at the www.tc.umn.edu..... and in the README and INSTALL files in the bioperl folder that I downloaded. The error I get is the following (the end of the output is copied):

####################
t/versions........ok
t/xs..............skipped
        all skipped: C_support not enabled
Failed Test Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/compat.t    5  1280    60    5   8.33%  25-28 31
4 tests and 31 subtests skipped.
Failed 1/22 test scripts, 95.45% okay. 5/683 subtests failed, 99.27% okay.
make: *** [test] Error 2
  /usr/bin/make test -- NOT OK
Running make install
  make test had returned bad status, won't install without force
Couldn't install Module::Build, giving up.
BEGIN failed--compilation aborted at ModuleBuildBioperl.pm line 51.
Compilation failed in require at Build.PL line 14.
BEGIN failed--compilation aborted at Build.PL line 14.
###########################################################################

I am not able to figure out what's going wrong. And when I try to run CPAN, I get the following error. I have no idea how to fix these. Any help is greatly appreciated.
############################################################################
[Sukhinders-Computer:~/Desktop/bioperl-1.5.2_102] sand60% perl -MCPAN -e shell
Terminal does not support AddHistory.

There seems to be running another CPAN process (pid 7207).  Contacting...
Lockfile /Users/sand60/.cpan/.lock not writeable by you. Cannot proceed.
    On UNIX try:
    rm /Users/sand60/.cpan/.lock
  and then rerun us.
 at -e line 1
###################################################

Doing what it says (removing the lock file) doesn't help. I am wondering whether all this has something to do with having root privileges on the system, and if so, is there an alternative?

Thanks
sukhinder

From stefan.kirov at bms.com Wed Feb 28 16:44:05 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 28 Feb 2007 16:44:05 -0500
Subject: Re: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <45E5F43C.9080902@sendu.me.uk>
References: <45E5F43C.9080902@sendu.me.uk>
Message-ID: <45E5F7A5.3090805@bms.com>

I think you should create the object with -image_class='svg'. Can you post the code with which you create the object?
Stefan

Sendu Bala wrote:
> I have GD 2.35 and GD::SVG 2.33 installed.
>
> I have a working script in which a Bio::Graphics::Panel object is made
> and output with:
>
> print $panel->png;
>
> This is fine. Changing it to:
>
> print $panel->svg;
>
> Gives the error:
>
> Can't locate object method "svg" via package "GD:Image" at
> /.../Bio/Graphics/Panel.pm line 971, line 192.
>
>
> Am I supposed to do something else to get this to work?
>
>
> Cheers,
> Sendu.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From johnsonm at gmail.com Wed Feb 28 17:54:02 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 28 Feb 2007 16:54:02 -0600
Subject: Re: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
In-Reply-To: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>
References: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>
Message-ID:

On 2/28/07, Chris Fields wrote:

> For bioperl-run tests you have to have the program installed for
> tests to work (otherwise they are passed over). Therefore one would
> assume if you had the GeneMark program you would have the models as
> well.
>
> You could set up your module to require an env. variable be set (like
> the HMMER module, for instance) which contains the executables and/or
> the models, so that if it isn't set the tests are skipped.

Sounds like a plan.

> I wouldn't worry too much about Win9x. Is IPC::Run in perl core?
> Otherwise we'll need to add it to the optional dependencies for
> bioperl-run.

I'd test it, but I don't have access to any Win9x boxes anymore. IPC::Run is not a core module, but I think it's worth the dependency. I considered IPC::Open3, but it can't be made reliable on Win32 (something about not being able to select() on filehandles, only on sockets). I also looked at IPC::Run3, but under the hood it's just STDOUT/STDERR redirection layered on top of system(). I don't like using system() due to issues with signals (such as the user hitting ctrl-c and taking out the child). I feel better knowing the wrapped executable is in another process, disconnected from the console.

> Sendu's suggestion (_set_from_args()) is the best. As mentioned in
> another thread _rearrange() works as well.

I'm using _rearrange() now. I'll look at _set_from_args(). Is either one preferred to the other?
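Here is a hypothetical sketch of the general idea behind a _set_from_args()-style helper. Named arguments are normalized (the leading dash and letter case are ignored), and each accepted parameter gets a generated get/set method. This is not the BioPerl code. The class name My::GenemarkWrapper and the parameter names model_file and quiet are invented for illustration.

```perl
use strict;
use warnings;

package My::GenemarkWrapper;   # hypothetical class, for illustration only

# Parameters this hypothetical wrapper accepts (names are assumptions).
my @PARAMS = qw(model_file quiet);
my %KNOWN  = map { $_ => 1 } @PARAMS;

# Generate a plain get/set method for each accepted parameter.
for my $param (@PARAMS) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$param" } = sub {
        my $self = shift;
        $self->{"_$param"} = shift if @_;
        return $self->{"_$param"};
    };
}

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    for my $key (keys %args) {
        (my $name = lc $key) =~ s/^-//;   # accept -Model_File, model_file, ...
        die "unknown parameter '$key'\n" unless $KNOWN{$name};
        $self->$name($args{$key});        # store via the generated setter
    }
    return $self;
}

package main;

my $run = My::GenemarkWrapper->new(-model_file => 'model.mod', -quiet => 1);
print $run->model_file, "\n";
```

Compared with only pulling values out in order, this design leaves the wrapper with accessors that later code (for example, a method building the command line) can query. Under this sketch's assumptions, that is the practical difference for wrapper modules.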
From bix at sendu.me.uk Wed Feb 28 19:13:29 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 01 Mar 2007 00:13:29 +0000
Subject: Re: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
In-Reply-To:
References: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>
Message-ID: <45E61AA9.9030906@sendu.me.uk>

Mark Johnson wrote:
> I'm using _rearrange() now. I'll look at _set_from_args(). Is either one
> preferred to the other?

_set_from_args() is implemented using _rearrange(), iirc. In any case, they do different things, but _set_from_args() makes creating wrapper modules a lot simpler. Another example: compare revisions 1.15 and 1.16 of Bio::Tools::Run::Alignment::Lagan, where I reimplemented it to use _set_from_args() and _setparams().

http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-run/Bio/Tools/Run/Alignment/Lagan.pm.diff?r1=text&tr1=1.15&r2=text&tr2=1.16&diff_format=h

So, it's new, but I'd recommend that new modules, especially wrappers, make use of it.

From bix at sendu.me.uk Wed Feb 28 19:19:29 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 01 Mar 2007 00:19:29 +0000
Subject: Re: [Bioperl-l] Problem of Installing Bioperl
In-Reply-To: <459942.77644.qm@web60518.mail.yahoo.com>
References: <459942.77644.qm@web60518.mail.yahoo.com>
Message-ID: <45E61C11.90806@sendu.me.uk>

Chan Kuang Lim wrote:
> I have problem of installing bioperl in windows using command-line installation.
> In the cmd windows, after
> ppm-shell
> search bioperl
> install 2
>
> many downloading had done, but the next line is:
> Unpacking bioperl-1.5.2_100...ppm install failed: Can't extract files from C:.............../Bioperl-1.5.2_100.tgz

Does that file exist on your system? Is it larger than 0kb? Can you open it yourself?
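The three checks above can be scripted. This is a rough sketch that assumes the Archive::Tar module is installed (it ships with Perl 5.10 and later). check_archive is a made-up helper name, and the path to pass in is whatever ppm reported in its error message.

```perl
use strict;
use warnings;
use Archive::Tar;

# Hypothetical helper: does the downloaded archive exist, is it non-empty,
# and can Archive::Tar actually read (and, for a .tgz, decompress) it?
sub check_archive {
    my ($path) = @_;
    return 'missing'         unless -e $path;
    return 'empty (0 bytes)' unless -s $path;
    my $tar   = Archive::Tar->new;
    my $count = $tar->read($path);   # entries read (scalar context); false on failure
    return 'unreadable: ' . ($tar->error || 'no entries') unless $count;
    return "ok, $count entries";
}

# Pass the path of the downloaded .tgz on the command line.
print check_archive($ARGV[0]), "\n" if @ARGV;
```

If the file is present and non-empty but still reported as unreadable, the download was most likely truncated or corrupted. Deleting it and letting ppm fetch it again would be a reasonable next step.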
From cjfields at uiuc.edu Wed Feb 28 20:19:31 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Feb 2007 19:19:31 -0600
Subject: Re: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
In-Reply-To: <45E61AA9.9030906@sendu.me.uk>
References: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu> <45E61AA9.9030906@sendu.me.uk>
Message-ID: <93734147-BDDE-4D73-B8F1-FB4A9D073F9B@uiuc.edu>

On Feb 28, 2007, at 6:13 PM, Sendu Bala wrote:

> Mark Johnson wrote:
>> I'm using _rearrange() now. I'll look at _set_from_args(). Is
>> either one preferred to the other?
>
> _set_from_args() is implemented using _rearrange() iirc. In any case,
> they do different things but _set_from_args() just makes creating
> wrapper modules a lot simpler. Another example: compare revisions 1.15
> and 1.16 of Bio::Tools::Run::Alignment::Lagan where I reimplemented it
> to use _set_from_args() and _setparams().
>
> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-run/Bio/Tools/Run/Alignment/Lagan.pm.diff?r1=text&tr1=1.15&r2=text&tr2=1.16&diff_format=h
>
> So, its new, but I'd recommend new modules, especially wrappers, make
> use of it.

Agreed. I think it allows for parameter variations (dashed, dashless, etc.) and can create simple get/setters on the fly, so it is particularly suited for wrappers. _rearrange() will always have a use in situations where named parameters help (long arg lists) but you don't want get/setters, just values.

From dmessina at wustl.edu Wed Feb 28 20:40:39 2007
From: dmessina at wustl.edu (Dave Messina)
Date: Wed, 28 Feb 2007 19:40:39 -0600 (CST)
Subject: Re: [Bioperl-l] Problem installing bioperl-1.5.2_102
In-Reply-To:
References:
Message-ID: <58485.75.33.119.169.1172713239.squirrel@gscmail.wustl.edu>

> t/compat.t 5 1280 60 5 8.33% 25-28 31

This is the test that failed. I think you snipped the part above, where the actual errors causing the failure were printed.

> There seems to be running another CPAN process (pid 7207). Contacting...
> Lockfile /Users/sand60/.cpan/.lock not writeable by you. Cannot proceed.
> On UNIX try:
> rm /Users/sand60/.cpan/.lock
> and then rerun us.
> at -e line 1
> ###################################################
> And doing what it says, removing some lock file doesn't help.

Are you sure the lock file is really being removed? If so, what was the error you got when running it after doing that?

Also, this line is important:

> /usr/bin/make test -- NOT OK

It looks like you're trying to install on OS X. By default, OS X has perl but not make. So /usr/bin/make probably doesn't exist on your system, along with lots of other UNIX tools you'll want.

To verify this, type:

which /usr/bin/make

on the command line. If you get:

/usr/bin/make: Command not found.

you'll need to install the OS X developer tools, called Xcode. You'll need to register first, but you can get the latest version at:

http://developer.apple.com/tools/download/

After you do that, reread the BioPerl install docs and try to install again. Since you don't have root on your machine, be sure to read the part of the install instructions that describes what to do in that case.

Dave

From hlapp at gmx.net Wed Feb 28 23:16:38 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 28 Feb 2007 23:16:38 -0500
Subject: Re: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
In-Reply-To:
References: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>
Message-ID:

On Feb 28, 2007, at 5:54 PM, Mark Johnson wrote:

> I don't like using system() due to issues with
> signals (Such as the user hitting ctrl-c and taking out the
> child). I feel better knowing the wrapped executable is in another
> process disconnected from the console.

I'm not sure how the user would be able to take out the child by hitting ctrl-c if you run it through system() (except if the parent terminates first, but then terminating a run-away child may well be in good order).
I haven't read the IPC::run POD in full detail but you will want to make sure that if the parent gets killed the child does get killed too, or otherwise you'll have a run-away process that novices will have trouble with understanding or terminating. Other than that though IPC::run seems like a useful module, so incurring this as a dependency should be fine. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cuiw at ncbi.nlm.nih.gov Thu Feb 1 14:47:38 2007 From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C]) Date: Thu, 1 Feb 2007 09:47:38 -0500 Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ? In-Reply-To: <45C1059D.1070100@tbi.univie.ac.at> References: <45C1059D.1070100@tbi.univie.ac.at> Message-ID: <18C407FD4FFB424292D769FBD68C1987020BB960@NIHCESMLBX8.nih.gov> This is a simple test from gene ID 3632373 (protein is 46100068) to contig coordinates: perl -MLWP::Simple -e 'map {print $_, "\n" if /<(Gene-source_src.*?>)(.*)?<$1/} (split "\n", get(q{http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&i d=3632373&retmode=xml}))' You need to translate protein id to gene id though. If the genome is available at Map Viewer, try (the contig name is NW_101115 from last step) http://www.ncbi.nlm.nih.gov/mapview/maps.cgi?taxid=5270&gnl=NW_101115&MA PS=genes&cmd=txt Wenwu Cui, PhD -----Original Message----- From: Rainer Machne [mailto:raim at tbi.univie.ac.at] Sent: Wednesday, January 31, 2007 4:10 PM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ? Dear Bioperl list, hoping not be on the wrong email list, i would have a short question: Is there a standard way or are there nice (Bioperl) tools to come from a gene id (gi) other ids (see below) to the genomic coordinates of the respective gene? 
We have Fasta files retrieved from NCBI protein Blast in fungal genomes: >gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago maydis 521] or >gi|50292953|ref|XP_448909.1| unnamed protein product [Candida glabrata] (we only have gi, ref and gb in my set). I retrieved all my fasta files from whole fungal genomes with available protein sequences at http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=fungi As I only searched whole finished genomes (not shotgun), I thought it would then be easy to get the genomic coordinates and retrieve upstream sequences, but we have failed so far to find a consistent way to do this automatically. Many of the gi entries refer to mRNAs or partial mRNAs and the way to the coordinates seems to differ for each case. Any suggestions would be appreciated. with kind regards, Rainer Machne University of Vienna Department for Theoretical Chemistry Theoretical Biochemistry Group _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From raim at tbi.univie.ac.at Thu Feb 1 12:54:21 2007 From: raim at tbi.univie.ac.at (Rainer Machne) Date: Thu, 01 Feb 2007 13:54:21 +0100 Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ? In-Reply-To: References: <45C1059D.1070100@tbi.univie.ac.at> Message-ID: <45C1E2FD.3070709@tbi.univie.ac.at> Barry and Jason, thanks for your quick and very helpful replies. I guess we should have done (or repeat) our blast search at http://fungal.genome.duke.edu/ to get better mapping from proteins to genomes ? As I retrieved all my proteins via whole genome blasts we should find (most of) them in the genbank files ... a good opportunity for me to learn some Bioperl and the other packages you mentioned in case we want to do more complex analysis later :-) Thank you very much! 
Rainer Barry Moore wrote: > Rainer, > > We use a perl library called CGL written by Mark Yandell and colleagues > (which in turn uses Chris Mungall's BioChaos and Unflattener.pm referred > to by Jason) for this type of task. The basic pipeline is to convert > GenBank files to Chaos XML, then use CGL with those XML files to get a > nice object oriented access to exons, transcripts, proteins, > coordinates and more for those genes. I am currently using this > with good success on most GenBank genomes (unfortunately I haven't been > working with the fungal genomes, but it should work fine). The Ensembl > API provides similar functionality for Ensembl genomes - but not very > many fungi there. > > http://www.yandell-lab.org/cgl/ > http://www.ensembl.org/info/software/core/core_tutorial.html > > Feel free to contact Mark or myself directly if you are interested in > using CGL. > > Barry > > On Jan 31, 2007, at 2:09 PM, Rainer Machne wrote: > >> Dear Bioperl list, >> >> hoping not be on the wrong email list, i would have a short question: >> >> Is there a standard way or are there nice (Bioperl) tools to come from a >> gene id (gi) other ids (see below) to the genomic coordinates of the >> respective gene? >> >> We have Fasta files retrieved from NCBI protein Blast in fungal genomes: >> >>> gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago >> >> maydis 521] >> or >> >>> gi|50292953|ref|XP_448909.1| unnamed protein product [Candida glabrata] >> >> >> (we only have gi, ref and gb in my set). >> >> I retrieved all my fasta files from whole fungal genomes with available >> protein sequences at >> http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=fungi >> >> As I only searched whole finished genomes (not shotgun), I thought it >> would then be easy to get the genomic coordinates and retrieve upstream >> sequences, but we have failed so far to find a consistent way to do this >> automatically.
Many of the gi entries refer to mRNAs or partial mRNAs >> and the way to the coordinates seems to differ for each case. >> >> Any suggestions would be appreciated. >> >> with kind regards, >> Rainer Machne >> >> University of Vienna >> Department for Theoretical Chemistry >> Theoretical Biochemistry Group >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at uiuc.edu Thu Feb 1 17:55:27 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 1 Feb 2007 11:55:27 -0600 Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ? In-Reply-To: <45C1E2FD.3070709@tbi.univie.ac.at> References: <45C1059D.1070100@tbi.univie.ac.at> <45C1E2FD.3070709@tbi.univie.ac.at> Message-ID: On Feb 1, 2007, at 6:54 AM, Rainer Machne wrote: > Barry and Jason, > > thanks for your quick and very helpful replies. > > I guess we should have done (or repeat) our blast search at > http://fungal.genome.duke.edu/ > to get better mapping from proteins to genomes ? > > As I retrieved all my proteins via whole genome blasts we should find > (most of) them in the genbank files ... a good opportunity for me to > learn some Bioperl and the other packages you mentioned in case we > want > to do more complex analysis later :-) > > Thank you very much! > > Rainer If the data is available in GenBank you could run the BLAST searches at NCBI and limit the search with an Entrez query: http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#entrez_query Most (all?) genome files are tagged as complete I'm not sure but there might be a way of doing this via Bio::Tools::Run::RemoteBlast. Jason, any ideas? chris From cjfields at uiuc.edu Thu Feb 1 18:09:16 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 1 Feb 2007 12:09:16 -0600 Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ? 
In-Reply-To: References: <45C1059D.1070100@tbi.univie.ac.at> <45C1E2FD.3070709@tbi.univie.ac.at> Message-ID: <748CC48E-D224-4234-A5C4-E33968F17418@uiuc.edu> > If the data is available in GenBank you could run the BLAST searches > at NCBI and limit the search with an Entrez query: > > http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#entrez_query > > Most (all?) genome files are tagged as complete sorry, didn't finish that... "Most (all?) genome files are tagged as complete, wgs, in progress, etc. and can be limited by taxonomy using Fungi[ORGN] or similar." chris From jason at bioperl.org Thu Feb 1 18:36:02 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 1 Feb 2007 10:36:02 -0800 Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ? In-Reply-To: References: <45C1059D.1070100@tbi.univie.ac.at> <45C1E2FD.3070709@tbi.univie.ac.at> Message-ID: On Feb 1, 2007, at 9:55 AM, Chris Fields wrote: > > On Feb 1, 2007, at 6:54 AM, Rainer Machne wrote: > >> Barry and Jason, >> >> thanks for your quick and very helpful replies. >> >> I guess we should have done (or repeat) our blast search at >> http://fungal.genome.duke.edu/ >> to get better mapping from proteins to genomes ? >> Well, I'm not quite sure of your exact goals. To find upstream regions of known genes, or look at upstream regions of orthologous genes? You can first figure out orthologs based on protein similarities, then go in and extract upstream regions for the orthologous genes (I provide a link to a big all-vs-all FASTA result at the bottom of the page if you want those results, as well as some pairwise orthology assignments, although you may want more or less stringent parameters). All the GFF and AA data is freely available for download on the site for each genome we've annotated or for annotation we've re-formatted so you can do things locally and/or modify it to your liking.
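A minimal core-Perl sketch of the two E-utilities steps discussed in this thread: building a URL like the efetch one in Wenwu Cui's one-liner (or an esearch URL restricted with Chris's Fungi[ORGN] term), and pulling Gene-source_src* elements out of the returned Gene XML. It avoids LWP and URI::Escape so it runs offline. The helper names (eutils_url, escape_param, gene_source_fields) are hypothetical, and the sample XML in the test is an assumed layout, not a description of NCBI's schema. Unlike the one-liner, the pattern captures the element name and uses a \1 backreference to match the closing tag.

```perl
use strict;
use warnings;

# E-utilities base URL, as in the efetch example earlier in the thread.
my $EUTILS_BASE = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Percent-encode everything except RFC 3986 unreserved characters.
sub escape_param {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $value;
}

# Build e.g. efetch.fcgi?db=gene&id=... from ordered key/value pairs.
sub eutils_url {
    my ($tool, @params) = @_;
    my @pairs;
    while (my ($key, $value) = splice(@params, 0, 2)) {
        push @pairs, $key . '=' . escape_param($value);
    }
    return "$EUTILS_BASE/$tool.fcgi?" . join('&', @pairs);
}

# Collect <Gene-source_src...>value</Gene-source_src...> elements.
# \1 is a backreference, so each closing tag must match its opening tag.
sub gene_source_fields {
    my ($xml) = @_;
    my %field;
    while ($xml =~ m{<(Gene-source_src[\w-]*)>([^<]*)</\1>}g) {
        $field{$1} = $2;
    }
    return \%field;
}
```

With network access, LWP::Simple::get(eutils_url('efetch', db => 'gene', id => 3632373, retmode => 'xml')) would return the XML to hand to gene_source_fields.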
>> As I retrieved all my proteins via whole genome blasts we should find >> (most of) them in the genbank files ... a good opportunity for me to >> learn some Bioperl and the other packages you mentioned in case we >> want >> to do more complex analysis later :-) >> >> Thank you very much! >> >> Rainer > > If the data is available in GenBank you could run the BLAST > searches at NCBI and limit the search with an Entrez query: > > http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#entrez_query > > Most (all?) genome files are tagged as complete > > I'm not sure but there might be a way of doing this via > Bio::Tools::Run::RemoteBlast. Jason, any ideas? > > chris -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From reenayadav at gmail.com Thu Feb 1 18:38:03 2007 From: reenayadav at gmail.com (Reena Yadav) Date: Fri, 2 Feb 2007 00:08:03 +0530 Subject: [Bioperl-l] pdb parser Message-ID: <76f897dd0702011038v7afe0207gb05465478e026205@mail.gmail.com> hi need to extract pdb atomic coordinates (1ake), and do certain calculations. i am going stepwise: steps that involved are: (1) reading the atomic coordinates (2) read the result in a file. need to understand how to whole xyz line in another file. could someone help. R. From jason at bioperl.org Thu Feb 1 13:06:42 2007 From: jason at bioperl.org (sandhya khatal) Date: Thu, 1 Feb 2007 13:06:42 +0000 Subject: [Bioperl-l] Regarding Bioperl program Message-ID: <75899ED1-72C6-4272-8CAC-028CF133A0B4@gmail.com> Respected Sir, I want to do a program which gives dendrogram like UPGMA a clustering method, but i want this dendrogram by using single linkage or centroid method.Can u help me for this.U have given the code for tree but i want dendrogram as output by using above any method. Thanks for anticipating. Regards, Sandhya Khatal. 
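For the PDB question above (reading the atomic coordinates of 1ake and writing the xyz values to another file), here is a minimal core-Perl sketch. It reads ATOM/HETATM records by their fixed column positions (x, y and z sit in columns 31-38, 39-46 and 47-54 of the PDB coordinate records) and writes one line per atom. The sub names and the "serial name x y z" output layout are hypothetical choices. BioPerl's Bio::Structure::IO is an alternative that is not shown here.

```perl
use strict;
use warnings;

# Strip leading/trailing whitespace from a fixed-width field.
sub trim_field {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Read ATOM/HETATM records from an open PDB filehandle.
# substr offsets are 0-based, so PDB column 31 is offset 30.
sub parse_pdb_atoms {
    my ($fh) = @_;
    my @atoms;
    while (my $line = <$fh>) {
        next unless $line =~ /^(?:ATOM  |HETATM)/;
        push @atoms, {
            serial  => trim_field(substr($line, 6, 5)),
            name    => trim_field(substr($line, 12, 4)),
            resname => trim_field(substr($line, 17, 3)),
            chain   => substr($line, 21, 1),
            resseq  => trim_field(substr($line, 22, 4)),
            x       => substr($line, 30, 8) + 0,
            y       => substr($line, 38, 8) + 0,
            z       => substr($line, 46, 8) + 0,
        };
    }
    return \@atoms;
}

# Write "serial name x y z" for each atom to an output filehandle.
sub write_xyz {
    my ($atoms, $out) = @_;
    for my $atom (@$atoms) {
        printf {$out} "%s %s %.3f %.3f %.3f\n", @{$atom}{qw(serial name x y z)};
    }
    return;
}
```

Usage: open my $in, '<', '1ake.pdb' or die $!; open my $out, '>', '1ake.xyz' or die $!; write_xyz(parse_pdb_atoms($in), $out);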
From jason at bioperl.org Fri Feb 2 00:55:26 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 1 Feb 2007 16:55:26 -0800 Subject: [Bioperl-l] Fwd: Regarding Bioperl program References: <394d31ee0702010506j4bbd79dck41d5ac2162eaafdd@mail.gmail.com> Message-ID: <40020502-3421-407D-85EB-24F420AB699C@bioperl.org> re-forwarding Sandhya's email to the list so the email address is visible. The approach that is coded in bioperl is for distance based data such as evolutionary distance of DNA or protein sequences - I assume you are talking about clustering expression data? You may want to focus on the available literature and toolkits that focus on expression data - something BioPerl doesn't deliberately focus on right now. -jason Begin forwarded message: > From: "sandhya khatal" > Date: February 1, 2007 5:06:42 AM PST > To: jason at bioperl.org > Subject: Regarding Bioperl program > > Respected Sir, > I want to do a program which gives dendrogram > like > UPGMA a clustering method, but i want this dendrogram by using single > linkage or centroid method.Can u help me for this.U have given the > code for > tree but i want dendrogram as output by using above any method. > > Thanks for anticipating. > > Regards, > Sandhya Khatal. -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From lzhtom at hotmail.com Fri Feb 2 03:20:10 2007 From: lzhtom at hotmail.com (zhihua li) Date: Fri, 02 Feb 2007 03:20:10 +0000 Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file? Message-ID: From lzhtom at hotmail.com Fri Feb 2 03:27:39 2007 From: lzhtom at hotmail.com (zhihua li) Date: Fri, 02 Feb 2007 03:27:39 +0000 Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file? Message-ID: Sorry guys, the former empty mail was sent out by mistake.
I'm using Bio::Index::Fasta to index a file containing lots of sequences in fasta format. All is fine except one thing. According to the bioperl tutorial and the documents, the following code will make an indexed file: my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx", -write_flag => 1); $inx->make_index("test.fasta"); And in another script I can access the indexed file by saying $ENV{BIOPERL_INDEX} = "."; # find index in current directory my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx"); my $seq=$inx->fetch("ent1001"); #fetch the sequence named ent1001 However, after running the first script, I cannot find a new file test.fasta.idx in my current directory. And not surprisingly, when I ran the second script, perl told me it couldn't find "test.fasta.idx". What's going on here? Thanks a lot! From jason at bioperl.org Fri Feb 2 06:24:44 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 1 Feb 2007 22:24:44 -0800 Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file? In-Reply-To: References: Message-ID: I don't think BIOPERL_INDEX does anything in the module so that documentation is not quite right. The env variable is used in the scripts/index/bp_index and bp_fetch scripts so maybe a cut+paste job went bad somewhere. You need to specify the full path you want with -filename - you can just prepend the BIOPERL_INDEX to the filename, like: -filename => $ENV{BIOPERL_INDEX}."/$index" -jason On Feb 1, 2007, at 7:27 PM, zhihua li wrote: > Sorry guys, the former empty mail was sent out by mistake. > > I'm using Bio::Index::Fasta to index a file containing lots of sequences in fasta format. All is fine except one thing.
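Following Jason's suggestion to pass the full path through -filename, here is a minimal sketch of a helper that builds that path from $ENV{BIOPERL_INDEX} with File::Spec. When the variable is unset or empty it falls back to the current directory. Plain concatenation would instead give "/test.fasta.idx" plus an uninitialized-value warning. The helper name index_path is hypothetical. The commented Bio::Index::Fasta calls are the ones quoted in this thread and are not executed here.

```perl
use strict;
use warnings;
use File::Spec;

# Build the full path for an index file: directory from $ENV{BIOPERL_INDEX}
# when it is set and non-empty, otherwise the current directory.
sub index_path {
    my ($name) = @_;
    my $dir = (defined $ENV{BIOPERL_INDEX} && length $ENV{BIOPERL_INDEX})
        ? $ENV{BIOPERL_INDEX}
        : File::Spec->curdir;
    return File::Spec->catfile($dir, $name);
}

# Example use with the calls quoted in this thread (not run here):
#   my $inx = Bio::Index::Fasta->new(-filename   => index_path('test.fasta.idx'),
#                                    -write_flag => 1);
#   $inx->make_index('test.fasta');
```

Using the same helper in both the indexing script and the fetching script means both resolve the index to the same location.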
> > According to the bioperl tutorial and the documents, the following > code will make an indexed file: > > my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx", > -write_flag => 1); > $inx->make_index("test.fasta"); > > And in another script I can access the indexed file by saying > > $ENV{BIOPERL_INDEX} = "."; # find index in current directory > my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx"); > my $seq=$inx->fetch("ent1001"); #fetch the sequence named > ent1001 > > However, after running the first script, I cannot find a new file > test.fasta.idx in my current directory. And not surprisingly, when > I ran the second script, perl told me it couldn't find > "test.fasta.idx". > > What's going on here? > > Thanks a lot! -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From marian.thieme at lycos.de Fri Feb 2 10:06:09 2007 From: marian.thieme at lycos.de (marian thieme) Date: Fri, 2 Feb 2007 10:06:09 +0000 Subject: [Bioperl-l] seqDiff Message-ID: <101051013116870@lycos-europe.com> From marian.thieme at lycos.de Fri Feb 2 11:37:05 2007 From: marian.thieme at lycos.de (marian thieme) Date: Fri, 2 Feb 2007 11:37:05 +0000 Subject: [Bioperl-l] susp. header Message-ID: <188661178024725@lycos-europe.com>
From lubapardo at gmail.com Fri Feb 2 14:31:06 2007 From: lubapardo at gmail.com (Luba Pardo) Date: Fri, 2 Feb 2007 15:31:06 +0100 Subject: [Bioperl-l] Problem using Bio::DB::Query::GenBank; Message-ID: <58ff33550702020631l4e7bc59dmabcf8c72fa67a6d5@mail.gmail.com> Hello, (I am using bioperl-1.5.2_100, linux machine) I am trying to get the ids of a list of genes using the module Bio::DB::Query::GenBank. I have the following code: use Bio::DB::Query::GenBank; use strict; use warnings; open (READER_1,"list.txt") || die "\n I can't open the file READER_1 !!\n"; my @a1=<READER_1>; close (READER_1); for (my $i=0; $i<=$#a1;$i=$i+1 ) { my @a1_s=split/\s+/,$a1[$i]; my $query_string = ' Homo Sapiens[Organism] AND $a1[$i] '; my $query = Bio::DB::Query::GenBank->new(-db=>'Protein', -query=>$query_string ); my $count = $query->count; my @ids = $query->ids; print " gene: $a1[$i] first id is $ids[0] o no? \n"; I want to tell the program to get all the genes contained in the file list.txt and to retrieve the ids from GenBank. However the program gives me the following error: ------------EXCEPTION: Bio::Root::Exception ------------- MSG: Id list has been truncated even after maxids requested STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359 STACK: Bio::DB::Query::WebQuery::_fetch_ids /usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:236 STACK: Bio::DB::Query::WebQuery::ids /usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:200 STACK: query.pl:27 ------------------ Is that a problem if I try to use the $a1[$i] to replace the name of the gene? I thank before hand for the attention you may pay to this message Regards, Luba Pardo From hlapp at gmx.net Fri Feb 2 15:44:02 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 2 Feb 2007 10:44:02 -0500 Subject: [Bioperl-l] susp. header In-Reply-To: <188661178024725@lycos-europe.com> References: <188661178024725@lycos-europe.com> Message-ID: You are sending HTML emails.
You should configure your mailer to ideally just send plain text. If you really must have fancy formatted emails (i.e., HTML-formatted emails), then configure it such that the mailer will send a plain text and a HTML version. (Many spam filters will flag email the body of which consists of only an HTML attachment.) -hilmar On Feb 2, 2007, at 6:37 AM, marian thieme wrote: > why each message I sent to this list is considered to have a susp. > header ? > > Marian -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cain.cshl at gmail.com Fri Feb 2 16:03:16 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Fri, 02 Feb 2007 11:03:16 -0500 Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml In-Reply-To: <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu> References: <1170359746.2706.622.camel@localhost.localdomain> <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu> Message-ID: <1170432196.2706.661.camel@localhost.localdomain> Hi Hilmar, That is a good idea; when I started down this road, it felt like there would only be a few things that I might want to allow to be different, but I think you are right that having one standard implementation that can be subclassed for legacy systems is a good thing.
Scott On Fri, 2007-02-02 at 10:09 -0500, Hilmar Lapp wrote: > On Feb 1, 2007, at 2:55 PM, Scott Cain wrote: > > > The second main change was to introduce a -flybase_compat argument > > when > > initializing the Bio::SeqIO writer, so that 'old style' cv and cvterms > > (that are compatible with flybase) will be used, but now the default > > will be to use current standards: > > Just my $0.02 ... obviously, Flybase may be the only organization > that uses an 'old style' or any other way not compliant with 'current > standards' (presumably SO), but if it's not the only one then this > approach won't scale. > > Also, an argument -flybase_compat suggests to the unsuspecting that > this is an endorsed flavor of the standard and fine to use for > everyone else too. > > If Flybase is idiosyncratic in this way, why not make chadoxml.pm > compliant with the standard as we all want it, keep it free from > litter caused by usage of old versions of SO, and create a second > module fb-chadoxml.pm that inherits from the first and merely > overrides a few things so that it works for Flybase. This way, other > organizations with similar needs can follow the path and create their > own xyz-chadoxml.pm, rather than having to muck around in the > chadoxml.pm that comes with the distribution. > > I'm not sure I fully grasp the underlying issue, so I may not make > much sense here. Apologies if that's the case ... > > -hilmar -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory
From bosborne11 at verizon.net Fri Feb 2 15:27:44 2007 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 02 Feb 2007 10:27:44 -0500 Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml In-Reply-To: <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu> Message-ID: Hilmar, I second your motion, good idea. Let's keep the standard module nice and clean. Brian O. On 2/2/07 10:09 AM, "Hilmar Lapp" wrote: > and create a second > module fb-chadoxml.pm that inherits from the first and merely > overrides a few things so that it works for Flybase From Kevin.M.Brown at asu.edu Fri Feb 2 15:52:20 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Fri, 2 Feb 2007 08:52:20 -0700 Subject: [Bioperl-l] Problem using Bio::DB::Query::GenBank; References: <58ff33550702020631l4e7bc59dmabcf8c72fa67a6d5@mail.gmail.com> Message-ID: <1A4207F8295607498283FE9E93B775B402AABA1C@EX02.asurite.ad.asu.edu> It looks like you have some problems with the code you posted. use Bio::DB::Query::GenBank; use strict; use warnings; open (READER_1,"list.txt") || die "\n I can't open the file READER_1 !!\n"; my @a1=<READER_1>; close (READER_1); for (my $i=0; $i < @a1;$i++ ) { # is this necessary as you don't seem to use it anywhere later in your code. my @a1_s=split/\s+/,$a1[$i]; # you enclosed the variable in '' which means perl won't evaluate it # changed the query so that perl can evaluate the variable my $query_string = ' Homo Sapiens[Organism] AND '.$a1[$i] .' '; my $query = Bio::DB::Query::GenBank->new(-db=>'Protein', -query=>$query_string ); my $count = $query->count; my @ids = $query->ids; print " gene: $a1[$i] first id is $ids[0] o no?
\n"; -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Luba Pardo Sent: Friday, February 02, 2007 7:31 AM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Problem using Bio::DB::Query::GenBank; Hello, (I am using bioperl-1.5.2_100, linux machine) I am trying to get the ids of a list of genes using the module Bio::DB::Query::GenBank. I have the following code: use Bio::DB::Query::GenBank; use strict; use warnings; open (READER_1,"list.txt") || die "\n I can't open the file READER_1 !!\n"; my @a1=<READER_1>; close (READER_1); for (my $i=0; $i<=$#a1;$i=$i+1 ) { my @a1_s=split/\s+/,$a1[$i]; my $query_string = ' Homo Sapiens[Organism] AND $a1[$i] '; my $query = Bio::DB::Query::GenBank->new(-db=>'Protein', -query=>$query_string ); my $count = $query->count; my @ids = $query->ids; print " gene: $a1[$i] first id is $ids[0] o no? \n"; I want to tell the program to get all the genes contained in the file list.txt and to retrieve the ids from GenBank. However the program gives me the following error: ------------EXCEPTION: Bio::Root::Exception ------------- MSG: Id list has been truncated even after maxids requested STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359 STACK: Bio::DB::Query::WebQuery::_fetch_ids /usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:236 STACK: Bio::DB::Query::WebQuery::ids /usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:200 STACK: query.pl:27 ------------------ Is that a problem if I try to use the $a1[$i] to replace the name of the gene?
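Kevin's point about quoting can be checked without contacting NCBI. In Perl, single quotes do not interpolate, so ' Homo Sapiens[Organism] AND $a1[$i] ' sends the literal characters $a1[$i] as part of the query. The minimal sketch below also chomps each line, because lines read with <READER_1> keep their trailing newline. That chomp step is an addition of this sketch and was not raised in the thread. The sub names are hypothetical. The commented Bio::DB::Query::GenBank loop uses only the -db/-query arguments and the ids method from the code above, and it is not run here.

```perl
use strict;
use warnings;

# Read one gene name per line, dropping newlines, surrounding
# whitespace and blank lines.
sub read_gene_names {
    my ($fh) = @_;
    my @names;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;
        push @names, $line if length $line;
    }
    return @names;
}

# Double quotes interpolate $gene; with single quotes the query would
# contain the literal characters '$gene' instead of the gene name.
sub build_query {
    my ($gene) = @_;
    return "Homo Sapiens[Organism] AND $gene";
}

# Sketch of the loop from the thread (not run here):
#   open my $fh, '<', 'list.txt' or die "Can't open list.txt: $!";
#   for my $gene (read_gene_names($fh)) {
#       my $query = Bio::DB::Query::GenBank->new(-db    => 'Protein',
#                                                -query => build_query($gene));
#       my @ids = $query->ids;
#       print "gene: $gene first id: ", (@ids ? $ids[0] : 'none'), "\n";
#   }
```

Using a lexical filehandle and a foreach over the names also removes the need for the index variable $i.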
I thank before hand for the attention you may pay to this message Regards, Luba Pardo _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri Feb 2 16:37:49 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 2 Feb 2007 10:37:49 -0600 Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml In-Reply-To: <1170432196.2706.661.camel@localhost.localdomain> References: <1170359746.2706.622.camel@localhost.localdomain> <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu> <1170432196.2706.661.camel@localhost.localdomain> Message-ID: <64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu> I was going to suggest maybe allowing one to switch out XML handlers/ writers based on the style (ala XML::SAX), but I see that chadoxml currently uses XML::Writer and there is no next_seq() implemented. Oh well... chris On Feb 2, 2007, at 10:03 AM, Scott Cain wrote: > Hi Hilmar, > > That is a good idea; when I started down this road, it felt like there > would only be a few things that I might want to allow to be different, > but I think you are right that having one standard implementation that > can be subclassed for legacy systems is a good thing. > > Scott > > > On Fri, 2007-02-02 at 10:09 -0500, Hilmar Lapp wrote: >> On Feb 1, 2007, at 2:55 PM, Scott Cain wrote: >> >>> The second main change was to introduce a -flybase_compat argument >>> when >>> initializing the Bio::SeqIO writer, so that 'old style' cv and >>> cvterms >>> (that are compatable with flybase) will be used, but now the default >>> will be to use current standards: >> >> Just my $0.02 ... obviously, Flybase may be the only organization >> that uses an 'old style' or any other way not compliant with 'current >> standards' (presumably SO), but if it's not the only one then this >> approach won't scale. 
>> >> Also, an argument -flybase_compat suggests to the unsuspecting that >> this is an endorsed flavor of the standard and fine to use for >> everyone else too. >> >> If Flybase is idiosyncratic in this way, why not make chadoxml.pm >> compliant with the standard as we all want it, keep it free from >> litter caused by usage of old versions of SO, and create a second >> module fb-chadoxml.pm that inherits from the first and merely >> overrides a few things so that it works for Flybase. This way, other >> organizations with similar needs can follow the path and create their >> own xyz-chadoxml.pm, rather than having to muck around in the >> chadoxml.pm that comes with the distribution. >> >> I'm not sure I fully grasp the underlying issue, so I may not make >> much sense here. Apologies if that's the case ... >> >> -hilmar > -- > ---------------------------------------------------------------------- > -- > Scott Cain, Ph. D. > cain.cshl at gmail.com > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From hlapp at gmx.net Fri Feb 2 16:45:30 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 2 Feb 2007 11:45:30 -0500 Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml In-Reply-To: <64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu> References: <1170359746.2706.622.camel@localhost.localdomain> <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu> <1170432196.2706.661.camel@localhost.localdomain> <64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu> Message-ID: <3A3FF1B0-129A-410D-9DE2-D1CB28015C9A@gmx.net> There must be at least a stub for next_seq(). It may throw a not- implemented exception, but it should not just be absent. 
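Hilmar's point about a write-only format can be sketched in plain Perl. The module still defines next_seq(), so a caller gets a clear "not implemented" error instead of Perl's generic "Can't locate object method". The package name and the placeholder record layout are hypothetical, and the sketch does not use BioPerl. Within BioPerl itself this is typically done by calling $self->throw_not_implemented(), which I believe is provided by Bio::Root::RootI.

```perl
use strict;
use warnings;

# Hypothetical write-only sequence format module (not BioPerl code).
package My::SeqIO::WriteOnly;

sub new {
    my ($class, %args) = @_;
    return bless { fh => $args{fh} }, $class;
}

# Writing works; the record layout here is only a placeholder.
sub write_seq {
    my ($self, $id, $seq) = @_;
    print { $self->{fh} } ">$id\n$seq\n";
    return 1;
}

# Reading is not supported, but the method still exists so that callers
# get an explicit error rather than "Can't locate object method".
sub next_seq {
    my ($self) = @_;
    die ref($self) . "::next_seq is not implemented (write-only format)\n";
}

package main;
```

Callers that probe a format with can('next_seq') will still find the method, so an eval around the call is the reliable way to detect that reading is unsupported.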
-hilmar On Feb 2, 2007, at 11:37 AM, Chris Fields wrote: > I was going to suggest maybe allowing one to switch out XML > handlers/writers based on the style (ala XML::SAX), but I see that > chadoxml currently uses XML::Writer and there is no next_seq() > implemented. Oh well... > > chris > > On Feb 2, 2007, at 10:03 AM, Scott Cain wrote: > >> Hi Hilmar, >> >> That is a good idea; when I started down this road, it felt like >> there >> would only be a few things that I might want to allow to be >> different, >> but I think you are right that having one standard implementation >> that >> can be subclassed for legacy systems is a good thing. >> >> Scott >> >> >> On Fri, 2007-02-02 at 10:09 -0500, Hilmar Lapp wrote: >>> On Feb 1, 2007, at 2:55 PM, Scott Cain wrote: >>> >>>> The second main change was to introduce a -flybase_compat argument >>>> when >>>> initializing the Bio::SeqIO writer, so that 'old style' cv and >>>> cvterms >>>> (that are compatable with flybase) will be used, but now the >>>> default >>>> will be to use current standards: >>> >>> Just my $0.02 ... obviously, Flybase may be the only organization >>> that uses an 'old style' or any other way not compliant with >>> 'current >>> standards' (presumably SO), but if it's not the only one then this >>> approach won't scale. >>> >>> Also, an argument -flybase_compat suggests to the unsuspecting that >>> this is an endorsed flavor of the standard and fine to use for >>> everyone else too. >>> >>> If Flybase is idiosyncratic in this way, why not make chadoxml.pm >>> compliant with the standard as we all want it, keep it free from >>> litter caused by usage of old versions of SO, and create a second >>> module fb-chadoxml.pm that inherits from the first and merely >>> overrides a few things so that it works for Flybase. 
This way, other >>> organizations with similar needs can follow the path and create >>> their >>> own xyz-chadoxml.pm, rather than having to muck around in the >>> chadoxml.pm that comes with the distribution. >>> >>> I'm not sure I fully grasp the underlying issue, so I may not make >>> much sense here. Apologies if that's the case ... >>> >>> -hilmar >> -- >> --------------------------------------------------------------------- >> --- >> Scott Cain, Ph. D. >> cain.cshl at gmail.com >> GMOD Coordinator (http://www.gmod.org/) >> 216-392-3087 >> Cold Spring Harbor Laboratory >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cain.cshl at gmail.com Fri Feb 2 17:02:32 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Fri, 02 Feb 2007 12:02:32 -0500 Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml In-Reply-To: <3A3FF1B0-129A-410D-9DE2-D1CB28015C9A@gmx.net> References: <1170359746.2706.622.camel@localhost.localdomain> <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu> <1170432196.2706.661.camel@localhost.localdomain> <64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu> <3A3FF1B0-129A-410D-9DE2-D1CB28015C9A@gmx.net> Message-ID: <1170435752.2706.676.camel@localhost.localdomain> Ah, I'll go ahead and add one, though it will just throw an exception because this is a write-only adapter. Scott On Fri, 2007-02-02 at 11:45 -0500, Hilmar Lapp wrote: > There must be at least a stub for next_seq(). It may throw a not- > implemented exception, but it should not just be absent. 
> > -hilmar > > On Feb 2, 2007, at 11:37 AM, Chris Fields wrote: > > > I was going to suggest maybe allowing one to switch out XML > > handlers/writers based on the style (ala XML::SAX), but I see that > > chadoxml currently uses XML::Writer and there is no next_seq() > > implemented. Oh well... > > > > chris > > > > On Feb 2, 2007, at 10:03 AM, Scott Cain wrote: > > > >> Hi Hilmar, > >> > >> That is a good idea; when I started down this road, it felt like > >> there > >> would only be a few things that I might want to allow to be > >> different, > >> but I think you are right that having one standard implementation > >> that > >> can be subclassed for legacy systems is a good thing. > >> > >> Scott > >> > >> > >> On Fri, 2007-02-02 at 10:09 -0500, Hilmar Lapp wrote: > >>> On Feb 1, 2007, at 2:55 PM, Scott Cain wrote: > >>> > >>>> The second main change was to introduce a -flybase_compat argument > >>>> when > >>>> initializing the Bio::SeqIO writer, so that 'old style' cv and > >>>> cvterms > >>>> (that are compatable with flybase) will be used, but now the > >>>> default > >>>> will be to use current standards: > >>> > >>> Just my $0.02 ... obviously, Flybase may be the only organization > >>> that uses an 'old style' or any other way not compliant with > >>> 'current > >>> standards' (presumably SO), but if it's not the only one then this > >>> approach won't scale. > >>> > >>> Also, an argument -flybase_compat suggests to the unsuspecting that > >>> this is an endorsed flavor of the standard and fine to use for > >>> everyone else too. > >>> > >>> If Flybase is idiosyncratic in this way, why not make chadoxml.pm > >>> compliant with the standard as we all want it, keep it free from > >>> litter caused by usage of old versions of SO, and create a second > >>> module fb-chadoxml.pm that inherits from the first and merely > >>> overrides a few things so that it works for Flybase. 
This way, other > >>> organizations with similar needs can follow the path and create > >>> their > >>> own xyz-chadoxml.pm, rather than having to muck around in the > >>> chadoxml.pm that comes with the distribution. > >>> > >>> I'm not sure I fully grasp the underlying issue, so I may not make > >>> much sense here. Apologies if that's the case ... > >>> > >>> -hilmar > >> -- > >> --------------------------------------------------------------------- > >> --- > >> Scott Cain, Ph. D. > >> cain.cshl at gmail.com > >> GMOD Coordinator (http://www.gmod.org/) > >> 216-392-3087 > >> Cold Spring Harbor Laboratory > > > > Christopher Fields > > Postdoctoral Researcher > > Lab of Dr. Robert Switzer > > Dept of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From peili at morgan.harvard.edu Fri Feb 2 15:56:56 2007 From: peili at morgan.harvard.edu (Peili Zhang) Date: Fri, 02 Feb 2007 10:56:56 -0500 Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml In-Reply-To: References: Message-ID: <1170431816.6583.47.camel@jacks> i 'third' Hilmar's opinion. flybase's fingerprint is shown in the module because i wrote it for fb's data loading task. no need to worry about flybase compatibility in making the module generic. in fact, at flybase, i tweak the module frequently to make it work for different scenarios.
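The approach agreed on in this thread is a generic chadoxml.pm plus a FlyBase subclass that overrides only what differs, with each tweak turned into its own method. This is the usual template-method pattern. The minimal plain-Perl sketch below illustrates it. The package names, the feature_type_cv hook and its return values are all hypothetical, and they are not the actual Bio::SeqIO::chadoxml or flybase_chadoxml interfaces.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a generic writer such as chadoxml.pm.
package My::ChadoXMLWriter;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Hook: each convention-specific choice lives in its own small method.
sub feature_type_cv { return 'standard_cv' }

# Generic code calls the hook instead of hard-coding the choice.
sub describe_feature {
    my ($self, $type) = @_;
    return sprintf('%s (cv: %s)', $type, $self->feature_type_cv);
}

# Hypothetical stand-in for a FlyBase-specific subclass.
package My::FlyBaseChadoXMLWriter;
our @ISA = ('My::ChadoXMLWriter');

# Override only the hook that differs for the legacy convention.
sub feature_type_cv { return 'legacy_cv' }

package main;
```

Other groups with their own conventions would add a sibling subclass overriding the same hooks, leaving the base module untouched.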
cheers, peili On Fri, 2007-02-02 at 10:27, Brian Osborne wrote: > Hilmar, > > I second your motion, good idea. Let's keep the standard module nice and > clean. > > Brian O. > > > On 2/2/07 10:09 AM, "Hilmar Lapp" wrote: > > > and create a second > > module fb-chadoxml.pm that inherits from the first and merely > > overrides a few things so that it works for Flybase > > > > _______________________________________________ > Gmod-schema mailing list > Gmod-schema at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-schema > From cain.cshl at gmail.com Fri Feb 2 18:05:47 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Fri, 02 Feb 2007 13:05:47 -0500 Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml In-Reply-To: <1170431816.6583.47.camel@jacks> References: <1170431816.6583.47.camel@jacks> Message-ID: <1170439549.2706.683.camel@localhost.localdomain> Hi Peili, A little bit ago I checked in Bio::SeqIO::flybase_chadoxml that is fairly simple. My suggestion is that when you make tweaks for different scenarios, that you turn the things you are tweaking into methods in BSIO::chadoxml and then override them in flybase_chadoxml (and commit at least the chadoxml module) to make it more flexible when other people have similar scenarios. Scott On Fri, 2007-02-02 at 10:56 -0500, Peili Zhang wrote: > i 'third' Hilmar's opinion. flybase's fingerprint is shown in the module > because i wrote it for fb's data loading task. no need to worry about > flybase compatibility in making the module generic.
in fact, at flybase, > i tweak the module frequently to make it work for different scenarios. > > cheers, > peili > > On Fri, 2007-02-02 at 10:27, Brian Osborne wrote: > > Hilmar, > > > > I second your motion, good idea. Let's keep the standard module nice and > > clean. > > > > Brian O. > > > > > > On 2/2/07 10:09 AM, "Hilmar Lapp" wrote: > > > > > and create a second > > > module fb-chadoxml.pm that inherits from the first and merely > > > overrides a few things so that it works for Flybase > > > > > > > > _______________________________________________ > > Gmod-schema mailing list > > Gmod-schema at lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/gmod-schema > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From cjfields at uiuc.edu Fri Feb 2 20:33:46 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 2 Feb 2007 14:33:46 -0600 Subject: [Bioperl-l] seqDiff In-Reply-To: <101051013116870@lycos-europe.com> References: <101051013116870@lycos-europe.com> Message-ID: Judging by the code you'll have to recreate the SeqDiff while iterating through various alleles; there is no method to remove particular variants or purge them (at least I couldn't find one). I also noticed SeqDiff doesn't support deletions/insertions either; using a null allele (no seq) or leaving out either the mutant or original allele leads to errors. I'll look into the latter, and I may try to add a method to at least purge variants and reset dna_mut(). chris On Feb 2, 2007, at 4:06 AM, marian thieme wrote: > HI, > > is there a way to put out all mutated sequences of a seqdiff object ? > Suppose I add some variants via: > > $dnamut->add_Allele($a2); > $dnamut->add_Allele($a3); > $seqDiff->add_Variant($dnamut); > > and afterwards want to access the alternative sequences via > $seqDiff->dna_mut() > > which allele is choosen when using dna_mut(), respective can I > control to access the first or the second alternate sequence ? > If yes, how can I do this ? > > Regards, > Marian > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From MEC at stowers-institute.org Fri Feb 2 21:47:08 2007 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Fri, 2 Feb 2007 15:47:08 -0600 Subject: [Bioperl-l] Bio::DB::SeqFeature treatment of tags and annotations Message-ID: Lincoln, I don't think that adding this directive is a good idea after all either. But I see that you remap the ID= to a load_id attribute, which is preserved in the Bio::DB::SeqFeature::Store database and then squelched during GFF production by NormalizedFeature::format_attributes. However, if ID is prone to clashes, then simply renaming the attribute to load_id certainly does not preclude clashes from happening, and only courts disaster. Don't you think? I'm a little blurry on the GFF3Loader, but it looks like you're using load_id to facilitate loading parent/child features out of order. Is that right? If so, I suggest you delete all load_id attributes immediately after performing a load; they have no further use. Or, instead of a `round-trip-ids` directive, you might consider giving the GFF3Loader an IDAttribute option, which would allow the loader to preserve the ID values under a named attribute. In my case, munging flybase gff, I would then use it like this:

bp_seqfeature_load.PLS --fast --IDAttribute flybaseID

which would preserve the ID values in the database, but under the flybaseID attribute, for features so loaded. --------------------------------------------- On a related topic: I just committed this patch to Bio::DB::SeqFeature::NormalizedFeature _create_subfeatures: ensure that subfeatures get the `source` of their parent. While doing this I wondered: what is the -class that subfeatures are getting from their parent...??? I left it in place. Please advise! Fix my thinking.... ---------------------------------------------- Further, I observe that Bio::Graphics::FeatureBase::new handles the -segments option by calling add_segment.
So, when I create a new Bio::DB::SeqFeature with -segments => [[100,200],[300,400]], the -segments option gets handled by Bio::Graphics::FeatureBase::new, which, as mentioned, calls add_segment. The surprising thing to me when trying to trace through the class modules and understand what is going on is that what gets run at this point is not Bio::Graphics::FeatureBase::add_segment, but rather Bio::DB::SeqFeature::add_segment, whose semantics differ in at least one regard, namely, that it does not set the start and stop of the parent feature from the min and max of the segments. I have committed a patch to Bio::Graphics::FeatureBase with a comment to this effect, and have also patched its add_segment method to copy the parent's source into the segment. I hope my commits and suggestions further the cause. Let me know if not! -- Malcolm ________________________________ From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein Sent: Tuesday, January 30, 2007 4:46 PM To: Cook, Malcolm Cc: bioperl list; lstein at cshl.org Subject: Re: Bio::DB::SeqFeature treatment of tags and annotations I've fixed the first issue in CVS. Sorry for the inconsistency. add_tag_value(), delete_tag_value() and get_Annotations() now all work as expected. The problem with the ID column is that it is supposed to be LOCAL to the GFF3 file and is not intended to be stored in the database. In contrast, Name can survive round-tripping. Perhaps the thing to do is to add a flag to the GFF3 file that turns on ID round-tripping, e.g.

##round-trip-ids: 1

If you like this idea, I can implement it. Lincoln On 1/29/07, Cook, Malcolm < MEC at stowers-institute.org > wrote: Lincoln, Thanks for your suggestions on approach to my problems augmenting Flybase annotation. I am trying to follow them and am finding the following oddities. The first issue relates to the intermix of 'annotations' and 'tag values'.
I find that Bio::DB::SeqFeature implements some of the 'tag' methods and some of the 'Annotation' methods. Here is a perl one-liner showing that values stored using add_tag_value are not retrieved with get_tag_values, but rather with get_Annotations:

> perl -MBio::DB::SeqFeature -e 'my $f=Bio::DB::SeqFeature->new; $f->add_tag_value("x",666); print "get_tag_values:\t" . $f->get_tag_values("x") . "\nget_Annotations:\t" . $f->get_Annotations("x");'

whose output is:

get_tag_values:
get_Annotations: 666

Tracing this shows me that it results from the fact that Bio::DB::SeqFeature uses Bio::Graphics::FeatureBase (via Bio::DB::SeqFeature::NormalizedFeature), which does not support -tags in ->new but rather -attributes, viz:

-attributes   a hashref of tag value attributes, in which the key is the tag and the value is an array reference of values

And though Bio::Graphics::FeatureBase purports to implement Bio::SeqFeatureI, it only partially implements the 'tag' methods (now deprecated and relegated to Bio::AnnotatableI). In particular, the Bio::SeqFeatureI methods marked '*' are not implemented in Bio::Graphics::FeatureBase:

  has_tag
* add_tag_value
  get_tag_values
  get_all_tags
* remove_tag
  get_tagset_values
  get_Annotations

As a result, add_tag_value and remove_tag are inherited from different modules whose understanding of tags is not the same! This one-liner:

> perl -MClass::ISA -MClass::Inspector -MBio::DB::SeqFeature -e 'my @c = Class::ISA::self_and_super_path("Bio::DB::SeqFeature"); foreach my $fn (qw(add_tag_value get_tag_values)) {print "\n$fn:\t", join "\t", (grep {Class::Inspector->function_exists($_, $fn)} @c)}'

confirms that they are defined in different packages, namely:

add_tag_value: Bio::AnnotatableI
get_tag_values: Bio::Graphics::FeatureBase Bio::AnnotatableI

Proposed solution... hmmmm ..... I dunno....
maybe the following patch to Bio::Graphics::FeatureBase->add_tag_value:

sub add_tag_value {
    my ($self, $tag, @vals) = @_;
    push @{ $self->{attributes}{$tag} }, @vals;
}

It fixes my use case for now, but I'm still concerned and confused about this variety of methods. Suggestions? ------------------------------------------------------------------------ - Also, I think that any "ID" in column 9 of a GFF3 flat file should be preserved through a round-trip through a Bio::DB::SeqFeature store, but this is not yet possible, since any ID attribute in GFF3 column 9 is being lost by GFF3Loader. This caused me to locally patch the GFF3Loader::handle_feature method to add the following:

# mec at stowers-institute.org, wondering why not all attributes are
# carried forward, adds ID tag in particular service of
# round-tripping ID, which, though present in database as load_id
# attribute, was getting lost as itself
$unreserved->{ID} = $reserved->{ID} if exists $reserved->{ID};

Poised to patch.... what d'you think? Malcolm Cook Stowers Institute for Medical Research - Kansas City, Missouri ________________________________ From: lincoln.stein at gmail.com [mailto: lincoln.stein at gmail.com ] On Behalf Of Lincoln Stein Sent: Tuesday, December 19, 2006 3:58 PM To: Cook, Malcolm Cc: bioperl list; lstein at cshl.org Subject: Re: bp_seqfeature_load / Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting Flybase annotation Hi Malcolm, Your second guess was right. The use case of augmenting an existing gene with additional splice forms isn't provided for.
You can get the functionality by making direct calls to Bio::DB::SeqFeature::Store methods:

my @genes = $db->get_features_by_name('FBgn0017545');
@genes == 1 or die "Didn't get exactly one gene";
my $parent = $genes[0];
my $chr    = $parent->seq_id;
my $start  = $parent->start;
my $end    = $parent->end;
my $strand = $parent->strand;

# the new transcript lives on the same chromosome as its parent gene
my $new_splice_form = $db->new_feature(-primary_tag => 'mRNA',
                                       -source      => 'added',
                                       -seq_id      => $chr,
                                       -strand      => $strand,
                                       -start       => $start+10,
                                       -end         => $end,
                                      );
$parent->add_SeqFeature($new_splice_form);

for my $pos ([$start+10,$start+100],[$start+200,$end]) {
    my ($e_start,$e_end) = @$pos;
    my $exon = Bio::DB::SeqFeature->new(-primary_tag => 'exon',
                                        -store       => $db,
                                        -seq_id      => $chr,
                                        -strand      => $strand,
                                        -start       => $e_start,
                                        -end         => $e_end);
    $new_splice_form->add_SeqFeature($exon);
}

I found a bug in updating the seqfeature database when I wrote this script, so you'll have to get the latest bioperl-live. I think you can use this to write a splice form updating script. In order to support the idea of adding new splice forms to an existing gene using the GFF3 format, I will have to either modify the loader, or write a separate script (probably better to do the latter). It shouldn't be hard if you'd like to give it a try.
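As a possible first step for such a separate script, the new GFF3 file could be scanned for features whose Parent is not defined in that same file; those are the ones that have to be attached to features already in the database (for example with get_features_by_name() and add_SeqFeature(), as in the code above). The sketch below uses only core Perl and is not part of Bioperl; the function names and the sample GFF3 lines are hypothetical, made up for illustration, and the database calls are deliberately left out.

```perl
use strict;
use warnings;

# Split GFF3 column 9 ("tag=v1,v2;tag2=v3") into tag => [values],
# undoing the %XX escapes that GFF3 uses for reserved characters.
sub parse_attributes {
    my ($col9) = @_;
    my %attr;
    for my $pair (split /;/, $col9) {
        my ($tag, $values) = split /=/, $pair, 2;
        next unless defined $tag && defined $values;
        for my $raw (split /,/, $values) {
            (my $v = $raw) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
            push @{ $attr{$tag} }, $v;
        }
    }
    return \%attr;
}

# Hypothetical helper: return { parent_id => [features] } for features whose
# Parent is NOT defined in this file, i.e. children of previously loaded features.
sub children_of_existing_parents {
    my ($fh) = @_;
    my (%ids_in_file, @features);
    while (my $line = <$fh>) {
        chomp $line;
        last if $line =~ /^##FASTA/;            # sequence section follows
        next if $line =~ /^#/ || $line !~ /\S/;  # comments and blank lines
        my @col = split /\t/, $line;
        next unless @col == 9;
        my $attr = parse_attributes($col[8]);
        $ids_in_file{$_} = 1 for @{ $attr->{ID} || [] };
        push @features, { seq_id => $col[0], type  => $col[2],
                          start  => $col[3], end   => $col[4], attr => $attr };
    }
    my %children;
    for my $f (@features) {
        for my $parent (@{ $f->{attr}{Parent} || [] }) {
            push @{ $children{$parent} }, $f unless $ids_in_file{$parent};
        }
    }
    return \%children;
}

# Made-up sample input, for illustration only.
my @rows = (
    ['4', 'example', 'intron', 100, 200, '.', '+', '.', 'ID=i1;Parent=FBgn0017545'],
    ['4', 'example', 'exon',   300, 400, '.', '+', '.', 'ID=t1;Parent=FBgn0017545'],
    ['4', 'example', 'region', 310, 320, '.', '+', '.', 'Parent=t1'],
);
my $gff = join '', map { join("\t", @$_) . "\n" } @rows;

open my $fh, '<', \$gff or die "cannot open in-memory GFF: $!";
my $children = children_of_existing_parents($fh);
for my $parent (sort keys %$children) {
    printf "%s: %d new child feature(s)\n", $parent, scalar @{ $children->{$parent} };
}
```

With the sample input this reports FBgn0017545 as the parent of the intron and exon lines; the region line is skipped because its parent t1 is defined in the same file, so the normal loader could handle it.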
Lincoln On 12/19/06, Cook, Malcolm > wrote: Lincoln and fellow Bio::DB::SeqFeature travelers, I find that using bp_seqfeature_load.PLS to load subfeatures of genes already loaded using bp_seqfeature_load.PLS fails with

------------- EXCEPTION -------------
MSG: FBgn0017545 doesn't have a primary id
STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682
STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663
STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372
STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345
STACK Bio::DB::SeqFeature::Store::GFF3Loader::load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242
STACK toplevel /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_load.PLS:76

where FBgn0017545 is the ID of a gene previously loaded. I am unsure how to remedy my situation and welcome any advice on a correct or improved approach to my problem. Here's some detail if it helps. I am developing a pipeline to design microarray probes capable of distinguishing among splice variants in Drosophila (using the latest Flybase dmel_r5.1 annotation). So I

1) load a filtered selection of Flybase annotation using bp_seqfeature_load (for testing purposes, I am using a single gene's worth of annotation, FBgn0017545.gff, attached). This is done as follows:

> bp_seqfeature_load.PLS --create FBgn0017545.gff

2) analyze all the genes in the database, and create GFF3 output, each feature of which has a 'Parent' that is a previously loaded gene (i.e. FBgn0017545). (These features represent the unique introns, splice sites, and exonic design targets.
Output of this analysis, FBgn0017545_matd.gff, is also attached)

3) load these analysis results into the same database, as follows:

> bp_seqfeature_load.PLS FBgn0017545_matd.gff

It is at this point that I get the above error. However, I don't get any error and the data loads fine if I load the two files together, as follows:

> bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff FBgn0017545_matd.gff)

So, I suspect that either I am misunderstanding when/how to use bp_seqfeature_load.PLS or else this use case has not yet arisen and must be provided for somehow. I am running against bioperl-live. Thanks for your thoughts and assistance, Malcolm Cook Database Applications Manager - Bioinformatics Stowers Institute for Medical Research - Kansas City, Missouri -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From neha_bafs at yahoo.co.in Mon Feb 5 17:59:03 2007 From: neha_bafs at yahoo.co.in (Neha Nahar) Date: Mon, 5 Feb 2007 17:59:03 +0000 (GMT) Subject: [Bioperl-l] Convert newick to nexus format Message-ID: <20070205175903.30379.qmail@web8701.mail.in.yahoo.com> Hello everyone, I am trying to convert a newick tree to nexus format.
Using the script (referred to in an email from George dated Wed Sep 22 11:52:47 EDT 2004):

/*------------------------------------------------------------*/

$ cat nexus.pl
#!/usr/bin/perl -w

use Bio::TreeIO;

($NEWICKFILE, $NEXUSFILE) = @ARGV;
print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n";
my $treeio = new Bio::TreeIO(-format => 'newick', -file => "$NEWICKFILE");
my $treeout = new Bio::TreeIO(-format => 'nexus', -file => ">$NEXUSFILE");
while(my $tree = $treeio->next_tree) {
    $treeout->write_tree($treeout);
}

exit 0;

/*------------------------------------------------------------*/

Running the script from the command line gives the following error:

$ ./nexus.pl mrp-input.txt nexus.out
newickfile=mrp-input.txt, nexusfile=nexus.out

------------- EXCEPTION -------------
MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
STACK toplevel ./nexus.pl:23
--------------------------------------

I am using the bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm

Questions:
1. Please let me know if I am using the correct version. If not, please point me to the latest one.
2. Provided that the version I am using is the right one, please let me know what is wrong with the script.

Thank you. Regards, Neha. -Neha Nahar " Work for cause and not for applause, live to express and not to impress !" From jason at bioperl.org Mon Feb 5 18:10:42 2007 From: jason at bioperl.org (Jason Stajich) Date: Mon, 5 Feb 2007 10:10:42 -0800 Subject: [Bioperl-l] Convert newick to nexus format In-Reply-To: <20070205175903.30379.qmail@web8701.mail.in.yahoo.com> References: <20070205175903.30379.qmail@web8701.mail.in.yahoo.com> Message-ID: <46219DCD-8C6E-4DBE-82F2-D4B58207AD54@bioperl.org> you want to write the TREE out not the TREE WRITER.
$treeout->write_tree($tree) not $treeout->write_tree($treeout); On Feb 5, 2007, at 9:59 AM, Neha Nahar wrote: > Hello everyone, > > I am trying to convert newick tree to nexus format. > Using the script (refered from and email from George dated Wed Sep > 22 11:52:47 EDT 2004) : > > /*------------------------------------------------------------*/ > > $ cat nexus.pl > #!/usr/bin/perl -w > > use Bio::TreeIO; > > ($NEWICKFILE, $NEXUSFILE) = @ARGV; > print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n"; > my $treeio = new Bio::TreeIO(-format => 'newick', -file => > "$NEWICKFILE"); > my $treeout = new Bio::TreeIO(-format => 'nexus', -file => "> > $NEXUSFILE"); > while(my $tree = $treeio->next_tree) { > $treeout->write_tree($treeout); > } > > exit 0; > > > /*------------------------------------------------------------*/ > > Running the script through command line: > Gives the following error: > > $ ./nexus.pl mrp-input.txt nexus.out > newickfile=mrp-input.txt, nexusfile=nexus.out > > ------------- EXCEPTION ------------- > MSG: Cannot call method write_tree on Bio::TreeIO object must use a > subclass > STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/ > 5.8.8/Bio/TreeIO/nexus.pm:170 > STACK toplevel ./nexus.pl:23 > > -------------------------------------- > > > Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/ > ~sendu/bioperl/Bio/TreeIO.pm > > Questions:- > > 1. Please let me know if I am using the correct version. > If not, please point me to the latest one. > > 2. Provided that the version I am using is the right one, please > let me know what is wrong with the script. > > Thank you. > Regards, > Neha. > > > > > > -Neha Nahar > " Work for cause and not for applause, live to express and not to > impress !" > > --------------------------------- > Here?s a new way to find what you're looking for - Yahoo! 
Answers -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From nehadnahar at yahoo.co.in Mon Feb 5 18:05:26 2007 From: nehadnahar at yahoo.co.in (Neha Nahar) Date: Mon, 5 Feb 2007 18:05:26 +0000 (GMT) Subject: [Bioperl-l] Convert newick to nexus format Message-ID: <288335.22352.qm@web8412.mail.in.yahoo.com> Hello everyone, I am trying to convert newick tree to nexus format. Using the script (refered from and email from George dated Wed Sep 22 11:52:47 EDT 2004) : /*------------------------------------------------------------*/ $ cat nexus.pl #!/usr/bin/perl -w use Bio::TreeIO; ($NEWICKFILE, $NEXUSFILE) = @ARGV; print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n"; my $treeio = new Bio::TreeIO(-format => 'newick', -file => "$NEWICKFILE"); my $treeout = new Bio::TreeIO(-format => 'nexus', -file => ">$NEXUSFILE"); while(my $tree = $treeio->next_tree) { $treeout->write_tree($treeout); } exit 0; /*------------------------------------------------------------*/ Running the script through command line: Gives the following error: $ ./nexus.pl mrp-input.txt nexus.out newickfile=mrp-input.txt, nexusfile=nexus.out ------------- EXCEPTION ------------- MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170 STACK toplevel ./nexus.pl:23 -------------------------------------- Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm Questions:- 1. Please let me know if I am using the correct version. If not, please point me to the latest one. 2. Provided that the version I am using is the right one, please let me know what is wrong with the script. Thank you. Regards, Neha. -Neha Nahar " Work for cause and not for applause, live to express and not to impress !" 
From hlapp at duke.edu Fri Feb 2 15:09:57 2007 From: hlapp at duke.edu (Hilmar Lapp) Date: Fri, 2 Feb 2007 10:09:57 -0500 Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml In-Reply-To: <1170359746.2706.622.camel@localhost.localdomain> References: <1170359746.2706.622.camel@localhost.localdomain> Message-ID: <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu> On Feb 1, 2007, at 2:55 PM, Scott Cain wrote: > The second main change was to introduce a -flybase_compat argument > when > initializing the Bio::SeqIO writer, so that 'old style' cv and cvterms > (that are compatable with flybase) will be used, but now the default > will be to use current standards: Just my $0.02 ... obviously, Flybase may be the only organization that uses an 'old style' or any other way not compliant with 'current standards' (presumably SO), but if it's not the only one then this approach won't scale. Also, an argument -flybase_compat suggests to the unsuspecting that this is an endorsed flavor of the standard and fine to use for everyone else too. If Flybase is idiosyncratic in this way, why not make chadoxml.pm compliant with the standard as we all want it, keep it free from litter caused by usage of old versions of SO, and create a second module fb-chadoxml.pm that inherits from the first and merely overrides a few things so that it works for Flybase. This way, other organizations with similar needs can follow the path and create their own xyz-chadoxml.pm, rather than having to muck around in the chadoxml.pm that comes with the distribution. I'm not sure I fully grasp the underlying issue, so I may not make much sense here. Apologies if that's the case ...
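A rough sketch of the layout proposed here (and of Scott's suggestion in this thread to turn each site-specific tweak into a method that a subclass overrides) follows, in plain Perl. The package names My::ChadoXML and My::FlybaseChadoXML, the method names, and the cv names 'sequence' and 'SO' are all hypothetical placeholders, not the real Bio::SeqIO::chadoxml or Bio::SeqIO::flybase_chadoxml code. The point is only that the shared writer code calls a hook method, and the legacy subclass overrides just that hook.

```perl
use strict;
use warnings;

# Hypothetical standards-compliant base writer (stands in for chadoxml.pm).
{
    package My::ChadoXML;

    sub new { my ($class, %args) = @_; return bless {%args}, $class }

    # Hook method for a detail a legacy site may need to change.
    # 'sequence' is an illustrative cv name, not taken from the real module.
    sub feature_type_cv { return 'sequence' }

    # Shared output code calls the hook instead of hard-coding the value.
    sub cvterm_element {
        my ($self, $term) = @_;
        return sprintf '<cvterm><cv>%s</cv><name>%s</name></cvterm>',
            $self->feature_type_cv, $term;
    }
}

# Hypothetical site-specific subclass (the "fb-chadoxml.pm" idea).
{
    package My::FlybaseChadoXML;
    our @ISA = ('My::ChadoXML');

    # Override only what differs; 'SO' is an illustrative old-style value.
    sub feature_type_cv { return 'SO' }
}

my $std_xml = My::ChadoXML->new->cvterm_element('gene');
my $fb_xml  = My::FlybaseChadoXML->new->cvterm_element('gene');
print "$std_xml\n$fb_xml\n";
```

Another site with its own conventions would add one more small subclass in the same way, instead of adding a new flag such as -flybase_compat to the base module.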
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu : =========================================================== From jason at bioperl.org Mon Feb 5 19:43:09 2007 From: jason at bioperl.org (Jason Stajich) Date: Mon, 5 Feb 2007 11:43:09 -0800 Subject: [Bioperl-l] Convert newick to nexus format In-Reply-To: <209988.63723.qm@web8715.mail.in.yahoo.com> References: <209988.63723.qm@web8715.mail.in.yahoo.com> Message-ID: <9E477447-67F5-46CA-BCC1-47BB4170EC76@bioperl.org> please cc the mailing list when asking a question or followup. Sorry I don't know what you are doing wrong - you didn't resend your code so I don't know if you still have a typo. This code works fine for me:

use Bio::TreeIO;
use strict;
my ($filein,$fileout) = @ARGV;
my ($format,$oformat) = qw(newick nexus);
my $in = Bio::TreeIO->new(-file => $filein, -format => $format);
my $out= Bio::TreeIO->new(-format => $oformat, -file => ">$fileout");

while( my $t = $in->next_tree ) {
    $out->write_tree($t);
}

On Feb 5, 2007, at 11:24 AM, Neha Nahar wrote: > Thank you very much for the reply. > > I fixed the code as per your suggestion,but now am getting a > different error: > > $ ./nexus.pl mrp-input.txt nexus.out > newickfile=mrp-input.txt, nexusfile=nexus.out > > ------------- EXCEPTION ------------- > MSG: Cannot call method write_tree on Bio::TreeIO object must use a > subclass > STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/ > 5.8.8/Bio/TreeIO/nexus.pm:170 > STACK toplevel ./nexus.pl:23 > > -------------------------------------- > > Please help me out with this script. > > Thank you. > Regards, > Neha > > > > > Jason Stajich wrote: you want to write the TREE > out not the TREE WRITER. > > > $treeout->write_tree($tree) > > not > $treeout->write_tree($treeout); > > On Feb 5, 2007, at 9:59 AM, Neha Nahar wrote: > > Hello everyone, > > > I am trying to convert newick tree to nexus format.
> Using the script (refered from and email from George dated Wed Sep > 22 11:52:47 EDT 2004) : > > > /*------------------------------------------------------------*/ > > > $ cat nexus.pl > #!/usr/bin/perl -w > > > use Bio::TreeIO; > > > ($NEWICKFILE, $NEXUSFILE) = @ARGV; > print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n"; > my $treeio = new Bio::TreeIO(-format => 'newick', -file => > "$NEWICKFILE"); > my $treeout = new Bio::TreeIO(-format => 'nexus', -file => "> > $NEXUSFILE"); > while(my $tree = $treeio->next_tree) { > $treeout->write_tree($treeout); > } > > > exit 0; > > > > > /*------------------------------------------------------------*/ > > > Running the script through command line: > Gives the following error: > > > $ ./nexus.pl mrp-input.txt nexus.out > newickfile=mrp-input.txt, nexusfile=nexus.out > > > ------------- EXCEPTION ------------- > MSG: Cannot call method write_tree on Bio::TreeIO object must use a > subclass > STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/ > 5.8.8/Bio/TreeIO/nexus.pm:170 > STACK toplevel ./nexus.pl:23 > > > -------------------------------------- > > > > > Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/ > ~sendu/bioperl/Bio/TreeIO.pm > > > Questions:- > > > 1. Please let me know if I am using the correct version. > If not, please point me to the latest one. > > > 2. Provided that the version I am using is the right one, please > let me know what is wrong with the script. > > > Thank you. > Regards, > Neha. > > > > > > > > > > > -Neha Nahar > " Work for cause and not for applause, live to express and not to > impress !" > > > --------------------------------- > Here?s a new way to find what you're looking for - Yahoo! 
Answers > > > -- > Jason Stajich > Miller Research Fellow > University of California, Berkeley > lab: 510.642.8441 > > http://pmb.berkeley.edu/~taylor/people/js.html > http://fungalgenomes.org/ > > > > > > > -Neha Nahar > " Work for cause and not for applause, live to express and not to > impress !" > > --------------------------------- > Here?s a new way to find what you're looking for - Yahoo! Answers -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From nehadnahar at yahoo.co.in Mon Feb 5 19:58:08 2007 From: nehadnahar at yahoo.co.in (Neha Nahar) Date: Mon, 5 Feb 2007 19:58:08 +0000 (GMT) Subject: [Bioperl-l] Convert newick to nexus format In-Reply-To: <99196.23114.qm@web8711.mail.in.yahoo.com> Message-ID: <36024.1212.qm@web8405.mail.in.yahoo.com> Hi, Thank you for the code. I tried it but I still get the same exception. ------------- EXCEPTION ------------- MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170 STACK toplevel ./nexus1.pl:18 Please find attached the perl file(nexus.pl). Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm Please let me know if I am using the correct version.If not, please point me to the latest one. Thank you. Regards, nnahar Jason Stajich wrote:please cc the mailing list when asking a question or followup. Sorry I don't know what you are doing wrong - you didn't resend your code so I don't know if you still have a typo. 
This code works fine for me:

use Bio::TreeIO;
use strict;
my ($filein,$fileout) = @ARGV;
my ($format,$oformat) = qw(newick nexus);
my $in = Bio::TreeIO->new(-file => $filein, -format => $format);
my $out= Bio::TreeIO->new(-format => $oformat, -file => ">$fileout");

while( my $t = $in->next_tree ) {
    $out->write_tree($t);
}

On Feb 5, 2007, at 11:24 AM, Neha Nahar wrote: Thank you very much for the reply. I fixed the code as per your suggestion,but now am getting a different error: $ ./nexus.pl mrp-input.txt nexus.out newickfile=mrp-input.txt, nexusfile=nexus.out ------------- EXCEPTION ------------- MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170 STACK toplevel ./nexus.pl:23 -------------------------------------- Please help me out with this script. Thank you. Regards, Neha Jason Stajich wrote: you want to write the TREE out not the TREE WRITER. $treeout->write_tree($tree) not $treeout->write_tree($treeout); On Feb 5, 2007, at 9:59 AM, Neha Nahar wrote: Hello everyone, I am trying to convert newick tree to nexus format.
Using the script (referred from an email from George dated Wed Sep 22 11:52:47 EDT 2004):

/*------------------------------------------------------------*/
$ cat nexus.pl
#!/usr/bin/perl -w

use Bio::TreeIO;

($NEWICKFILE, $NEXUSFILE) = @ARGV;
print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n";
my $treeio = new Bio::TreeIO(-format => 'newick', -file => "$NEWICKFILE");
my $treeout = new Bio::TreeIO(-format => 'nexus', -file => ">$NEXUSFILE");
while(my $tree = $treeio->next_tree) {
    $treeout->write_tree($treeout);
}

exit 0;
/*------------------------------------------------------------*/

Running the script from the command line gives the following error:

$ ./nexus.pl mrp-input.txt nexus.out
newickfile=mrp-input.txt, nexusfile=nexus.out

------------- EXCEPTION -------------
MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
STACK toplevel ./nexus.pl:23
--------------------------------------

Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm

Questions:
1. Please let me know if I am using the correct version. If not, please point me to the latest one.
2. Provided that the version I am using is the right one, please let me know what is wrong with the script.

Thank you.
Regards,
Neha.

-Neha Nahar
" Work for cause and not for applause, live to express and not to impress !"

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: nexus.pl
Type: application/x-perl
Size: 811 bytes
Desc: 1389215665-nexus.pl
URL:

From jason at bioperl.org Mon Feb 5 22:15:52 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Feb 2007 14:15:52 -0800
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <36024.1212.qm@web8405.mail.in.yahoo.com>
References: <36024.1212.qm@web8405.mail.in.yahoo.com>
Message-ID: <84703383-600F-42F4-A860-DD0D1C43EE83@bioperl.org>

Something is wrong with your install I am guessing - can you run the tests? Go to the bioperl directory:

$ perl t/TreeIO.t

Can you describe how you installed bioperl?
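Following up on Jason's question: the stack trace above points at /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm, a system-wide location. Neha, however, reports using the bioperl-1.5.2_101 tarball from CPAN. A quick way to see which copy Perl is really loading is to require the module and then inspect %INC and the module's $VERSION.

The sketch below uses only core Perl. The default module names are just examples, and `module_info` is a hypothetical helper, not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: load a module by name and return the file Perl actually
# loaded it from (its %INC entry) plus its $VERSION, or an empty list if the
# module cannot be loaded.
sub module_info {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Bio::TreeIO -> Bio/TreeIO.pm
    return unless eval { require $file; 1 };   # not installed or fails to compile
    no strict 'refs';
    my $version = ${"${module}::VERSION"};     # undef if the module sets none
    return ($INC{$file}, $version);
}

# Module names are examples; pass others on the command line.
my @modules = @ARGV ? @ARGV : ('Bio::TreeIO', 'Bio::TreeIO::nexus');
for my $module (@modules) {
    my ($path, $version) = module_info($module);
    if (defined $path) {
        printf "%s => %s (version %s)\n", $module, $path,
            defined $version ? $version : 'unknown';
    } else {
        print "$module could not be loaded from \@INC\n";
    }
}
```

Suppose the path printed for Bio::TreeIO::nexus is not where the 1.5.2_101 tarball was installed. In that case an older copy earlier in @INC is probably being picked up. That is a guess to check, not something the thread confirms.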
From lzhtom at hotmail.com Tue Feb 6 03:31:56 2007
From: lzhtom at hotmail.com (zhihua li)
Date: Tue, 06 Feb 2007 03:31:56 +0000
Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
In-Reply-To:
Message-ID:

Thanks a lot! After checking out the script bp_index, I changed the syntax to:

my $inx = Bio::Index::Fasta->new("test.fasta.idx", 'WRITE');
$inx->make_index("test.fasta");

Now I have an index file test.fasta.idx in my current directory. And I can use it in my later script by saying

my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx");

So now everything is OK. But I don't understand why I have to use that syntax. And why didn't the syntax provided in the documentation work?

>From: Jason Stajich
>To: zhihua li
>CC: bioperl-l at lists.open-bio.org, arokfl at yahoo.com
>Subject: Re: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
>Date: Thu, 1 Feb 2007 22:24:44 -0800
>
>I don't think BIOPERL_INDEX does anything in the module, so that
>documentation is not quite right. The env variable is used in the
>scripts/index/bp_index and bp_fetch scripts, so maybe a cut+paste job
>went bad somewhere.
>
>You need to specify the full path you want with -filename - you can
>just prepend the BIOPERL_INDEX to the filename, like:
>-filename => $ENV{BIOPERL_INDEX}."/$index"
>
>-jason
>On Feb 1, 2007, at 7:27 PM, zhihua li wrote:
>
> > Sorry guys, the former empty mail was sent out by mistake.
> >
> > I'm using Bio::index::Fasta to index a file containing lots of
> > sequences in fasta format. All is fine except one thing.
> >
> > According to the bioperl tutorial and the documents, the following
> > code will make an indexed file:
> >
> > my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx",
> >                                  -write_flag => 1);
> > $inx->make_index("test.fasta");
> >
> > And in another script I can access the indexed file by saying
> >
> > $ENV{BIOPERL_INDEX} = "."; # find index in current directory
> > my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx");
> > my $seq=$inx->fetch("ent1001"); #fetch the sequence named ent1001
> >
> > However, after running the first script, I cannot find a new file
> > test.fasta.idx in my current directory. And not surprisingly, when
> > I ran the second script, perl told me it couldn't find
> > "test.fasta.idx".
> >
> > What's going on here?
> >
> > Thanks a lot!
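Jason's suggestion amounts to building the full -filename yourself instead of relying on BIOPERL_INDEX inside the module. Below is a minimal sketch using the core File::Spec module. `index_path` is a hypothetical helper. Falling back to the current directory when BIOPERL_INDEX is unset or empty is an assumption, not documented BioPerl behaviour.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: prepend the directory named in $ENV{BIOPERL_INDEX}
# to a bare index file name, falling back to the current directory when
# the variable is unset or empty (the fallback is an assumption).
sub index_path {
    my ($index_name) = @_;
    my $dir = defined $ENV{BIOPERL_INDEX} && length $ENV{BIOPERL_INDEX}
            ? $ENV{BIOPERL_INDEX}
            : File::Spec->curdir;
    return File::Spec->catfile($dir, $index_name);
}

print index_path('test.fasta.idx'), "\n";

# With BioPerl installed, the path would then be passed explicitly, e.g. in
# the forms zhihua reports working above (not executed here):
#   my $inx = Bio::Index::Fasta->new(index_path('test.fasta.idx'), 'WRITE');
#   $inx->make_index('test.fasta');
#   my $inx = Bio::Index::Fasta->new(-filename => index_path('test.fasta.idx'));
```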
From johnston at biochem.ucl.ac.uk Tue Feb 6 11:52:08 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Tue, 6 Feb 2007 11:52:08 +0000 (GMT)
Subject: [Bioperl-l] RNA folding
Message-ID:

Hello,

I've just joined the list - I'm a Bioinformatics PhD student at Essex University doing transcriptomics-related things: mainly microarray analysis and, more recently, RNA structure prediction.

I was thinking about having a go at writing a bioperl-run wrapper around some of the structure prediction stuff, but according to the wiki this is being done already (at least for the Vienna tools). I spoke to Albert Vilella at the EBI the other day and he said Chris Fields was the man to speak to. So could he (or anyone) let me know what the current state of RNA structure prediction tools in bioperl is?
Cheers,
Cass xx

From marian.thieme at lycos.de Tue Feb 6 13:52:10 2007
From: marian.thieme at lycos.de (Marian Thieme)
Date: Tue, 06 Feb 2007 14:52:10 +0100
Subject: [Bioperl-l] dbSNP
Message-ID: <45C8880A.7030702@lycos.de>

Hello all,

I looked in the documentation for a method/class/function/script that can generate a SNP assay suitable for submission to dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/ http://www.ncbi.nlm.nih.gov/projects/SNP/how_to_submit.html).

I didn't find such code, but I noticed that there is at least an XML parser to read dbSNP reports. Does anybody know if there is also an output class to generate dbSNP reports? I could imagine that at least the SNP assay section is worth implementing. This example is given by NCBI:

TYPE:SNPASSAY
HANDLE:WI
BATCH: 1.98
MOLTYPE:Genomic
METHOD:RESEQ
SYN NAMES:WI-SNP,DnaId,MapDna
COMMENT: Here is where some public comment that applies to the entire batch of SNPS could be put.
PRIVATE: Here is where a note to NCBI regarding processing that would not be seen by the outside, could be put. Note that these are is not exactly real SNPs, as the data were modified.
||
SNP:WI|WIAF-1234567
SYNONYM:EST4291092,EST8291092,EST7291092
ACCESSION:H30533
LENGTH:101
5'_ASSAY:GGCAGGGAAGGAAAATCCTAGGGNCAGCATTGGGGAGGGGGGGACTCTG
OBSERVED:C/T
3'_ASSAY:TAAATTTATTGGGCAACAGGCTGCAGGTGAGGGGGCTGACAGGAGGAGGGA
||
SNP:WI|WIAF-1722
SYNONYM:STS-T17494,STS-T17494,STS-T17494
ACCESSION:T17494
LENGTH:269
5'_FLANK:CTTTCCCTCATCCCCTCTTCCACCACACCATCCCGGAACAAGTGCTCCAGGATT
5'_ASSAY:CCCTGCCCACTGGCCATTTTGGAGTGTGTCC
OBSERVED:A/T
3'_ASSAY:GTGGGTAGCAATGTGGAAACCACCAGGGCCTTTGTGGAGAAAA
3'_FLANK:TGGAGGGGGTTGAGGGAGTCCCAGGAGGGGCTTATTTGAGGGCCTTTGCCACTT
GCTCATAGGCGAGCTCGATCTCCTCATCATCTGGACAGGTGGAAGCGAATTCTT
CCCGGGCGTAGGCATTGCTCAAGTACCGAT
||

Regards,
Marian

P.S. This is not in contradiction to my first request about the brackets notation. We need both formats.
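Marian's question (is there a writer for this format?) is not answered in this part of the thread. Purely to illustrate the batch layout in the NCBI example above, a minimal writer could look like the sketch below.

- `format_snpassay` is a hypothetical name.
- The header keys and field order are copied from that example.
- Details such as wrapping long flank sequences over several lines are left out.

NCBI's submission documentation, not this sketch, defines the real rules.

```perl
use strict;
use warnings;

# Field order for one SNP block, copied from the NCBI example above.
my @SNP_FIELDS = ('SNP', 'SYNONYM', 'ACCESSION', 'LENGTH',
                  "5'_FLANK", "5'_ASSAY", 'OBSERVED', "3'_ASSAY", "3'_FLANK");

# Hypothetical writer: $header is an arrayref of [KEY, value] pairs (kept in
# order); each SNP is a hashref keyed by the names in @SNP_FIELDS. The header
# and every SNP block are terminated by a "||" line; absent fields are skipped.
sub format_snpassay {
    my ($header, @snps) = @_;
    my @lines = map { join(':', @$_) } @$header;
    push @lines, '||';
    for my $snp (@snps) {
        for my $field (@SNP_FIELDS) {
            push @lines, "$field:$snp->{$field}" if defined $snp->{$field};
        }
        push @lines, '||';
    }
    return join("\n", @lines) . "\n";
}

print format_snpassay(
    [ [TYPE => 'SNPASSAY'], [HANDLE => 'WI'], [BATCH => '1.98'],
      [MOLTYPE => 'Genomic'], [METHOD => 'RESEQ'] ],
    {   SNP        => 'WI|WIAF-1234567',
        ACCESSION  => 'H30533',
        LENGTH     => 101,
        "5'_ASSAY" => 'GGCAGGGAAGGAAAATCCTAGGGNCAGCATTGGGGAGGGGGGGACTCTG',
        OBSERVED   => 'C/T',
        "3'_ASSAY" => 'TAAATTTATTGGGCAACAGGCTGCAGGTGAGGGGGCTGACAGGAGGAGGGA',
    },
);
```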
From cjfields at uiuc.edu Tue Feb 6 16:45:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Feb 2007 10:45:36 -0600
Subject: [Bioperl-l] RNA folding
In-Reply-To:
References:
Message-ID:

On Feb 6, 2007, at 5:52 AM, Caroline Johnston wrote:

> I was thinking about having a go at writing a bioperl-run wrapper around
> some of the structure prediction stuff, but according to the wiki this is
> being done already (at least for the Vienna tools).

Actually, the only RNA tool wrappers I have made are ones for ERPIN, RNAMotif, and Infernal (the only one in bioperl-run CVS at this time is RNAMotif). I am planning on writing up wrappers for Vienna, UNAFold, and a few others but haven't really started in.

Here's where I'm at right now... I am writing up a new set of AnnotationI classes which positionally describe data (Meta), which I hope will help deal with this stuff. These would be similar in nature to Heikki's Bio::Seq::Meta classes:

http://bioperl.org/pipermail/bioperl-l/2006-December/024414.html

I would use a regular Bio::SeqI and store the structural data and anything else (such as energy calculations, etc.) as Annotation objects in an AnnotationCollection. I would then write up a series of SeqIO modules to get data into/out of the designated structure formats, like UNAfold ct, RNAML, and so on.
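Chris goes on to describe a per-position layout: one hash per nucleotide, with $array[0] describing nt 1, plus an interaction type and the partner base. That layout can be filled from a WUSS/dot-bracket string with one stack per bracket type. Below is a minimal sketch under a few assumptions:

- The four WUSS bracket pairs mark base pairs.
- The tag 'PAIR' is a placeholder, because the structure string alone does not say whether a pair is Watson-Crick.
- Pseudoknot letters are not handled.
- `wuss_to_pairs` is a hypothetical name, not a BioPerl method.

```perl
use strict;
use warnings;

# Closing bracket => matching opening bracket (WUSS uses all four pairs).
my %OPENER_OF = (')' => '(', '>' => '<', ']' => '[', '}' => '{');

# Turn a WUSS / dot-bracket string into one hashref per nucleotide:
# element $i describes nucleotide $i + 1. Paired positions get
# {interaction => 'PAIR', base => partner (1-based)}, all others
# {interaction => 'SS'}. WUSS pseudoknot letters (Aa, Bb, ...) are not
# handled and simply count as single-stranded here.
sub wuss_to_pairs {
    my ($structure) = @_;
    my @chars     = split //, $structure;
    my @positions = map { +{ interaction => 'SS' } } @chars;
    my %open_at;                           # opening bracket => stack of indices
    for my $i (0 .. $#chars) {
        my $c = $chars[$i];
        if ($c =~ /[(<\[{]/) {
            push @{ $open_at{$c} }, $i;
        }
        elsif (my $opener = $OPENER_OF{$c}) {
            my $j = pop @{ $open_at{$opener} };
            die "Unmatched '$c' at position ", $i + 1, "\n" unless defined $j;
            $positions[$i] = { interaction => 'PAIR', base => $j + 1 };
            $positions[$j] = { interaction => 'PAIR', base => $i + 1 };
        }
    }
    for my $opener (keys %open_at) {
        die "Unmatched '$opener'\n" if @{ $open_at{$opener} };
    }
    return \@positions;
}

my $pairs = wuss_to_pairs('((..))');
printf "nt %d: %s%s\n", $_ + 1, $pairs->[$_]{interaction},
    exists $pairs->[$_]{base} ? " with $pairs->[$_]{base}" : ''
    for 0 .. $#$pairs;
```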
Each sequence would then be capable of holding more than one structural Annotation (i.e. it could represent different folding pathways, alternative RNA folds, and so on).

At this point I represent the data as an array of hashes where $array[0] is nt 1 and the hash keys indicate the type of interaction, base interacted with, etc. By default, the text representation would be the simple Eddy WUSS (Rfam-like) format. That format can represent some complex data (pseudoknots, for instance), is compact, and is documented (via the Infernal manual). Tags will probably switch to more ontologically relevant terms (probably from RNAML or RNA Ontology), but in general it is something like this:

[
  {'interaction' => 'WC', 'base' => 24},
  {'interaction' => 'WC', 'base' => 23},
  {'interaction' => 'SS'},
  ...
]

In this implementation every seq position would have some kind of interaction designation. That's open for debate, though, as it could just be simple text or undef for single-stranded regions. This is also scalable based on the complexity of the data. If one wanted to add tertiary/quaternary interactions, location, base modifications, remote sequence interactions, etc., extra key/value pairs could be used. Conversely, if one only wanted secondary structure (for drawing RNA structures, for example), then only that data would be parsed.

If you (or anyone listening) have any suggestions I would greatly appreciate them.

chris

From johnsonm at gmail.com Tue Feb 6 23:53:49 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Tue, 6 Feb 2007 17:53:49 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To:
References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
Message-ID:

Okay, I need to get something going for a project I'm working on. Options:

1) Stick it all in one module: This can get a bit ugly, as Glimmer, as opposed to GlimmerM and GlimmerHMM, does not explicitly identify itself in the prediction report.
You can pick up on some unique things in the output file, but you don't know what you've got until you're actually parsing it. The exception is if you require a format argument up front, in which case you can split the parsing code up into different functions.

2) Two modules, one for GlimmerM/GlimmerHMM and one for Glimmer2/Glimmer3, with or without an abstract dispatch front end.

I suppose at this point, after getting my hands dirty, I'd prefer 1), with an explicit -format => Glimmer2/3/M/HMM arg required in the constructor. Though I'm not opposed to 2) if that is what it takes to get it into Bioperl.

If we can achieve some sort of consensus without too much bloodshed, I'll shoot y'all some patches and we can consider this issue checked off the list.

On 9/20/06, Mark Johnson wrote:
>
> I think it's going to be at least two modules, one for the
> prokaryotic stuff and one for the eukaryotic. And really, the
> prokaryotic stuff is different enough to warrant two modules. So three
> different parsers. Could do it in one, but it would be ugly and
> nasty. However, this does not preclude three parsers and one abstract
> interface, which is your excellent suggestion.
> Oh, and excuse me, but I have a bit of a rant here, after dealing
> with parsers and pipelines for the last few months. Parsers should
> not load the whole input file into RAM to parse it. And Pipelines
> using the parsers (Ensembl / biopipe) should not stuff the whole
> result set from the parser into a single array. When you're trying to
> annotate assemblies, it sucks to have to split up contigs/supercontigs
> because the whole result set won't fit into RAM on a 12 gig blade.
> Sheesh. Though this doesn't matter for bacterial genomes, as they're
> tiny (by comparison to vertebrates). There, sorry, been saving up
> that frustration for a while. No offense meant, hope I didn't tick
> anybody off.
> 8)
> Torsten: You sound like you know what you're doing with respect
> to Bioperl more than I do, and I know I don't have CVS access, so I'll
> defer to you. I'd be happy to help out, though.
>
>
> On 9/20/06, Hilmar Lapp wrote:
> >
> > On Sep 19, 2006, at 9:13 PM, Torsten Seemann wrote:
> >
> > > I'm not sure whether to
> > >
> > > 1. parse them all under the same module, perhaps with a
> > > -format=>'glimmerXXX' parameter
> > >
> > > 2. create a single new module Glimmer2 and Glimmer3
> > >
> > > 3. create two new modules, one for Glimmer2 and one for Glimmer3,
> > > given they are different outputs both in syntax and number of output files
> > >
> > > Any advice from Bioperl 'old timers' appreciated ;-)
> > >
> >
> > If at all possible I'd favor 1), with e.g. Bio::Tools::GFF being an
> > example for how this can work.
> >
> > If this would amount to basically 4 modules stringed together into
> > one file (because the parsing code can't share much if anything
> > between the flavors), it'd still be advantageous to have a single
> > frontend module that would then dispatch.
> >
> > -hilmar
> >
> > --
> > ===========================================================
> > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> > ===========================================================

From jason at bioperl.org Wed Feb 7 00:33:11 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 6 Feb 2007 16:33:11 -0800
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To:
References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
Message-ID: <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>

I definitely vote for 1) - worst case you have 4 separate methods if there is no good way to condense the parsing for each format, and require the user to specify the format.

I have no problem with requiring the user to specify what program she used - if we can be fancy and guess the format later (i.e. guess format in SeqIO), then that's icing.

-jason
From torsten.seemann at infotech.monash.edu.au Wed Feb 7 02:36:54 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 7 Feb 2007 13:36:54 +1100
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
Message-ID:

> I definitely vote for 1) - worst case you have 4 separate methods if
> there is no good way to condense the parsing for each format and
> require the user to specify the format.

And make the default -format be what it currently parses, i.e. GlimmerM/GlimmerHMM.

> I have no problem with requiring user to specify what program she
> used - if we can be fancy and guess the format later (i.e. guess
> format in SeqIO) -then that's icing.

Agreed.

>> Okay, I need to get something going for a project I'm working on.

I would normally try to help but I am so swamped with work-work at the moment. Just a reminder that last year I added examples of the different Glimmer outputs to the CVS repository:

./t/data/Glimmer3.predict
./t/data/Glimmer3.detail
./t/data/GlimmerHMM.out
./t/data/Glimmer2.out
./t/data/GlimmerM.out
./t/data/glimmer.out (this was the original one)

Thanks for taking this on!
--Torsten

From mitch_skinner at berkeley.edu Wed Feb 7 04:37:35 2007
From: mitch_skinner at berkeley.edu (Mitch Skinner)
Date: Tue, 06 Feb 2007 20:37:35 -0800
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
Message-ID: <45C9578F.2060802@berkeley.edu>

Hello,

I'm working on an AJAX version of GBrowse (http://genome.biowiki.org), where we're pre-rendering entire chromosomes by breaking them up into tiles. One of the problems we have is that it takes a long time to render all those tiles. One of the things that's slowing the process down (and using lots of RAM) is rendering the gridlines, and it would make things a lot easier (and faster) for us if we could assume that the gridlines were the same for each tile. Since we're only rendering at a particular set of zoom levels (that we have control over), I think this is a reasonable assumption.

Given the right set of zoom levels, the assumption works almost all the time, except for one specific case. It has to do with the way draw_grid and map_pt in Bio::Graphics::Panel work for the very first gridline.

Here's how draw_grid (in CVS HEAD) computes the first gridline:

    my $first_tick = $minor * int($self->start/$minor);

$first_tick, $minor and $self->start are in base-pair space, which is 1-based. However, if ($self->start < $minor) then $first_tick is 0. This might not be a problem, except that $first_tick is translated into pixel coordinates in map_pt, which expects 1-based bp coordinates. Here are the relevant lines in map_pt:

    my $val = $flip
        ? int (0.5 + $pr - ($length - ($_- 1)) * $scale)
        : int (0.5 + ($_-$offset-1) * $scale);

This style of rounding only works for positive numbers. Rounding 0.6 by doing int(0.5 + 0.6) gives you 1 as expected, but rounding -0.6 by doing int(0.5 + -0.6) gives you 0. So suppose the first three gridlines are at 0, 10, and 20 bp. Assuming $scale is 1, $offset is 0, $flip evaluates false, and pad left is 0, they're drawn at pixels 0, 9, and 19.
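Mitch's arithmetic can be checked with a small, self-contained Perl sketch. It does not call Bio::Graphics::Panel. Instead it re-implements only the formulas quoted in this thread:

- the current and proposed first-tick computations;
- the non-flipped branch of map_pt, ignoring padding, under both rounding rules.

The inputs are the assumptions from his example: start 1, minor tick spacing 10 bp, scale 1, offset 0. All sub names are hypothetical.

```perl
use strict;
use warnings;

# Rounding as in the current map_pt: add 0.5, then truncate toward zero.
# Correct for positive values only; -0.6 becomes int(-0.1) == 0.
sub round_current { my ($x) = @_; return int(0.5 + $x) }

# Mitch's proposed sign-aware rounding: move half a unit away from zero.
sub round_sign_aware { my ($x) = @_; return int($x + 0.5 * ($x <=> 0)) }

# First gridline, current and proposed (both in 1-based bp coordinates).
sub first_tick_current  { my ($start, $minor) = @_; return $minor * int($start / $minor) }
sub first_tick_proposed { my ($start, $minor) = @_; return 1 + $minor * int(($start - 1) / $minor) }

# Non-flipped branch of map_pt, with padding ignored.
sub bp_to_pixel {
    my ($bp, $round, $scale, $offset) = @_;
    return $round->(($bp - $offset - 1) * $scale);
}

# $n consecutive ticks starting at $first, $minor bp apart.
sub ticks { my ($first, $minor, $n) = @_; return map { $first + $_ * $minor } 0 .. $n - 1 }

# Mitch's example: start = 1, minor = 10, scale = 1, offset = 0.
my @current  = map { bp_to_pixel($_, \&round_current, 1, 0) }
               ticks(first_tick_current(1, 10), 10, 3);
my @proposed = map { bp_to_pixel($_, \&round_sign_aware, 1, 0) }
               ticks(first_tick_proposed(1, 10), 10, 3);

print "current:  @current\n";    # prints "current:  0 9 19"
print "proposed: @proposed\n";   # prints "proposed: 0 10 20"
```

With the sign-aware rounding alone, the current ticks (0, 10 and 20 bp) map to -1, 9 and 19. The spacing becomes even, and the first line would fall just off the left edge. Lincoln's reply notes that zero and negative coordinates are legitimate for genetic maps. So the first-tick shift assumes 1-based sequence coordinates, while the rounding change is the general part.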
I think that there should be gridlines at pixels 0, 10, and 20. Currently the first interval is 9 pixels and the second is 10 pixels, and that is breaking my hopeful assumption about the gridlines.

AFAICT my problems are solved if we make two changes. Change the above line from draw_grid to this:

    my $first_tick = 1 + $minor * int(($start - 1)/$minor);

and change the lines from map_pt to this:

    my $val = $flip
        ? ($pr - ($length - ($_- 1)) * $scale)
        : (($_-$offset-1) * $scale);
    $val = int($val + .5 * ($val <=> 0));

Does this make sense? If people agree that these changes are right, I can also produce a proper patch if y'all would prefer that.

Regards,
Mitch

From lstein at cshl.edu Wed Feb 7 12:17:22 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 7 Feb 2007 07:17:22 -0500
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <45C9578F.2060802@berkeley.edu>
References: <45C9578F.2060802@berkeley.edu>
Message-ID: <6dce9a0b0702070417k692fedcchfac5b1f0e01f72bd@mail.gmail.com>

Hi Mitch,

Zero is not a forbidden coordinate, since gbrowse also works on genetic maps which have negative and floating point coordinates. You've simply picked up a boundary case where the rounding isn't working properly. I will fix this now.

Lincoln
--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From lstein at cshl.edu Wed Feb 7 12:18:40 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 7 Feb 2007 07:18:40 -0500
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <45C9578F.2060802@berkeley.edu>
References: <45C9578F.2060802@berkeley.edu>
Message-ID: <6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com>

However, I'm also very interested in why grid-drawing takes so long. When I've profiled drawing, neither grid drawing nor map_pt() consumes any significant amount of time.

Lincoln
> > Here's how draw_grid (in CVS HEAD) computes the first gridline: > > my $first_tick = $minor * int($self->start/$minor); > > $first_tick, $minor and $self->start are in base-pair space, which is > 1-based. However, if ($self->start < $minor) then $first_tick is 0. > This might not be a problem, except that $first_tick is translated into > pixel coordinates in map_pt, which expects 1-based bp coordinates. Here > are the relevant lines in map_pt: > > my $val = $flip > ? int (0.5 + $pr - ($length - ($_- 1)) * $scale) > : int (0.5 + ($_-$offset-1) * $scale); > > This style of rounding only works for positive numbers; rounding 0.6 by > doing int(0.5 + 0.6) gives you 1 as expected, but rounding -0.6 by doing > int(0.5 + -0.6) gives you 0. So if the first three gridlines are at 0, > 10, and 20 bp, then (assuming $scale is 1, $offset is 0, $flip evaluates > false, and pad left is 0) they're drawn at pixels 0, 9, and 19. > > I think that there should be gridlines at pixels 0, 10, and 20. The > fact that currently the first interval is 9 pixels and the second is 10 > pixels is breaking my hopeful assumption about the gridlines. > > AFAICT my problems are solved if we make two changes: > change the above line from draw_grid to this: > my $first_tick = 1 + $minor * int(($start - 1)/$minor); > and change the lines from map_pt to this: > > my $val = $flip > ? ($pr - ($length - ($_- 1)) * $scale) > : (($_-$offset-1) * $scale); > $val = int($val + .5 * ($val <=> 0)); > > Does this make sense? If people agree that these changes are right then > I can also produce a proper patch if y'all would prefer that. > > Regards, > Mitch > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. 
Stein From johnsonm at gmail.com Wed Feb 7 16:50:05 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Wed, 7 Feb 2007 10:50:05 -0600 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> Message-ID: Well, each format has some unique features. If the user declines to specify the format, I can figure it out, but it will probably involve scanning the input file twice. I'll take a look. I can do all the parsing in one function; in fact I have, just to see how nasty it would end up being. I just can't stomach having the code that tightly coupled and hard to read. In the end it'll probably be three functions. GlimmerM/HMM are pretty close. Maybe two: Glimmer2 and Glimmer3 aren't *that* different, either. On 2/6/07, Jason Stajich wrote: > > I definitely vote for 1) - worst case you have 4 separate methods if there > is no good way to condense the parsing for each format and require the user > to specify the format. > > I have no problem with requiring user to specify what program she used - > if we can be fancy and guess the format later (i.e. guess format in SeqIO) > -then that's icing. > > -jason > > From adsj at novozymes.com Wed Feb 7 17:11:32 2007 From: adsj at novozymes.com (Adam Sjøgren) Date: Wed, 07 Feb 2007 18:11:32 +0100 Subject: [Bioperl-l] Bioperl 1.4 to 1.5.2, adding qualifiers to Bio::Seq-objects Message-ID: <8764adoptn.fsf@topper.koldfront.dk> Hi.
I am transitioning from Bioperl 1.4 to 1.5.2, and the qualifiers I add to features in Bio::Seq objects have stopped appearing when I output them as EMBL or GenBank-files. Below is a test-script that exercises the problem. I guess I should be doing something else when adding qualifiers, now with 1.5.2 (as reading an EMBL-file with Bio::SeqIO and outputting it again of course works perfectly), but I can't deduce what from perldoc Bio::SeqFeature::Generic - it still lists the add_tag_value method, and calling it neither croaks nor warns. I have found some comments on this in the release notes of 1.5.0 on the Bioperl wiki, but I must admit I wasn't able to extract what methods I should be calling instead. If someone could point me to the relevant documentation or tell me what method to use instead, I would be happy as a clam. Best regards, Adam === use Test::More tests=>2; use strict; use warnings; use Bio::Seq; use Bio::SeqFeature::Generic; use IO::String; use Bio::SeqIO; my $seq=Bio::Seq->new( -seq=>'actgactgactg', ); $seq->display_id('D27'); $seq->accession_number('DB:D27'); my $seq_feature=Bio::SeqFeature::Generic->new( -strand=>1, -primary=>'source', ); $seq_feature->set_attributes(-start=>2, -end=>8); $seq_feature->add_tag_value(note=>'TEST'); $seq_feature->add_tag_value(db_xref=>'DB:D27'); $seq->add_SeqFeature($seq_feature); my $raw=''; my $fh=IO::String->new($raw); my $out=Bio::SeqIO->new(-format=>'EMBL', -fh=>$fh); $out->write_seq($seq); ok($raw=~m!/note!, 'Qualifier note found'); ok($raw=~m!/db_xref!, 'Qualifier db_xref found'); === -- Adam Sjøgren adsj at novozymes.com From cjfields at uiuc.edu Wed Feb 7 17:50:13 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 7 Feb 2007 11:50:13 -0600 Subject: [Bioperl-l] Bioperl 1.4 to 1.5.2, adding qualifiers to Bio::Seq-objects In-Reply-To: <8764adoptn.fsf@topper.koldfront.dk> References: <8764adoptn.fsf@topper.koldfront.dk> Message-ID: On Feb 7, 2007, at 11:11 AM, Adam Sjøgren wrote: > Hi.
> > > I am transitioning from Bioperl 1.4 to 1.5.2, and the qualifiers I add > to features in Bio::Seq objects have stopped appearing when I output > them as EMBL or GenBank-files. > > Below is a test-script that exercises the problem. > > I guess I should be doing something else when adding qualifiers, now > with 1.5.2 (as reading an EMBL-file with Bio::SeqIO and outputting it > again of course works perfectly), but I can't deduce what from perldoc > Bio::SeqFeature::Generic - it still lists the add_tag_value method, > and calling it neither croaks nor warns. > > I have found some comments on this in the release notes of 1.5.0 on > the Bioperl wiki, but I must admit I wasn't able to extract what > methods I should be calling instead. > > If someone could point me to the relevant documentation or tell me > what method to use instead, I would be happy as a clam. > > > Best regards, > > Adam ... This works for me using bioperl-live (Mac OS X): ok 1 - Qualifier note found ok 2 - Qualifier db_xref found If I print the string I get: ID DB:D27; SV 1; linear; unassigned DNA; STD; UNC; 12 BP. XX AC DB:D27; XX XX FH Key Location/Qualifiers FH FT source 2..8 FT /db_xref="DB:D27" FT /note="TEST" XX SQ Sequence 12 BP; 3 A; 3 C; 3 G; 3 T; 0 other; actgactgac tg 12 // GenBank also works: LOCUS D27 12 bp dna linear UNK ACCESSION DB:D27 FEATURES Location/Qualifiers source 2..8 /db_xref="DB:D27" /note="TEST" BASE COUNT 3 a 3 c 3 g 3 t ORIGIN 1 actgactgac tg // If you haven't uninstalled 1.4, make sure you aren't running 1.4 or mixing the two versions (you can check by using 'perldoc -l Bio::Root::Root').
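The same check Chris suggests can be made from inside a script: Perl's `%INC` hash records the file each loaded module actually came from, and `@INC` lists the directories searched in order, so an old copy earlier in `@INC` silently shadows a newer one. A minimal sketch; `module_location` is a hypothetical helper, and `List::Util` stands in for `Bio::Root::Root` so the example runs without BioPerl installed:

```perl
use strict;
use warnings;

# Hypothetical helper: load a module by name and report where it came from.
# Returns (path, version); version is undef if the module sets no $VERSION.
sub module_location {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Bio::Root::Root -> Bio/Root/Root.pm
    require $file;                             # dies if the module is not installed
    my $version = $module->VERSION;            # UNIVERSAL::VERSION, undef if unset
    return ($INC{$file}, $version);
}

# List::Util is a core module; substitute 'Bio::Root::Root' on a BioPerl system.
my ($path, $version) = module_location('List::Util');
printf "List::Util loaded from %s (version %s)\n",
    $path, defined $version ? $version : 'unknown';

# Directories Perl searches, in order; the first match wins.
print "$_\n" for @INC;
```

If the reported path points into an old install (or a private patch directory), that copy is the one being used, whatever else is installed.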
chris From cjfields at uiuc.edu Wed Feb 7 18:04:33 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 7 Feb 2007 12:04:33 -0600 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> Message-ID: <91A3A651-C0D5-495F-941F-05B8AA0DDA60@uiuc.edu> On Feb 7, 2007, at 10:50 AM, Mark Johnson wrote: > Well, each format has some unique features. If the user > declines to > specify the format, I can figure it out, but it will probably involve > scanning the input file twice. I'll take a look. > I can do all the parsing in one function, in fact I have, just > to see > how nasty it would end up being. I just can't stomach having the > code that > tightly coupled and hard to read. In the end it'll probably be three > functions. GlimmerM/HMM are pretty close. Maybe two, Glimmer2 and > Glimmer3 aren't *that* different, either. I don't see a problem with passing off the parse to a defined class method either right off or mid-parse. I'm doing something like this with a revamped GenBank parser: # declare local to module my %GLIMMER_METHODS = ( 'GlimmerHMM' => '_parsehmm', 'Glimmer' => '_parsenormal', # ...others if needed '_DEFAULT_' => '_parseabnormal' ); ... Then either preparse part of file using _readline() to determine format, or use -format and bypass preparsing: sub next_thingy { ... if (!$format) { while (my $line = $self->_readline()) { if ($line =~ m{(something)}) { $format = $1; $self->_pushback($line); last; } } } my $method = exists $GLIMMER_METHODS{$format} ? $GLIMMER_METHODS{$format} : $GLIMMER_METHODS{'_DEFAULT_'}; # fallback to this one return $self->$method(); # hand off parsing flow to the proper parser ... } # all parser variants would have this structure: sub _parsehmm { my $self = shift; ... init stuff here while (my $line = $self->_readline()) { ... do stuff until END of next prediction/report } ...
return data if any } chris > On 2/6/07, Jason Stajich wrote: >> >> I definitely vote for 1) - worst case you have 4 separate methods >> if there >> is no good way to condense the parsing for each format and require >> the user >> to specify the format. >> >> I have no problem with requiring user to specify what program she >> used - >> if we can be fancy and guess the format later (i.e. guess format >> in SeqIO) >> -then that's icing. >> >> -jason >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From johnston at biochem.ucl.ac.uk Wed Feb 7 18:56:52 2007 From: johnston at biochem.ucl.ac.uk (Caroline Johnston) Date: Wed, 7 Feb 2007 18:56:52 +0000 (GMT) Subject: [Bioperl-l] RNA folding In-Reply-To: References: Message-ID: Thanks Chris. Storing the interaction data as a hash according to an ontology and using an extended bracket notation as the string representation seems to make sense, but I'm still unsure how this is supposed to be attached to the Seq objects. You reckon it should be an AnnotationI? I'm not sure I understand the distinction between annotations and features. From the docs I got the impression that Features were like annotation on bits of sequences and had a reference to the sequence to which they belong, whereas annotations don't. If that's the case though, why would RNA structure be an annotation rather than a feature? If not, what is the distinction between them? Are the positional Annotation subclasses you're developing intended to replace features? Have I got the wrong end of the stick entirely? 
Cheers, Cass On Tue, 6 Feb 2007, Chris Fields wrote: > Actually, the only RNA tool wrappers I have made are ones for ERPIN, > RNAMotif, and Infernal (the only one in bioperl-run CVS at this time > is RNAMotif). I am planning on writing up wrappers for Vienna, > UNAFold, and a few others but haven't really started in. Here's > where I'm at right now... > > I am writing up a new set of AnnotationI classes which positionally > describe data (Meta) which I hope will help deal with this stuff. > These would be similar in nature to Heikki's Bio::Seq::Meta classes: > > http://bioperl.org/pipermail/bioperl-l/2006-December/024414.html > > I would use a regular Bio::SeqI and store the structural data and > anything else (such as energy calculations, etc) as Annotation > objects in an AnnotationCollection, and then write up a series of > SeqIO modules to get data into/out of the designated structure > formats, like UNAfold ct, RNAML, and so on. Each sequence would then > be capable of holding more than one structural Annotation (i.e. could > represent different folding pathways, alternative RNA folds, and so on). > > At this point I represent the data as an array of hashes where $array > [0] is nt 1 and the hash keys indicate the type of interaction, base > interacted with, etc. The text representation would be as simple > Eddy WUSS (Rfam-like) format by default, which is capable of > representing some complex data (pseudoknots, for instance), is > compact, and is documented (via the Infernal manual). Tags will > probably switch to more ontologically relevant terms (probably from > RNAML or RNA Ontology), but in general it is something like this: > > [ > {'interaction' => 'WC', > 'base' => 24}, > {'interaction' => 'WC', > 'base' => 23}, > {'interaction' => 'SS'}, > ... > ] > > In this implementation every seq position would have some kind of > interaction designation, though that's open for debate as it could > just be simple text or undef for single-stranded regions. 
> > This is also scalable based on complexity of the data: if one wanted > to add tert/quaternary interactions, location, base modifications, > remote sequence interactions, etc., extra key/value pairs could be > used. Conversely, if one only wanted sec structure (for drawing RNA > structures, for example), then only that data would be parsed. > > If you (or anyone listening) have any suggestions I would greatly > appreciate them. > > chris > > From cjfields at uiuc.edu Wed Feb 7 22:15:44 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 7 Feb 2007 16:15:44 -0600 Subject: [Bioperl-l] RNA folding In-Reply-To: References: Message-ID: <7360B66F-6AF3-4CB1-8343-0A19E42AD7F8@uiuc.edu> On Feb 7, 2007, at 12:56 PM, Caroline Johnston wrote: > Thanks Chris. > > Storing the interaction data as a hash according to an ontology and > using > an extended bracket notation as the string representation seems to > make > sense, but I'm still unsure how this is supposed to be > attached to the Seq objects. You reckon it should be an AnnotationI? As long as it describes everything in the object and there is a reasonable way of textually representing the data, I think you can attach anything as annotation. A recent example is the addition of trees as annotation. Also, Annotation can be used to describe alignments (such as the structure consensus string in Rfam alignments), or added to SeqFeatures. The class just needs to implement AnnotatableI. > I'm not sure I understand the distinction between annotations and > features. From the docs I got the impression that Features were like > annotation on bits of sequences and had a reference to the sequence to > which they belong, whereas annotations don't. If that's the case > though, > why would RNA structure be an annotation rather than a feature? If > not, > what is the distinction between them? Are the positional Annotation > subclasses you're developing intended to replace features?
Have I > got the > wrong end of the stick entirely? > > Cheers, > Cass The key distinction between seqfeatures and annotations is that annotations are normally associated with the entire sequence record, while seqfeatures normally describe a part of the sequence (and thus have a location on the sequence). There are a few exceptions, but in general that's the case. The HOWTO gives a bit more background: http://www.bioperl.org/wiki/HOWTO:Feature-Annotation Using annotations or seqfeatures in a case like this may be completely dependent on one's point of view. For instance, one implementation I had considered was adding an interface to Bio::Seq which would allow Seq objects to also have Bio::Structure objects, since my view is that any sequence could (optionally) have a structure associated with it. However, I reasoned that a sequence could actually have multiple structures (RNA, ssDNA, and protein can have several alternative folds or different folding pathways, for instance). Instead of splitting up each structure into individual seqfeatures (each of which would have to be tagged with the relevant structure and score info), I could have one class encompass all of that data in a reasonable way. Hence I used Annotation. BTW, this isn't meant to replace features in any way. It would be primarily used to describe (1) a sequence as a whole, such as a tRNA sequence, (2) a seqfeature, such as a tRNA, rRNA, riboswitch, etc., in a genome sequence, or (3) a conserved structure in an alignment, such as Rfam Stockholm output. I'll add that the option of splitting the data into seqfeatures isn't ruled out. It would be a matter of using a helper method, maybe in SeqUtils or directly in Annotation::Meta or whatever I end up calling it. I plan on adding something along those lines at some point.
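Chris's earlier proposal in this thread stores one hash per nucleotide (index 0 is nucleotide 1, with a 1-based `base` partner for paired positions) and renders it as bracket notation. A minimal sketch of that rendering step, assuming only nested secondary structure: the key names follow his example, `to_brackets` is a hypothetical name, and pseudoknots (which need additional symbols in WUSS) are not handled.

```perl
use strict;
use warnings;

# Render a per-position structure (array of hashes, index 0 = nucleotide 1)
# as plain dot-bracket notation. A position whose hash has a 'base' key is
# paired with that 1-based partner; anything else is written as '.'.
# Assumes nested pairs only (no pseudoknots).
sub to_brackets {
    my ($structure) = @_;
    my $out = '';
    for my $i (0 .. $#$structure) {
        my $pos     = $i + 1;                  # convert to 1-based coordinate
        my $partner = $structure->[$i]{base};
        if (!defined $partner) {
            $out .= '.';                       # single-stranded / unpaired
        } elsif ($partner > $pos) {
            $out .= '(';                       # pair closes further downstream
        } else {
            $out .= ')';                       # pair opened further upstream
        }
    }
    return $out;
}

# A 5 nt toy hairpin: 1 pairs with 5, 2 pairs with 4, 3 is unpaired.
my @hairpin = (
    { interaction => 'WC', base => 5 },
    { interaction => 'WC', base => 4 },
    { interaction => 'SS' },
    { interaction => 'WC', base => 2 },
    { interaction => 'WC', base => 1 },
);
print to_brackets(\@hairpin), "\n";   # prints ((.))
```

Because each position carries its own hash, extra keys (energies, modifications, tertiary contacts) can be added without changing this rendering code, which only looks at `base`.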
chris From mitch_skinner at berkeley.edu Wed Feb 7 23:26:53 2007 From: mitch_skinner at berkeley.edu (Mitch Skinner) Date: Wed, 07 Feb 2007 15:26:53 -0800 Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels In-Reply-To: <6dce9a0b0702070417k692fedcchfac5b1f0e01f72bd@mail.gmail.com> References: <45C9578F.2060802@berkeley.edu> <6dce9a0b0702070417k692fedcchfac5b1f0e01f72bd@mail.gmail.com> Message-ID: <45CA603D.1070901@berkeley.edu> Lincoln Stein wrote: > Zero is not a forbidden coordinate, since gbrowse also works on > genetic maps which have negative and floating point coordinates. > You've simply picked up a boundary case where the rounding isn't > working properly. I will fix this now. Thanks for the fix. What do you think of the following case? This is something I actually ran into. Suppose you have the original draw_grid: my $first_tick = $minor * int($self->start/$minor); and my version of map_pt: my $val = $flip ? ($pr - ($length - ($_- 1)) * $scale) : (($_-$offset-1) * $scale); $val = int($val + .5 * ($val <=> 0)); and scale=0.5, offset=0, pad_left=0, flip=0, and minor=10. Our tiles are currently 1000px wide. So the first gridline will be at 0bp => -1px and the 200th gridline will be at 2000bp => 1000px. So the first tile will not have a gridline at its 0th pixel but the second tile will have one there. Last night I was thinking that this was an artifact of having gridlines start at 0bp, but now I'm thinking this is just because rounding half-pixels leaves an extra space when crossing zero. That is not unreasonable; it just invalidates the assumption I was hoping to make that the gridlines are the same for each tile. Maybe it's just unreasonable to think that floating point calculations will give pixel-exact results. Or I may just be barking up the wrong tree entirely. Perhaps it's time to reconsider at a higher level (see my next message).
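The pixel arithmetic in this thread can be checked in isolation. The sketch below uses a hypothetical, simplified stand-in for the unflipped branch of `map_pt` (no padding, no flip; it is not the Bio::Graphics code) and compares the original `int(0.5 + $x)` rounding with the symmetric rounding Mitch proposed:

```perl
use strict;
use warnings;

# Original map_pt style: fine for positive values, but int() truncates
# toward zero, so negative values round the wrong way (-0.6 -> 0).
sub round_orig { my ($x) = @_; return int(0.5 + $x) }

# Proposed style: round half away from zero, symmetric around 0.
sub round_symmetric { my ($x) = @_; return int($x + 0.5 * ($x <=> 0)) }

# Hypothetical, simplified unflipped map_pt: 1-based bp -> pixel,
# ignoring padding; $rounder selects the rounding rule.
sub bp_to_px {
    my ($bp, $scale, $offset, $rounder) = @_;
    return $rounder->(($bp - $offset - 1) * $scale);
}

# Print pixel lists as plain integers.
sub px_list { return join ' ', map { sprintf '%d', $_ } @_ }

# First example: ticks at 0, 10, 20 bp, scale 1, offset 0.
print px_list(map { bp_to_px($_, 1, 0, \&round_orig) } 0, 10, 20), "\n";
# prints "0 9 19": the first interval is 9 px, the next 10 px

# Proposed first tick: 1 + minor * int((start - 1) / minor) = 1 for
# start 1 and minor 10, so ticks fall at 1, 11, 21 bp.
print px_list(map { bp_to_px($_, 1, 0, \&round_orig) } 1, 11, 21), "\n";
# prints "0 10 20"

# Follow-up case: scale 0.5, symmetric rounding, original first tick at 0 bp.
print px_list(map { bp_to_px($_, 0.5, 0, \&round_symmetric) } 0, 2000), "\n";
# prints "-1 1000"
```

The last case shows why symmetric rounding alone does not give identical tiles: the tick at 0 bp maps to -0.5 px and rounds away from zero to -1, while the tick at 2000 bp maps to 999.5 px and rounds to 1000, so the two tiles see different patterns.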
Mitch From mitch_skinner at berkeley.edu Wed Feb 7 23:28:11 2007 From: mitch_skinner at berkeley.edu (Mitch Skinner) Date: Wed, 07 Feb 2007 15:28:11 -0800 Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels In-Reply-To: <6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com> References: <45C9578F.2060802@berkeley.edu> <6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com> Message-ID: <45CA608B.80907@berkeley.edu> Lincoln Stein wrote: > However, I'm also very interested in why grid-drawing takes so long. > When I've profiled drawing, neither grid drawing nor map_pt() consume > any significant amount of time. Well, the approach that we've been taking is to hand Bio::Graphics::Panel a fake GD object that stores all of the graphical primitives (line, rectangle, filledRectangle, etc. + their parameters) and then draws them later in chunks (for each tile, we draw all the primitives that overlap its pixel boundaries). We're doing this because trying to create a real GD object that's hundreds of millions of pixels wide takes too much RAM. But storing all the gridlines (for a whole chromosome, at a high zoom level) also takes a lot of RAM, and getting the gridlines for the current tile and translating their coordinates into the coordinate space of the tile also takes a fair amount of CPU. The gridline hack I've been experimenting with (that prompted these emails) was motivated by the hope that the gridlines were regular enough that we wouldn't have to store them explicitly, but just draw the same gridlines over and over again. It runs almost twice as fast as the version that explicitly stores the gridlines. So the main slowdown is not in draw_grid or map_pt, but in our code that's storing/retrieving and translating the gridlines. Which we are also looking into speeding up. But the memory usage is harder to reduce; I've experimented with trying to compress the gridline data but it seems easier to just have the panel draw the grid directly. 
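The hope of drawing the same gridlines for every tile, and Lincoln's later suggestion to turn off the Panel's own grid and draw it per tile, can be sketched without a Panel at all. Assuming each zoom level uses a whole number of base pairs per pixel, gridline positions inside any tile follow from integer arithmetic, so nothing has to be stored and there is no rounding drift. `tile_gridlines` and `pattern_repeats` are hypothetical helpers, not part of Bio::Graphics; gridlines sit at 1, 1+minor, 1+2*minor, ... bp, following Mitch's proposed first tick.

```perl
use strict;
use warnings;

# Pixel x-positions (0-based, relative to the tile's left edge) of the
# gridlines inside tile $tile, assuming each tile is $width px wide, each
# pixel covers a whole number $bp_per_px of base pairs, and a gridline sits
# at every 1-based bp g with (g - 1) % $minor == 0. Integer math only.
sub tile_gridlines {
    my ($tile, $width, $bp_per_px, $minor) = @_;
    my $first_bp = $tile * $width * $bp_per_px + 1;     # leftmost bp in tile
    my $last_bp  = ($tile + 1) * $width * $bp_per_px;   # rightmost bp in tile
    # first gridline at or after $first_bp (rounded up into the tile)
    my $g = 1 + $minor * int(($first_bp - 1 + $minor - 1) / $minor);
    my @x;
    for (; $g <= $last_bp; $g += $minor) {
        push @x, int(($g - $first_bp) / $bp_per_px);    # pixel column of bp g
    }
    return @x;
}

# Sufficient condition for every tile to get the same pattern: the tile's
# span in bp is a whole number of gridline intervals.
sub pattern_repeats {
    my ($width, $bp_per_px, $minor) = @_;
    return ($width * $bp_per_px) % $minor == 0;
}

# Mitch's setup: 1000 px tiles at 0.5 px/bp (2 bp per pixel), minor = 10 bp.
my @tile0 = tile_gridlines(0, 1000, 2, 10);
my @tile1 = tile_gridlines(1, 1000, 2, 10);
printf "%d gridlines per tile, first at x=%d, last at x=%d\n",
    scalar @tile0, $tile0[0], $tile0[-1];
print "identical pattern\n" if "@tile0" eq "@tile1";
```

When `pattern_repeats` is true the list can be computed once per zoom level and reused for every tile; when it is false (for example 7 px tiles at 2 bp/px with minor = 10) the pattern shifts from tile to tile and has to be computed per tile, which is still cheap.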
The more I read the Panel code, the more I think it would be nice to make more use of it. One of the reasons that we're trying to fool it right now is that there seem to be a number of behaviors in it (and/or in the glyphs?) that take the current image boundaries into account (drawing an arrow where a feature runs off the edge of the image, etc.). But in our browser each tile is supposed to mesh seamlessly with its neighbor, so if there's an easy way to turn off those edge-aware behaviors that would be pretty interesting. Ian has also suggested that it might be better to store less information than the full set of graphics primitives. For example, we could just store the Panel's glyph boxes and use their (pixel bound)->feature information to decide which features need to be drawn for each tile. I'm going to be spending some time reading the Bio::Graphics code in more depth. I'd also welcome suggestions from you or anyone on the list. Thanks, Mitch From sdbrown at annular.org Wed Feb 7 23:41:13 2007 From: sdbrown at annular.org (Steven Brown) Date: Wed, 7 Feb 2007 15:41:13 -0800 Subject: [Bioperl-l] Bio::Restriction::Analysis cut site problem in 1.5.2 Message-ID: <3828F5B9-BFF8-47F4-8228-3BEA036513A0@annular.org> The module seems to have trouble handling the cut-site specifiers that surround the sequence that the enzyme is specific for. The error: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Bad end parameter (22). 
End must be less than the total length of sequence (total=6) STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.8.6/Bio/Root/Root.pm:328 STACK: Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:371 STACK: Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:884 STACK: Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:785 STACK: Bio::Restriction::Analysis::cut /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:369 STACK: Bio::Restriction::Analysis::cutters /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:678 ---snip (my script line)--- ----------------------------------------------------------- The offending enzyme: ---snip--- <1>AcuI <2>Eco57I,Bsp6II,BspD6II,BspKT5I,Eco112I,Eco125I,FsfI <3>CTGAAG(16/14) ---snip--- If I get rid of the (16/14) the error disappears and the right sequence site is matched. It seems like maybe a decision was made not to analyze enzymes with remote cut positions, or the code wouldn't throw the error...? Any information on this would be helpful. Thanks, Steve From adsj at novozymes.com Thu Feb 8 08:55:50 2007 From: adsj at novozymes.com (Adam Sjøgren) Date: Thu, 08 Feb 2007 09:55:50 +0100 Subject: [Bioperl-l] Bioperl 1.4 to 1.5.2, adding qualifiers to Bio::Seq-objects References: <8764adoptn.fsf@topper.koldfront.dk> Message-ID: <87fy9hqb8p.fsf@topper.koldfront.dk> On Wed, 7 Feb 2007 11:50:13 -0600, Chris wrote: > This works for me using bioperl-live (Mac OS X): > ok 1 - Qualifier note found > ok 2 - Qualifier db_xref found *slaps forehead* Thanks for the test - your diagnosis was spot on: > If you haven't uninstalled 1.4, make sure you aren't running 1.4 or > mixing the two versions (you can check by using 'perldoc -l > Bio::Root::Root').
I had a modified version of Bio::Seq and Bio::SeqFeature::Generic in my @INC (added, and promptly forgotten, writing the patch mentioned here: ). Removing those and patching 1.5.2 fixed my self-inflicted problem. Thanks again! Adam -- Adam Sjøgren adsj at novozymes.com From heikki at sanbi.ac.za Thu Feb 8 09:39:47 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Thu, 8 Feb 2007 11:39:47 +0200 Subject: [Bioperl-l] Bio::Restriction::Analysis cut site problem in 1.5.2 In-Reply-To: <3828F5B9-BFF8-47F4-8228-3BEA036513A0@annular.org> References: <3828F5B9-BFF8-47F4-8228-3BEA036513A0@annular.org> Message-ID: <200702081139.48125.heikki@sanbi.ac.za> The error comes from Bio::PrimarySeq::subseq when it tries to cut beyond the end of the sequence. Maybe your sequence has a restriction site near its end? This is a special case that has not been taken into account in the Bio::Restriction::Analysis::_cuts method. The question is: should the site be detected if its cut site is not within the studied sequence? Please submit a bugzilla bug so this gets solved. I probably do not have time to tweak the code myself. -Heikki On Thursday 08 February 2007 01:41:13 Steven Brown wrote: > The module seems to have trouble handling the cut-site specifiers > that surround the sequence that the enzyme is specific for. The error: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Bad end parameter (22).
End must be less than the total length > of sequence (total=6) > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.8.6/ > Bio/Root/Root.pm:328 > STACK: Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.8.6/ > Bio/PrimarySeq.pm:371 > STACK: Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/perl5/ > site_perl/5.8.6/Bio/Restriction/Analysis.pm:884 > STACK: Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/ > site_perl/5.8.6/Bio/Restriction/Analysis.pm:785 > STACK: Bio::Restriction::Analysis::cut /usr/local/lib/perl5/site_perl/ > 5.8.6/Bio/Restriction/Analysis.pm:369 > STACK: Bio::Restriction::Analysis::cutters /usr/local/lib/perl5/ > site_perl/5.8.6/Bio/Restriction/Analysis.pm:678 > ---snip (my script line)--- > ----------------------------------------------------------- > > The offending enzyme: > > ---snip--- > <1>AcuI > <2>Eco57I,Bsp6II,BspD6II,BspKT5I,Eco112I,Eco125I,FsfI > <3>CTGAAG(16/14) > ---snip--- > > If I get rid of the (16/14) the error disappears and the right > sequence site is matched. It seems like maybe a decision was made > not to analyze enzymes with remote cut positions, or the code wouldn't > throw the error...? Any information on this would be helpful.
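Heikki's diagnosis can be illustrated with plain arithmetic. For a site written as `CTGAAG(16/14)`, the cut lies 16 nt past the end of the recognition sequence on the top strand and 14 nt on the bottom strand, so a match too close to the end of the sequence has a cut position that does not exist; with a 6 nt sequence, 6 + 16 = 22 is exactly the end parameter in the exception. The sketch below is hypothetical (it is not the Bio::Restriction::Analysis code). It scans the top strand only (AcuI's site is not palindromic, so a full scan would also search the reverse complement, CTTCAG) and flags out-of-range cuts instead of passing them to a `subseq`-style call.

```perl
use strict;
use warnings;

# Hypothetical helper: find every top-strand match of $site in $seq and
# compute its downstream cuts from REBASE-style (top/bottom) offsets,
# e.g. CTGAAG(16/14) -> 16, 14. A cut at position N falls after
# nucleotide N (1-based). Matches whose cuts lie past the sequence end
# are flagged rather than used.
sub downstream_cuts {
    my ($seq, $site, $top, $bottom) = @_;
    my $len = length $seq;
    my @cuts;
    my $pos = index($seq, $site);                  # 0-based, -1 if absent
    while ($pos >= 0) {
        my $site_end   = $pos + length $site;      # 1-based last base of site
        my $top_cut    = $site_end + $top;
        my $bottom_cut = $site_end + $bottom;
        push @cuts, {
            start    => $pos + 1,
            top      => $top_cut,
            bottom   => $bottom_cut,
            in_range => ($top_cut <= $len && $bottom_cut <= $len) ? 1 : 0,
        };
        $pos = index($seq, $site, $pos + 1);       # continue after this match
    }
    return @cuts;
}

# AcuI, CTGAAG(16/14): the 6 nt sequence reproduces the "end 22" case.
for my $seq ('CTGAAG', 'CTGAAG' . ('A' x 24)) {
    for my $c (downstream_cuts($seq, 'CTGAAG', 16, 14)) {
        printf "length %d: site at %d, top cut %d, bottom cut %d, %s\n",
            length($seq), $c->{start}, $c->{top}, $c->{bottom},
            $c->{in_range} ? 'inside sequence' : 'beyond sequence end';
    }
}
```

Whether such a site should be reported at all (Heikki's open question) is a policy choice; keeping the match but marking its cut as out of range is one option that avoids the exception.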
> > Thanks, > Steve -- Heikki Lehvaslaiho heikki at_sanbi _ac _za Associate Professor skype: heikki_lehvaslaiho SANBI, South African National Bioinformatics Institute University of Western Cape, South Africa Phone: +27 21 959 2096 FAX: +27 21 959 2512 From cjfields at uiuc.edu Thu Feb 8 14:20:26 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 8 Feb 2007 08:20:26 -0600 Subject: [Bioperl-l] BLASTXML changes (good this time!) Message-ID: All, BLAST XML parsing should now work with any CPAN-based XML::SAX parser: XML::SAX::PurePerl (comes with XML::SAX, the slowest), XML::SAX::Expat, XML::SAX::ExpatXS (the fastest), XML::LibXML::SAX, and XML::LibXML::SAX::Parser. Grant McLean has updated XML::SAX on CPAN to fix an XML::SAX::PurePerl bug, so using that parser will require an XML::SAX upgrade. I also found a bug in the SAX handler that chopped off a large chunk of the description for hits; that is now fixed in CVS. If Sendu is out there, I think we can safely remove any dependencies beyond XML::SAX 0.15 for the next release. Should I go ahead and modify Build.PL? chris From lstein at cshl.edu Thu Feb 8 15:51:49 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 8 Feb 2007 10:51:49 -0500 Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels In-Reply-To: <45CA608B.80907@berkeley.edu> References: <45C9578F.2060802@berkeley.edu> <6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com> <45CA608B.80907@berkeley.edu> Message-ID: <6dce9a0b0702080751m210e4d44k3e5c38bfdd3ee9ea@mail.gmail.com> Hi, I like the approach you're taking (creating a fake GD object that stores the graphics primitives).
Perhaps the best thing to do is to subclass Panel itself so that it doesn't draw the gridlines (or turn gridlines off completely). Then you can draw gridlines after the fact in each tile as needed. Lincoln On 2/7/07, Mitch Skinner wrote: > > Lincoln Stein wrote: > > However, I'm also very interested in why grid-drawing takes so long. > > When I've profiled drawing, neither grid drawing nor map_pt() consume > > any significant amount of time. > Well, the approach that we've been taking is to hand > Bio::Graphics::Panel a fake GD object that stores all of the graphical > primitives (line, rectangle, filledRectangle, etc. + their parameters) > and then draws them later in chunks (for each tile, we draw all the > primitives that overlap its pixel boundaries). We're doing this because > trying to create a real GD object that's hundreds of millions of pixels > wide takes too much RAM. But storing all the gridlines (for a whole > chromosome, at a high zoom level) also takes a lot of RAM, and getting > the gridlines for the current tile and translating their coordinates > into the coordinate space of the tile also takes a fair amount of CPU. > The gridline hack I've been experimenting with (that prompted these > emails) was motivated by the hope that the gridlines were regular enough > that we wouldn't have to store them explicitly, but just draw the same > gridlines over and over again. It runs almost twice as fast as the > version that explicitly stores the gridlines. > > So the main slowdown is not in draw_grid or map_pt, but in our code > that's storing/retrieving and translating the gridlines. Which we are > also looking into speeding up. But the memory usage is harder to > reduce; I've experimented with trying to compress the gridline data but > it seems easier to just have the panel draw the grid directly. > > The more I read the Panel code, the more I think it would be nice to > make more use of it. 
One of the reasons that we're trying to fool it > right now is that there seem to be a number of behaviors in it (and/or > in the glyphs?) that take the current image boundaries into account > (drawing an arrow where a feature runs off the edge of the image, > etc.). But in our browser each tile is supposed to mesh seamlessly with > its neighbor, so if there's an easy way to turn off those edge-aware > behaviors that would be pretty interesting. > > Ian has also suggested that it might be better to store less information > than the full set of graphics primitives. For example, we could just > store the Panel's glyph boxes and use their (pixel bound)->feature > information to decide which features need to be drawn for each tile. > > I'm going to be spending some time reading the Bio::Graphics code in > more depth. I'd also welcome suggestions from you or anyone on the list. > > Thanks, > Mitch > -- Lincoln D. Stein From Kevin.M.Brown at asu.edu Thu Feb 8 15:28:30 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 8 Feb 2007 08:28:30 -0700 Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels References: <45C9578F.2060802@berkeley.edu> <6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com> <45CA608B.80907@berkeley.edu> Message-ID: <1A4207F8295607498283FE9E93B775B402AAC1D0@EX02.asurite.ad.asu.edu> > The more I read the Panel code, the more I think it would be > nice to make more use of it. One of the reasons that we're > trying to fool it right now is that there seem to be a number > of behaviors in it (and/or in the glyphs?) that take the > current image boundaries into account (drawing an arrow where > a feature runs off the edge of the image, etc.).
> But in our
> browser each tile is supposed to mesh seamlessly with its
> neighbor, so if there's an easy way to turn off those
> edge-aware behaviors that would be pretty interesting.

I think the glyphs try to deal with edges because if they didn't, then they would flow out into whatever right or left padding had been placed around the image when the panel was created.

Something I've noticed is that when I create tiles for the chromosomes I'm working on, the panels don't line up, because the bump position in one panel is not accounted for when the next panel is drawn.

From sheris at eps.berkeley.edu Thu Feb 8 17:42:27 2007
From: sheris at eps.berkeley.edu (Sheri Simmons)
Date: Thu, 08 Feb 2007 09:42:27 -0800
Subject: [Bioperl-l] bioperl newbie needs help with extracting cds info
Message-ID: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu>

Hi,
I'm a newbie to BioPerl, so apologies if this is a very basic question. I am trying to parse GenBank files with the goal of creating concatenated gene lists in nucleic acid or amino acid format. It is working fine, except for one thing: I need to create gene labels incorporating information on whether the gene is on the complementary strand or not ("complement" in the CDS tag). How can I get Bioperl to tell me whether the CDS tag value includes the word "complement"?

Thanks
Sheri

From george.heller at yahoo.com Thu Feb 8 18:54:41 2007
From: george.heller at yahoo.com (George Heller)
Date: Thu, 8 Feb 2007 10:54:41 -0800 (PST)
Subject: [Bioperl-l] Perl script to extract from ncbi
Message-ID: <178139.85769.qm@web56506.mail.re3.yahoo.com>

Hi all,

I have a question regarding extracting data from NCBI. I have a database to store the sequence data, but the files I have loaded into it don't have a proper description line specified. Based on the accession number, I need to find out the genus and species name (organism name) from NCBI.

I have about 1500 records for which I need to extract the names from NCBI.
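For a batch of IDs this size, one option is to request many records per efetch call, using the comma-separated id= form that Wenwu Cui's reply in this thread uses. The core-Perl sketch below only builds the request URLs; it assumes the IDs are GI numbers like those in Wenwu's example, and the default batch size of 200 is an arbitrary assumption, not an NCBI-documented limit. efetch_urls is a hypothetical helper; fetching and parsing the XML are left out.

```perl
use strict;
use warnings;

# efetch base URL and query parameters as given in Wenwu Cui's reply.
my $EFETCH = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Hypothetical helper: build one efetch URL per batch of IDs.
# The default batch size of 200 is an assumption for illustration only.
sub efetch_urls {
    my ($ids, $batch_size) = @_;
    $batch_size ||= 200;
    my @urls;
    for (my $i = 0; $i < @$ids; $i += $batch_size) {
        my $last = $i + $batch_size - 1;
        $last = $#$ids if $last > $#$ids;            # final, possibly short, batch
        my $id_list = join ',', @{$ids}[ $i .. $last ];
        push @urls, "$EFETCH?db=nucleotide&id=$id_list&retmode=xml&rettype=gb";
    }
    return @urls;
}
```

With 1500 IDs this yields eight URLs (seven batches of 200 and one of 100); each response could then be fetched and the organism name read from the returned XML.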
Any ideas of how I can go about writing a perl script for extracting this information from NCBI?

Thanks!
George.

From Kevin.M.Brown at asu.edu Thu Feb 8 19:11:50 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 8 Feb 2007 12:11:50 -0700
Subject: [Bioperl-l] bioperl newbie needs help with extracting cds info
References: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu>
Message-ID: <1A4207F8295607498283FE9E93B775B402AAC29A@EX02.asurite.ad.asu.edu>

When you extract the features, just call the strand method on each returned feature to find out.

    my @features = $seq->all_SeqFeatures;
    # keep only the CDS features and print each one's strand
    # (1 = forward strand, -1 = reverse strand, i.e. "complement")
    for my $f (@features) {
        my $tag = $f->primary_tag;
        if ($tag eq 'CDS') {
            print $f->strand . "\n";
        }
    }

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Sheri Simmons
> Sent: Thursday, February 08, 2007 10:42 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] bioperl newbie needs help with
> extracting cds info
>
> Hi,
> I'm a newbie to BioPerl, so apologies if this is a very basic
> question. I am trying to parse GenBank files with the goal of
> creating concatenated gene lists in nucleic acid or amino
> acid format. It is working fine, except for one thing: I need
> to create gene labels incorporating information on whether
> the gene is on the complementary strand or not ("complement"
> in the CDS tag). How can I get Bioperl to tell me whether the
> CDS tag value includes the word "complement"?
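Calling strand on each feature, as Kevin and Barry suggest, is the usual BioPerl route. If you only have the raw GenBank location text, such as complement(1712..2848), the "complement" check Sheri asks about can be sketched in core Perl. strand_from_location is a hypothetical helper that follows BioPerl's 1/-1 strand convention; it handles plain ranges, complement(...), and join(...)/order(...) lists of such pieces, but not references to other entries or more deeply nested forms.

```perl
use strict;
use warnings;

# Hypothetical helper: infer the strand of a GenBank location string,
# returning 1 (plus), -1 (minus, i.e. "complement") or 0 (mixed pieces).
sub strand_from_location {
    my ($loc) = @_;
    (my $s = $loc) =~ s/\s+//g;              # long locations may be wrapped

    return -1 if $s =~ /^complement\(/;      # complement(...) wraps the whole location

    if ($s =~ /^(?:join|order)\((.*)\)$/) {
        my @parts   = split /,/, $1;
        my $n_minus = grep { /^complement\(/ } @parts;
        return -1 if @parts && $n_minus == @parts;   # every piece complemented
        return 0  if $n_minus > 0;                   # pieces on both strands
    }
    return 1;
}
```

For example, strand_from_location('join(complement(4918..5163),complement(2691..4571))') returns -1, so the gene could be labelled as being on the complementary strand.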
>
> Thanks
> Sheri

From barry.moore at genetics.utah.edu Thu Feb 8 19:35:03 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 8 Feb 2007 12:35:03 -0700
Subject: [Bioperl-l] bioperl newbie needs help with extracting cds info
In-Reply-To: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu>
References: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu>
Message-ID:

Sheri-

The Bio::SeqFeature::Generic object has a 'strand' method, so you can just call strand on the CDS (or any other) feature like this.

    my @features = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures();
    for my $feature (@features) {
        my $strand = $feature->strand;   # 1 = forward, -1 = reverse ("complement")
    }

Barry

On Feb 8, 2007, at 10:42 AM, Sheri Simmons wrote:

> Hi,
> I'm a newbie to BioPerl, so apologies if this is a very basic
> question. I am trying to parse GenBank files with the goal of
> creating concatenated gene lists in nucleic acid or amino acid
> format. It is working fine, except for one thing: I need to create
> gene labels incorporating information on whether the gene is on the
> complementary strand or not ("complement" in the CDS tag). How can I
> get Bioperl to tell me whether the CDS tag value includes the word
> "complement"?
>
> Thanks
> Sheri

From torsten.seemann at infotech.monash.edu.au Fri Feb 9 04:18:33 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 9 Feb 2007 15:18:33 +1100
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To:
References:
Message-ID:

Chris,

> BLAST XML parsing should now work for any CPAN-based XML::SAX parser!
> XML::SAX::PurePerl (comes with XML::SAX, the slowest)
> XML::SAX::Expat
> XML::SAX::ExpatXS (the fastest)
> XML::LibXML::SAX
> XML::LibXML::SAX::Parser

That's excellent news. Thanks for all the work you have put in on this one. I'm impressed.

This is a good opportunity to encourage people who use Bio::SearchIO for BLAST parsing to switch to 'blastxml' format over 'blast'. Although the latter is more human readable, it perennially requires parser source changes to cope with the variations and new formatting introduced with each new NCBI BLAST release. Best to use "-m 7" XML format, and convert as appropriate using one of the Bio::SearchIO::Writer:: classes.

--Torsten

From cjfields at uiuc.edu Fri Feb 9 13:58:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Feb 2007 07:58:24 -0600
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To:
References:
Message-ID: <4FC966A7-7A03-41D9-ABF7-6ACD888720FB@uiuc.edu>

On Feb 8, 2007, at 10:18 PM, Torsten Seemann wrote:

> Chris,
>
>> BLAST XML parsing should now work for any CPAN-based XML::SAX parser!
>> XML::SAX::PurePerl (comes with XML::SAX, the slowest)
>> XML::SAX::Expat
>> XML::SAX::ExpatXS (the fastest)
>> XML::LibXML::SAX
>> XML::LibXML::SAX::Parser
>
> That's excellent news. Thanks for all the work you have put in on
> this one. I'm impressed.

Jason did most of the hard work; I just tinkered with it until it worked (and pestered a few perl XML guys along the way). Thanks Grant and Björn!

> This is a good opportunity to encourage people who use Bio::SearchIO
> for BLAST parsing to switch to 'blastxml' format over 'blast'.
> Although the latter is more human readable, it perennially requires
> parser source changes to cope with the variations and new formatting
> introduced with each new NCBI BLAST release. Best to use "-m 7" XML
> format, and convert as appropriate using one of the
> Bio::SearchIO::Writer:: classes.
>
> --Torsten

I'll try getting some benchmarks for the different parsers up today on the wiki if I have time.

Strangely enough, NCBI changed a few things about BLAST XML a few releases back without mentioning it to anyone (it was a silent bug in BLAST XML parsing, which I fixed recently). If you sent in multiple queries in older versions of BLAST you would get all of the BLAST XML reports concatenated together, which required preparsing the reports to carve up the XML prior to parsing. Now they treat it like PSI-BLAST, where multiple queries = multiple iterations, so you get one long XML BLAST report where each iteration is one Result. The current parser should handle both, as it just caches the other results and returns them one at a time prior to new parses, but I wouldn't recommend parsing a huge BLAST XML file with hundreds of queries, as you'll quickly run out of memory! If they get Perl SAX2 up to date with Expat they'll eventually add parse_chunk() and pause_parse() for each parser. Until then...

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cuiw at ncbi.nlm.nih.gov Fri Feb 9 14:20:10 2007
From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C])
Date: Fri, 9 Feb 2007 09:20:10 -0500
Subject: [Bioperl-l] Perl script to extract from ncbi
In-Reply-To: <178139.85769.qm@web56506.mail.re3.yahoo.com>
References: <178139.85769.qm@web56506.mail.re3.yahoo.com>
Message-ID: <18C407FD4FFB424292D769FBD68C1987020BBC58@NIHCESMLBX8.nih.gov>

This is an example for fetching two GenBank records (id=124504630,110665734) in XML format. Organism names like 'Rattus norvegicus' can be parsed from the XML.
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=124504630,110665734&retmode=xml&rettype=gb

Or you can get TaxIds and translate them into real names:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=124504630,110665734&retmode=xml

Wenwu Cui, PhD

-----Original Message-----
From: George Heller [mailto:george.heller at yahoo.com]
Sent: Thursday, February 08, 2007 1:55 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Perl script to extract from ncbi

Hi all,

I have a question regarding extracting data from NCBI. I have a database to store the sequence data, but the files I have loaded into it don't have a proper description line specified. Based on the accession number, I need to find out the genus and species name (organism name) from NCBI.

I have about 1500 records for which I need to extract the names from NCBI.

Any ideas of how I can go about writing a perl script for extracting this information from NCBI?

Thanks!
George.

From bosborne11 at verizon.net Fri Feb 9 17:51:19 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 09 Feb 2007 12:51:19 -0500
Subject: [Bioperl-l] Perl script to extract from ncbi
In-Reply-To: <178139.85769.qm@web56506.mail.re3.yahoo.com>
Message-ID:

George,

http://www.bioperl.org/wiki/HOWTO:Beginners#Retrieving_a_sequence_from_a_database

Brian O.

On 2/8/07 1:54 PM, "George Heller" wrote:

> Hi all,
>
> I have a question regarding extracting data from NCBI. I have a database to
> store the sequence data, but the files I have loaded into it don't have a
> proper description line specified.
> Based on the accession number, I need to
> find out the genus and species name (organism name) from NCBI.
>
> I have about 1500 records for which I need to extract the names from NCBI.
>
> Any ideas of how I can go about writing a perl script for extracting this
> information from NCBI?
>
> Thanks!
> George.

From johnston at biochem.ucl.ac.uk Fri Feb 9 19:23:41 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Fri, 9 Feb 2007 19:23:41 +0000 (GMT)
Subject: [Bioperl-l] WrapperBase
Message-ID:

Hi,

Could WrapperBase::executable warn you if it doesn't find the exe in program_path? At the moment it just silently goes ahead and uses one in the system path if it exists.

Cass.

I've never used diff, so not sure if this is right, but:

    305,308c305,314
    <     if( $prog_path && -e $prog_path && -x $prog_path ) {
    <         $self->{'_pathtoexe'} = $prog_path;
    <     } else {
    <         my $exe;
    ---
    >     if($prog_path){
    >         if(-e $prog_path && -x $prog_path){
    >             $self->{'_pathtoexe'} = $prog_path;
    >         }
    >         else{
    >             $self->warn("executable not found in $prog_path, trying system path...") if $warn;
    >         }
    >     }
    >     unless ($self->{'_pathtoexe'}){
    >         my $exe;
    335a342

From bix at sendu.me.uk Fri Feb 9 22:38:59 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 09 Feb 2007 22:38:59 +0000
Subject: [Bioperl-l] WrapperBase
In-Reply-To:
References:
Message-ID: <45CCF803.9030004@sendu.me.uk>

Caroline Johnston wrote:
> Hi,
>
> Could WrapperBase::executable warn you if it doesn't find the exe in
> program_path? At the moment it just silently goes ahead and uses one in
> the system path if it exists.

No, I think not.
That would be very annoying when using wrappers for programs that you just have in your system path.

What specific problem are you encountering with the current behaviour?

From bix at sendu.me.uk Fri Feb 9 22:40:33 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 09 Feb 2007 22:40:33 +0000
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To:
References:
Message-ID: <45CCF861.8030000@sendu.me.uk>

Chris Fields wrote:
> If Sendu is out there, I think we can safely remove any dependencies
> beyond XML::SAX 0.15 for the next release. Should I go ahead and
> modify Build.PL?

Sure, good to hear.

From cjfields at uiuc.edu Sat Feb 10 03:42:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Feb 2007 21:42:24 -0600
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <45CCF861.8030000@sendu.me.uk>
References: <45CCF861.8030000@sendu.me.uk>
Message-ID:

On Feb 9, 2007, at 4:40 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> If Sendu is out there, I think we can safely remove any dependencies
>> beyond XML::SAX 0.15 for the next release. Should I go ahead and
>> modify Build.PL?
>
> Sure, good to hear.

I added a version dependency for XML::SAX (v. 0.15) for the PurePerl fix. That likely obviates the need for a Bundle for XML::Simple. Not too pressing; we can determine that before the next release.

chris

From johnston at biochem.ucl.ac.uk Sat Feb 10 16:27:53 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Sat, 10 Feb 2007 16:27:53 +0000 (GMT)
Subject: [Bioperl-l] WrapperBase
In-Reply-To: <45CCF803.9030004@sendu.me.uk>
References: <45CCF803.9030004@sendu.me.uk>
Message-ID:

> No, I think not. That would be very annoying when using wrappers for
> programs that you just have in your system path.
>

Hmm, maybe I misunderstood what the program_path was for?
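The lookup order Caroline's patch proposes (use an explicitly configured program path when it points at an executable, otherwise warn and fall back to the system PATH) can be sketched with core Perl. find_executable is a hypothetical stand-alone helper, not WrapperBase's actual code; unlike the patch, it also requires the path to be a plain file.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper, not WrapperBase code: prefer an explicit program
# path; if it is set but not an executable file, warn (so a permissions
# problem is not silent) and fall back to searching the directories in $PATH.
sub find_executable {
    my ($program_path, $name) = @_;
    if (defined $program_path) {
        return $program_path if -f $program_path && -x _;
        warn "executable not usable at $program_path, trying system path...\n";
    }
    return undef unless defined $name;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return undef;
}
```

With this order, leaving the program path unset never warns, which addresses Sendu's concern, while a home-directory copy with the wrong permissions produces a warning before the system copy is used.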
The executable method goes straight to the system path unless program_path is set, so I assumed you would only set program_path if you specifically wanted it to look somewhere else. You wouldn't get a warning if you didn't specify a program_path and just left it to look in the system path.

> What specific problem are you encountering with the current behaviour?

I have one version of an executable in /usr/local and another version, which I wanted to use, in my home directory. The program_path method gets a path from an environment variable, which was set to ~/. I didn't realise I had the wrong permissions on the executable, though, and it was silently failing to use my version and using the one in /usr/local instead.

Cass

From george.heller at yahoo.com Sat Feb 10 20:35:18 2007
From: george.heller at yahoo.com (George Heller)
Date: Sat, 10 Feb 2007 12:35:18 -0800 (PST)
Subject: [Bioperl-l] Error while parsing
Message-ID: <162150.76282.qm@web56511.mail.re3.yahoo.com>

Hi all,

I am in the process of parsing a few files, actually blast results, but happen to get the following error:

------------- EXCEPTION -------------
MSG: Can't get HSPs: data not collected.
STACK Bio::Search::Hit::GenericHit::hsp /usr/lib/perl5/site_perl/5.8.5/Bio/Search/Hit/GenericHit.pm:649
STACK toplevel parser.pl:31
--------------------------------------

I am not sure if this is a bug, or is there something I am doing wrong. Any pointers are appreciated.

Thanks!
George.
From cjfields at uiuc.edu Sat Feb 10 22:56:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 10 Feb 2007 16:56:19 -0600
Subject: [Bioperl-l] Error while parsing
In-Reply-To: <162150.76282.qm@web56511.mail.re3.yahoo.com>
References: <162150.76282.qm@web56511.mail.re3.yahoo.com>
Message-ID:

On Feb 10, 2007, at 2:35 PM, George Heller wrote:

> Hi all,
>
> I am in the process of parsing a few files, actually blast
> results, but happen to get the following error:
>
> ------------- EXCEPTION -------------
> MSG: Can't get HSPs: data not collected.
> STACK Bio::Search::Hit::GenericHit::hsp /usr/lib/perl5/site_perl/5.8.5/Bio/Search/Hit/GenericHit.pm:649
> STACK toplevel parser.pl:31
> --------------------------------------
>
> I am not sure if this is a bug, or is there something I am doing
> wrong. Any pointers are appreciated.
>
> Thanks!
> George.

We'll need more to go on than that. If the bioperl version is v1.5.2, please file a bug via the bioperl bugzilla:

http://bugzilla.open-bio.org/

After the report has been filed, don't forget to attach a test file which triggers the bug, using the 'Create a new attachment' link.

chris

From sac at bioperl.org Sun Feb 11 03:56:10 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Sat, 10 Feb 2007 19:56:10 -0800
Subject: [Bioperl-l] Error while parsing
In-Reply-To: <162150.76282.qm@web56511.mail.re3.yahoo.com>
References: <162150.76282.qm@web56511.mail.re3.yahoo.com>
Message-ID: <8f200b4c0702101956h53fea96dm241126c680d64ab4@mail.gmail.com>

Your report may be lacking HSP alignments for the hit you are attempting to process. Note that by default, blast will report twice as many one-line descriptions as it will alignments:

  -v  Number of database sequences to show one-line descriptions for (V) [Integer]
        default = 500
  -b  Number of database sequence to show alignments for (B) [Integer]
        default = 250

Verify that this isn't the case for your error. If not, go ahead and file a bug report.
Attach the report (zipped if big) as well as the relevant portion of your processing script.

Steve

On 2/10/07, George Heller wrote:
>
> Hi all,
>
> I am in the process of parsing a few files, actually blast results, but
> happen to get the following error:
>
> ------------- EXCEPTION -------------
> MSG: Can't get HSPs: data not collected.
> STACK Bio::Search::Hit::GenericHit::hsp
> /usr/lib/perl5/site_perl/5.8.5/Bio/Search/Hit/GenericHit.pm:649
> STACK toplevel parser.pl:31
> --------------------------------------
>
> I am not sure if this is a bug, or is there something I am doing wrong.
> Any pointers are appreciated.
>
> Thanks!
> George.

From jay at jays.net Sun Feb 11 14:24:55 2007
From: jay at jays.net (Jay Hannah)
Date: Sun, 11 Feb 2007 08:24:55 -0600
Subject: [Bioperl-l] wiki: Database error when attempting to change preferences (1054: Unknown column 'user_newpass_time')
Message-ID: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>

Just a heads-up: I wanted to check the "E-mail me when a page I'm watching is changed" box in my preferences

http://www.bioperl.org/wiki/Special:Preferences

But I can't. Even if I change nothing and hit the Save button I get this:

----------
Database error

A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was:

(SQL query hidden)

from within function "User::saveSettings". MySQL returned error "1054: Unknown column 'user_newpass_time' in 'field list' (localhost)".
----------

(Yes, it literally says "(SQL query hidden)". That wasn't me for the purposes of this email.
-grin-)

Thanks,

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah
Username: Jhannah
User ID: 51

From jay at jays.net Sun Feb 11 15:16:13 2007
From: jay at jays.net (Jay Hannah)
Date: Sun, 11 Feb 2007 09:16:13 -0600
Subject: [Bioperl-l] wiki: Database error when attempting to change preferences (1054: Unknown column 'user_newpass_time')
In-Reply-To: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>
References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>
Message-ID: <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net>

Hmm.... The error appears to not be limited to changing preferences. I tried to update a couple different pages and got errors like this:

------
Database error

A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was:

(SQL query hidden)

from within function "Article::updateRedirectOn". MySQL returned error "1146: Table 'perlwikidb.redirect' doesn't exist (localhost)".
------

So all changes to the wiki aren't working right now?

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah

From jason at bioperl.org Sun Feb 11 20:18:15 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 11 Feb 2007 12:18:15 -0800
Subject: [Bioperl-l] wiki: Database error when attempting to change preferences (1054: Unknown column 'user_newpass_time')
In-Reply-To: <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net>
References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net> <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net>
Message-ID: <3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org>

Should be fine now; I did an upgrade to MediaWiki 1.9 this weekend and I think the upgrade script didn't finish.

In the future, system support requests should go to support - AT - open-bio.org so we can track them.

-jason

On Feb 11, 2007, at 7:16 AM, Jay Hannah wrote:

> Hmm.... The error appears to not be limited to changing preferences.
> I tried to update a couple different pages and got errors like this:
>
> ------
> Database error
> A database query syntax error has occurred. This may indicate a bug
> in the software. The last attempted database query was:
>
> (SQL query hidden)
>
> from within function "Article::updateRedirectOn". MySQL returned
> error "1146: Table 'perlwikidb.redirect' doesn't exist (localhost)".
> ------
>
> So all changes to the wiki aren't working right now?
>
> j
> seqlab.net
> http://www.bioperl.org/wiki/User:Jhannah

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/

From cjfields at uiuc.edu Sun Feb 11 20:51:53 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 11 Feb 2007 14:51:53 -0600
Subject: [Bioperl-l] wiki: Database error when attempting to change preferences (1054: Unknown column 'user_newpass_time')
In-Reply-To: <3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org>
References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net> <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net> <3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org>
Message-ID:

Is there a good place on the main wiki page to prominently display this? I wanted to place something at the top of the main page but I didn't know if we wanted to post the support email address on the page itself.

chris

On Feb 11, 2007, at 2:18 PM, Jason Stajich wrote:

> Should be fine now; I did an upgrade to MediaWiki 1.9 this weekend
> and I think the upgrade script didn't finish.
>
> In the future, system support requests should go to support - AT -
> open-bio.org so we can track them.
>
> -jason
> On Feb 11, 2007, at 7:16 AM, Jay Hannah wrote:
>
>> Hmm.... The error appears to not be limited to changing preferences.
>> I tried to update a couple different pages and got errors like this:
>>
>> ------
>> Database error
>> A database query syntax error has occurred. This may indicate a bug
>> in the software. The last attempted database query was:
>>
>> (SQL query hidden)
>>
>> from within function "Article::updateRedirectOn". MySQL returned
>> error "1146: Table 'perlwikidb.redirect' doesn't exist (localhost)".
>> ------
>>
>> So all changes to the wiki aren't working right now?
>>
>> j
>> seqlab.net
>> http://www.bioperl.org/wiki/User:Jhannah

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From jay at jays.net Sun Feb 11 20:56:53 2007
From: jay at jays.net (Jay Hannah)
Date: Sun, 11 Feb 2007 14:56:53 -0600
Subject: [Bioperl-l] wiki: Database error when attempting to change preferences (1054: Unknown column 'user_newpass_time')
In-Reply-To:
References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net> <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net> <3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org>
Message-ID:

On Feb 11, 2007, at 2:51 PM, Chris Fields wrote:

> Is there a good place on the main wiki page to prominently display
> this? I wanted to place something at the top of the main page but
> I didn't know if we wanted to post the support email address on the
> page itself.
I added it here:

http://www.bioperl.org/wiki/About_site

Which is linked from all pages via the left-hand bar:

community | About this site

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah

From agd27 at cornell.edu Sun Feb 11 17:47:03 2007
From: agd27 at cornell.edu (Adam Diehl)
Date: Sun, 11 Feb 2007 12:47:03 -0500
Subject: [Bioperl-l] Getting GFF output in UCSC-specific format
Message-ID: <45CF5697.60703@cornell.edu>

Good morning folks,

I've got sort of a newbie question regarding how to get GFF lines out of Bio::Tools::GFF objects that are formatted according to the UCSC browser conventions, described here:

http://genome.ucsc.edu/goldenPath/help/customTrack.html#GFF

(Ignore the custom track headers and what-not. I just need the fields to be set up according to the descriptions in 1 - 9).

The write_feature($feature) method isn't doing it for me, as I get lines like the following (newlines excepted):

chr1 EMBL/GenBank/SwissProt gene 1712 2848 . + . db_xref=GeneID:4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002
chr1 EMBL/GenBank/SwissProt CDS 1712 2848 . + . EC_number=2.7.7.7;codon_start=1;db_xref=GI:94989511,GeneID:4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002;product=DNA+polymerase+III%2C+beta+chain;protein_id=YP_597611.1;transl_table=11;translation=MIQFSINRTLFIHALNATKRAISTKNAIPILSSIKIEVTSTGVTLTGSNGQISIENTIPVSNENAGLLITSPGAILLEASFFINIISSLPDISINVKEIEQHQVVLTSGKSEITLKGKDVDQYPRLQEVSTENPLILKTKLLKSIIAETAFAASLQESRPILTGVHIVLSNHKDFKAVATDSHRMSQRLITLENTSADFDVVIPSKSLREFSAVFTDDIETVEVFFSPSQILFRSEHISFYTRLLEGNYPDTDRLLMTEFETEVVFNTQSLRHAMERAFLISNATQNGTVKLEITQNHISAHVNSPEVGKVNEDLDIVSQSGSDLTISFNPTYLIESLKAIKSETVKIHFLSPVRPFTLTPGDEEESFIQLITPVRTN

As you can see, field 8, which should be frame according to UCSC conventions, is blank, and field 9, group according to UCSC, has frame, along with ID, etc. All this extra stuff causes the UCSC browser to choke.
First off, it can't identify which features are the same (it does this by matching the group field), and second, it can't interpret the CDS's into translated proteins because it lacks frame data.

Basically what I need to do is, for CDS features, extract frame (or codon_start, as it were) from the last field, parse out the integer value and store that in field 8 (as frame), then parse out locus_tag from the last field, clear out everything else and store the locus_tag only in that field (preferably without the qualifier locus_tag=). For feature type gene, I just want to do the last step, so that the gene and CDS features for the same feature have matching group fields that are as simple as possible. Let me know if this is not clear.

The way I've been trying to do this is by stringifying each gff object, splitting into an array, @tmp1, splitting @tmp1[8] into @tmp2 with the following code:

    my @tmp2 = split /\;\, $tmp1[8];

and finally, trying to parse out the bits I need with regular expressions and store back to @tmp1[n]. This does not work, because perl wants to interpret every / + etc. as a metacharacter!

I am assuming there's a simple way to get at each value in the last field of the gff object using methods supplied by Bio::Tools::GFF, but the API docs seem a bit lacking in this area. Could anyone steer me towards what I need to know to do this? Please let me know if I can clarify any details!

Cheers,
Adam Diehl

From jason at bioperl.org Sun Feb 11 23:29:16 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 11 Feb 2007 15:29:16 -0800
Subject: [Bioperl-l] Getting GFF output in UCSC-specific format
In-Reply-To: <45CF5697.60703@cornell.edu>
References: <45CF5697.60703@cornell.edu>
Message-ID:

I assume you are getting your features from a Bio::SeqIO parse of a GenBank file? You get back Bio::SeqFeature::Generic objects, so you want to look at the docs for that module to see what the API is.
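If you would rather post-process the text that write_feature already produces than work through the feature's tag methods, the column rewrite Adam describes can be sketched in core Perl. Two assumptions: the lines are tab-separated with key=value attributes, as in Adam's output, and the UCSC frame equals codon_start minus 1 (codon_start=1 meaning frame 0). Note also that ';' needs no escaping in a pattern; as quoted above, the pattern in split /\;\, ... has no closing slash, so split /;/, $col9 is all that is needed. parse_attributes and to_ucsc_gff are hypothetical helpers, not Bio::Tools::GFF methods.

```perl
use strict;
use warnings;

# Parse a column-9 attribute string ("key=value;key=value;...") into a hash.
# Assumes values are escaped as in Adam's write_feature output, so no
# unescaped ';' or '=' appears inside a value.
sub parse_attributes {
    my ($col9) = @_;
    my %attr;
    for my $pair (split /;/, $col9) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        $attr{$key} = $value;
    }
    return \%attr;
}

# Rewrite one tab-separated GFF line for the UCSC browser: column 8 gets
# codon_start - 1 when codon_start is present (assumed frame convention),
# and column 9 keeps only the bare locus_tag value ('.' if there is none).
sub to_ucsc_gff {
    my ($line) = @_;
    chomp $line;
    my @f = split /\t/, $line;
    die "expected 9 tab-separated columns\n" unless @f == 9;
    my $attr = parse_attributes($f[8]);
    $f[7] = $attr->{codon_start} - 1 if defined $attr->{codon_start};
    $f[8] = defined $attr->{locus_tag} ? $attr->{locus_tag} : '.';
    return join "\t", @f;
}
```

Because the gene and CDS rows for the same locus both end up with the same bare locus_tag in column 9, the browser can match them as Adam wants.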
you will need to set frame via $feature->frame($frame) You are going to have to determine the frame yourself if that isn't part of the feature, we don't calculate it for you. For the 9th column, this is available through the tags methods has_tag, add_tag_values, get_tag_values, get_all_tags, and remove_tag so you can remove all the tags you don't want through remove_tag (or if you want to remove them all) my $locus; for my $tag ( $feature->get_all_tags ) { if( $tag eq 'locus_tag' ) { # save the locus_tag when we see it ($locus) = $feature->get_tag_values($tag); } $feature->remove_tag($tag); } You will also want to set the GFF format when you call Bio::Tools::GFF - I think the UCSC site is only supporting GFF1, I don't know exactly how you set the tag then when they aren't paired with key=>value, you'll need to set the tag to 'group' so $feature->add_tag_value('group', $locus); If this is all unsatistfactory you can easily write your own GFF write for your flavor of the data with the print join("\t", $feat->seq_id, $feat->source_tag, $feat->primary_tag, $feat->start, $feat->end, $feat->score, $feat->strand > 0 ? '+' : '-', $feat->frame, $locus), "\n"; -jason On Feb 11, 2007, at 9:47 AM, Adam Diehl wrote: > Good morning folks, > > I've got sort of a newbie question regarding how to get gff's out of > Bio::Tools:GFF objects that are formatted according to the UCSC > browser > conventions, described here: > > http://genome.ucsc.edu/goldenPath/help/customTrack.html#GFF > (Ignore the custom track headers and what-not. I just need the > fields to > be set up according to the descriptions in 1 - 9). > > The write_feature($feature) method isn't doing it for me, as I get > lines > like the following (newlines excepted): > > chr1 EMBL/GenBank/SwissProt gene 1712 2848 . + > . db_xref=GeneID:4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002 > chr1 EMBL/GenBank/SwissProt CDS 1712 2848 . + > . 
> EC_number=2.7.7.7;codon_start=1;db_xref=GI:94989511,GeneID: > 4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002;product=DNA+polymerase > +III%2C+beta+chain;protein_ > id=YP_597611.1;transl_table=11;translation=MIQFSINRTLFIHALNATKRAISTKNA > IPILSSIKIEVTSTGVTLTGSNGQISIENTIPVSNENAGLLITSPGAILLEASFFINIISSLPDISINVK > EIEQHQVVLTSGKSEITLKGKDVDQYPRLQEVSTENPLILKTKLLKSIIAETAFAASLQESRPILTGVHI > VLSNHKDFKAVATDSHRMSQRLIT > LENTSADFDVVIPSKSLREFSAVFTDDIETVEVFFSPSQILFRSEHISFYTRLLEGNYPDTDRLLMTEFE > TEVVFNTQSLRHAMERAFLISNATQNGTVKLEITQNHISAHVNSPEVGKVNEDLDIVSQSGSDLTISFNP > TYLIESLKAIKSETVKIHFLSPVRPFTLTPGDEEESFIQLITPVRTN > > As you can see, field 8, which should be frame according to UCSC > conventions is blank, and field 9, group according to UCSC, has frame, > along with ID, etc. All this extra stuff causes the UCSC browser to > choke. First off, it can't identify which features are the same (it > does > this by matching the group field), and second, it can't interpret the > CDS's into translated proteins because it lacks frame data. > > Basically what I need to do is, for CDS features, extract frame (or > codon_start, as it were), from the last field, parse out the integer > value and store that in field 8 (as frame), then parse out locus_tag > from the last field, clear out everything else and store the locus_tag > only in that field (preferably without the qualifier locus_tag=). For > feature type gene, I just want to do the last step, so that the > gene and > CDS features for the same feature have matching group fields that > are as > simple as possible. Let me know if this is not clear. > > The way I've been trying to do this is by stringifying each gff > object, > splitting into an array, @tmp1, splitting @tmp1[8] into @tmp2 with the > following code: my @tmp2 = split /\;\, $tmp1[8]; and finally, > trying to > parse out the bits I need with regular expressions and store back to > @tmp1[n]. -- This does not work, because perl wants to interpret > every > / + etc. 
as a metacharacter! > > I am assuming there's a simple way to get at each value in the last > field of the gff object using methods supplied by Bio::Tools::GFF, but > the API docs seem a bit lacking in this area. Could anyone steer me > towards what I need to know to do this? Please let me know if I can > clarify any details! > > Cheers, > Adam Diehl > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From bix at sendu.me.uk Sun Feb 11 23:39:15 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 11 Feb 2007 23:39:15 +0000 Subject: [Bioperl-l] WrapperBase In-Reply-To: References: <45CCF803.9030004@sendu.me.uk> Message-ID: <45CFA923.8010201@sendu.me.uk> Caroline Johnston wrote: >> No, I think not. That would be very annoying when using wrappers for >> programs that you just have in your system path. > > Hmm, maybe I misundertood what the program_path was for? The executable > method goes straight to the system path unless program_path is set, so I > assumed you would only set program_path if you specifically wanted it to > look somewhere else. You wouldn't get a warning if you didn't specify a > program_path and just left it to look in the system path. Yes, sorry. Having now actually looked at your patch it seems fine. I'll commit it unless someone beats me to it. From flope004 at hotmail.com Mon Feb 12 02:40:08 2007 From: flope004 at hotmail.com (Wolverine Fran) Date: Mon, 12 Feb 2007 03:40:08 +0100 Subject: [Bioperl-l] TreeIO, how it works? Message-ID: Hi, I have a problem. I don't understand how TreeIO reads the trees: my input: ((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2)); An unrooted tree with 4 tips and 2 internal nodes. 
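A quick way to check the node count: in Newick, every opening parenthesis starts one internal node, and the outermost pair of parentheses is itself a node, which becomes the root. The string above therefore has three internal nodes rather than two, and 4 + 3 = 7 nodes in total, which matches the seven internal_id values 0 to 6 printed below. The helper here is a hypothetical plain-Perl sketch, not part of Bio::TreeIO, and it assumes simple Newick with no quoted labels, comments, or commas/parentheses inside labels.

```perl
use strict;
use warnings;

# Count tips and internal nodes in a simple Newick string.
# Assumes no quoted labels or comments, and no ',' or '(' inside labels.
sub count_newick_nodes {
    my ($newick) = @_;
    my $internal = () = $newick =~ /\(/g;   # each '(' opens one internal node
    my $commas   = () = $newick =~ /,/g;    # each ',' separates two siblings
    my $tips     = $commas + 1;             # for a single tree, tips = commas + 1
    return ($tips, $internal, $tips + $internal);
}

my ($tips, $internal, $total) =
    count_newick_nodes('((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2));');
print "tips=$tips internal=$internal total=$total\n";   # tips=4 internal=3 total=7
```

The rooted variant posted later in the thread, (((dog:0.04,cat:0.08):0.12,human:0.15):0.1,mouse:0.1);, also gives 4 tips and 3 internal nodes, so a seventh node is expected in both cases.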
when I asked for: print "Total number of nodes ",$tree->number_nodes; I get 6 but when I ask for: foreach my $node (@nodes) { print $node->internal_id,","; } I get 6,0,1,2,3,4,5. Total 7. The root is number 6 and 2 and 5 are my internal nodes. If I set the root to be number 5, this node 6 is still present. Why? What is node 6? When I try the following: $node5 = $tree->find_node(-internal_id => '5'); $node6 = $tree->find_node(-internal_id => '6'); $node2 = $tree->find_node(-internal_id => '2'); $distance1 = $tree->distance(-nodes =>[$node5,$node2]); $distance2 = $tree->distance(-nodes =>[$node5,$node6]); $distance3 = $tree->distance(-nodes =>[$node2,$node6]); or any other distance, I get 2 warnings: -------------------- WARNING --------------------- MSG: Must provide a valid array reference for -nodes --------------------------------------------------- -------------------- WARNING --------------------- MSG: Could not find distance! --------------------------------------------------- What am I doing incorrectly? I am practicing with AlignIO and TreeIO to calculate the maximum likelihood for a given tree, so any other information about that would be of great help. I am practicing with this to see how Bioperl can help me with more complex problems. Thank you very much for your help! From jason at bioperl.org Mon Feb 12 03:05:18 2007 From: jason at bioperl.org (Jason Stajich) Date: Sun, 11 Feb 2007 19:05:18 -0800 Subject: [Bioperl-l] TreeIO, how (does) it work? In-Reply-To: References: Message-ID: <60D2DD3B-2ED1-4A92-A9FA-2875FEAA28CC@bioperl.org> On Feb 11, 2007, at 6:40 PM, Wolverine Fran wrote: > Hi, > > I have a problem.
I don't understand how TreeIO reads the trees: > my input: ((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2)); > > An unrooted tree with 4 tips and 2 internal nodes. > when I asked for: > print "Total number of nodes ",$tree->number_nodes; > > I get 6 but when I ask for: > foreach my $node (@nodes) { > print $node->internal_id,","; > } > I get 6,0,1,2,3,4,5. Total 7. > > The root is number 6 and 2 and 5 are my internal nodes. > If I set the root to be number 5 this node 6 is still present. > Why? what is the node 6? Node 6 holds the root, or a fake root with a trifurcation for unrooted trees. Did you actually call the reroot method to set the root to node 5? > > when I try the following: > $node5 = $tree->find_node(-internal_id => '5'); > $node6 = $tree->find_node(-internal_id => '6'); > $node2 = $tree->find_node(-internal_id => '2'); > $distance1 = $tree->distance(-nodes =>[$node5,$node2]); > $distance2 = $tree->distance(-nodes =>[$node5,$node6]); > $distance3 = $tree->distance(-nodes =>[$node2,$node6]); > or any other distance I get 2 warnings: > -------------------- WARNING --------------------- > MSG: Must provide a valid array reference for -nodes > --------------------------------------------------- > > -------------------- WARNING --------------------- > MSG: Could not find distance! > --------------------------------------------------- > What am I doing incorrectly? > The distance method is just summing branch lengths on the path between two nodes. Is that what you are trying to do? The error message you report doesn't make sense, as "Must provide a valid array reference for -nodes" is only printed when you call is_monophyletic or is_paraphyletic, as far as I can tell. What version of Bioperl are you using? > I am practicing with AlignIO and TreeIO to calculate the maximum > likelihood > for a given tree. So,other information about that would be of great > help. I am practicing with > this to see how Bioperl can help me with more complex problems.
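Jason's description of the distance method (summing branch lengths along the path between two nodes) can be sketched in plain Perl. This is a hypothetical illustration, not how Bio::Tree::Tree stores or walks nodes: the tree from the question is encoded as a child => [parent, branch length] hash, the unnamed internal nodes get made-up names A, B and root, and the (human,mouse) clade, which has no length in the Newick string, is given length 0.

```perl
use strict;
use warnings;

# Hypothetical encoding of ((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2));
# child => [parent, branch length]; the root has no entry.
my %parent = (
    dog   => [ 'A',    0.04 ],
    cat   => [ 'A',    0.08 ],
    human => [ 'B',    0.15 ],
    mouse => [ 'B',    0.2  ],
    A     => [ 'root', 0.12 ],
    B     => [ 'root', 0    ],   # no length given in the Newick string
);

# Walk from $node up to the root, recording the distance to each ancestor.
sub ancestors {
    my ($tree, $node) = @_;
    my @path = ($node);
    my %dist = ($node => 0);
    while (my $up = $tree->{$node}) {
        my ($p, $len) = @$up;
        $dist{$p} = $dist{$node} + $len;
        push @path, $p;
        $node = $p;
    }
    return (\@path, \%dist);
}

# Sum of branch lengths on the path x -> lowest common ancestor -> y.
sub tree_distance {
    my ($tree, $x, $y) = @_;
    my (undef, $dist_x)   = ancestors($tree, $x);
    my ($path_y, $dist_y) = ancestors($tree, $y);
    for my $anc (@$path_y) {    # the first shared ancestor is the LCA
        return $dist_x->{$anc} + $dist_y->{$anc} if exists $dist_x->{$anc};
    }
    return;                     # the nodes are not in the same tree
}

printf "dog-cat %.2f, dog-human %.2f\n",
    tree_distance(\%parent, 'dog', 'cat'),
    tree_distance(\%parent, 'dog', 'human');   # prints: dog-cat 0.12, dog-human 0.31
```

Asking for a node that is not in the tree returns undef here, which is loosely analogous to the "Could not find distance!" warning reported above.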
> Are you trying to calculate the likelihood of a tree, or are you trying to generate an ML tree from an alignment? > Thank you very much for your help! -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From er at xs4all.nl Mon Feb 12 13:03:06 2007 From: er at xs4all.nl (Erik) Date: Mon, 12 Feb 2007 14:03:06 +0100 (CET) Subject: [Bioperl-l] bioperl wiki changes rss / atom In-Reply-To: References: <162150.76282.qm@web56511.mail.re3.yahoo.com> Message-ID: <20181.156.83.0.124.1171285386.squirrel@webmail.xs4all.nl> Hi, The bioperl wiki changes rss / atom feed has two leading empty lines, which make the XML invalid: XML Parsing Error: xml declaration not at start of external entity Location: http://www.bioperl.org/w/index.php?title=Special:Recentchanges&feed=rss Line Number 3, Column 1: ^ Could those be removed? (I didn't see a way to do it myself.) It might be a useful feed :) thanks, Erik From cjfields at uiuc.edu Mon Feb 12 14:52:44 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 12 Feb 2007 08:52:44 -0600 Subject: [Bioperl-l] bioperl wiki changes rss / atom In-Reply-To: <20181.156.83.0.124.1171285386.squirrel@webmail.xs4all.nl> References: <162150.76282.qm@web56511.mail.re3.yahoo.com> <20181.156.83.0.124.1171285386.squirrel@webmail.xs4all.nl> Message-ID: I have forwarded this to support at open-bio.org, which should take care of it.
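The parser error Erik quotes arises because an XML declaration is only allowed at the very start of a document, so even blank lines in front of it make the feed ill-formed. The real fix is on the server. Until that happens, a feed reader could strip the leading whitespace before parsing. This is a minimal plain-Perl sketch, and the function name is hypothetical:

```perl
use strict;
use warnings;

# Strip whitespace (such as blank lines) that precedes an XML declaration.
# Documents that do not start with whitespace + "<?xml" are left untouched.
sub strip_leading_blank_lines {
    my ($xml) = @_;
    $xml =~ s/\A\s+(?=<\?xml\b)//;
    return $xml;
}

my $feed  = qq{\n\n<?xml version="1.0" encoding="utf-8"?>\n<rss version="2.0"></rss>\n};
my $clean = strip_leading_blank_lines($feed);
print substr($clean, 0, 5), "\n";   # prints <?xml
```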
chris On Feb 12, 2007, at 7:03 AM, Erik wrote: > Hi, > > > The bioperl wiki changes rss / atom feed has two leading empty > lines which > invalidate the xml: > > XML Parsing Error: xml declaration not at start of external entity > Location: > http://www.bioperl.org/w/index.php? > title=Special:Recentchanges&feed=rss > Line Number 3, Column 1: > ^ > > Could those be removed? (I didn't see a way to do it myself). Might > be a > useful feed :) > > > thanks, > > Erik > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From sm8 at sanger.ac.uk Mon Feb 12 17:12:00 2007 From: sm8 at sanger.ac.uk (Stephen Montgomery) Date: Mon, 12 Feb 2007 17:12:00 -0000 Subject: [Bioperl-l] subtract for Bio::RangeI.pm Message-ID: Hi - It is a subtract function for the Bio::RangeI class. (To be added if interested) All the best! 
Stephen Montgomery

//ADD TO BIO::RANGEI

=head2 subtract

 Title   : subtract
 Usage   : my $subtracted = $r1->subtract($r2)
 Function: Subtract range r2 from range r1
 Args    : arg #1 = a range to subtract from this one (mandatory)
           arg #2 = strand option ('strong', 'weak', 'ignore') (optional)
 Returns : undef if they do not overlap or r2 contains this RangeI,
           or an arrayref of Range objects (this is an array since some
           instances where the subtract range is enclosed within this
           range will result in the creation of two new disjoint ranges)

=cut

sub subtract {
    my ($self, $range, $so) = @_;
    $self->throw("missing arg: you need to pass in another feature") unless $range;
    if ($self eq "Bio::RangeI") {
        $self = "Bio::Range";
        $self->warn("calling static methods of an interface is deprecated; use $self instead");
    }
    $range->throw("Input a Bio::RangeI object") unless $range->isa('Bio::RangeI');
    return unless $self->_testStrand($range, $so);

    # Ranges do not overlap: return undef, as documented
    return unless $self->overlaps($range);

    # r2 contains this range, so everything is subtracted
    return if $range->contains($self);

    my ($start, $end, $strand) = $self->intersection($range, $so);

    # Subtract the intersection from the $self range
    my @outranges = ();
    if ($self->start < $start) {
        push(@outranges, $self->new('-start'  => $self->start,
                                    '-end'    => $start - 1,
                                    '-strand' => $self->strand));
    }
    if ($self->end > $end) {
        push(@outranges, $self->new('-start'  => $end + 1,
                                    '-end'    => $self->end,
                                    '-strand' => $self->strand));
    }
    return \@outranges;
}

//UNIT TEST

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqFeature::Generic;
use Test;
plan tests => 13;

my $feature1 = Bio::SeqFeature::Generic->new( -start => 1, -end => 1000, -strand => 1);
my $feature2 = Bio::SeqFeature::Generic->new( -start => 100, -end => 900, -strand => -1);

my $subtracted = $feature1->subtract($feature2);
ok(defined($subtracted));
ok(scalar(@$subtracted) == 2);
foreach my $range (@$subtracted) {
    ok($range->start == 1 || $range->start == 901);
    ok($range->end == 99 || $range->end == 1000);
}

$subtracted = $feature2->subtract($feature1);
ok(!defined($subtracted));
$subtracted = $feature1->subtract($feature2, 'weak');
ok(!defined($subtracted));
$subtracted = $feature1->subtract($feature2, 'strong');
ok(!defined($subtracted));

my $feature3 = Bio::SeqFeature::Generic->new( -start => 500, -end => 1500, -strand => 1);
$subtracted = $feature1->subtract($feature3);
ok(defined($subtracted));
ok(scalar(@$subtracted) == 1);
my $subtracted_i = $subtracted->[0];
ok($subtracted_i->start == 1);
ok($subtracted_i->end == 499);

From flope004 at hotmail.com Mon Feb 12 18:07:12 2007 From: flope004 at hotmail.com (Wolverine Fran) Date: Mon, 12 Feb 2007 19:07:12 +0100 Subject: [Bioperl-l] TreeIO, how (does) it work? In-Reply-To: <60D2DD3B-2ED1-4A92-A9FA-2875FEAA28CC@bioperl.org> Message-ID: thanks for your reply! I am using Bioperl 1.4. >Node 6 is to hold the root or a fake root with a trifurcation for >unrooted trees. Did you actually call the reroot method to set the >root to node 5?
Yes, I tried the following with the same result: $tree->reroot($tree->find_node(-internal_id => '5')); or $tree->set_root_node($tree->find_node(-internal_id => '5')); Even if I use a rooted tree: (((dog:0.04,cat:0.08):0.12,human:0.15):0.1,mouse:0.1); I get node #6. So, is it always present? Am I not representing a rooted tree properly in Newick format? >The distance method is just summing branch lengths on the path >between two nodes. Is that what are you trying to do? > >The error message you report doesn't make sense as >"Must provide a valid array reference for -nodes" >is only printed when you call is_monophyletic or is_paraphyletic as >far as I can tell. I do not know yet what I was doing incorrectly, but now it works. Yes, I was using the distance method to find out where node 6 was located. For the unrooted tree, node 6 was node 5 (an internal node), and for the rooted tree node 6 was 0.1 from the mouse leaf and the internal node (root). The error message "Must provide a valid array reference for -nodes" is shown if I indicate a node which is not present in the tree. >You are trying to calculate the likelihood of a tree or are you >trying to generate a ML tree from an alignment? I am trying to calculate the likelihood of a tree, as practice. There are probably other Bioperl modules besides AlignIO and TreeIO that could help me in the process, which I do not know about. Again, thank you for your time! From dmessina at wustl.edu Mon Feb 12 17:49:49 2007 From: dmessina at wustl.edu (David Messina) Date: Mon, 12 Feb 2007 11:49:49 -0600 Subject: [Bioperl-l] subtract for Bio::RangeI.pm In-Reply-To: References: Message-ID: <1574ACCF-92D5-4DEC-AD04-14EB7767F22A@wustl.edu> Stephen, Great, thanks for this.
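The core of the subtract patch is ordinary interval arithmetic on closed, 1-based coordinates. This plain-Perl sketch is independent of Bio::RangeI, ignores strand, and returns a list of [start, end] pairs. One deliberate difference from the patch: when the two ranges do not overlap, this version returns the first range unchanged rather than undef.

```perl
use strict;
use warnings;

# Subtract closed interval [$s2, $e2] from [$s1, $e1] (1-based, inclusive).
# Returns 0, 1 or 2 [start, end] pairs; strand is ignored.
sub subtract_interval {
    my ($s1, $e1, $s2, $e2) = @_;
    return ([$s1, $e1]) if $e2 < $s1 || $s2 > $e1;   # no overlap: keep the range
    my @out;
    push @out, [$s1, $s2 - 1] if $s1 < $s2;          # piece left of the cut
    push @out, [$e2 + 1, $e1] if $e2 < $e1;          # piece right of the cut
    return @out;                                     # empty if fully covered
}

# Same coordinates as the unit test in the patch
print join(' ', map { "$_->[0]-$_->[1]" } subtract_interval(1, 1000, 100, 900)), "\n";  # 1-99 901-1000
print join(' ', map { "$_->[0]-$_->[1]" } subtract_interval(1, 1000, 500, 1500)), "\n"; # 1-499
```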
Could you submit it to Bugzilla as an enhancement? http://bugzilla.open-bio.org/ Thanks, Dave From jason at bioperl.org Mon Feb 12 18:38:11 2007 From: jason at bioperl.org (Jason Stajich) Date: Mon, 12 Feb 2007 10:38:11 -0800 Subject: [Bioperl-l] TreeIO, how (does) it work? In-Reply-To: References: Message-ID: I would definitely suggest getting ahold of bioperl 1.5.2, as I seem to remember there are several fixes in the tree module code for re-rooting a tree. -jason On Feb 12, 2007, at 10:07 AM, Wolverine Fran wrote: > thanks for your reply! > > I am using Bioperl 1.4. > >> Node 6 is to hold the root or a fake root with a trifurcation for >> unrooted trees. Did you actually call the reroot method to set the >> root to node 5? > > Yes, I tried the following with the same result: > $tree->reroot($tree->find_node(-internal_id => '5')); > or > $tree->set_root_node($tree->find_node(-internal_id => '5')); > > Even if I use a rooted tree: (((dog:0.04,cat:0.08):0.12,human:0.15):0.1,mouse:0.1); > I get the node #6. So, is it always present? Am I not representing > properly a rooted tree in newick format? > >> The distance method is just summing branch lengths on the path >> between two nodes. Is that what are you trying to do? >> >> The error message you report doesn't make sense as >> "Must provide a valid array reference for -nodes" >> is only printed when you call is_monophyletic or is_paraphyletic as >> far as I can tell. > > I do not know yet what I was doing incorrectly but now It works. > Yes, I was using the distance method to know where the node 6 was > located. For the unrooted tree, node 6 was node 5 (an internal > node) and for the rooted tree node 6 was 0.1 from the mouse leaf > and the internal node (root). > The error message: "Must provide a valid array reference for -nodes" is shown if I indicate a node which is not present in the tree.
> >> You are trying to calculate the likelihood of a tree or are you >> trying to generate a ML tree from an alignment? > > I am trying to calculate the likelihood of a tree, as a practice. > Probably there are other bioperl modules, besides AlignIO and > TreeIO, which can help me in the process and I do not know them. > > Again, thank you for your time! > -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From johnsonm at gmail.com Mon Feb 12 23:13:09 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Mon, 12 Feb 2007 17:13:09 -0600 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> Message-ID: On 2/7/07, Mark Johnson wrote: > > Well, each format has some unique features. If the user declines to > specify the format, I can figure it out, but it will probably involve > scanning the input file twice. I'll take a look. > I can do all the parsing in one function, in fact I have, just to see > how nasty it would end up being. I just can't stomach having the code that > tightly coupled and hard to read. In the end it'll probably be three > functions. GlimmerM/HMM are pretty close. Maybe two, Glimmer2 and > Glimmer3 aren't *that* different, either. I've got a 4-in-1 parser roughed in per Chris Fields' suggestion, with two actual parsing routines (prokaryotic and eukaryotic). You can specify -format as an arg to the constructor (Glimmer, GlimmerM, GlimmerHMM), or it will look through the input until it can figure out what it is looking at.
I've got one main issue to solve, the rest is just stuff like updating the POD. Torsten Seemann very helpfully added example output for all 4 formats to t/data. Looking at GlimmerHMM.out, the first line is 'GlimmerHMM'. However, I think there is a bug in the existing _parse_predictions: Shouldn't this: } elsif( /^(Glimmer\S*)$/ ) { # GlimmerHMM has no version $source = $1; next; } be this instead: } elsif( /^(GlimmerHMM\S*)$/ ) { # GlimmerHMM has no version $source = $1; next; } I lifted that bit of code to do format detection...we don't have GlimmerHMM installed locally, so I'm assuming Torsten's output is correct and the above is a bug. Guess I'll go check bugzilla... From torsten.seemann at infotech.monash.edu.au Tue Feb 13 02:07:40 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 13 Feb 2007 13:07:40 +1100 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> Message-ID: Mark, > I've got one main issue to solve, the rest is just stuff like updating > the POD. Torsten Seemann very helpfully added example output for all 4 > formats to t/data. Looking at GlimmerHMM.out, the first line is > 'GlimmerHMM'. However, I think there is a bug in the existing > _parse_predictions: > Shouldn't this: > } elsif( /^(Glimmer\S*)$/ ) { # GlimmerHMM has no version > be this instead: > } elsif( /^(GlimmerHMM\S*)$/ ) { # GlimmerHMM has no version I think it should be what it says, or perhaps now /^(Glimmer(M|HMM))/. Here's why: I came onto the scene at Glimmer.pm rev 1.4. At that stage it only parse GlimmerM. I noted that GlimmerHMM was the same output format as GlimmerM, except for the first line. So in rev 1.5 I modified the regexp to match both ie. \S* . This would also hopefully match any other Glimmer-clone formats that arose. I also fixed the pdocs to say this, and added tests to t/Genpred.t. 
% cvs diff -r 1.4 -r 1.5 Bio/Tools/Glimmer.pm % cvs diff -r 1.15 -r 1.16 t/Genpred.t I then planned to extend support to Glimmer2 and Glimmer3. I added the 4 test files (t/Glimmer*.out) but never wrote the code. This is where you have come in Mark :-) > I lifted that bit of code to do format detection...we don't have GlimmerHMM > installed locally, so I'm assuming Torsten's output is correct and the above > is a bug. Guess I'll go check bugzilla... I'm pretty sure my 4 test files are correct - I spent a lot of time ensuring they were consistent etc, as I was getting very confused with the different "glimmer" versions! Hope this all helps, --Torsten From avilella at gmail.com Tue Feb 13 13:20:15 2007 From: avilella at gmail.com (Albert Vilella) Date: Tue, 13 Feb 2007 13:20:15 +0000 Subject: [Bioperl-l] number of gaps for the other sequences in an alignment Message-ID: <358f4d650702130520n269419cfkb9cb6dac8feaaa5c@mail.gmail.com> Hi, It would be great if we could have a method to count, given one sequence in an alignment, the number of gaps present in the rest of the sequences of the alignment. That is, for each nucleotide/aminoacidic position of the sequence of interest, look at the column in the alignment, count the gaps, then sum them over for the rest of the non-gapped columns in the sequence of interest. Has anyone tried this before? My idea is to end up having a coefficient of indel contribution for each of the sequences in the alignment, with this coefficient being high when one sequences forces a lot of gaps to be inserted in the final alignment, in order to accommodate this given sequence. I would say that the best place for this is either using methods already available in SimpleAlign, or have something new added there. Looking forward to your comments, Cheers, Albert. 
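Albert's proposed measure can be prototyped without BioPerl on plain aligned strings. In this hypothetical sketch, each sequence's score is the number of gaps in the other sequences, summed over every column where that sequence has a residue; gap characters are assumed to be '-' and '.'. With Bio::SimpleAlign, the input hash could presumably be filled from each_seq() and each sequence's seq() string, but that wiring is not shown here.

```perl
use strict;
use warnings;

# For each sequence in an alignment (id => gapped string, equal lengths),
# sum the gaps carried by the OTHER sequences in the columns where this
# sequence has a residue. Gap characters are assumed to be '-' or '.'.
sub gap_contribution {
    my (%aln) = @_;
    my @ids = sort keys %aln;
    my $len = length $aln{ $ids[0] };
    die "sequences must all have the same aligned length\n"
        if grep { length($aln{$_}) != $len } @ids;

    my %score = map { $_ => 0 } @ids;
    for my $col (0 .. $len - 1) {
        my %gap;
        $gap{$_} = substr($aln{$_}, $col, 1) =~ /[-.]/ ? 1 : 0 for @ids;
        my $gaps_in_col = 0;
        $gaps_in_col += $gap{$_} for @ids;
        for my $id (@ids) {
            # if $id has a residue here, every gap in this column is in another sequence
            $score{$id} += $gaps_in_col unless $gap{$id};
        }
    }
    return \%score;
}

my $score = gap_contribution(
    seq1 => 'ACGTACGT',
    seq2 => 'AC--ACGT',
    seq3 => 'ACGTAC--',
);
print "$_ $score->{$_}\n" for sort keys %$score;   # seq1 4, seq2 2, seq3 2
```

Here seq1 scores highest because both other sequences need gaps to accommodate it. One possible normalization is to divide each count by that sequence's residue count.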
From bix at sendu.me.uk Tue Feb 13 16:09:09 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 13 Feb 2007 16:09:09 +0000 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts Message-ID: <45D1E2A5.6060104@sendu.me.uk> I have some raw sequences in a Bio::DB::SeqFeature::Store mysql database and wanted to associate some basic information with them, like exon positions. I thought of creating Bio::SeqFeature::Gene::Transcript objects and storing them so I could later use features() to see what other features overlapped exons. I ran into a fatal error that can be replicated with the following simplified one-liner: perl -MBio::DB::SeqFeature::Store -MBio::SeqFeature::Gene::Transcript -e '$db = Bio::DB::SeqFeature::Store->new(-adaptor => "DBI::mysql", -dsn => "dbi:mysql:test"); $trans = Bio::SeqFeature::Gene::Transcript->new(-start => 1, -end => 2, -seq_id => "test"); $db->store($trans); @trans = $db->features(-seqid => $id, -type => "transcript"); print "@trans\n";' code sub { package Bio::SeqFeature::Generic; use strict 'refs'; my $self = shift @_; foreach my $f (@{[] unless $$self{'_gsf_sub_array'};}) { $f = undef; } $$self{'_gsf_seq'} = undef; foreach my $t (keys %{{} unless $$self{'_gsf_tag_hash'};}) { $$self{'_gsf_tag_hash'}{$t} = undef; delete $$self{'_gsf_tag_hash'}{$t}; } } did not evaluate to a subroutine reference, at /.../Bio/DB/SeqFeature/Store.pm line 2280 Is this a bug? Or am I taking the wrong approach? From johnsonm at gmail.com Tue Feb 13 20:10:23 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Tue, 13 Feb 2007 14:10:23 -0600 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> Message-ID: You're quite correct. I wasn't paying enough attention. That does work just fine.
I fat-fingered something somewhere else, broke my version of the module for GlimmerHMM, hallucinated and confused \S and \s. 8) All I have left now is to fixup the POD documentation and such and then I can send the module along and somebody can make whatever tweaks and check it in. Shall I open a ticket in Bugzilla for this and attach diffs, or just send them along to somebody to take care of directly? Oh, one thing I have not mentioned. I also added a -seqname argument. Glimmer2 does not provide any kind of sequence identifier in the output, and only processes the first sequence in a fasta file. It would be tedious to have to code around this by fixing up the predictions after they are produced, so I added the option to provide this missing info up front, hopefully allowing downstream code to not have to care as much and have a special case for fixing up Glimmer2 predictions. On 2/12/07, Torsten Seemann wrote: > I think it should be what it says, or perhaps now /^(Glimmer(M|HMM))/. > Here's why: > > I came onto the scene at Glimmer.pm rev 1.4. At that stage it only > parse GlimmerM. I noted that GlimmerHMM was the same output format as > GlimmerM, except for the first line. So in rev 1.5 I modified the > regexp to match both ie. \S* . This would also hopefully match any > other Glimmer-clone formats that arose. I also fixed the pdocs to say > this, and added tests to t/Genpred.t. > % cvs diff -r 1.4 -r 1.5 Bio/Tools/Glimmer.pm > % cvs diff -r 1.15 -r 1.16 t/Genpred.t > > I then planned to extend support to Glimmer2 and Glimmer3. I added the > 4 test files (t/Glimmer*.out) but never wrote the code. This is where > you have come in Mark :-) > > > I lifted that bit of code to do format detection...we don't have > GlimmerHMM > > installed locally, so I'm assuming Torsten's output is correct and the > above > > is a bug. Guess I'll go check bugzilla... 
> > I'm pretty sure my 4 test files are correct - I spent a lot of time > ensuring they were consistent etc, as I was getting very confused with > the different "glimmer" versions! > > Hope this all helps, > > --Torsten > From cjfields at uiuc.edu Tue Feb 13 20:47:19 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 13 Feb 2007 14:47:19 -0600 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: References: <451095C7.3020905@infotech.monash.edu.au> <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> Message-ID: You'll also want to update whatever relevant tests there are for Glimmer; looks like they are in GenPred.t. chris On Feb 13, 2007, at 2:10 PM, Mark Johnson wrote: > You're quite correct. I wasn't paying enough attention. That > does work > just fine. I fat-fingered something somewhere else, broke my > version of the > module for GlimmerHMM, hallucinated and confused \S and \s. 8) > All I have left now is to fixup the POD documentation and such > and then > I can send the module along and somebody can make whatever tweaks > and check > it in. Shall I open a ticket in Bugzilla for this and attach > diffs, or just > send them along to somebody to take care of directly? > Oh, one thing I have not mentioned. I also added a -seqname > argument. > Glimmer2 does not provide any kind of sequence identifier in the > output, and > only processes the first sequence in a fasta file. It would be > tedious to > have to code around this by fixing up the predictions after they are > produced, so I added the option to provide this missing info up front, > hopefully allowing downstream code to not have to care as much and > have a > special case for fixing up Glimmer2 predictions. > > On 2/12/07, Torsten Seemann > wrote: > >> I think it should be what it says, or perhaps now /^(Glimmer(M| >> HMM))/. >> Here's why: >> >> I came onto the scene at Glimmer.pm rev 1.4. At that stage it only >> parse GlimmerM. 
I noted that GlimmerHMM was the same output format as >> GlimmerM, except for the first line. So in rev 1.5 I modified the >> regexp to match both ie. \S* . This would also hopefully match any >> other Glimmer-clone formats that arose. I also fixed the pdocs to say >> this, and added tests to t/Genpred.t. >> % cvs diff -r 1.4 -r 1.5 Bio/Tools/Glimmer.pm >> % cvs diff -r 1.15 -r 1.16 t/Genpred.t >> >> I then planned to extend support to Glimmer2 and Glimmer3. I added >> the >> 4 test files (t/Glimmer*.out) but never wrote the code. This is where >> you have come in Mark :-) >> >>> I lifted that bit of code to do format detection...we don't have >> GlimmerHMM >>> installed locally, so I'm assuming Torsten's output is correct >>> and the >> above >>> is a bug. Guess I'll go check bugzilla... >> >> I'm pretty sure my 4 test files are correct - I spent a lot of time >> ensuring they were consistent etc, as I was getting very confused >> with >> the different "glimmer" versions! >> >> Hope this all helps, >> >> --Torsten >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From thokeller at gmail.com Tue Feb 13 22:00:06 2007 From: thokeller at gmail.com (Thomas Keller) Date: Tue, 13 Feb 2007 14:00:06 -0800 Subject: [Bioperl-l] update/install problem Message-ID: Could someone suggest a workaround or fix for this error? $ sudo fink update bioperl-pm586 Information about 5850 packages read in 2 seconds. The package 'bioperl-pm586' will be built and installed. The package 'xml-sax-pm586' will be installed. The package 'xml-sax-writer-pm586' will be built and installed. The package 'xml-filter-buffertext-pm586' will be built and installed. 
The following package will be installed or updated: bioperl-pm586 The following 3 additional packages will be installed: xml-filter-buffertext-pm586 xml-sax-pm586 xml-sax-writer-pm586 Do you want to continue? [Y/n] Y /sw/bin/dpkg-lockwait -i /sw/fink/dists/unstable/main/binary-darwin-powerpc/libs/perlmods/xml-sax-pm586_0.13-2_darwin- powerpc.deb (Reading database ... 48029 files and directories currently installed.) Preparing to replace xml-sax-pm586 0.13-2 (using .../xml-sax-pm586_0.13-2_darwin-powerpc.deb) ... Unpacking replacement xml-sax-pm586 ... Setting up xml-sax-pm586 (0.13-2) ... update-perl586-sax-parsers: adding Perl SAX parser module info file of XML::SAX::PurePerl... Can't locate object method "save_parsers_debian" via package "XML::SAX" at /sw/sbin/update-perl586-sax-parsers line 96. /sw/bin/dpkg: error processing xml-sax-pm586 (--install): subprocess post-installation script returned error exit status 22 Errors were encountered while processing: xml-sax-pm586 ### execution of /sw/bin/dpkg-lockwait failed, exit code 1 Failed: can't install package xml-sax-pm586-0.13-2 -- Tom Keller "Ecrasez l'Infame!" -- Voltaire From sac at bioperl.org Tue Feb 13 23:00:46 2007 From: sac at bioperl.org (Steve Chervitz) Date: Tue, 13 Feb 2007 15:00:46 -0800 Subject: [Bioperl-l] Bio::Root::Utilities.pm Message-ID: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com> I noticed that Bio::Root::Utilities was purged from bioperl-live for the 1.5.2 release, but I'd like us to consider adding it back. I agree that the other purged Root modules were ancient relics of the past, but Bio::Root:: Utilities.pm still has signs of life (at least I still find occasion to use it, or refer to code in it). I know that it's not currently used by any other modules in Bioperl, but there are likely some legacy scripts out there that rely on it. 
Probably most of those scripts are ones I've written, but there have been substantive commits by others in the not-to-distant past (Dec 2005), so at least some folks besides myself are using it and may hesitate to upgrade their bioperl installation if it's absent. I'm all for avoiding bloat in the codebase and am eager to see Bioperl be more lean and mean, but I'd like to keep this module around. I'll agree to add some tests for it as well as clean some things up (e.g., use Bio::Root::IO to get temp file name). Cheers, Steve -- Steve Chervitz sac at bioperl.org From cjfields at uiuc.edu Wed Feb 14 01:29:03 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 13 Feb 2007 19:29:03 -0600 Subject: [Bioperl-l] update/install problem In-Reply-To: References: Message-ID: On Feb 13, 2007, at 4:00 PM, Thomas Keller wrote: > Could someone suggest a workaround or fix for this error? > > $ sudo fink update bioperl-pm586 > Information about 5850 packages read in 2 seconds. > The package 'bioperl-pm586' will be built and installed. > The package 'xml-sax-pm586' will be installed. > The package 'xml-sax-writer-pm586' will be built and installed. > The package 'xml-filter-buffertext-pm586' will be built and installed. > The following package will be installed or updated: > bioperl-pm586 > The following 3 additional packages will be installed: > xml-filter-buffertext-pm586 xml-sax-pm586 xml-sax-writer-pm586 > Do you want to continue? [Y/n] Y > /sw/bin/dpkg-lockwait -i > /sw/fink/dists/unstable/main/binary-darwin-powerpc/libs/perlmods/ > xml-sax-pm586_0.13-2_darwin- > powerpc.deb > (Reading database ... 48029 files and directories currently > installed.) > Preparing to replace xml-sax-pm586 0.13-2 (using > .../xml-sax-pm586_0.13-2_darwin-powerpc.deb) ... > Unpacking replacement xml-sax-pm586 ... > Setting up xml-sax-pm586 (0.13-2) ... > update-perl586-sax-parsers: adding Perl SAX parser module info file of > XML::SAX::PurePerl... 
> Can't locate object method "save_parsers_debian" via package > "XML::SAX" at > /sw/sbin/update-perl586-sax-parsers line 96. > /sw/bin/dpkg: error processing xml-sax-pm586 (--install): > subprocess post-installation script returned error exit status 22 > Errors were encountered while processing: > xml-sax-pm586 > ### execution of /sw/bin/dpkg-lockwait failed, exit code 1 > Failed: can't install package xml-sax-pm586-0.13-2 The fink installation seems to be hanging on XML::SAX, not bioperl. You could try installing XML::SAX (now at v. 0.15) via CPAN using 'sudo cpan'; I updated just recently w/o problems. As an aside, you could similarly install bioperl directly from CPAN (which I also haven't had any problems with). The installation allows for installing optional modules. chris From cjfields at uiuc.edu Wed Feb 14 03:41:31 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 13 Feb 2007 21:41:31 -0600 Subject: [Bioperl-l] Bio::Root::Utilities.pm In-Reply-To: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com> References: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com> Message-ID: <1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu> On Feb 13, 2007, at 5:00 PM, Steve Chervitz wrote: > I noticed that Bio::Root::Utilities was purged from bioperl-live > for the > 1.5.2 release, but I'd like us to consider adding it back. I agree > that the > other purged Root modules were ancient relics of the past, but > Bio::Root:: > Utilities.pm still has signs of life (at least I still find > occasion to use > it, or refer to code in it). > > I know that it's not currently used by any other modules in > Bioperl, but > there are likely some legacy scripts out there that rely on it. > Probably > most of those scripts are ones I've written, but there have been > substantive > commits by others in the not-to-distant past (Dec 2005), so at > least some > folks besides myself are using it and may hesitate to upgrade their > bioperl > installation if it's absent. 
> > I'm all for avoiding bloat in the codebase and am eager to see > Bioperl be > more lean and mean, but I'd like to keep this module around. I'll > agree to > add some tests for it as well as clean some things up (e.g., use > Bio::Root::IO to get temp file name). > > Cheers, > Steve > -- > Steve Chervitz > sac at bioperl.org I don't have a problem with adding it back, esp. if tests are added. Everything in Bio::Root* not tied to a module was yanked out when no one spoke up about cleaning up Bio::Root* modules: http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12792/ focus=12839 Maybe others disagree? chris From bix at sendu.me.uk Wed Feb 14 08:00:35 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 14 Feb 2007 08:00:35 +0000 Subject: [Bioperl-l] update/install problem In-Reply-To: References: Message-ID: <45D2C1A3.9060300@sendu.me.uk> Chris Fields wrote: > As an aside, you could similarly install bioperl directly from CPAN > (which I also haven't had any problems with). Indeed. If you follow the unix instructions at http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix you should have a problem-free complete install under Mac OS X. From bix at sendu.me.uk Wed Feb 14 14:08:22 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 14 Feb 2007 14:08:22 +0000 Subject: [Bioperl-l] BLASTXML changes (good this time!) In-Reply-To: References: <45CCF861.8030000@sendu.me.uk> Message-ID: <45D317D6.5070903@sendu.me.uk> Chris Fields wrote: > > On Feb 9, 2007, at 4:40 PM, Sendu Bala wrote: > >> Chris Fields wrote: >>> If Sendu is out there, I think we can safely remove any dependencies >>> beyond XML::SAX 0.15 for the next release. Should I go ahead and >>> modify Build.PL? >> >> Sure, good to hear. > > I added a version dependency for XML::SAX (v. 0.15) for the PurePerl > fix. That likely obviates the need for a Bundle for XML::Simple. Not > too pressing; we can determine that before the next release. The bundle is now obsolete. 
Does anything in Bioperl, or any of its dependencies, now make use of the expat library? If not, I can remove mention of it from the install documentation. From bix at sendu.me.uk Wed Feb 14 14:02:39 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 14 Feb 2007 14:02:39 +0000 Subject: [Bioperl-l] DB.t failures Message-ID: <45D3167F.2000608@sendu.me.uk> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer getting sequences back from NCBI in the order we requested them in batch mode. Is this a change at NCBI? Is there some way we can make sure to return the sequences in the expected order? Or shouldn't the order be expected (should the test script be altered)? From cjfields at uiuc.edu Wed Feb 14 14:37:07 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 14 Feb 2007 08:37:07 -0600 Subject: [Bioperl-l] DB.t failures In-Reply-To: <45D3167F.2000608@sendu.me.uk> References: <45D3167F.2000608@sendu.me.uk> Message-ID: <49A5C7D3-8D63-452C-B0EA-6F7144F85E35@uiuc.edu> Confirmed on this end. It's possible that the default sort order from eutils is different now though I haven't seen anything on the eutils mail list. There may be a way to set the sort order via the base URL; I'll check into it later today; I'm still digging myself out from the midwest blizzard. chris On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: > DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer > getting sequences back from NCBI in the order we requested them in > batch > mode. > > Is this a change at NCBI? Is there some way we can make sure to return > the sequences in the expected order? Or shouldn't the order be > expected > (should the test script be altered)? > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Wed Feb 14 14:42:05 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 14 Feb 2007 08:42:05 -0600 Subject: [Bioperl-l] BLASTXML changes (good this time!) In-Reply-To: <45D317D6.5070903@sendu.me.uk> References: <45CCF861.8030000@sendu.me.uk> <45D317D6.5070903@sendu.me.uk> Message-ID: On Feb 14, 2007, at 8:08 AM, Sendu Bala wrote: > Chris Fields wrote: >> >> On Feb 9, 2007, at 4:40 PM, Sendu Bala wrote: >> >>> Chris Fields wrote: >>>> If Sendu is out there, I think we can safely remove any >>>> dependencies >>>> beyond XML::SAX 0.15 for the next release. Should I go ahead and >>>> modify Build.PL? >>> >>> Sure, good to hear. >> >> I added a version dependency for XML::SAX (v. 0.15) for the PurePerl >> fix. That likely obviates the need for a Bundle for XML::Simple. >> Not >> too pressing; we can determine that before the next release. > > The bundle is now obsolete. Does anything in Bioperl, or any of its > dependencies, now make use of the expat library? If not, I can remove > mention of it from the install documentation. I'll try getting something up about XML::SAX on the wiki today. XML::Parser, though, still requires expat AFAIK: http://www.bioperl.org/wiki/BioPerl_Dependencies chris From kellert at ohsu.edu Tue Feb 13 22:43:24 2007 From: kellert at ohsu.edu (Thomas J Keller) Date: Tue, 13 Feb 2007 14:43:24 -0800 Subject: [Bioperl-l] HowTo:SearchIO Message-ID: Greetings, I've been away from programming and informatics for many months. Hoping to get back into it, I thought it would be good to review the tutorials. I tried the code in the tutorial on the sample blast report in the tutorial and it worked fine. So I ran a blastx search and saved the results and tried to parse them: It gave the "... parsing" message, but no other results get reported. Any suggestions? Thanks, Tom Tom Keller, Ph.D. 
kellert at ohsu.edu 503-494-2442 6339b Basic Science Bldg http://www.ohsu.edu/research/core From mrouard at gmail.com Wed Feb 14 11:23:47 2007 From: mrouard at gmail.com (Mathieu Rouard) Date: Wed, 14 Feb 2007 12:23:47 +0100 Subject: [Bioperl-l] get the sequence of a column in a multiple alignment Message-ID: Dear all, I am starting to use the bioperl API to parse multiple alignments and I am wondering what is the most efficient way to extract all the columns from an alignment (all the AA at position 1, position 2, etc.). I quickly implemented this simple code, but it becomes quite slow as the length of the sequences increases.

    use Bio::AlignIO;

    my $stream = Bio::AlignIO->new(-file => $inputfilename,
                                   '-format' => 'stockholm');
    my $aln    = $stream->next_aln();
    my $length = $aln->length();
    my @column;    # element $i-1 will hold the residues of column $i

    for (my $i = 1; $i <= $length; $i++) {
        my $aa = '';
        foreach my $seq ($aln->each_seq()) {
            my $obj = $seq->trunc($i, $i);    # one-residue subsequence
            $aa .= $obj->seq;
        }
        # need to track the column number and the sequence of the column
        push @column, $aa;
    }

Would you have any other suggestion? thanks Mathieu From avilella at gmail.com Wed Feb 14 15:29:02 2007 From: avilella at gmail.com (Albert Vilella) Date: Wed, 14 Feb 2007 15:29:02 +0000 Subject: [Bioperl-l] get the sequence of a column in a multiple alignment In-Reply-To: References: Message-ID: <358f4d650702140729u4dae2847qc8eeeb45b20faca4@mail.gmail.com> there is a slice method:

    $mini_aln = $aln->slice(20,30); # get a block of columns

    Title    : slice
    Usage    : $aln2 = $aln->slice(20,30)
    Function : Creates a slice from the alignment inclusive of start and
               end columns, and the first column in the alignment is denoted 1.
               Sequences with no residues in the slice are excluded from the
               new alignment and a warning is printed. Slice beyond the length of
               the sequence does not do padding.
    Returns  : A Bio::SimpleAlign object
    Args     : Positive integer for start column, positive integer for end column,
               optional boolean which if true will keep gap-only columns in the newly
               created slice.
Example: $aln2 = $aln->slice(20,30,1) but I don't know how well it behaves for lots of sequences :) On 2/14/07, Mathieu Rouard wrote: > Dear all, > > I am starting to use the bioperl API to parse multiple alignments and I am > wondering what is the most efficient way to extract all the columns from an > alignment (all the AA at position 1, position 2, etc.). I quickly > implemented this simple code, but it becomes quite slow as the length of > the sequences increases.
>
>     use Bio::AlignIO;
>
>     my $stream = Bio::AlignIO->new(-file => $inputfilename,
>                                    '-format' => 'stockholm');
>     my $aln    = $stream->next_aln();
>     my $length = $aln->length();
>     my @column;
>
>     for (my $i = 1; $i <= $length; $i++) {
>         my $aa = '';
>         foreach my $seq ($aln->each_seq()) {
>             my $obj = $seq->trunc($i, $i);
>             $aa .= $obj->seq;
>         }
>         # need to track the column number and the sequence of the column
>         push @column, $aa;
>     }
>
> Would you have any other suggestion? > > thanks > Mathieu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Wed Feb 14 16:59:49 2007 From: jason at bioperl.org (Jason Stajich) Date: Wed, 14 Feb 2007 08:59:49 -0800 Subject: [Bioperl-l] HowTo:SearchIO In-Reply-To: References: Message-ID: As always, reporting the version of BLAST and Bioperl you have installed will help someone diagnose whether this is a fixed problem or not. If you trawl through the list archives, you'll see that Chris and others have been playing cat and mouse with the text output from NCBI BLAST, which appears to be an ever-evolving beast. So the best advice right now is to get the latest bioperl from CVS to ensure you have all the patches needed to parse this version. If it still fails, then the standard response will be to submit the report as an attachment to a new bug report on the bugzilla.
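For Mathieu Rouard's column-extraction question earlier in this thread, one way to avoid calling `trunc()` (or `slice()`) once per column is to pull each aligned sequence string out once and then walk it character by character. Below is a minimal sketch of that approach. `alignment_columns` is a hypothetical helper that works on plain strings, so the only BioPerl call involved is the `$_->seq` shown in the comment, which assumes `each_seq` returns sequence objects whose `seq` method gives the gapped string.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): given aligned sequence strings
# of equal length (gaps included), return an array ref in which element $i
# holds the residues of alignment column $i + 1, one character per sequence.
sub alignment_columns {
    my @seqs = @_;
    return [] unless @seqs;
    my $len = length $seqs[0];
    my @columns = ('') x $len;
    for my $s (@seqs) {
        die "aligned sequences must all have the same length\n"
            unless length($s) == $len;
        my @chars = split //, $s;
        $columns[$_] .= $chars[$_] for 0 .. $len - 1;
    }
    return \@columns;
}

# With BioPerl the strings would come straight from the alignment object:
#   my $cols = alignment_columns(map { $_->seq } $aln->each_seq);
my $cols = alignment_columns('MK-LV', 'MKALV', 'MR-LI');
print "column ", $_ + 1, ": $cols->[$_]\n" for 0 .. $#$cols;
```

Each residue is visited exactly once. The per-column loop in the original snippet instead builds a new one-residue sequence object for every sequence at every position, which is the likely reason it slows down as the alignment grows.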
thanks, -jason On Feb 13, 2007, at 2:43 PM, Thomas J Keller wrote: > Greetings, > I've been away from programming and informatics for many months. > Hoping to get back into it, I thought it would be good to review the > tutorials. > I tried the code in the tutorial on the sample blast report in the > tutorial and it worked fine. So I ran a blastx search and saved the > results and tried to parse them: It gave the "... parsing" message, > but no other results get reported. > > Any suggestions? > > Thanks, > Tom > > Tom Keller, Ph.D. > kellert at ohsu.edu > 503-494-2442 > 6339b Basic Science Bldg > http://www.ohsu.edu/research/core > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From dmessina at wustl.edu Wed Feb 14 16:58:45 2007 From: dmessina at wustl.edu (David Messina) Date: Wed, 14 Feb 2007 10:58:45 -0600 Subject: [Bioperl-l] HowTo:SearchIO In-Reply-To: References: Message-ID: <6E3CAB6B-9F9E-46FD-9021-50D7FE011860@wustl.edu> Hi Tom, Could you tell us what version of BioPerl you are using, and what specific example is failing for you? And could you post your code? That would make it easier to diagnose the problem. Thanks, Dave -- Dave Messina Senior Programmer/Analyst, Assembly Group WashU Genome Sequencing Center dmessina a t wustl.edu 314-286-1415 From cjfields at uiuc.edu Wed Feb 14 17:28:24 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 14 Feb 2007 11:28:24 -0600 Subject: [Bioperl-l] HowTo:SearchIO In-Reply-To: References: Message-ID: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu> I would also strongly encourage switching to using XML-based parsing, which is much more stable now. 
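Switching to XML parsing mostly means changing the `-format` argument passed to Bio::SearchIO. Below is a minimal sketch. It assumes BioPerl (with XML::SAX) is installed and that the report was saved as BLAST XML, for example with `blastall -m 7` or by choosing XML output on the NCBI web page. The subroutine name `summarize_blastxml` and the tab-separated output columns are illustrative choices, not part of BioPerl.

```perl
use strict;
use warnings;

# Illustrative sketch: walk a BLAST XML report with Bio::SearchIO and
# collect one tab-separated row per HSP.
sub summarize_blastxml {
    my ($file) = @_;
    require Bio::SearchIO;    # loaded at run time, so BioPerl is needed only when called
    my $in = Bio::SearchIO->new(-format => 'blastxml', -file => $file);
    my @rows;
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                push @rows, join "\t", $result->query_name, $hit->name,
                    $hsp->evalue, $hsp->percent_identity;
            }
        }
    }
    return \@rows;
}

# Usage: perl blastxml_summary.pl report.xml
if (@ARGV) {
    print "$_\n" for @{ summarize_blastxml($ARGV[0]) };
}
```

The `next_result` / `next_hit` / `next_hsp` loop has the same shape for the text parser (`-format => 'blast'`), so existing SearchIO code should usually need only the format name changed.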
Here's the link to the NCBI response re: BLAST report parsing: http://bioperl.org/wiki/NCBI_Blast_email chris (taking a break from shoveling snow...) On Feb 14, 2007, at 10:59 AM, Jason Stajich wrote: > As always, reporting the version of BLAST and Bioperl you have > installed will help someone diagnose if this is a fixed problem or > not. If you trawl through the list archives you'll chris and others > have been playing cat and mouse with the text version output from > NCBI BLAST which appears to be an ever evolving beast. > > So the best advice right now is to get the latest bioperl from CVS > to insure you have all the patches that might parse this version. If > it still fails then the standard response will be to submit the > report as an attachment to a new bug report on the bugzilla. > > thanks, > -jason > > > On Feb 13, 2007, at 2:43 PM, Thomas J Keller wrote: > >> Greetings, >> I've been away from programming and informatics for many months. >> Hoping to get back into it, I thought it would be good to review the >> tutorials. >> I tried the code in the tutorial on the sample blast report in the >> tutorial and it worked fine. So I ran a blastx search and saved the >> results and tried to parse them: It gave the "... parsing" message, >> but no other results get reported. >> >> Any suggestions? >> >> Thanks, >> Tom >> >> Tom Keller, Ph.D. 
>> kellert at ohsu.edu >> 503-494-2442 >> 6339b Basic Science Bldg >> http://www.ohsu.edu/research/core >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Miller Research Fellow > University of California, Berkeley > lab: 510.642.8441 > http://pmb.berkeley.edu/~taylor/people/js.html > http://fungalgenomes.org/ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From sac at bioperl.org Wed Feb 14 18:20:17 2007 From: sac at bioperl.org (Steve Chervitz) Date: Wed, 14 Feb 2007 10:20:17 -0800 Subject: [Bioperl-l] Bio::Root::Utilities.pm In-Reply-To: <1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu> References: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com> <1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu> Message-ID: <8f200b4c0702141020x5815018bvd69873222582d9d5@mail.gmail.com> On 2/13/07, Chris Fields wrote: > > > On Feb 13, 2007, at 5:00 PM, Steve Chervitz wrote: > > > I noticed that Bio::Root::Utilities was purged from bioperl-live > > for the > > 1.5.2 release, but I'd like us to consider adding it back. I agree > > that the > > other purged Root modules were ancient relics of the past, but > > Bio::Root:: > > Utilities.pm still has signs of life (at least I still find > > occasion to use > > it, or refer to code in it). > > > > I know that it's not currently used by any other modules in > > Bioperl, but > > there are likely some legacy scripts out there that rely on it. 
> > Probably > > most of those scripts are ones I've written, but there have been > > substantive > > commits by others in the not-to-distant past (Dec 2005), so at > > least some > > folks besides myself are using it and may hesitate to upgrade their > > bioperl > > installation if it's absent. > > > > I'm all for avoiding bloat in the codebase and am eager to see > > Bioperl be > > more lean and mean, but I'd like to keep this module around. I'll > > agree to > > add some tests for it as well as clean some things up (e.g., use > > Bio::Root::IO to get temp file name). > > > > Cheers, > > Steve > > -- > > Steve Chervitz > > sac at bioperl.org > > I don't have a problem with adding it back, esp. if tests are added. > Everything in Bio::Root* not tied to a module was yanked out when no > one spoke up about cleaning up Bio::Root* modules: > > http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12792/ > focus=12839 > > Maybe others disagree? > > chris > Sorry I missed out on that thread. I had some trouble with my bioperl-l email delivery getting disabled due to excessive bounces, and it took me a while to catch it. Bio::Root::Utilities is quite a grab bag of miscellaneous general functions that are occasionally useful for perl scripting (e.g., determining end-of-line characters, sending email, etc.). The code could definitely use a review, and maybe an example script to advertise it. I can look into this, and suggestions are welcome. 
Steve From dmessina at wustl.edu Wed Feb 14 18:55:18 2007 From: dmessina at wustl.edu (David Messina) Date: Wed, 14 Feb 2007 12:55:18 -0600 Subject: [Bioperl-l] HowTo:SearchIO In-Reply-To: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu> References: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu> Message-ID: <49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu> On Feb 14, 2007, at 11:28 AM, Chris Fields wrote: > I would also strongly encourage switching to using XML-based parsing, Unless anyone objects, I would be happy to update the HOWTO to suggest people make the switch and give an example of XML parsing. The Bio::SearchIO synopsis is already an XML example. However, there's no warning about text-based parsing nor a suggestion to use XML that I can see -- perhaps should be added? Dave From cjfields at uiuc.edu Wed Feb 14 20:12:21 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 14 Feb 2007 14:12:21 -0600 Subject: [Bioperl-l] HowTo:SearchIO In-Reply-To: <49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu> References: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu> <49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu> Message-ID: On Feb 14, 2007, at 12:55 PM, David Messina wrote: > > On Feb 14, 2007, at 11:28 AM, Chris Fields wrote: > >> I would also strongly encourage switching to using XML-based parsing, > > Unless anyone objects, I would be happy to update the HOWTO to > suggest people make the switch and give an example of XML parsing. > > The Bio::SearchIO synopsis is already an XML example. However, > there's no warning about text-based parsing nor a suggestion to use > XML that I can see -- perhaps should be added? > > Dave We should probably add something specifically for BLAST, yes. Other text parsers should be fine. Personally, I use XML or tabular output parsing simply b/c they are faster and do what I need. 
I think we'll need to retain the capability for text-based BLAST parsing, but it will become extremely bloated long-term if we plan on continuing support for parsing all versions and flavors of BLAST, particularly if NCBI continues to change the output. chris From dmessina at wustl.edu Wed Feb 14 20:46:31 2007 From: dmessina at wustl.edu (David Messina) Date: Wed, 14 Feb 2007 14:46:31 -0600 Subject: [Bioperl-l] HowTo:SearchIO In-Reply-To: References: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu> <49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu> Message-ID: <136DA052-B9FD-4547-B262-EC6E38B47392@wustl.edu> On Feb 14, 2007, at 2:12 PM, Chris Fields wrote: > We should probably add something specifically for BLAST, yes. > Other text parsers should be fine. Good point -- I'll make it clear it's only pertinent to BLAST. > I think we'll need to retain the capability for text-based BLAST > parsing, Agreed. Through the 1.6 release at least, I would think. > particularly if NCBI continues to change the output. Well, clearly the solution is not to use the NCBI flavor of BLAST. :) Dave (look at my email address) From jay at jays.net Thu Feb 15 13:08:56 2007 From: jay at jays.net (Jay Hannah) Date: Thu, 15 Feb 2007 07:08:56 -0600 Subject: [Bioperl-l] DB.t failures In-Reply-To: <45D3167F.2000608@sendu.me.uk> References: <45D3167F.2000608@sendu.me.uk> Message-ID: On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: > DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer > getting sequences back from NCBI in the order we requested them in > batch > mode. Is this the same result you get? DIED. FAILED tests 59-60, 63-64, 67-68, 71-72 Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 okay, 85.84%) Failed Test Stat Wstat Total Fail Failed List of Failed ------------------------------------------------------------------------ ------- DB.t 8 2048 113 8 7.08% 59-60 63-64 67-68 71-72 8 subtests skipped. 
Thanks, j seqlab.net http://www.bioperl.org/wiki/User:Jhannah From bix at sendu.me.uk Thu Feb 15 13:37:32 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 15 Feb 2007 13:37:32 +0000 Subject: [Bioperl-l] DB.t failures In-Reply-To: References: <45D3167F.2000608@sendu.me.uk> Message-ID: <45D4621C.6040309@sendu.me.uk> Jay Hannah wrote: > On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: >> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer >> getting sequences back from NCBI in the order we requested them in >> batch >> mode. > > Is this the same result you get? > > > DIED. FAILED tests 59-60, 63-64, 67-68, 71-72 > Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 > okay, 85.84%) > Failed Test Stat Wstat Total Fail Failed List of Failed > ------------------------------------------------------------------------ > ------- > DB.t 8 2048 113 8 7.08% 59-60 63-64 67-68 71-72 > 8 subtests skipped. Yes, those fails are all caused by results in the wrong order (I believe). From cjfields at uiuc.edu Thu Feb 15 14:22:09 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 08:22:09 -0600 Subject: [Bioperl-l] DB.t failures In-Reply-To: <45D4621C.6040309@sendu.me.uk> References: <45D3167F.2000608@sendu.me.uk> <45D4621C.6040309@sendu.me.uk> Message-ID: On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote: > Jay Hannah wrote: >> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: >>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer >>> getting sequences back from NCBI in the order we requested them in >>> batch >>> mode. >> >> Is this the same result you get? >> >> >> DIED. FAILED tests 59-60, 63-64, 67-68, 71-72 >> Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 >> okay, 85.84%) >> Failed Test Stat Wstat Total Fail Failed List of Failed >> --------------------------------------------------------------------- >> --- >> ------- >> DB.t 8 2048 113 8 7.08% 59-60 63-64 67-68 71-72 >> 8 subtests skipped. 
> > Yes, those fails are all caused by results in the wrong order (I > believe). I'm fixing those now so it doesn't depend on order and will commit in the next few minutes. chris From bix at sendu.me.uk Thu Feb 15 14:37:00 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 15 Feb 2007 14:37:00 +0000 Subject: [Bioperl-l] DB.t failures In-Reply-To: <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu> References: <45D3167F.2000608@sendu.me.uk> <45D4621C.6040309@sendu.me.uk> <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu> Message-ID: <45D4700C.8020305@sendu.me.uk> Chris Fields wrote: > > On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote: > >> Jay Hannah wrote: >>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: >>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer >>>> getting sequences back from NCBI in the order we requested them in >>>> batch mode. > > Okay, I committed a fix for that. I hope there are many users who > depend on the returned sequence order for anything! s/are/aren't/ ? I suspect there might be, and it's certainly a reasonable assumption to make. Did you not see an easy way of maintaining the order? From cjfields at uiuc.edu Thu Feb 15 14:28:46 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 08:28:46 -0600 Subject: [Bioperl-l] DB.t failures In-Reply-To: <45D4621C.6040309@sendu.me.uk> References: <45D3167F.2000608@sendu.me.uk> <45D4621C.6040309@sendu.me.uk> Message-ID: <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu> On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote: > Jay Hannah wrote: >> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: >>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer >>> getting sequences back from NCBI in the order we requested them in >>> batch >>> mode. >> >> Is this the same result you get? >> >> >> DIED.
FAILED tests 59-60, 63-64, 67-68, 71-72 >> Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 >> okay, 85.84%) >> Failed Test Stat Wstat Total Fail Failed List of Failed >> --------------------------------------------------------------------- >> --- >> ------- >> DB.t 8 2048 113 8 7.08% 59-60 63-64 67-68 71-72 >> 8 subtests skipped. > > Yes, those fails are all caused by results in the wrong order (I > believe). Okay, I committed a fix for that. I hope there are many users who depend on the returned sequence order for anything! chris From michael.watson at bbsrc.ac.uk Thu Feb 15 14:44:27 2007 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu, 15 Feb 2007 14:44:27 -0000 Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk> Hi OK I have some great images out of this glyph, but I can't see the axis, and nor is it labelled (ie does it go from 0 - 100%?) so isn't great for publication. The docs say: "NOTE: -gc_window=>'auto' gives nice results and is recommended for drawing GC content. The GC content axes draw slightly outside the panel, so you may wish to add some extra padding on the right and left. " Any idea how to do this? Basically, I want a nice GC graph with the axis quite clearly labelled, and a nice "%GC" title next to it :) Thanks Mick The information contained in this message may be confidential or legally privileged and is intended solely for the addressee. If you have received this message in error please delete it & notify the originator immediately. Unauthorised use, disclosure, copying or alteration of this message is forbidden & may be unlawful. The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Institute. This email and associated attachments has been checked locally for viruses but we can accept no responsibility once it has left our systems. 
Communications on Institute computers are monitored to secure the effective operation of the systems and for other lawful purposes. From nehadnahar at yahoo.co.in Thu Feb 15 15:28:42 2007 From: nehadnahar at yahoo.co.in (Neha Nahar) Date: Thu, 15 Feb 2007 15:28:42 +0000 (GMT) Subject: [Bioperl-l] Convert newick to nexus format In-Reply-To: <84703383-600F-42F4-A860-DD0D1C43EE83@bioperl.org> Message-ID: <777943.33252.qm@web8404.mail.in.yahoo.com> Thank you Jason. I ran the tests and they failed, so I re-installed the bioperl module and now it works fine. Regards, Neha. Jason Stajich wrote: Something is wrong with your install I am guessing - can you run the tests? Go to bioperl directory: $ perl t/TreeIO.t can you describe how you installed bioperl? On Feb 5, 2007, at 11:58 AM, Neha Nahar wrote: > > Hi, > Thank you for the code. > I tried it but I still get the same exception. > > ------------- EXCEPTION ------------- > MSG: Cannot call method write_tree on Bio::TreeIO object must use a > subclass > STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/ > 5.8.8/Bio/TreeIO/nexus.pm:170 > STACK toplevel ./nexus1.pl:18 > > > Please find attached the perl file(nexus.pl). > > > Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/ > ~sendu/bioperl/Bio/TreeIO.pm > > Please let me know if I am using the correct version.If not, please > point me to the latest one. > > Thank you. > Regards, > nnahar > > > > > > Jason Stajich wrote:please cc the mailing list > when asking a question or followup. > > Sorry I don't know what you are doing wrong - you didn't resend > your code so I don't know if you still have a typo. 
>
> This code works fine for me
>
> use Bio::TreeIO;
> use strict;
> my ($filein,$fileout) = @ARGV;
> my ($format,$oformat) = qw(newick nexus);
> my $in = Bio::TreeIO->new(-file => $filein, -format => $format);
> my $out = Bio::TreeIO->new(-format => $oformat, -file => ">$fileout");
>
> while( my $t = $in->next_tree ) {
>     $out->write_tree($t);
> }
>
> On Feb 5, 2007, at 11:24 AM, Neha Nahar wrote: > > Thank you very much for the reply. > > > I fixed the code as per your suggestion,but now am getting a > different error: > > > $ ./nexus.pl mrp-input.txt nexus.out > newickfile=mrp-input.txt, nexusfile=nexus.out > > > ------------- EXCEPTION ------------- > MSG: Cannot call method write_tree on Bio::TreeIO object must use a > subclass > STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/ > 5.8.8/Bio/TreeIO/nexus.pm:170 > STACK toplevel ./nexus.pl:23 > > > -------------------------------------- > > > Please help me out with this script. > > > Thank you. > Regards, > Neha > > > > > > > > > Jason Stajich wrote: you want to write the TREE > out not the TREE WRITER. > > > > > $treeout->write_tree($tree) > > > not > $treeout->write_tree($treeout); > > > On Feb 5, 2007, at 9:59 AM, Neha Nahar wrote: > > > Hello everyone, > > > > > I am trying to convert newick tree to nexus format.
> Using the script (referred from an email from George dated Wed Sep > 22 11:52:47 EDT 2004) : > > > > > /*------------------------------------------------------------*/ > > > > > $ cat nexus.pl > #!/usr/bin/perl -w > > > > > use Bio::TreeIO; > > > > > ($NEWICKFILE, $NEXUSFILE) = @ARGV; > print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n"; > my $treeio = new Bio::TreeIO(-format => 'newick', -file => > "$NEWICKFILE"); > my $treeout = new Bio::TreeIO(-format => 'nexus', -file => "> > $NEXUSFILE"); > while(my $tree = $treeio->next_tree) { > $treeout->write_tree($treeout); > } > > > > > exit 0; > > > > > > > > > /*------------------------------------------------------------*/ > > > > > Running the script through command line: > Gives the following error: > > > > > $ ./nexus.pl mrp-input.txt nexus.out > newickfile=mrp-input.txt, nexusfile=nexus.out > > > > > ------------- EXCEPTION ------------- > MSG: Cannot call method write_tree on Bio::TreeIO object must use a > subclass > STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/ > 5.8.8/Bio/TreeIO/nexus.pm:170 > STACK toplevel ./nexus.pl:23 > > > > > -------------------------------------- > > > > > > > > > Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/ > ~sendu/bioperl/Bio/TreeIO.pm > > > > > Questions:- > > > > > 1. Please let me know if I am using the correct version. > If not, please point me to the latest one. > > > > > 2. Provided that the version I am using is the right one, please > let me know what is wrong with the script. > > > > > Thank you. > Regards, > Neha. > > > > > > > > > > > > > > > > > > > > > -Neha Nahar > " Work for cause and not for applause, live to express and not to > impress !" > > > > > --------------------------------- > Here's a new way to find what you're looking for - Yahoo!
Answers > > > > > -- > Jason Stajich > Miller Research Fellow > University of California, Berkeley > lab: 510.642.8441 > > > http://pmb.berkeley.edu/~taylor/people/js.html > http://fungalgenomes.org/ > > > > > > > > > > > > > -Neha Nahar > " Work for cause and not for applause, live to express and not to > impress !" > > > -- > Jason Stajich > Miller Research Fellow > University of California, Berkeley > lab: 510.642.8441 > > http://pmb.berkeley.edu/~taylor/people/js.html > http://fungalgenomes.org/ > > > > > > > > > -Neha Nahar > " Work for cause and not for applause, live to express and not > to impress !" > > -Neha Nahar > " Work for cause and not for applause, live to express and not to > impress !" > >
-Neha Nahar " Work for cause and not for applause, live to express and not to impress !"
From cjfields at uiuc.edu Thu Feb 15 15:44:23 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 09:44:23 -0600 Subject: [Bioperl-l] DB.t failures In-Reply-To: <45D4700C.8020305@sendu.me.uk> References: <45D3167F.2000608@sendu.me.uk> <45D4621C.6040309@sendu.me.uk> <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu> <45D4700C.8020305@sendu.me.uk> Message-ID: <809A1A9D-0344-412D-98D7-EA852EFED63D@uiuc.edu> On Feb 15, 2007, at 8:37 AM, Sendu Bala wrote: > Chris Fields wrote: >> >> On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote: >> >>> Jay Hannah wrote: >>>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: >>>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no >>>>> longer >>>>> getting sequences back from NCBI in the order we requested them in >>>>> batch mode. >> >> Okay, I committed a fix for that. I hope there are many users who >> depend on the returned sequence order for anything! > > s/are/aren't/ ? Yes, my oops. > I suspect there might be, and its certainly a reasonable assumption to > make. Did you not see an easy way of maintaining the order? I haven't looked (been busy the last few days), but I think there is a way via efetch. We could add in something to the default base URL if there is something or (probably better) add a sort_order() method to designate a particular sort order, defaulting to the old order if not set.
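If callers do rely on the order, one option that does not depend on any server-side sort parameter is to reorder the records on the client against the list of IDs that was actually requested. A minimal sketch in plain Perl; `restore_request_order` and the `id` field are hypothetical, and with real Bio::Seq objects the key function would instead return whatever identifier (for example `accession_number`) was used in the request:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): reorder fetched records so they
# follow the order in which their IDs were requested. Records whose ID was not
# in the request are appended afterwards, in the order they were returned.
sub restore_request_order {
    my ($requested, $records, $key_of) = @_;

    # Position of each requested ID in the original request.
    my %rank;
    @rank{@$requested} = (0 .. $#$requested);

    my (@known, @unknown);
    for my $rec (@$records) {
        if (exists $rank{ $key_of->($rec) }) { push @known,   $rec }
        else                                  { push @unknown, $rec }
    }

    my @sorted = sort { $rank{ $key_of->($a) } <=> $rank{ $key_of->($b) } } @known;
    return (@sorted, @unknown);
}

# Placeholder IDs: the service returned them in a different order.
my @requested = qw(seqA seqB seqC);
my @returned  = map +{ id => $_ }, qw(seqC seqA seqB);

my @ordered = restore_request_order(\@requested, \@returned, sub { $_[0]{id} });
print join(' ', map { $_->{id} } @ordered), "\n";   # seqA seqB seqC
```

Sorting only the matched records and appending the rest keeps unexpected extra records visible instead of silently dropping them.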
chris From lstein at cshl.edu Thu Feb 15 18:53:13 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 15 Feb 2007 13:53:13 -0500 Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com> Hi Michael, When you set up the panel, do this: Bio::Graphics::Panel->new(-blah -blah, -pad_left => 20, -pad_right => 20); This will leave enough room on the left and right for you to see the Y axis. Otherwise it runs off the edge of the image (ok, this is a mis-design, but it was the only way to solve a chicken-and-egg problem about who gets to say how wide the panel is) Lincoln On 2/15/07, michael watson (IAH-C) wrote: > > Hi > > OK I have some great images out of this glyph, but I can't see the axis, > and nor is it labelled (ie does it go from 0 - 100%?) so isn't great for > publication. The docs say: > > "NOTE: -gc_window=>'auto' gives nice results and is recommended for > drawing GC content. The GC content axes draw slightly outside the > panel, so you may wish to add some extra padding on the right and > left. " > > Any idea how to do this? > > Basically, I want a nice GC graph with the axis quite clearly labelled, > and a nice "%GC" title next to it :) > > Thanks > > Mick > > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From johnsonm at gmail.com Thu Feb 15 19:24:08 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Thu, 15 Feb 2007 13:24:08 -0600 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: References: <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> Message-ID: Done. Bug opened in Bugzilla, diffs attached including new/updated tests: http://bugzilla.open-bio.org/show_bug.cgi?id=2206 Can somebody grab that, take a look, tweak to taste, test and commit? Tests pass on my end presently. On 2/13/07, Chris Fields wrote: > > You'll also want to update whatever relevant tests there are for > Glimmer; looks like they are in GenPred.t. > > chris > From cjfields at uiuc.edu Thu Feb 15 19:37:22 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 13:37:22 -0600 Subject: [Bioperl-l] Bio::Tools::Glimmer In-Reply-To: References: <0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net> <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org> Message-ID: <4C15214E-AE4B-4D85-A710-60536B08BE86@uiuc.edu> On Feb 15, 2007, at 1:24 PM, Mark Johnson wrote: > Done. Bug opened in Bugzilla, diffs attached including new/updated > tests: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2206 > > Can somebody grab that, take a look, tweak to taste, test and > commit?
Tests > pass on my end presently. > > On 2/13/07, Chris Fields wrote: >> >> You'll also want to update whatever relevant tests there are for >> Glimmer; looks like they are in GenPred.t. >> >> chris Done; everything passed on this end as well, no tweaking necessary. If there are problems we'll definitely hear about it down the road (Glimmer is a popular tool), but I think you'll be fine. Thanks Mark! chris From cjfields at uiuc.edu Thu Feb 15 19:46:07 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 13:46:07 -0600 Subject: [Bioperl-l] DB.t failures In-Reply-To: <809A1A9D-0344-412D-98D7-EA852EFED63D@uiuc.edu> References: <45D3167F.2000608@sendu.me.uk> <45D4621C.6040309@sendu.me.uk> <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu> <45D4700C.8020305@sendu.me.uk> <809A1A9D-0344-412D-98D7-EA852EFED63D@uiuc.edu> Message-ID: On Feb 15, 2007, at 9:44 AM, Chris Fields wrote: > > On Feb 15, 2007, at 8:37 AM, Sendu Bala wrote: > >> Chris Fields wrote: >>> >>> On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote: >>> >>>> Jay Hannah wrote: >>>>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote: >>>>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no >>>>>> longer >>>>>> getting sequences back from NCBI in the order we requested >>>>>> them in >>>>>> batch mode. >>> >>> Okay, I committed a fix for that. I hope there are many users who >>> depend on the returned sequence order for anything! >> >> s/are/aren't/ ? > > Yes, my oops. > >> I suspect there might be, and its certainly a reasonable >> assumption to >> make. Did you not see an easy way of maintaining the order? > > I haven't looked (been busy the last few days), but I think there is > a way via efetch. > > We could add in something to the default base URL if there is > something or (probably better) add a sort_order() method to designate > a particular sort order, defaulting to the old order if not set. 
> > chris Delving into it further, the problem only occurs when using get_seq_stream() directly in batch mode, which is likely only used by developers for testing. The sort issue only pops up when eposting IDs using that mode; retrieved seqs are returned in a different order than through a direct efetch query (the default with get_Stream* or get_Seq* methods). No use of the 'sort' parameter works to get around that problem, not a complete surprise since it is supposed to only work for PubMed, but since the method is rarely used I'll just leave the bullet-proofed tests alone. chris From letondal at pasteur.fr Thu Feb 15 20:23:55 2007 From: letondal at pasteur.fr (Catherine Letondal) Date: Thu, 15 Feb 2007 21:23:55 +0100 Subject: [Bioperl-l] Problem parsing clustalw format ni AlignIO Message-ID: Hi bioperlers, I have a script called protal2dna (http://bioweb.pasteur.fr/seqanal/interfaces/protal2dna.html, see attachment #1) that realigns DNA sequences given their sequences + the corresponding protein alignment (sequences have to be in the same order or named equivalently). We have a parsing problem reported from the AlignIO class when users enter some clustalw file (see attachment #2 for an example): % protal2dna alig-protal2dna.dat dna-protal2dna.data no alignment available in 'clustalw' format from file 'alig-protal2dna.dat' % I have tried with bioperl 1.4. I have looked in the archive and in the BUGS, but found nothing? Is there any bug fix for this? I also provide the DNA sequences file if you want to test. Thanks a lot in advance, -- Catherine Letondal -- Institut Pasteur www.pasteur.fr/~letondal -------------- next part -------------- A non-text attachment was scrubbed...
Name: alig-protal2dna.dat Type: application/octet-stream Size: 12022 bytes Desc: not available URL: -------------- next part -------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: dna-protal2dna.data Type: application/octet-stream Size: 7739 bytes Desc: not available URL: From Kevin.M.Brown at asu.edu Thu Feb 15 21:38:25 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 15 Feb 2007 14:38:25 -0700 Subject: [Bioperl-l] Problem parsing clustalw format ni AlignIO In-Reply-To: References: Message-ID: <1A4207F8295607498283FE9E93B775B402BA7764@EX02.asurite.ad.asu.edu> Did you try Bioperl 1.5.2 to see if updates to it might fix the issue? IIRC 1.4 is nearly 2 years old now. 1.5.2 was released within the last few months. > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Catherine Letondal > Sent: Thursday, February 15, 2007 1:24 PM > To: bioperl-l > Cc: Catherine Letondal; Katja Schuerer > Subject: [Bioperl-l] Problem parsing clustalw format ni AlignIO > > Hi bioperlers, > > I have a script called protal2dna > (http://bioweb.pasteur.fr/seqanal/interfaces/protal2dna.html, > see attachment #1) that realign DNA sequences giving their > sequences + the corresponding protein alignment (sequences > have to be in the same order or named equivalently). We have > a parsing problem reported from the AlignIO class when users > enter some clustalw file (see attachment #2 for an example): > > % protal2dna alig-protal2dna.dat dna-protal2dna.data no > alignment available in 'clustalw' format from file > 'alig-protal2dna.dat' > % > > I have tried with bioperl 1.4. I have looked in the archive > and in the BUGS, but found nothing? > Is there any bug fix for this? I also provide the DNA > sequences file if you want to test. 
> > Thanks a lot in advance, > > -- > Catherine Letondal -- Institut Pasteur > www.pasteur.fr/~letondal > > From cjfields at uiuc.edu Thu Feb 15 21:50:54 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 15:50:54 -0600 Subject: [Bioperl-l] Bio::Root::Utilities.pm In-Reply-To: <8f200b4c0702141020x5815018bvd69873222582d9d5@mail.gmail.com> References: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com> <1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu> <8f200b4c0702141020x5815018bvd69873222582d9d5@mail.gmail.com> Message-ID: On Feb 14, 2007, at 12:20 PM, Steve Chervitz wrote: ... >> >> I don't have a problem with adding it back, esp. if tests are added. >> Everything in Bio::Root* not tied to a module was yanked out when no >> one spoke up about cleaning up Bio::Root* modules: >> >> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12792/ >> focus=12839 >> >> Maybe others disagree? >> >> chris >> > > Sorry I missed out on that thread. I had some trouble with my > bioperl-l > email delivery getting disabled due to excessive bounces, and it > took me a > while to catch it. > > Bio::Root::Utilities is quite a grab bag of miscellaneous general > functions > that are occasionally useful for perl scripting (e.g., determining > end-of-line characters, sending email, etc.). The code could > definitely use > a review, and maybe an example script to advertise it. I can look > into this, > and suggestions are welcome. > > Steve Steve, I have added Root::Utilities back to CVS but I didn't know if I should add back the other related Root modules (didn't know what your future plans were for them). Could the Bio::Root::Global and Bio::Root::Object stuff be consolidated into Bio::Root::Utilities or would that be too problematic? None of the other Bio* modules currently use them. Personally, I use Date::Manip for anything that requires date/time manipulation (updating seq records based on dates, for instance). 
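As an illustration of the kind of small helper Steve describes (determining end-of-line characters), here is a minimal pure-Perl sketch. It is not the Bio::Root::Utilities implementation; the function name `guess_eol` and its return values are assumptions made for this example:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Root::Utilities): report which
# line-ending convention dominates a text buffer.
# Returns 'CRLF' (DOS/Windows), 'LF' (Unix), 'CR' (classic Mac OS) or 'none'.
sub guess_eol {
    my ($text) = @_;

    # Count CRLF pairs first, then lone CRs and lone LFs.
    my $crlf = () = $text =~ /\r\n/g;
    my $cr   = (() = $text =~ /\r/g) - $crlf;
    my $lf   = (() = $text =~ /\n/g) - $crlf;

    my %count = (CRLF => $crlf, LF => $lf, CR => $cr);
    my ($max) = sort { $b <=> $a } values %count;
    return 'none' if $max == 0;

    # On a tie, prefer CRLF, then LF, then CR.
    for my $kind (qw(CRLF LF CR)) {
        return $kind if $count{$kind} == $max;
    }
}

print guess_eol("ACGT\r\nTTGA\r\n"), "\n";   # CRLF
```

Counting CRLF pairs before lone CRs and LFs keeps a Windows file from being double-counted as both CR and LF.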
Some of the other utilities could come in handy, though. Don't know if that helps... chris From cjfields at uiuc.edu Thu Feb 15 21:51:58 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 15:51:58 -0600 Subject: [Bioperl-l] XEMBL deprecation Message-ID: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> I have gone ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService both for deprecation in the wiki and in CVS (though I haven't set any timeline): http://www.bioperl.org/wiki/Deprecated_modules The XEMBL web services are no longer available, and it looks like everything is running through DBFetch now. The XEMBL tests are skipped if no server is detected, so they shouldn't cause any problems with Bioperl installations. Lincoln, was there anything to salvage from these? I noticed they used SOAP::Lite, so maybe we could convert these over to a SOAP-based interface to DBFetch web services? chris From johnsonm at gmail.com Thu Feb 15 22:29:37 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Thu, 15 Feb 2007 16:29:37 -0600 Subject: [Bioperl-l] Bio::Tools::Run::Glimmer? Message-ID: Now that I've got Bio::Tools::Glimmer parsing Glimmer2 and Glimmer3 output, I suppose I might as well go and write Bio::Tools::Run::Glimmer. I suspect another 4-in-1 module may be possible. Now that I think about it, I'll need one for GeneMark, too. Comments? Suggestions on a good module to use as a template? From hlapp at gmx.net Fri Feb 16 01:18:56 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 15 Feb 2007 20:18:56 -0500 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> Message-ID: <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net> On Feb 15, 2007, at 4:51 PM, Chris Fields wrote: > The XEMBL web services are no longer available What happens if someone invokes the module? Should it maybe return nothing and warn()? 
I don't think it's a good idea if the module just silently does not function because its backend is no more. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Fri Feb 16 01:48:12 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 19:48:12 -0600 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net> References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net> Message-ID: <00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu> On Feb 15, 2007, at 7:18 PM, Hilmar Lapp wrote: > On Feb 15, 2007, at 4:51 PM, Chris Fields wrote: > >> The XEMBL web services are no longer available > > What happens if someone invokes the module? Should it maybe return > nothing and warn()? I don't think it's a good idea if the module > just silently does not function because its backend is no more. > > -hilmar Yes, I thought the same. I have added a warn() noting the deprecation to the XEMBL constructor and removed XEMBL tests from CVS. The modules are still there for the time being. I actually worry more about the internals; it would be a shame to toss them altogether. Would it be worth it to shift this towards a SOAP-based interface to DBFetch? Or, more precisely, how much trouble would it be to do so? chris From hlapp at gmx.net Fri Feb 16 01:54:29 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 15 Feb 2007 20:54:29 -0500 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: <00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu> References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net> <00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu> Message-ID: Well, if dbFetch doesn't have a SOAP-based interface, how would you want to do this?
-hilmar On Feb 15, 2007, at 8:48 PM, Chris Fields wrote: > On Feb 15, 2007, at 7:18 PM, Hilmar Lapp wrote: > >> On Feb 15, 2007, at 4:51 PM, Chris Fields wrote: >> >>> The XEMBL web services are no longer available >> >> What happens if someone invokes the module? Should it maybe return >> nothing and warn()? I don't think it's a good idea if the module >> just silently does not function because its backend is no more. >> >> -hilmar > > Yes, I thought the same. I have added a warn() noting the > deprecation to the XEMBL constructor and removed XEMBL tests from > CVS. The modules are still there for the time being. > > I actually worry more about the internals; it would be a shame to > toss them altogether. Would it be worth it to shift this towards a > SOAP-based interface to DBFetch? Or, more precisely, how much > trouble would it be to do so? > > chris -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Fri Feb 16 01:59:46 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Feb 2007 19:59:46 -0600 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net> <00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu> Message-ID: <8C7E18C6-B38D-4E15-BE9C-84256B09C312@uiuc.edu> On Feb 15, 2007, at 7:54 PM, Hilmar Lapp wrote: > Well, if dbFetch dosn't have a SOAP based interface, how would you > want to do this? > > -hilmar DBfetch has a SOAP-based interface: http://www.ebi.ac.uk/Tools/webservices/services/dbfetch Just not sure how easy it would be to switch XEMBL code over to using it. We already have Bio::DB::DBFetch so it may be redundant, but I don't recall any other SOAP-based tools in BioPerl beyond some stuff in bioperl-run (and I'm not sure how up-to-date the DBFetch module is). 
chris From jimhu at tamu.edu Fri Feb 16 05:20:09 2007 From: jimhu at tamu.edu (Jim Hu) Date: Thu, 15 Feb 2007 23:20:09 -0600 Subject: [Bioperl-l] Pathway tools output parser In-Reply-To: References: Message-ID: <1632E2BF-4402-47DE-B750-9763E02711D2@tamu.edu> Hi Chris, I need to check the list more often! I never got an answer here, but Eric Just pointed out a perl api at TAIR that's linked from the BioCyc site. I've used the lisp parser functions from that to move the data to a perl array of arrays, and I'm working on creating object classes for BioCyc objects, starting with genes and products. I need to look at the appropriate ways to link this up to the existing codebase for interconverting to Chado and other BioPerl data types. Jim ===================================== Jim Hu Associate Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 On Feb 7, 2007, at 12:07 AM, Chris Mungall wrote: > > Hi Jim > > Did you ever get an answer to this? I'm interested in storing > pathway data > in Chado & I remember enough lisp to get it into something perl- > manageable > like XML > > On Thu, 25 Jan 2007, Jim Hu wrote: > >> Is there a module to parse the lisp object files from Peter Karp's >> Pathway Tools? I need a parser to convert the gene and protein >> objects in EcoCyc releases into something that can be imported into >> Chado. >> ===================================== >> Jim Hu >> Associate Professor >> Dept. of Biochemistry and Biophysics >> 2128 TAMU >> Texas A&M Univ. 
>> College Station, TX 77843-2128 >> 979-862-4054 >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > From lstein at cshl.edu Fri Feb 16 13:35:19 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 16 Feb 2007 08:35:19 -0500 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts In-Reply-To: <45D1E2A5.6060104@sendu.me.uk> References: <45D1E2A5.6060104@sendu.me.uk> Message-ID: <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com> Hi, Older versions of Storable can't deal with features that contain subroutine refs. You should get the current version from CPAN. Note that there is a slight security problem here if you don't trust the objects stored in the database. If they contain code refs, the code will be evaluated during deserialization. Lincoln On 2/13/07, Sendu Bala wrote: > > I have some raw sequences in a Bio::DB::SeqFeature::Store mysql database > and wanted to associated some basic information with them, like exon > positions. I thought of creating Bio::SeqFeature::Gene::Transcript > objects and storing them so I could later use features() to see what > other features overlapped exons. 
I ran into a fatal error that can be > replicated with the following simplified one-liner: > > perl -MBio::DB::SeqFeature::Store -MBio::SeqFeature::Gene::Transcript -e > '$db = Bio::DB::SeqFeature::Store->new(-adaptor => "DBI::mysql", -dsn => > "dbi:mysql:test"); $trans = > Bio::SeqFeature::Gene::Transcript->new(-start => 1, -end => 2, -seq_id > => "test"); $db->store($trans); @trans = $db->features(-seqid => $id, > -type => "transcript"); print "@trans\n";' > > code sub { > package Bio::SeqFeature::Generic; > use strict 'refs'; > my $self = shift @_; > foreach my $f (@{[] unless $$self{'_gsf_sub_array'};}) { > $f = undef; > } > $$self{'_gsf_seq'} = undef; > foreach my $t (keys %{{} unless $$self{'_gsf_tag_hash'};}) { > $$self{'_gsf_tag_hash'}{$t} = undef; > delete $$self{'_gsf_tag_hash'}{$t}; > } > } did not evaluate to a subroutine reference, at > /.../Bio/DB/SeqFeature/Store.pm line 2280 > > > Is this a bug? Or am I taking the wrong approach? > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Fri Feb 16 13:47:29 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 16 Feb 2007 08:47:29 -0500 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts In-Reply-To: <45D5B42A.1080303@sendu.me.uk> References: <45D1E2A5.6060104@sendu.me.uk> <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com> <45D5B42A.1080303@sendu.me.uk> Message-ID: <6dce9a0b0702160547s5873cd2bg2c5cf09779138249@mail.gmail.com> Hi Sendu, I'll do a little digging and let you know. 
Lincoln On 2/16/07, Sendu Bala wrote: > > Lincoln Stein wrote: > > Hi, > > > > Older versions of Storable can't deal with features that contain > > subroutine refs. You should get the current version from CPAN. > > Do you have any idea which version of Storable first supported this? I > can specify that version in Bioperl's Build.PL. > > (else I just just specify the latest version) > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Fri Feb 16 13:52:30 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 16 Feb 2007 08:52:30 -0500 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts In-Reply-To: <45D5B42A.1080303@sendu.me.uk> References: <45D1E2A5.6060104@sendu.me.uk> <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com> <45D5B42A.1080303@sendu.me.uk> Message-ID: <6dce9a0b0702160552l11417675kec7b5b4537cbfa70@mail.gmail.com> It looks like 2.05 or higher is the Storable version to use. It requires B::Deparse, which is (I think) standard on perl 5.6 or higher. Lincoln On 2/16/07, Sendu Bala wrote: > > Lincoln Stein wrote: > > Hi, > > > > Older versions of Storable can't deal with features that contain > > subroutine refs. You should get the current version from CPAN. > > Do you have any idea which version of Storable first supported this? I > can specify that version in Bioperl's Build.PL. > > (else I just just specify the latest version) > -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Fri Feb 16 13:55:06 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 16 Feb 2007 08:55:06 -0500 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> Message-ID: <6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com> I like the idea of converting these over to use DBFetch's SOAP services. On the other hand, it isn't likely that I'm going to have time to do this anytime soon. Probably the best thing to do is to issue a warning and return undef if someone tries to use the XEMBL module. I'll make that change. Lincoln On 2/15/07, Chris Fields wrote: > > I have gone ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService > both for deprecation in the wiki and in CVS (though I haven't set any > timeline): > > http://www.bioperl.org/wiki/Deprecated_modules > > The XEMBL web services are no longer available, and it looks like > everything is running through DBFetch now. The XEMBL tests are > skipped if no server is detected, so they shouldn't cause any > problems with Bioperl installations. > > Lincoln, was there anything to salvage from these? I noticed they > used SOAP::Lite, so maybe we could convert these over to a SOAP-based > interface to DBFetch web services? > > chris > -- Lincoln D.
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Fri Feb 16 13:55:47 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Fri, 16 Feb 2007 08:55:47 -0500 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: <6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com> References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> <6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com> Message-ID: <6dce9a0b0702160555s5b4d7d70n936c603bc79dd40d@mail.gmail.com> Oh, looks like someone has inserted the warnings already. Good. Lincoln On 2/16/07, Lincoln Stein wrote: > > I like the idea of converting these over to use DBFetch's SOAP services. > On the other hand, it isn't llikely that I'm going to have time to do this > anytime soon. > > Probably the best thing to do is to issue a warning and return undef if > someone tries to use othe XEMBL module. I'll make that change. > > Lincoln > > On 2/15/07, Chris Fields wrote: > > > > I have gone ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService > > both for deprecation in the wiki and in CVS (though I haven't set any > > timeline): > > > > http://www.bioperl.org/wiki/Deprecated_modules > > > > The XEMBL web services are no longer available, and it looks like > > everything is running through DBFetch now. The XEMBL tests are > > skipped if no server is detected, so they shouldn't cause any > > problems with Bioperl installations. > > > > Lincoln, was there anything to salvage from these? I noticed they > > used SOAP::Lite, so maybe we could convert these over to a SOAP-based > > interface to DBFetch web services? > > > > chris > > > > > > -- > Lincoln D. 
Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From bix at sendu.me.uk Fri Feb 16 13:56:50 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 16 Feb 2007 13:56:50 +0000 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts In-Reply-To: <6dce9a0b0702160552l11417675kec7b5b4537cbfa70@mail.gmail.com> References: <45D1E2A5.6060104@sendu.me.uk> <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com> <45D5B42A.1080303@sendu.me.uk> <6dce9a0b0702160552l11417675kec7b5b4537cbfa70@mail.gmail.com> Message-ID: <45D5B822.6080908@sendu.me.uk> Lincoln Stein wrote: > It looks like 2.05 or higher is the Storable version to use. It requires > B::Deparse, which is (I think) standard on perl 5.6 or higher. Thanks, now recommended in Build.PL From cjfields at uiuc.edu Fri Feb 16 14:05:08 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 16 Feb 2007 08:05:08 -0600 Subject: [Bioperl-l] XEMBL deprecation In-Reply-To: <6dce9a0b0702160555s5b4d7d70n936c603bc79dd40d@mail.gmail.com> References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu> <6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com> <6dce9a0b0702160555s5b4d7d70n936c603bc79dd40d@mail.gmail.com> Message-ID: I added the warning yesterday. We can add something to the project priority list on modifying XEMBL to use DBFetch instead; I like the SOAP-based interface. I am thinking of a similar interface for NCBI eutils but I haven't had time to work on it. 
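Lincoln's plan above (warn and return undef whenever the deprecated module is used) could look roughly like the following minimal sketch. The package name `Hypothetical::DeprecatedFetcher` and its `new`/`get_Seq_by_id` methods are illustrative assumptions, not the actual Bio::DB::XEMBL code; the sketch uses only core Perl and `Carp`.

```perl
package Hypothetical::DeprecatedFetcher;

# Hypothetical stand-in for a deprecated web-service client; this is NOT the
# real Bio::DB::XEMBL source. Every public entry point warns and returns undef.
use strict;
use warnings;
use Carp qw(carp);

my $MESSAGE = 'Hypothetical::DeprecatedFetcher is deprecated: its remote '
            . 'service is no longer available; use a DBFetch-based module instead';

sub new {
    my ($class, @args) = @_;
    carp $MESSAGE;    # warning is reported at the caller's line
    return;           # undef in scalar context, empty list in list context
}

sub get_Seq_by_id {
    my ($self, $id) = @_;
    carp $MESSAGE;
    return;
}

package main;

my $db = Hypothetical::DeprecatedFetcher->new(-server => 'unused');
print defined $db ? "got an object\n" : "no object: module is deprecated\n";
```

Using a bare `return;` rather than `return undef;` keeps list-context callers from receiving a one-element list containing undef.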
chris On Feb 16, 2007, at 7:55 AM, Lincoln Stein wrote: > Oh, looks like someone has inserted the warnings already. Good. > > Lincoln > > On 2/16/07, Lincoln Stein wrote: > I like the idea of converting these over to use DBFetch's SOAP services. > ... Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Fri Feb 16 13:39:54 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 16 Feb 2007 13:39:54 +0000 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts In-Reply-To: <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com> References: <45D1E2A5.6060104@sendu.me.uk> <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com> Message-ID: <45D5B42A.1080303@sendu.me.uk> Lincoln Stein wrote: > Hi, > > Older versions of Storable can't deal with features that contain > subroutine refs. You should get the current version from CPAN. Do you have any idea which version of Storable first supported this? I can specify that version in Bioperl's Build.PL (otherwise I'll just specify the latest version). From eu at otelo-online.de Sat Feb 17 12:55:08 2007 From: eu at otelo-online.de (eu at otelo-online.de) Date: Sat, 17 Feb 2007 13:55:08 +0100 (CET) Subject: [Bioperl-l] Bioperl Module OddCodes(help) Message-ID: <29037001.1171716908969.JavaMail.ngmail@webmail18> Hello all, I want to translate a sequence in FASTA format into only acidic, basic and polar classes, depending on the pH. The OddCodes module can only do acidic, basic, polar and hydrophobic, and I think only at the default pH. Can somebody help me? I don't know whether it is possible, because I need a positive charge, negative charge or uncharged state for each amino acid. thx From The_Polymorph at rocketmail.com Sun Feb 18 19:08:34 2007 From: The_Polymorph at rocketmail.com (Caitlin) Date: Sun, 18 Feb 2007 11:08:34 -0800 (PST) Subject: [Bioperl-l] Missing ppm for Bioperl 1.5.2_100(?) Message-ID: <148421.50501.qm@web50801.mail.yahoo.com> Hi.
In an attempt to upgrade my Bioperl install from 1.5.2 RC5 to 1.5.2_100, I noticed the ppm was not found on the activestate repositories. Thanks, ~Caitlin From bix at sendu.me.uk Sun Feb 18 20:36:03 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 18 Feb 2007 20:36:03 +0000 Subject: [Bioperl-l] Missing ppm for Bioperl 1.5.2_100(?) In-Reply-To: <148421.50501.qm@web50801.mail.yahoo.com> References: <148421.50501.qm@web50801.mail.yahoo.com> Message-ID: <45D8B8B3.4000408@sendu.me.uk> Caitlin wrote: > Hi. > > In an attempt to upgrade my Bioperl install from 1.5.2 RC5 to > 1.5.2_100, I noticed the ppm was not found on the activestate > repositories. Follow the install instructions: http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows It's not in the normal ActiveState repository, but on bioperl.org. From t.nugent at cs.ucl.ac.uk Mon Feb 19 17:29:48 2007 From: t.nugent at cs.ucl.ac.uk (Tim Nugent) Date: Mon, 19 Feb 2007 17:29:48 +0000 Subject: [Bioperl-l] Module to draw transmembrane protein toplogoy Message-ID: <45D9DE8C.2010301@cs.ucl.ac.uk> Hi everyone, I've written a Perl module to display transmembrane protein topology using GD. There are various options, including labels, helix/loop dimensions, colour schemes etc., but it only requires a string or array containing the protein topology (e.g. transmembrane helix start/stop points). It produces output like this: http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png using the code at the bottom. Here is the module: http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm I've never submitted anything to Bioperl before - is this sort of thing likely to be of use to others? I imagine it would sit alongside some of the Bio::Graphics stuff.
Best wishes, Tim

#!/usr/bin/perl

use strict;
use warnings;
use lib '/scratch0/NOT_BACKED_UP/tnugent/perl5lib';   ## path to module
use DrawTransmembrane;

## transmembrane helix start/stop positions
my @topology = (20,45,59,70,86,109,145,168,194,220);

## residue position => label text
my %labels = ('5'   => '5 - Sulphation Site',
              '21'  => '1st Helix',
              '47'  => '40 - Mutation',
              '60'  => 'Voltage Sensor',
              '72'  => '72 - Mutation 2',
              '73'  => '73 - Mutation 3',
              '138' => '138 - Glycosylation Site',
              '170' => '170 - Phosphorylation Site',
              '200' => 'Last Helix');

my $im = DrawTransmembrane->draw_transmembrane(
    -title            => 'This is a cartoon displaying transmembrane helices.',
    -topology         => \@topology,
    -n_terminal       => 'out',
    -helix_width      => 48,
    -helix_height     => 125,
    -short_loop_limit => 10,
    -long_loop_limit  => 35,
    -loop_width       => 25,
    -colour_scheme    => 'yellow',
    -labels           => \%labels,
    -text_offset      => -10);

## write the .png file
my $output = 'test.png';
open(my $out_fh, '>', $output) or die "Cannot write $output: $!";
binmode $out_fh;
print {$out_fh} $im->png;
close $out_fh or die "Cannot close $output: $!";

## view the image with ImageMagick's display, if it is installed
system('display', $output) == 0 or warn "Could not run display on $output\n";

-- Tim Nugent (MRes) Research Student Bioinformatics Unit Department of Computer Science University College London Gower Street London WC1E 6BT Tel: 020-7679-0410 t.nugent at ucl.ac.uk From bix at sendu.me.uk Mon Feb 19 17:42:23 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 19 Feb 2007 17:42:23 +0000 Subject: [Bioperl-l] t/FeatureHolder.x Message-ID: <45D9E17F.4030302@sendu.me.uk> Is this supposed to work? It doesn't get run in the test suite normally because of its name.
With a live checkout I get:

./Build test --test_files t/FeatureHolder.x --verbose
t/FeatureHolder....1..6
ok 1
ok 2
Set group tag to: locus_tag
GROUPS:
GROUP [?]:source
[snip]
resolved pair Bio::SeqFeature::Generic=HASH(0x1375dc0) Bio::SeqFeature::Generic=HASH(0x1362830)
UNFLATTENING GROUP: GROUP [?]:gene
UNFLATTENING GROUP: GROUP [?]:repeat_region
UNFLATTENING GROUP: GROUP [?]:gene
UNFLATTENING GROUP: GROUP [?]:repeat_region
UNFLATTENING GROUP: GROUP [BG:DS07721.3]:gene mRNA CDS
UNFLATTENING GROUP: GROUP [BG:DS07721.6]:gene mRNA CDS
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: DUPLICATE ID: AAF53399.1
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/sendu/src/bioperl/core/blib/lib/Bio/Root/Root.pm:359
STACK: Bio::SeqFeature::Tools::IDHandler::create_hierarchy_from_ParentIDs /home/sendu/src/bioperl/core/blib/lib/Bio/SeqFeature/Tools/IDHandler.pm:175
STACK: Bio::FeatureHolderI::create_hierarchy_from_ParentIDs /home/sendu/src/bioperl/core/blib/lib/Bio/FeatureHolderI.pm:245
STACK: t/FeatureHolder.x:68
-----------------------------------------------------------
dubious
        Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 3-6
        Failed 4/6 tests, 33.33% okay
Failed Test       Stat Wstat Total Fail  List of Failed
-------------------------------------------------------------------------------
t/FeatureHolder.x  255 65280     6    8  3-6
Failed 1/1 test scripts. 4/6 subtests failed.
Files=1, Tests=6,  1 wallclock secs ( 0.55 cusr +  0.04 csys =  0.59 CPU)
Failed 1/1 test programs. 4/6 subtests failed.

It also fails quite differently with 1.5.2. From cjfields at uiuc.edu Mon Feb 19 20:04:20 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 19 Feb 2007 14:04:20 -0600 Subject: [Bioperl-l] t/FeatureHolder.x In-Reply-To: <45D9E17F.4030302@sendu.me.uk> References: <45D9E17F.4030302@sendu.me.uk> Message-ID: <534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu> Looks like that's some of Chris Mungall's stuff for GFF3. Don't know if he's stalking the mail list.
Wonder if this has anything to do with the feature/annotation changes around rel 1.5. (the other) chris On Feb 19, 2007, at 11:42 AM, Sendu Bala wrote: > Is this supposed to work? It doesn't get run in the test suite > normally > because of its name. > > With a live checkout I get: > ./Build test --test_files t/FeatureHolder.x --verbose > t/FeatureHolder....1..6 ... From cjfields at uiuc.edu Mon Feb 19 21:24:04 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 19 Feb 2007 15:24:04 -0600 Subject: [Bioperl-l] Module to draw transmembrane protein toplogoy In-Reply-To: <45D9DE8C.2010301@cs.ucl.ac.uk> References: <45D9DE8C.2010301@cs.ucl.ac.uk> Message-ID: <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> I think this is pretty nice! We can add the code and test script to bugzilla and (if someone has time) try to see where it might fit in, though Bio::Graphics sounds like a good spot. Anyone else have ideas on where this could go? chris On Feb 19, 2007, at 11:29 AM, Tim Nugent wrote: > Hi everyone, > > I've written a perl module to display transmembrane protein topology > using GD. ...
From cjm at fruitfly.org Mon Feb 19 22:23:56 2007 From: cjm at fruitfly.org (Chris Mungall) Date: Mon, 19 Feb 2007 14:23:56 -0800 Subject: [Bioperl-l] t/FeatureHolder.x In-Reply-To: <534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu> References: <45D9E17F.4030302@sendu.me.uk> <534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu> Message-ID: On Feb 19, 2007, at 12:04 PM, Chris Fields wrote: > Looks like that's some of Chris Mungall's stuff for GFF3. Don't know > if he's stalking the mail list. occasionally.. > Wonder if this has anything to do the feature/annotation changes > around rel 1.5. possibly even before then. there was a reason for the .x prefix... I think it was intended to denote requirements; tests that don't pass yet but should in the future anyway, this file can go > (the other) chris > > On Feb 19, 2007, at 11:42 AM, Sendu Bala wrote: > >> Is this supposed to work? It doesn't get run in the test suite >> normally >> because of its name. >> >> With a live checkout I get: >> ./Build test --test_files t/FeatureHolder.x --verbose >> t/FeatureHolder....1..6 > ... From torsten.seemann at infotech.monash.edu.au Mon Feb 19 23:20:48 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 20 Feb 2007 10:20:48 +1100 Subject: [Bioperl-l] Bioperl Module OddCodes(help) In-Reply-To: <29037001.1171716908969.JavaMail.ngmail@webmail18> References: <29037001.1171716908969.JavaMail.ngmail@webmail18> Message-ID: > i want translate a Sequence in Fasta Format only to acidic,basic and polar dependent on the pH. > OddCodes Module can ony to acidic,basic, polar and hydrophobic. And i think on default pH. > Can somebody help me? I dont know whether it is possible?
> Because i need for each amino acid a positive, negative charge and unchargedly. The latest released Bioperl 1.5.x has a charge() function which does what you want: http://doc.bioperl.org/releases/bioperl-1.5.2/Bio/Tools/OddCodes.html It returns A, N, C for the charges. --Torsten From bix at sendu.me.uk Tue Feb 20 11:18:14 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 20 Feb 2007 11:18:14 +0000 Subject: [Bioperl-l] Bio::Graphics::FeatureBase seq_id question Message-ID: <45DAD8F6.1030409@sendu.me.uk> Bio::Graphics::FeatureBase::seq_id is currently implemented as a read-only alias to ref(): sub seq_id { shift->ref() } What is the reasoning behind this? Can it be made to handle setting of the value as well?: sub seq_id { shift->ref(@_) } Cheers, Sendu. From cjfields at uiuc.edu Tue Feb 20 13:39:11 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Feb 2007 07:39:11 -0600 Subject: [Bioperl-l] t/FeatureHolder.x In-Reply-To: References: <45D9E17F.4030302@sendu.me.uk> <534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu> Message-ID: <67E26F10-67D5-405E-A00E-826EF51C476F@uiuc.edu> On Feb 19, 2007, at 4:23 PM, Chris Mungall wrote: > On Feb 19, 2007, at 12:04 PM, Chris Fields wrote: > >> Looks like that's some of Chris Mungall's stuff for GFF3. Don't know >> if he's stalking the mail list. > > occasionally.. > >> Wonder if this has anything to do the feature/annotation changes >> around rel 1.5. > > possibly even before then. > > there was a reason for the .x prefix... I think it was intended to > denote requirements; tests that don't pass yet but should in the > future > > anyway, this file can go Chris, I removed it from CVS. Thanks! (the other) chris besides chris D. P.S. I may have some Data::Stag questions for you at some point. I'm guessing you're still at fruitfly.org? 
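Torsten's answer covers the fixed charge alphabet of `charge()`; the pH dependence asked about in the OddCodes question could be approximated by comparing the pH with each ionizable side chain's pKa. The following is a standalone sketch, not part of Bio::Tools::OddCodes. The pKa values are approximate textbook figures (an assumption; they vary between sources and with protein context), "charged when more than half ionized" is a simplification, and only D, E, H, K and R are treated as ionizable (Cys, Tyr and the chain termini are ignored). The A/C/N letters follow Torsten's description, assuming A means negative and C means positive.

```perl
use strict;
use warnings;

# Approximate side-chain pKa values. ASSUMPTION: typical textbook figures;
# published values differ between sources and shift inside folded proteins.
my %ACIDIC_PKA = (D => 3.9, E => 4.1);
my %BASIC_PKA  = (H => 6.0, K => 10.5, R => 12.5);

# Classify one residue at a given pH:
#   A = negatively charged, C = positively charged, N = neutral/uncharged.
# An acidic side chain counts as charged when pH > pKa (mostly deprotonated);
# a basic side chain counts as charged when pH < pKa (mostly protonated).
sub classify_charge {
    my ($residue, $ph) = @_;
    $residue = uc $residue;
    if (exists $ACIDIC_PKA{$residue}) {
        return $ph > $ACIDIC_PKA{$residue} ? 'A' : 'N';
    }
    if (exists $BASIC_PKA{$residue}) {
        return $ph < $BASIC_PKA{$residue} ? 'C' : 'N';
    }
    return 'N';    # all other residues are treated as uncharged
}

# Translate a protein sequence string into the A/C/N alphabet at a given pH.
sub charge_string {
    my ($seq, $ph) = @_;
    return join '', map { classify_charge($_, $ph) } split //, $seq;
}

my $protein = 'MKDEHRLY';
for my $ph (3.0, 7.0, 11.0) {
    print "pH $ph: ", charge_string($protein, $ph), "\n";
}
```

For FASTA input, the string passed to `charge_string` could come from `$seq->seq` on each record returned by a `Bio::SeqIO->new(-file => $file, -format => 'fasta')` stream.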
From cjfields at uiuc.edu Tue Feb 20 13:29:20 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Feb 2007 07:29:20 -0600 Subject: [Bioperl-l] Fwd: help on remote blast References: <20070220073200.M42567@bic.boseinst.ernet.in> Message-ID: <6CC54E14-0581-45AF-8F12-E500A2FFDE86@uiuc.edu> Sanjib, You shouldn't email the developers directly. Questions like this should go to the bioperl mail list in case I (or others) can't answer them immediately. chris Begin forwarded message: > From: "Sanjib Kumar Gupta" > Date: February 20, 2007 1:32:00 AM CST > To: cjfields at uiuc.edu > Subject: help on remote blast > > Dear Dr. Chris > I am a very new user of bioperl and have been using the script for > retrieving some blast sequences. But suddenly it has stopped > retrieving > #perl n9.pl > te.pep > waiting........ > for a long time > > I am attaching the file. Can you please tell me what I should do so > that it > again runs. > > > -- > Sanjib Kumar Gupta > Bioinformatics Centre > Bose Institute > Kolkata 700054, INDIA > Phone : +91-33-2355 6626, 2816, 2355 4766 > Fax : +91-33-2355 3886 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: n9.pl URL: -------------- next part --------------
From t.nugent at cs.ucl.ac.uk Tue Feb 20 14:31:20 2007 From: t.nugent at cs.ucl.ac.uk (Tim Nugent) Date: Tue, 20 Feb 2007 14:31:20 +0000 Subject: [Bioperl-l] Module to draw transmembrane protein topology In-Reply-To: <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> References: <45D9DE8C.2010301@cs.ucl.ac.uk> <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> Message-ID: <45DB0638.1030001@cs.ucl.ac.uk> Thanks Chris, glad it's appreciated. Is there anything else I can do? If anyone has any requests/suggestions please let me know too. Best wishes, Tim Chris Fields wrote: > I think this is pretty nice! We can add the code and test script to > bugzilla and (if someone has time) try to see where it might fit in, > though Bio::Graphics sounds like a good spot. > > Anyone else have ideas on where this could go? > > chris > > On Feb 19, 2007, at 11:29 AM, Tim Nugent wrote: > >> Hi everyone, >> >> I've written a perl module to display transmembrane protein topology >> using GD. ...
From marian.thieme at lycos.de Tue Feb 20 13:34:24 2007 From: marian.thieme at lycos.de (marian thieme) Date: Tue, 20 Feb 2007 13:34:24 +0000 Subject: [Bioperl-l] Alignment Message-ID: <188661178021328@lycos-europe.com> Hi all, perhaps somebody can give some comments on the following matter: I have a series of sequences which should be aligned against a reference sequence. In this special case we don't need to calculate anything; we only need to represent the sequences and get, for instance, some columns of interest. The problem now is that some sequences have gaps, and we need to represent gaps in the reference sequence as well as in some individual sequences. Question: Can I use LocatableSeq to describe sequences with gaps and to add the sequence to the alignment? If yes, how should I understand the example in the doc: use Bio::LocatableSeq; my $seq = new Bio::LocatableSeq(-seq => "CAGT-GGT",-id => "seq1", -start => 1,-end => 7); Does the "-" sign represent a gap? If this sequence starts at position 1, why does it end at position 7, when there are 8 positions counting the gap? Can the SimpleAlign object handle the gap? Thanks for your attention, Marian
From cjfields at uiuc.edu Tue Feb 20 14:40:38 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Feb 2007 08:40:38 -0600 Subject: [Bioperl-l] Module to draw transmembrane protein topology In-Reply-To: <45DB0638.1030001@cs.ucl.ac.uk> References: <45D9DE8C.2010301@cs.ucl.ac.uk> <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> <45DB0638.1030001@cs.ucl.ac.uk> Message-ID: You can add the module and test code (the script) to bugzilla: http://www.bioperl.org/wiki/Bugs http://bugzilla.open-bio.org/ Basically, file a new bug report, but note that it is an enhancement request when filling it out. Attach the code and test script to the report after it is generated (note that it may be easier to add all of the files together as a zipped archive). I think you could also add the graphical output as a binary file if they are huge files. chris On Feb 20, 2007, at 8:31 AM, Tim Nugent wrote: > Thanks Chris, glad it's appreciated. > > Is there anything else I can do? If anyone has any requests/ > suggestions please let me know too. > > Best wishes, > > Tim > > Chris Fields wrote: >> I think this is pretty nice! We can add the code and test script >> to bugzilla and (if someone has time) try to see where it might >> fit in, though Bio::Graphics sounds like a good spot. >> Anyone else have ideas on where this could go? >> chris >> On Feb 19, 2007, at 11:29 AM, Tim Nugent wrote: >>> Hi everyone, >>> ...
From avilella at gmail.com Tue Feb 20 15:30:17 2007 From: avilella at gmail.com (Albert Vilella) Date: Tue, 20 Feb 2007 15:30:17 +0000 Subject: [Bioperl-l] Alignment In-Reply-To: <188661178021328@lycos-europe.com> References: <188661178021328@lycos-europe.com> Message-ID: <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com> I think the SimpleAlign object contains a set of sequences, each of which is a LocatableSeq object. These LocatableSeq objects will have gaps, represented by '-' or whatever other symbol is specified (I think there are methods for it), and then one can use methods like column_from_residue_number to map the coordinates between the primary sequence and the aligned sequence. The perldoc for LocatableSeq has some examples on how to use these methods. [Hopefully I haven't written any lie in this message], Cheers, Albert. On 2/20/07, marian thieme wrote: > Hi all, > > perhaps somebody can give some comments in the following matter: > > I have a series of sequences which should be aligned against a reference sequence. > In this special case we dont need to calculate anything, we only need to represent the sequences and get for instance some columns of interest. > The problem now is, that some sequences have gaps and we need to represent gaps in the rewference sequence as well as in some individual sequences.
> > Question: Can I use LocatableSeq to describe sequences with gaps and to add the sequence to the alignment ? > If yes how I have to understand the example in the doc: > use Bio::LocatableSeq; > my $seq = new Bio::LocatableSeq(-seq => "CAGT-GGT",-id => "seq1", -start => 1,-end => 7); > > Does the "-" sign represents a gap ? When this sequence starts at position 1 > why it ends at position 7, because when considering the gap, there are 8 positions. > Does the SimpleAlign object can treat the gap ? > > > Thanks for your attention, > Marian > From cjfields at uiuc.edu Tue Feb 20 15:30:15 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Feb 2007 09:30:15 -0600 Subject: [Bioperl-l] Module to draw transmembrane protein topology In-Reply-To: <45DB0638.1030001@cs.ucl.ac.uk> References: <45D9DE8C.2010301@cs.ucl.ac.uk> <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> <45DB0638.1030001@cs.ucl.ac.uk> Message-ID: <4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu> Sorry, I sent that last one off prematurely. I could see this being used as a very useful utility if a Bioperl object had SeqFeatures which described transmembrane regions, or if output from something like TMHMM were parsed and used for input. Don't know if it's included, but if not you probably should allow labeling of the intracellular/extracellular space to designate periplasmic space, mitochondrial matrix, thylakoid, etc. I think Bio::Graphics namespace is definitely the place to go. If I ever get around to writing up the RNA structural stuff I may put something there myself. chris On Feb 20, 2007, at 8:31 AM, Tim Nugent wrote: > Thanks Chris, glad it's appreciated. > > Is there anything else I can do?
If anyone has any requests/suggestions > please let me know too. > > Best wishes, > > Tim From cjfields at uiuc.edu Tue Feb 20 15:49:56 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Feb 2007 09:49:56 -0600 Subject: [Bioperl-l] Alignment In-Reply-To: <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com> References: <188661178021328@lycos-europe.com> <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com> Message-ID: <97E36074-1CF4-4348-85AB-DF23F1048727@uiuc.edu> On Feb 20, 2007, at 9:30 AM, Albert Vilella wrote: > I think the SimpleAlign object contains a set of sequences, each of > which is a LocatableSeq object. > > These LocatableSeq objects will have gaps, represented by '-' or > whatever other symbol is specified (I think there are methods for it), > and then one can use methods like column_from_residue_number to map > the coordinates between the primary sequence and the aligned sequence. > The perldoc for LocatableSeq has some examples on how to use these > methods. > > [Hopefully I haven't written any lie in this message], > > Cheers, > > Albert. No lies. The comparison methods are in SimpleAlign; if you look in SimpleAlign.t you'll see several demos of how to go about adding LocatableSeqs to a SimpleAlign object and then using SimpleAlign methods on them. chris PS (to marian): I'm a bit behind this week, so the bracket_strings stuff is lagging; I'm writing up some stuff on a deadline.
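Marian asked about coordinates in the documentation example. `-end => 7` works because the gap character is not counted as a residue. "CAGT-GGT" therefore holds 7 residues spread over 8 alignment columns.

The plain-Perl sketch below makes that arithmetic explicit using two hypothetical helpers, `ungapped_end` and `column_for_residue`. They are not BioPerl methods. They only illustrate the kind of mapping that `column_from_residue_number` in Bio::LocatableSeq is described as doing in this thread. Treating '-' and '.' as the only gap characters is an assumption of the sketch.

```perl
use strict;
use warnings;

# Assumption: '-' and '.' are the only gap characters.
my $GAP = qr/[-.]/;

# Hypothetical helper: end coordinate of a gapped sequence.
# Gaps occupy alignment columns but are not residues, so
# end = start + (number of non-gap characters) - 1.
sub ungapped_end {
    my ($aligned, $start) = @_;
    (my $residues = $aligned) =~ s/$GAP//g;
    return $start + length($residues) - 1;
}

# Hypothetical helper: 1-based alignment column holding residue $resnum,
# or undef if that residue is not part of this sequence.
sub column_for_residue {
    my ($aligned, $start, $resnum) = @_;
    my $current = $start - 1;                # last residue number seen so far
    my @chars   = split //, $aligned;
    for my $col (1 .. @chars) {
        next if $chars[$col - 1] =~ $GAP;    # gap: column advances, residue does not
        $current++;
        return $col if $current == $resnum;
    }
    return;
}

my $aligned = 'CAGT-GGT';                    # example string from the question
my $end     = ungapped_end($aligned, 1);
my $col     = column_for_residue($aligned, 1, 5);
print "end=$end column=$col\n";              # end=7 column=6
```

Residue 5 (the first G after the gap) sits in column 6 because the gap shifts every later residue one column to the right. Inside an alignment, the BioPerl objects are meant to do this bookkeeping for you. For real use, see SimpleAlign.t as Chris suggests.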
From t.nugent at cs.ucl.ac.uk Tue Feb 20 15:50:10 2007 From: t.nugent at cs.ucl.ac.uk (Tim Nugent) Date: Tue, 20 Feb 2007 15:50:10 +0000 Subject: [Bioperl-l] Module to draw transmembrane protein topology In-Reply-To: <4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu> References: <45D9DE8C.2010301@cs.ucl.ac.uk> <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> <45DB0638.1030001@cs.ucl.ac.uk> <4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu> Message-ID: <45DB18B2.8070004@cs.ucl.ac.uk> Labeling of inside/outside and membrane is already possible via -inside_label, -outside_label and -membrane_label tags, defaults are intracellular, extracellular and plasma membrane. Was definitely going to add an input/parser for MEMSAT, developed here at UCL, and probably a few other popular TM predictors too, e.g. PHOBIUS, TMHMM etc. Can already accept topology in the string format used by OPM (http://opm.phar.umich.edu/). Tim Chris Fields wrote: > Sorry, I sent that last one off prematurely. > > I could see this being used as a very useful utility if a Bioperl object > had SeqFeatures which described transmembrane regions, or if output from > something like TMHMM were parsed and used for input. Don't know if it's > included, but if not you probably should allow labeling of the > intracellular/extracellular space to designate periplasmic space, > mitochondrial matrix, thylakoid, etc. > > I think Bio::Graphics namespace is definitely the place to go. If I > ever get around to writing up the RNA structural stuff I may put > something there myself. > > chris > > On Feb 20, 2007, at 8:31 AM, Tim Nugent wrote: > >> Thanks Chris, glad it's appreciated. >> >> Is there anything else I can do? If anyone has any requests/suggestions >> please let me know too. 
>> >> Best wishes, >> >> Tim > > -- Tim Nugent (MRes) Research Student Bioinformatics Unit Department of Computer Science University College London Gower Street London WC1E 6BT Tel: 020-7679-0410 t.nugent at ucl.ac.uk From cjfields at uiuc.edu Tue Feb 20 16:09:00 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 20 Feb 2007 10:09:00 -0600 Subject: [Bioperl-l] Module to draw transmembrane protein topology In-Reply-To: <45DB18B2.8070004@cs.ucl.ac.uk> References: <45D9DE8C.2010301@cs.ucl.ac.uk> <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu> <45DB0638.1030001@cs.ucl.ac.uk> <4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu> <45DB18B2.8070004@cs.ucl.ac.uk> Message-ID: On Feb 20, 2007, at 9:50 AM, Tim Nugent wrote: > Labeling of inside/outside and membrane is already possible via > -inside_label, -outside_label and -membrane_label tags, defaults are > intracellular, extracellular and plasma membrane. > > Was definitely going to add an input/parser for MEMSAT, developed > here at UCL, and probably a few other popular TM predictors too, > e.g. PHOBIUS, TMHMM etc. Can already accept topology in the string > format used by OPM (http://opm.phar.umich.edu/). > > Tim I'll definitely have to take a closer look at it when I have time. My guess is the best fit for the data would be seqfeatures, either in a collection or a Bio::Seq. As for the parsers, you can look at the Bio::Tools::Tmhmm module, which scans Tmhmm output and converts everything to seqfeatures. chris From lstein at cshl.edu Tue Feb 20 17:25:24 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 20 Feb 2007 12:25:24 -0500 Subject: [Bioperl-l] Bio::Graphics::FeatureBase seq_id question In-Reply-To: <45DAD8F6.1030409@sendu.me.uk> References: <45DAD8F6.1030409@sendu.me.uk> Message-ID: <6dce9a0b0702200925g74d2db53j3252cca8a41765b@mail.gmail.com> Just an oversight. I'll fix it.
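Sendu's quoted question contrasts two one-line aliases. A read-only alias `sub seq_id { shift->ref() }` drops whatever arguments the caller passes. A pass-through alias `sub seq_id { shift->ref(@_) }` forwards them, so it can also set the value.

Below is a minimal sketch of that difference. It uses a hypothetical stand-in class `Demo::Feature`, not Bio::Graphics::FeatureBase itself. It assumes `ref` behaves as a combined getter/setter, which is what the proposed fix relies on.

```perl
use strict;
use warnings;

# Hypothetical stand-in class; NOT Bio::Graphics::FeatureBase itself.
package Demo::Feature;

sub new {
    my ($class, $ref) = @_;
    return bless { _ref => $ref }, $class;
}

# Assumed combined getter/setter: stores a new value only when one is passed.
sub ref {
    my $self = shift;
    $self->{_ref} = shift if @_;
    return $self->{_ref};
}

# Read-only alias: any argument passed in is silently dropped.
sub seq_id_readonly { shift->ref() }

# Pass-through alias: forwards arguments, so it can also set the value.
sub seq_id { shift->ref(@_) }

package main;

my $f = Demo::Feature->new('chr1');
$f->seq_id_readonly('chr2');    # no effect on the stored value
printf "%s\n", $f->seq_id;      # chr1
$f->seq_id('chr2');             # forwarded to ref()
printf "%s\n", $f->seq_id;      # chr2
```

The read-only form fails silently: the call looks like a setter but changes nothing. That is presumably why Lincoln treats it as an oversight rather than a design choice.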
Lincoln On 2/20/07, Sendu Bala wrote: > > Bio::Graphics::FeatureBase::seq_id is currently implemented as a > read-only alias to ref(): > sub seq_id { shift->ref() } > > > What is the reasoning behind this? Can it be made to handle setting of > the value as well?: > sub seq_id { shift->ref(@_) } > > > Cheers, > Sendu. > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From khan at cshl.edu Tue Feb 20 20:42:12 2007 From: khan at cshl.edu (Khan, Sohail) Date: Tue, 20 Feb 2007 15:42:12 -0500 Subject: [Bioperl-l] parsing a list of ids to a fasta file. Message-ID: Dear list, I am new to Bio-Perl. I have the following question: I have a list of ids, which I would like to parse against a large fasta file to retrieve the Seq for parsed ids. I appreciate any suggestions. Thanks. Khan From michael.watson at bbsrc.ac.uk Tue Feb 20 21:33:19 2007 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Tue, 20 Feb 2007 21:33:19 -0000 Subject: [Bioperl-l] parsing a list of ids to a fasta file. References: Message-ID: <8975119BCD0AC5419D61A9CF1A923E95020680FD@iahce2ksrv1.iah.bbsrc.ac.uk> Suggest you use Bio::Index::Fasta to create an index for the fasta file and then a simple script to retrieve sequences using that index. Or just use the pre-written bp_index.PLS and bp_fetch.PLS scripts. http://www.bioperl.org/wiki/Module:Bio::Index::Fasta ________________________________ From: bioperl-l-bounces at lists.open-bio.org on behalf of Khan, Sohail Sent: Tue 20/02/2007 8:42 PM To: Bioperl-l at lists.open-bio.org Subject: [Bioperl-l] parsing a list of ids to a fasta file. Dear list, I am new to Bio-Perl. I have the following question: I have a list of ids, which I would like to parse against a large fasta file to retrieve the Seq for parsed ids. I appreciate any suggestions. Thanks. 
Khan _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From neetisomaiya at gmail.com Wed Feb 21 08:19:14 2007 From: neetisomaiya at gmail.com (neeti somaiya) Date: Wed, 21 Feb 2007 13:49:14 +0530 Subject: [Bioperl-l] need help in Bio-SCF Message-ID: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com> Hi All, I downloaded module Bio-SCF-1.01 from CPAN. And I am trying to install it when I got the following error. Can someone please guide me. [root at ps2288 Bio-SCF-1.01]# perl Makefile.PL Checking if your kit is complete... Looks good Note (probably harmless): No library found for -lread Writing Makefile for Bio::SCF [root at ps2288 Bio-SCF-1.01]# make cp SCF.pm blib/lib/Bio/SCF.pm cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm /usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp -typemap /usr/lib/perl5/5.8.5/ExtUtils/typemap SCF.xs > SCF.xsc && mv SCF.xsc SCF.c Please specify prototyping behavior for SCF.xs (see perlxs manual) gcc -c -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386 -mtune=pentium4 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" -fPIC "-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE" -DLITTLE_ENDIAN SCF.c SCF.xs:12:24: io_lib/scf.h: No such file or directory SCF.xs:13:26: io_lib/mFILE.h: No such file or directory SCF.xs: In function `XS_Bio__SCF_get_scf_pointer': SCF.xs:27: error: `Scf' undeclared (first use in this function) SCF.xs:27: error: (Each undeclared identifier is reported only once SCF.xs:27: error: for each function it appears in.)
SCF.xs:27: error: `scf_data' undeclared (first use in this function) SCF.xs: In function `XS_Bio__SCF_get_scf_fpointer': SCF.xs:66: error: `Scf' undeclared (first use in this function) SCF.xs:66: error: `scf_data' undeclared (first use in this function) SCF.xs:68: error: `mFILE' undeclared (first use in this function) SCF.xs:68: error: `mf' undeclared (first use in this function) SCF.xs: In function `XS_Bio__SCF_scf_free': SCF.xs:89: error: `Scf' undeclared (first use in this function) SCF.xs:89: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_get_comments': SCF.xs:95: error: `Scf' undeclared (first use in this function) SCF.xs:95: error: `scf_data' undeclared (first use in this function) SCF.xs:95: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_set_comments': SCF.xs:108: error: `Scf' undeclared (first use in this function) SCF.xs:108: error: `scf_data' undeclared (first use in this function) SCF.xs:108: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_scf_write': SCF.xs:121: error: `Scf' undeclared (first use in this function) SCF.xs:121: error: `scf_data' undeclared (first use in this function) SCF.xs:121: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_scf_fwrite': SCF.xs:135: error: `mFILE' undeclared (first use in this function) SCF.xs:135: error: `mf' undeclared (first use in this function) SCF.xs:137: error: `Scf' undeclared (first use in this function) SCF.xs:137: error: `scf_data' undeclared (first use in this function) SCF.xs:137: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_get_from_header': SCF.xs:159: error: `Scf' undeclared (first use in this function) SCF.xs:159: error: `scf_data' undeclared (first use in this function) SCF.xs:159: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_get_at': SCF.xs:186: error: `Scf' undeclared (first use in this function) SCF.xs:186: error: `scf_data' undeclared (first use in this 
function) SCF.xs:186: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_set_base_at': SCF.xs:242: error: `Scf' undeclared (first use in this function) SCF.xs:242: error: `scf_data' undeclared (first use in this function) SCF.xs:242: error: syntax error before ')' token SCF.xs: In function `XS_Bio__SCF_set_at': SCF.xs:255: error: `Scf' undeclared (first use in this function) SCF.xs:255: error: `scf_data' undeclared (first use in this function) SCF.xs:255: error: syntax error before ')' token make: *** [SCF.o] Error 1 -- -Neeti Even my blood says, B positive From sdavis2 at mail.nih.gov Wed Feb 21 11:17:50 2007 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed, 21 Feb 2007 06:17:50 -0500 Subject: [Bioperl-l] need help in Bio-SCF In-Reply-To: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com> References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com> Message-ID: <200702210617.50616.sdavis2@mail.nih.gov> On Wednesday 21 February 2007 03:19, neeti somaiya wrote: > Hi All, > > I downloaded module > Bio-SCF-1.01from CPAN. > And I am trying to install it when I got the following error. Can someone > please guide me. You will probably need to read the INSTALL document. You need to install a couple of libraries first. Looks like you don't have the staden io-lib installed. > [root at ps2288 Bio-SCF-1.01]# perl Makefile.PL > Checking if your kit is complete... 
> Looks good > Note (probably harmless): No library found for -lread > Writing Makefile for Bio::SCF > > [root at ps2288 Bio-SCF-1.01]# make > cp SCF.pm blib/lib/Bio/SCF.pm > cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm > /usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp -typemap > /usr/lib/perl5/5.8.5/ExtUtils/typemap SCF.xs > SCF.xsc && mv SCF.xsc SCF.c > Please specify prototyping behavior for SCF.xs (see perlxs manual) > gcc -c -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING > -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE > -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386 > -mtune=pentium4 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" -fPIC > "-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE" -DLITTLE_ENDIAN > SCF.c SCF.xs:12:24: io_lib/scf.h: No such file or directory > SCF.xs:13:26: io_lib/mFILE.h: No such file or directory > SCF.xs: In function `XS_Bio__SCF_get_scf_pointer': > SCF.xs:27: error: `Scf' undeclared (first use in this function) > SCF.xs:27: error: (Each undeclared identifier is reported only once > SCF.xs:27: error: for each function it appears in.) 
> SCF.xs:27: error: `scf_data' undeclared (first use in this function) > SCF.xs: In function `XS_Bio__SCF_get_scf_fpointer': > SCF.xs:66: error: `Scf' undeclared (first use in this function) > SCF.xs:66: error: `scf_data' undeclared (first use in this function) > SCF.xs:68: error: `mFILE' undeclared (first use in this function) > SCF.xs:68: error: `mf' undeclared (first use in this function) > SCF.xs: In function `XS_Bio__SCF_scf_free': > SCF.xs:89: error: `Scf' undeclared (first use in this function) > SCF.xs:89: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_get_comments': > SCF.xs:95: error: `Scf' undeclared (first use in this function) > SCF.xs:95: error: `scf_data' undeclared (first use in this function) > SCF.xs:95: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_set_comments': > SCF.xs:108: error: `Scf' undeclared (first use in this function) > SCF.xs:108: error: `scf_data' undeclared (first use in this function) > SCF.xs:108: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_scf_write': > SCF.xs:121: error: `Scf' undeclared (first use in this function) > SCF.xs:121: error: `scf_data' undeclared (first use in this function) > SCF.xs:121: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_scf_fwrite': > SCF.xs:135: error: `mFILE' undeclared (first use in this function) > SCF.xs:135: error: `mf' undeclared (first use in this function) > SCF.xs:137: error: `Scf' undeclared (first use in this function) > SCF.xs:137: error: `scf_data' undeclared (first use in this function) > SCF.xs:137: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_get_from_header': > SCF.xs:159: error: `Scf' undeclared (first use in this function) > SCF.xs:159: error: `scf_data' undeclared (first use in this function) > SCF.xs:159: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_get_at': > SCF.xs:186: error: `Scf' undeclared (first use in this function) > 
SCF.xs:186: error: `scf_data' undeclared (first use in this function) > SCF.xs:186: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_set_base_at': > SCF.xs:242: error: `Scf' undeclared (first use in this function) > SCF.xs:242: error: `scf_data' undeclared (first use in this function) > SCF.xs:242: error: syntax error before ')' token > SCF.xs: In function `XS_Bio__SCF_set_at': > SCF.xs:255: error: `Scf' undeclared (first use in this function) > SCF.xs:255: error: `scf_data' undeclared (first use in this function) > SCF.xs:255: error: syntax error before ')' token > make: *** [SCF.o] Error 1 From cjfields at uiuc.edu Wed Feb 21 12:08:57 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 21 Feb 2007 06:08:57 -0600 Subject: [Bioperl-l] need help in Bio-SCF In-Reply-To: <200702210617.50616.sdavis2@mail.nih.gov> References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com> <200702210617.50616.sdavis2@mail.nih.gov> Message-ID: <40C288FE-C74C-4B3F-A835-1A5C563B2B8E@uiuc.edu> On Feb 21, 2007, at 5:17 AM, Sean Davis wrote: > On Wednesday 21 February 2007 03:19, neeti somaiya wrote: >> Hi All, >> >> I downloaded module >> Bio-SCF-1.01from CPAN. >> And I am trying to install it when I got the following error. Can >> someone >> please guide me. > > You will probably need to read the INSTALL document. You need to > install a > couple of libraries first. Looks like you don't have the staden io- > lib > installed. Just to note, this module isn't part of BioPerl (I don't even think it has a Bioperl interface). You'll probably need to contact Lincoln for details on using this module. One thing you may run into is errors with the version of io_lib installed (a problem I've encountered with bioperl-ext), probably from API changes. If you run into problems with newer versions of io_lib you should try downgrading to io_lib 1.8.11 or 1.8.12. 
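Both diagnoses above trace back to the first two compiler errors. `SCF.xs` could not find `io_lib/scf.h` or `io_lib/mFILE.h`. The later `Scf`/`mFILE` "undeclared" errors most likely follow from those two missing headers.

A quick plain-Perl check before rerunning `make` is to look for the headers under the include directories. This is a sketch under stated assumptions:
- The directory list is an assumption. `/usr/include` is assumed, and `/usr/local/include` is taken from the `-I` flag in the gcc line. Adjust the list to wherever the Staden io_lib was installed.
- The helper name `missing_headers` is hypothetical.
- It only checks that the headers exist, not which io_lib version they come from.

```perl
use strict;
use warnings;
use File::Spec;

# Headers named in the failed build's first two gcc errors.
my @needed = ('io_lib/scf.h', 'io_lib/mFILE.h');

# Assumption: search these prefixes; adjust to your io_lib install location.
my @dirs = ('/usr/include', '/usr/local/include');

# Hypothetical helper: return the headers absent from every directory in @$dirs.
sub missing_headers {
    my ($dirs, $headers) = @_;
    my @missing;
    HEADER: for my $h (@$headers) {
        for my $dir (@$dirs) {
            my $path = File::Spec->catfile($dir, split m{/}, $h);
            next HEADER if -e $path;    # found: check the next header
        }
        push @missing, $h;              # not found in any directory
    }
    return @missing;
}

my @absent = missing_headers(\@dirs, \@needed);
if (@absent) {
    print "Missing: @absent (install io_lib before building Bio::SCF)\n";
}
else {
    print "io_lib headers found; rerun perl Makefile.PL && make\n";
}
```

If the headers are present but the build still fails, the version mismatch Chris mentions is the next thing to rule out.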
From neetisomaiya at gmail.com Wed Feb 21 12:25:26 2007 From: neetisomaiya at gmail.com (neeti somaiya) Date: Wed, 21 Feb 2007 17:55:26 +0530 Subject: [Bioperl-l] need help in Bio-SCF In-Reply-To: <200702210617.50616.sdavis2@mail.nih.gov> References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com> <200702210617.50616.sdavis2@mail.nih.gov> Message-ID: <764978cf0702210425j544330bbr12c86a89960dbb66@mail.gmail.com> Thanks. It resolved my problem. On 2/21/07, Sean Davis wrote: > > On Wednesday 21 February 2007 03:19, neeti somaiya wrote: > > Hi All, > > > > I downloaded module > > Bio-SCF-1.01from CPAN. > > And I am trying to install it when I got the following error. Can > someone > > please guide me. > > You will probably need to read the INSTALL document. You need to install > a > couple of libraries first. Looks like you don't have the staden io-lib > installed. > > > > [root at ps2288 Bio-SCF-1.01]# perl Makefile.PL > > Checking if your kit is complete... > > Looks good > > Note (probably harmless): No library found for -lread > > Writing Makefile for Bio::SCF > > > > [root at ps2288 Bio-SCF-1.01]# make > > cp SCF.pm blib/lib/Bio/SCF.pm > > cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm > > /usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp -typemap > > /usr/lib/perl5/5.8.5/ExtUtils/typemap SCF.xs > SCF.xsc && mv SCF.xsc > SCF.c > > Please specify prototyping behavior for SCF.xs (see perlxs manual) > > gcc -c -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING > > -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE > > -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386 > > -mtune=pentium4 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" -fPIC > > "-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE" -DLITTLE_ENDIAN > > SCF.c SCF.xs:12:24: io_lib/scf.h: No such file or directory > > SCF.xs:13:26: io_lib/mFILE.h: No such file or directory > > SCF.xs: In function `XS_Bio__SCF_get_scf_pointer': > > SCF.xs:27: error: `Scf' 
undeclared (first use in this function) > > SCF.xs:27: error: (Each undeclared identifier is reported only once > > SCF.xs:27: error: for each function it appears in.) > > SCF.xs:27: error: `scf_data' undeclared (first use in this function) > > SCF.xs: In function `XS_Bio__SCF_get_scf_fpointer': > > SCF.xs:66: error: `Scf' undeclared (first use in this function) > > SCF.xs:66: error: `scf_data' undeclared (first use in this function) > > SCF.xs:68: error: `mFILE' undeclared (first use in this function) > > SCF.xs:68: error: `mf' undeclared (first use in this function) > > SCF.xs: In function `XS_Bio__SCF_scf_free': > > SCF.xs:89: error: `Scf' undeclared (first use in this function) > > SCF.xs:89: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_get_comments': > > SCF.xs:95: error: `Scf' undeclared (first use in this function) > > SCF.xs:95: error: `scf_data' undeclared (first use in this function) > > SCF.xs:95: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_set_comments': > > SCF.xs:108: error: `Scf' undeclared (first use in this function) > > SCF.xs:108: error: `scf_data' undeclared (first use in this function) > > SCF.xs:108: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_scf_write': > > SCF.xs:121: error: `Scf' undeclared (first use in this function) > > SCF.xs:121: error: `scf_data' undeclared (first use in this function) > > SCF.xs:121: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_scf_fwrite': > > SCF.xs:135: error: `mFILE' undeclared (first use in this function) > > SCF.xs:135: error: `mf' undeclared (first use in this function) > > SCF.xs:137: error: `Scf' undeclared (first use in this function) > > SCF.xs:137: error: `scf_data' undeclared (first use in this function) > > SCF.xs:137: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_get_from_header': > > SCF.xs:159: error: `Scf' undeclared (first use in this function) > > 
SCF.xs:159: error: `scf_data' undeclared (first use in this function) > > SCF.xs:159: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_get_at': > > SCF.xs:186: error: `Scf' undeclared (first use in this function) > > SCF.xs:186: error: `scf_data' undeclared (first use in this function) > > SCF.xs:186: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_set_base_at': > > SCF.xs:242: error: `Scf' undeclared (first use in this function) > > SCF.xs:242: error: `scf_data' undeclared (first use in this function) > > SCF.xs:242: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_set_at': > > SCF.xs:255: error: `Scf' undeclared (first use in this function) > > SCF.xs:255: error: `scf_data' undeclared (first use in this function) > > SCF.xs:255: error: syntax error before ')' token > > make: *** [SCF.o] Error 1 > -- -Neeti Even my blood says, B positive From neetisomaiya at gmail.com Wed Feb 21 12:25:26 2007 From: neetisomaiya at gmail.com (neeti somaiya) Date: Wed, 21 Feb 2007 17:55:26 +0530 Subject: [Bioperl-l] need help in Bio-SCF In-Reply-To: <200702210617.50616.sdavis2@mail.nih.gov> References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com> <200702210617.50616.sdavis2@mail.nih.gov> Message-ID: <764978cf0702210425j544330bbr12c86a89960dbb66@mail.gmail.com> Thanks. It resolved my problem. On 2/21/07, Sean Davis wrote: > > On Wednesday 21 February 2007 03:19, neeti somaiya wrote: > > Hi All, > > > > I downloaded module > > Bio-SCF-1.01from CPAN. > > And I am trying to install it when I got the following error. Can > someone > > please guide me. > > You will probably need to read the INSTALL document. You need to install > a > couple of libraries first. Looks like you don't have the staden io-lib > installed. > > > > [root at ps2288 Bio-SCF-1.01]# perl Makefile.PL > > Checking if your kit is complete... 
> > Looks good > > Note (probably harmless): No library found for -lread > > Writing Makefile for Bio::SCF > > > > [root at ps2288 Bio-SCF-1.01]# make > > cp SCF.pm blib/lib/Bio/SCF.pm > > cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm > > /usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp -typemap > > /usr/lib/perl5/5.8.5/ExtUtils/typemap SCF.xs > SCF.xsc && mv SCF.xsc > SCF.c > > Please specify prototyping behavior for SCF.xs (see perlxs manual) > > gcc -c -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING > > -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE > > -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386 > > -mtune=pentium4 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" -fPIC > > "-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE" -DLITTLE_ENDIAN > > SCF.c SCF.xs:12:24: io_lib/scf.h: No such file or directory > > SCF.xs:13:26: io_lib/mFILE.h: No such file or directory > > SCF.xs: In function `XS_Bio__SCF_get_scf_pointer': > > SCF.xs:27: error: `Scf' undeclared (first use in this function) > > SCF.xs:27: error: (Each undeclared identifier is reported only once > > SCF.xs:27: error: for each function it appears in.) 
> > SCF.xs:27: error: `scf_data' undeclared (first use in this function) > > SCF.xs: In function `XS_Bio__SCF_get_scf_fpointer': > > SCF.xs:66: error: `Scf' undeclared (first use in this function) > > SCF.xs:66: error: `scf_data' undeclared (first use in this function) > > SCF.xs:68: error: `mFILE' undeclared (first use in this function) > > SCF.xs:68: error: `mf' undeclared (first use in this function) > > SCF.xs: In function `XS_Bio__SCF_scf_free': > > SCF.xs:89: error: `Scf' undeclared (first use in this function) > > SCF.xs:89: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_get_comments': > > SCF.xs:95: error: `Scf' undeclared (first use in this function) > > SCF.xs:95: error: `scf_data' undeclared (first use in this function) > > SCF.xs:95: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_set_comments': > > SCF.xs:108: error: `Scf' undeclared (first use in this function) > > SCF.xs:108: error: `scf_data' undeclared (first use in this function) > > SCF.xs:108: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_scf_write': > > SCF.xs:121: error: `Scf' undeclared (first use in this function) > > SCF.xs:121: error: `scf_data' undeclared (first use in this function) > > SCF.xs:121: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_scf_fwrite': > > SCF.xs:135: error: `mFILE' undeclared (first use in this function) > > SCF.xs:135: error: `mf' undeclared (first use in this function) > > SCF.xs:137: error: `Scf' undeclared (first use in this function) > > SCF.xs:137: error: `scf_data' undeclared (first use in this function) > > SCF.xs:137: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_get_from_header': > > SCF.xs:159: error: `Scf' undeclared (first use in this function) > > SCF.xs:159: error: `scf_data' undeclared (first use in this function) > > SCF.xs:159: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_get_at': > > 
SCF.xs:186: error: `Scf' undeclared (first use in this function) > > SCF.xs:186: error: `scf_data' undeclared (first use in this function) > > SCF.xs:186: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_set_base_at': > > SCF.xs:242: error: `Scf' undeclared (first use in this function) > > SCF.xs:242: error: `scf_data' undeclared (first use in this function) > > SCF.xs:242: error: syntax error before ')' token > > SCF.xs: In function `XS_Bio__SCF_set_at': > > SCF.xs:255: error: `Scf' undeclared (first use in this function) > > SCF.xs:255: error: `scf_data' undeclared (first use in this function) > > SCF.xs:255: error: syntax error before ')' token > > make: *** [SCF.o] Error 1 > -- -Neeti Even my blood says, B positive From jay at jays.net Wed Feb 21 00:27:01 2007 From: jay at jays.net (Jay Hannah) Date: Tue, 20 Feb 2007 18:27:01 -0600 Subject: [Bioperl-l] Alignment In-Reply-To: <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com> References: <188661178021328@lycos-europe.com> <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com> Message-ID: > On 2/20/07, marian thieme wrote: >> I have a series of sequences which should be aligned against a >> reference sequence. >> In this special case we dont need to calculate anything, we only need >> to represent the sequences and get for instance some columns of >> interest. >> The problem now is, that some sequences have gaps and we need to >> represent gaps in the rewference sequence as well as in some >> individual sequences. On Feb 20, 2007, at 9:30 AM, Albert Vilella wrote: > I think the SimpleAlign object contains a set of sequences, each of > which is a LocatableSeq object. Fascinating. In my BLAST-centric universe I went and rolled my own solution for SeqLab where I hold onto the Bio::Seq from the reference sequences and then hold onto the Bio::Search::HSP::GenericHSP objects for all my BLAST hits. 
From that dataset I can write whatever reports I want and/or perform any subsequent actions. I wonder if I should have done that differently... What typically creates .pfam files? j seqlab.net http://www.bioperl.org/wiki/User:Jhannah From cjfields at uiuc.edu Wed Feb 21 13:36:02 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 21 Feb 2007 07:36:02 -0600 Subject: [Bioperl-l] Alignment In-Reply-To: References: <188661178021328@lycos-europe.com> <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com> Message-ID: <2233F0EE-94FE-42F0-B8E5-1BE14A25C0D4@uiuc.edu> On Feb 20, 2007, at 6:27 PM, Jay Hannah wrote: ... > > On Feb 20, 2007, at 9:30 AM, Albert Vilella wrote: >> I think the SimpleAlign object contains a set of sequences, each of >> which is a LocatableSeq object. > > Fascinating. In my BLAST-centric universe I went and rolled my own > solution for SeqLab where I hold onto the Bio::Seq from the reference > sequences and then hold onto the Bio::Search::HSP::GenericHSP objects > for all my BLAST hits. From that dataset I can write whatever > reports I > want and/or perform any subsequent actions. I wonder if I should have > done that differently... > > What typically creates .pfam files? > > j > seqlab.net > http://www.bioperl.org/wiki/User:Jhannah Pfam alignments come in two formats (pfam and stockholm) that can both be parsed into SimpleAlign objects via Bio::AlignIO:

use Bio::AlignIO;

my $alnin = Bio::AlignIO->new(-format => 'stockholm',
                              -file   => 'dho.sto');
while (my $aln = $alnin->next_aln) {
    # do stuff to $aln SimpleAlign
}

Personally I stick with Stockholm as it's a richer format (with annotations and so on), but the parser was rewritten recently (by moi!) so may have some bugs still. I'm a bit confused as to what you do with BLAST files.
You can generate a SimpleAlign right from the HSP for most SearchIO parsers: http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods chris From sanjib at bic.boseinst.ernet.in Wed Feb 21 06:12:06 2007 From: sanjib at bic.boseinst.ernet.in (Sanjib Kumar Gupta) Date: Wed, 21 Feb 2007 11:42:06 +0530 Subject: [Bioperl-l] help on remote blast In-Reply-To: <20070220073200.M42567@bic.boseinst.ernet.in> References: <20070220073200.M42567@bic.boseinst.ernet.in> Message-ID: <20070221061206.M37845@bic.boseinst.ernet.in> Hi, I have been running this script for some time and it was running fine. I am using this Linux machine with a live IP (no proxy). But suddenly it has stopped working with these errors: waiting...waiting... -------------------- WARNING --------------------- MSG: An Error Occurred
500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad hostname 'www.ncbi.nlm.nih.gov') --------------------------------------------------- xx.pep -------------------- WARNING --------------------- MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5 Content-Length: 497 Content-Type: application/x-www-form-urlencoded DATABASE=nr&QUERY=%3ENM_005277.3%3B_1414961_1+837% 0AMEENMEEGQTQKGCFECCIKCLGGIPYASLIATILLYAGVALFCGCGHEALSGTVNILQTYFEMARTAGDTLDVF TMIDIFKYVIYGIAAAFFVYGILLMVEGFFTTGAIKDLYGDFKITTCGRCVSAWFIMLTYLFMLAWLGVTAFTSLPV YMYFNLWTICRNTTLVEGANLCLDLRQFGIVTIGEEKKICTVSENFLRMCESTELNMTFHLFIVALAGAGAAVIAMV HYLMVLSANWAYVKDACRMQKYEDIKSKEEQELHDIHSTRSKERLNAYT*&COMPOSITION_BASED_STATISTI CS=off&EXPECT=1e- 10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&MATRIX_NAME=BLOSUM62&ENTREZ_ QUERY=Xenopus+laevis[Organism]&PROGRAM=blastp An Error Occurred
500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad hostname 'www.ncbi.nlm.nih.gov') --------------------------------------------------- waiting...waiting... -------------------- WARNING --------------------- MSG: An Error Occurred
500 Internal Server Error --------------------------------------------------- Though I am able to see the NCBI page from a browser, I am unable to ping or traceroute to the server. Please help me. -- Sanjib Kumar Gupta Bioinformatics Centre Bose Institute Kolkata 700054, INDIA Phone : +91-33-2355 6626, 2816, 2355 4766 Fax : +91-33-2355 3886 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: n9.pl URL: From granjeau at tagc.univ-mrs.fr Wed Feb 21 13:50:39 2007 From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137) Date: Wed, 21 Feb 2007 14:50:39 +0100 Subject: [Bioperl-l] Adding empty member list in Bio::Cluster::SequenceFamily Message-ID: <45DC4E2F.4060804@tagc.univ-mrs.fr> Hello! It is not clear to me, but I found a workaround by checking for an empty list before adding. Here is what I noticed. Adding as members an empty list () is not the same as adding a reference to an empty list [], of course, but could be thought to be the same. Calling get_members, in the second case I got a list of 0 members, but in the first case I got a list of 1 member, which is not an object at all. I am warned now, but maybe the documentation should emphasize using the reference call. Best regards, --Samuel

use Bio::Cluster::SequenceFamily;

$f = new Bio::Cluster::SequenceFamily( -id => 'aa' );
$f->add_members( () );
print scalar $f->get_members();
# 1
$g = new Bio::Cluster::SequenceFamily( -id => 'aa' );
$g->add_members( [] );
print scalar $g->get_members();
# 0

From stephen.marshall at novartis.com Wed Feb 21 17:01:00 2007 From: stephen.marshall at novartis.com (stephen.marshall at novartis.com) Date: Wed, 21 Feb 2007 12:01:00 -0500 Subject: [Bioperl-l] Parsing kegg files Message-ID: Hello I'm trying to parse a Kegg file and I can't seem to get at the pathway information... Here's a snippet of my code.
I only see dblink and description as annotation.

use Bio::SeqIO;

my $stream = Bio::SeqIO->new(-file => $filename, -format => 'KEGG');

while ( my $seq = $stream->next_seq() ) {
    # do something with $seq
    my $id = $seq->display_id();
    print "$id:";
    my $ann = $seq->annotation();
    foreach my $key ( $ann->get_all_annotation_keys() ) {
        my @values = $ann->get_Annotations($key);
        foreach my $value ( @values ) {
            print "Annotation: ",$key," value: ",$value->as_text,"\n";
        }
    }
}

_________________________ CONFIDENTIALITY NOTICE The information contained in this e-mail message is intended only for the exclusive use of the individual or entity named above and may contain information that is privileged, confidential or exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivery of the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by e-mail and delete the material from any computer. Thank you. From prateek.vit at gmail.com Wed Feb 21 17:40:25 2007 From: prateek.vit at gmail.com (prateek singh yadav) Date: Wed, 21 Feb 2007 23:10:25 +0530 Subject: [Bioperl-l] Problem in BioPerl Installation Message-ID: Hello all, I was trying to install Bioperl on my redhat linux (EL) using CPAN, but CPAN shows this problem. [root at HX342SBC054 Desktop]# cpan Terminal does not support AddHistory.
cpan shell -- CPAN exploration and modules installation (v1.7601) ReadLine support available (try 'install Bundle::CPAN') cpan> get bioperl CPAN: Storable loaded ok Going to read /root/.cpan/Metadata Warning: Found only 25 objects in /root/.cpan/Metadata Going to read /root/.cpan/sources/authors/01mailrc.txt.gz Going to read /root/.cpan/sources/modules/02packages.details.txt.gz Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not contain a Line-Count header. Please check the validity of the index file by comparing it to more than one CPAN mirror. I'll continue but problems seem likely to happen. Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not contain a Last-Updated header. Please check the validity of the index file by comparing it to more than one CPAN mirror. I'll continue but problems seem likely to happen. Going to read /root/.cpan/sources/modules/03modlist.data.gz Can't locate object method "data" via package "CPAN::Modulelist" (perhaps you forgot to load "CPAN::Modulelist"?) at (eval 13) line 1. at /usr/lib/perl5/5.8.5/CPAN.pm line 3406 CPAN::Index::rd_modlist('CPAN::Index', '/root/.cpan/sources/modules/03modlist.data.gz') called at /usr/lib/perl5/5.8.5/CPAN.pm line 3129 CPAN::Index::reload('CPAN::Index') called at /usr/lib/perl5/5.8.5/CPAN.pm line 675 CPAN::exists('CPAN=HASH(0x8548f20)', 'CPAN::Module', 'bioperl') called at /usr/lib/perl5/5.8.5/CPAN.pm line 1842 CPAN::Shell::expandany('CPAN::Shell', 'bioperl') called at /usr/lib/perl5/5.8.5/CPAN.pm line 2078 CPAN::Shell::rematein('CPAN::Shell', 'get', 'bioperl') called at /usr/lib/perl5/5.8.5/CPAN.pm line 2157 CPAN::Shell::get('CPAN::Shell', 'bioperl') called at /usr/lib/perl5/5.8.5/CPAN.pm line 201 eval {...} called at /usr/lib/perl5/5.8.5/CPAN.pm line 201 CPAN::shell() called at /usr/bin/cpan line 193 cpan> Can anyone give me direction how to configure cpan again or how to install BioPerl on linux with its complete dependencies. 
Because I think I have a problem in CPAN configuration. Regards, Prateek -- Prateek Singh 3rd year Bioinformatics(BTech) Vellore Institute Of Technology Vellore-632014 prateek.vit at gmail.com From bosborne11 at verizon.net Wed Feb 21 17:29:40 2007 From: bosborne11 at verizon.net (Brian Osborne) Date: Wed, 21 Feb 2007 12:29:40 -0500 Subject: [Bioperl-l] Parsing kegg files In-Reply-To: Message-ID: Stephen, I don't know what your eventual goals are but you might want to take a look at bioperl-network. However, there are problems with this package. One, it only parses DIP tab-delimited and PSI-MI and it does this last one only partially (you will get the graph though). Two, it seems to have only a single developer interested in it, that's me, and few users. In my Bioperl experience projects like this tend to fade away. http://www.bioperl.org/wiki/Network_package Brian O. On 2/21/07 12:01 PM, "stephen.marshall at novartis.com" wrote: > Hello > I"m trying to parse a Kegg file and I can't seem to get at the pathway > information... Here's a snippet of my code. I only see dblink and > description as annotation > > use Bio::SeqIO; > > my $stream = Bio::SeqIO->new(-file => $filename, -format => 'KEGG'); > > while ( my $seq = $stream->next_seq() ) { > # do something with $seq > my $id = $seq->display_id(); > print "$id:"; > my $ann = $seq->annotation(); > foreach my $key ( $ann->get_all_annotation_keys() ) { > my @values = $ann->get_Annotations($key); > foreach my $value ( @values ) { > print "Annotation: ",$key," value: > ",$value->as_text,"\n"; > } > } > > } > _________________________ > > CONFIDENTIALITY NOTICE > > The information contained in this e-mail message is intended only for the > exclusive use of the individual or entity named above and may contain > information that is privileged, confidential or exempt from disclosure > under applicable law. 
If the reader of this message is not the intended > recipient, or the employee or agent responsible for delivery of the > message to the intended recipient, you are hereby notified that any > dissemination, distribution or copying of this communication is strictly > prohibited. If you have received this communication in error, please > notify the sender immediately by e-mail and delete the material from any > computer. Thank you. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From arareko at campus.iztacala.unam.mx Wed Feb 21 18:18:37 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Wed, 21 Feb 2007 12:18:37 -0600 Subject: [Bioperl-l] Problem in BioPerl Installation In-Reply-To: References: Message-ID: <45DC8CFD.1060108@campus.iztacala.unam.mx> You can always rebuild your CPAN configuration by deleting the existing .cpan/ directory in root's $HOME dir (quick & dirty trick), then invoke CPAN again from root's shell to rebuild the config: # perl -MCPAN -e shell Hope this helps. Regards, Mauricio. prateek singh yadav wrote: > Hello all, > > I was trying to install Bioperl on my redhat linux (EL) using CPAN. but CPAN > shows this problem. > > > [root at HX342SBC054 Desktop]# cpan > Terminal does not support AddHistory. > > cpan shell -- CPAN exploration and modules installation (v1.7601) > ReadLine support available (try 'install Bundle::CPAN') > > cpan> get bioperl > CPAN: Storable loaded ok > Going to read /root/.cpan/Metadata > Warning: Found only 25 objects in /root/.cpan/Metadata > Going to read /root/.cpan/sources/authors/01mailrc.txt.gz > Going to read /root/.cpan/sources/modules/02packages.details.txt.gz > Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not > contain a Line-Count header. > Please check the validity of the index file by comparing it to more > than one CPAN mirror. 
I'll continue but problems seem likely to > happen. > Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not > contain a Last-Updated header. > Please check the validity of the index file by comparing it to more > than one CPAN mirror. I'll continue but problems seem likely to > happen. > Going to read /root/.cpan/sources/modules/03modlist.data.gz > Can't locate object method "data" via package "CPAN::Modulelist" (perhaps > you forgot to load "CPAN::Modulelist"?) at (eval 13) line 1. > at /usr/lib/perl5/5.8.5/CPAN.pm line 3406 > CPAN::Index::rd_modlist('CPAN::Index', > '/root/.cpan/sources/modules/03modlist.data.gz') called at > /usr/lib/perl5/5.8.5/CPAN.pm line 3129 > CPAN::Index::reload('CPAN::Index') called at > /usr/lib/perl5/5.8.5/CPAN.pm line 675 > CPAN::exists('CPAN=HASH(0x8548f20)', 'CPAN::Module', 'bioperl') > called at /usr/lib/perl5/5.8.5/CPAN.pm line 1842 > CPAN::Shell::expandany('CPAN::Shell', 'bioperl') called at > /usr/lib/perl5/5.8.5/CPAN.pm line 2078 > CPAN::Shell::rematein('CPAN::Shell', 'get', 'bioperl') called at > /usr/lib/perl5/5.8.5/CPAN.pm line 2157 > CPAN::Shell::get('CPAN::Shell', 'bioperl') called at > /usr/lib/perl5/5.8.5/CPAN.pm line 201 > eval {...} called at /usr/lib/perl5/5.8.5/CPAN.pm line 201 > CPAN::shell() called at /usr/bin/cpan line 193 > > cpan> > > Can anyone give me direction how to configure cpan again or how to install > BioPerl on linux with its complete dependencies. Because I think I have a > problem in CPAN configuration. 
> > Regards, > Prateek > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Gen?tica Unidad de Morfofisiolog?a y Funci?n Facultad de Estudios Superiores Iztacala, UNAM From hlapp at gmx.net Wed Feb 21 18:33:17 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 21 Feb 2007 13:33:17 -0500 Subject: [Bioperl-l] Adding empty member list in Bio::Cluster::SequenceFamily In-Reply-To: <45DC4E2F.4060804@tagc.univ-mrs.fr> References: <45DC4E2F.4060804@tagc.univ-mrs.fr> Message-ID: <5B31EEBD-FFE5-4A0F-BB05-DF7297103BBD@gmx.net> Fixed in CVS HEAD. -hilmar On Feb 21, 2007, at 8:50 AM, Samuel GRANJEAUD - IR/IFR137 wrote: > Hello! > > Not clear to me, but I find a work around by checking for empty list > before adding, here is what I noticed. Adding as members an empty list > () is not the same as adding a reference to an empty list [], of > course, > but could be thought to be the same. Calling get_members, for the > second > case, I got a list of 0 member, but in the first case I got of 1 > member, > which is not an object at all. I am warned now, but may be the > documentation should emphasize on using by the reference call. > > Best regards, > --Samuel > > > use Bio::Cluster::SequenceFamily; > > $f = new Bio::Cluster::SequenceFamily( -id => 'aa' ); > $f->add_members( () ); > print scalar $f->get_members(); > # 1 > $g = new Bio::Cluster::SequenceFamily( -id => 'aa' ); > $g->add_members( [] ); > print scalar $g->get_members(); > # 0 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Wed Feb 21 19:12:57 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 21 Feb 2007 13:12:57 -0600 Subject: [Bioperl-l] GenBank accession bug? 
Message-ID: <11B83C2C-7BFE-48B5-A20C-7EAF282F39BD@uiuc.edu> Dmitry, I'm forwarding this to the mail list. In the future please post/ respond to the regular mail list so other BioPerl developers/users can comment. You'll get feedback much faster here (and maybe even some support!). The issue at hand is whether we can support GenBank accessions/ display_id/version with your naming scheme. My feeling is that support for nonalphanumerics was removed to be compliant with the GenBank standard for accessions, though I may be wrong. Maybe someone who was around during bioperl 1.2 can elaborate more? From http://bugzilla.open-bio.org/show_bug.cgi?id=2214 -------------------------------------------------- .... Thanks for verbose explanation. It seems that I would need to apply my local patches to the BioPerl module(s). With BioPerl-1.2 there was no problem with '-' in sequence names. The problem is that in the project we participate (Vizier project) following sequence name convention was adopted: VZ##-(or)-<$$> VZ Stands for Vizier ## Your 2-digits Partner ID within the VIZIER consortium Virus name according to the ICTV nomenclature; , If sequence has not been assigned a GenBank LOCUS ID, available strain designation, short as possible, should be used <$$> Unique 2-digits number on your discretion to label sequence variant -------------------------------------------------- chris From gabriel.cardona at uib.es Thu Feb 22 09:33:14 2007 From: gabriel.cardona at uib.es (gcardona) Date: Thu, 22 Feb 2007 01:33:14 -0800 (PST) Subject: [Bioperl-l] bioperl-1.5.2_100 ... not found Message-ID: <9096740.post@talk.nabble.com> Hello, I am trying to install Bioperl on a Windows system, following the installation notes in http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows When the Perl Package Manager tries to download bioperl-1.5.2_100, it cannot find the package and answers: Downloading bioperl-1.5.2_100 ... 
not found I've looked the contents of http://bioperl.org/DIST and in package.xml the version for bioperl is bioperl-1.5.2_100, but in that folder the available version is bioperl-1.5.2_102 Is this a bug? or should I download and install manually? Thank you in advance, Gabriel Cardona -- View this message in context: http://www.nabble.com/bioperl-1.5.2_100-...-not-found-tf3271747.html#a9096740 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From bix at sendu.me.uk Thu Feb 22 12:35:14 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 22 Feb 2007 12:35:14 +0000 Subject: [Bioperl-l] bioperl-1.5.2_100 ... not found In-Reply-To: <9096740.post@talk.nabble.com> References: <9096740.post@talk.nabble.com> Message-ID: <45DD8E02.1070404@sendu.me.uk> gcardona wrote: > Hello, > > I am trying to install Bioperl on a Windows system, following the > installation notes in > http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows > When the Perl Package Manager tries to download bioperl-1.5.2_100, it cannot > find the package and answers: > Downloading bioperl-1.5.2_100 ... not found > > I've looked the contents of > http://bioperl.org/DIST > and in package.xml the version for bioperl is bioperl-1.5.2_100, but in that > folder the available version is bioperl-1.5.2_102 > Is this a bug? or should I download and install manually? Sorry, my mistake. I accidentally moved the ppm to a different folder. It should work now though. I may make a 1.5.2_102 ppm at some point, but there are no relevant differences between _102 and _100 as far as Windows users are concerned. From enrique_rulz at yahoo.com Thu Feb 22 20:41:37 2007 From: enrique_rulz at yahoo.com (Kurt Gobain) Date: Thu, 22 Feb 2007 12:41:37 -0800 (PST) Subject: [Bioperl-l] Sequence matching problem! Message-ID: <9107936.post@talk.nabble.com> Hi every1.. 
I m facing a great deal of problem in simple pattern matching between sequence & a pattern ..Program shod be designed such a way that it shod be able do two things 1) normal matching...For eg: GATCAAT....if TC is entered... output shod be 2...2) matching using spl character..In same example if C*T value is entered It shod give o/p as 3 & seq to b displayed is CAAT..I m easily getting 1st part...But in 2nd part Its giving sum problem..output I m gettin as 1 instead of 3...Code is really simple!

#!/usr/bin/perl
$alphabet = "GATCAAT";
$pattern= "C*T ";

$alphabet =~ /($pattern)/i;

print "The entire '$pattern' match began at $-[0] and ended at $+[0]\n";

====================
OUTPUT!
The entire C*T match began at 1 and ended at 2
====================

but the o/p shod be 3???? & Is there n e chance I can get seq too..I mean instead of C*T'' i need 'CAAT'...???? Well..Its not compulsion to use regex....But I find it quite simple..can there be n e other method?? Thanx in advance! Kurt! -- View this message in context: http://www.nabble.com/Sequence-matching-problem%21-tf3275153.html#a9107936 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cjfields at uiuc.edu Thu Feb 22 21:01:03 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 22 Feb 2007 15:01:03 -0600 Subject: [Bioperl-l] GenBank accession bug? In-Reply-To: <51879.10.0.7.57.1172176272.squirrel@gscmail.wustl.edu> References: <11B83C2C-7BFE-48B5-A20C-7EAF282F39BD@uiuc.edu> <51879.10.0.7.57.1172176272.squirrel@gscmail.wustl.edu> Message-ID: <028E16D7-036A-44DA-BECD-F910BEA58E53@uiuc.edu> On Feb 22, 2007, at 2:31 PM, dmessina at watson.wustl.edu wrote: >> The issue at hand is whether we can support GenBank accessions/ >> display_id/version with your naming scheme. > > Chris, I'm a little unsure of what you're saying here (which might > mean > that you're already saying what I'm about to...say). Do you mean it > might
Do you mean it > might > be tricky to support both the Genbank standard and Dmitry's > simultaneously? > > I would argue any arbitrary ID should be supported as long as that > ID is a > contiguous non-space word (\S+). > > Actually the existing accession regex looks like it already > supports IDs > with '-': > > /^ACCESSION\s+(\S.*\S)/ > > It's only the version regex which doesn't (\w doesn't include '-'): > > /^\w+\.(\d+)/ > > > Anyone else have thoughts or comments on this? Off the top of my > head, I > can't think of any issues that might arise from doing so (apart from > having to modify all of the SeqIO modules to support it). > > Dave You're right; the argument comes down simply to whether we would support \S+ or just \w+. I'm neutral on this myself, but I wonder how allowing \S+ would affect other modules (for instance, indexing for a flat db), where one might just use \w+ for accessions, expecting them to be GenBank- or EMBL-like alphanumerics. The fact that \S+ was supported in the past (as indicated in the bug report) and then wasn't post 1.2 makes me think there was a reason for someone going in and modifying it, but that was before my time on the group. I'll have a look at the CVS history when I have time to see what I can dig up. chris From mkiwala at watson.wustl.edu Thu Feb 22 20:36:33 2007 From: mkiwala at watson.wustl.edu (Michael Kiwala) Date: Thu, 22 Feb 2007 14:36:33 -0600 Subject: [Bioperl-l] Bio::AnnotatableI vs Bio::FeatureHolderI Message-ID: <45DDFED1.1090503@watson.wustl.edu> Are Bio::AnnotatableI and Bio::FeatureHolderI competing interfaces? I get the impression they are designed to do similar things. If so is one deprecated and the other preferred? If their responsibilities are orthogonal to each other, what sorts of tasks are suited to each? Thanks, Michael From dmessina at wustl.edu Thu Feb 22 20:53:01 2007 From: dmessina at wustl.edu (Dave Messina) Date: Thu, 22 Feb 2007 14:53:01 -0600 (CST) Subject: [Bioperl-l] GenBank accession bug? 
Message-ID: <51923.10.0.7.57.1172177581.squirrel@gscmail.wustl.edu> > The issue at hand is whether we can support GenBank accessions/ > display_id/version with your naming scheme. Chris, I'm a little unsure of what you're saying here (which might mean that you're already saying what I'm about to...say). Do you mean it might be tricky to support both the Genbank standard and Dmitry's simultaneously? I would argue any arbitrary ID should be supported as long as that ID is a contiguous non-space word (\S+). Actually the existing accession regex looks like it already supports IDs with '-': /^ACCESSION\s+(\S.*\S)/ It's only the version regex which doesn't (\w doesn't include '-'): /^\w+\.(\d+)/ Anyone else have thoughts or comments on this? Off the top of my head, I can't think of any issues that might arise from doing so (apart from having to modify all of the SeqIO modules to support it). Dave From heikki at sanbi.ac.za Fri Feb 23 08:25:39 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Fri, 23 Feb 2007 10:25:39 +0200 Subject: [Bioperl-l] Sequence matching problem! In-Reply-To: <9107936.post@talk.nabble.com> References: <9107936.post@talk.nabble.com> Message-ID: <200702231025.39416.heikki@sanbi.ac.za> Kurt, There are few things in your code to note: - regexp /C*T/ matches any T preceded by zero or more Cs, not what you meant - $- and $+ are among the "expensive" perl functions worth not using unless you have to. Using them once in your code slows execution down considerable. There is always an other way. - Keep in mind what you want to use the match positions for: Human readable locations usually start counting with 1 but perl code uses 0 as the first location. The code below assumes you want to print the locations out. Study my example code below. 
Yours, -Heikki ###################################################################

#!/usr/bin/perl
$seq = "GATCAAT";
#$pattern= 'C*T';
$pattern= 'C.*T';

while ($seq =~ m/($pattern)/gi) {

    $match = $1;
    $end = pos($seq);
    $start = $end - length($match) +1;

    print "$match : $start - $end\n";
}

################################################################### On Thursday 22 February 2007 22:41:37 Kurt Gobain wrote: > Hi every1.. > I m facing a great deal of problem in simple pattern matching between > sequence & a pattern ..Program shod be designed such a way that it shod be > able do two things 1) normal matching...For eg: GATCAAT....if TC is > entered... output shod be 2...2) matching using spl character..In same > example if C*T value is entered It shod give o/p as 3 & seq to b displayed > is CAAT..I m easily getting 1st part...But in 2nd part Its giving sum > problem..output I m gettin as 1 instead of 3...Code is really simple! > > #!/usr/bin/perl > $alphabet = "GATCAAT"; > $pattern= "C*T "; > > $alphabet =~ /($pattern)/i; > > print "The entire '$pattern' match began at $-[0] and ended at $+[0]\n"; > > ==================== > OUTPUT! > The entire C*T match began at 1 and ended at 2 > ==================== > > but the o/p shod be 3???? > & Is there n e chance I can get seq too..I mean instead of C*T'' i need > 'CAAT'...???? > > Well..Its not compulsion to use regex....But I find it quite simple..can > there be n e other method?? > > Thanx in advance! > Kurt!
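In Kurt's second case, `*` is meant as a wildcard, not as a regex quantifier. Below is a minimal sketch that converts such a user pattern into a regex. It assumes that `*` should mean "any run of characters" and that 0-based start positions are wanted, as in Kurt's expected outputs of 2 and 3. Each literal piece is escaped with `quotemeta` and the pieces are joined with `.*`. The position is computed with `pos()`, following Heikki's example. The subroutine names `wildcard_to_regex` and `find_match` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a user pattern in which '*' means "any run of
# characters" into a regex string. Literal pieces are escaped with quotemeta,
# and leading/trailing whitespace (e.g. the trailing space in "C*T ") is removed.
sub wildcard_to_regex {
    my ($pattern) = @_;
    $pattern =~ s/^\s+|\s+$//g;
    return join '.*', map { quotemeta } split /\*/, $pattern, -1;
}

# Hypothetical helper: return the 0-based start position and the matched text
# of the leftmost match, or an empty list if the pattern does not occur.
sub find_match {
    my ($seq, $pattern) = @_;
    my $re = wildcard_to_regex($pattern);
    return unless $seq =~ /($re)/gi;
    my $match = $1;
    my $start = pos($seq) - length($match);   # pos() is just past the match
    return ($start, $match);
}

for my $pattern ('TC', 'C*T ') {
    my ($start, $match) = find_match('GATCAAT', $pattern);
    if (defined $start) {
        print "$pattern: start $start, match $match\n";
    }
    else {
        print "$pattern: no match\n";
    }
}
```

With these assumptions, `TC` is reported at start 2 and `C*T ` at start 3 with the matched text `CAAT`, which is what Kurt expected. Note that `.*` is greedy and matches up to the last possible `T`. If the shortest stretch is wanted, the join string would be `.*?` instead. For GATCAAT both choices give `CAAT`, but on other sequences they can differ.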
-- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From avilella at gmail.com Fri Feb 23 09:59:49 2007 From: avilella at gmail.com (Albert Vilella) Date: Fri, 23 Feb 2007 09:59:49 +0000 Subject: [Bioperl-l] Sequence matching problem! In-Reply-To: <200702231025.39416.heikki@sanbi.ac.za> References: <9107936.post@talk.nabble.com> <200702231025.39416.heikki@sanbi.ac.za> Message-ID: <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com> now that we are at this pattern matching thread, I was wondering if any perl guru could enlighten me on the issue of matching exact sequence patterns on a gapped target sequence. E.g.: my $seq = "CGATCAACGAATCGTACGTACTC"; my $gapped_seq = "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG"; and one would like to get as a result: "CG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTC" which is the match of $seq but in $gapped_seq. Cheers, Albert. On 2/23/07, Heikki Lehvaslaiho wrote: > Kurt, > > There are few things in your code to note: > > - regexp /C*T/ matches any T preceded by zero or more Cs, > not what you meant > - $- and $+ are among the "expensive" perl functions worth > not using unless you have to. Using them once in your > code slows execution down considerable. There is always > an other way. > - Keep in mind what you want to use the match positions for: > Human readable locations usually start counting with 1 but > perl code uses 0 as the first location. The code below assumes > you want to print the locations out. > > Study my example code below. 
From js5 at sanger.ac.uk  Fri Feb 23 11:34:37 2007
From: js5 at sanger.ac.uk (James Smith)
Date: Fri, 23 Feb 2007 11:34:37 +0000 (GMT)
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
References: <9107936.post@talk.nabble.com> <200702231025.39416.heikki@sanbi.ac.za> <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
Message-ID: 

Try...
my $seq = "CGATCAACGAATCGTACGTACTC";
my $gapped_seq =
"GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";

my $regexp = '('.join('-*?',split//,$seq).')';

if( $gapped_seq =~ /$regexp/ ) {
  print "Match is $1\n";
} else {
  print "No match\n";
}

(not sure about the efficiency if $seq is long, though)

James

From khoueiry at ibdm.univ-mrs.fr  Fri Feb 23 13:09:33 2007
From: khoueiry at ibdm.univ-mrs.fr (pierre)
Date: Fri, 23 Feb 2007 14:09:33 +0100
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
References: <9107936.post@talk.nabble.com> <200702231025.39416.heikki@sanbi.ac.za> <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
Message-ID: <1172236173.4309.6.camel@ciona-pierre>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: 

From neetisomaiya at gmail.com  Fri Feb 23 12:27:28 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 23 Feb 2007 17:57:28 +0530
Subject: [Bioperl-l] need help urgently - needle output parsing
Message-ID: <764978cf0702230427x5b5acf73y6538527ade3fd453@mail.gmail.com>

Hi,

I am using the needle alignment tool (standalone, on a Linux machine), and
then I am using Bioperl to parse the output.
All data (sequence files and alignment outputs) are attached to this mail.
I have 2 small sequences: 693.seq and revcomp693.seq
I have 2 big sequences: 80768-4291-5639.84809_84810_84809_1.scf.seq and
80768-4291-5639.84809_84810_84810_1.scf.seq
All of these are in FASTA format.

Now I am doing the following:
1) Aligning 80768-4291-5639.84809_84810_84809_1.scf.seq and 693.seq
   - output file is 80768-4291-5639.84809_84810_84809_1.scf.out
   - parsing the output gives me the alignment start in 'traceseq' as 97
2) Aligning 80768-4291-5639.84809_84810_84809_1.scf.seq and revcomp693.seq
   - output file is 80768-4291-5639.84809_84810_84809_1.scf.comp.out
   - parsing the output gives me the alignment start in 'traceseq' as 91
All this is correct.

Now I am doing the following:
1) Aligning 80768-4291-5639.84809_84810_84810_1.scf.seq and 693.seq
   - output file is 80768-4291-5639.84809_84810_84810_1.scf.out
   - parsing the output gives me the alignment start in 'traceseq' as 341
     (this is correct)
2) Aligning 80768-4291-5639.84809_84810_84810_1.scf.seq and revcomp693.seq
   - output file is 80768-4291-5639.84809_84810_84810_1.scf.comp.out
   - parsing the output gives me the alignment start in 'traceseq' as 341
     (this is incorrect; the correct position is 330)

Part of my code is as follows:
---------------------------------------------
# running needle
`$needle_path./needle $trace.seq $snp_position_on_con.seq -gapopen 10.0 -gapextend 0.5 $output`;

# parsing needle output
my $str = Bio::AlignIO->new(-format => 'emboss',-file => $output);
my $aln = $str->next_aln();
my $pos = $aln->column_from_residue_number('original',1);
$logger->info("Alignment pos is $pos");

####################################

# running needle
`$needle_path./needle $trace.seq revcomp$snp_position_on_con.seq -gapopen 10.0 -gapextend 0.5 $comp_output`;

# parsing needle output
my $comp_str = Bio::AlignIO->new(-format => 'emboss',-file => $comp_output);
my $comp_aln = $comp_str->next_aln();
my $comp_pos = $comp_aln->column_from_residue_number('revcomp',1);
$logger->info("Alignment pos is $comp_pos");

Can someone please tell me what is going wrong here?

-- 
-Neeti
Even my blood says, B positive
-------------- next part --------------
A non-text attachment was scrubbed...
Name: data.zip
Type: application/zip
Size: 4456 bytes
Desc: not available
URL: 

From bix at sendu.me.uk  Fri Feb 23 13:55:24 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 23 Feb 2007 13:55:24 +0000
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: 
References: <9107936.post@talk.nabble.com> <200702231025.39416.heikki@sanbi.ac.za> <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
Message-ID: <45DEF24C.1010303@sendu.me.uk>

James Smith wrote:
> Try...
>
> my $seq = "CGATCAACGAATCGTACGTACTC";
> my $gapped_seq =
> "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
>
> my $regexp = '('.join('-*?',split//,$seq).')';
>
> if( $gapped_seq =~ /$regexp/ ) {
>   print "Match is $1\n";
> } else {
>   print "No match\n";
> }

That's great stuff.

If you were matching thousands of different $seq against the same very
large $gapped_seq, and only needed the first match of $seq in $gapped_seq,
the alternative to the above approach (remove the gaps from $gapped_seq
and do index() matching) will be faster.
Here's one (overly long-winded) way of implementing it, which I found to
take ~2s vs ~22s for the above regex approach when doing the job on
999999 copies of $seq:

#!/usr/bin/perl -w
use strict;
use warnings;

my $gapped_seq = "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";

# note the total gap-length at position in gapless 0-based coords
my @gap_lengths;
my $gap_length = 0;
while ($gapped_seq =~ /(-+)/g) {
    my $match = $1;
    my $prev_length = $gap_length;
    $gap_length += length($match);
    my $end = pos($gapped_seq) - $gap_length - 1;
    push(@gap_lengths, $prev_length) for (1..$end-$#gap_lengths);
}
push(@gap_lengths, $gap_length) for (1..(length($gapped_seq) - @gap_lengths - $gap_length));

# remove the gaps
my $gapless_seq = $gapped_seq;
$gapless_seq =~ s/-//g;

# now for each of thousands of seqs...
my $seq = 'CGATCAACGAATCGTACGTACTC';
my @seqs;
for (1..999999) {
    push(@seqs, $seq);
}

foreach my $seq (@seqs) {
    my $start = index($gapless_seq, $seq);
    if ($start == -1) {
        print "No match found for seq '$seq'\n";
        next;
    }
    my $end = $start + length($seq) - 1;

    # calculate the coords in $gapped_seq
    $start = $start + $gap_lengths[$start];
    $end = $end + $gap_lengths[$end];

    my $result = substr($gapped_seq, $start, ($end - $start + 1));
    #print $result, "\n";
}

exit;

From MEC at stowers-institute.org  Fri Feb 23 15:54:57 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 23 Feb 2007 09:54:57 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with multiple values
In-Reply-To: <6dce9a0b0701301446w7fc31d6eufe27442fecd0f20e@mail.gmail.com>
Message-ID: 

Lincoln, and other Bio::DB::SeqFeature wanderers:

I find that generating GFF from a Bio::DB::SeqFeature using gff3_string
does not respect the following:

"Multiple attributes of the same type are indicated by separating the
values with the comma "," character" (c.f.
http://www.sequenceontology.org/gff3.shtml)

This one-liner demonstrates the problem:

perl -MBio::DB::SeqFeature -e 'print Bio::DB::SeqFeature->new(-seq_id => "J", -start => 1, -end => 2, -primary_tag => "PH", -source => "A", -name => "mec", -attributes => {foo => [qw(bar blat)]})->gff3_string'
J	A	PH	1	2	.	.	.	foo=bar;foo=blat;Name=mec

Do you agree this is a problem?

The fix is in the post-sig patch to
/Bio/DB/SeqFeature/NormalizedFeature.pm, in which I also took the
stylistic privilege of promoting any ID, Parent, or Name attribute to
the front of column 9, so output is now:

J	A	PH	1	2	.	.	.	Name=mec;foo=bar,blat

Do you agree this is better?

I am poised to commit it, as well as the functionally same patch to the
equivalent function in Bio/Graphics/FeatureBase.pm

All clear?

-- Malcolm Cook



*** NormalizedFeature.pm	2 Feb 2007 21:05:42 -0000	1.25
--- NormalizedFeature.pm	23 Feb 2007 15:37:01 -0000
***************
*** 481,494 ****
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
!
!     push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
    }
    my $id   = $self->primary_id;
    my $name = $self->display_name;
!   push @result,"ID=".$self->escape($id)                     if defined $id;
!   push @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   push @result,"Name=".$self->escape($name)                 if defined $name;
    return join ';',@result;
  }

--- 481,498 ----
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
!
!     push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
!     # NO! Multiple attributes of the same type are indicated by
!     # separating the values with the comma "," character - per
!     # http://www.sequenceontology.org/gff3.shtml. Do it this way:
!     #push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values);
    }
    my $id   = $self->primary_id;
    my $name = $self->display_name;
!   unshift @result,"ID=".$self->escape($id)                     if defined $id;
!   unshift @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   unshift @result,"Name=".$self->escape($name)                 if defined $name;
    return join ';',@result;
  }

From MEC at stowers-institute.org  Fri Feb 23 17:08:11 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 23 Feb 2007 11:08:11 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with multiple values
In-Reply-To: 
Message-ID: 

Oy, I hit send too soon. The patch I sent had my new attribute encoder
commented out. It should've been:

*** NormalizedFeature.pm	2 Feb 2007 21:05:42 -0000	1.25
--- NormalizedFeature.pm	23 Feb 2007 17:06:37 -0000
***************
*** 481,494 ****
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
!
!     push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
    }
    my $id   = $self->primary_id;
    my $name = $self->display_name;
!   push @result,"ID=".$self->escape($id)                     if defined $id;
!   push @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   push @result,"Name=".$self->escape($name)                 if defined $name;
    return join ';',@result;
  }

--- 481,497 ----
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
!     # push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
!     # NO! Multiple attributes of the same type are indicated by
!     # separating the values with the comma "," character - per
!     # http://www.sequenceontology.org/gff3.shtml. Do it this way:
!     push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values);
    }
    my $id   = $self->primary_id;
    my $name = $self->display_name;
!   unshift @result,"ID=".$self->escape($id)                     if defined $id;
!   unshift @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   unshift @result,"Name=".$self->escape($name)                 if defined $name;
    return join ';',@result;
  }

Malcolm

From lstein at cshl.edu  Fri Feb 23 17:16:01 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 23 Feb 2007 12:16:01 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with multiple values
In-Reply-To: 
References: <6dce9a0b0701301446w7fc31d6eufe27442fecd0f20e@mail.gmail.com>
Message-ID: <6dce9a0b0702230916o684e490cu9d01d3be3e2ae061@mail.gmail.com>

Hi Malcolm,

You're quite right, and I appreciate your work in tracking down and fixing
it. Before you commit the patch, can you confirm that the loader is working
correctly so that comma-separated values are read back into the data
structure as multiple attributes?

Lincoln
-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From aaron.j.mackey at gsk.com  Fri Feb 23 14:36:18 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Fri, 23 Feb 2007 09:36:18 -0500
Subject: [Bioperl-l] Bio::AnnotatableI vs Bio::FeatureHolderI
In-Reply-To: <45DDFED1.1090503@watson.wustl.edu>
Message-ID: 

The fundamental difference (in my mind) between a feature and an
annotation, is that a feature has a location/range, and thus the
information represented in the feature is applicable only to that
location/range. An annotation, on the other hand, is "global", or at
least non-localizable (note: a feature with a "fuzzy" location of
"somewhere along this sequence, but I'm not sure where" is still not
global - if you did/could know the location, you'd describe it as a
feature, so it shouldn't be represented with an annotation).

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 02/22/2007 03:36:33 PM:

> Are Bio::AnnotatableI and Bio::FeatureHolderI competing interfaces?
>
> I get the impression they are designed to do similar things. If so is
> one deprecated and the other preferred?
>
> If their responsibilities are orthogonal to each other, what sorts of
> tasks are suited to each?
>
> Thanks,
> Michael

From MEC at stowers-institute.org  Fri Feb 23 18:46:00 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 23 Feb 2007 12:46:00 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with multiple values
In-Reply-To: <6dce9a0b0702230916o684e490cu9d01d3be3e2ae061@mail.gmail.com>
Message-ID: 

Lincoln,

OK. I'll do that...
...let's see, a quick squiz at Bio/DB/SeqFeature/Store/ ....

...ok - parse_attributes _looks_ right to me

...so, let's try it

#load a feature into a new database:

bp_seqfeature_load.PLS -dsn 'dbi:mysql:database=test;host=mysql-dev' -create -user test -pass test <(echo -e "J\tA\tPH\t1\t2\t.\t.\t.\tfoo=bar,blat;Name=mec\n")

#It loaded ok. Now, let's print it out in GFF3:

perl -MBio::DB::SeqFeature::Store -e 'foreach (Bio::DB::SeqFeature::Store->new(-dsn => "dbi:mysql:database=test;host=mysql-dev;user=test;password=test")->features(-type => "PH:A")) {print $_->gff3_string . "\n"}'
J	A	PH	1	2	.	.	.	Name=mec;ID=1;foo=bar,blat

#output looks good to me

Note, I tried loading attributes foo=bar;foo=blat and it came back
foo=bar,blat. So, you can load either way.

I'll commit later today.

--Malcolm

From cjfields at uiuc.edu  Fri Feb 23 18:49:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 23 Feb 2007 12:49:44 -0600
Subject: [Bioperl-l] Bio::AnnotatableI vs Bio::FeatureHolderI
In-Reply-To: 
References: 
Message-ID: 

To add to that, there's a HOWTO describing the differences:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

I agree w/ Aaron; if it has a location it's a feature, otherwise it's an
annotation.

chris
Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From lstein at cshl.edu  Fri Feb 23 21:20:26 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 23 Feb 2007 16:20:26 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with multiple values
In-Reply-To: 
References: <6dce9a0b0702230916o684e490cu9d01d3be3e2ae061@mail.gmail.com>
Message-ID: <6dce9a0b0702231320j1f24d4b4oe33bce6d2da96db7@mail.gmail.com>

Excellent!

Lincoln
-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From enrique_rulz at yahoo.com  Sat Feb 24 21:23:59 2007
From: enrique_rulz at yahoo.com (Kurt Gobain)
Date: Sat, 24 Feb 2007 13:23:59 -0800 (PST)
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <200702231025.39416.heikki@sanbi.ac.za>
References: <9107936.post@talk.nabble.com> <200702231025.39416.heikki@sanbi.ac.za>
Message-ID: <9137941.post@talk.nabble.com>

Heikki Lehvaslaiho wrote:
>
> Kurt,
>
> There are few things in your code to note:
>
> - regexp /C*T/ matches any T preceded by zero or more Cs,
>   not what you meant
> - $- and $+ are among the "expensive" perl functions worth
>   not using unless you have to. Using them once in your
>   code slows execution down considerable. There is always
>   an other way.
> - Keep in mind what you want to use the match positions for:
>   Human readable locations usually start counting with 1 but
>   perl code uses 0 as the first location. The code below assumes
>   you want to print the locations out.
>
> Study my example code below.
>
> Yours,
> -Heikki
>
> ###################################################################
> #!/usr/bin/perl
> $seq = "GATCAAT";
> #$pattern= 'C*T';
> $pattern= 'C.*T';
>
> while ($seq =~ m/($pattern)/gi) {
>
>     $match = $1;
>     $end = pos($seq);
>     $start = $end - length($match) +1;
>
>     print "$match : $start - $end\n";
> }
>
> ###################################################################
>

Thanks for the instant reply! Sorry I couldn't reply earlier.

The code works perfectly fine... but sometimes it doesn't give the required
output. For example, if I type the sequence as "GATCAAGTCAGGAT" and the
pattern to be matched as T.*A, then the output I get from the program above
is TCAAGTCAGGA instead of TCA...
& 1 more thing Is there n e chance by which I can replace T*A to T.*A cos the code which I need to write says T*A shod be only the input not T.*A..So Can we use replacment reg ex...sumthing like $pattern =~ s/.*/*/...or sumthing else... But its kinda givin sum error again...Dam! Regex is really hairy!!...:P N e ways thanx a lot again for the code...Hope to listen frm you soon! Kurt! -- View this message in context: http://www.nabble.com/Sequence-matching-problem%21-tf3275153.html#a9137941 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From biology0046 at hotmail.com Sun Feb 25 04:14:51 2007 From: biology0046 at hotmail.com (=?gb2312?B?va0gzsTi/Q==?=) Date: Sun, 25 Feb 2007 04:14:51 +0000 Subject: [Bioperl-l] how to change align output format Message-ID: Dear all: I have problems in changing the output format of clustal alignment. I use the Bio::Tools::Run::Alignment::Clustalw module to carry out an mulitple sequences alignment, then i use the Bio::AlignIO module to write out the alignment. Scripts like this: my $aln_out=Bio::AlignIO->new(-file=>">./clustal/${outfilename}.aln",-format=>'clustalw'); The output : dana_GLEANR_16071 MSKMKMLPVQLSLNQLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL dere_GLEANR_9270 ..............S............................................. FBgn0000097 ..............S............................................. dsec_GLEANR_671 ..............S............................................. dsim_GLEANR_6613 ..............S............................................. dyak_GLEANR_1669 ..............S............................................. . dana_GLEANR_16071 VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM dere_GLEANR_9270 ............................................................ FBgn0000097 ............................................................ dsec_GLEANR_671 ............................................................ dsim_GLEANR_6613 ............................................................ 
dyak_GLEANR_1669 ............................................................ But I want the output format below, which does not change identical residues into the "." character: dere_GLEANR_9270 MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL dyak_GLEANR_1669 MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL dsec_GLEANR_671 MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL dsim_GLEANR_6613 MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL FBgn0000097 MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL dana_GLEANR_16071 MSKMKMLPVQLSLNQLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL **************.********************************************* dere_GLEANR_9270 VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM dyak_GLEANR_1669 VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM dsec_GLEANR_671 VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM dsim_GLEANR_6613 VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM FBgn0000097 VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM dana_GLEANR_16071 VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM ************************************************************ Are there any parameters in the package that I can change to get the latter output format? Thank you. Sincerely, Jiang From bix at sendu.me.uk Sun Feb 25 10:53:48 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 25 Feb 2007 10:53:48 +0000 Subject: [Bioperl-l] [Fwd: Bio::Graph::SimpleGraph] Message-ID: <45E16ABC.3060405@sendu.me.uk> Tels, I've forwarded this to the author of the module, Nat Goodman, and to the Bioperl mailing list (http://www.bioperl.org/wiki/Mailing_lists#Main_BioPerl_list).
But actually we have Bio::Graph::* as tentatively deprecated: http://www.bioperl.org/wiki/Deprecated_modules#Bio::Graph_modules so any further work on it doesn't seem worthwhile. -------- Original Message -------- Subject: Bio::Graph::SimpleGraph Date: Sat, 24 Feb 2007 12:07:31 +0100 From: Tels Moin, I just stumbled over Bio::Graph::SimpleGraph and read this comment: "This is a simple, hopefully fast undirected graph package. The only reason this exists is that the standard CPAN Graph package, Graph::Base, is seriously broken." Really sad to see people always reinventing the wheel :/ Anyway, I wonder if you would like to make your module support Graph::Easy (http://search.cpan.org/~tels/Graph-Easy/)? I would be willing to submit patches and do testing/documentation for that. All the best, Tels From bix at sendu.me.uk Sun Feb 25 10:45:21 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 25 Feb 2007 10:45:21 +0000 Subject: [Bioperl-l] Sequence matching problem! In-Reply-To: <9137941.post@talk.nabble.com> References: <9107936.post@talk.nabble.com> <200702231025.39416.heikki@sanbi.ac.za> <9137941.post@talk.nabble.com> Message-ID: <45E168C1.80306@sendu.me.uk> Kurt Gobain wrote: > Code works perfectly fine...but...sum time its not givin reqd o/p..For eg. > If I type sequence as "GATCAAGTCAGGAT" & pattern to be matched as T.*A..then > o/p which I am getting frm above prog is TCAAGTCAGGA instead of TCA... > & 1 more thing Is there n e chance by which I can replace T*A to T.*A cos > the code which I need to write says T*A shod be only the input not T.*A..So > Can we use replacment reg ex...sumthing like > $pattern =~ s/.*/*/...or sumthing else... > But its kinda givin sum error again...Dam! Regex is really hairy!!...:P These aren't Bioperl questions.
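Both of Kurt's questions come down to standard Perl regex behaviour. `.*` is greedy, so `T.*A` runs on to the last A it can reach, whereas the non-greedy `.*?` stops at the first one. The plain-Perl sketch below needs no BioPerl. It assumes that `*` in the user's input simply means "any run of residues". The helper names `wildcard_to_regex` and `find_matches` are hypothetical, and positions are reported 1-based, as in Heikki's example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a user pattern such as 'T*A' into the
# non-greedy regex 'T.*?A'. Each literal piece is quotemeta'd so that
# other punctuation in the input cannot act as a regex metacharacter.
sub wildcard_to_regex {
    my ($wildcard) = @_;
    my @parts = split /\*/, $wildcard, -1;    # -1 keeps a trailing empty field
    return join '.*?', map { quotemeta } @parts;
}

# Hypothetical helper: return [match, start, end] for every
# non-overlapping match, using 1-based inclusive coordinates.
sub find_matches {
    my ($seq, $wildcard) = @_;
    my $regex = wildcard_to_regex($wildcard);
    my @hits;
    while ($seq =~ m/($regex)/gi) {
        my $match = $1;
        my $end   = pos($seq);                    # 0-based offset just past the match
        my $start = $end - length($match) + 1;    # 1-based start position
        push @hits, [ $match, $start, $end ];
    }
    return @hits;
}

for my $hit (find_matches('GATCAAGTCAGGAT', 'T*A')) {
    my ($match, $start, $end) = @$hit;
    print "$match : $start - $end\n";
}
```

Run as-is, it prints `TCA : 3 - 5` and `TCA : 8 - 10`. Escaping the literal pieces with `quotemeta`, rather than applying a bare `s/\*/.*?/`, is a design choice: any other punctuation a user types is then matched literally instead of being interpreted as regex syntax.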
For regular expression help see: http://perldoc.perl.org/perlretut.html Basically, you want a non-greedy match, so T.*?A You can convert T*A by doing s/\*/.*?/ Here are some more regexs for you: s/sum/some/g s/frm/from/g s/n e/any/g etc... From biology0046 at hotmail.com Sun Feb 25 13:28:34 2007 From: biology0046 at hotmail.com (=?gb2312?B?va0gzsTi/Q==?=) Date: Sun, 25 Feb 2007 13:28:34 +0000 Subject: [Bioperl-l] AlignIO problems Message-ID: hi, all, I use the AlignIO module to convert the alignment file. my original file is : CLUSTAL W(1.81) multiple sequence alignment dana_GLEANR_11249 MEAIAKHDFSATADDELSFRKTQTLKILNMEDDSNWYRAELDGKEGLIPSNYIEMKNHDW dere_GLEANR_7213 ...V...................I.................................... dgri_GLEANR_6962 .......................I.................................... FBgn0004638 .......................I.................................... dmoj_GLEANR_6118 ...........N...........I.................................... dper_GLEANR_18885 ...V...................I.................................... dpse_GLEANR_14384 ...V...................I.................................... dsec_GLEANR_3096 .................N.....I.................................... dsim_GLEANR_9744 -----------------------------............................... dvir_GLEANR_4811 .......................I.................................... dwil_GLEANR_10869 .......................I.................................... dyak_GLEANR_13576 .......................I.................................... dana_GLEANR_11249 YYGRITRADAEKLLSNKHEGAFLIRISESSPGDFSLSVKCPDGVQHFKVLRDAQSKFFLW dere_GLEANR_7213 ............................................................ dgri_GLEANR_6962 ............................................................ FBgn0004638 ............................................................ dmoj_GLEANR_6118 .................L.......................................... dper_GLEANR_18885 ............................................................ 
dpse_GLEANR_14384 ............................................................ dsec_GLEANR_3096 ............................................................ dsim_GLEANR_9744 ............................................................ dvir_GLEANR_4811 ............................................................ dwil_GLEANR_10869 ............................................................ dyak_GLEANR_13576 ............................................................ dana_GLEANR_11249 VVKFNSLNELVEYHRTASVSRSQDVKLRDMIPEEMLVQALYDFVPQESGELDFRRGDVIT dere_GLEANR_7213 ............................................................ dgri_GLEANR_6962 ............................................................ FBgn0004638 ............................................................ dmoj_GLEANR_6118 ..............................V.D........................... dper_GLEANR_18885 .......................E.................................... dpse_GLEANR_14384 .......................E.................................... dsec_GLEANR_3096 ............................................................ dsim_GLEANR_9744 ............................................................ dvir_GLEANR_4811 ............................................................ dwil_GLEANR_10869 ............................................................ dyak_GLEANR_13576 ............................................................ dana_GLEANR_11249 VTDRSDENWWNGEIGNRKGIFPATYVTPYHS dere_GLEANR_7213 ............................... dgri_GLEANR_6962 ............................... FBgn0004638 ............................... dmoj_GLEANR_6118 ............Q.................. dper_GLEANR_18885 ............................... dpse_GLEANR_14384 ............................... dsec_GLEANR_3096 ............................... dsim_GLEANR_9744 ............................... dvir_GLEANR_4811 ............................... dwil_GLEANR_10869 ............................... dyak_GLEANR_13576 ............................... 
I want to change those "." characters back to residue letters, so I wrote this code: use Bio::AlignIO; my $in=Bio::AlignIO->new(-file =>"FBgn0000097.aln", -format => 'clustalw'); my $out=Bio::AlignIO->new(-file=>">../clustalw/0097.aln", -format =>'clustalw'); while (my $aln=$in->next_aln() ){ $aln->unmatch(); $aln->set_displayname_flat(); $out->write_aln($aln); } but when I run the code, I get error messages like: -------------------- WARNING --------------------- MSG: Got a sequence with no letters in it cannot guess alphabet [] --------------------------------------------------- ------------- EXCEPTION ------------- MSG: No sequence with name [dsim_GLEANR_9744/1-182] STACK Bio::SimpleAlign::displayname /home/src/bioperl/bioperl-live/Bio/SimpleAlign.pm:2307 STACK Bio::SimpleAlign::set_displayname_flat /home/src/bioperl/bioperl-live/Bio/SimpleAlign.pm:2374 STACK toplevel aligntest.pl:11 -------------------------------------- I don't know where the problem is. Jiang From cjfields at uiuc.edu Sun Feb 25 19:58:23 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 25 Feb 2007 13:58:23 -0600 Subject: [Bioperl-l] AlignIO problems In-Reply-To: References: Message-ID: <19EA5F46-D1A4-45B5-B2DB-55194F79215C@uiuc.edu> Bio::AlignIO::clustalw doesn't work with masked sequences; it parses the output quite literally as is, so any [.-] are treated as gaps. If the seqs are 100% identical then you will have a seq with 100% gaps and no sequence, thus giving you the warnings you see. The best way to accomplish what you want is to not mask the sequence alignment to begin with when running clustalw/muscle/whatever. Exactly how are you generating these? When I use clustalw no identity masking occurs by default. chris On Feb 25, 2007, at 7:28 AM, ? ?? wrote: > hi, all, > I use the AlignIO module to convert the alignment file.
> my original file is : > CLUSTAL W(1.81) multiple sequence alignment > > > dana_GLEANR_11249 > MEAIAKHDFSATADDELSFRKTQTLKILNMEDDSNWYRAELDGKEGLIPSNYIEMKNHDW > dere_GLEANR_7213 ...V...................I....................... > ............. > dgri_GLEANR_6962 .......................I....................... > ............. > FBgn0004638 .......................I....................... > ............. > dmoj_GLEANR_6118 ...........N...........I....................... > ............. > dper_GLEANR_18885 ...V...................I....................... > ............. > dpse_GLEANR_14384 ...V...................I....................... > ............. > dsec_GLEANR_3096 .................N.....I....................... > ............. > dsim_GLEANR_9744 > -----------------------------............................... > dvir_GLEANR_4811 .......................I....................... > ............. > dwil_GLEANR_10869 .......................I....................... > ............. > dyak_GLEANR_13576 .......................I....................... > ............. > > > > dana_GLEANR_11249 > YYGRITRADAEKLLSNKHEGAFLIRISESSPGDFSLSVKCPDGVQHFKVLRDAQSKFFLW > dere_GLEANR_7213 ............................................... > ............. > dgri_GLEANR_6962 ............................................... > ............. > FBgn0004638 ............................................... > ............. > dmoj_GLEANR_6118 .................L............................. > ............. > dper_GLEANR_18885 ............................................... > ............. > dpse_GLEANR_14384 ............................................... > ............. > dsec_GLEANR_3096 ............................................... > ............. > dsim_GLEANR_9744 ............................................... > ............. > dvir_GLEANR_4811 ............................................... > ............. > dwil_GLEANR_10869 ............................................... > ............. 
> dyak_GLEANR_13576 ............................................... > ............. > > > > dana_GLEANR_11249 > VVKFNSLNELVEYHRTASVSRSQDVKLRDMIPEEMLVQALYDFVPQESGELDFRRGDVIT > dere_GLEANR_7213 ............................................... > ............. > dgri_GLEANR_6962 ............................................... > ............. > FBgn0004638 ............................................... > ............. > dmoj_GLEANR_6118 ..............................V.D.............. > ............. > dper_GLEANR_18885 .......................E....................... > ............. > dpse_GLEANR_14384 .......................E....................... > ............. > dsec_GLEANR_3096 ............................................... > ............. > dsim_GLEANR_9744 ............................................... > ............. > dvir_GLEANR_4811 ............................................... > ............. > dwil_GLEANR_10869 ............................................... > ............. > dyak_GLEANR_13576 ............................................... > ............. > > > > dana_GLEANR_11249 VTDRSDENWWNGEIGNRKGIFPATYVTPYHS > dere_GLEANR_7213 ............................... > dgri_GLEANR_6962 ............................... > FBgn0004638 ............................... > dmoj_GLEANR_6118 ............Q.................. > dper_GLEANR_18885 ............................... > dpse_GLEANR_14384 ............................... > dsec_GLEANR_3096 ............................... > dsim_GLEANR_9744 ............................... > dvir_GLEANR_4811 ............................... > dwil_GLEANR_10869 ............................... > dyak_GLEANR_13576 ............................... > > > I want to change those "." 
characters back to alphabetic > expression, then i write the code like this: > use Bio::AlignIO; > my $in=Bio::AlignIO->new(-file =>"FBgn0000097.aln", > -format => 'clustalw'); > my $out=Bio::AlignIO->new(-file=>">../clustalw/0097.aln", > -format =>'clustalw'); > while (my $aln=$in->next_aln() ){ > $aln->unmatch(); > $aln->set_displayname_flat(); > $out->write_aln($aln); > } > > but when i run the code, there are error message like: > > -------------------- WARNING --------------------- > MSG: Got a sequence with no letters in it cannot guess alphabet [] > --------------------------------------------------- > > ------------- EXCEPTION ------------- > MSG: No sequence with name [dsim_GLEANR_9744/1-182] > STACK Bio::SimpleAlign::displayname /home/src/bioperl/bioperl-live/ > Bio/SimpleAlign.pm:2307 > STACK Bio::SimpleAlign::set_displayname_flat /home/src/bioperl/ > bioperl-live/Bio/SimpleAlign.pm:2374 > STACK toplevel aligntest.pl:11 > > -------------------------------------- > > I don't know where is the problem. > > Jiang > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cristiangary at gmail.com Sun Feb 25 21:04:57 2007 From: cristiangary at gmail.com (Cristian Gary) Date: Sun, 25 Feb 2007 18:04:57 -0300 Subject: [Bioperl-l] problem with blast report to ncbi webpage Message-ID: <95ef8cd0702251304o45bea6a0tcedc59156cb0cfe4@mail.gmail.com> I have a problem getting BLAST reports from the NCBI server: after waiting on the RIDs, no results are shown. Is the problem the NCBI server or the BioPerl version? P.S.: the same code worked very well 3 weeks ago.
-- "Knowledge belongs to humanity" "Gnu/linux -------- free your mind...... www.kubuntu.org From granjeau at tagc.univ-mrs.fr Mon Feb 26 09:17:15 2007 From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137) Date: Mon, 26 Feb 2007 10:17:15 +0100 Subject: [Bioperl-l] Reading a XML sequence (UniParc) into a BioSeq object Message-ID: <45E2A59B.6080300@tagc.univ-mrs.fr> Hello! I would like to fill a BioSeq object with the output of a dbfetch request at EBI on the UniParc database (which only returns XML; I am interested in the references). If somebody could tell me which BioPerl object to use, or a way to convert it to Swiss-Prot format, or has a piece of code (is http://doc.bioperl.org/bioperl-live/Bio/SeqIO/interpro.html a good starting point?), I would appreciate it a lot. Best regards, --Samuel MSTRSVSSSSYRRMFGGPGTASRPSSSRSYVTTSTRTYSLGSALRPSTSRSLYASSPGGV YATRSSAVRLRSSVPGVRLLQDSVDFSLADAINTEFKNTRTNEKVELQELNDRFANYIDK VRFLEQQNKILLAELEQLKGQGKSRLGDLYEEEMRELRRQVDQLTNDKARVEVERDNLAE DIMRLREKLQEEMLQREEAENTLQSFRQDVDNASLARLDLERKVESLQEEIAFLKKLHEE EIQELQAQIQEQHVQIDVDVSKPDLTAALRDVRQQYESVAAKNLQEAEEWYKSKFADLSE AANRNNDALRQAKQESTEYRRQVQSLTCEVDALKGTNESLERQMREMEENFAVEAANYQD TIGRLQDEIQNMKEEMARHLREYQDLLNVKMALDIEIATYRKLLEGEESRISLPLPNFSS LNLRGKHFISL From bix at sendu.me.uk Mon Feb 26 11:46:39 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 26 Feb 2007 11:46:39 +0000 Subject: [Bioperl-l] [Fwd: Bio::Graph::SimpleGraph] In-Reply-To: <45E16ABC.3060405@sendu.me.uk> References: <45E16ABC.3060405@sendu.me.uk> Message-ID: <45E2C89F.1020402@sendu.me.uk> Nat replied, but I messed up the To: fields so his reply didn't make it to the list. Here's what he said: Nathan (Nat) Goodman wrote: Hi Tels, I agree it's sad to reinvent the wheel, but I don't think that's what happened here. 
Your module seems to be focused on rendering graphs while my module is concerned with computations on graphs.
In any case, as Sendu notes, SimpleGraph is in the process of being deprecated. I fully support this move. It was intended to be a stopgap until the main Perl Graph module was fixed. Since that has now happened, it's time for SimpleGraph to retire. For the benefit of anyone using Graph: last I checked (six months or more ago), it had serious performance problems on large graphs (probably not too much of a surprise), and also was inexplicably slow on graphs with edge attributes. I see that the latter bug is marked "resolved" in CPAN, but there's no indication of when or how. We've moved to Boost for graphs as large as the human protein interaction network. Best, Nat From sanjib at bic.boseinst.ernet.in Mon Feb 26 05:23:36 2007 From: sanjib at bic.boseinst.ernet.in (Sanjib Kumar Gupta) Date: Mon, 26 Feb 2007 10:53:36 +0530 Subject: [Bioperl-l] Remote blast In-Reply-To: <20070221064743.M54123@bic.boseinst.ernet.in> References: <20070221064743.M54123@bic.boseinst.ernet.in> Message-ID: <20070226052336.M74918@bic.boseinst.ernet.in> Hi, I have been running this script for some time and it was working fine. I am using a Linux machine with a public IP (no proxy). But suddenly it has stopped working with these errors: waiting...waiting... -------------------- WARNING --------------------- MSG: An Error Occurred
500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad hostname 'www.ncbi.nlm.nih.gov') --------------------------------------------------- xx.pep -------------------- WARNING --------------------- MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5 Content-Length: 497 Content-Type: application/x-www-form-urlencoded DATABASE=nr&QUERY=%3ENM_005277.3%3B_1414961_1+837% 0AMEENMEEGQTQKGCFECCIKCLGGIPYASLIATILLYAGVALFCGCGHEALSGTVNILQTYFEMARTAGDTLDVF TMIDIFKYVIYGIAAAFFVYGILLMVEGFFTTGAIKDLYGDFKITTCGRCVSAWFIMLTYLFMLAWLGVTAFTSLPV YMYFNLWTICRNTTLVEGANLCLDLRQFGIVTIGEEKKICTVSENFLRMCESTELNMTFHLFIVALAGAGAAVIAMV HYLMVLSANWAYVKDACRMQKYEDIKSKEEQELHDIHSTRSKERLNAYT*&COMPOSITION_BASED_STATISTI CS=off&EXPECT=1e- 10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&MATRIX_NAME=BLOSUM62&ENTREZ_ QUERY=Xenopus+laevis[Organism]&PROGRAM=blastp An Error Occurred
500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad hostname 'www.ncbi.nlm.nih.gov') --------------------------------------------------- waiting...waiting... -------------------- WARNING --------------------- MSG: An Error Occurred
500 Internal Server Error --------------------------------------------------- Although I can open the NCBI page in a browser, I am unable to ping or traceroute the server. Please help me. -- Sanjib Kumar Gupta Bioinformatics Centre Bose Institute Kolkata 700054, INDIA Phone : +91-33-2355 6626, 2816, 2355 4766 Fax : +91-33-2355 3886 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed...
Name: n9.pl URL: From cjfields at uiuc.edu Mon Feb 26 14:59:21 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 26 Feb 2007 08:59:21 -0600 Subject: [Bioperl-l] Remote blast In-Reply-To: <20070226052336.M74918@bic.boseinst.ernet.in> References: <20070221064743.M54123@bic.boseinst.ernet.in> <20070226052336.M74918@bic.boseinst.ernet.in> Message-ID: I tested this out and got BLAST to work for my test case (a single FASTA seq, since you didn't send any seqs for testing). It keeps querying for the RID in what appears to be an infinite loop (i.e. it doesn't get rid of the RID properly); you can see this if you add '-verbose => 1' to your parameters. I don't have time to delve into it, but from a quick glance it may be due to your looping structure and how you are saving your RIDs. As for your particular error, could it be something as simple as the server being overloaded or down? It does happen from time to time... Beyond that I can't make heads or tails of your script. Was it cobbled together from a bunch of others? If so, you can probably expect some bugs to occur. chris On Feb 25, 2007, at 11:23 PM, Sanjib Kumar Gupta wrote: > Hi > I have been running this script for some time and it was running > fine. I am > using this linux machine with live IP(no proxy). But suudenly it > has stopped > working with this errors > > waiting...waiting... > -------------------- WARNING --------------------- > MSG: > An Error Occurred > >
> 500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad > hostname 'www.ncbi.nlm.nih.gov') > > > > --------------------------------------------------- > xx.pep > > -------------------- WARNING --------------------- > MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi > User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5 > Content-Length: 497 > Content-Type: application/x-www-form-urlencoded > > DATABASE=nr&QUERY=%3ENM_005277.3%3B_1414961_1+837% > 0AMEENMEEGQTQKGCFECCIKCLGGIPYASLIATILLYAGVALFCGCGHEALSGTVNILQTYFEMARTA > GDTLDVF > TMIDIFKYVIYGIAAAFFVYGILLMVEGFFTTGAIKDLYGDFKITTCGRCVSAWFIMLTYLFMLAWLGVT > AFTSLPV > YMYFNLWTICRNTTLVEGANLCLDLRQFGIVTIGEEKKICTVSENFLRMCESTELNMTFHLFIVALAGAG > AAVIAMV > HYLMVLSANWAYVKDACRMQKYEDIKSKEEQELHDIHSTRSKERLNAYT*&COMPOSITION_BASED_S > TATISTI > CS=off&EXPECT=1e- > 10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&MATRIX_NAME=BLOSUM62& > ENTREZ_ > QUERY=Xenopus+laevis[Organism]&PROGRAM=blastp > > > An Error Occurred > >
> 500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad > hostname 'www.ncbi.nlm.nih.gov') > > > > --------------------------------------------------- > waiting...waiting... > -------------------- WARNING --------------------- > MSG: > An Error Occurred > >
> 500 Internal Server Error > > > > --------------------------------------------------- > > Though I am able to see the ncbi page from browser but am unable to > ping or > trace route to the server. > > Please help me. From cjfields at uiuc.edu Mon Feb 26 15:05:50 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 26 Feb 2007 09:05:50 -0600 Subject: [Bioperl-l] AlignIO problems In-Reply-To: References: Message-ID: <082E0708-6B1C-45CE-B387-429B8B6A8D7A@uiuc.edu> Make sure to keep this on the list; others may have some input. You should be able to test the various sequence objects you're retrieving from Bio::DB::Fasta via Bio::SeqIO to see if they are what you're expecting, then track down the problematic sequences. My guess is the odd seqs are due to the way you are using Bio::DB::Fasta for each of the files. I'm wondering if you are having problems with indices overwriting one another and are thus getting back blank seq objects. You should probably consider just indexing all of your files together; according to the POD you can use a single Bio::DB::Fasta to index all of the files in one go (indicate the path and use '-glob') and retrieve what you need that way. Alternatively, put them in separate directories so the indices are also separate. chris On Feb 25, 2007, at 9:50 PM, ? ?? wrote: > Thank you for your help!
> May be you are right, I use the following code to create my seq > object arrays: > my $outfilename=$dmel; > my $ana_pep_db=Bio::DB::Fasta->new("dana.translation.fasta"); > my $ana_cdna_db=Bio::DB::Fasta->new("dana.cds.fasta"); > my $ere_pep_db=Bio::DB::Fasta->new("dere.translation.fasta"); > my $ere_cdna_db=Bio::DB::Fasta->new("dere.cds.fasta"); > my $mel_pep_db=Bio::DB::Fasta->new("dmel.translation.fasta"); > my $mel_cdna_db=Bio::DB::Fasta->new("dmel.cds.fasta"); > my $sec_pep_db=Bio::DB::Fasta->new("dsec.translation.fasta"); > my $sec_cdna_db=Bio::DB::Fasta->new("dsec.cds.fasta"); > my $sim_pep_db=Bio::DB::Fasta->new("dsim.translation.fasta"); > my $sim_cdna_db=Bio::DB::Fasta->new("dsim.cds.fasta"); > my $yak_pep_db=Bio::DB::Fasta->new("dyak.translation.fasta"); > my $yak_cdna_db=Bio::DB::Fasta->new("dyak.cds.fasta"); > my $ana_pep_obj=$ana_pep_db->get_Seq_by_id($dana); > my $ana_nuc_obj=$ana_cdna_db->get_Seq_by_id($dana); > my $ere_pep_obj=$ere_pep_db->get_Seq_by_id($dere); > my $ere_nuc_obj=$ere_cdna_db->get_Seq_by_id($dere); > my $mel_pep_obj=$mel_pep_db->get_Seq_by_id($dmel); > my $mel_nuc_obj=$mel_cdna_db->get_Seq_by_id($dmel); > my $sec_pep_obj=$sec_pep_db->get_Seq_by_id($dsec); > my $sec_nuc_obj=$sec_cdna_db->get_Seq_by_id($dsec); > my $sim_pep_obj=$sim_pep_db->get_Seq_by_id($dsim); > my $sim_nuc_obj=$sim_cdna_db->get_Seq_by_id($dsim); > my $yak_pep_obj=$yak_pep_db->get_Seq_by_id($ddyak); > my $yak_nuc_obj=$yak_cdna_db->get_Seq_by_id($ddyak); > push @prots, $ana_pep_obj; > push @cdna, $ana_nuc_obj; > push @prots, $ere_pep_obj; > push @cdna, $ere_nuc_obj; > push @prots, $mel_pep_obj; > push @cdna, $mel_nuc_obj; > push @prots, $sec_pep_obj; > push @cdna, $sec_nuc_obj; > push @prots, $sim_pep_obj; > push @cdna, $sim_nuc_obj; > push @prots, $yak_pep_obj; > push @cdna, $yak_nuc_obj; > > then I use the @prots as input for my $aln=$aln_factory->align > (\@prots); > This method will create align files with sequences masked. 
> > But if I use fasta files (not objects) which contain protein > sequences as input, $inputfile='FBgn0000097.pep'; > @params=('outorder'=>'INPUT'); > $factory=Bio::Tools::Run::Alignment::Clustalw->new(@params); > $aln=$factory->align($inputfile); > #$aln->gap_char('-'); > $aln->map_chars('\.','-'); > $aln_out=Bio::AlignIO->new(-file=>">0097.aln",-format=>'clustalw'); > $aln_out->write_aln($aln); > > This method creates files without masking~~~ > I think sequence objects created by "get_Seq_by_id" from sequence > databases directly are not appropriate. > > Thank you for your suggestion again! > > Jiang. > >> From: Chris Fields >> To: ????? >> Subject: Re: [Bioperl-l] AlignIO problems >> Date: Sun, 25 Feb 2007 21:26:34 -0600 >> >> I ran the same using a local fasta formatted file on my system >> which works (no masking). >> >> Of note, the gaps were all marked as '.'. Your gaps were both >> '.' and '-', which may mean that something is wrong with the seq >> objects themselves. Maybe SeqIO is misreading them? >> >> chris >> >> On Feb 25, 2007, at 7:34 PM, ????? wrote: >> >>> I use the Bio::Tools::Run::Alignment::Clustalw module to carry >>> out multiple alignment. >>> my code is: >>> my @clustal_param=('outorder'=>'INPUT'); >>> my $aln_factory=Bio::Tools::Run::Alignment::Clustalw->new >>> (@clustal_param); >>> my $aln=$aln_factory->align(\@prots);###@prots is >>> array of protein sequence objects >>> my $aln_out=Bio::AlignIO->new(-file=>">./dmel_group/ >>> clustal/ ${outfilename}.aln",-format=>'clustalw'); >>> >>> $aln_out->write_aln($aln); >>> This code produces an alignment that masks identical residues. >>> But if I use clustalW directly, the output is normal. >>> Thank you for your help~ >>> >>> Jiang >> Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From michael.watson at bbsrc.ac.uk Mon Feb 26 16:00:31 2007 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Mon, 26 Feb 2007 16:00:31 -0000 Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna In-Reply-To: <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com> References: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk> <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com> Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2EBD3@iahce2ksrv1.iah.bbsrc.ac.uk> Hi Lincoln/List That's great, the axis now appears, but there are no labels. This in itself isn't a problem, as long as we can assume that the tick marks are at 0, 50% and 100%? If that's true, we can go with what we have, otherwise I'm going to have to figure out a way to label the y-axis Thanks Mick ________________________________ From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein Sent: 15 February 2007 18:53 To: michael watson (IAH-C) Cc: BioPerl-List Subject: Re: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna Hi Michael, When you set up the panel, do this: Bio::Graphics::Panel->new(-blah -blah, -pad_left => 20, -pad_right => 20); This will leave enough room on the left and right for you to see the Y axis. Otherwise it runs off the edge of the image (ok, this is a mis-design, but it was the only way to solve a chicken-and-egg problem about who gets to say how wide the panel is) Lincoln On 2/15/07, michael watson (IAH-C) wrote: Hi OK I have some great images out of this glyph, but I can't see the axis, and nor is it labelled (ie does it go from 0 - 100%?) so isn't great for publication. The docs say: "NOTE: -gc_window=>'auto' gives nice results and is recommended for drawing GC content. The GC content axes draw slightly outside the panel, so you may wish to add some extra padding on the right and left. 
" Any idea how to do this? Basically, I want a nice GC graph with the axis quite clearly labelled, and a nice "%GC" title next to it :) Thanks Mick The information contained in this message may be confidential or legally privileged and is intended solely for the addressee. If you have received this message in error please delete it & notify the originator immediately. Unauthorised use, disclosure, copying or alteration of this message is forbidden & may be unlawful. The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Institute. This email and associated attachments has been checked locally for viruses but we can accept no responsibility once it has left our systems. Communications on Institute computers are monitored to secure the effective operation of the systems and for other lawful purposes. _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Mon Feb 26 17:18:38 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 26 Feb 2007 11:18:38 -0600 Subject: [Bioperl-l] AlignIO problems In-Reply-To: References: Message-ID: <7DF958E6-E233-427F-8901-3FE571CD99BD@uiuc.edu> On Feb 26, 2007, at 9:59 AM, ? ?? wrote: > Thank you! > I have checked the sequences retrieved through lots of Bio:DB > objects work simultaneously. > There are not problems you mentioned, the sequences are not > overwritten. Again, keep this on the list. I have my hands full this month so I will be checking the list only very sporadically; someone else may be able to help you. 
The only explanation for the clustalw output you get is that you are not retrieving the correct sequence in some fundamental way. To me, that indicates the bug originates either in the way the sequences are retrieved (i.e. somehow via Bio::DB::Fasta, hence my thought about conflicting indices) or in the way they are converted via Bio::SeqIO, which is used in Bio::Tools::Run::Alignment::Clustalw. When I have used Bio::DB::Fasta in the past, I have never had a problem indexing multiple files and retrieving sequences. So, short of running tests with your data, I can't help you much beyond the above conjecture.

chris

From jason at bioperl.org Mon Feb 26 18:45:34 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 26 Feb 2007 10:45:34 -0800
Subject: [Bioperl-l] Question to Bio::Tools::Run RemoteBlast
In-Reply-To: <20070226095515.68810@gmx.net>
References: <20070226095515.68810@gmx.net>
Message-ID: <2D2DF6D9-6DAE-4BB7-B31B-8C19CCCA7301@bioperl.org>

Alex -

I am glad to see your interest in the module, but I don't currently have any time to maintain it, so queries should be sent to the BioPerl mailing list. In general we prefer that you don't contact developers directly. Please use the mailing list, so that others can learn from the questions.

Please note there are several tutorials and plenty of documentation on the website. You will get a better response from people if you can show you have at least tried to use the existing example code to construct your program.

-jason

On Feb 26, 2007, at 1:55 AM, Alexander Auner wrote:

> Dear Jason Stajich,
> I hope you can help me.
>
> I am inspired by your module and would like to work with it.
> I am a student at TFH Wildau.
> I have problems understanding the module.
>
> Could you send me an example?
>
> The example is to process a text file (FASTA) with NCBI BLAST (Web).
>
> Parameters:
>   Choose database -> Others -> nr
>   Limit by Entrez query -> Campylobacter -> or select from: -> Bacteria [ORGN]
>   Expect -> 10
>   Other advanced -> -q-1
>
> Output format:
>   Plain text without Graphical Overview
>   Number of: -> Descriptions -> 10000
>   Alignment view -> query-anchored with identities
>
> All other parameters remain unset.
>
> Thank you for your help.
>
> Yours faithfully,
> Alexander Auner

From jason at bioperl.org Mon Feb 26 19:13:00 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 26 Feb 2007 11:13:00 -0800
Subject: [Bioperl-l] BioPerl leadership additions
Message-ID: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>

Dear BioPerl Users and Developers,

I want to announce an addition to the leadership of BioPerl. Christopher Fields and Sendu Bala are now members of the BioPerl Core developer group, in recognition of their ongoing leadership in the project. Chris and Sendu were instrumental in the 1.5.2 developer release and have made a significant commitment and contribution to the quality of the code and the documentation of the project. We have invited them to be part of the core to recognize their work, and so that we feel comfortable asking them to do more. ;-)

The Core group was established to ensure that someone was responsible for making code releases, vetting new developers for CVS write accounts, and generally dealing with things that might otherwise slip through the cracks. We are very excited to have more people contributing to and maintaining the toolkit. We look forward to their help, along with that of all the other developers, as we work towards a 1.6 release this year.

As always, while there is a need for some individuals to lead the project, we encourage contributions from all levels of expertise to improve the code, documentation, and tutorials of the project.
We plan to discuss the progress of the toolkit at this year's Bioinformatics Open Source Conference, held in Vienna, Austria, in conjunction with the SIG meetings at ISMB. We are trying to use BOSC 2007 as a chance for the developers of Open Bioinformatics Foundation-sponsored and related projects to coordinate future development and release cycles.

Jason Stajich on behalf of the Core developers

From khan at cshl.edu Mon Feb 26 20:29:19 2007
From: khan at cshl.edu (Khan, Sohail)
Date: Mon, 26 Feb 2007 15:29:19 -0500
Subject: [Bioperl-l] parsing a list of ids to a fasta file.
Message-ID:

Thanks, Michael.
I have the scripts installed. I can pass an ID to the indexed FASTA file and retrieve the sequence. However, I was wondering whether I can pass a list of IDs from a file and get the sequences for all of them?
Thanks.
-Sohail

-----Original Message-----
From: michael watson (IAH-C) [mailto:michael.watson at bbsrc.ac.uk]
Sent: Tuesday, February 20, 2007 4:33 PM
To: Khan, Sohail; Bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] parsing a list of ids to a fasta file.

I suggest you use Bio::Index::Fasta to create an index for the FASTA file, and then a simple script to retrieve sequences using that index. Or just use the pre-written bp_index.PLS and bp_fetch.PLS scripts.

http://www.bioperl.org/wiki/Module:Bio::Index::Fasta

________________________________

From: bioperl-l-bounces at lists.open-bio.org on behalf of Khan, Sohail
Sent: Tue 20/02/2007 8:42 PM
To: Bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] parsing a list of ids to a fasta file.

Dear list,
I am new to BioPerl. I have the following question:
I have a list of IDs that I would like to match against a large FASTA file to retrieve the sequences for those IDs.
I appreciate any suggestions. Thanks.
Khan

From arareko at campus.iztacala.unam.mx Mon Feb 26 21:44:49 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Mon, 26 Feb 2007 15:44:49 -0600
Subject: [Bioperl-l] BioPerl leadership additions
In-Reply-To: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
References: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
Message-ID: <45E354D1.4070600@campus.iztacala.unam.mx>

Congrats Chris & Sendu! Very well-deserved. Keep up the great work.

Cheers!
Mauricio.

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From lubapardo at gmail.com Tue Feb 27 13:26:30 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Tue, 27 Feb 2007 14:26:30 +0100
Subject: [Bioperl-l] parsing blast results
Message-ID: <58ff33550702270526q43ab584ci7baf6a09e2584e38@mail.gmail.com>

Hi,
I am using the module Bio::SearchIO to parse some blast results. I would like to store the ids of the results into an array but I am not sure if this is possible to do it with an existing subroutine. Does anyone have an idea whether there is a method included within the module Bio::SearchIO to do so?
Thanks in advance,
L.Pardo

From cjfields at uiuc.edu Tue Feb 27 14:11:37 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Feb 2007 08:11:37 -0600
Subject: [Bioperl-l] parsing blast results
In-Reply-To: <58ff33550702270526q43ab584ci7baf6a09e2584e38@mail.gmail.com>
References: <58ff33550702270526q43ab584ci7baf6a09e2584e38@mail.gmail.com>
Message-ID:

On Feb 27, 2007, at 7:26 AM, Luba Pardo wrote:

> Hi,
> I am using the module Bio::SearchIO to parse some blast results. I would
> like to store the ids of the results into an array but I am not sure if this
> is possible to do it with an existing subroutine.
> Does anyone have an idea
> whether there is a method included within the module Bio::SearchIO
> to do so?
> Thanks in advance,
> L.Pardo

Bio::SearchIO doesn't currently have a method to retrieve all the accessions in a BLAST result. The best way to do this is to iterate through the objects:

  use Bio::SearchIO;

  # 'report.bls' is a placeholder for your own BLAST report file
  my $searchio = Bio::SearchIO->new(-format => 'blast', -file => 'report.bls');

  my @accs;
  while (my $result = $searchio->next_result) {
      while (my $hit = $result->next_hit) {
          push @accs, $hit->accession;
          # do whatever else here...
      }
  }
  print join(',', @accs), "\n";

I don't think all the accessions in the description are parsed out at the moment, just the first one (or the one in the hit table). If you want all of them, or if you want the NCBI GI, you'll need to parse them out of the description heading ($hit->description).

chris

From sac at bioperl.org Tue Feb 27 17:59:22 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Tue, 27 Feb 2007 09:59:22 -0800
Subject: [Bioperl-l] [Bioperl-announce-l] BioPerl leadership additions
In-Reply-To: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
References: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
Message-ID: <8f200b4c0702270959n7521f693l915bfabe7ccb7ef7@mail.gmail.com>

Welcome to the club, Chris & Sendu. Always good to have an infusion of new blood and capable, motivated hands.

Steve

From cjfields at uiuc.edu Tue Feb 27 20:57:40 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Feb 2007 14:57:40 -0600
Subject: [Bioperl-l] Bio::SeqIO::FTHelper
Message-ID:

Could anyone tell me what FTHelper is used for? From what I gather, it rolls up seqfeature data into a lightweight object but then creates a SeqFeature::Generic anyway (at least for GenBank/EMBL/Swiss), which seems to be a waste of memory and time. Is there something I'm missing (besides my sanity, of course)?
chris

From Jay at jays.net Wed Feb 28 09:39:55 2007
From: Jay at jays.net (Jay Hannah)
Date: Wed, 28 Feb 2007 03:39:55 -0600
Subject: [Bioperl-l] "Command-Line Bioinformatics"
Message-ID:

Reading this article:

  http://www.linuxjournal.com/article/6977
  Sequencing the SARS Virus - Linux Journal, Nov 2003

This guy needs Perl and/or BioPerl. :)

> The sequence file is in FASTA format consisting of a header line
> and the sequence, split into fixed-width lines. The following
> counts the number of Gs and Cs in the sequence and presents the
> total as a fraction of the total number of bases:
>
>   grep -v "^>" AY274119.fa | fold -w 1 |
>     tr "ATGC" "..xx" | sort | uniq -c |
>     sed 's/[^0-9]//g' | tr -s "\012" " " |
>     sed 's/\([0-9]*\) \([0-9]*\)/scale = 3; \2 \/ (\1+\2)/' |
>     bc -i
>   scale = 3; 12127 / (17624+12127)
>   .407
>
> Out of the 29,751 bases in our sequence, 12,127 are either G or C,
> giving a GC content of 41%.

BioPerl version:

  use Bio::SeqIO;
  my $io  = Bio::SeqIO->new(-file => 'AY274119.fa', -format => 'Fasta');
  my $seq = $io->next_seq->seq;
  print( ($seq =~ tr/GC/GC/) / length($seq), "\n" );

Command-line Perl:

  perl -e '$/ = undef; $_ = <>; s/>.*//; s/\n//g; print tr/GC/GC/ / length($_)' AY274119.fa

I'm sure you can Perl Golf my stabs at it. :)

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah

From n.saunders at uq.edu.au Wed Feb 28 10:25:08 2007
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Wed, 28 Feb 2007 20:25:08 +1000
Subject: [Bioperl-l] Bio::Factory::EMBOSS, CGI and taint
Message-ID: <45E55884.9010908@uq.edu.au>

Dear Bioperlers,

I'm trying to understand an error that occurs when Bio::Factory::EMBOSS is used in a CGI script. I am using BioPerl 1.5.2 on Ubuntu Dapper, Apache 2.0.55, Perl 5.8.7.
If I load this test CGI script (cgi.pl) in a browser:

BEGIN CODE
----------
#!/usr/bin/perl -Tw

use strict;
use CGI;
use Bio::Factory::EMBOSS;

my $cgi = new CGI;
my $f = new Bio::Factory::EMBOSS;

print $cgi->header,
      $cgi->start_html,
      $cgi->end_html;
--------
END CODE

I get a 500 server error and the Apache error log reads:

[error] [client 192.168.0.3] Premature end of script headers: cgi.pl

I can fix this in 2 ways:

(1) Move the "my $f = new Bio::Factory::EMBOSS" line to the end of the script, which isn't a very useful fix.
(2) Remove the -T switch from the shebang line.

There seem to be a few old posts on the list regarding "taint-safe" modules. It seems that the new Bio::Factory::EMBOSS object is interfering with the headers in some way, but I'm no CGI.pm guru and wondered if anyone could shed light on this.

thanks,
Neil
--
School of Molecular and Microbial Sciences
University of Queensland
Brisbane 4072 Australia

http://nsaunders.wordpress.com

From n.saunders at uq.edu.au Wed Feb 28 10:30:31 2007
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Wed, 28 Feb 2007 20:30:31 +1000
Subject: [Bioperl-l] more on Bio::Factory::EMBOSS, CGI and taint
Message-ID: <45E559C7.1090308@uq.edu.au>

Further to my previous email, adding:

BEGIN {
    $| = 1;
    print "Content-type: text/html\n\n";
    use CGI::Carp('fatalsToBrowser');
}

to my CGI script generates:

Insecure $ENV{PATH} while running with -T switch at /usr/local/share/perl/5.8.7/Bio/Factory/EMBOSS.pm line 251.

Neil
--
School of Molecular and Microbial Sciences
University of Queensland
Brisbane 4072 Australia

http://nsaunders.wordpress.com

From n.saunders at uq.edu.au Wed Feb 28 10:50:58 2007
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Wed, 28 Feb 2007 20:50:58 +1000
Subject: [Bioperl-l] CGI taint solved
Message-ID: <45E55E92.10608@uq.edu.au>

Apologies for running a one-man thread, but I realised that I've now answered my own question regarding errors with CGI, Bio::Factory::EMBOSS and taint.
Given that the EMBOSS binaries are in /usr/local/bin, adding:

$ENV{'PATH'} = '/usr/local/bin';

near the top of the script does the trick.

Neil
--
School of Molecular and Microbial Sciences
University of Queensland
Brisbane 4072 Australia

http://nsaunders.wordpress.com

From cjfields at uiuc.edu Wed Feb 28 13:39:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Feb 2007 07:39:24 -0600
Subject: [Bioperl-l] CGI taint solved
In-Reply-To: <45E55E92.10608@uq.edu.au>
References: <45E55E92.10608@uq.edu.au>
Message-ID:

That could possibly clobber any other program calls from within the same script (unless they reside in /usr/local/bin), since you're explicitly assigning PATH, not appending:

  $ENV{"PATH"} = '/usr/local/bin';

gets me (printing $ENV{"PATH"}):

  /usr/local/bin

whereas this:

  $ENV{"PATH"} = '/usr/local/bin:' . $ENV{"PATH"};

gets me:

  /usr/local/bin:/Users/cjfields/bin:/Users/cjfields/dart/bin:/usr/local/mysql/bin:/usr/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin

There's probably a File::* module that does this safely per OS flavor.

chris

On Feb 28, 2007, at 4:50 AM, Neil Saunders wrote:

> Apologies for running a one-man thread, but I realised that I've now answered my
> own question regarding errors with CGI, Bio::Factory::EMBOSS and taint.
>
> Given that the EMBOSS binaries are in /usr/local/bin, adding:
>
> $ENV{'PATH'} = '/usr/local/bin'
>
> near the top of the script does the trick.
>
> Neil

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From stefan.kirov at bms.com Wed Feb 28 15:35:31 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 28 Feb 2007 10:35:31 -0500
Subject: [Bioperl-l] CGI taint solved
In-Reply-To:
References: <45E55E92.10608@uq.edu.au>
Message-ID: <45E5A143.3080303@bms.com>

Neil,
I believe this is your situation:
http://wn.cyberwerks.com/2000/0411.html
My advice: any commands executed from within a CGI script should have their path hard-coded whenever possible. If those commands require a different PATH, try writing a wrapper shell script that sets the environment (which should be reset to the default once the shell script terminates). It also depends on the type of environment you have: if it is not secure, you may wish to think hard about how to eliminate all security loopholes with CGI. I am definitely not an expert on this.
Stefan

From lubapardo at gmail.com Wed Feb 28 17:21:07 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Wed, 28 Feb 2007 18:21:07 +0100
Subject: [Bioperl-l] retrieven ids
Message-ID: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com>

Hi everyone,
I wonder if someone could give me advice on the following:
I want to retrieve the DNA coding sequence for a RefSeq protein ID. I do not want to back-translate the protein into DNA. Instead, I want to get the ID of the DNA coding sequence and then retrieve the DNA sequence from GenBank. Is there any module that allows me to get all possible IDs for a sequence, given a protein GI?
Thank you very much in advance,
L. Pardo

From johnston at biochem.ucl.ac.uk Wed Feb 28 17:05:49 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Wed, 28 Feb 2007 17:05:49 +0000 (GMT)
Subject: [Bioperl-l] _rearrange
Message-ID:

hi,

Is there a discussion of the rationale behind the _rearrange method somewhere? I'm probably just being gormless, but I think I'm missing the point a bit.

Is it okay for a method just to expect named params like ->foo(arg1=>'stuff', arg2=>'things'); ?
Cxx

From ckuanglim at yahoo.com Wed Feb 28 15:51:50 2007
From: ckuanglim at yahoo.com (Chan Kuang Lim)
Date: Wed, 28 Feb 2007 07:51:50 -0800 (PST)
Subject: [Bioperl-l] Problem of Installing Bioperl
Message-ID: <459942.77644.qm@web60518.mail.yahoo.com>

I have a problem installing BioPerl on Windows using the command-line installation. In the cmd window, after

  ppm-shell
  search bioperl
  install 2

a lot of downloading was done, but the next line is:

  Unpacking bioperl-1.5.2_100...ppm install failed: Can't extract files from C:.............../Bioperl-1.5.2_100.tgz

I hope you can answer my question. Thank you.

Regards,
Chan Kuang Lim
Malaysia

From cjfields at uiuc.edu Wed Feb 28 18:30:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Feb 2007 12:30:45 -0600
Subject: [Bioperl-l] _rearrange
In-Reply-To:
References:
Message-ID: <25C736A2-2DCA-413B-8F92-D799F583515B@uiuc.edu>

From what I gather, it's a convenient utility method used for consistent, enforced parameter checking and setting in any method, including the constructor. There are a few modules that don't use _rearrange (Bio::WebAgent::new() comes to mind). You are not required to use it, but the naming conventions for parameters outlined in _rearrange (in the Bio::Root::RootI POD) are generally enforced for consistency across classes.

As a note, Sendu has committed a related method (_set_from_args) to CVS, which works rather well, but I don't think it is in the last release.

chris

On Feb 28, 2007, at 11:05 AM, Caroline Johnston wrote:

> hi,
>
> Is there a discussion of the rationale behind the _rearrange method
> somewhere? I'm probably just being gormless, but I think I'm missing the
> point a bit.
>
> Is it okay for a method just to expect named params like
> ->foo(arg1=>'stuff', arg2=>'things'); ?
>
> Cxx

From bix at sendu.me.uk Wed Feb 28 19:38:46 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Feb 2007 19:38:46 +0000
Subject: [Bioperl-l] _rearrange
In-Reply-To:
References:
Message-ID: <45E5DA46.3020503@sendu.me.uk>

Caroline Johnston wrote:
> hi,
>
> Is there a discussion of the rationale behind the _rearrange method
> somewhere? I'm probably just being gormless, but I think I'm missing the
> point a bit.
>
> Is it okay for a method just to expect named params like
> ->foo(arg1=>'stuff', arg2=>'things'); ?

The BioPerl style for named args is -arg1, and names in the "wrong" case (e.g. -Arg1 or -ARG1) are accepted as well. So, make use of _rearrange; it won't do you any harm.
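To make the convention concrete, here is a minimal plain-Perl sketch of the idea behind _rearrange. It makes one assumption beyond what this thread states: the values are returned in a caller-specified order. The helper name rearrange_args is hypothetical. This is not the BioPerl source, and the real _rearrange in Bio::Root::RootI may differ in its details.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, NOT BioPerl's implementation: it only illustrates the
# convention behind _rearrange. Given the wanted parameter names (in order)
# and the caller's arguments, it returns the matching values in that order.
sub rearrange_args {
    my ($order, @args) = @_;

    # If the first argument does not start with '-', treat the call as
    # positional and hand the arguments back unchanged.
    return @args unless @args && defined $args[0] && $args[0] =~ /^-/;

    my %named;
    while (@args) {
        my ($key, $value) = splice @args, 0, 2;
        $key =~ s/^-//;                # strip the leading dash
        $named{ uc($key) } = $value;   # fold case so -seq, -Seq, -SEQ all match
    }

    # Parameters the caller did not supply come back as undef in their slot.
    return map { $named{ uc($_) } } @{$order};
}

# Named style: mixed case, any order.
my ($seq, $id) = rearrange_args([qw(SEQ ID)], -id => 'AY274119', -Seq => 'ACGT');
print "seq=$seq id=$id\n";    # prints: seq=ACGT id=AY274119

# Positional style.
my ($seq2, $id2) = rearrange_args([qw(SEQ ID)], 'GGCC', 'X1');
print "seq=$seq2 id=$id2\n";  # prints: seq=GGCC id=X1
```

Accepting both styles keeps short calls terse, while named arguments keep long constructor calls readable. In this sketch, folding names to upper case is what lets -arg1, -Arg1 and -ARG1 all work.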
From johnsonm at gmail.com Wed Feb 28 19:59:09 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 28 Feb 2007 13:59:09 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
Message-ID:

I happen to need something like Bio::Tools::Run::Genemark, so I'm coding one up. When I started on the tests for it, I realized I have a problem. I can distribute a FASTA file downloaded from GenBank to use as input, but I can't distribute the model file needed to actually run GeneMark (GeneMark.hmm for prokaryotes, gmhmmp, in my case).

It took *forever* to get a license, and I'm not thrilled with the prospect of talking them out of a redistributable model file. I'd love to distribute the test, but I don't see how I'm going to be able to. Suggestions?

Also, I've settled on IPC::Run instead of system(). The docs indicate that the bits of it I'm using should be OK on Windows, except maybe for Win9X. I don't want to clutter up the console, I don't like embedding stdout/stderr redirection in command strings, and I don't want to have to worry about signal handling (What if the child catches a Ctrl-C halfway through parsing? What if the parent does?). Anybody object to that?

One final thing. I'm lazy, I don't want to deal with parsing arguments to the constructor, so I'm just calling _rearrange() to deal with it. The Bio::Tools:: parsers all take dash options, but it looks like a bunch of the stuff in Bio::Tools::Run:: takes dashless args. Objections?

From dmessina at wustl.edu Wed Feb 28 20:14:56 2007
From: dmessina at wustl.edu (Dave Messina)
Date: Wed, 28 Feb 2007 14:14:56 -0600 (CST)
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
In-Reply-To:
References:
Message-ID: <57538.10.0.7.57.1172693696.squirrel@gscmail.wustl.edu>

> I'm not thrilled with the prospect of talking them out of a redistributable
> model file.

I suppose it's not possible to fake your own, or at least the parts of it you're testing for? If not, I'd put the tests in a skip block while waiting to hear from the Genemark folks.
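Dave's skip-block suggestion can be sketched with the core Test::More module. This is a minimal sketch. It assumes a hypothetical environment variable, GENEMARK_MODEL, that points at a locally licensed model file. The variable name and the placeholder test body are illustrations only, not part of bioperl-run.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;

# Hypothetical environment variable: each user points it at their own
# licensed GeneMark.hmm model file, so nothing needs to be redistributed.
my $model = $ENV{GENEMARK_MODEL};

SKIP: {
    # Skip both tests in this block unless a readable model file exists.
    skip 'GENEMARK_MODEL not set or model file not readable', 2
        unless defined $model && -r $model;

    ok(-s $model, 'model file is non-empty');

    # Placeholder: build the (hypothetical) GeneMark wrapper here and run it
    # on the GenBank FASTA file that can be shipped with the test suite.
    pass('GeneMark run placeholder');
}

done_testing();
```

Run it with perl or prove. When the variable is unset, both tests are reported as skipped rather than failed, so the test file can ship without the model.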
If not, I'd put the tests in a skip block while waiting to hear from the Genemark folks. > The Bio::Tools:: parsers all take dash options, but it looks like a bunch of > the stuff in Bio::Tools::Run:: takes dashless args. Objections? Sendu will chime in I'm sure, but I think he was planning to switch everything in Bio::Tools::Run over to dashed args anyway... Dave From bix at sendu.me.uk Wed Feb 28 20:52:23 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 28 Feb 2007 20:52:23 +0000 Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer In-Reply-To: References: Message-ID: <45E5EB87.9020106@sendu.me.uk> Mark Johnson wrote: > One final thing. I'm lazy, I don't want to deal with parsing arguments > to the constructor, so I'm just calling _rearrange() to deal with it. The > Bio::Tools:: parsers all take dash options, but it looks like a bunch of the > stuff in Bio::Tools::Run:: takes dashless args. Objections? You can make use of _set_from_args(). See Bio::Tools::Run::Phylo::Gumby for an example. From bix at sendu.me.uk Wed Feb 28 21:29:32 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 28 Feb 2007 21:29:32 +0000 Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails Message-ID: <45E5F43C.9080902@sendu.me.uk> I have GD 2.35 and GD::SVG 2.33 installed. I have a working script in which a Bio::Graphics::Panel object is made and output with: print $panel->png; This is fine. Changing it to: print $panel->svg; Gives the error: Can't locate object method "svg" via package "GD:Image" at /.../Bio/Graphics/Panel.pm line 971, line 192. Am I supposed to do something else to get this to work? Cheers, Sendu. 
From crabtree at tigr.ORG Wed Feb 28 21:40:52 2007
From: crabtree at tigr.ORG (Jonathan Crabtree)
Date: Wed, 28 Feb 2007 16:40:52 -0500
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <45E5F43C.9080902@sendu.me.uk>
References: <45E5F43C.9080902@sendu.me.uk>
Message-ID: <45E5F6E4.80003@tigr.org>

Sendu-

I believe you must set 'image_class' to 'GD::SVG' when you create the Panel (and note that older versions of Bio::Graphics::Panel don't know anything about this parameter). Here's the relevant part of the Panel perldoc:

  -image_class   To create output in scalable vector graphics (SVG),
                 optionally pass the image class parameter 'GD::SVG'.
                 Defaults to using vanilla GD. See the corresponding
                 image_class() method below for details.

Jonathan

Sendu Bala wrote:
> I have GD 2.35 and GD::SVG 2.33 installed.
>
> I have a working script in which a Bio::Graphics::Panel object is made
> and output with:
>
> print $panel->png;
>
> This is fine. Changing it to:
>
> print $panel->svg;
>
> Gives the error:
>
> Can't locate object method "svg" via package "GD:Image" at
> /.../Bio/Graphics/Panel.pm line 971, line 192.
>
> Am I supposed to do something else to get this to work?
>
> Cheers,
> Sendu.

From bix at sendu.me.uk Wed Feb 28 22:01:17 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Feb 2007 22:01:17 +0000
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <45E5F6E4.80003@tigr.org>
References: <45E5F43C.9080902@sendu.me.uk> <45E5F6E4.80003@tigr.org>
Message-ID: <45E5FBAD.3030404@sendu.me.uk>

Jonathan Crabtree wrote:
>
> Sendu-
>
> I believe you must set 'image_class' to 'GD::SVG' when you create the
> Panel (and note that older versions of Bio::Graphics::Panel don't know
> anything about this parameter.) Here's the relevant part of the Panel
> perldoc: ...
Oh! I had no idea there was any perldoc for these modules, hiding down
there at the bottom. Does anyone want to intersperse the docs?...

From cjfields at uiuc.edu Wed Feb 28 22:10:54 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Feb 2007 16:10:54 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
In-Reply-To:
References:
Message-ID: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>

On Feb 28, 2007, at 1:59 PM, Mark Johnson wrote:

> I happen to need something like Bio::Tools::Run::Genemark, so I'm
> coding one up. When I started on the tests for it, I realized I have
> a problem. I can distribute a fasta file downloaded from GenBank to
> use as input, but I can't distribute the model file needed to
> actually run Genemark (Genemark.hmm for prokaryotes, gmhmmp, in my
> case).
> It took *forever* to get a license, and I'm not thrilled with the
> prospect of talking them out of a redistributable model file. I'd love
> to distribute the test, but I don't see how I'm going to be able to.
> Suggestions?

For bioperl-run tests you have to have the program installed for tests
to work (otherwise they are passed over). Therefore one would assume if
you had the GeneMark program you would have the models as well.

You could set up your module to require an env. variable be set (like
the HMMER module, for instance) which contains the executables and/or
the models, so that if it isn't set the tests are skipped.

> Also, I've settled on IPC::Run instead of system(). The docs indicate
> the bits of it I'm using should be OK on Windows, except maybe for
> Win9X. I don't want to clutter up the console, I don't like embedding
> stdout/stderr redirection in command strings, and I don't want to have
> to worry about signal handling (What if the child catches a ctrl-c
> halfway through parsing? What if the parent does?). Anybody object to
> that?

I wouldn't worry too much about Win9x. Is IPC::Run in perl core?
Otherwise we'll need to add it to the optional dependencies for
bioperl-run.

> One final thing. I'm lazy, I don't want to deal with parsing arguments
> to the constructor, so I'm just calling _rearrange() to deal with it.
> The Bio::Tools:: parsers all take dash options, but it looks like a
> bunch of the stuff in Bio::Tools::Run:: takes dashless args.
> Objections?

Sendu's suggestion (_set_from_args()) is the best. As mentioned in
another thread, _rearrange() works as well.

chris

From johnsonm at gmail.com Wed Feb 28 22:29:36 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 28 Feb 2007 16:29:36 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
In-Reply-To: <57538.10.0.7.57.1172693696.squirrel@gscmail.wustl.edu>
References: <57538.10.0.7.57.1172693696.squirrel@gscmail.wustl.edu>
Message-ID:

On 2/28/07, Dave Messina wrote:
>
> > I'm not thrilled with the prospect of talking them out of a
> > redistributable model file.
>
> I suppose it's not possible to fake your own, or at least the parts of
> it you're testing for?

We got a gzipped tarball with some model files and a precompiled
executable (gmhmmp). As far as building a model file goes, I don't even
have two sticks to rub together. I suppose it's possible that it's not
actually some weird proprietary format; I'll go dig for some docs... but
I don't hold out a lot of hope.

From sukhinder.sandhu at osumc.edu Wed Feb 28 21:49:31 2007
From: sukhinder.sandhu at osumc.edu (Sukhinder Sandhu)
Date: Wed, 28 Feb 2007 16:49:31 -0500
Subject: [Bioperl-l] Problem installing bioperl: plz reply soon. thx
Message-ID:

Hi

I am having trouble installing Bundle::BioPerl through CPAN. I don't know
if this has something to do with my having root privileges. Can you
please suggest how I may proceed to get past this? I would really
appreciate any help. I am pasting part of the error it keeps giving after
trying to install every module.
######################
CPAN.pm: Going to build G/GA/GAAS/HTML-Parser-3.56.tar.gz

make: *** No rule to make target `/System/Library/Perl/5.8.6/darwin-thread-multi-2level/CORE/config.h', needed by `Makefile'. Stop.
/usr/bin/make -- NOT OK
Running make test
Can't test without successful make
Running make install
make had returned bad status, install seems impossible
###############################

Thanks
sukhinder

From sukhinder.sandhu at osumc.edu Wed Feb 28 04:41:43 2007
From: sukhinder.sandhu at osumc.edu (Sukhinder Sandhu)
Date: Tue, 27 Feb 2007 23:41:43 -0500
Subject: [Bioperl-l] Problem installing bioperl-1.5.2_102
Message-ID:

Hi

I am trying to install bioperl on my Mac OS X machine and am having
problems. I tried following the instructions both at www.tc.umn.edu.....
and in the README and INSTALL files in the bioperl folder that I
downloaded. The error I get is the following (the end part of the output
is copied):

####################
t/versions........ok
t/xs..............skipped
all skipped: C_support not enabled
Failed Test  Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/compat.t      5  1280    60    5   8.33%  25-28 31
4 tests and 31 subtests skipped.
Failed 1/22 test scripts, 95.45% okay. 5/683 subtests failed, 99.27% okay.
make: *** [test] Error 2
/usr/bin/make test -- NOT OK
Running make install
make test had returned bad status, won't install without force
Couldn't install Module::Build, giving up.
BEGIN failed--compilation aborted at ModuleBuildBioperl.pm line 51.
Compilation failed in require at Build.PL line 14.
BEGIN failed--compilation aborted at Build.PL line 14.
###########################################################################

I am not able to figure out what's going wrong. And when I try to run
CPAN, I get the following error. I have no idea how to fix these. Any
help is greatly appreciated.
############################################################################
[Sukhinders-Computer:~/Desktop/bioperl-1.5.2_102] sand60% perl -MCPAN -e shell
Terminal does not support AddHistory.

There seems to be running another CPAN process (pid 7207). Contacting...
Lockfile /Users/sand60/.cpan/.lock not writeable by you.
Cannot proceed.
On UNIX try:
rm /Users/sand60/.cpan/.lock
and then rerun us.
at -e line 1
###################################################

And doing what it says, removing some lock file, doesn't help. I am
wondering if all this has something to do with having root privileges on
the system, and if so, is there an alternative?

Thanks
sukhinder

From stefan.kirov at bms.com Wed Feb 28 21:44:05 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 28 Feb 2007 16:44:05 -0500
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <45E5F43C.9080902@sendu.me.uk>
References: <45E5F43C.9080902@sendu.me.uk>
Message-ID: <45E5F7A5.3090805@bms.com>

I think you should create the object with -image_class='svg'. Can you post
the code with which you create the object?
Stefan

Sendu Bala wrote:
> I have GD 2.35 and GD::SVG 2.33 installed.
>
> I have a working script in which a Bio::Graphics::Panel object is made
> and output with:
>
> print $panel->png;
>
> This is fine. Changing it to:
>
> print $panel->svg;
>
> Gives the error:
>
> Can't locate object method "svg" via package "GD:Image" at
> /.../Bio/Graphics/Panel.pm line 971, line 192.
>
>
> Am I supposed to do something else to get this to work?
>
>
> Cheers,
> Sendu.
From johnsonm at gmail.com Wed Feb 28 22:54:02 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 28 Feb 2007 16:54:02 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark and Glimmer
In-Reply-To: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>
References: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>
Message-ID:

On 2/28/07, Chris Fields wrote:

> For bioperl-run tests you have to have the program installed for
> tests to work (otherwise they are passed over). Therefore one would
> assume if you had the GeneMark program you would have the models as
> well.
>
> You could set up your module to require an env. variable be set (like
> the HMMER module, for instance) which contains the executables and/or
> the models, so that if it isn't set the tests are skipped.

Sounds like a plan.

> I wouldn't worry too much about Win9x. Is IPC::Run in perl core?
> Otherwise we'll need to add it to the optional dependencies for
> bioperl-run.

I'd test it, but I don't have access to any Win9x boxes anymore.

IPC::Run is not a core module, but I think it's worth the dependency. I
considered IPC::Open3, but it can't be made reliable on Win32, something
about not being able to select() on filehandles, only sockets. I also
looked at IPC::Run3, but under the hood, it's just got STDOUT/STDERR
redirection layered on top of system(). I don't like using system() due
to issues with signals (such as the user hitting ctrl-c and taking out
the child). I feel better knowing the wrapped executable is in another
process disconnected from the console.

> Sendu's suggestion (_set_from_args()) is the best. As mentioned in
> another thread, _rearrange() works as well.

I'm using _rearrange() now. I'll look at _set_from_args(). Is either one
preferred to the other?
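Chris's suggestion in this thread (skip the tests unless an environment variable points at a local installation) comes down to a small guard at the top of a test script. Below is a minimal sketch in plain Perl using only core modules. The variable name `GENEMARKDIR`, the helper name `external_tool_skip_reason` and the list of required files are hypothetical and used only for illustration. This is not how the existing HMMER wrapper is actually written.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return a human-readable reason to skip the tests,
# or undef if everything needed is present.
#   $env    - hash ref of environment variables (normally \%ENV)
#   $var    - name of the variable that should point at the install dir
#   @needed - file names that must exist inside that directory
sub external_tool_skip_reason {
    my ($env, $var, @needed) = @_;
    my $dir = $env->{$var};
    return "$var not set" unless defined $dir && length $dir;
    return "$var ($dir) is not a directory" unless -d $dir;
    for my $file (@needed) {
        my $path = File::Spec->catfile($dir, $file);
        return "$path not found" unless -e $path;
    }
    return undef;    # nothing missing: the tests can run
}

# Intended use at the top of a .t file (GENEMARKDIR is a hypothetical name;
# add whatever model file your licensed GeneMark distribution provides):
#
#   use Test::More;
#   my $why = external_tool_skip_reason(\%ENV, 'GENEMARKDIR', 'gmhmmp');
#   plan skip_all => $why if defined $why;
#   plan tests => 1;
```

Taking the environment as a hash reference, rather than reading `%ENV` directly, keeps the check easy to exercise on its own.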
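On the IPC::Run point in the Genemark/Glimmer thread: part of the appeal over system() is that the command is passed as a list, so there is no shell string with redirections embedded in it, and the child's STDOUT and STDERR end up in Perl scalars instead of on the console. Below is a minimal sketch, assuming IPC::Run is installed from CPAN (as Mark notes, it is not a core module). `run_capture` is a hypothetical helper name, and the 60-second timeout is an arbitrary illustrative value.

```perl
use strict;
use warnings;
use IPC::Run qw(run timeout);

# Hypothetical helper: run an external program without going through a
# shell and capture its output. Returns (success_flag, stdout, stderr).
sub run_capture {
    my @cmd = @_;
    my ($in, $out, $err) = ('', '', '');    # child gets an empty STDIN
    # run() returns true only if the child exited with status 0.
    # timeout(60) makes run() die if the child is still running after 60 s.
    my $ok = run(\@cmd, \$in, \$out, \$err, timeout(60));
    return ($ok ? 1 : 0, $out, $err);
}

# Program name and arguments are separate list elements, e.g.
#   my ($ok, $out, $err) = run_capture('gmhmmp', @options, $fasta_file);
# (@options and $fasta_file are hypothetical placeholders.)
```

Because a non-zero exit is reported through the return value instead of being raised as an error, the calling wrapper can decide whether a failure of the wrapped program is fatal.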
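For the SVG thread: Jonathan's perldoc excerpt says the image class is chosen when the panel is constructed. Once the panel is built that way, `$panel->svg` has an SVG-capable image object to call it on. Below is a minimal sketch, assuming a Bio::Graphics version recent enough to accept `-image_class` and that GD::SVG is installed. The panel dimensions and the single example feature are made up for illustration.

```perl
use strict;
use warnings;
use Bio::Graphics::Panel;
use Bio::SeqFeature::Generic;

# Select GD::SVG at construction time. With the default (plain GD), the
# panel's image object has no svg() method, which matches the
# "Can't locate object method svg" error quoted in the thread.
my $panel = Bio::Graphics::Panel->new(
    -length      => 1000,
    -width       => 600,
    -pad_left    => 10,
    -pad_right   => 10,
    -image_class => 'GD::SVG',
);

# One illustrative feature so the drawing is not empty.
my $feature = Bio::SeqFeature::Generic->new(
    -start        => 100,
    -end          => 600,
    -display_name => 'example',
);
$panel->add_track($feature, -glyph => 'generic', -bgcolor => 'blue', -label => 1);

my $svg = $panel->svg;    # SVG markup as a string
print $svg;
```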